19.05.2022 | sbekker

How to Run Airbyte and Get the Most Value of the Facebook API

Airbyte Facebook API

In this article, we’ll show you how to deploy Airbyte on your computer and use it to connect to the Facebook Marketing API.

What is Airbyte?

Airbyte is an open-source data integration platform for building ELT pipelines that helps you replicate your data across your different types of storage, data lakes, and databases. In addition, it has a user-friendly interface that is also beautifully designed (just look at that cute octopus!).

 

Deploying Airbyte

Installing Airbyte on a machine is quite simple. You can be assured of that by looking at the official documentation. You can see you only need to follow a few steps:

  1. Install Docker (see instructions) and ensure you have the latest version of docker-compose.
  2. Run the following commands in your terminal:
    1. git clone https://github.com/airbytehq/airbyte.git
    2. cd airbyte
    3. docker-compose-up

As soon as the Airbyte banner appears in the terminal output, the UI becomes available at http://localhost:8000.

data analysis consultancy

Connect to Facebook

Full documentation on working with the Facebook Marketing API from Airbyte can be found<style=”color: #0000ff;”> here. We will highlight and explain only the key points. So, in order to connect Facebook to Airbyte, you need to consider the following:

  • Having a Facebook Ad Account ID (more info)
  • Availability of Facebook App with enabled Marketing API (more info)
  • Availability of Facebook Marketing API Access Token (more info)
  • Requesting a rate increase limit: Facebook heavily restricts API tokens generated from Facebook Apps with the ‘Standard Access’ tier (the default tier for new apps), making it infeasible to use the tokens for syncs with Airbyte. You’ll need to request an upgrade to ‘Advanced Access’ for your app on the following permissions (more info):
    • Ads Management Standard Access
    • Ads_read
    • Ads_management

Before this request, you’ll need to go through the business verification process (more info).

 

After connecting to Facebook, you can connect Facebook to Airbyte using the following steps:

  1. Go to the UI on the ‘Sources tab’ and click ‘+ new source’.
    predictive analytics solutions
  2. Select Facebook Marketing as the source and set the connection configuration:
    • The name of the source under which it will appear in Airbyte
    • Facebook Ad Account ID
    • Start date – data from Facebook will be loaded from this date (⚠️WARNING: Insight tables are only able to pull data from the last 37 months)
    • End date – data from Facebook will be loaded until this date (this can be left blank to receive the latest data from the account)
    • Facebook Marketing API Access Token
      types of data analytics
  3. There are also several additional configuration options:
    • Enable remote Campaigns, Ads, and AdSets data
    • Saving thumbnail images
    • Setting Up Custom Insights
      machine learning analytics
  4. After filling in the fields, click ‘Set up source’ and wait until the connection process is completed.
    business analytics services
  5. If necessary, the settings for connecting to the source can be changed on the ‘Settings’ tab.
    business intelligence consulting services

An error may occur if your version of the framework is outdated. In that case, you will need to go to the main ‘Settings’ tab (the gear in the lower-left corner), select ‘Sources’, find Facebook Marketing in the list, and click the ‘Change’ button (or click ‘Upgrade all’ to download all available updates).

Connecting the data warehouse

After connecting to Facebook Marketing, we need to link this source to our data store.

  1. Click ‘Add destination’ on the tab with the configured source (look at the previous screenshot) or go to the ‘Destinations’ tab and click ‘+ new destination’ there.
    data visualization services
  2. Next, select the type of storage we will work with. Airbyte can connect to a large number of popular DBMSs. Let’s look at the necessary configuration steps using the example of working with PostgreSQL. Fill in the following fields:
    • The name of the storage under which it will appear in Airbyte
    • Host
    • Port
    • Database
    • Scheme
    • User
    • Password
    • Whether we need to use SSL
    • SSH tunnel settings (optional)
      engineering analytics
  3. After filling in the fields, click ‘Set up destination’ and wait until the connection process is completed.
  4. If you connected to the repository separately from the ‘Destinations’ tab, then you need to associate a source with it by clicking on ‘Add source’.
    power bi consulting services
  5. If necessary, the settings for connecting to the storage can be changed on its ‘Settings’ tab.
  6. The version of the image for working with the repository can be updated if necessary (the instructions would be similar to those at the end of the previous section).

Setting up the connection between the source and the data store

After all the previous steps have been taken, the ‘Set up connection’ tab opens. Here we can configure how and what data will be loaded using Airbyte, including:

  • How often the download will occur
  • Where exactly we will put the data in the storage
  • What data we will take from the source and how (Full refresh | Increment or Overwrite | Append)
  • Whether it is necessary to normalize the data (raw data will still remain in the storage)
  • Whether to set up custom transformations (optional)

data analytics consulting services

After filling in the fields, click ‘Set up a connection’. Next, the connection mode page opens, where we see the data as it is uploaded and make changes in the established source–storage connection.

data consulting services

If you click on the ‘Sync now’ button, it will bring up execution logs.

 

Results

Once the download has been completed successfully, we can view the storage vault.

data engineering services

You will easily be able to see the number of generated tables.

data integration consulting

Since we noted that we need to normalize the data, the result is a little less than 400 tables for the [activities, ‘ad_account’, ‘ad_creatives’, ‘ad_sets’, ‘ads’, ‘ads_insights’, ‘campaigns’] streams, and many of them turned out to be empty (as generated by Airbyte).

machine learning predictive analytics

Raw data is found in tables named ‘_airbyte_raw_*’. Such tables have only three fields: ‘_airbyte_ab_id’, ‘_airbyte_data’, and ‘_airbyte_emmited_at’. The column ‘_airbyte_data’ is of the type jsonb. The following is and example for ‘_airbyte_raw_ads_insights’:

advancing analytics

In the process of normalizing each field from json, you end up with a separate column. For example, here is part of a list of columns from the ‘ads_insights’ table:

analytics engineering

After receiving the usual normalized table for each stream, various failures are suspected, for example, for ‘ads_insights’:

bi consulting

You need this guide, because it includes:

  • evolution of data stack
  • reasons why many well-known tools do not meet the challenges of the new world
  • vital structure of modern data stack
  • the newest tools for every part of modern data stack
mail

Turn it on to get exclusive guide on modern data stack

Emails suck. This newsletter doesn’t

Subscribe to the newsletter and get the most useful guide on modern data stack

The newsletter

You will also receive other useful materials on data analysis hacks with case examples from our company.