19.05.2022 | sbekker

How to run Airbyte and get the the most value of the Facebook API

In this article, we’ll show you how to deploy Airbyte on your computer and use it to connect to the Facebook Marketing API.

What is Airbyte?

Airbyte is an open-source data integration platform for building ELT pipelines that helps you replicate your data across your different types of storage, data lakes, and databases. In addition, it has a user-friendly interface that is also beautifully designed (just look at that cute octopus!).

 

Deploying Airbyte

Installing Airbyte on a machine is quite simple. You can be assured of that by looking at the official documentation. You can see you only need to follow a few steps:

  1. Install Docker (see instructions) and ensure you have the latest version of docker-compose.
  2. Run the following commands in your terminal:
    1. git clone https://github.com/airbytehq/airbyte.git
    2. cd airbyte
    3. docker-compose-up

As soon as the Airbyte banner appears in the terminal output, the UI becomes available at http://localhost:8000.

Connect to Facebook

Full documentation on working with the Facebook Marketing API from Airbyte can be found here. We will highlight and explain only the key points. So, in order to connect Facebook to Airbyte, you need to consider the following:

  • Having a Facebook Ad Account ID (more info)
  • Availability of Facebook App with enabled Marketing API (more info)
  • Availability of Facebook Marketing API Access Token (more info)
  • Requesting a rate increase limit: Facebook heavily restricts API tokens generated from Facebook Apps with the ‘Standard Access’ tier (the default tier for new apps), making it infeasible to use the tokens for syncs with Airbyte. You’ll need to request an upgrade to ‘Advanced Access’ for your app on the following permissions (more info):
    • Ads Management Standard Access
    • Ads_read
    • Ads_management

Before this request, you’ll need to go through the business verification process (more info).

 

After connecting to Facebook, you can connect Facebook to Airbyte using the following steps:

  1. Go to the UI on the ‘Sources tab’ and click ‘+ new source’.

  2. Select Facebook Marketing as the source and set the connection configuration:
    • The name of the source under which it will appear in Airbyte
    • Facebook Ad Account ID
    • Start date – data from Facebook will be loaded from this date (⚠️WARNING: Insight tables are only able to pull data from the last 37 months)
    • End date – data from Facebook will be loaded until this date (this can be left blank to receive the latest data from the account)
    • Facebook Marketing API Access Token

  3. There are also several additional configuration options:
    • Enable remote Campaigns, Ads, and AdSets data
    • Saving thumbnail images
    • Setting Up Custom Insights

  4. After filling in the fields, click ‘Set up source’ and wait until the connection process is completed.

  5. If necessary, the settings for connecting to the source can be changed on the ‘Settings’ tab.

An error may occur if your version of the framework is outdated. In that case, you will need to go to the main ‘Settings’ tab (the gear in the lower-left corner), select ‘Sources’, find Facebook Marketing in the list, and click the ‘Change’ button (or click ‘Upgrade all’ to download all available updates).

Connecting the data warehouse

After connecting to Facebook Marketing, we need to link this source to our data store.

  1. Click ‘Add destination’ on the tab with the configured source (look at the previous screenshot) or go to the ‘Destinations’ tab and click ‘+ new destination’ there.

  2. Next, select the type of storage we will work with. Airbyte can connect to a large number of popular DBMSs. Let’s look at the necessary configuration steps using the example of working with PostgreSQL. Fill in the following fields:
    • The name of the storage under which it will appear in Airbyte
    • Host
    • Port
    • Database
    • Scheme
    • User
    • Password
    • Whether we need to use SSL
    • SSH tunnel settings (optional)

  3. After filling in the fields, click ‘Set up destination’ and wait until the connection process is completed.
  4. If you connected to the repository separately from the ‘Destinations’ tab, then you need to associate a source with it by clicking on ‘Add source’.

  5. If necessary, the settings for connecting to the storage can be changed on its ‘Settings’ tab.
  6. The version of the image for working with the repository can be updated if necessary (the instructions would be similar to those at the end of the previous section).

Setting up the connection between the source and the data store

After all the previous steps have been taken, the ‘Set up connection’ tab opens. Here we can configure how and what data will be loaded using Airbyte, including:

  • How often the download will occur
  • Where exactly we will put the data in the storage
  • What data we will take from the source and how (Full refresh | Increment or Overwrite | Append)
  • Whether it is necessary to normalize the data (raw data will still remain in the storage)
  • Whether to set up custom transformations (optional)

After filling in the fields, click ‘Set up a connection’. Next, the connection mode page opens, where we see the data as it is uploaded and make changes in the established source–storage connection.

If you click on the ‘Sync now’ button, it will bring up execution logs.

 

Results

Once the download has been completed successfully, we can view the storage vault.

You will easily be able to see the number of generated tables.

Since we noted that we need to normalize the data, the result is a little less than 400 tables for the [activities, ‘ad_account’, ‘ad_creatives’, ‘ad_sets’, ‘ads’, ‘ads_insights’, ‘campaigns’] streams, and many of them turned out to be empty (as generated by Airbyte).

Raw data is found in tables named ‘_airbyte_raw_*’. Such tables have only three fields: ‘_airbyte_ab_id’, ‘_airbyte_data’, and ‘_airbyte_emmited_at’. The column ‘_airbyte_data’ is of the type jsonb. The following is and example for ‘_airbyte_raw_ads_insights’:

In the process of normalizing each field from json, you end up with a separate column. For example, here is part of a list of columns from the ‘ads_insights’ table:

After receiving the usual normalized table for each stream, various failures are suspected, for example, for ‘ads_insights’: