In this article, we’ll show you how to deploy Airbyte on your computer and use it to connect to the Facebook Marketing API.
What is Airbyte?
Airbyte is an open-source data integration platform for building ELT pipelines that helps you replicate your data across your different types of storage, data lakes, and databases. In addition, it has a user-friendly interface that is also beautifully designed (just look at that cute octopus!).
Deploying Airbyte
Installing Airbyte on a machine is quite simple. You can be assured of that by looking at the official documentation. You can see you only need to follow a few steps:
As soon as the Airbyte banner appears in the terminal output, the UI becomes available at http://localhost:8000.
Connect to Facebook
Full documentation on working with the Facebook Marketing API from Airbyte can be found here. We will highlight and explain only the key points. So, in order to connect Facebook to Airbyte, you need to consider the following:
Before this request, you’ll need to go through the business verification process.
After connecting to Facebook, you can connect Facebook to Airbyte using the following steps:
An error may occur if your version of the framework is outdated. In that case, you will need to go to the main ‘Settings’ tab (the gear in the lower-left corner), select ‘Sources’, find Facebook Marketing in the list, and click the ‘Change’ button (or click ‘Upgrade all’ to download all available updates).
Connecting the data warehouse
After connecting to Facebook Marketing, we need to link this source to our data store.
Setting up the connection between the source and the data store
After all the previous steps have been taken, the ‘Set up connection’ tab opens. Here we can configure how and what data will be loaded using Airbyte, including:
After filling in the fields, click ‘Set up a connection.’ Next, the connection mode page opens, where we see the data as it is uploaded and make changes in the established source–storage connection.
If you click on the ‘Sync now’ button, it will bring up execution logs.
Once the download has been completed successfully, we can view the storage vault.
You will easily be able to see the number of generated tables.
Since we noted that we need to normalize the data, the result is a little less than 400 tables for the [activities, ‘ad_account’, ‘ad_creatives’, ‘ad_sets’, ‘ads’, ‘ads_insights’, ‘campaigns’] streams, and many of them turned out to be empty (as generated by Airbyte).
Raw data is found in tables named ‘_airbyte_raw_*’. Such tables have only three fields: ‘_airbyte_ab_id’, ‘_airbyte_data’, and ‘_airbyte_emmited_at’. The column ‘_airbyte_data’ is of the type jsonb. The following is and example for ‘_airbyte_raw_ads_insights’:
In the process of normalizing each field from json, you end up with a separate column. For example, here is part of a list of columns from the ‘ads_insights’ table:
After receiving the usual normalized table for each stream, various failures are suspected, for example, for ‘ads_insights’:
You need this guide, because it includes:
Turn it on to get exclusive guide on modern data stack
Emails suck. This newsletter doesn’t
Subscribe to the newsletter and get the most useful guide on modern data stack
The newsletter
You will also receive other useful materials on data analysis hacks with case examples from our company.