Collecting Data on Ad Campaigns from VK.com

We have a lot to share in today’s longread: we’ll retrieve data on ad campaigns from Vkontakte (widely popular social network in Russia and CIS countries) and compare them to Google Analytics data in Redash. This time we don’t need to create a server, as our data will be transferred to Google Docs via Google Sheets API.

Getting an Access Token
We need to create an app to receive our access token. Follow this link https://vk.com/apps?act=manage and click “Create app” on the developer’s page. Choose a name for your app and check it as a “Standalone app”. Then, click Settings in the left menu and save your app ID.

More details on access tokens can be found here: Getting an access token

Copy this link:

https://oauth.vk.com/authorize?client_id=YourClientID&scope=ads&response_type=token

And change YourClientID to your app ID, this will allow you to get information about your advertising account. Open this link in your browser and you will be redirected to another page, which URL address holds your generated access token.

Access token expires in 86400 seconds or 24 hours. If you want to generate a token with an unlimited lifetime period, just pass scope to the offline parameter. In case if you need to generate a new token – change your password account or terminate all active sessions in security settings.

You will also need your advertising account ID to make API requests. It can be found via this link, just copy it:  https://vk.com/ads?act=settings

Using APIs to collect data
Let’s write a script that would allow us to retrieve information on all user’s ad campaigns: number of impressions, сlicks and costs. The script will pass this data to a DataFrame and send it to Google Docs.

from oauth2client.service_account import ServiceAccountCredentials
from pandas import DataFrame
import requests
import gspread
import time

We have several constant variables: access token, advertising account ID and Vkontakte API Version. Here we are using the most recent API version, which is 5.103.

token = 'fa258683fd418fafcab1fb1d41da4ec6cc62f60e152a63140c130a730829b1e0bc'
version = 5.103
id_rk = 123456789

To get advertising stats you need to use the  ads.getStatistics method and pass your ad campaign ID to it. Since we don’t run any advertisements yet, we’ll use the  ads.getAds method that returns IDs of ads and campaigns.

Learn more about the API methods available for Vkontakte here

Use the requests library to send a request and convert the response to JSON.


campaign_ids = []
ads_ids = []
r = requests.get('https://api.vk.com/method/ads.getAds', params={
    'access_token': token,
    'v': version,
    'account_id': id_rk
})
data = r.json()['response']

We have a familiar list of dictionaries returned, similar to the one we have reviewed in the previous article, “Analysing data on Facebook Ad Campaigns with Redash”.

Fill in the ad_campaign_dict dictionary as follows: specify ad ID as a key, and campaign ID as a value, where this ad belongs to.

ad_campaign_dict = {}
for i in range(len(data)):
    ad_campaign_dict[data[i]['id']] = data[i]['campaign_id']

Having ID for every ad needed we can invoke the  ads.getStatistics method to collect data on the number of impressions, clicks, costs, and dates for a particular ad, so create several empty lists in advance.

ads_campaign_list = []
ads_id_list = []
ads_impressions_list = []
ads_clicks_list = []
ads_spent_list = []
ads_day_start_list = []
ads_day_end_list = []

We need to invoke the getStatistics method for each ad separately, let’s refer to the ad_campaign_dict and iterate our requests. Retrieve all-time data by calling the ‘period’ method with the  ‘overall’ value. Some ads may not have impression or clicks if they haven’t been launched yet, this may cause a  KeyError. Let’s recall to the try — except approach to handle this error.

for ad_id in ad_campaign_dict:
        r = requests.get('https://api.vk.com/method/ads.getStatistics', params={
            'access_token': token,
            'v': version,
            'account_id': id_rk,
            'ids_type': 'ad',
            'ids': ad_id,
            'period': 'overall',
            'date_from': '0',
            'date_to': '0'
        })
        try:
            data_stats = r.json()['response']
            for i in range(len(data_stats)):
                for j in range(len(data_stats[i]['stats'])):
                    ads_impressions_list.append(data_stats[i]['stats'][j]['impressions'])
                    ads_clicks_list.append(data_stats[i]['stats'][j]['clicks'])
                    ads_spent_list.append(data_stats[i]['stats'][j]['spent'])
                    ads_day_start_list.append(data_stats[i]['stats'][j]['day_from'])
                    ads_day_end_list.append(data_stats[i]['stats'][j]['day_to'])
                    ads_id_list.append(data_stats[i]['id'])
                    ads_campaign_list.append(ad_campaign_dict[ad_id])
        except KeyError:
            continue

Now, create a DataFrame and print out the first 5 data points

df = DataFrame()
df['campaign_id'] = ads_campaign_list
df['ad_id'] = ads_id_list
df['impressions'] = ads_impressions_list
df['clicks'] = ads_clicks_list
df['spent'] = ads_spent_list
df['day_start'] = ads_day_start_list
df['day_end'] = ads_day_end_list
print(df.head())

Exporting Data to Google Docs
We’ll need a Google API access token, navigate to https://console.developers.google.com and create one. Choose any name you like, then go to your Dashboard and click “Enable APIs and Services”. Choose Google Drive API from the list, enable it and do exactly the same for Google Sheets API.

After activation you will be redirected to the API control panel. Click Credentials – Create Credentials, click choose data type and create an account. Choosing a role is optional, just proceed and specify JSON as a key type.

After these steps you can download a JSON file with your credentials, we’ll rename it to «credentials.json». On the main page you’ll find the email field – copy your email address.

Go to https://docs.google.com/spreadsheets and create a new file named data, we’ll pass data from our DataFrame to it. Put the  credentials.json file in one directory with the script and continue coding. Add these links to the scope list:

scope = ['https://spreadsheets.google.com/feeds', 'https://www.googleapis.com/auth/drive']

We will use the  ServiceAccountCredentials.from_json_keyfile_name and  gspread.authorize methods available in the  oauth2client and  gspread libraries for authenticaion process. Specify your file name and the scope variable in the ServiceAccountCredentials.from_json_keyfile_name method. The  sheet variable will allow us to send requests to our file in Google Docs.

creds = ServiceAccountCredentials.from_json_keyfile_name('credentials.json', scope)
client = gspread.authorize(creds)
sheet = client.open('data').sheet1

Apply the update_cell method to enter new value in a table cell. It’s worth mentioning that the indexing starts at 0, not 1. With the first loop we’ll move the column names of our DataFrame. And with the following loops we’ll move the rest of our data points. The default limits allow us to make 100 loops for 100 seconds. These restrictions may cause errors and stop our script, that’s why we need to use time.sleep and make the script sleep for 1 second after each loop.

count_of_rows = len(df)
count_of_columns = len(df.columns)
for i in range(count_of_columns):
    sheet.update_cell(1, i + 1, list(df.columns)[i])
for i in range(1, count_of_rows + 1):
    for j in range(count_of_columns):
        sheet.update_cell(i + 1, j + 1, str(df.iloc[i, j]))
        time.sleep(1)

In case of success, you’ll get the same table:

Exporting data to Redash

See how you can connect Google Analytics to Redash in this article «How to connect Google Analytics to Redash?».

Having a table with Google Analytics and ad campaigns from Vkontakte exported we can compare them by writing the following query:

SELECT
    query_50.day_start,
    CASE WHEN ga_source LIKE '%vk%' THEN 'vk.com' END AS source,
    query_50.spent,
    query_50.impressions,
    query_50.clicks,
    SUM(query_49.ga_sessions) AS sessions,
    SUM(query_49.ga_newUsers) AS users
FROM query_49
JOIN query_50
ON query_49.ga_date = query_50.day_start
WHERE query_49.ga_source LIKE '%vk%' AND DATE(query_49.ga_date) BETWEEN '2020-05-16' AND '2020-05-20'
GROUP BY query_49.ga_date, source

ga_source — the traffic source, from which a user was redirected. Use the  CASE method to combine everything that contains “vk” in one column called «vk.com». With the help of JOIN operator we can add the table with the data on ad campaigns, merging by date. Let’s take the day of the last ad campaign and a couple of days after, this will result in the following output:

Takeaways
Now we have a table that reflects how much were spent in ad costs on a certain day, the number of users who viewed this ad, were engaged and redirected to our website, and then completed the sign-up process.

Share
Send
Pin
 118   1 mon   BI-tools   data analytics   Data engineering   longread
Popular