10 posts tagged


Analysing data on Facebook Ad Campaigns with Redash

Our previous article was dedicated to collecting data on Facebook ad campaigns. Today we will analyze this data using Redash. As a first step, we need to upload our script to AWS cloud and create a server with AIOHTTP before passing the data to Redash. Let’s improve the script a little bit for this task:

from facebook_business.api import FacebookAdsApi
from facebook_business.adobjects.adaccount import AdAccount
from facebook_business.adobjects.adreportrun import AdReportRun
from facebook_business.adobjects.adsinsights import AdsInsights
from facebook_business.adobjects.adaccountuser import AdAccountUser as AdUser
from facebook_business.exceptions import FacebookRequestError
import time

Redash receives data in JSON format, which we haven’t covered yet. A JSON file is a file that consists of a list of dictionaries. The script will convert it to JSON and pass the data to Redash. We’ll need the variable that stores our access token, app id, app secret and two functions: wait_for_async_job() and get_insights(). The latter function receives account parameters and collects data on ad campaigns. Select the following fields: clicks, impressions, costs and dates.

We need the entire script from our previous article – Collecting Data on Facebook Ad Campaigns

The return_json() function will call the get_insights(), which will append data to insights_lists. Since we may have several ad campaigns, our output can be a list of lists with dictionaries instead of just a list of dictionaries. Create a simple lambda expression that will smooth the output and return our insights_lists_flatten. Now, the script returns a list of dictionaries:

def return_json():
   insights_lists = []
   date_preset = 'last_year'
   for elem in my_accounts:
       elem_insights = get_insights(elem, date_preset)
   flatten = lambda lst: [item for sublist in lst for item in sublist]
   insights_lists_flatten = flatten(insights_lists)
   return insights_lists_flatten

We also need a AIOHTTP server that will return our output as JSON. Create a new file, import the AIOHTTP library and the get_json() function from the preceding script. Write a simple query handler: the script will receive data from Facebook asynchronously and our asynchronous handler function will sleep until all data is collected and transferred. The function uses json_response to transfer data in json format.

from aiohttp import web
from get_json import return_json
async def handler(request):
   data = return_json()
   return web.json_response(data)

Initialize and run our application.

app = web.Application()
app.add_routes([web.get('/json', handler)])

Now, go to AWS cloud, create a new folder and upload the two scripts via SFTP connection. Check if the needed port is open, click console — network & security — security groups — default.

Run our file from the server. You can check whether it works by accessing it via server IP address specifying 0880 in the route field. Connect to Redash using URL and we’ll get the same table returned by our script:

url: server ip

Having query results, we can write the following query:

select date_start, sum(clicks) as clicks, sum(impressions) as impressions, sum(spend) as spend from query_45
group by date_start

It returns this table, grouped by the date_start column:

We can now plot this data on the chart, let’s see if there is any correlation between ad costs and clicks:

Bingo! Next time, we will show you how to get data on ad campaigns from Vkontakte.

 No comments    229   23 d   BI-tools   data analytics   redash

Cohort analysis in Redash

In one of the previous articles we have reviewed building of Retention-report and have partially addressed the concept of cohorts therein.
Cohort usually implies group of users of a product or a company. Most often, groups are allocated on the basis of time of app installation / appearance of a user in a system.
It turns out, that, using cohort analysis, one can track down how the changes in a product affected the behaviour of users (for example, of old and new users).

Along with that, cohorts can be defined also proceeding from other parameters: geography of a user, traffic source, device platform and other important parameters of your product.

We will figure out, how to compare Retention of users of weekly cohorts in Redash, since Redash has special type of visualization for building such type of report.
Firstly, let’s sort out SQL-query. We still have two tables – user (id of a user and time of app installation) and client_session – timestamps (created_at) of activity of each user (user_id). Let’s consider the Retention of the first seven days for last 60 days.
The query is written in Cloudera Impala, let’s review it.

For starters, let’s build the total size of cohorts:

select trunc(from_unixtime(user.installed_at), "WW") AS cohort_week, 
	ndv(distinct user.id) as cohort_size //counting the number of users in the cohort
	from user 
	where from_unixtime(user.installed_at) between date_add(now(), -60) and now() //taking registered users for last 60 days
group by trunc(from_unixtime(user.installed_at), "WW")

The second part of the query can calculate the quantity of active users for every day during the first thirty days:

select trunc(from_unixtime(user.installed_at), "WW") AS cohort_week, 
        datediff(cast(cs.created_at as timestamp),cast(user.installed_at as timestamp)) as days,
	ndv(distinct user.id) as value  //counting the number of active users for every day
		from user 
		left join client_session cs on user.id=cs.user_id
where from_unixtime(user.installed_at) between date_add(now(), -60) and now()
and from_unixtime(cs.created_at) >= date_add(now(), -60) //taking sessions for last 60 days
and datediff(cast(cs.created_at as timestamp),cast(user.installed_at as timestamp)) between 0 and 30 //cutting off only the first 30 days of activity
group by 1,2

Bottom line, all the query entirely:

select size.cohort_week, size.cohort_size, ret.days, ret.value from
(select trunc(from_unixtime(user.installed_at), "WW") AS cohort_week, 
		ndv(distinct user.id) as cohort_size 
	from user 
	where from_unixtime(user.installed_at) between date_add(now(), -60) and now()
group by trunc(from_unixtime(user.installed_at), "WW")) size
left join (select trunc(from_unixtime(user.installed_at), "WW") AS cohort_week, 
        datediff(cast(cs.created_at as timestamp),cast(user.installed_at as timestamp)) as days,
		ndv(distinct user.id) as value 
		from user 
		left join client_session cs on user.id=cs.user_id
where from_unixtime(user.installed_at) between date_add(now(), -60) and now()
and from_unixtime(cs.created_at) >= date_add(now(), -60)
and datediff(cast(cs.created_at as timestamp),cast(user.installed_at as timestamp)) between 0 and 30
group by 1,2) ret on size.cohort_week=ret.cohort_week

Great, now correctly calculated data is available to us.

Data of cohorts in tabular form

Let’s create new visualization in Redash and indicate the parameters correctly:

It’s important to indicate the parameters correctly – the columns of the resulting query are compliant therewith.

Let’s make sure to indicate that we have weekly cohorts:

Voila, our visualization of cohorts is ready:

You can add filters and parameters to it and use in a dashboard

Materials on the topic

 No comments    712   3 mon   BI-tools   redash   sql   visualisation

Using parameters in Redash

The most convenient and useful thing in Redash is parameters. Parameters can be both in a report and in dashboards.
Parameter – is an element of the interface, controlled by a user producing a report.

Parameter in a report can be of the following types:

Each type explained separately:

  • Text – text input field, can be used in constructions of LIKE type, such parameter can’t be applied in public dashboards
  • Number – number, entered by a user
  • Dropdown list – list of values, from which a user can select just one or several values (not long ago, an option of multi-entry of parameters in a dropdown list appeared)
  • Query dropdown list – similarly to the previous one, however the values will be taken from the existing query results
  • Date / Date and Time / Date and Time (with seconds) – fields of date entry
  • Date Range / Date and Time Range / Date and Time Range (with seconds) – fields of date ranges entry.
    It’s convenient to use for the following construction
between '{{parameter.start}}' and '{{parameter.end}}'

In case of dashboard, a situation might arise, when a parameter is named the same way in several queries, then it will become common for all the dashboard, it’s handy.

One of not obvious, but rather useful solutions: how to make a parameter disabled?
Let’s assume, that parameter of “dropdown list” type is named parameter and we want to set it for a column geo of a table, then the code of the query will look approximately like that:

    ('{{parameter}}' = 'Disabled' or geo = '{{parameter}}')

At that, surely, ‘Disabled’ should be added to the values of the dropdown list.

 No comments    318   4 mon   BI-tools   parameter   redash

Setting up alerts in Redash

A very handy function on alert building is implemented in Redash. Alerts stand for notifications, arising at adjustment of some specific indicator. At that, the degree of change is set manually by a user on his own in the interface. Alerts can be set to a mail or to a channel / direct messages Slack.

Let’s recall how to collect the data, using Google Analytics, and set up alerts for this data, applying the internal Redash database (query-results). As a foundation for an alert we will focus on the reduction of number of users on the site for the previous day by more than 30%.

Building a query to Google Analytics

    "ids": "ga:128886640",
    "start_date": "30daysAgo",
    "end_date": "yesterday",
    "metrics": "ga:users", 
    "dimensions": "ga:date"

As a result of performance of this query, we will receive a number of users for the last 30 days.

Turning on the storage of query results

One of the main features of redash is an opportunity of calling upon the results of query performance, that are stored in the internal DBMS SQLite.
In order to turn on the storage of results, we need to go to Data Sources and turn on query-results (beta).

And now, with a simple command presented below:

select * from query_37

we will receive a result, similar to the one of the previous query to GA, however this one has an opportunity of using SQL language for processing of the data set obtained.

Building a query for evaluation of changes within the number of users

In order to set up the alert, first of all we need to write a query, that will eventually provide us with a target indicator for check, in our case it is growth or reduction in the number of users on the website.
Lets write a query, calling upon the internal DBMS of Redash:

SELECT sum(CASE WHEN date(ga_date)=DATE('now', '-1 day') THEN ga_users ELSE NULL END) AS yesterday,
sum(CASE WHEN date(ga_date)=DATE('now', '-2 day') THEN ga_users ELSE NULL END) AS before_yesterday,
(sum(CASE WHEN date(ga_date)=DATE('now', '-1 day') THEN ga_users ELSE NULL END)*1.0/
sum(CASE WHEN date(ga_date)=DATE('now', '-2 day') THEN ga_users ELSE NULL END)*1.0-1)*100 AS difference
FROM query_37

In the above-shown query we are calculating the number of users for two previous days, and also evaluating the change of number of users in percentage.
In the current example, we have received the following table of data, that we will use further at the process of alert setting:

Now, in order for us to be able to receive alerts, we need to set query performance by schedule (regular update of results / scheduled query in redash terminology).

We set an update for 10 o’clock of every morning:

Setting alert

Going to the menu Create – Alert. Inserting the name of query, in my case it is “Alert on users”.
Hereafter, we can change the reflected name of notification or leave the one, proposed by the system.

Choosing a target metric within the example reviewed – difference. Below, in the comparison operator (Op) selecting less then and setting a value of -30.

In the block on the right you need to select where exactly the alert will be sent. You can read about the setting of alert’s path more thoroughly on the official website of Redash.

Now, the alert has appeared on the page with the list of alerts and is by default in the status OK.

As soon as the indicator exceeds the level that we’ve set, the status will change to TRIGGERED, and the alert will be sent to mail / to Slack.

More on the topic

 No comments    27   6 mon   alert   BI-tools   redash

Updates in Redash v8

Two weeks ago the final release of Redash edition 8 took place. The complete list of improvements can be found on the page of beta version of v8 release.

In my overview of the novelties, I will focus my attention only on those, that have the most significant impact on user’s experience of Redash utilization, i.e. that are more illustrative for you and me:

  • Support for multi-select in dropdown (and query dropdown) parameters.
  • Support for dynamic values in date and date-range parameters.
  • Search dropdown parameter values.
  • Pivot Table: support hiding totals.
  • New Visualization: Details View.

Support for multi-select in dropdown (and query dropdown) parameters.

In the 8th version of Redash we finally receive the long-awaited support of selection of several values in a parameter such as “dropdown list” or “dropdown list on the basis of the query”:

When we select several different clues, the above-mentioned values transform into a list, divided by a comma. At that, there is an opportunity to wrap the values of such a list into single quotes or double quotes.

For us it means an opportunity to use requirements of IN type with a parameter in the queries:

At application of such parameter values, we will receive the following URL in the line of the browser:

Support for dynamic values in date and date-range parameters.

Extremely handy and useful feature for selection of dates, that is often used in Tableau, however hadn’t been implemented into Redash until recently: utilization of dynamic values for dates:

Pressing the zipper button, you can select a random period, and Redash will insert the required values into the query on its own. Thus, for instance, you can quickly look through the statistics for last week or last month.

Search dropdown parameter values.

Let’s assume that we are using a parameter of a type “dropdown list on the basis of the query”, and in the query selected there are more than 300 resulting lines, based on which we need to find a value of our interest. In the 8th version this problem is solved by auto-complete and search by parameter values. In the example below I am inserting a line “Orga” and obtain all the values, in which this line is met, very convenient.

Pivot Table: support hiding totals.

Quite long-awaited feature, that allows to aggregate outcomes by rows / columns.

New Visualization: Details View

New presentation of data results: view of details by each line. I assume, it can be convenient at use in a dashboard, when you apply filtration by a specific user / partner.
It visualizes the names of columns and results of a specific line in a table form.

 No comments    21   6 mon   BI-tools   redash   visualisation
Earlier Ctrl + ↓