Data Pipeline Development & ETL Processing

Setting Up Data Engineering Processes and Looker Tables for Improved Data Pipelines and Telephony Insights for Aircall

Impact

  • 99.9% pipeline reliability
  • Automated daily refresh
  • Self-service customer dashboards

We built a production-grade data pipeline with 99.9% reliability, enabling customer-facing analytics for Aircall's telephony platform.

The Challenge

Aircall, a leading French cloud-based call center and phone system, needed to deliver meaningful telephony analytics to their customers. The raw call data was stored in S3 buckets, but there was no pipeline to transform it into structured, queryable data — and no visualization layer for customers to explore their telephony insights.

Aircall’s engineering team was focused on core product development and didn’t have the specialized data engineering capacity to build a production-grade pipeline from S3 to Amazon Redshift and then to Looker. They needed an expert partner with deep experience in cloud data infrastructure, pipeline orchestration, and BI implementation.

The technical requirements were demanding. Call data is inherently high-volume and time-sensitive — thousands of calls per day across hundreds of client accounts, each with detailed metadata including duration, wait times, agent assignments, call outcomes, and customer satisfaction indicators. The pipeline needed to process this volume reliably every day while maintaining data accuracy that Aircall’s enterprise customers would trust for their own business decisions. Any data quality issues would reflect poorly on Aircall’s core product offering.

Our Approach

We structured the engagement into three workstreams, each building on the previous:

  • Data Pipeline Architecture: We designed and implemented Python scripts and AWS Glue jobs to extract raw telephony data from S3, transform it through a series of cleaning and enrichment steps, and load it into Amazon Redshift. The pipeline was built to run on a daily basis with built-in monitoring and alerting for failures.
  • Data Modeling: In Redshift, we built a dimensional data model optimized for telephony analytics: call volumes, duration distributions, agent performance, peak hours, and queue metrics. The data marts were designed to support both aggregate reporting and drill-down analysis at the individual call level.
  • Customer-Facing Analytics: We built flexible Looker dashboards that Aircall’s customers could use for enhanced data analysis. The dashboards featured interactive filters for date ranges, agent groups, and call types, with drill-down capabilities from summary metrics to individual call records.
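To make the cleaning and enrichment step more concrete, here is a minimal sketch in Python of how one raw call record might be normalized before loading. It is an illustration under assumptions, not the code used on the project: the field names (call_id, client_id, started_at, duration_s, wait_s, outcome, agent_id), the CallFact type, and the rule that naive timestamps are UTC are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class CallFact:
    """One cleaned call row, shaped for a fact table (hypothetical schema)."""
    call_id: str
    client_id: str
    agent_id: Optional[str]
    started_at: datetime
    duration_s: int
    wait_s: int
    outcome: str
    hour_of_day: int  # derived field, useful for peak-hour analysis


def parse_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp and normalize it to UTC."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        # Assumption: timestamps without an offset are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.astimezone(timezone.utc)


def transform_call(raw: dict) -> CallFact:
    """Clean one raw record and enrich it with derived fields."""
    started_at = parse_utc(raw["started_at"])
    return CallFact(
        call_id=str(raw["call_id"]),
        client_id=str(raw["client_id"]),
        agent_id=raw.get("agent_id") or None,
        started_at=started_at,
        # Missing values become 0; negative values are clamped to 0.
        duration_s=max(int(raw.get("duration_s") or 0), 0),
        wait_s=max(int(raw.get("wait_s") or 0), 0),
        outcome=str(raw.get("outcome") or "unknown").strip().lower(),
        hour_of_day=started_at.hour,
    )
```

Keeping the transformation a pure function of one record makes it easy to unit test and to reuse inside whatever runner executes it, whether a plain Python script or a Glue job.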

A key design consideration was scalability. The pipeline needed to handle growing call volumes as Aircall onboarded more customers, so we built the architecture with partitioning strategies and incremental processing to keep costs and latency manageable at scale.
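One way to combine partitioning with incremental processing is to lay out the raw data by client and call date, then have each run process only the days after the last successful load. The sketch below shows that idea under assumptions: the S3 key layout, the function names, and the one-day lookback for late-arriving records are hypothetical and are not taken from the project.

```python
from datetime import date, timedelta


def partition_prefix(client_id: str, day: date) -> str:
    """Build a key prefix for one client and one day (hypothetical layout)."""
    return f"calls/client_id={client_id}/dt={day.isoformat()}/"


def pending_days(last_loaded: date, run_day: date, lookback_days: int = 1) -> list[date]:
    """Return the days an incremental run should process.

    The range covers every day after the last loaded day up to the day
    before run_day (the current day is treated as incomplete). It also
    reprocesses the last `lookback_days` loaded days to pick up records
    that arrived late.
    """
    start = last_loaded + timedelta(days=1 - lookback_days)
    end = run_day - timedelta(days=1)
    count = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(max(count, 0))]
```

Because each run touches a bounded number of date partitions rather than the full history, the work per run tracks daily volume instead of total data size.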

We implemented a comprehensive testing framework that compared pipeline outputs against known test datasets with each release, ensuring that code changes never silently altered business logic. Error handling was designed to be graceful — if one client’s data had issues, the pipeline would quarantine that data and continue processing other clients, rather than failing entirely. This “blast radius containment” approach was essential for a multi-tenant data pipeline serving enterprise customers.
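The quarantine behavior described above can be sketched as a per-client loop that records failures instead of letting them stop the run. This is a simplified illustration, not the project's implementation: run_per_client, validate_and_count, and the duration_s field are hypothetical names, and a real pipeline would also move the failing data aside and raise an alert.

```python
from typing import Callable


def run_per_client(
    batches: dict[str, list[dict]],
    process: Callable[[list[dict]], int],
) -> tuple[dict[str, int], dict[str, str]]:
    """Process each client's batch independently.

    Returns (loaded, quarantined): row counts for clients that succeeded,
    and an error summary for clients whose batch was set aside.
    """
    loaded: dict[str, int] = {}
    quarantined: dict[str, str] = {}
    for client_id, records in batches.items():
        try:
            loaded[client_id] = process(records)
        except Exception as exc:
            # Broad catch is deliberate: one client's bad data must not
            # stop the run for every other client.
            quarantined[client_id] = f"{type(exc).__name__}: {exc}"
    return loaded, quarantined


def validate_and_count(records: list[dict]) -> int:
    """Stand-in for the real load step: reject impossible durations."""
    for record in records:
        if record["duration_s"] < 0:
            raise ValueError("negative duration")
    return len(records)
```

Calling run_per_client(batches, validate_and_count) with one malformed client batch returns that client in the quarantined mapping while the remaining clients still appear in loaded.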

Results

  • Production-grade data pipeline processing daily call data from S3 to Redshift with 99.9% reliability.
  • Data marts refreshed daily, giving Aircall’s customer base up-to-date insights.
  • Flexible Looker dashboards enabling Aircall customers to analyze call patterns, agent performance, and customer experience metrics independently.
  • Scalable architecture designed to handle 10x growth in call volume without rearchitecting.
  • Comprehensive documentation enabling Aircall’s internal team to maintain and extend the solution.

Technologies Used

Python, AWS Glue, Amazon S3, Amazon Redshift, Looker, SQL, AWS CloudWatch for monitoring.

Key Takeaways

01. Understand the client’s domain in detail so that the data pipeline is built around the business objectives and their specifics.

02. Set up all environments (dev/test/stage/prod) during the initial stages of the project.

03. Get familiar with the required tool stack early, AWS Glue in this case, to improve performance and shorten delivery time.

Telecom France
