Setting Up Data Engineering Processes and Looker Tables for Improved Data Pipelines and Telephony Insights for Aircall
We built a production-grade data pipeline with 99.9% reliability, enabling customer-facing analytics for Aircall's telephony platform.
The Challenge
Aircall, a leading French cloud-based call center and phone system, needed to deliver meaningful telephony analytics to their customers. The raw call data was stored in S3 buckets, but there was no pipeline to transform it into structured, queryable data — and no visualization layer for customers to explore their telephony insights.
Aircall’s engineering team was focused on core product development and didn’t have the specialized data engineering capacity to build a production-grade pipeline from S3 to Amazon Redshift and then to Looker. They needed an expert partner with deep experience in cloud data infrastructure, pipeline orchestration, and BI implementation.
The technical requirements were demanding. Call data is inherently high-volume and time-sensitive — thousands of calls per day across hundreds of client accounts, each with detailed metadata including duration, wait times, agent assignments, call outcomes, and customer satisfaction indicators. The pipeline needed to process this volume reliably every day while maintaining data accuracy that Aircall’s enterprise customers would trust for their own business decisions. Any data quality issues would reflect poorly on Aircall’s core product offering.
Our Approach
We structured the engagement into three workstreams, each building on the previous:
- Data Pipeline Architecture: We designed and implemented Python scripts and AWS Glue jobs to extract raw telephony data from S3, transform it through a series of cleaning and enrichment steps, and load it into Amazon Redshift. The pipeline was built to run on a daily basis with built-in monitoring and alerting for failures.
- Data Modeling: In Redshift, we built a dimensional data model optimized for telephony analytics — call volumes, duration distributions, agent performance, peak hours, and queue metrics. Working data marts were designed to support both aggregate reporting and drill-down analysis at the individual call level.
- Customer-Facing Analytics: We built flexible Looker dashboards that Aircall’s customers could use for enhanced data analysis. The dashboards featured interactive filters for date ranges, agent groups, and call types, with drill-down capabilities from summary metrics to individual call records.
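To make the cleaning and data-mart steps above more concrete, here is a minimal sketch of how raw call records could be normalized and rolled up into a per-agent daily mart. It is an illustration only, not the code used in the engagement: the field names (`started_at`, `duration_s`, `agent_id`) and both function names are assumptions, and the real pipeline ran as AWS Glue jobs loading into Redshift rather than in-memory Python.

```python
from collections import defaultdict
from datetime import datetime


def clean_call(raw):
    """Normalize one raw call record; return None if it is unusable.

    Field names here are hypothetical, chosen for illustration.
    """
    try:
        started = datetime.fromisoformat(raw["started_at"])
        duration = int(raw["duration_s"])
    except (KeyError, ValueError, TypeError):
        return None
    if duration < 0:
        return None
    return {
        "call_date": started.date().isoformat(),
        "hour": started.hour,
        "agent_id": raw.get("agent_id") or "unassigned",
        "duration_s": duration,
    }


def build_agent_daily_mart(raw_calls):
    """Aggregate cleaned calls into one row per (date, agent)."""
    totals = defaultdict(lambda: {"calls": 0, "total_duration_s": 0})
    for raw in raw_calls:
        call = clean_call(raw)
        if call is None:
            continue  # unusable records are skipped in this sketch
        key = (call["call_date"], call["agent_id"])
        totals[key]["calls"] += 1
        totals[key]["total_duration_s"] += call["duration_s"]
    return [
        {
            "call_date": call_date,
            "agent_id": agent_id,
            **metrics,
            "avg_duration_s": metrics["total_duration_s"] / metrics["calls"],
        }
        for (call_date, agent_id), metrics in sorted(totals.items())
    ]
```

A mart shaped like this supports the aggregate view, while the cleaned call-level rows remain available for drill-down.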
A key design consideration was scalability. The pipeline needed to handle growing call volumes as Aircall onboarded more customers, so we built the architecture with partitioning strategies and incremental processing to keep costs and latency manageable at scale.
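One common way to combine partitioning with incremental processing is to lay data out by date and track a "last loaded" watermark, so each run touches only new partitions. The sketch below assumes a Hive-style `year=/month=/day=` key layout and a daily watermark; the prefix layout, the `calls` base path, and the function names are hypothetical and not taken from the Aircall project.

```python
from datetime import date, timedelta


def partition_prefix(base, day):
    """Build a date-partitioned key prefix such as calls/year=2023/month=01/day=05/."""
    return f"{base}/year={day:%Y}/month={day:%m}/day={day:%d}/"


def pending_partitions(last_loaded, today, base="calls"):
    """List prefixes for every day after the watermark, up to and including yesterday.

    `last_loaded` is the most recent day already loaded into the warehouse.
    """
    day = last_loaded + timedelta(days=1)
    prefixes = []
    while day < today:
        prefixes.append(partition_prefix(base, day))
        day += timedelta(days=1)
    return prefixes
```

Because each run reads only the partitions returned by `pending_partitions`, the work per run grows with daily volume rather than with total history, and a missed run is caught up automatically on the next one.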
We implemented a comprehensive testing framework that compared pipeline outputs against known test datasets with each release, ensuring that code changes never silently altered business logic. Error handling was designed to be graceful — if one client’s data had issues, the pipeline would quarantine that data and continue processing other clients, rather than failing entirely. This “blast radius containment” approach was essential for a multi-tenant data pipeline serving enterprise customers.
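Both ideas in the paragraph above can be sketched in a few lines. The example below is a simplified illustration under stated assumptions, not the project's actual implementation: `process_client` stands in for whatever per-client load step the pipeline runs, and `call_id` is a hypothetical row key for the golden-dataset comparison.

```python
def run_for_all_clients(client_ids, process_client):
    """Process each client independently; quarantine failures instead of aborting.

    `process_client` is a hypothetical callable that loads one client's data
    and raises on bad input.
    """
    succeeded, quarantined = [], {}
    for client_id in client_ids:
        try:
            process_client(client_id)
            succeeded.append(client_id)
        except Exception as exc:  # contain the failure to this one client
            quarantined[client_id] = repr(exc)
    return succeeded, quarantined


def diff_against_golden(actual_rows, golden_rows, key="call_id"):
    """Return the keys whose rows are missing, unexpected, or differ from the golden set."""
    actual = {row[key]: row for row in actual_rows}
    golden = {row[key]: row for row in golden_rows}
    return sorted(
        k for k in actual.keys() | golden.keys()
        if actual.get(k) != golden.get(k)
    )
```

In a release check, an empty list from `diff_against_golden` would mean the code change left the output for the known test dataset unchanged, and the `quarantined` mapping gives the alerting layer a per-client record of what was set aside and why.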
Results
- Production-grade data pipeline processing daily call data from S3 to Redshift with 99.9% reliability.
- Data marts refreshed by the daily pipeline run, giving Aircall’s customer base up-to-date insights.
- Flexible Looker dashboards enabling Aircall customers to analyze call patterns, agent performance, and customer experience metrics independently.
- Scalable architecture designed to handle 10x growth in call volume without rearchitecting.
- Comprehensive documentation enabling Aircall’s internal team to maintain and extend the solution.
Technologies Used
Python, AWS Glue, Amazon S3, Amazon Redshift, Looker, SQL, AWS CloudWatch for monitoring.
Key Takeaways
- Understand the client’s domain in detail so that the data pipeline is built around the business objectives and their specifics.
- Set up all environments (dev, test, stage, prod) during the initial stages of the project.
- Get familiar with the required tool stack early (AWS Glue in this case) to improve pipeline performance and shorten delivery time.




