Building a scalable data engineering pipeline processing 1B+ events daily with Airflow, dbt, and ClickHouse.
Modern Data Platform
We built a modern data platform that ingests data from 20+ sources, processes billions of events daily, and provides sub-second analytics for business users. The platform enables self-service analytics while maintaining data quality and governance.
Set up Debezium for real-time change data capture (CDC) from PostgreSQL. Built connectors for databases, APIs, and file sources.
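As a rough illustration of the CDC setup, the sketch below builds the kind of JSON payload that Kafka Connect accepts when registering a Debezium PostgreSQL connector. The property keys are standard Debezium options. The function name, connector name, host, user, and table names are hypothetical placeholders, not values from this project.

```python
import json


def build_postgres_connector(
    name: str, host: str, dbname: str, password: str, tables: list[str]
) -> dict:
    """Build a Kafka Connect registration payload for a Debezium PostgreSQL source."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            # pgoutput is the logical decoding plugin built into PostgreSQL 10+.
            "plugin.name": "pgoutput",
            "database.hostname": host,
            "database.port": "5432",
            "database.user": "debezium",  # hypothetical replication user
            "database.password": password,
            "database.dbname": dbname,
            # Debezium 2.x key; older releases use database.server.name instead.
            "topic.prefix": name,
            # Only the listed schema.table pairs are captured.
            "table.include.list": ",".join(tables),
            # Replication slot names allow only lowercase letters, digits, underscores.
            "slot.name": name.replace("-", "_") + "_slot",
        },
    }


payload = build_postgres_connector(
    name="orders-cdc",
    host="pg.internal",
    dbname="shop",
    password="change-me",
    tables=["public.orders", "public.customers"],
)
print(json.dumps(payload, indent=2))
```

In a typical deployment this payload would be sent with a POST request to the Kafka Connect REST endpoint `/connectors`. In practice the password would come from a secret store rather than a literal.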
Deployed Airflow with custom operators. Created 100+ DAGs for various data pipelines.
Implemented dbt models with testing and documentation. Created standardized metrics definitions.
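To make the testing step concrete, here is a minimal Python sketch of what dbt's built-in `unique` and `not_null` tests check. In dbt itself these are declared in YAML and compiled to SQL that returns failing rows. The functions and sample rows below are hypothetical and only mirror that logic for illustration.

```python
from collections import Counter
from typing import Any

Row = dict[str, Any]


def not_null_failures(rows: list[Row], column: str) -> list[Row]:
    """Return rows where `column` is missing or None (the intent of not_null)."""
    return [r for r in rows if r.get(column) is None]


def unique_failures(rows: list[Row], column: str) -> list[Any]:
    """Return non-null values of `column` that appear more than once.

    Like dbt's unique test, nulls are ignored here; not_null covers them.
    """
    counts = Counter(r[column] for r in rows if r.get(column) is not None)
    return [value for value, n in counts.items() if n > 1]


# Hypothetical sample data with one duplicate key and one null foreign key.
orders = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": None},
    {"order_id": 2, "customer_id": 11},
]

print(not_null_failures(orders, "customer_id"))
print(unique_failures(orders, "order_id"))
```

A test passes when its failure list is empty, which matches how dbt treats a test query that returns zero rows.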
Deployed ClickHouse cluster for OLAP queries. Built materialized views for common aggregations.
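One common way to pre-aggregate in ClickHouse is a materialized view that writes into a `SummingMergeTree` table on every insert to the source table. The sketch below renders such DDL from Python. The table names, columns (`event_time`, `event_type`), and the helper function are assumptions made for illustration, not the project's actual schema.

```python
def daily_rollup_ddl(source: str = "events", target: str = "events_daily") -> list[str]:
    """Return DDL for a rollup table and the materialized view that feeds it."""
    # Rows sharing the same ORDER BY key have `events` summed during merges.
    create_target = f"""
CREATE TABLE IF NOT EXISTS {target} (
    day Date,
    event_type LowCardinality(String),
    events UInt64
) ENGINE = SummingMergeTree
ORDER BY (day, event_type)
""".strip()

    # The view acts like an insert trigger: each block inserted into the
    # source table is aggregated and written to the target table.
    create_view = f"""
CREATE MATERIALIZED VIEW IF NOT EXISTS {target}_mv TO {target} AS
SELECT
    toDate(event_time) AS day,
    event_type,
    count() AS events
FROM {source}
GROUP BY day, event_type
""".strip()

    return [create_target, create_view]


for statement in daily_rollup_ddl():
    print(statement + ";\n")
```

Because merges happen in the background, queries against the rollup table should still use `sum(events)` with `GROUP BY day, event_type` rather than reading `events` directly.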