Why ETL Copy Jobs Are Eating Your Cloud Bill
Every ETL job that copies data into a warehouse also doubles your egress bill, your storage bill, and your data freshness lag. Here's what the math looks like.
Engineering posts on data lakehouses, streaming, SQL performance, and ML from the DataLynxr team in Denver.
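The math the summary refers to can be approximated with a back-of-envelope model. This is a minimal sketch. It assumes one dataset stored once in object storage and once more in a warehouse, plus a bulk copy between them each month. Every price, and the names `Workload`, `monthly_cost_with_etl`, `monthly_cost_in_place`, and `worst_case_staleness_hours`, are hypothetical placeholders. They are not DataLynxr's figures or any provider's published rates.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    # All rates are hypothetical placeholders (USD); substitute your provider's real prices.
    stored_gb: float                 # size of the dataset kept in both places
    lake_storage_per_gb: float       # object-storage price per GB-month (assumed)
    warehouse_storage_per_gb: float  # warehouse storage price per GB-month (assumed)
    transfer_per_gb: float           # per-GB charge for the copy; may be 0 within one region (assumed)
    gb_copied_per_month: float       # full reloads move stored_gb per run; incremental loads move less


def monthly_cost_with_etl(w: Workload) -> float:
    # The copied data is paid for twice: once in the lake, once in the warehouse...
    storage = w.stored_gb * (w.lake_storage_per_gb + w.warehouse_storage_per_gb)
    # ...plus transfer charges on every byte the ETL job moves.
    transfer = w.gb_copied_per_month * w.transfer_per_gb
    return storage + transfer


def monthly_cost_in_place(w: Workload) -> float:
    # Querying the lake tables directly: one stored copy, no bulk copy traffic.
    # Note: this ignores the compute and request costs of reading the lake at query time.
    return w.stored_gb * w.lake_storage_per_gb


def worst_case_staleness_hours(interval_hours: float, job_runtime_hours: float) -> float:
    # A row that lands just after a run starts must wait for the next run
    # to start (one interval) and then finish (one runtime) before it is visible.
    return interval_hours + job_runtime_hours


if __name__ == "__main__":
    w = Workload(
        stored_gb=1000,
        lake_storage_per_gb=0.02,
        warehouse_storage_per_gb=0.02,
        transfer_per_gb=0.02,
        gb_copied_per_month=1000,
    )
    print(f"with ETL copy: ${monthly_cost_with_etl(w):.2f}/month")
    print(f"in place:      ${monthly_cost_in_place(w):.2f}/month")
    print(f"staleness:     {worst_case_staleness_hours(24, 2)} h worst case")
```

With equal per-GB storage rates in the lake and the warehouse, the storage term is exactly twice the in-place figure, which is the "doubles your storage bill" part of the claim. The transfer term only applies when the copy crosses a region or provider boundary, so set `transfer_per_gb` to zero for a same-region pipeline.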