Engineering Blog

The Lakehouse Lab

Engineering posts on data lakehouses, streaming, SQL performance, and ML from the DataLynxr team in Denver.

Cost Engineering January 14, 2025

Why ETL Copy Jobs Are Eating Your Cloud Bill

Every ETL job that copies data into a warehouse also doubles your egress bill, your storage bill, and your data freshness lag. Here's what the math looks like.

SQL Analytics February 18, 2025

SQL on Lakehouse vs. Copy-to-Warehouse: An Honest Benchmark

We ran TPC-DS at 10 TB against a popular cloud data warehouse with ETL from S3, and directly against the same S3 Iceberg tables. The results surprised us.

Streaming March 28, 2025

Exactly-Once Streaming into Delta Tables: How It Works

Kafka + Delta Lake + idempotent writes = exactly-once delivery without a distributed transaction manager. Walking through the checkpoint protocol step by step.

Machine Learning May 6, 2025

Point-in-Time Correct ML Features Without a Dedicated Feature Store

Most training/serving skew comes from using different feature pipelines for batch training and real-time serving. Iceberg time-travel gives you a better option.

Benchmarks July 15, 2025

Benchmarking TPC-DS at 10 TB on Lakehouse Storage

Our engineering team ran the full TPC-DS suite at 10 TB scale. Here's the methodology, the raw numbers, and what they mean for teams considering a lakehouse migration.

Company October 3, 2025

Building Bootstrapped in Denver: Data Infrastructure Without VC Pressure

Why we chose to bootstrap DataLynxr, what that means for product focus, and how Denver's data engineering community has shaped how we build.

Cost Engineering December 9, 2025

Reducing Cloud Egress Costs with Lakehouse-Native Queries

Pushing compute to where data lives, instead of moving data to compute, eliminates cross-region egress on queries you're already paying for. A practical breakdown.