A lakehouse engine built for all three workloads.
DataLynxr's query engine, streaming runtime, and ML layer all share the same storage cursor — not separate stores bolted together. That's the difference.
What's inside the engine
Unified Storage Cursor
SQL queries, streaming ingestion, and ML feature reads all target the same lakehouse tables via a single storage abstraction. No copies, no divergence.
Vectorized SQL Engine
Column-store vectorized execution with JIT compilation. Sub-second P95 latency on 10 TB datasets against S3-backed Iceberg and Delta tables.
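Vectorized execution hands each operator a batch of values per column instead of one row at a time, passing a selection vector of surviving row positions between steps. The sketch below is a minimal pure-Python illustration of that idea, assuming a hypothetical `ColumnBatch` layout. It is not DataLynxr's engine, and JIT compilation is not shown.

```python
from dataclasses import dataclass


@dataclass
class ColumnBatch:
    """Hypothetical columnar batch: one list per column, all of equal length."""
    columns: dict

    def __len__(self) -> int:
        return len(next(iter(self.columns.values()), []))


def filter_gt(batch: ColumnBatch, column: str, threshold: float) -> list:
    """Evaluate `column > threshold` over the whole batch, returning a selection vector."""
    values = batch.columns[column]
    return [i for i, v in enumerate(values) if v > threshold]


def sum_selected(batch: ColumnBatch, column: str, selection: list) -> float:
    """Aggregate only the rows named in the selection vector; no row objects are built."""
    values = batch.columns[column]
    return sum(values[i] for i in selection)


batch = ColumnBatch({
    "region": ["eu", "us", "us", "apac"],
    "amount": [120.0, 35.5, 410.0, 99.0],
})
selection = filter_gt(batch, "amount", 100.0)
total = sum_selected(batch, "amount", selection)
print(selection, total)  # [0, 2] 530.0
```

Note that the `region` column is never touched: a columnar engine reads only the columns a query needs. In production engines the inner loops run as compiled code over contiguous buffers, which is where the speedup over row-at-a-time interpretation comes from.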
Streaming Ingest Runtime
Ingests from Kafka, Kinesis, and Pulsar directly into lakehouse tables with exactly-once semantics and ACID transactions. End-to-end latency under 5 seconds, with no landing zone.
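A common way to get exactly-once delivery into a table is to commit the source offsets atomically with the data, so a batch replayed after a crash is recognized and skipped. A minimal sketch, assuming a hypothetical in-memory `Table` that tracks the last committed offset per `(topic, partition)`. It is not DataLynxr's runtime and does not use a real Kafka client.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Hypothetical table: rows plus the last committed offset per source partition."""
    rows: list = field(default_factory=list)
    committed_offsets: dict = field(default_factory=dict)

    def commit_batch(self, topic: str, partition: int, records: list) -> int:
        """Apply (offset, value) records not yet committed; return how many were applied.

        In a real table format the row append and the offset update would be a single
        atomic log entry; here they are two in-memory statements for illustration.
        """
        key = (topic, partition)
        last = self.committed_offsets.get(key, -1)
        fresh = [(off, val) for off, val in records if off > last]
        if not fresh:
            return 0  # the whole batch was already committed: a retry, so do nothing
        self.rows.extend(val for _, val in fresh)
        self.committed_offsets[key] = fresh[-1][0]
        return len(fresh)


table = Table()
batch = [(0, "a"), (1, "b"), (2, "c")]
table.commit_batch("events", 0, batch)                   # 3 records applied
table.commit_batch("events", 0, batch)                   # replay after a retry: 0 applied
table.commit_batch("events", 0, [(2, "c"), (3, "d")])    # overlapping batch: 1 applied
print(table.rows)  # ['a', 'b', 'c', 'd']
```

The design choice is that deduplication state lives in the table, not in the consumer, so a restarted consumer can always reread from the last offset the table reports.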
ML Feature Layer
The Python SDK provides point-in-time correct reads from lakehouse tables. Training and inference read through the same path, so there is no training/serving skew by design.
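A point-in-time correct read returns, for each training example, the latest feature value whose timestamp is at or before that example's event time, so no future information leaks into training. A minimal sketch using binary search; the function names and data layout are hypothetical and do not reflect the DataLynxr Python SDK.

```python
from bisect import bisect_right


def point_in_time_lookup(feature_history, event_time):
    """Return the latest feature value with timestamp <= event_time, or None.

    feature_history: list of (timestamp, value) pairs sorted by timestamp.
    """
    timestamps = [ts for ts, _ in feature_history]
    idx = bisect_right(timestamps, event_time)
    if idx == 0:
        return None  # no feature value existed yet at event_time
    return feature_history[idx - 1][1]


def point_in_time_join(labels, features):
    """Attach features to labels without looking into the future.

    labels: list of (entity_id, event_time, label)
    features: dict mapping entity_id -> sorted list of (timestamp, value)
    """
    joined = []
    for entity_id, event_time, label in labels:
        value = point_in_time_lookup(features.get(entity_id, []), event_time)
        joined.append((entity_id, event_time, value, label))
    return joined


features = {"user_1": [(10, 0.2), (20, 0.5), (30, 0.9)]}
labels = [("user_1", 5, 0), ("user_1", 20, 1), ("user_1", 25, 0)]
print(point_in_time_join(labels, features))
# [('user_1', 5, None, 0), ('user_1', 20, 0.5, 1), ('user_1', 25, 0.5, 0)]
```

At serving time the same lookup runs with the current time as `event_time`; sharing one lookup function between training and inference is the sense in which a single read path avoids skew.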
ACID Transactions
Full ACID guarantees across all workload types: snapshot isolation, time-travel queries (up to 7 days), and Z-order clustering during compaction to improve file-level data skipping.
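Z-ordering clusters rows by interleaving the bits of several column values into one Morton key, so rows that are close in every chosen column tend to land in the same files and per-file min/max statistics can skip more data. A minimal sketch for two non-negative integer columns; real implementations first map arbitrary column types to integers, and the helper names here are hypothetical.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Morton (Z-order) key: bit i of x goes to position 2i, bit i of y to 2i + 1."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def z_order_files(rows, rows_per_file):
    """Sort (x, y) rows by Morton key and cut them into fixed-size 'files'."""
    ordered = sorted(rows, key=lambda r: interleave_bits(r[0], r[1]))
    return [ordered[i:i + rows_per_file] for i in range(0, len(ordered), rows_per_file)]


def file_stats(files):
    """Per-file (min, max) for each column, as a planner would use for skipping."""
    return [
        ((min(r[0] for r in f), max(r[0] for r in f)),
         (min(r[1] for r in f), max(r[1] for r in f)))
        for f in files
    ]


rows = [(x, y) for x in range(4) for y in range(4)]
files = z_order_files(rows, rows_per_file=4)
for stats in file_stats(files):
    print(stats)
# ((0, 1), (0, 1))
# ((2, 3), (0, 1))
# ((0, 1), (2, 3))
# ((2, 3), (2, 3))
```

Each file covers a 2 x 2 quadrant, so a filter on `y = 0` alone reads two of the four files. Sorting by `x` alone would give each file a single `x` value with the full `y` range, and the same filter would read all four.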
Multi-Cloud Storage
S3, GCS, and ADLS Gen2 native connectors. Data stays in your account. No storage copy, no lock-in. Bring your own bucket policy.
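Native connectors for several object stores usually dispatch on the URI scheme. A minimal sketch that parses the standard URI forms for the three stores (`s3://bucket/key`, `gs://bucket/key`, and ADLS Gen2's `abfss://container@account.dfs.core.windows.net/path`); the `StorageLocation` type and the example bucket names are hypothetical, and this is not DataLynxr's connector code.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class StorageLocation:
    provider: str   # "s3", "gcs", or "adls"
    container: str  # bucket (S3/GCS) or container (ADLS Gen2)
    account: str    # storage account for ADLS Gen2; empty otherwise
    path: str       # object key or path inside the container


def parse_location(uri: str) -> StorageLocation:
    parsed = urlparse(uri)
    path = parsed.path.lstrip("/")
    if parsed.scheme == "s3":
        return StorageLocation("s3", parsed.netloc, "", path)
    if parsed.scheme == "gs":
        return StorageLocation("gcs", parsed.netloc, "", path)
    if parsed.scheme in ("abfs", "abfss"):
        # netloc looks like "<container>@<account>.dfs.core.windows.net"
        container, _, host = parsed.netloc.partition("@")
        account = host.split(".", 1)[0]
        return StorageLocation("adls", container, account, path)
    raise ValueError(f"unsupported storage scheme: {parsed.scheme!r}")


print(parse_location("s3://acme-lake/events/part-0.parquet"))
print(parse_location("abfss://lake@acmedata.dfs.core.windows.net/events/part-0.parquet"))
```

Because the bucket, container, and account all come from the URI, the data never has to leave the customer's own storage account; only the credentials and bucket policy decide what the engine may read.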
Iceberg, Delta, Hudi — your choice.
DataLynxr reads and writes all three major open table formats. You pick the format that fits your existing stack.
Apache Iceberg
Full read/write support including partition evolution, schema evolution, hidden partitioning, and branching/tagging.
- Partition evolution
- Schema evolution
- Hidden partitioning
- Branch & tag
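Hidden partitioning means the table stores a transform of a source column (for example Iceberg's `day` transform on a timestamp) and derives partition values itself, so a query filters on the timestamp and the planner prunes partitions without the user naming a partition column. A minimal sketch of that pruning, assuming a simplified in-memory manifest of `(file, day)` pairs; the names are hypothetical and Iceberg's actual manifest format is not modeled.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    """Days since 1970-01-01 UTC, in the spirit of Iceberg's `day` transform."""
    return (ts - EPOCH).days


def prune_files(manifest, start: datetime, end: datetime):
    """Return files that may hold rows with start <= ts <= end.

    manifest: list of (file_path, day) pairs, where day = day_transform(ts) was
    computed at write time. The query itself never mentions `day`.
    """
    lo, hi = day_transform(start), day_transform(end)
    return [path for path, day in manifest if lo <= day <= hi]


def utc(*args) -> datetime:
    return datetime(*args, tzinfo=timezone.utc)


manifest = [
    ("data/f1.parquet", day_transform(utc(2024, 3, 1, 8))),
    ("data/f2.parquet", day_transform(utc(2024, 3, 2, 23))),
    ("data/f3.parquet", day_transform(utc(2024, 3, 5, 12))),
]
# Equivalent of: WHERE ts BETWEEN '2024-03-02 00:00' AND '2024-03-03 00:00'
print(prune_files(manifest, utc(2024, 3, 2), utc(2024, 3, 3)))  # ['data/f2.parquet']
```

Partition evolution builds on the same idea: each data file records which partition spec it was written under, so after switching, say, from daily to hourly partitions, the planner applies the matching transform to old and new files without rewriting them.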
Delta Lake
Full Delta protocol support: DML operations (MERGE, UPDATE, DELETE), Change Data Feed, liquid clustering.
- DML (MERGE / UPDATE / DELETE)
- Change Data Feed
- Liquid clustering
- Column stats
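MERGE applies a keyed upsert in one transaction, and Change Data Feed records what each commit changed using the `_change_type` values `insert`, `update_preimage`, `update_postimage`, and `delete`. A minimal in-memory sketch of a MERGE with a delete condition and the change rows it would emit; the `merge` function and table layout are hypothetical, not Delta's implementation.

```python
def merge(target: dict, source: list, key: str, delete_when=None) -> list:
    """Upsert `source` rows into `target` (key -> row) and return CDF-style change rows.

    Mirrors: MERGE INTO target USING source ON target.key = source.key
             WHEN MATCHED AND <delete_when> THEN DELETE
             WHEN MATCHED THEN UPDATE SET *
             WHEN NOT MATCHED THEN INSERT *
    Assumes each key appears at most once in `source` (Delta raises an error
    when several source rows match the same target row).
    """
    changes = []
    for row in source:
        k = row[key]
        old = target.get(k)
        if old is not None and delete_when is not None and delete_when(row):
            changes.append({**old, "_change_type": "delete"})
            del target[k]
        elif old is not None:
            changes.append({**old, "_change_type": "update_preimage"})
            changes.append({**row, "_change_type": "update_postimage"})
            target[k] = row
        else:
            changes.append({**row, "_change_type": "insert"})
            target[k] = row
    return changes


target = {1: {"id": 1, "qty": 5}, 2: {"id": 2, "qty": 7}}
source = [{"id": 1, "qty": 6}, {"id": 2, "qty": 0}, {"id": 3, "qty": 4}]
changes = merge(target, source, key="id", delete_when=lambda r: r["qty"] == 0)
for change in changes:
    print(change)
```

Emitting both a pre-image and a post-image for updates lets downstream consumers compute deltas (for example, the change in `qty`) without re-reading the old table version.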
Apache Hudi
Copy-on-Write and Merge-on-Read tables. Incremental queries and bootstrapping from existing Hudi datasets.
- Copy-on-Write
- Merge-on-Read
- Incremental queries
- Bootstrap from existing
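The two Hudi table types differ in where updates go: Copy-on-Write rewrites the affected base file on every update, while Merge-on-Read appends updates to log files and merges them with the base file at read time until compaction folds them in. A minimal sketch contrasting the two, with hypothetical class names; it models one file group as a dict and does not reproduce Hudi's file layout.

```python
class CopyOnWriteFileGroup:
    """Each upsert writes a new full base-file version; reads scan only the latest one."""

    def __init__(self, rows: dict):
        self.base_versions = [dict(rows)]

    def upsert(self, updates: dict) -> None:
        new_base = dict(self.base_versions[-1])
        new_base.update(updates)
        self.base_versions.append(new_base)  # write cost: rewrite the whole file

    def read(self) -> dict:
        return dict(self.base_versions[-1])


class MergeOnReadFileGroup:
    """Upserts append to a delta log; reads merge base + log; compaction rewrites the base."""

    def __init__(self, rows: dict):
        self.base = dict(rows)
        self.log = []

    def upsert(self, updates: dict) -> None:
        self.log.append(dict(updates))  # write cost: append only the changed records

    def read(self) -> dict:
        merged = dict(self.base)
        for entry in self.log:  # read cost: apply log entries in commit order
            merged.update(entry)
        return merged

    def compact(self) -> None:
        self.base = self.read()
        self.log.clear()


cow = CopyOnWriteFileGroup({"k1": 1, "k2": 2})
mor = MergeOnReadFileGroup({"k1": 1, "k2": 2})
for group in (cow, mor):
    group.upsert({"k2": 20})
    group.upsert({"k3": 3})
print(cow.read() == mor.read())             # True
print(len(cow.base_versions), len(mor.log))  # 3 2
```

Both produce the same snapshot; the trade-off is write amplification for Copy-on-Write versus read-time merge work for Merge-on-Read.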
Stateless compute. Decoupled storage.
DataLynxr separates compute from storage at the protocol level. Your query nodes are stateless and horizontally autoscalable. Your data lives in your cloud object storage, not in our infrastructure.
The catalog layer tracks all table metadata and transaction logs. Every operation, whether a SQL query, a streaming write, or an ML feature read, goes through a catalog transaction before touching storage. That's how ACID is enforced across workload types.
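One common way a catalog enforces ACID across workloads is an append-only log of table versions with optimistic concurrency: a writer commits version N+1 only if the table is still at version N, and every reader pins one immutable version for its whole scan. A minimal sketch of that pattern, including time travel bounded by a 7-day retention window; the `Catalog` class and its methods are hypothetical, not DataLynxr's catalog API.

```python
import threading
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=7)  # assumed time-travel window


class CommitConflict(Exception):
    """Another writer committed first; the caller must re-read and retry."""


@dataclass(frozen=True)
class Snapshot:
    version: int
    committed_at: datetime
    files: frozenset  # data files visible at this version


class Catalog:
    """Hypothetical catalog: an append-only list of table snapshots."""

    def __init__(self, created_at: datetime):
        self._snapshots = [Snapshot(0, created_at, frozenset())]
        self._lock = threading.Lock()

    def current(self) -> Snapshot:
        """Readers (SQL, streaming, ML) pin this snapshot for their whole scan."""
        with self._lock:
            return self._snapshots[-1]

    def commit(self, read_version, add=frozenset(), remove=frozenset(), now=None) -> Snapshot:
        """Optimistic commit: succeeds only if nobody committed since `read_version`."""
        now = now or datetime.now(timezone.utc)
        with self._lock:  # the version check and the append must be atomic
            head = self._snapshots[-1]
            if head.version != read_version:
                raise CommitConflict(f"table is at v{head.version}, writer read v{read_version}")
            snap = Snapshot(head.version + 1, now, (head.files - remove) | add)
            self._snapshots.append(snap)
            return snap

    def as_of(self, ts: datetime, now=None) -> Snapshot:
        """Time travel: the latest snapshot committed at or before `ts`, within retention."""
        now = now or datetime.now(timezone.utc)
        if now - ts > RETENTION:
            raise ValueError("requested time is outside the retention window")
        with self._lock:
            eligible = [s for s in self._snapshots if s.committed_at <= ts]
        if not eligible:
            raise ValueError("table did not exist at the requested time")
        return eligible[-1]


t0 = datetime(2024, 5, 1, tzinfo=timezone.utc)
catalog = Catalog(created_at=t0)
reader = catalog.current()  # pinned at v0
v1 = catalog.commit(0, add={"a.parquet"}, now=t0 + timedelta(hours=1))
v2 = catalog.commit(1, add={"b.parquet"}, remove={"a.parquet"}, now=t0 + timedelta(hours=2))
try:
    catalog.commit(1, add={"c.parquet"}, now=t0 + timedelta(hours=3))  # stale writer
except CommitConflict as err:
    print("retry:", err)  # retry: table is at v2, writer read v1
print(reader.files)  # frozenset(): the pinned reader is unaffected by later commits
```

A losing writer re-reads the current snapshot, re-validates its changes, and retries; readers never block writers because each holds an immutable snapshot.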
What DataLynxr is not
Knowing what a tool doesn't do is as important as knowing what it does.
Not a data warehouse
DataLynxr does not store your data in proprietary columnar storage. Your data lives in your S3/GCS/ADLS bucket in open Parquet format. If you need a traditional OLAP warehouse with its own storage layer, Snowflake or Redshift is the right tool.
Not a BI or dashboarding tool
DataLynxr provides the SQL query interface. Visualization and dashboarding are handled by tools that connect to it: Tableau, Metabase, Apache Superset, or any JDBC/ODBC-compatible client. DataLynxr doesn't build charts.
Not a managed Spark service
There is no Spark runtime in DataLynxr. The query engine is purpose-built and vectorized. If your workload depends on Spark-specific APIs (RDD operations, Spark MLlib, SparkR), you need a Spark-native platform such as Databricks or EMR.
One platform. Three workloads. No ETL.
Connect your cloud storage and run your first query in under 10 minutes.