The Platform

A lakehouse engine built for all three workloads.

DataLynxr's query engine, streaming runtime, and ML layer all share the same storage cursor — not separate stores bolted together. That's the difference.

Core Components

What's inside the engine

Unified Storage Cursor

SQL queries, streaming ingestion, and ML feature reads all target the same lakehouse tables via a single storage abstraction. No copies, no divergence.

Vectorized SQL Engine

Columnar, vectorized execution with JIT compilation. Sub-second P95 query latency on 10 TB datasets in S3-backed Iceberg and Delta tables.

Streaming Ingest Runtime

Ingests from Kafka, Kinesis, and Pulsar directly into lakehouse tables with exactly-once semantics and ACID transactions. End-to-end latency under 5 seconds, with no intermediate landing zone.
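Exactly-once ingestion is commonly built by committing the consumed source offsets in the same atomic transaction as the data, so a batch redelivered after a crash is recognized and skipped. The sketch below illustrates that general pattern with an in-memory table. It is not DataLynxr's runtime: `Table`, `commit_batch`, the `orders-0` partition name, and the per-partition offset map are hypothetical, and the sketch assumes a redelivered batch keeps the same offset boundaries as the original.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """Toy lakehouse table (hypothetical): rows plus last committed offset per source partition."""

    rows: list = field(default_factory=list)
    committed_offsets: dict = field(default_factory=dict)

    def commit_batch(self, partition: str, end_offset: int, batch: list) -> bool:
        """Append `batch` and record `end_offset` together; skip already-committed batches.

        Assumes redelivered batches keep the same offset boundaries.
        """
        if end_offset <= self.committed_offsets.get(partition, -1):
            return False  # duplicate delivery: this offset range is already in the table
        # Build the new state first, then swap it in as one step. This stands in for a
        # single transaction that covers both the data and the offsets.
        new_rows = self.rows + list(batch)
        new_offsets = {**self.committed_offsets, partition: end_offset}
        self.rows, self.committed_offsets = new_rows, new_offsets
        return True


table = Table()
table.commit_batch("orders-0", 99, [{"id": 1}, {"id": 2}])
# The consumer crashes before acknowledging, so the broker redelivers the same batch.
table.commit_batch("orders-0", 99, [{"id": 1}, {"id": 2}])
print(len(table.rows))  # 2
```

Because the offsets live in the table rather than in the consumer, a crash between "write data" and "acknowledge source" cannot produce duplicates or gaps.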

ML Feature Layer

The Python SDK provides point-in-time correct reads from lakehouse tables. Training and inference read features through the same path, so training/serving skew caused by separate feature pipelines is eliminated by design.
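A point-in-time correct read attaches to each training example the latest feature value whose timestamp is at or before the example's timestamp, so no future information leaks into training. The stdlib-only sketch below shows that as-of lookup; `feature_as_of` and the `(timestamp, value)` history layout are hypothetical illustrations, not the DataLynxr SDK.

```python
import bisect
from datetime import datetime


def feature_as_of(history, ts):
    """Return the latest feature value with timestamp <= ts, or None if there is none.

    `history` is a list of (timestamp, value) tuples sorted by timestamp.
    """
    timestamps = [t for t, _ in history]
    # bisect_right gives the index of the first entry strictly after ts,
    # so the entry just before it is the last one at or before ts.
    i = bisect.bisect_right(timestamps, ts)
    return history[i - 1][1] if i > 0 else None


# Hypothetical feature history for one customer: (event time, 30-day spend).
spend_history = [
    (datetime(2024, 1, 1), 120.0),
    (datetime(2024, 1, 15), 180.0),
    (datetime(2024, 2, 1), 90.0),
]

# A label observed on Jan 20 must see the Jan 15 value, not the later Feb 1 value.
print(feature_as_of(spend_history, datetime(2024, 1, 20)))  # 180.0
print(feature_as_of(spend_history, datetime(2023, 12, 31)))  # None
```

Calling the same lookup with `ts` set to the current time at inference is what keeps training and serving on one code path.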

ACID Transactions

Full ACID guarantees across all workload types, with snapshot isolation. Time-travel queries cover up to 7 days of history, and compaction can Z-order data files to improve file-level data skipping.
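Z-ordering maps several column values to one sort key by interleaving their bits, so rows that are close in all of those columns tend to land in the same files and per-file min/max statistics can skip more data. The sketch below shows the bit interleaving (a Morton code) for two non-negative integer columns. It is a generic illustration: real engines first normalize other column types to integers, and the 16-bit width is an arbitrary assumption.

```python
def z_order_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Morton (Z-order) key.

    Bit i of x goes to position 2*i and bit i of y goes to position 2*i + 1.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


# Every point in the 2x2 block x < 2, y < 2 maps to keys 0..3,
# so sorting by the key keeps those rows together.
print(sorted(z_order_key(x, y) for x in range(2) for y in range(2)))  # [0, 1, 2, 3]
```

Sorting by a single column clusters only that column; the interleaved key gives every participating column some locality, which is why it helps filters on any of them.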

Multi-Cloud Storage

S3, GCS, and ADLS Gen2 native connectors. Data stays in your account. No storage copy, no lock-in. Bring your own bucket policy.

Open Format Support

Iceberg, Delta, Hudi — your choice.

DataLynxr reads and writes all three major open table formats. You pick the format that fits your existing stack.

Apache Iceberg

Full read/write support including partition evolution, schema evolution, hidden partitioning, and branching/tagging.

  • Partition evolution
  • Schema evolution
  • Hidden partitioning
  • Branch & tag
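With hidden partitioning, the table metadata stores a partition transform (for example, a day transform on an `event_ts` column) and the engine derives partition values from the source column, so a filter on `event_ts` alone is enough to prune partitions. The sketch below mimics Iceberg's day transform (whole days since 1970-01-01) in plain Python; the `event_ts` column, the file map, and the `prune` helper are hypothetical.

```python
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)


def day_transform(ts: datetime) -> int:
    """Partition value for a timestamp: whole days since 1970-01-01 in UTC."""
    if ts.tzinfo is None:
        raise ValueError("expects a timezone-aware timestamp")
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


def prune(files: dict, start: datetime, end: datetime) -> list:
    """Keep only files whose day partition can overlap [start, end].

    `files` maps file name -> day partition value. The caller filters on the
    timestamp column and never references the partition value directly.
    """
    lo, hi = day_transform(start), day_transform(end)
    return [name for name, day in files.items() if lo <= day <= hi]


utc = timezone.utc
files = {
    "f1.parquet": day_transform(datetime(2024, 3, 1, 8, tzinfo=utc)),
    "f2.parquet": day_transform(datetime(2024, 3, 2, 23, tzinfo=utc)),
    "f3.parquet": day_transform(datetime(2024, 3, 5, 12, tzinfo=utc)),
}
# A filter like event_ts BETWEEN 2024-03-02 AND 2024-03-03 touches only f2.
print(prune(files, datetime(2024, 3, 2, tzinfo=utc), datetime(2024, 3, 3, tzinfo=utc)))  # ['f2.parquet']
```

Because the transform lives in table metadata rather than in a user-visible column, it can be changed later (partition evolution) without rewriting queries.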

Delta Lake

Full Delta protocol support, including DML operations (MERGE, UPDATE, DELETE), Change Data Feed, and liquid clustering.

  • DML (MERGE / UPDATE / DELETE)
  • Change Data Feed
  • Liquid clustering
  • Column stats
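`MERGE` combines delete, update, and insert in one atomic operation keyed on a match condition. The in-memory sketch below shows the clause logic of a typical upsert with a delete branch (matched and flagged: delete; matched: update; not matched: insert). It illustrates the semantics only, not how Delta rewrites files, and the `deleted` flag column and `merge` helper are hypothetical.

```python
def merge(target: dict, source: list, key: str = "id") -> dict:
    """Apply MERGE-style semantics to `target` (key -> row) and return the new table.

    For each source row, matched against the ORIGINAL target:
      matched and row["deleted"]  -> DELETE
      matched                     -> UPDATE
      not matched and not deleted -> INSERT
    Assumes at most one source row per key, as Delta requires for matched rows.
    """
    result = dict(target)  # work on a copy; returning it stands in for the atomic commit
    for row in source:
        k = row[key]
        deleted = row.get("deleted", False)
        values = {c: v for c, v in row.items() if c != "deleted"}
        if k in target:
            if deleted:
                del result[k]
            else:
                result[k] = values
        elif not deleted:
            result[k] = values
    return result


target = {1: {"id": 1, "name": "a"}, 2: {"id": 2, "name": "b"}}
source = [
    {"id": 1, "name": "a2"},     # matched -> update
    {"id": 2, "deleted": True},  # matched and flagged -> delete
    {"id": 3, "name": "c"},      # not matched -> insert
]
print(merge(target, source))
# {1: {'id': 1, 'name': 'a2'}, 3: {'id': 3, 'name': 'c'}}
```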

Apache Hudi

Copy-on-Write and Merge-on-Read tables, incremental queries, and bootstrapping of existing datasets into Hudi tables.

  • Copy-on-Write
  • Merge-on-Read
  • Incremental queries
  • Bootstrap from existing
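The two Hudi table types trade write cost against read cost: Copy-on-Write rewrites the affected base file on every update, while Merge-on-Read appends updates to a log and merges them at read time until compaction folds them into a new base file. The toy model below counts rows rewritten versus pending log entries to make that trade-off concrete. The classes are hypothetical and ignore file layout, indexing, and Hudi's actual storage formats.

```python
class CopyOnWriteTable:
    """Each upsert rewrites the whole base file; reads are a plain scan."""

    def __init__(self, rows: dict):
        self.base = dict(rows)
        self.rows_rewritten = 0

    def upsert(self, updates: dict):
        self.base = {**self.base, **updates}  # write a new version of the base file
        self.rows_rewritten += len(self.base)

    def read(self) -> dict:
        return dict(self.base)


class MergeOnReadTable:
    """Upserts append to a delta log; reads merge base + log until compaction."""

    def __init__(self, rows: dict):
        self.base = dict(rows)
        self.log = []

    def upsert(self, updates: dict):
        self.log.append(dict(updates))  # cheap append, base file untouched

    def read(self) -> dict:
        merged = dict(self.base)
        for updates in self.log:  # later log entries win
            merged.update(updates)
        return merged

    def compact(self):
        self.base, self.log = self.read(), []


initial = {i: f"v{i}" for i in range(1000)}
cow, mor = CopyOnWriteTable(initial), MergeOnReadTable(initial)
for _ in range(3):
    cow.upsert({7: "new"})
    mor.upsert({7: "new"})

assert cow.read() == mor.read()  # same logical result either way
print(cow.rows_rewritten)  # 3000: every update rewrote all 1000 rows
print(len(mor.log))        # 3: updates waiting to be merged at read time
```

Copy-on-Write suits read-heavy tables with infrequent updates; Merge-on-Read suits update-heavy or streaming tables that can tolerate some merge work on read.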

Architecture

Stateless compute. Decoupled storage.

DataLynxr separates compute from storage at the protocol level. Your query nodes are stateless and horizontally autoscalable. Your data lives in your cloud object storage, not in our infrastructure.

The catalog layer tracks all table metadata and transaction logs. Every operation, whether a SQL query, a streaming write, or an ML feature read, resolves its table snapshot through the catalog before touching storage, and every write commits through a catalog transaction. That's how ACID is enforced across workload types.
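One common way a catalog enforces atomic commits is optimistic concurrency: a writer reads the current table version, prepares new metadata, and commits only if the version pointer has not moved since, retrying otherwise. Readers pin whichever snapshot is current when they start, which gives snapshot isolation. The sketch below is a single-process illustration of that general pattern; `Catalog`, `Snapshot`, `commit`, and `CommitConflict` are hypothetical names, and a real catalog performs the compare-and-swap against durable storage.

```python
import threading
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    version: int
    files: tuple  # data files visible in this snapshot


class CommitConflict(Exception):
    pass


class Catalog:
    """Holds the pointer to one table's current snapshot (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = Snapshot(version=0, files=())

    def snapshot(self) -> Snapshot:
        """Readers pin the snapshot that is current when they start."""
        return self._current

    def commit(self, expected_version: int, new_files: tuple) -> Snapshot:
        """Compare-and-swap: succeed only if nobody committed since `expected_version`."""
        with self._lock:
            if self._current.version != expected_version:
                raise CommitConflict(f"table moved to v{self._current.version}")
            self._current = Snapshot(expected_version + 1, self._current.files + new_files)
            return self._current


catalog = Catalog()
reader_view = catalog.snapshot()                # a SQL query starts at v0
stale = catalog.snapshot()
catalog.commit(stale.version, ("a.parquet",))   # a streaming writer commits v1
try:
    catalog.commit(stale.version, ("b.parquet",))  # a second writer still holds v0
except CommitConflict:
    fresh = catalog.snapshot()
    catalog.commit(fresh.version, ("b.parquet",))  # retry on top of v1, producing v2

print(reader_view.files)         # ()
print(catalog.snapshot().files)  # ('a.parquet', 'b.parquet')
```

The retry here simply re-applies the append; a real catalog would also check whether the concurrent commit touched the same data before deciding that a retry is safe.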

Integrations

Connects to your stack

Python SDK
JDBC / ODBC
Apache Airflow
dbt Core
Apache Superset
Apache Kafka
Amazon Kinesis
Apache Pulsar

Scope

What DataLynxr is not

Knowing what a tool doesn't do is as important as knowing what it does.

Not a data warehouse

DataLynxr does not store your data in proprietary columnar storage. Your data lives in your S3/GCS/ADLS bucket in open Parquet format. If you need a traditional OLAP warehouse with its own storage layer, Snowflake or Redshift is the right tool.

Not a BI or dashboarding tool

DataLynxr provides the SQL query interface. Visualization and dashboarding are handled by tools that connect to it: Tableau, Metabase, Apache Superset, or any JDBC/ODBC-compatible client. DataLynxr doesn't build charts.

Not a managed Spark service

There is no Spark runtime in DataLynxr. The query engine is purpose-built and vectorized. If your workload depends on Spark-specific APIs (RDD operations, Spark MLlib, SparkR), you need a Spark-native platform such as Databricks or EMR.

One platform. Three workloads. No ETL.

Connect your cloud storage and run your first query in under 10 minutes.