Quickstart

Connect your object storage and run your first SQL query on lakehouse tables in under 10 minutes.

Prerequisites

  • An AWS, GCP, or Azure account with access to S3 / GCS / ADLS Gen2
  • A DataLynxr account (sign-up is free)
  • Existing Parquet, Iceberg, or Delta Lake data in object storage (or use our sample dataset)

Step 1 — Connect your storage

In the DataLynxr dashboard, navigate to Connections → Add Connection. Enter your bucket or container details, then grant DataLynxr read access through an IAM role (AWS), a service account (GCP), or a service principal (Azure).

DataLynxr CLI — connection setup
$ dlx connect s3 --bucket my-data-lake --region us-east-1 \
    --role-arn arn:aws:iam::123456789012:role/DatalynxrReadRole
✓  Connection verified — 847 GB accessible
✓  Discovered 12 Iceberg tables, 3 Delta tables
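The exact permissions the read role needs are not listed on this page. As a rough sketch, assuming read-only discovery only requires listing the bucket and reading objects, the permissions policy attached to the role passed via --role-arn could look like the document built below. The build_read_only_policy helper is hypothetical, and the action list (s3:ListBucket, s3:GetBucketLocation, s3:GetObject) is an assumption to check against DataLynxr's own requirements.

```python
import json


def build_read_only_policy(bucket: str) -> dict:
    """Return a minimal read-only IAM permissions policy for one S3 bucket.

    Hypothetical helper: the actions below are an assumption, not a
    documented DataLynxr requirement.
    """
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Bucket-level actions are granted on the bucket ARN itself.
                "Sid": "ListBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                # Object-level reads are granted on every key in the bucket.
                "Sid": "ReadObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_read_only_policy("my-data-lake"), indent=2))
```

Splitting bucket-level and object-level statements matters because s3:ListBucket applies to the bucket ARN while s3:GetObject applies to object ARNs. The trust policy that allows DataLynxr to assume the role is a separate document and is not sketched here.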

Step 2 — Run your first SQL query

DataLynxr exposes a JDBC/ODBC endpoint and a REST SQL API. Use the web SQL editor for quick exploration.

SQL Editor
-- Daily event and distinct-user counts for the last 7 complete days
SELECT
  DATE_TRUNC('day', event_ts) AS day,
  COUNT(*) AS events,
  COUNT(DISTINCT user_id) AS users
FROM iceberg.prod.events
WHERE event_ts >= CURRENT_DATE - INTERVAL '7' DAY
  AND event_ts < CURRENT_DATE
GROUP BY 1
ORDER BY 1 DESC;

7 rows returned in 420ms — 2.1 GB scanned
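To make the query's semantics concrete, here is a plain-Python sketch of the same aggregation over a small in-memory list of (event_ts, user_id) tuples. It mirrors the WHERE, DATE_TRUNC('day', ...), COUNT(*), COUNT(DISTINCT user_id), and ORDER BY 1 DESC clauses; the time filter is expressed as an inclusive since bound plus an optional exclusive until bound. The daily_counts name and the tuple layout are illustrative only; the engine does this work for you.

```python
from __future__ import annotations

from collections import defaultdict
from datetime import date, datetime
from typing import Iterable, Optional


def daily_counts(
    events: Iterable[tuple[datetime, str]],
    since: datetime,
    until: Optional[datetime] = None,
) -> list[tuple[date, int, int]]:
    """Return (day, events, distinct_users) rows, newest day first.

    `since` is inclusive; `until`, if given, is exclusive.
    """
    totals: dict[date, int] = defaultdict(int)
    users: dict[date, set[str]] = defaultdict(set)
    for ts, user_id in events:
        if ts < since or (until is not None and ts >= until):
            continue  # WHERE clause
        day = ts.date()  # DATE_TRUNC('day', event_ts)
        totals[day] += 1  # COUNT(*)
        users[day].add(user_id)  # COUNT(DISTINCT user_id)
    # ORDER BY 1 DESC
    return [(d, totals[d], len(users[d])) for d in sorted(totals, reverse=True)]
```

Note that the distinct-user count needs a set per day, which is why COUNT(DISTINCT ...) is usually more expensive than COUNT(*) on large tables.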

Step 3 — Stream data into Delta tables

Configure a Kafka source and a Delta Lake sink to stream data with exactly-once semantics.

stream-ingest.yaml
source:
  type: kafka
  brokers: kafka-broker:9092
  topic: user-events
  format: json

sink:
  type: delta
  table: s3://my-data-lake/events/
  checkpoint: s3://my-data-lake/_checkpoints/events/
  trigger_interval: 60s
✓  Stream started — 2,847 records/s ingested
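The checkpoint path in the config is what makes exactly-once delivery possible in checkpoint-based streaming engines generally. The sketch below is a conceptual model of that mechanism, not DataLynxr's internals: the source offset is saved only after the sink commits, so a crash in between replays the batch, and the sink discards the replay because it commits each batch id at most once. IdempotentSink, run_stream, and the dict standing in for the checkpoint directory are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class IdempotentSink:
    """Toy table that applies each micro-batch at most once.

    Rows and the batch id are committed together, the property a
    transactional table format such as Delta provides for each commit.
    """

    rows: list = field(default_factory=list)
    last_committed_batch: int = -1

    def commit(self, batch_id: int, batch_rows: list) -> bool:
        if batch_id <= self.last_committed_batch:
            return False  # replayed batch: already written, so skip it
        self.rows.extend(batch_rows)
        self.last_committed_batch = batch_id
        return True


def run_stream(records, sink, checkpoint, batch_size, crash_before_checkpoint_at=None):
    """Consume `records` in fixed-size micro-batches, resuming from the checkpoint.

    `checkpoint` is a dict standing in for the checkpoint directory. The
    offset is saved only after the sink commit, so a crash in between
    causes the batch to be replayed on restart.
    """
    while checkpoint["offset"] < len(records):
        start = checkpoint["offset"]
        batch_id = start // batch_size  # same offsets always yield the same id
        batch = records[start:start + batch_size]
        sink.commit(batch_id, batch)
        if batch_id == crash_before_checkpoint_at:
            raise RuntimeError("simulated crash before the checkpoint write")
        checkpoint["offset"] = start + len(batch)
```

Running it once with crash_before_checkpoint_at=1 and then again without it leaves every record in sink.rows exactly once: the retry replays batch 1, the sink skips it, and processing continues from the next offset.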

Step 4 — Read features for ML

Use the Python SDK to fetch point-in-time correct features for training or serving.

feature_fetch.py
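from datalynxr import LakehouseClient
import pandas as pd

client = LakehouseClient(workspace="acme-analytics")

# Labels to build training rows for. The file name and column names are
# hypothetical; substitute your own label set.
labels = pd.read_parquet("training_labels.parquet")
training_user_ids = labels["user_id"].tolist()
label_timestamps = labels["label_ts"].tolist()

# Point-in-time training features: each row reflects the feature values
# as of the matching label timestamp
df = client.get_feature_values(
    table="delta.prod.user_features",
    entity_ids=training_user_ids,
    timestamp=label_timestamps,
)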
→  DataFrame(shape=(50000, 34), dtype=mixed)
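Point-in-time correctness means each training label only sees feature values that already existed at its timestamp, which prevents leakage from the future. How get_feature_values implements this is not documented here; the sketch below shows the usual as-of lookup semantics in plain Python, assuming each entity's history is a time-sorted list of (valid_from, features) pairs. point_in_time_lookup is a hypothetical name.

```python
from bisect import bisect_right
from datetime import datetime


def point_in_time_lookup(feature_history, entity_ids, label_timestamps):
    """For each (entity, label time) pair, return the latest features at or before it.

    feature_history maps entity_id -> list of (valid_from, features) tuples,
    sorted by valid_from. Returns None when no value existed yet, so a
    training row never sees a feature computed after its label.
    """
    results = []
    for entity_id, label_ts in zip(entity_ids, label_timestamps):
        history = feature_history.get(entity_id, [])
        times = [valid_from for valid_from, _ in history]
        idx = bisect_right(times, label_ts)  # number of rows with valid_from <= label_ts
        results.append(history[idx - 1][1] if idx else None)
    return results


if __name__ == "__main__":
    history = {"u1": [(datetime(2024, 1, 1), {"orders": 1}), (datetime(2024, 2, 1), {"orders": 5})]}
    print(point_in_time_lookup(history, ["u1", "u1"], [datetime(2024, 1, 15), datetime(2023, 12, 1)]))
```

bisect_right keeps a feature row whose timestamp equals the label timestamp; if your pipeline treats features as available only strictly after they are computed, bisect_left is the stricter choice.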

Apache Iceberg support

DataLynxr reads Iceberg tables through a REST catalog, AWS Glue, or Hive Metastore. Hidden partitioning, partition evolution, and branching are all supported. To discover existing tables, run SHOW TABLES IN iceberg.<schema>; for example, SHOW TABLES IN iceberg.prod;
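With hidden partitioning, the table's partition spec stores a transform of a column (for example day(event_ts)), and queries filter on the source column; the engine maps that filter onto partition values. A minimal sketch of the mapping for Iceberg's day transform, which the Iceberg spec defines as days since 1970-01-01. The function names are hypothetical, and real engines also prune using file-level column statistics.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    """Iceberg-style `day` transform: whole days since 1970-01-01 UTC.

    Expects a timezone-aware timestamp; timedelta.days floors, so
    pre-epoch timestamps map to negative day numbers.
    """
    return (ts - EPOCH).days


def prune_partitions(partition_days, lower_bound: datetime):
    """Keep day partitions that may contain rows with event_ts >= lower_bound.

    The query never mentions the partition column; the predicate on
    event_ts is translated into one on day(event_ts).
    """
    min_day = day_transform(lower_bound)
    return sorted(d for d in partition_days if d >= min_day)
```

Because the transform lives in table metadata rather than in a user-visible column, the partition spec can evolve (say, from day to hour) without rewriting queries.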

Delta Lake support

DataLynxr reads Delta Lake tables directly from object storage using the Delta transaction log, and supports MERGE, UPDATE, and DELETE operations. When Change Data Feed is enabled on a table, downstream jobs can process only the rows that changed instead of rescanning the whole table.
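Reading through the transaction log means the current table state comes from replaying the commit files in the table's _delta_log/ directory in version order. A simplified sketch following the Delta protocol's add and remove actions, ignoring checkpoint files, metadata, and deletion vectors; the function names are hypothetical. Without deletion vectors, a MERGE, UPDATE, or DELETE typically appears in the log as remove actions for rewritten files plus add actions for their replacements.

```python
from __future__ import annotations

import json


def log_file_name(version: int) -> str:
    """Commit files are named by a 20-digit, zero-padded version number."""
    return f"{version:020d}.json"


def active_files(commits: dict[str, str]) -> set[str]:
    """Replay newline-delimited JSON commits and return the live data files.

    `commits` maps a log file name to its contents. Only `add` and `remove`
    actions are handled; every other action type is ignored.
    """
    files: set[str] = set()
    for name in sorted(commits):  # zero padding makes name order match version order
        for line in commits[name].splitlines():
            if not line.strip():
                continue
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files
```

The zero-padded names are what let a reader list the directory and replay commits in order with a plain lexical sort.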