Quickstart
Connect your object storage and run your first SQL query on lakehouse tables in under 10 minutes.
Prerequisites
- An AWS, GCP, or Azure account with access to S3 / GCS / ADLS Gen2
- A DataLynxr account — sign up free
- Existing Parquet, Iceberg, or Delta Lake data in object storage (or use our sample dataset)
Step 1 — Connect your storage
In the DataLynxr dashboard, navigate to Connections → Add Connection. Enter your bucket or container details and grant read access via an IAM role (AWS), a service account (GCP), or a service principal (Azure). You can also create the connection from the CLI:
$ dlx connect s3 --bucket my-data-lake --region us-east-1 \
    --role-arn arn:aws:iam::123456789012:role/DatalynxrReadRole
✓ Connection verified — 847 GB accessible
✓ Discovered 12 Iceberg tables, 3 Delta tables
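The exact permissions DataLynxr needs are not listed here. As a rough sketch, a read-only role for the bucket above would typically carry a policy along these lines. `s3:ListBucket` and `s3:GetObject` are standard S3 actions, but the statement set and the `read_only_bucket_policy` helper are assumptions to check against what your workspace actually requires.

```python
import json


def read_only_bucket_policy(bucket: str) -> dict:
    """Build a minimal read-only S3 policy document for one bucket.

    Assumption: listing and reading objects is sufficient for read access.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(read_only_bucket_policy("my-data-lake"), indent=2))
```

The printed JSON can be attached to the role referenced by --role-arn.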
Step 2 — Run your first SQL query
DataLynxr exposes a JDBC/ODBC endpoint and a REST SQL API. Use the web SQL editor for quick exploration.
SELECT
    DATE_TRUNC('day', event_ts) AS day,
    COUNT(*) AS events,
    COUNT(DISTINCT user_id) AS users
FROM iceberg.prod.events
WHERE event_ts >= CURRENT_DATE - INTERVAL '7' DAY
GROUP BY 1
ORDER BY 1 DESC;
7 rows returned in 420ms — 2.1 GB scanned
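To make the aggregation concrete, here is a small pure-Python sketch of what the query computes: events are bucketed by calendar day, then counted in total and by distinct user. The sample rows and the `daily_activity` helper are made up for illustration and are not part of DataLynxr.

```python
from collections import defaultdict
from datetime import date, datetime, timedelta


def daily_activity(rows, today, days=7):
    """Mimic the query: per-day event count and distinct-user count.

    rows  -- iterable of (event_ts: datetime, user_id) tuples
    today -- stand-in for CURRENT_DATE
    """
    cutoff = datetime.combine(today - timedelta(days=days), datetime.min.time())
    events = defaultdict(int)
    users = defaultdict(set)
    for event_ts, user_id in rows:
        if event_ts >= cutoff:          # WHERE event_ts >= CURRENT_DATE - INTERVAL '7' DAY
            day = event_ts.date()       # DATE_TRUNC('day', event_ts)
            events[day] += 1            # COUNT(*)
            users[day].add(user_id)     # COUNT(DISTINCT user_id)
    # GROUP BY 1 ORDER BY 1 DESC
    return [(d, events[d], len(users[d])) for d in sorted(events, reverse=True)]


if __name__ == "__main__":
    sample = [
        (datetime(2024, 5, 10, 9, 0), "u1"),
        (datetime(2024, 5, 10, 9, 5), "u1"),
        (datetime(2024, 5, 10, 11, 0), "u2"),
        (datetime(2024, 5, 9, 23, 59), "u3"),
        (datetime(2024, 4, 1, 8, 0), "u4"),  # older than the window, filtered out
    ]
    for row in daily_activity(sample, today=date(2024, 5, 10)):
        print(row)
```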
Step 3 — Stream data into Delta tables
Configure a Kafka source and a Delta Lake sink to stream data with exactly-once semantics.
# Kafka source
type: kafka
brokers: kafka-broker:9092
topic: user-events
format: json

# Delta Lake sink
type: delta
table: s3://my-data-lake/events/
checkpoint: s3://my-data-lake/_checkpoints/events/
trigger_interval: 60s
✓ Stream started — 2,847 records/s ingested
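How DataLynxr implements exactly-once delivery is not described here. A common way to get that guarantee, sketched below as a toy model, is to advance a checkpointed source offset in the same atomic step as the write, so a restart that replays records does not write them twice. The `ToySink` class and `run_stream` function are illustrative only, not DataLynxr APIs.

```python
class ToySink:
    """In-memory stand-in for a table plus its checkpoint."""

    def __init__(self):
        self.rows = []          # committed table contents
        self.next_offset = 0    # checkpoint: first offset not yet committed

    def commit(self, batch_start, batch):
        """Append a batch and advance the checkpoint in one step.

        A batch that starts before the checkpoint was already committed
        (for example, replayed after a crash), so its overlapping prefix
        is skipped. Returns the number of newly written rows.
        """
        skip = max(0, self.next_offset - batch_start)
        fresh = batch[skip:]
        self.rows.extend(fresh)
        self.next_offset = max(self.next_offset, batch_start + len(batch))
        return len(fresh)


def run_stream(source, sink, batch_size, start_at=None):
    """Read from the checkpoint (or a forced offset) to the end of the source."""
    offset = sink.next_offset if start_at is None else start_at
    while offset < len(source):
        batch = source[offset:offset + batch_size]
        sink.commit(offset, batch)
        offset += len(batch)


if __name__ == "__main__":
    source = [f"event-{i}" for i in range(10)]
    sink = ToySink()
    run_stream(source[:6], sink, batch_size=4)           # first run sees 6 records
    run_stream(source, sink, batch_size=4, start_at=4)   # restart replays from offset 4
    print(sink.rows == source)                           # no duplicates, no gaps
```

In the configuration above, the checkpoint path plays the role of `next_offset`: it is what lets a restarted stream know which records are already in the table.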
Step 4 — Read features for ML
Use the Python SDK to fetch point-in-time correct features for training or serving.
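from datalynxr import LakehouseClient
import pandas as pd

client = LakehouseClient(workspace="acme-analytics")

# training_user_ids and label_timestamps are assumed to be defined earlier,
# for example as columns of your label set.
df = client.get_feature_values(
    table="delta.prod.user_features",
    entity_ids=training_user_ids,
    timestamp=label_timestamps,
)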
→ DataFrame(shape=(50000, 34), dtype=mixed)
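Point-in-time correctness means each training row only sees feature values that were known at or before its label timestamp, which keeps future information out of the training set. The sketch below shows that lookup rule using only the standard library. The `feature_as_of` helper and the sample history are hypothetical, and the SDK's internals may differ.

```python
from bisect import bisect_right
from datetime import datetime


def feature_as_of(history, ts):
    """Return the latest feature value recorded at or before `ts`.

    history -- list of (recorded_at, value) tuples sorted by recorded_at
    Returns None when no value existed yet at `ts`.
    """
    times = [recorded_at for recorded_at, _ in history]
    idx = bisect_right(times, ts)   # number of entries with recorded_at <= ts
    return history[idx - 1][1] if idx else None


if __name__ == "__main__":
    # Hypothetical history of one feature for one user.
    purchases_30d = [
        (datetime(2024, 1, 1), 2),
        (datetime(2024, 2, 1), 5),
        (datetime(2024, 3, 1), 9),
    ]
    print(feature_as_of(purchases_30d, datetime(2024, 2, 15)))   # value known mid-February
    print(feature_as_of(purchases_30d, datetime(2023, 12, 31)))  # before any value existed
```

A label dated 2024-02-15 is paired with the February value, never the March one, even though the March value exists by the time the training set is built.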
Apache Iceberg support
DataLynxr reads Iceberg catalogs via REST, AWS Glue, or Hive Metastore. Hidden partitioning, partition evolution, and branching are all supported. Run SHOW TABLES IN iceberg.<catalog>; to discover existing tables.
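Hidden partitioning means the partition value is derived from a column by a transform, so queries filter on the column itself and the engine maps that filter to partitions. As a sketch of the idea, the Iceberg `day` transform maps a timestamp to the number of days since 1970-01-01. The helper names below are illustrative, and the pruning logic is heavily simplified compared with a real query planner.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    """Iceberg-style `day` transform: whole days since the Unix epoch (UTC).

    `ts` must be timezone-aware.
    """
    return (ts - EPOCH).days


def prune_partitions(partitions, lower_bound: datetime):
    """Keep only day partitions that can contain rows with ts >= lower_bound."""
    min_day = day_transform(lower_bound)
    return [p for p in partitions if p >= min_day]


if __name__ == "__main__":
    today = day_transform(datetime(2024, 5, 10, 13, 30, tzinfo=timezone.utc))
    # Hypothetical table with one partition per day for the last 30 days.
    all_partitions = [today - n for n in range(30)]
    kept = prune_partitions(all_partitions, datetime(2024, 5, 3, tzinfo=timezone.utc))
    print(f"scanning {len(kept)} of {len(all_partitions)} partitions")
```

This is why a filter such as `event_ts >= CURRENT_DATE - INTERVAL '7' DAY` can skip most of a table without the query ever naming a partition column.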
Delta Lake support
DataLynxr reads Delta Lake tables directly from object storage using the Delta transaction log. MERGE, UPDATE, and DELETE operations are supported. Change Data Feed enables downstream incremental processing.
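The Delta transaction log is a sequence of JSON commit files under `_delta_log/`, each holding actions such as `add` and `remove`. Replaying them in order yields the set of data files that make up the current table version. The sketch below replays a small in-memory log. It ignores checkpoints, metadata, and protocol actions, the file names are made up, and it is not how DataLynxr itself is implemented.

```python
import json


def live_files(commits):
    """Replay commits in order and return the active data file paths.

    commits -- list of commits; each commit is a list of JSON action lines
    """
    active = set()
    for commit in commits:
        for line in commit:
            action = json.loads(line)
            if "add" in action:
                active.add(action["add"]["path"])
            elif "remove" in action:
                active.discard(action["remove"]["path"])
    return active


if __name__ == "__main__":
    log = [
        # version 0: initial write
        ['{"add": {"path": "part-0000.parquet"}}',
         '{"add": {"path": "part-0001.parquet"}}'],
        # version 1: a DELETE rewrites one file
        ['{"remove": {"path": "part-0001.parquet"}}',
         '{"add": {"path": "part-0002.parquet"}}'],
    ]
    print(sorted(live_files(log)))
```

Replaying only a prefix of the commits gives the file set of an earlier table version, which is the basis for reading a table as of a past version.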