daita@system:~$ cat ./data-engineering.md

# Data Engineering

Pipelines, lakehouses, streaming, and ML data infrastructure.

## Focus

Building and stress-testing data systems that survive scale, schema drift, and the entropy of real production environments.

## Research themes

  • Lakehouse architectures (Iceberg, Delta, Hudi)
  • Streaming ingestion and CDC patterns
  • Schema evolution and data contracts
  • Observability and lineage
  • ML feature pipelines and offline/online parity

## Public artefacts

  • Pipeline reference implementations
  • Migration tooling (SciCat, ingestion harnesses)
  • Open benchmarks and writeups

Have a research problem or a hard system to build? Talk to us.
