daita@system:~$ cat ./data-engineering.md
# Data Engineering
Pipelines, lakehouses, streaming, and ML data infrastructure.
## Focus
Building and stress-testing data systems that survive scale, schema drift, and the entropy of real production environments.
## Research themes
- Lakehouse architectures (Iceberg, Delta, Hudi)
- Streaming ingestion and CDC patterns
- Schema evolution and data contracts
- Observability and lineage
- ML feature pipelines and offline/online parity
## Public artefacts
- Pipeline reference implementations
- Migration tooling (SciCat, ingestion harnesses)
- Open benchmarks and writeups
Have a research problem or a hard system to build? Talk to us.