# Selected cases — Dewei Zhai

Past and current engagements, freelance and full-time. The HTML CV at /cv presents the
same set with more visual context.

## Enyquant — 2026 — present

**Role**: Platform Architect & end-to-end Data Engineer — raw → modeled (Azure + Alibaba Cloud) — FTE

**Summary**: Sole hands-on architect for a dual-region Lakehouse (Azure EU + Alibaba Cloud China) at an energy-trading, AI-first startup. Full ownership of platform, pipelines, IaC, CI/CD, IAM, and cross-region architecture.
**Stack**: Azure (ADLS Gen2, Databricks, ADF, Functions, Key Vault, Entra ID), Alibaba Cloud (OSS, Function Compute, DataWorks, EMR Spark), Terraform + CDKTF (TypeScript), GitHub Actions + OIDC, Python, SQL

**Outcomes**:
- Medallion-architecture Lakehouse on ADLS Gen2 + Databricks Unity Catalog with dev/prod parity
- 100% IaC (CDKTF + Terraform); zero click-ops drift
- Event-driven serverless pipelines for energy-market data across both regions
- Multi-team IAM, cost allocation, and secure cross-region data access models
- Reusable multi-cloud architecture layer for consistent EU ↔ China deployment
- Designed for AWS portability (Lambda / Step Functions / API Gateway / S3 equivalents)
- Built an AI-assisted (Harness Engineering) delivery practice → ~3× individual throughput

## PVH Corp (Tommy Hilfiger, Calvin Klein) — 2024 — 2025 (return engagement) & 2020 — 2022

**Role**: Lead Data Engineer — Freelance

**Summary**: Lead engineer for a 500+ TB AWS data lake with 90+ sources and 1,000+ datasets. Two engagements: the Hadoop → AWS migration, then a later platform-modernization phase.
**Stack**: AWS (S3, Glue, EMR, ECS, Lambda, API Gateway, Athena, Step Functions), Spark, Kafka, dbt, Airflow, Terraform, AWS CDK, GitLab CI/CD, PyDeequ for data quality, Azure Databricks, GCP BigQuery (cross-cloud sources)

**Outcomes**:
- Migrated the data lake from Hadoop to AWS
- Designed and shipped external integrations: Adobe, Salesforce, SAP, +others
- Refactored the ETL layer to be idempotent and config-driven
- Fixed long-standing timezone issues across data and scheduling
- Self-service analytics platform with 60+ dashboards used by CRM & C-suite — time-to-market 2 weeks → 10 minutes
- Real-time GDPR (de)anonymization service: 2 h batch latency → real-time, at 10× lower cost
- Migrated workloads to Azure Databricks; integrated GCP BigQuery & Google Analytics sources
- Acted as senior platform engineer advising on DataOps, IaC, and production readiness
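"Idempotent and config-driven" means each transform is a pure function of its input and a versioned per-source config, so reruns are safe. A minimal sketch of the idea in stdlib Python — the source name, config fields, and `transform` helper are all hypothetical, and the real layer ran on Spark/Glue rather than plain dicts:

```python
# Hypothetical per-source config; in practice loaded from versioned config files.
CONFIG = {
    "salesforce_orders": {
        "rename": {"Id": "order_id", "Amount__c": "amount"},
        "cast": {"amount": float},
    }
}

def transform(source: str, rows: list[dict]) -> list[dict]:
    """Config-driven, idempotent transform: pure function of (config, input rows)."""
    cfg = CONFIG[source]
    out = []
    for row in rows:
        rec = {cfg["rename"].get(k, k): v for k, v in row.items()}
        for col, typ in cfg["cast"].items():
            rec[col] = typ(rec[col])
        out.append(rec)
    return out

rows = [{"Id": "a1", "Amount__c": "19.99", "order_date": "2024-05-01"}]
first = transform("salesforce_orders", rows)
again = transform("salesforce_orders", rows)  # rerun yields identical output
```

Onboarding a new source then becomes a config entry rather than new pipeline code, which is what made 90+ sources tractable.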

## VodafoneZiggo — 2022 — 2024

**Role**: Data Engineer & Infra Admin — Freelance

**Summary**: Migrated legacy DWH workloads from Oracle to Snowflake on a 1 PB+ enterprise data lake. Owned CDC ingestion, IaC, and the CI/CD foundation the broader team builds on.
**Stack**: Snowflake, AWS, Terraform, AWS CDK (TypeScript), Python, SQL, Spark, AWS DMS for CDC, GitLab CI/CD

**Outcomes**:
- Supported Snowflake migration from legacy Oracle DWH
- Designed and maintained CDC ingestion pipelines on AWS DMS
- Managed AWS infrastructure as code (CDK + Terraform)
- Introduced CI/CD improvements that saved the broader team 60+ hours of manual work
- Delivered ETL pipelines and an internal data-engineering framework on a 1 PB+ data lake
- Supported data scientists and analytics teams with reliable datasets
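The core of CDC ingestion is replaying an ordered stream of insert/update/delete events onto a keyed target table. A stdlib-Python sketch of that apply step, with the event shape (`op`, `pk`, `row`) loosely modeled on DMS-style change records — names are illustrative, and the production path merged into Snowflake rather than a dict:

```python
def apply_cdc(target: dict, events: list[dict]) -> dict:
    """Apply ordered CDC events (I/U/D ops) to a table keyed by primary key."""
    for ev in events:
        op, key = ev["op"], ev["pk"]
        if op == "D":
            target.pop(key, None)   # delete is tolerant of already-absent keys
        else:                       # "I" and "U" are both treated as upserts
            target[key] = ev["row"]
    return target

events = [
    {"op": "I", "pk": 1, "row": {"id": 1, "status": "new"}},
    {"op": "U", "pk": 1, "row": {"id": 1, "status": "shipped"}},
    {"op": "I", "pk": 2, "row": {"id": 2, "status": "new"}},
    {"op": "D", "pk": 2, "row": None},
]
table = apply_cdc({}, events)
```

Treating inserts and updates uniformly as upserts keeps the apply step idempotent under replays, the same property the DMS-backed pipelines relied on.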

## Related

- About: /about.md
- Services: /services.md
- Stack: /stack.md
- JSON: /api/cases
