Dewei Zhai

/about

A short version, then a longer one.

I'm a freelance senior data engineer based in the Netherlands, with eight-plus years building 500TB–1PB data lakes across AWS, Azure, and GCP. The work I'm best known for: idempotent ETL design, platform-level data quality, and being the engineer the team calls when something is on fire at 2am.

I work project-based, not on retainers. I don't take work I can't personally see through. If you've got a stalled platform or a hard migration, that's the conversation I most want to have.

Track record

  1. 2026 — now

    Platform Architect (Azure + Alibaba Cloud) — Enyquant

    Sole hands-on architect for a dual-region Lakehouse (Azure EU + Alibaba Cloud China) at an energy-trading startup. 100% IaC (CDKTF + Terraform), event-driven serverless pipelines, multi-team IAM. Built an AI-augmented engineering practice (~3× delivery throughput).

  2. 2024 — 2025

    Lead Data Engineer (AWS Datalake) — PVH Corp · 2nd engagement

    500+TB AWS data lake, 90+ sources, 1000+ datasets. Led ETL, platform architecture, CI/CD, data quality. Real-time GDPR (de)anonymization service: latency 2h → real-time, cost 10× lower.

  3. 2022 — 2024

    Data Engineer & Infra Admin — VodafoneZiggo

    1PB+ data lake. Snowflake migration from Oracle DWH, CDC ingestion (DMS), AWS IaC (CDK + Terraform). Introduced a new CI/CD setup that saved the team 60+ hours.

  4. 2020 — 2022

    Lead Data Engineer (AWS Datalake) — PVH Corp · 1st engagement

    Migrated data lake from Hadoop to AWS. Designed external integrations (Adobe, Salesforce, SAP). Built self-service analytics: TTM 2 weeks → 10 minutes.

  5. 2018 — 2020

    Data DevOps Engineer — FedEx Digital

    ETL pipelines on AWS & GCP, productized data-science models, Kinesis-based streaming.

  6. 2016 — 2018

    Data Engineer / Hadoop Admin — KPN, ABN AMRO

    Hadoop administration, Hive/Spark ETL, automation with Ansible & Jenkins.

Selected work

The cases below are public; named references are available on request.

Tech stack

Cloud: AWS (deepest), Azure, Alibaba Cloud, GCP
Lakehouse: Databricks, Snowflake, Unity Catalog, Iceberg, DuckDB
ETL: Spark, dbt, Glue, Kafka, AWS DMS (CDC)
Languages: Python, SQL, TypeScript, Scala, Shell, Solidity, Cython
Orchestration: Airflow, ADF, Step Functions, DataWorks, Oozie
IaC: Terraform, AWS CDK, CDKTF, Ansible
CI/CD: GitHub Actions, GitLab CI/CD, Jenkins
Quality: PyDeequ, Great Expectations

Certifications

Education

MSc, Communication & Information Systems — Xidian University, China. IDW evaluation: equivalent to MSc Computing Science (NL).

How to work with me

Three ways, in increasing order of commitment: ask my agent some questions, send me an email, or fill in the contact form with a project brief.