Dewei Zhai


Taking the pain out of your data — at any scale.

8+ years across 500TB–1PB datalakes on AWS, Azure, GCP, and Alibaba Cloud — long enough to have made, and fixed, most of the mistakes you're about to.

What I sell is certainty: the kind earned from idempotent ETL, platform-level data quality, and the on-fire calls at 2am.

Project work or longer engagements — both fine. What I won't take is work I can't personally see through.

Track record

  1. 2026 — now

    Platform Architect (Azure + Alibaba Cloud) Enyquant

    Sole hands-on architect for a dual-region Lakehouse (Azure EU + Alibaba Cloud China) at an energy-trading, AI-first startup. 100% IaC (CDKTF + Terraform), event-driven serverless pipelines, multi-team IAM. Built an AI-augmented engineering practice (~3× delivery throughput).

  2. 2024 — 2025

    Lead Data Engineer (AWS Datalake) PVH Corp · 2nd engagement

    500+TB AWS data lake, 90+ sources, 1000+ datasets. Led ETL, platform architecture, CI/CD, data quality. Real-time GDPR (de)anonymization service: latency 2h → real-time, cost 10× lower.

  3. 2022 — 2024

    Data Engineer & Infra Admin VodafoneZiggo

    1PB+ datalake. Migrated the Oracle DWH to Snowflake, built CDC ingestion (DMS), and managed AWS IaC (CDK + Terraform). Introduced a new CI/CD setup that saved the team 60+ hours.

  4. 2020 — 2022

    Lead Data Engineer (AWS Datalake) PVH Corp · 1st engagement

    Migrated the data lake from Hadoop to AWS. Designed external integrations (Adobe, Salesforce, SAP). Built self-service analytics: time-to-market 2 weeks → 10 minutes.

  5. 2018 — 2020

    Data DevOps Engineer FedEx Digital

    ETL pipelines on AWS & GCP, productized data-science models, Kinesis-based streaming.

  6. 2016 — 2018

    Data Engineer / Hadoop Admin ABN AMRO

    Hadoop administration, Hive/Spark ETL.

  7. 2016 — 2018

    Data Engineer / Hadoop Admin KPN

    Hadoop administration, Hive/Spark ETL, automation with Ansible & Jenkins.

Selected work

The cases below are public; named references are available on request.

Tech stack

Cloud: AWS (deepest), Azure, Alibaba Cloud, GCP
Lakehouse: Databricks, Snowflake, Unity Catalog, Iceberg, DuckDB
ETL: Spark, dbt, Glue, Kafka, AWS DMS (CDC)
Languages: Python, SQL, TypeScript, Scala, Shell, Solidity, Cython
Orchestration: Airflow, ADF, Step Functions, DataWorks, Oozie
IaC: Terraform, AWS CDK, CDKTF, Ansible
CI/CD: GitHub Actions, GitLab CI/CD, Jenkins
Quality: PyDeequ, Great Expectations

Certifications

Education

MSc, Communication & Information Systems — Xidian University, China. IDW evaluation: equivalent to MSc Computing Science (NL).