/cv
Taking the pain out of your data — at any scale.
8+ years across 500TB–1PB datalakes on AWS, Azure, GCP, and Alibaba Cloud — long enough to have made, and fixed, most of the mistakes you're about to.
What I sell is certainty: the kind earned from idempotent ETL, platform-level data quality, and the on-fire calls at 2am.
Project work or longer engagements — both fine. What I won't take is work I can't personally see through.
Track record
- 2026 — present
Platform Architect (Azure + Alibaba Cloud) —
Enyquant
Sole hands-on architect for a dual-region Lakehouse (Azure EU + Alibaba Cloud China) at an energy-trading, AI-first startup. 100% IaC (CDKTF + Terraform), event-driven serverless pipelines, multi-team IAM. Built an AI-augmented engineering practice (~3× delivery throughput).
- 2024 — 2025
Lead Data Engineer (AWS Datalake) —
PVH Corp · 2nd engagement
500+TB AWS data lake, 90+ sources, 1000+ datasets. Led ETL, platform architecture, CI/CD, data quality. Real-time GDPR (de)anonymization service: latency 2h → real-time, cost 10× lower.
- 2022 — 2024
Data Engineer & Infra Admin —
VodafoneZiggo
1PB+ datalake. Snowflake migration from the legacy Oracle DWH, CDC ingestion (DMS), AWS IaC (CDK + Terraform). Introduced CI/CD improvements that saved the team 60+ hours of manual work.
- 2020 — 2022
Lead Data Engineer (AWS Datalake) —
PVH Corp · 1st engagement
Migrated data lake from Hadoop to AWS. Designed external integrations (Adobe, Salesforce, SAP). Built self-service analytics: TTM 2 weeks → 10 minutes.
- 2018 — 2020
Data DevOps Engineer —
FedEx Digital
ETL pipelines on AWS & GCP, productized data-science models, Kinesis-based streaming.
- 2016 — 2018
Data Engineer / Hadoop Admin —
ABN AMRO
Hadoop administration, Hive/Spark ETL.
- 2016 — 2018
Data Engineer / Hadoop Admin —
KPN
Hadoop administration, Hive/Spark ETL, automation with Ansible & Jenkins.
Selected work
The cases below are public; named references are available on request.
-
2026 — present
Enyquant
— Platform Architect & end-to-end Data Engineer — raw → modeled (Azure + Alibaba Cloud) — FTE
Sole hands-on architect for a dual-region Lakehouse (Azure EU + Alibaba Cloud China) at an energy-trading, AI-first startup. Full ownership of platform, pipelines, IaC, CI/CD, IAM, and cross-region architecture.
- Medallion-architecture Lakehouse on ADLS Gen2 + Databricks Unity Catalog with dev/prod parity
- 100% IaC (CDKTF + Terraform); zero click-ops drift
- Event-driven serverless pipelines for energy-market data across both regions
- Multi-team IAM, cost allocation, and secure cross-region data access models
- Reusable multi-cloud architecture layer for consistent EU ↔ China deployment
- Designed for AWS portability (Lambda / Step Functions / API Gateway / S3 equivalents)
- Built an AI-assisted (Harness Engineering) delivery practice → ~3× individual throughput
Azure (ADLS Gen2, Databricks, ADF, Functions, Key Vault, Entra ID) · Alibaba Cloud (OSS, Function Compute, DataWorks, EMR Spark) · Terraform + CDKTF (TypeScript) · GitHub Actions + OIDC · Python, SQL
-
2024 — 2025 (returned engagement) & 2020 — 2022
PVH Corp (Tommy Hilfiger, Calvin Klein) — Lead Data Engineer — Freelance
Lead engineer for a 500+TB AWS data lake with 90+ sources and 1000+ datasets. Two engagements covering the Hadoop → AWS migration and a later platform-modernisation phase.
- Migrated the data lake from Hadoop to AWS
- Designed and shipped external integrations: Adobe, Salesforce, SAP, and others
- Refactored the ETL layer to be idempotent and config-driven
- Fixed long-standing timezone issues across data and scheduling
- Self-service analytics platform with 60+ dashboards used by CRM & C-suite — TTM 2 weeks → 10 minutes
- Real-time GDPR (de)anonymization service: latency 2h → real-time, cost 10× lower
- Migrated workloads to Azure Databricks; integrated GCP BigQuery & Google Analytics sources
- Acted as senior platform engineer advising on DataOps, IaC, and production readiness
AWS (S3, Glue, EMR, ECS, Lambda, API Gateway, Athena, Step Functions) · Spark, Kafka, dbt, Airflow · Terraform, AWS CDK · GitLab CI/CD · PyDeequ for data quality · Azure Databricks, GCP BigQuery (cross-cloud sources)
-
2022 — 2024
VodafoneZiggo
— Data Engineer & Infra Admin — Freelance
Migrated legacy DWH workloads from Oracle to Snowflake on a 1PB+ enterprise datalake. Owned CDC ingestion, IaC, and the CI/CD foundation that the broader team builds on.
- Supported the Snowflake migration from the legacy Oracle DWH
- Designed and maintained CDC-based ingestion pipelines (AWS DMS)
- Managed AWS infrastructure as code (CDK + Terraform)
- Introduced CI/CD improvements that saved the broader team 60+ hours of manual work
- Delivered ETL pipelines and an internal data-engineering framework on a 1PB+ datalake
- Supported data scientists and analytics teams with reliable datasets
Snowflake · AWS · Terraform, AWS CDK (TypeScript) · Python, SQL, Spark · AWS DMS for CDC · GitLab CI/CD
Tech stack
- Cloud
- AWS (deepest), Azure, Alibaba Cloud, GCP
- Lakehouse
- Databricks, Snowflake, Unity Catalog, Iceberg, DuckDB
- ETL
- Spark, dbt, Glue, Kafka, AWS DMS (CDC)
- Languages
- Python, SQL, TypeScript, Scala, Shell, Solidity, Cython
- Orchestration
- Airflow, ADF, Step Functions, DataWorks, Oozie
- IaC
- Terraform, AWS CDK, CDKTF, Ansible
- CI/CD
- GitHub Actions, GitLab CI/CD, Jenkins
- Quality
- PyDeequ, Great Expectations
Certifications
- AWS Certified Solutions Architect — Associate
- Databricks Certified Associate Developer for Apache Spark
- Databricks Certified Data Engineer Associate
- Certified Associate in Python Programming
Education
MSc, Communication & Information Systems — Xidian University, China. IDW evaluation: equivalent to MSc Computing Science (NL).