Created at: 2021-10-21 17:32:50
Language: Jupyter Notebook
Data Engineering Zoomcamp
Taking the course
2023 Cohort
Self-paced mode
All the materials of the course are freely available, so you
can take the course at your own pace:
- Follow the suggested syllabus (see below) week by week
- You don't need to fill in the registration form. Just start watching the videos and join Slack
- Check the FAQ if you run into problems
- If you can't find a solution to your problem in the FAQ, ask for help in Slack
2022 Cohort
Asking for help in Slack
The best way to get support is to use DataTalks.Club's Slack. Join the #course-data-engineering
channel.
To make discussions in Slack more organized:
Syllabus
Note: NYC TLC changed the format of the data we use to Parquet, but you can still access
the CSV files here.
- Course overview
- Introduction to GCP
- Docker and docker-compose
- Running Postgres locally with Docker
- Setting up infrastructure on GCP with Terraform
- Preparing the environment for the course
- Homework
More details
- Data Lake
- Workflow orchestration
- Setting up Airflow locally
- Ingesting data to GCP with Airflow
- Ingesting data to local Postgres with Airflow
- Moving data from AWS to GCP (Transfer service)
- Homework
More details
- Data Warehouse
- BigQuery
- Partitioning and clustering
- BigQuery best practices
- Internals of BigQuery
- Integrating BigQuery with Airflow
- BigQuery Machine Learning
More details
- Basics of analytics engineering
- dbt (data build tool)
- BigQuery and dbt
- Postgres and dbt
- dbt models
- Testing and documenting
- Deployment to the cloud and locally
- Visualizing the data with Google Data Studio and Metabase
More details
- Batch processing
- What is Spark
- Spark DataFrames
- Spark SQL
- Internals: GroupBy and joins
More details
- Introduction to Kafka
- Schemas (Avro)
- Kafka Streams
- Kafka Connect and KSQL
More details
Putting everything we learned into practice
- Week 7 and 8: working on your project
- Week 9: reviewing your peers
More details
Overview
Architecture diagram

Technologies
- Google Cloud Platform (GCP): Cloud-based auto-scaling platform by Google
- Google Cloud Storage (GCS): Data Lake
- BigQuery: Data Warehouse
- Terraform: Infrastructure-as-Code (IaC)
- Docker: Containerization
- SQL: Data Analysis & Exploration
- Airflow: Pipeline Orchestration
- dbt: Data Transformation
- Spark: Distributed Processing
- Kafka: Streaming
Prerequisites
To get the most out of this course, you should feel comfortable with coding and the command line
and know the basics of SQL. Prior experience with Python will be helpful, but you can pick up
Python relatively quickly if you have experience with other programming languages.
Prior experience with data engineering is not required.
Instructors
Tools
For this course, you'll need to have the following software installed on your computer:
- Docker and Docker-Compose
- Python 3 (e.g. via Anaconda)
- Google Cloud SDK
- Terraform
See Week 1 for more details about installing these tools.
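A quick way to confirm that the tools above are available before starting Week 1 is to look each one up on your `PATH`. The sketch below is a hypothetical helper, not part of the course materials, and it assumes the usual executable names (`docker`, `docker-compose`, `python3` or `python`, `gcloud`, `terraform`), which may differ on your system. Newer Docker installations ship Compose as the `docker compose` subcommand instead of a separate `docker-compose` binary, so this PATH-only check can report Compose as missing even when it works.

```python
from __future__ import annotations

import shutil

# Assumed executable names for each tool listed above (hypothetical mapping;
# adjust for your OS). A tool counts as installed if any candidate is on PATH.
REQUIRED_TOOLS: dict[str, tuple[str, ...]] = {
    "Docker": ("docker",),
    "Docker-Compose": ("docker-compose",),  # Compose v2 uses `docker compose` instead
    "Python 3": ("python3", "python"),
    "Google Cloud SDK": ("gcloud",),
    "Terraform": ("terraform",),
}


def check_tools(tools: dict[str, tuple[str, ...]] = REQUIRED_TOOLS) -> dict[str, str | None]:
    """Map each tool to the first matching executable path on PATH, or None if none is found."""
    found: dict[str, str | None] = {}
    for tool, names in tools.items():
        path = None
        for name in names:
            path = shutil.which(name)  # returns None when the executable is not on PATH
            if path is not None:
                break
        found[tool] = path
    return found


if __name__ == "__main__":
    for tool, path in check_tools().items():
        print(f"{tool:<20} {path or 'NOT FOUND'}")
```

Running it prints one line per tool with either the resolved path or `NOT FOUND`, which tells you which installation steps from Week 1 you still need.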
FAQ
- Q: I registered, but haven't received a confirmation email. Is that normal?
  A: Yes, it's normal. Confirmation emails aren't automated, but you will receive one eventually.
- Q: At what time of day will it happen?
  A: Office hours will happen on Mondays at 17:00 CET. Everything will be recorded, so you can watch it whenever it's convenient for you.
- Q: Will there be a certificate?
  A: Yes, if you complete the project.
- Q: I'm not 100% sure I'll be able to attend. Can I still sign up?
  A: Yes, please do! You'll receive all the updates and can then watch the course at your own pace.
- Q: Do you plan to run an ML engineering course as well?
  A: Glad you asked. We do :)
- Q: I'm stuck! I've got a technical question!
  A: Ask on Slack! And check out the student FAQ; many common issues have already been answered. If you solve your issue, please add your solution to the document. Thanks!
Supporters and partners
Do you want to support our course and our community? Please reach out to alexey@datatalks.club
Big thanks to other communities for helping us spread the word about the course: