Your First Mission
Your first week as a Data Engineer
You just joined the data platform team at a retail company. On day two, a raw orders table lands in the warehouse, freshly dumped from the app's database. Your lead drops by:
"Before this feeds the analytics dashboards, make sure it's sane. How many rows? Any junk in there? Don't let bad data flow downstream."
The tool you reach for first is SQL, and by the end of this 20-minute intro you will have profiled that table yourself, in your browser, with zero setup.
Your mission: sanity-check the raw orders table before it moves downstream. See its shape, find the bad rows, and profile it. You'll run real queries every step of the way.
A database is just tables
A data warehouse is a set of tables. A table works like a spreadsheet with a strict shape: every row has the same columns, the columns are the fields, and the rows are the records. Each row in orders is one order from the app. Here are three of them:
| id | customer | segment | country | channel | amount | status |
|---|---|---|---|---|---|---|
| 1 | Acme Corp | enterprise | USA | web | 4200 | completed |
| 5 | CloudFirst | mid-market | USA | store | 1980 | cancelled |
| 8 | StartupHub | smb | Canada | web | 640 | cancelled |
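This sketch shows columns as fields and rows as records in runnable form. It loads the three sample rows into an in-memory SQLite database using Python's built-in sqlite3 module. SQLite stands in for the warehouse. The column types (INTEGER, TEXT) are assumptions, since the lesson only shows column names. The SQL strings are the part to focus on.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")

# Column names come from the lesson; the types are assumed.
conn.execute(
    """
    CREATE TABLE orders (
        id       INTEGER,
        customer TEXT,
        segment  TEXT,
        country  TEXT,
        channel  TEXT,
        amount   INTEGER,
        status   TEXT
    )
    """
)

# The three sample rows shown in the table above.
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Acme Corp", "enterprise", "USA", "web", 4200, "completed"),
        (5, "CloudFirst", "mid-market", "USA", "store", 1980, "cancelled"),
        (8, "StartupHub", "smb", "Canada", "web", 640, "cancelled"),
    ],
)

cursor = conn.execute("SELECT * FROM orders ORDER BY id")
columns = [d[0] for d in cursor.description]  # the fields
rows = cursor.fetchall()                       # the records

print(columns)  # ['id', 'customer', 'segment', 'country', 'channel', 'amount', 'status']
print(len(rows), "rows")  # 3 rows
```

`cursor.description` holds one entry per column, and the sketch reads the column names from it.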
The status column (completed, cancelled) is exactly the kind of field an engineer profiles. How many rows hold each value? Are there any values that shouldn't be there? Answering questions like these is a data-quality check.
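The two profiling questions map directly onto two queries. This sketch assumes an in-memory SQLite copy of just the id and status of the three sample rows. It counts each status with GROUP BY, then lists rows whose status is missing or falls outside an allowed set. That allowed set ('completed', 'cancelled') is an assumption taken from the sample; a real orders table may have other legitimate statuses.

```python
import sqlite3

# In-memory copy of the sample rows; only id and status matter here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "completed"), (5, "cancelled"), (8, "cancelled")],
)

# "How many of each?": one result row per distinct status, with its count.
status_counts = conn.execute(
    """
    SELECT status, COUNT(*) AS n
    FROM orders
    GROUP BY status
    ORDER BY n DESC
    """
).fetchall()
print(status_counts)  # [('cancelled', 2), ('completed', 1)]

# "Any unexpected values?": rows whose status is NULL or not in the
# allowed set. The allowed set is an assumption; confirm it with your team.
unexpected = conn.execute(
    """
    SELECT id, status
    FROM orders
    WHERE status IS NULL
       OR status NOT IN ('completed', 'cancelled')
    """
).fetchall()
print(unexpected)  # [] for these three sample rows
```

The separate `IS NULL` test matters because `NOT IN` evaluates to NULL rather than true for a NULL status. Without it, a missing value would slip past the filter.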
Why SQL is the engineer's first tool
Before you build pipelines in Python or dbt, you inspect the source in SQL. It's usually the quickest way to understand a table's shape, catch bad data, and decide what the pipeline must clean. That's what the rest of this intro builds toward.
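A first-pass profile often fits in a single aggregate query. This sketch uses an in-memory SQLite copy of a subset of the sample columns. It reports the row count alongside a few checks: missing or duplicate ids, missing or negative amounts, and the amount range. These rules are hypothetical examples of what "bad data" might mean for orders, not a fixed standard.

```python
import sqlite3

# In-memory copy of a subset of the sample columns (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, customer TEXT, amount INTEGER, status TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "Acme Corp", 4200, "completed"),
        (5, "CloudFirst", 1980, "cancelled"),
        (8, "StartupHub", 640, "cancelled"),
    ],
)

# One-row profile: table shape plus a few hypothetical quality rules.
cursor = conn.execute(
    """
    SELECT
        COUNT(*)                                        AS total_rows,
        COUNT(*) - COUNT(id)                            AS missing_ids,
        COUNT(id) - COUNT(DISTINCT id)                  AS duplicate_ids,
        SUM(CASE WHEN amount IS NULL THEN 1 ELSE 0 END) AS missing_amounts,
        SUM(CASE WHEN amount < 0 THEN 1 ELSE 0 END)     AS negative_amounts,
        MIN(amount)                                     AS min_amount,
        MAX(amount)                                     AS max_amount
    FROM orders
    """
)
names = [d[0] for d in cursor.description]
profile = dict(zip(names, cursor.fetchone()))
print(profile)
```

`COUNT(id)` skips NULLs. So `COUNT(*) - COUNT(id)` counts rows with no id, and `COUNT(id) - COUNT(DISTINCT id)` counts extra rows that repeat an id already seen. Nonzero values in either column are rows the pipeline would need to handle.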
Next: let's open the table and look inside.
