Your First Mission

Your first week as a Data Engineer

You just joined the data platform team at a retail company. On day two, a raw orders table lands in the warehouse, freshly dumped from the app's database. Your lead drops by:

"Before this feeds the analytics dashboards, make sure it's sane. How many rows? Any junk in there? Don't let bad data flow downstream."

The tool you reach for first is SQL, and by the end of this 20-minute intro you will have profiled that table yourself, in your browser, with zero setup.

Your mission: sanity-check the raw orders table before it moves downstream. See its shape, find the bad rows, and profile it. You'll run real queries every step of the way.

A database is just tables

A data warehouse is a set of tables. A table is like a spreadsheet with a strict shape: every row has the same columns, the columns are the fields, and the rows are the records. Each row in orders is one order from the app. Here are three of them:

id	customer	segment	country	channel	amount	status
1	Acme Corp	enterprise	USA	web	4200	completed
5	CloudFirst	mid-market	USA	store	1980	cancelled
8	StartupHub	smb	Canada	web	640	cancelled

The status column (completed, cancelled) is exactly the kind of field an engineer profiles: how many of each? Any unexpected values? That's a data-quality check.
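
To make that check concrete, here is a minimal sketch of a status profile. It assumes a stand-in for the warehouse built with Python's built-in sqlite3 module and loaded with only the three sample rows shown above. The column types and the idea that completed and cancelled are the only expected values are assumptions for illustration, not facts about the real table. The SQL inside the strings is the part that matters.

```python
import sqlite3

# In-memory stand-in for the warehouse (assumption: the real table lives elsewhere).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer TEXT, segment TEXT, country TEXT,
        channel TEXT, amount INTEGER, status TEXT  -- column types are assumed
    )
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "Acme Corp", "enterprise", "USA", "web", 4200, "completed"),
        (5, "CloudFirst", "mid-market", "USA", "store", 1980, "cancelled"),
        (8, "StartupHub", "smb", "Canada", "web", 640, "cancelled"),
    ],
)

# Profile the status column: one result row per distinct value, with its count.
status_counts = conn.execute("""
    SELECT status, COUNT(*) AS n
    FROM orders
    GROUP BY status
    ORDER BY n DESC, status
""").fetchall()

# Assumed set of valid values; anything else would count as unexpected.
EXPECTED_STATUSES = {"completed", "cancelled"}
unexpected = [s for s, _ in status_counts if s not in EXPECTED_STATUSES]

print(status_counts)  # [('cancelled', 2), ('completed', 1)]
print(unexpected)     # []
```

On the three sample rows nothing unexpected turns up. On a real table, a misspelled or NULL status would appear as its own group in the counts.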

Why SQL is the engineer's first tool

Before you build pipelines in Python or dbt, you inspect the source in SQL. It's the fastest way to understand a table's shape, catch bad data, and decide what the pipeline must clean. That's what the rest of this intro builds toward.
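
As a rough preview of what inspecting the source can look like, the sketch below counts rows, lists columns, and flags suspicious rows. It again uses Python's built-in sqlite3 as a stand-in warehouse. The row with id 999 is hypothetical, invented only so the check has something to catch, and the rules for what counts as a bad row (missing customer, non-positive amount, unknown status) are assumptions rather than the team's actual definition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, customer TEXT, segment TEXT, "
    "country TEXT, channel TEXT, amount INTEGER, status TEXT)"
)
rows = [
    (1, "Acme Corp", "enterprise", "USA", "web", 4200, "completed"),
    (5, "CloudFirst", "mid-market", "USA", "store", 1980, "cancelled"),
    (8, "StartupHub", "smb", "Canada", "web", 640, "cancelled"),
    # HYPOTHETICAL junk row, invented for illustration; not from the real table.
    (999, None, "smb", "USA", "web", -50, "complted"),
]
conn.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Shape: how many rows landed?
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

# Shape: which columns exist? PRAGMA table_info is SQLite-specific;
# other warehouses expose this through their own catalog views.
column_names = [r[1] for r in conn.execute("PRAGMA table_info(orders)")]

# Bad rows, under the assumed rules described above.
bad_ids = [r[0] for r in conn.execute("""
    SELECT id
    FROM orders
    WHERE customer IS NULL
       OR amount IS NULL
       OR amount <= 0
       OR status IS NULL
       OR status NOT IN ('completed', 'cancelled')
    ORDER BY id
""")]

print(row_count)     # 4
print(column_names)  # ['id', 'customer', 'segment', 'country', 'channel', 'amount', 'status']
print(bad_ids)       # [999]
```

Each condition in the WHERE clause is a rule the pipeline could later enforce or clean up.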

Next: let's open the table and look inside.