Data portfolio project ideas that get you hired (by role)
Data portfolio project ideas for every role: Data Analyst, Data Scientist, Data Engineer, Business Analyst. Real datasets, what to actually show, and how to build each one.

A certificate says you watched a course. A portfolio says you can do the work, and recruiters can tell the difference. The problem is rarely motivation. It is the blank page: what should you actually build? Here are project ideas for every data role, with real datasets, what to show, and how each one maps to a guided build you can finish this week.
What turns a tutorial into a portfolio piece
Most "projects" online are tutorials in disguise: you follow steps, you get the expected output, you learn nothing a recruiter can see. A portfolio project differs in three ways.
Start from a question
Not "use pandas" but "which customers are about to leave, and why?". The question is what a stakeholder cares about, and what your write-up answers.
Use real, messy data
Public datasets with missing values, weird encodings, and duplicates. Cleaning them is half the job, and the half that proves you can work with reality.
Ship something openable
A dashboard link, a notebook that reads like a story, a deployed app, a one-page recommendation. If a stranger cannot open it in 30 seconds, it does not count.
Iris, Titanic, and the MNIST digits are how you learn a library, not how you stand out. Every reviewer has seen them. Pick a dataset tied to a real business: e-commerce orders, web traffic, churn, sales over time. The skills are identical; the signal is not.
The rest of this guide is organized by role. Not sure which one fits you? Read "DA, DS, DE, BA: what's the actual difference?" or take the 2-minute path quiz first.
Data Analyst portfolio projects
A Data Analyst portfolio has to prove one thing: you can take a business question, answer it with SQL, and present the answer so a non-technical person acts on it. SQL plus one dashboard plus a written insight beats any number of Kaggle notebooks.
Answer real questions with SQL
Load a real e-commerce dataset (the Olist Brazilian orders set is public and messy in the right ways). Write five queries that find the top revenue drivers, the repeat-purchase rate, and where orders stall. Show each query with its business answer in one sentence. On D8A, this is the SQL Foundations & E-Commerce Analysis project.
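As an illustration of "a query plus its one-sentence business answer", here is a minimal sketch of a repeat-purchase-rate query run through Python's built-in sqlite3 module. The `orders` table, its columns, and the six rows are hypothetical stand-ins, not the real Olist schema, which spreads this information across several files; check the actual schema for which column identifies a returning customer.

```python
import sqlite3

# Hypothetical, simplified orders table (order_id, customer_id, amount).
# These rows are made up for illustration; they are not Olist data.
ORDERS = [
    ("o1", "c1", 120.0),
    ("o2", "c1", 35.5),
    ("o3", "c2", 80.0),
    ("o4", "c3", 15.0),
    ("o5", "c3", 60.0),
    ("o6", "c4", 42.0),
]

REPEAT_RATE_SQL = """
WITH per_customer AS (
    SELECT customer_id, COUNT(*) AS n_orders
    FROM orders
    GROUP BY customer_id
)
SELECT
    COUNT(*) AS customers,
    SUM(CASE WHEN n_orders > 1 THEN 1 ELSE 0 END) AS repeat_customers,
    1.0 * SUM(CASE WHEN n_orders > 1 THEN 1 ELSE 0 END) / COUNT(*) AS repeat_rate
FROM per_customer
"""


def repeat_purchase_rate(rows):
    """Load rows into an in-memory database and return
    (customers, repeat_customers, repeat_rate)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE orders (order_id TEXT PRIMARY KEY, customer_id TEXT, amount REAL)"
        )
        conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
        return conn.execute(REPEAT_RATE_SQL).fetchone()
    finally:
        conn.close()


customers, repeat_customers, repeat_rate = repeat_purchase_rate(ORDERS)
# The one-sentence business answer that accompanies the query:
print(f"{repeat_customers} of {customers} customers ordered more than once ({repeat_rate:.0%}).")
```

In a portfolio write-up, the printed sentence matters as much as the SQL: it is the part a non-technical reader will act on.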
Turn web traffic into a funnel story
Use the GA4 sample dataset in BigQuery (free, public). Trace sessions to add-to-cart to purchase, find the biggest drop-off, and write the one change you would test. Show the funnel and the recommendation, not just the SQL. On D8A, this is the BigQuery & GA4 Web Analytics project.
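Finding "the biggest drop-off" comes down to comparing stage-to-stage conversion rates. The sketch below assumes you have already counted sessions reaching each stage; the counts are invented for illustration, and in practice they would come from a query over the GA4 events export.

```python
# Hypothetical stage counts, ordered from top to bottom of the funnel.
# The numbers are invented; the stage names follow common GA4 event names.
FUNNEL = [
    ("session_start", 10_000),
    ("view_item", 6_200),
    ("add_to_cart", 1_500),
    ("begin_checkout", 900),
    ("purchase", 450),
]


def step_conversions(funnel):
    """Return (from_stage, to_stage, conversion_rate) for each adjacent pair."""
    steps = []
    for (prev_name, prev_count), (name, count) in zip(funnel, funnel[1:]):
        rate = count / prev_count if prev_count else 0.0
        steps.append((prev_name, name, rate))
    return steps


def biggest_drop_off(funnel):
    """Return the step with the lowest stage-to-stage conversion rate."""
    return min(step_conversions(funnel), key=lambda step: step[2])


for src, dst, rate in step_conversions(FUNNEL):
    print(f"{src:>15} -> {dst:<15} {rate:6.1%}")

worst_src, worst_dst, worst_rate = biggest_drop_off(FUNNEL)
print(f"Biggest drop-off: {worst_src} -> {worst_dst} (only {worst_rate:.1%} continue)")
```

With these made-up counts the weakest step is view_item to add_to_cart, which is where the "one change you would test" should aim.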
Ship a dashboard someone would use
Build a one-page dashboard with three KPIs and one filter, then record a 60-second walkthrough as if briefing a stakeholder. The walkthrough is the part most people skip and the part that gets you hired. On D8A, this is covered by the Looker Studio Dashboard and Power BI Analytics Dashboard projects.
Clean a messy real-world dataset
Take Inside Airbnb listings for your own city, clean them with pandas, and surface three findings a host could act on. The deliverable is a notebook that reads like a story, with the code supporting the narrative rather than the other way around.
The D8A Data Analyst path is exactly this sequence, with the data, an in-browser SQL playground, and a structural review built in: SQL Foundations, Python for Data Analysis, BigQuery & GA4, Looker Studio and Power BI dashboards, up to an end-to-end BI capstone. The first projects are free. Start the free Data Analyst intro →
Data Scientist portfolio projects
A Data Scientist portfolio has to show that you can frame an open-ended problem, model it, and be honest about where the model fails. Accuracy alone fools no one who has hired before: show the evaluation and the trade-offs.
Predict who will churn
Train a classifier on the Telco customer churn dataset. Evaluate it honestly with precision and recall (not just accuracy), then explain the top drivers in plain language. Show the confusion matrix and what you would tell the business. On D8A, this is the Supervised Learning project.
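To see why accuracy alone can mislead on churn, here is a small sketch that computes the confusion matrix, precision, and recall by hand. The labels are hypothetical (1 = churned, 0 = stayed) and are not drawn from the Telco dataset; a real project would more likely use a library such as scikit-learn for this.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classifier."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def metrics(y_true, y_pred):
    """Accuracy, precision, and recall, with 0.0 when a denominator is empty."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels: 3 of 10 customers actually churned.
Y_TRUE = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
Y_PRED = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(Y_TRUE, Y_PRED)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
for name, value in metrics(Y_TRUE, Y_PRED).items():
    print(f"{name:>9}: {value:.2f}")
```

In this toy case the model is right 70% of the time yet catches only one of the three churners, which is the kind of trade-off the write-up should state in plain language.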
Forecast next quarter
Take a sales time series (the Favorita store sales set works well), build a naive baseline first, then a model, and backtest both. Show forecast versus actual with the error quantified. The baseline is what proves you understand the problem. On D8A, this is the Time Series Forecasting project.
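The "baseline first, then backtest both" idea can be sketched with a rolling-origin backtest: at each step, forecast the next value from the history so far and record the absolute error. The series below is invented, and the 4-period moving average is only a stand-in for "a model", not a recommendation.

```python
def naive_forecast(history):
    """Baseline: tomorrow looks like today."""
    return history[-1]


def moving_average_forecast(history, window=4):
    """Stand-in 'model': mean of the last `window` observations."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def backtest_mae(series, forecaster, min_history):
    """Rolling-origin backtest: one-step-ahead forecasts, mean absolute error."""
    errors = []
    for t in range(min_history, len(series)):
        prediction = forecaster(series[:t])  # only data before t is visible
        errors.append(abs(series[t] - prediction))
    return sum(errors) / len(errors)


# Hypothetical sales figures with a repeating 4-period pattern.
SALES = [100, 120, 90, 110, 105, 125, 95, 115, 110, 130, 100, 120]

naive_mae = backtest_mae(SALES, naive_forecast, min_history=4)
model_mae = backtest_mae(SALES, moving_average_forecast, min_history=4)
print(f"naive baseline MAE: {naive_mae:.2f}")
print(f"moving average MAE: {model_mae:.2f}")
```

Reporting both numbers side by side is the point: a model only earns its complexity if it beats the baseline on data it has not seen.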
Classify free text
Classify product reviews or support tickets. Start with a TF-IDF baseline, then a transformer, and compare them fairly. Show where the model fails and why, not only where it wins. On D8A, this is the NLP & Text Classification project.
Build a Q&A app over your own docs
Build a retrieval-augmented app over a set of PDFs you know well: embeddings, retrieval, and an LLM answering with sources. Show a working demo and honest notes on hallucinations and how you reduced them. On D8A, this is the LLM & RAG Application project.
The D8A Data Scientist path runs from Python & Statistics and Supervised Learning (both free) through Feature Engineering, Time Series, NLP, Deep Learning, MLOps, and a RAG application, up to an end-to-end ML capstone. Start the free Data Scientist intro →
Data Engineer portfolio projects
A Data Engineer portfolio is about reliability, not insight. The signal a reviewer looks for: can you move data from A to B on a schedule, validate it, and recover when it breaks? Show the code, the tests, and a run log.
Pull from an API on a schedule
Extract from a public API (GitHub, weather, or a transit feed), validate the payload, and load it to a database, with logging, retries, and error handling. Show the code and a real run log, including a failure you handled. On D8A, this is the Python ETL Pipeline project.
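A minimal sketch of the extract, validate, load loop with retries and logging, using only the standard library. Everything specific here is an assumption made for illustration: the `station_id` and `temperature_c` fields, the `readings` table, and the `make_flaky_fetch` helper, which simulates one failed API call so the retry path is exercised without a network.

```python
import logging
import sqlite3
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("etl")

# Hypothetical payload schema: field name -> accepted type(s).
REQUIRED_FIELDS = {"station_id": str, "temperature_c": (int, float)}


def fetch_with_retries(fetch, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call fetch(); on failure, log, back off exponentially, and retry."""
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))


def validate(records):
    """Split records into (valid, rejected) according to REQUIRED_FIELDS."""
    valid, rejected = [], []
    for record in records:
        ok = all(isinstance(record.get(name), kind) for name, kind in REQUIRED_FIELDS.items())
        (valid if ok else rejected).append(record)
    return valid, rejected


def load(conn, records):
    """Insert validated records into the (hypothetical) readings table."""
    conn.execute("CREATE TABLE IF NOT EXISTS readings (station_id TEXT, temperature_c REAL)")
    conn.executemany("INSERT INTO readings VALUES (:station_id, :temperature_c)", records)
    conn.commit()


def make_flaky_fetch():
    """Fake API client: fails on the first call, succeeds afterwards."""
    calls = {"n": 0}

    def fetch():
        calls["n"] += 1
        if calls["n"] == 1:
            raise ConnectionError("simulated timeout")
        return [
            {"station_id": "A1", "temperature_c": 21.5},
            {"station_id": "B2", "temperature_c": None},  # fails validation
            {"station_id": "C3", "temperature_c": 18},
        ]

    return fetch


conn = sqlite3.connect(":memory:")
# sleep is stubbed out so the example runs instantly.
payload = fetch_with_retries(make_flaky_fetch(), sleep=lambda seconds: None)
valid, rejected = validate(payload)
load(conn, valid)
log.info("loaded=%d rejected=%d", len(valid), len(rejected))
loaded_rows = conn.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```

The log lines this produces (one warning for the failed attempt, one summary with loaded and rejected counts) are the kind of run log worth including in the project.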
Model a warehouse with dbt
Take a raw public dataset and build staging and mart models in dbt, with tests and documentation. Show the lineage graph and passing tests. The tests are the proof you build for production, not just for a demo. On D8A, this is the dbt Data Transformation project.
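dbt tests are declared in the project's YAML and run by dbt itself, so the following is not dbt code. It is only a plain-Python sketch of what two common column checks, not-null and unique, assert about a table, using hypothetical rows, to make concrete what "passing tests" demonstrates.

```python
def not_null_failures(rows, column):
    """Rows where the column is missing or None. An empty list means the check passes."""
    return [row for row in rows if row.get(column) is None]


def unique_failures(rows, column):
    """Values that appear more than once. None is left to the not-null check."""
    seen, duplicates = set(), set()
    for row in rows:
        value = row.get(column)
        if value is None:
            continue
        if value in seen:
            duplicates.add(value)
        seen.add(value)
    return sorted(duplicates)


# Hypothetical staging rows with one duplicate key and one missing key.
STG_ORDERS = [
    {"order_id": "o1", "status": "delivered"},
    {"order_id": "o2", "status": "shipped"},
    {"order_id": "o2", "status": "shipped"},
    {"order_id": None, "status": "canceled"},
]

null_rows = not_null_failures(STG_ORDERS, "order_id")
duplicate_keys = unique_failures(STG_ORDERS, "order_id")
print(f"not_null failures: {len(null_rows)}, duplicate keys: {duplicate_keys}")
```

A reviewer reads a failing check like this as a data problem caught before it reached a dashboard, which is the production habit the tests are meant to prove.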
Orchestrate it with Airflow
Turn the pipeline above into a scheduled Airflow DAG with dependencies and alerting, and run a backfill. Show the DAG graph and the backfill run. On D8A, this is the Airflow Pipeline Orchestration project.
Process a stream of events
Simulate a stream of events into Kafka, then consume and aggregate them in near real time. Show the topic and a live consumer reacting to new events. On D8A, this is the Streaming with Kafka project.
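The aggregation half of this project can be prototyped before a broker is involved. The sketch below replaces the Kafka consumer loop with a plain generator (`event_stream` is a hypothetical stand-in, and the events are invented) and counts events per 10-second tumbling window. It assumes events arrive in timestamp order and does not handle late or out-of-order events.

```python
from collections import Counter


def event_stream():
    """Stand-in for a consumer loop: yields (timestamp_seconds, event_type)."""
    yield from [
        (0, "click"), (3, "click"), (7, "purchase"),
        (11, "click"), (14, "click"), (19, "click"),
        (21, "purchase"),
    ]


def tumbling_window_counts(events, window_seconds=10):
    """Yield (window_start, Counter) whenever an event opens a new window.

    Windows that receive no events are never emitted.
    """
    current_start = None
    counts = Counter()
    for timestamp, event_type in events:
        window_start = timestamp - timestamp % window_seconds
        if current_start is None:
            current_start = window_start
        elif window_start != current_start:
            yield current_start, counts
            current_start, counts = window_start, Counter()
        counts[event_type] += 1
    if current_start is not None:
        yield current_start, counts  # flush the last, still-open window


windows = [(start, dict(counts)) for start, counts in tumbling_window_counts(event_stream())]
for start, counts in windows:
    print(f"[{start:>3}s, {start + 10:>3}s) {counts}")
```

Swapping the generator for a real consumer leaves the windowing logic unchanged, which keeps the streaming part of the demo easy to test.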
The D8A Data Engineer path moves from SQL & Data Modeling and a Python ETL pipeline (both free) through dbt, Airflow, a BigQuery warehouse, Docker and infrastructure as code, and Kafka streaming, up to a production data platform capstone. Start the free Data Engineer intro →
Business Analyst portfolio projects
A Business Analyst portfolio proves you can sit between the business and the build: scope a problem, model the process, justify the spend, and present it. The deliverables are documents and dashboards, and they are exactly what to show.
Scope a feature like a real BA
Pick a product you use and scope a new feature: a requirements document with user stories and acceptance criteria, plus a BPMN process map of the current and proposed flow. Show the document and the diagram. On D8A, this is the Requirements & Process Modeling project.
Make the business case
Build an ROI and NPV model for that feature in Excel, with best, base, and worst-case scenarios, then a one-page recommendation. Show the model and the page a director would actually read. On D8A, this is the Business Case & Financial Modeling project.
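The arithmetic behind the spreadsheet is short enough to check by hand or in a few lines of code. This sketch uses the standard definitions, NPV as the discounted sum of future cash flows minus the upfront investment and ROI as undiscounted net gain over investment. The investment, discount rate, and scenario cash flows are all invented for illustration.

```python
def npv(rate, initial_investment, cash_flows):
    """Net present value: -investment + sum of cash_flow_t / (1 + rate)**t, t = 1..n."""
    discounted = sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))
    return discounted - initial_investment


def roi(initial_investment, cash_flows):
    """Simple, undiscounted ROI: (total return - investment) / investment."""
    return (sum(cash_flows) - initial_investment) / initial_investment


# Hypothetical inputs: upfront cost, annual discount rate, and three years
# of net cash flow per scenario.
INVESTMENT = 100_000
DISCOUNT_RATE = 0.10
SCENARIOS = {
    "worst": [20_000, 30_000, 30_000],
    "base": [40_000, 50_000, 60_000],
    "best": [60_000, 80_000, 90_000],
}

results = {
    name: (roi(INVESTMENT, flows), npv(DISCOUNT_RATE, INVESTMENT, flows))
    for name, flows in SCENARIOS.items()
}
for name, (scenario_roi, scenario_npv) in results.items():
    print(f"{name:>5}: ROI {scenario_roi:7.1%}   NPV {scenario_npv:12,.2f}")
```

With these made-up numbers the worst case has a negative NPV while the base and best cases are positive, which is the kind of spread the one-page recommendation needs to address directly.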
Report the KPIs that matter
Build a Power BI KPI dashboard on a star schema, and for each KPI write one line on why it earns its place. The "why" is what separates a BA from someone who just adds charts. On D8A, this is the Power BI KPI Dashboard project.
Run a sprint, on paper
Set up a backlog in Jira, plan a sprint, write the tickets with acceptance criteria, and close with a short retro. Show the board and what you would change next sprint. On D8A, this is the Agile & Jira Project Simulation.
The D8A Business Analyst path covers SQL for Business Reporting and Excel & Power Query (both free), then Requirements & Process Modeling, a Power BI KPI dashboard, an Agile and Jira simulation, a business case, and a stakeholder presentation, up to a full BA engagement capstone. Start the free Business Analyst intro →
How to choose, and how many you need
You do not need ten projects. You need three or four that show range and at least one that goes end to end. A practical sequence for any role:
1. One project to learn the core skill
Guided is fine here. The goal is to learn the pattern (a SQL analysis, a model, a pipeline, a requirements doc) without fighting the blank page.
2. One project in a domain you care about
Apply the same pattern to data you find interesting: your hobby, your old industry, a public dataset you like. This is the one that sounds genuine in an interview.
3. One end-to-end capstone
Tie the skills together into a single, finished piece a stranger can open and understand. This is the project you lead with.
Each of D8A's four paths is built as exactly this arc, with the dataset, an in-browser playground, and a review that turns every finished project into a public portfolio piece. See the full project tree across all four paths, or if you are still deciding, start with the free intros, one per path.
Frequently asked questions
- What makes a good data portfolio project?
- A good portfolio project starts from a real question, uses real and slightly messy data, and ends with something a stranger can open: a dashboard, a notebook with a clear narrative, a working app, or a written recommendation. The point is not to show that you can run an algorithm; it is to show that you can take a vague problem and turn it into a decision or a working system.
- How many projects should be in a data portfolio?
- Three to four is plenty if they show range and at least one goes end to end. Most people are better off with three deep, finished projects than ten half-built notebooks. Recruiters skim, so depth and a clear story beat volume.
- Can I use guided projects in my portfolio, or do they have to be original?
- Guided projects are fine as long as the work and the write-up are genuinely yours. The dataset and the brief can be shared; your analysis, your code, your decisions, and your explanation are what a recruiter is reading. The fastest path is usually a guided project to learn the pattern, then one original project that applies it to a domain you care about.
- What are good datasets for data portfolio projects?
- Public, real-world datasets beat toy ones. The Olist Brazilian e-commerce dataset, Inside Airbnb listings, the GA4 sample in BigQuery, the Telco customer churn dataset, and store sales time series are all solid starting points. Avoid the over-used Titanic and Iris sets: every recruiter has seen them a thousand times.



