Building a Portfolio for Data Engineering

Practical projects for when you're just starting out.

17th April 2024  |  Richard Honour

Are you looking to bolster your data engineering skills and showcase your abilities to potential employers? 

Choosing projects that demonstrate your skills well can be difficult.

This is especially true if you are just starting out and are not yet fully confident in your abilities.

In this guide, we'll provide practical project suggestions, along with step-by-step instructions, to help you start putting together a portfolio.

We also understand how difficult it can be to see a project through to the end. For that reason, projects marked with the Quick Project tag are deliberately short in scope and should be easier to complete in one go.

ETL Microservice

This project involves building a live-updating microservice that extracts raw data from a public API, processes it, stores it, transforms it into a usable format, and finally, makes it available for analysis.

Completing this project demonstrates your proficiency in backend development, working with APIs, Python, Pandas, ETL/ELT processes, cloud platforms, and data analysis.

Suggestion by artfully_rearranged

🧠 Prerequisites

You will need a good understanding of the following:

You will need access to the following:

🪜 Project Guide: ETL Microservice

Find a suitable public API that provides the data you're interested in. 

Ensure that it's free to access.

Create an account with a cloud provider and set up a virtual machine (VM) instance where your microservice will run. 

Note down the VM's external IP address.


Develop a Flask application and create a route to fetch data from the chosen API.

Use the requests library to ingest the data, and print the JSON response while debugging.
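
A minimal sketch of this step, assuming Flask and requests are installed. The API_URL value and the /fetch route name are placeholders chosen for illustration, not part of any real API.

```python
from flask import Flask, jsonify
import requests

# Hypothetical endpoint: replace with the public API you chose.
API_URL = "https://api.example.com/v1/items"

app = Flask(__name__)


@app.route("/fetch")
def fetch():
    """Fetch the latest data from the upstream API and return it as JSON."""
    response = requests.get(API_URL, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    payload = response.json()
    print(payload)  # temporary debugging output; remove once storage is in place
    return jsonify(payload)
```

If the file is saved as app.py, one way to try it locally is `flask --app app run` (Flask 2.2 or later) and then opening /fetch in a browser.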


Set up a database or data lake (e.g., Google Cloud Storage, Amazon S3, or a SQL database) and modify your Flask app to store the raw data in it.
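
As one possible starting point, the sketch below stores each raw response in SQLite, which ships with Python and stands in here for whichever SQL database you pick. The table name raw_responses and its columns are assumptions made for this example.

```python
import json
import sqlite3
from datetime import datetime, timezone


def init_db(conn: sqlite3.Connection) -> None:
    """Create a table that keeps each API response exactly as received."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS raw_responses (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            fetched_at TEXT NOT NULL,
            payload TEXT NOT NULL
        )
        """
    )
    conn.commit()


def store_raw(conn: sqlite3.Connection, payload) -> int:
    """Insert one raw payload as a JSON string and return its row id."""
    cursor = conn.execute(
        "INSERT INTO raw_responses (fetched_at, payload) VALUES (?, ?)",
        (datetime.now(timezone.utc).isoformat(), json.dumps(payload)),
    )
    conn.commit()
    return cursor.lastrowid


# Usage: an in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
init_db(conn)
row_id = store_raw(conn, {"items": [{"id": 1, "name": "example"}]})
```

Keeping the payload untouched, with a fetch timestamp next to it, means the transformation step can be rerun later without calling the API again.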


Extract data from the database/lake, use Pandas to transform it into a clean, tabular format, and verify the results.
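
A rough sketch of the transformation, assuming Pandas is installed. The sample records and the field names (id, name, stats.score) are invented for illustration; your API's fields will differ.

```python
import pandas as pd

# Hypothetical raw records, shaped like a typical nested API response.
raw_records = [
    {"id": 1, "name": " Alice ", "stats": {"score": "10"}},
    {"id": 2, "name": "Bob", "stats": {"score": None}},
    {"id": 1, "name": " Alice ", "stats": {"score": "10"}},  # duplicate row
]


def transform(records: list[dict]) -> pd.DataFrame:
    """Flatten nested records into a tidy table with consistent types."""
    df = pd.json_normalize(records, sep="_")  # stats.score becomes stats_score
    df = df.drop_duplicates().reset_index(drop=True)
    df["name"] = df["name"].str.strip()
    # Non-numeric or missing scores become NaN instead of raising an error.
    df["stats_score"] = pd.to_numeric(df["stats_score"], errors="coerce")
    return df


clean = transform(raw_records)
print(clean.dtypes)
print(clean)
```

Printing the dtypes and the frame itself is a simple way to verify the results before moving on.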


Deploy your Flask microservice on the cloud VM and ensure it fetches updated data at regular intervals.
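
One simple way to fetch at regular intervals is a small loop using only the standard library, sketched below. The helper name run_periodically and its parameters are assumptions for this example; the job passed in would be your own fetch-and-store function.

```python
import time
from typing import Callable, Optional


def run_periodically(
    job: Callable[[], None],
    interval_seconds: float,
    max_runs: Optional[int] = None,
) -> int:
    """Call `job` every `interval_seconds`; stop after `max_runs` if given."""
    runs = 0
    while max_runs is None or runs < max_runs:
        started = time.monotonic()
        try:
            job()
        except Exception as exc:  # keep the loop alive if one run fails
            print(f"run failed: {exc}")
        runs += 1
        # Sleep only for the time left in this interval, never a negative amount.
        elapsed = time.monotonic() - started
        time.sleep(max(0.0, interval_seconds - elapsed))
    return runs


# Usage: a stand-in job that records when it was called.
calls = []
completed = run_periodically(
    lambda: calls.append(time.time()), interval_seconds=0.01, max_runs=3
)
```

On a VM, a cron job or a systemd timer that runs the fetch script is a common alternative to keeping a loop like this alive.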


Create a GitHub repository for your project, commit and push your code, and provide comprehensive documentation in the README.


Choose a destination for exporting data (e.g., Google Sheets, BigQuery) and modify your Flask app to export transformed data for analysis.
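
Writing to Google Sheets or BigQuery directly requires credentials and a client library, so the sketch below takes a simpler route and is not a substitute for either: it serialises the transformed rows to CSV, a format both destinations can import. The rows and the export.csv file name are made up for illustration.

```python
import csv
import io
import tempfile
from pathlib import Path


def rows_to_csv(rows: list[dict]) -> str:
    """Serialise a list of flat records to CSV text with a header row."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical transformed rows; with Pandas, df.to_dict("records") gives this shape.
rows = [
    {"id": 1, "name": "Alice", "score": 10.0},
    {"id": 2, "name": "Bob", "score": 7.5},
]
csv_text = rows_to_csv(rows)

# Write the export to a temporary location; pick a real path in your project.
export_path = Path(tempfile.gettempdir()) / "export.csv"
with open(export_path, "w", newline="", encoding="utf-8") as handle:
    handle.write(csv_text)
```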

Basic ETL Script

Quick Project

This project involves extracting and transforming data from an API.

Completing this project demonstrates your proficiency in...

Suggestion by miscbits

🧠 Prerequisites

You will need a good understanding of the following:

You will need access to the following:

🪜 Project Guide: Basic ETL Script

Find a suitable public API that provides the data you're interested in. 

Ensure that it's free to access.

Create an account with a cloud provider and set up a virtual machine (VM) instance where your microservice will run. 

Note down the VM's external IP address.


Develop a Flask application and create a route to fetch data from the chosen API.

Use the requests library to ingest the data, and print the JSON response while debugging.


Set up a database or data lake (e.g., Google Cloud Storage, Amazon S3, or a SQL database) and modify your Flask app to store the raw data in it.


Extract data from the database/lake, use Pandas to transform it into a clean, tabular format, and verify the results.


Deploy your Flask microservice on the cloud VM and ensure it fetches updated data at regular intervals.


Create a GitHub repository for your project, commit and push your code, and provide comprehensive documentation in the README.


Choose a destination for exporting data (e.g., Google Sheets, BigQuery) and modify your Flask app to export transformed data for analysis.
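
Stripped of hosting and deployment, the extract-and-transform core of this project fits in a single script. The sketch below uses only the standard library (urllib in place of requests) to keep it dependency-free. The API_URL value and the id and name fields are assumptions for illustration.

```python
import json
from urllib.request import urlopen

# Hypothetical endpoint that returns a JSON array of records.
API_URL = "https://api.example.com/v1/items"


def extract(url: str) -> list[dict]:
    """Download and decode the JSON payload from the API."""
    with urlopen(url, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


def transform(records: list[dict]) -> list[dict]:
    """Keep the fields of interest, normalise text, and drop incomplete rows."""
    cleaned = []
    for record in records:
        if record.get("id") is None:
            continue  # skip rows without an identifier
        cleaned.append(
            {
                "id": int(record["id"]),
                "name": str(record.get("name", "")).strip().title(),
            }
        )
    return cleaned


def main() -> None:
    """Run the full pipeline against the live API and print each clean row."""
    for row in transform(extract(API_URL)):
        print(row)


# The transform step can be tried without network access:
sample = [{"id": "1", "name": "  ada lovelace "}, {"name": "no id"}]
result = transform(sample)
```

Calling main() runs the whole pipeline once a real endpoint is filled in. Keeping extract and transform as separate functions makes the transform easy to test on sample data.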

Basic Web Scraper

Quick Project

This project involves writing a Python script that scrapes data from a website using libraries such as BeautifulSoup and Requests.

Completing this project demonstrates your proficiency in...

Suggestion by miscbits

🧠 Before you start...

You will need a good understanding of the following:

You will need access to the following:

🪜 Project Guide

Find...
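
As a starting point for the scraping approach described for this project, here is a minimal sketch assuming the requests and beautifulsoup4 packages are installed. The page structure (links inside article elements) and the User-Agent string are invented for the example; a real site will need its own selectors.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page; the User-Agent string here is a placeholder."""
    response = requests.get(
        url, headers={"User-Agent": "portfolio-scraper/0.1"}, timeout=10
    )
    response.raise_for_status()
    return response.text


def parse_articles(html: str) -> list[dict]:
    """Extract the title and link of every <article> element on the page."""
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    for node in soup.find_all("article"):
        link = node.find("a", href=True)
        if link is None:
            continue  # skip articles without a link
        articles.append({"title": link.get_text(strip=True), "url": link["href"]})
    return articles


# Parsing can be tested offline against a saved or hand-written snippet.
sample_html = """
<article><h2><a href="/posts/1"> First post </a></h2></article>
<article><p>No link here</p></article>
<article><h2><a href="/posts/2">Second post</a></h2></article>
"""
posts = parse_articles(sample_html)
```

Separating the download from the parsing lets you develop the parser against saved HTML without hitting the site repeatedly. Check the site's terms of use and robots.txt before scraping it.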

Projects for the Easily Distracted

Suggestion by miscbits

Description:

These project ideas are tailored for individuals who may have limited time or attention spans. They are short in scope but still provide valuable experience in typical data engineering tasks.

Project Suggestions:

Benefits:

These projects are perfect for building a diverse portfolio while demonstrating your ability to handle various data engineering tasks efficiently.

By working on these projects, not only will you enhance your technical skills, but you'll also create tangible evidence of your expertise that can impress potential employers. So, roll up your sleeves and start building your data engineering portfolio today!
