Football Data Engineering

This Python-based project uses Apache Airflow to crawl data from Wikipedia, clean it, and push it to Azure Data Lake for processing.

Table of Contents

  1. System Architecture
  2. Requirements
  3. Getting Started
  4. Running the Code With Docker
  5. How It Works
  6. Video

System Architecture

system_architecture.png

Requirements

  • Python 3.9 (minimum)
  • Docker
  • PostgreSQL
  • Apache Airflow 2.6 (minimum)

Getting Started

  1. Clone the repository.

    git clone https://github.com/airscholar/FootballDataEngineering.git
  2. Install Python dependencies.

    pip install -r requirements.txt

Running the Code With Docker

  1. Start the services with Docker Compose.
    docker compose up -d
  2. Trigger the DAG on the Airflow UI.
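
If you would rather trigger the DAG from a script than from the UI, the sketch below builds a request for the Airflow 2.x stable REST API (`POST /api/v1/dags/{dag_id}/dagRuns`). It assumes the basic-auth API backend is enabled in your Airflow configuration. The base URL, the DAG id `example_dag`, and the credentials are placeholders, not values taken from this repository.

```python
import base64
import json
import urllib.request


def build_trigger_request(base_url, dag_id, username, password):
    """Build (but do not send) a request that starts a new run of dag_id."""
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps({"conf": {}}).encode(),  # empty run configuration
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )


# Placeholder values: replace with your own host, DAG id, and credentials.
request = build_trigger_request(
    "http://localhost:8080", "example_dag", "airflow", "airflow"
)

# Uncomment to send the request to a running Airflow webserver.
# with urllib.request.urlopen(request) as response:
#     print(response.status)
```

The request is built separately from the call that sends it, so you can check the URL and payload before touching a live instance.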

How It Works

  1. Fetches data from Wikipedia.
  2. Cleans the data.
  3. Transforms the data.
  4. Pushes the data to Azure Data Lake.
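
The clean and transform steps above can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code. The sample rows, the field names, and the functions `clean_row`, `transform_row`, and `run_pipeline` are all hypothetical. The fetch and upload steps are left out because they need network access and Azure credentials.

```python
import csv
import io
import re

# Hypothetical rows standing in for a table scraped from Wikipedia.
RAW_ROWS = [
    {"rank": "1", "name": "Example Stadium[1]", "capacity": "100,000[2]", "country": " Exampleland "},
    {"rank": "2", "name": "Sample Arena", "capacity": "", "country": "Samplestan"},
]


def clean_row(row):
    """Remove footnote markers such as "[1]" and surrounding whitespace."""
    return {key: re.sub(r"\[\d+\]", "", value).strip() for key, value in row.items()}


def transform_row(row):
    """Convert numeric fields to integers; a missing capacity becomes None."""
    digits = row["capacity"].replace(",", "")
    return {
        "rank": int(row["rank"]),
        "name": row["name"],
        "capacity": int(digits) if digits else None,
        "country": row["country"],
    }


def run_pipeline(raw_rows):
    """Clean and transform the rows, then serialize them as CSV text."""
    rows = [transform_row(clean_row(row)) for row in raw_rows]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["rank", "name", "capacity", "country"],
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = run_pipeline(RAW_ROWS)
print(csv_text)
```

In a real run, the resulting CSV text would be what gets written to Azure Data Lake in the final step.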

Video

FootballDataEngineering