The Smart City Streaming Data Pipeline simulates a dynamic, real-time data infrastructure, streaming data from simulated smart city sensors: vehicle, GPS, weather, traffic, and emergency incident data. Using Kafka for robust data streaming and Apache Spark for real-time processing, the pipeline efficiently routes and processes large-scale data, storing it in S3 as Parquet files for optimized access and storage. Key features include realistic data generation that simulates a vehicle journey, covering aspects such as location, speed, weather conditions, and incident status. JSON serialization with Kafka producer callbacks ensures reliable data handling, while configurable environment variables offer deployment flexibility. Built with Docker, the pipeline leverages a containerized environment for isolated, scalable, and modular development.
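As a concrete illustration of the producer-callback pattern, the sketch below JSON-serializes a record and registers a delivery callback, assuming the confluent-kafka client. The topic name `vehicle_data`, the `KAFKA_BOOTSTRAP_SERVERS` variable, and the record fields are illustrative, not taken from the repository.

```python
import json
import os
import uuid

from confluent_kafka import Producer

# Broker address is read from the environment, mirroring the pipeline's
# configurable deployment (the variable name here is an assumption).
producer = Producer({
    "bootstrap.servers": os.getenv("KAFKA_BOOTSTRAP_SERVERS", "localhost:9092")
})

def delivery_report(err, msg):
    # Fires once per record: either the broker acknowledged it or it failed.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()} [partition {msg.partition()}]")

# One simulated vehicle record (fields are illustrative).
vehicle = {
    "id": str(uuid.uuid4()),
    "timestamp": "2024-01-01T12:00:00",
    "latitude": 51.5074,
    "longitude": -0.1278,
    "speed": 42.5,
}

# Serialize to JSON and publish; flush() blocks until the callback has run.
producer.produce(
    "vehicle_data",
    key=vehicle["id"],
    value=json.dumps(vehicle),
    on_delivery=delivery_report,
)
producer.flush()
```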
This architecture leverages Docker containers to encapsulate Spark, Kafka, and Zookeeper, enabling consistent and modular deployments:
- Data Generation: Each data type (vehicle, GPS, weather, traffic, and emergency) is generated with Python and serialized to JSON.
- Kafka Streaming: Kafka serves as the data bus, with dedicated topics for each data source.
- Spark Streaming: Spark processes and stores data streams, using schemas to parse each data type (see the sketch after this list).
- S3 Storage: Final data is stored as Parquet files on S3 with checkpoints, supporting historical analysis.
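As a rough sketch of the last two steps, the PySpark snippet below parses one topic's JSON payload against an explicit schema and writes the result to S3 as Parquet with a checkpoint. The schema fields, broker address, topic name, and bucket paths are assumptions, not the repository's actual configuration.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import (DoubleType, StringType, StructField,
                               StructType, TimestampType)

# Requires the spark-sql-kafka connector package on the classpath.
spark = SparkSession.builder.appName("SmartCityStreaming").getOrCreate()

# Schema for one data type; each topic gets its own (fields are illustrative).
vehicle_schema = StructType([
    StructField("id", StringType(), True),
    StructField("timestamp", TimestampType(), True),
    StructField("latitude", DoubleType(), True),
    StructField("longitude", DoubleType(), True),
    StructField("speed", DoubleType(), True),
])

# Read the raw Kafka stream and parse each JSON value against the schema.
vehicle_df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:29092")  # assumed broker address
    .option("subscribe", "vehicle_data")                # assumed topic name
    .load()
    .selectExpr("CAST(value AS STRING)")
    .select(from_json(col("value"), vehicle_schema).alias("data"))
    .select("data.*")
)

# Write Parquet files to S3; the checkpoint lets the query restart safely.
query = (
    vehicle_df.writeStream.format("parquet")
    .option("path", "s3a://smart-city-bucket/data/vehicle")  # assumed bucket
    .option("checkpointLocation", "s3a://smart-city-bucket/checkpoints/vehicle")
    .start()
)
query.awaitTermination()
```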
- Docker and Docker Compose
- AWS access keys for S3 storage
- Python 3.10.7
- Clone the repository:

  ```bash
  git clone git@github.com:LiliValGo/SmartCity.git
  cd SmartCity
  ```

- Docker setup:

  ```bash
  docker-compose up -d
  ```