- Introduction
- Objectives
- Dependencies
- Data Preprocessing
- Analysis Techniques
- Technical Deep Dive
- How to Run the Project
"Crime in Chicago" is an in-depth data analysis project. It was originally designed to assess the crime landscape in Chicago for a client concerned about safety, and has since been expanded to include time-series analysis, inter-variable relationships, and forecasting. It focuses on three types of crime: theft, battery, and criminal damage, which are the most frequently occurring in Chicago.
The overarching objectives of this analysis are three-fold:
- To determine the temporal trends in crime rates, particularly whether crime is decreasing over time.
- To analyze the relationship between different types of crimes and their corresponding arrest rates.
- To forecast the number of theft incidents for the year 2018 using data-driven techniques.
The project uses several Python libraries for data manipulation, analysis, and visualization. Install the following dependencies before running the project:
- Pandas
- Matplotlib
- Seaborn
- NumPy
- scikit-learn
- statsmodels (used for the ARIMA model)
- XGBoost
You can install these packages using pip:
```bash
pip install pandas matplotlib seaborn numpy scikit-learn statsmodels xgboost
```
Because the dataset is large, the first phase was a rigorous data cleaning process. Columns that were not essential to the analysis were removed, which improved performance and left a more concise, manageable dataset.
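A minimal sketch of this column-pruning step, assuming the data comes from a CSV export. The inline sample and the column names (`Date`, `Primary Type`, `Arrest`) are illustrative assumptions, not necessarily the project's actual file or schema.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the full CSV export; the real file
# and its exact column names are assumptions here.
sample_csv = io.StringIO(
    "ID,Date,Primary Type,Arrest,Block,Description\n"
    "1,01/05/2016 10:00:00 AM,THEFT,False,001XX W MAIN ST,OVER $500\n"
    "2,01/06/2016 11:30:00 PM,BATTERY,True,002XX N OAK AVE,SIMPLE\n"
    "3,02/01/2016 08:15:00 AM,NARCOTICS,True,003XX S ELM ST,POSS: CANNABIS\n"
)

# Read only the columns the analysis needs, which keeps memory use down.
keep = ["Date", "Primary Type", "Arrest"]
df = pd.read_csv(sample_csv, usecols=keep)

# Parse timestamps, then keep the three crime types the project focuses on.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y %I:%M:%S %p")
focus = ["THEFT", "BATTERY", "CRIMINAL DAMAGE"]
df = df[df["Primary Type"].isin(focus)].reset_index(drop=True)
```

Selecting columns at read time avoids loading the unused ones at all, which matters more as the file grows.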
The project employs a variety of techniques for a rigorous analysis:
- Time-Series Analysis: To understand the crime rates over a timeline.
- Correlation Metrics: To explore the relationship between different types of crimes and arrest rates.
- Forecasting: To predict future crime rates using machine learning techniques.
We used advanced time-series methods like ARIMA (AutoRegressive Integrated Moving Average) to analyze crime trends over time. The time-series analysis allows us to answer questions like:
- Is crime increasing or decreasing over time?
- Are there seasonal patterns in crime rates?
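Both questions can be given a first look with plain pandas before any model is fitted. The sketch below uses synthetic timestamps (an assumption standing in for the real incident dates) to build a monthly count series, a month-of-year profile, and a rough linear trend.

```python
import numpy as np
import pandas as pd

# Synthetic incident timestamps (assumption): 5000 random days over three years.
rng = np.random.default_rng(0)
days = pd.date_range("2014-01-01", "2016-12-31", freq="D")
picked = days[rng.integers(0, len(days), size=5000)]
incidents = pd.Series(1, index=picked).sort_index()

# Trend: count incidents per calendar month.
monthly_counts = incidents.resample("MS").sum()

# Seasonality: average count for each month of the year across all years.
seasonal_profile = monthly_counts.groupby(monthly_counts.index.month).mean()

# Rough trend indicator: slope of a straight line fitted to the monthly counts.
slope = np.polyfit(np.arange(len(monthly_counts)), monthly_counts.to_numpy(), 1)[0]
```

A negative `slope` suggests counts are falling over the period, and large differences between entries of `seasonal_profile` hint at seasonality.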
The ARIMA order parameters were tuned to find the best-fitting model, which was then used to forecast future crime rates.
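```python
from statsmodels.tsa.arima.model import ARIMA

# time_series_data is assumed to be a pandas Series of crime counts indexed by date
# Fit the ARIMA model
model = ARIMA(time_series_data, order=(1, 1, 1))
model_fit = model.fit()

# Forecast the next 10 periods
forecast = model_fit.forecast(steps=10)
```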
To explore the relationship between different variables, we computed Pearson's correlation coefficients and visualized them using heatmaps.
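```python
import matplotlib.pyplot as plt
import seaborn as sns

# Compute the Pearson correlation matrix over the numeric columns
correlation_matrix = df.corr(numeric_only=True)

# Generate an annotated heatmap
sns.heatmap(correlation_matrix, annot=True)
plt.show()
```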
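Before crime types can be related to arrest rates, the arrest rate per type has to be derived from the incident records. A minimal sketch, assuming one row per incident with a boolean `Arrest` column and a `Primary Type` column (both names and the toy data are assumptions):

```python
import pandas as pd

# Toy incident records (assumption): one row per reported incident.
records = pd.DataFrame({
    "Primary Type": ["THEFT", "THEFT", "THEFT", "THEFT",
                     "BATTERY", "BATTERY",
                     "CRIMINAL DAMAGE", "CRIMINAL DAMAGE"],
    "Arrest": [True, False, False, False, True, True, False, True],
})

# The mean of a boolean column is the share of True values,
# so it gives the arrest rate directly.
summary = records.groupby("Primary Type")["Arrest"].agg(
    incidents="size",
    arrest_rate="mean",
)
```

The resulting `summary` table has one row per crime type and can be joined with monthly counts before computing correlations.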
We also used machine learning models such as Random Forest and XGBoost to predict crime rates. Feature importance was evaluated to understand which variables contribute most to the predictions.
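```python
from sklearn.ensemble import RandomForestRegressor

# X_train, y_train, and X_test are assumed to come from an earlier train/test split
# Initialize and fit the model (fixed seed for reproducible results)
model = RandomForestRegressor(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

# Predict on the held-out test set
y_pred = model.predict(X_test)
```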
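The feature importance evaluation can be sketched as follows. The features and target are synthetic (an assumption for illustration), built so that `month` drives most of the variation; the project's real feature set is not listed here.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic features (assumption): the target depends mostly on "month".
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "month": rng.integers(1, 13, size=500),
    "day_of_week": rng.integers(0, 7, size=500),
    "noise": rng.normal(size=500),
})
y = 50 * X["month"] + 2 * X["day_of_week"] + rng.normal(scale=5, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

rf = RandomForestRegressor(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)

# Impurity-based importances: one value per feature, summing to 1.
importances = pd.Series(
    rf.feature_importances_, index=X.columns
).sort_values(ascending=False)
```

Impurity-based importances can overstate high-cardinality features, so they are best read as a ranking rather than exact contributions.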
Data preprocessing was handled using Pandas. Missing values were imputed, and categorical variables were encoded.
```python
import pandas as pd

# Drop unnecessary columns
df = df.drop(columns=['Unused_Column'])

# Impute missing values by carrying the last valid observation forward
df = df.ffill()

# Encode categorical variables as one-hot columns
df = pd.get_dummies(df, columns=['Category'])
```
- Clone the repository to your local machine.
- Navigate to the project directory.
- Install all the dependencies mentioned in the Dependencies section.
- Run the Jupyter Notebook titled "Final Project 2 Notebook.ipynb".