This repository supports my NFL Big Data Bowl 2025 submission, *Disguised Intentions*. The project quantifies how effectively defensive backs (DBs) disguise their intentions pre-snap and how this impacts defensive outcomes.
To set up the project locally:

- Ensure you have [Poetry](https://python-poetry.org/docs/) installed (see the Poetry documentation for more details).

- Ensure you have a compatible Python version (see `pyproject.toml` for supported versions).

- Clone the repository to your desired folder:

  ```bash
  git clone git@github.com:miguelmendesduarte/big-data-bowl-2025.git <desired-folder-name>
  ```

- Enter the desired folder:

  ```bash
  cd <desired-folder-name>
  ```

- Install the dependencies:

  ```bash
  poetry install
  ```

- Activate the virtual environment:

  ```bash
  poetry shell
  ```

- Download the NFL Big Data Bowl 2025 data:

  Download the data from the Kaggle competition page and save it in the `data/raw/` directory.

- Process the raw data:

  Run the following command to create the processed datasets:

  ```bash
  python -m src.data_processing.process_data
  ```

  This will store the processed data in the `data/processed/` directory.

- Prepare the train and test datasets:

  Run this command to prepare the train and test datasets:

  ```bash
  python -m src.data_processing.training.datasets
  ```

  The resulting datasets are divided into:

  - Train: Weeks 1 to 5 (stored in `data/train/`)
  - Test: Weeks 6 to 9 (stored in `data/test/`)
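
  For reference, the week-based split performed by this step might look roughly like the sketch below. This is only an illustrative sketch: the file names and the `week` column are assumptions, not the project's actual code.

  ```python
  # Minimal sketch of a week-based train/test split (file names and "week" column are assumed).
  import pandas as pd

  plays = pd.read_csv("data/processed/plays_features.csv")  # hypothetical processed file

  train = plays[plays["week"].between(1, 5)]  # weeks 1-5 -> training set
  test = plays[plays["week"].between(6, 9)]   # weeks 6-9 -> test set

  train.to_csv("data/train/train.csv", index=False)
  test.to_csv("data/test/test.csv", index=False)
  ```
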
- Train the model:

  - Start the MLflow UI to track experiments and results:

    ```bash
    sh scripts/start_mlflow_ui.sh
    ```

  - Adjust hyperparameters in `src/config/training_settings.py` under the `HYPERPARAMETER_GRID` variable. The current options include:

    ```python
    "n_estimators": [100, 150, 200, 300],
    "learning_rate": [0.01, 0.05, 0.1],
    "max_depth": [3, 5, 7, 9],
    "min_child_weight": [1, 3, 5]
    ```

  - Train the model with:

    ```bash
    python -m src.training.train
    ```

  - In the MLflow UI, review the results and identify the best-performing hyperparameter combination based on log loss. For this project, the optimal settings are:

    - `max_depth`: 9
    - `n_estimators`: 300
    - `min_child_weight`: 3
    - `learning_rate`: 0.1
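
  For context, the search behind `src.training.train` might look roughly like the sketch below. It is only an illustration: the XGBoost classifier (suggested by the hyperparameter names), the binary `blitz` target column, and the CSV paths are assumptions, not the project's actual code.

  ```python
  # Sketch of a hyperparameter grid search logged to MLflow (assumed model, columns, and paths).
  from itertools import product

  import mlflow
  import pandas as pd
  from sklearn.metrics import log_loss
  from xgboost import XGBClassifier

  HYPERPARAMETER_GRID = {
      "n_estimators": [100, 150, 200, 300],
      "learning_rate": [0.01, 0.05, 0.1],
      "max_depth": [3, 5, 7, 9],
      "min_child_weight": [1, 3, 5],
  }

  train = pd.read_csv("data/train/train.csv")   # hypothetical paths and target column
  test = pd.read_csv("data/test/test.csv")
  X_train, y_train = train.drop(columns=["blitz"]), train["blitz"]
  X_test, y_test = test.drop(columns=["blitz"]), test["blitz"]

  for values in product(*HYPERPARAMETER_GRID.values()):
      params = dict(zip(HYPERPARAMETER_GRID.keys(), values))
      with mlflow.start_run():
          model = XGBClassifier(**params)
          model.fit(X_train, y_train)
          loss = log_loss(y_test, model.predict_proba(X_test)[:, 1])
          mlflow.log_params(params)          # record the hyperparameter combination
          mlflow.log_metric("log_loss", loss)  # metric used to pick the best run
  ```
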
- Update the model settings:

  Once the best model is identified, update the `src/config/settings.py` file with the experiment ID and run ID of the model to be used.
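
  The exact variable names in `src/config/settings.py` are project-specific; conceptually, the update just records the two MLflow identifiers, e.g.:

  ```python
  # Illustrative only -- variable names are assumptions, not the project's actual settings.
  EXPERIMENT_ID = "1"            # MLflow experiment ID containing the best run
  RUN_ID = "<best-run-id>"       # MLflow run ID of the best-performing model
  ```
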
- Create the inference dataset:

  To generate the dataset for inference, run:

  ```bash
  python -m src.data_processing.inference.dataset
  ```

  This will create the `inference.csv` dataset, which will be used to obtain the blitz probability results.
- Get blitz probability predictions:

  Use the trained model to predict blitz probabilities by running:

  ```bash
  python -m src.inference.predictions
  ```

  This will generate `blitz_probability_results.csv` inside the `data/inference/` directory.
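
  Under the hood, this step amounts to loading the chosen MLflow model and scoring the inference dataset, roughly as in the sketch below (paths and the output column name are assumptions; the real logic lives in `src.inference.predictions`):

  ```python
  # Sketch: load the best run's model from MLflow and score the inference dataset.
  import mlflow
  import pandas as pd

  RUN_ID = "<best-run-id>"  # in the real pipeline this comes from src/config/settings.py

  model = mlflow.pyfunc.load_model(f"runs:/{RUN_ID}/model")
  inference = pd.read_csv("data/inference/inference.csv")

  # Output shape depends on how the model was logged; here we assume per-row predictions.
  inference["blitz_probability"] = model.predict(inference)
  inference.to_csv("data/inference/blitz_probability_results.csv", index=False)
  ```
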
- Compute disguise scores:

  Calculate the disguise scores by running:

  ```bash
  python -m src.metric.metric
  ```

  This will create two files inside the `data/metric/` directory:

  - `play_disguise_results.csv`: average of the frame disguise scores (not used in the submission).
  - `weighted_play_disguise_results.csv`: weighted average of the frame results (used in the submission).
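
  As an illustration, a weighted per-play aggregation of frame-level scores could look like the following sketch. The input file, column names, and the linearly increasing weights are assumptions; the project's actual weighting lives in `src.metric.metric`.

  ```python
  # Sketch: aggregate frame-level disguise scores into one weighted score per play.
  import numpy as np
  import pandas as pd

  frames = pd.read_csv("data/metric/frame_disguise_scores.csv")  # hypothetical file/columns

  def weighted_play_score(group: pd.DataFrame) -> float:
      # Assumed weighting: later pre-snap frames count more (weights grow linearly).
      weights = np.arange(1, len(group) + 1)
      return float(np.average(group["disguise_score"], weights=weights))

  play_scores = (
      frames.sort_values("frameId")
      .groupby(["gameId", "playId"])
      .apply(weighted_play_score)
      .rename("weighted_disguise_score")
      .reset_index()
  )
  play_scores.to_csv("data/metric/weighted_play_disguise_results.csv", index=False)
  ```
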
- Once you have obtained `weighted_play_disguise_results.csv`, it can be used to test hypotheses and plot graphs based on the Disguise Score metric.
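
  For example, a quick look at the distribution of Disguise Scores might be done as sketched below (the column name is an assumption):

  ```python
  # Sketch: explore the Disguise Score distribution from the weighted results file.
  import matplotlib.pyplot as plt
  import pandas as pd

  results = pd.read_csv("data/metric/weighted_play_disguise_results.csv")

  plt.hist(results["weighted_disguise_score"], bins=30)  # assumed column name
  plt.xlabel("Disguise Score")
  plt.ylabel("Number of plays")
  plt.title("Distribution of per-play Disguise Scores")
  plt.show()
  ```
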
- This project uses MLflow for experiment tracking. If you're unfamiliar with MLflow, you can refer to the official documentation for more information on how to use the UI and manage experiments.
- The dataset used for this project comes from the NFL Big Data Bowl 2025. Ensure you have access to the data before proceeding with the steps above.
- You may adjust the hyperparameters based on experimentation and your preferences.
I'm available for any questions or clarifications. Feel free to reach out! 😄