This GitHub repository houses the codebase for a study investigating the impact of Swedish weather conditions, particularly heat, on dairy cows' milk production on Swedish farms. The study uses weather data sourced from Sveriges meteorologiska och hydrologiska institut (SMHI) and extensive dairy data from the Gigacow project at Sveriges lantbruksuniversitet (SLU).
This project studies the relationship between weather conditions and dairy cow milk production on Swedish farms. The motivation is the importance of understanding how varying temperatures, specifically heat, influence milk production. By combining data from weather and dairy sources, the study applies a range of mathematical and machine learning techniques. These methods, ranging from normalization techniques to modeling and statistical frameworks, enable an exploration of the dynamics between weather and milk yield. This GitHub repository serves as a hub for the codebase, providing a foundation for future studies. The report can be found HERE.
- Lena-Mari Tamminen
- Tomas Klingström
- Martin Johnsson
- Data preprocessing of dairy and weather data.
- Application of several statistical and machine learning methods.
HeatStressEvaluation (project-root)/
|-- Data/
|   |-- TheData.csv
|   |
|   |-- CowData/
|   |   |-- CowData_README.md
|   |   |-- GIGACOW/
|   |   |   |-- Cow_filtered.csv
|   |   |   |-- DiagnosisTreatment_filtered.csv
|   |   |   |-- Lactation_filtered.csv
|   |   |   |-- MilkYield_filtered.csv
|   |   |   |-- Robot_filtered.csv
|   |   |
|   |   |-- RawGIGACOW/
|   |       |-- Cow.csv
|   |       |-- DiagnosisTreatment.csv
|   |       |-- Lactation.csv
|   |       |-- MilkYield.csv
|   |       |-- Robot.csv
|   |
|   |-- WeatherData/
|       |-- WeatherData_README.md
|       |-- Coordinates/
|       |   |-- Coordinates.csv
|       |
|       |-- MESAN/
|       |   |-- processed_data_XXXX.csv
|       |   |   ...
|       |   |-- ...
|       |
|       |-- RawMESAN/
|           |-- XXXX_2022-2023.csv
|           |   ...
|           |-- ...
|
|-- DataPreprocessing/
|   |-- Preprocesses.py
|   |-- DataPreprocessing.ipynb
|
|-- Modeling/
|   |-- Bayesian.py
|   |-- BayesianGAM.ipynb
|   |-- BayesianLinear.ipynb
|   |-- DataExploration.ipynb
|   |-- DIMReduction.ipynb
|   |-- RandomForest.ipynb
|   |-- ShortBreedStudy.ipynb
|   |-- BoxPlots.ipynb
|
|-- README.md
|-- requirements.txt
Before running the scripts, complete the following steps. First, clone the repository:
git clone https://github.com/axeUUeng/HeatStressEvaluation.git
And then change into the project directory:
cd /path/to/HeatStressEvaluation
Replace /path/to/HeatStressEvaluation with the actual path to the HeatStressEvaluation project directory.
Follow one of the installation guides for conda. The Python version used by the authors is 3.10.13.
Then create the environment:
# Conda env installation command
conda create --name your_environment_name --file requirements.txt
If the creation of the environment doesn't work for some reason, the most important libraries are:
Numpy
requests
scikit-learn
Numba
matplotlib
Seaborn
Pandas
SciPy
Patsy
tqdm
statsmodels
UMAP (the umap-learn package)
itertools (part of the Python standard library, no installation needed)
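If the environment is put together by hand, it can help to check which of the libraries above are actually importable. The following is a minimal sketch and not part of the repository: the helper name `missing_modules` is hypothetical, and the import names are the usual ones for these packages (for example `sklearn` for scikit-learn and `umap` for the umap-learn package).

```python
from importlib.util import find_spec

# Import names for the libraries listed above. These are assumed to be the
# usual ones, e.g. scikit-learn is imported as "sklearn".
REQUIRED_MODULES = [
    "numpy", "requests", "sklearn", "numba", "matplotlib", "seaborn",
    "pandas", "scipy", "patsy", "tqdm", "statsmodels", "umap", "itertools",
]


def missing_modules(names):
    """Return the names in `names` that cannot be located for import."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules are importable.")
```

The check only locates each module without importing it, so it runs quickly and does not trigger heavy imports.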
Some datasets are necessary and should be placed in the "Data" folder according to the structure provided above. Ensure the availability of the following datasets and their correct placement:
- The Gigacow data from SLU in the Data/CowData/RawGIGACOW/ directory:
  - Cow.csv
  - DiagnosisTreatment.csv
  - Lactation.csv
  - MilkYield.csv
  - Robot.csv
- The MESAN data from SMHI in the Data/WeatherData/RawMESAN/ directory:
  - XXXX_2022-2023.csv
- The coordinate file in the Data/WeatherData/Coordinates/ directory:
  - Coordinates.csv
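The placement of the input files can be verified before running anything. The sketch below is an illustration and not part of the repository: the helper name `find_missing_inputs` is hypothetical, and matching the MESAN files with the glob `*_2022-2023.csv` is an assumption, since the meaning of `XXXX` is not spelled out here.

```python
from pathlib import Path

# Raw input files, relative to the project root, as listed above.
REQUIRED_FILES = [
    "Data/CowData/RawGIGACOW/Cow.csv",
    "Data/CowData/RawGIGACOW/DiagnosisTreatment.csv",
    "Data/CowData/RawGIGACOW/Lactation.csv",
    "Data/CowData/RawGIGACOW/MilkYield.csv",
    "Data/CowData/RawGIGACOW/Robot.csv",
    "Data/WeatherData/Coordinates/Coordinates.csv",
]

# MESAN files follow the XXXX_2022-2023.csv naming; the glob is an assumption.
MESAN_DIR = "Data/WeatherData/RawMESAN"
MESAN_PATTERN = "*_2022-2023.csv"


def find_missing_inputs(project_root):
    """Return the expected inputs that are absent under `project_root`."""
    root = Path(project_root)
    missing = [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
    mesan_dir = root / MESAN_DIR
    # At least one MESAN file must be present.
    if not (mesan_dir.is_dir() and any(mesan_dir.glob(MESAN_PATTERN))):
        missing.append(f"{MESAN_DIR}/{MESAN_PATTERN}")
    return missing


if __name__ == "__main__":
    # Run from the project root.
    for item in find_missing_inputs("."):
        print("Missing:", item)
```

An empty result means every listed file was found and at least one MESAN file matched the pattern.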
Run the two cells in DataPreprocessing/DataPreprocessing.ipynb. The resulting dataset, with milk records merged with weather data, is stored as Data/TheData.csv.
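After preprocessing, a quick look at the merged file can confirm that it was written as expected. This is a minimal sketch using only the standard library, not part of the repository; it assumes Data/TheData.csv is comma-separated, UTF-8 encoded, and has a header row, and the helper name `summarize_csv` is hypothetical.

```python
import csv
from pathlib import Path


def summarize_csv(path):
    """Return (column_names, number_of_data_rows) for a CSV with a header row."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])
        n_rows = sum(1 for _ in reader)
    return header, n_rows


if __name__ == "__main__":
    # Run from the project root.
    data_path = Path("Data/TheData.csv")
    if data_path.is_file():
        columns, n_rows = summarize_csv(data_path)
        print(f"{n_rows} rows, {len(columns)} columns")
        print(columns)
    else:
        print(f"{data_path} not found; run the preprocessing notebook first.")
```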
- DataExploration.ipynb - contains some initial exploration of the data, mainly focusing on the number of records for each farm.
- Bayesian.py - contains scripts and functions used in BayesianLinear.ipynb.
- BayesianLinear.ipynb - fits a linear combination of features to normalized daily total yield. The fit is done for one cow, for all cows on one farm, and as one model for one farm.
- BayesianGAM.ipynb - fits a GAM to either one farm or one cow.
- BoxPlots.ipynb - shows basic differences in temperature and yield, mainly for summer 2022 and summer 2023.
- DIMReduction.ipynb - short attempt at dimension reduction on the dataset.
- ShortBreedStudy.ipynb - short attempt at visualising differences between breeds in yield during HW=1 and HW=0.
- RandomForest.ipynb - applies normalization to yield and then uses random forests to find patterns in the data.