
A project to classify emotions like happiness, sadness, and anger from speech using MFCCs, machine learning models, and visualizations for audio features and model performance.


Speech Emotion Recognition using MFCC (Mel-Frequency Cepstral Coefficients)

Overview

Speech Emotion Recognition (SER) is the task of identifying human emotions from speech signals using machine learning. This repository demonstrates how to preprocess audio data, extract features, train models, and evaluate their performance for emotion classification.

Features

  • Audio Feature Extraction: Utilizes MFCCs (Mel-frequency cepstral coefficients) to extract relevant features from audio signals (a minimal sketch follows this list).
  • Emotion Classification: Predicts emotions such as happiness, sadness, anger, and neutrality.
  • Model Training and Evaluation: Implements machine learning models to classify emotions.
  • Visualization: Includes visualizations for audio signal characteristics and performance metrics.
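
As a minimal sketch of the feature-extraction step (assuming numpy and librosa are installed; the function name and audio file below are illustrative, not the repository's actual code):

import numpy as np
import librosa

def extract_mfcc(path, n_mfcc=40):
    # Load the audio file; librosa resamples to 22,050 Hz by default
    y, sr = librosa.load(path)
    # Compute the MFCC matrix (n_mfcc coefficients x time frames)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    # Average over time to get one fixed-length feature vector per clip
    return np.mean(mfcc.T, axis=0)

features = extract_mfcc("example.wav")  # hypothetical file
print(features.shape)  # (40,)

Averaging the MFCCs over time is a common simplification that turns variable-length clips into fixed-length vectors suitable for classical classifiers.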

Datasets

The project uses the following datasets:

  1. RAVDESS: Ryerson Audio-Visual Database of Emotional Speech and Song. Contains audio recordings of speech and song performed with different emotions.
  2. CREMA-D: Crowd-sourced Emotional Multimodal Actors Dataset. Includes audio-visual recordings with emotional expressions.
  3. TESS: Toronto Emotional Speech Set. Features emotional speech recorded by two female speakers.
  4. SAVEE: Surrey Audio-Visual Expressed Emotion. Comprises audio recordings of speech expressing different emotions by four male speakers.

Please download these datasets from their official sources and place them in the appropriate folder structure.
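
Each dataset encodes emotion labels differently. RAVDESS, for instance, stores the emotion as the third field of each filename (e.g., 03-01-05-01-01-01-01.wav, where 05 means angry). A minimal sketch of turning RAVDESS filenames into labels (the helper name is illustrative, not the repository's actual code):

import os

# RAVDESS emotion codes (third hyphen-separated field of each filename)
RAVDESS_EMOTIONS = {
    "01": "neutral", "02": "calm", "03": "happy", "04": "sad",
    "05": "angry", "06": "fearful", "07": "disgust", "08": "surprised",
}

def ravdess_label(filename):
    # e.g. "03-01-05-01-01-01-01.wav" -> "angry"
    code = os.path.basename(filename).split("-")[2]
    return RAVDESS_EMOTIONS[code]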

Prerequisites

  • Python 3.7+
  • Required Libraries:
    • numpy: For numerical operations and handling arrays.
    • librosa: For audio processing and feature extraction.
    • matplotlib: For creating visualizations.
    • scikit-learn: For implementing machine learning models and evaluation metrics.
    • seaborn: For enhanced data visualization.

Install the required libraries using the following command:

pip install numpy librosa matplotlib scikit-learn seaborn
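
Once the dependencies are installed, the training-and-evaluation step might look like the sketch below, here using a scikit-learn MLP on precomputed MFCC vectors with a seaborn confusion-matrix plot. The synthetic data and variable names are placeholders standing in for the repository's actual pipeline:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
import matplotlib.pyplot as plt

# Placeholder data: in practice X holds one MFCC vector per clip
# (e.g. from extract_mfcc above) and y holds the emotion labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = rng.choice(["happy", "sad", "angry", "neutral"], size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Train a small multilayer perceptron classifier
clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500, random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)

# Per-class precision, recall, and F1
print(classification_report(y_test, y_pred))

# Visualize the confusion matrix with seaborn
labels = sorted(set(y))
cm = confusion_matrix(y_test, y_pred, labels=labels)
sns.heatmap(cm, annot=True, fmt="d", xticklabels=labels, yticklabels=labels)
plt.xlabel("Predicted")
plt.ylabel("True")
plt.show()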

Contribution

Contributions are welcome! If you find any issues or have suggestions for improvement, feel free to open an issue or submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgments

This project draws inspiration from various open-source emotion recognition research.
It also builds on a Kaggle notebook on Speech Emotion Recognition.