This is a Python project that transforms sentiment analysis of audio into MQTT messages.
It uses the pywhispercpp package to run the Whisper model for speech-to-text and the Hugging Face package to run the text-classification model. The output is streamed to an MQTT broker.
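The three stages described above (speech-to-text, text classification, MQTT publishing) can be sketched roughly as follows. This is a minimal illustration and not the project's actual `main.py`: it assumes a pre-recorded WAV file instead of a live microphone stream, assumes `paho-mqtt` as the MQTT client, and the JSON payload layout and the function names `build_payload` and `analyse_file` are hypothetical.

```python
import json


def build_payload(text, label, score):
    """Serialise one classified sentence as JSON (hypothetical payload layout)."""
    return json.dumps({"text": text, "label": label, "score": float(score)})


def analyse_file(wav_path,
                 whisper_model="base.en",
                 classifier_model="j-hartmann/emotion-english-distilroberta-base",
                 mqtt_host="test.mosquitto.org",
                 base_topic="sentiment_analysis_base_topic"):
    """Transcribe a WAV file, classify each segment, and publish the results."""
    # Third-party imports stay local so build_payload works without them.
    from pywhispercpp.model import Model
    from transformers import pipeline
    import paho.mqtt.publish as publish  # assumption: paho-mqtt is the client

    whisper = Model(whisper_model)
    classifier = pipeline("text-classification", model=classifier_model)

    for segment in whisper.transcribe(wav_path):
        text = segment.text.strip()
        if not text:
            continue  # skip empty segments
        top = classifier(text)[0]  # highest-scoring label for this text
        payload = build_payload(text, top["label"], top["score"])
        publish.single(base_topic, payload, hostname=mqtt_host)
```

Calling `analyse_file("recording.wav")` would publish one message per transcribed segment to the base topic on the default broker; the file name here is only a placeholder.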
Before we can start, we need to install ffmpeg:
# on Ubuntu or Debian
sudo apt update && sudo apt install ffmpeg
# on Arch Linux
sudo pacman -S ffmpeg
# on macOS using Homebrew (https://brew.sh/)
brew install ffmpeg
# on Windows using Chocolatey (https://chocolatey.org/)
choco install ffmpeg
# on Windows using Scoop (https://scoop.sh/)
scoop install ffmpeg
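To confirm the installation from Python before starting the application, a small check like the one below may help. It is a sketch that only uses the standard library; the helper names `find_executable` and `ffmpeg_version` are illustrative and not part of this project.

```python
import shutil
import subprocess


def find_executable(name):
    """Return the full path of an executable on PATH, or None if it is missing."""
    return shutil.which(name)


def ffmpeg_version():
    """Return the first line printed by `ffmpeg -version`, or raise if ffmpeg is absent."""
    path = find_executable("ffmpeg")
    if path is None:
        raise RuntimeError("ffmpeg was not found on PATH; install it first.")
    result = subprocess.run([path, "-version"],
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()[0]
```

Running `print(ffmpeg_version())` after installing should print a version banner; a `RuntimeError` means the binary is not on your PATH.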
The application is intended to be run as a standalone application, as this requires the least effort to get started.
However, it is possible to run the application in a docker container to avoid installing the required packages on your host machine.
Install the dependencies:
pip install -r requirements.txt
pip install pywhispercpp
Run the application:
python main.py
or
python3 main.py
It is possible to run this application in a docker container.
Because we need to stream the audio from the host machine to the docker container, we need to install PulseAudio on the host machine and run the PulseAudio server.
Below we will make the distinction between the host machine and the docker container.
The host machine is your current device (e.g. your laptop or Raspberry Pi) and the docker container is the container that runs the sentiment analysis.
TODO: add instructions for running on raspberry pi
- Install PulseAudio on your MacBook by running:
./install-pulseaudio-for-mac.sh
- Get the current IP address of your MacBook:
ipconfig getifaddr en0
The output should look something like
192.168.2.0
Use this value to replace <HOST> in the PULSE_SERVER variable below.
⚠️ Your IP address differs for each network you connect to.
- Run the docker container using your IP address.
The docker container needs to be run with the --net=host and --privileged flags to be able to connect to the PulseAudio server on the host machine.
Make sure to always update the PULSE_SERVER environment variable to the correct IP address when changing locations/networks.
- You can either build the container yourself:
docker build -f Dockerfile -t whisper .
docker run --net=host --privileged -e PULSE_SERVER=<HOST> whisper
- or use the pre-built container from Docker Hub:
docker run --net=host --privileged -e PULSE_SERVER=<HOST> xiduzo/whisper-sentiment-analysis:latest
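Because PULSE_SERVER has to change with every network, it can be convenient to generate the `docker run` command rather than edit it by hand. The sketch below assembles the same command shown above. The `local_ip` helper is an assumption: it is a generic best-effort way to guess the LAN address and may not return the same value as `ipconfig getifaddr en0` on machines with several network interfaces.

```python
import shlex
import socket


def local_ip():
    """Best-effort guess of this machine's LAN address (no packets are sent)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Connecting a UDP socket only selects a route; 192.0.2.1 is a
        # documentation-only address and is never actually contacted.
        sock.connect(("192.0.2.1", 9))
        return sock.getsockname()[0]


def docker_run_command(host, image="xiduzo/whisper-sentiment-analysis:latest"):
    """Build the docker run argument list with PULSE_SERVER set to `host`."""
    return ["docker", "run", "--net=host", "--privileged",
            "-e", f"PULSE_SERVER={host}", image]


def as_shell(command):
    """Render the argument list as a copy-pasteable shell line."""
    return shlex.join(command)
```

For example, `print(as_shell(docker_run_command(local_ip())))` prints a command you can paste into a terminal; pass `image="whisper"` to use a locally built image instead.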
Variable | Description | Default value |
---|---|---|
WHISPER_MODEL | The Whisper model | base.en |
PULSE_SERVER * | The IP address of the host machine running the PulseAudio server | - |
TEXT_CLASSIFICATION_MODEL | The text-classification model | j-hartmann/emotion-english-distilroberta-base |
MQTT_HOST | The host of the MQTT broker | test.mosquitto.org |
MQTT_USER | The username, when using an MQTT broker that requires authentication | None |
MQTT_PWD | The password, when using an MQTT broker that requires authentication | None |
MQTT_BASE_TOPIC | The base topic for the MQTT messages | sentiment_analysis_base_topic |
* Required when running in docker
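As an illustration of how these variables and their defaults fit together, the sketch below reads them from the environment. It is not taken from the project's source: the names `DEFAULTS` and `load_settings` and the `in_docker` flag are hypothetical, and the defaults are copied from the table above.

```python
import os

# Default values as listed in the table above.
DEFAULTS = {
    "WHISPER_MODEL": "base.en",
    "TEXT_CLASSIFICATION_MODEL": "j-hartmann/emotion-english-distilroberta-base",
    "MQTT_HOST": "test.mosquitto.org",
    "MQTT_USER": None,
    "MQTT_PWD": None,
    "MQTT_BASE_TOPIC": "sentiment_analysis_base_topic",
}


def load_settings(environ=None, in_docker=False):
    """Merge environment variables over the defaults.

    PULSE_SERVER has no default and is only mandatory when in_docker is True.
    """
    environ = os.environ if environ is None else environ
    settings = {name: environ.get(name, default)
                for name, default in DEFAULTS.items()}
    settings["PULSE_SERVER"] = environ.get("PULSE_SERVER")
    if in_docker and not settings["PULSE_SERVER"]:
        raise ValueError("PULSE_SERVER must be set when running in docker")
    return settings
```

Passing a plain dictionary instead of `os.environ` makes the function easy to try out, for example `load_settings({"MQTT_HOST": "localhost"})`.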
Volumes (Optional)
Volume | Maps to | Description | Default value |
---|---|---|---|
~/.config/pulse | /root/.config/pulse | The PulseAudio configuration files. Only used when running the application in a docker container | - |
Check if there is a connection between the docker-container and the host machine by running the following command on the host machine:
netstat -an | grep 4713
The output should look something like:
tcp4 0 0 <HOST>.4713 <HOST>.<PORT> ESTABLISHED
tcp4 0 0 <HOST>.<PORT> <HOST>.4713 ESTABLISHED
tcp4 0 0 *.4713 *.* LISTEN
tcp6 0 0 *.4713 *.* LISTEN
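The LISTEN lines above indicate that the PulseAudio server accepts TCP connections on port 4713. The same can be probed from Python with a plain TCP connection attempt. This is a sketch using only the standard library; `port_is_open` is an illustrative name, and a successful connection only shows that something is listening on the port, not that audio is actually flowing.

```python
import socket


def port_is_open(host, port=4713, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `port_is_open("192.168.2.0")` checks the host address used earlier; replace it with your own IP address.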
⚠️ Whenever you attach a new audio device to your host machine, you need to reconfigure the input and output devices.
⚠️ Whenever you restart your host machine, you need to reconfigure the input and output devices.
PulseAudio will stream audio from the host machine to the docker container. However, you need to manually configure which input and output devices to use.
Command | Description |
---|---|
`pactl list` | List all sinks and sources |
Command | Description |
---|---|
`pacmd list-sources \| grep -e 'index:' -e device.string -e 'name:'` | List all available input devices |
`pacmd set-default-source <INDEX>` | Set temporary input device |
Example output of listing input devices:
index: 0
name: <Channel_1__Channel_2.monitor>
device.string = "External screen microphone"
index: 1
name: <Channel_1>
device.string = "USB Audio Device"
* index: 2
name: <Channel_1.2>
device.string = "MacBook Pro Microphone"
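In the listing above, the line starting with `*` marks the current default device. If you want to pick an index programmatically, the filtered output can be parsed as in the sketch below. It assumes the output has exactly the shape shown above (after the `grep` filter); `parse_devices` and `default_device` are illustrative names.

```python
import re


def parse_devices(output):
    """Parse filtered `pacmd list-sources` or `list-sinks` output into dicts."""
    devices = []
    for raw_line in output.splitlines():
        line = raw_line.strip()
        match = re.match(r"(\*\s*)?index:\s*(\d+)", line)
        if match:
            # A leading '*' marks the current default device.
            devices.append({"index": int(match.group(2)),
                            "default": bool(match.group(1))})
        elif devices and line.startswith("name:"):
            devices[-1]["name"] = line.split(":", 1)[1].strip().strip("<>")
        elif devices and line.startswith("device.string"):
            devices[-1]["description"] = line.split("=", 1)[1].strip().strip('"')
    return devices


def default_device(devices):
    """Return the device flagged as default, or None if there is none."""
    return next((device for device in devices if device["default"]), None)
```

Feeding the example output above into `parse_devices` and then `default_device` yields the entry with index 2, the MacBook Pro Microphone. Sinks have no `device.string` line in the filtered output, so their entries simply lack a `description` key.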
Command | Description |
---|---|
`pacmd list-sinks \| grep -e 'index:' -e 'name:'` | List all available output devices |
`pacmd set-default-sink <INDEX>` | Set temporary output device |
Example output of listing output devices:
* index: 0
name: <Channel_1__Channel_2>
index: 1
name: <Front_Left__Front_Right.2>
index: 2
name: <1__2>
Read this blog post for some more useful examples.
To validate that the audio streaming is working properly, you can try to play audio from the docker container to the host machine.
Run the following commands in order:
Command | Description |
---|---|
`docker ps` | List running containers and find the CONTAINER ID of the one whose name includes whisper |
`docker exec -it <CONTAINER_ID> bash` | Get into the sentiment analysis container |
`ls /usr/share/sounds/alsa/` | List available sounds; this should be a list of .wav files |
`paplay /usr/share/sounds/alsa/<SOUND>` | Play the sound; it should be heard on the host machine's output device |
`exit` | Exit the container |