My first foray into EdgeAI: An Urban Noise Monitor using Raspberry Pi
Ever since I encountered the formidable Balena IoT community, I have been intrigued and excited by machine learning (ML) on tiny low-powered devices: so-called TinyML, or artificial intelligence (AI) on the edge, EdgeAI. These are the buzzwords slowly taking over our ever-growing connected world. The tiny devices with amazing capabilities that we carry with us are astounding, from rings on our fingers to the powerful smartphones and tablets that make PCs from a couple of years ago look pitiful. With this in mind, I decided to experiment with two assets I had: one, a tiny device lying around, a Raspberry Pi, and two, the machine-learning skills I have acquired over the past years.
The first rule in ML model deployment (or any software/algorithm development AFAIK) is: don't reinvent the wheel. Hence, I decided to test my skills with the project (in the wonderful balenaCloud) that I found most attractive: Sound Analysis using AI. The project uses single-board computers like the Raspberry Pi, a microphone, and an optional Google Coral TPU to classify common urban noise categories encountered in a city, ranging from children playing to gunshots (ahem.. USA.. ahem). The ML model used in this project was trained on the UrbanSound8k dataset from Urban Sound Datasets, hence the different sound categories come from this set.
One could always deploy balena's model using balenaCloud with a single click. However, there is no fun or challenge in doing that. Therefore, inspired by the project, I decided to implement an ML pipeline from start to finish that achieves urban noise classification. The goal was not to obtain the perfect urban noise classifier, but to study the different steps involved in the process and see the classification in action, in real time. I broke the process down into the list below:
1. Train an ML model on a standard dataset.
2. Compile the trained model to run on a device like Raspberry Pi.
3. Record audio on the Raspberry Pi in real time and classify it using the above model.
4. Visualize the classification results real-time (remotely).
The scripts used in the entire project can be found here: https://github.com/mabhijithn/urbannoise-edgeai.
There are plenty of ML models out there that have been built for the Urban Sound dataset, with >200 papers since 2020 alone. However, I went for nothing fancy: I wanted a quick and efficient way to build a model and deploy it on the Raspberry Pi. Inspired by (and mostly based on) this excellent GitHub repo, I began to create a training pipeline for the sound classification model. First, I used the feature extraction method used here, which converts a sound file into a feature vector. This feature vector contains short-time Fourier transform (STFT), Mel-spectrogram, Mel-frequency cepstral coefficient (MFCC), and chroma-STFT components, some of the standard frequency (and log-frequency) domain audio features widely used in sound classification problems. A fully connected neural network (NN) with two dense layers was built using TensorFlow 2.4 to classify the 10 categories of sound in the UrbanSound8k dataset. Using a normal 70–20–10 train-validation-test split, the model was trained, validated, and tested on the UrbanSound8k dataset and achieved ~90% accuracy. We will save this model and keep it aside for some further processing before deploying. train_nn.py is the script that extracts features from the UrbanSound8k dataset and trains a model on them.
The edge device I will use to run the trained classification model is a Raspberry Pi. I have no intention of training NNs on this device; it will only run the models. Hence, it makes more sense to install just the TensorFlow Lite runtime package, tflite_runtime, on the Pi. The tflite_runtime package consists of just the interpreter required to run TensorFlow Lite models (note the Lite in the name: the model we trained earlier is not a Lite model). How to install tflite_runtime on a Pi is described here. As mentioned before, we will now be dealing with TensorFlow Lite models to run our sound classification on the Pi. These models are built to be small, portable, and executable on embedded and mobile devices. We will convert our saved NN model to a Lite model using the TensorFlow Lite converter; this model can then be executed by the tflite_runtime package installed on the Raspberry Pi.
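The conversion itself is short with the TFLiteConverter API. A minimal sketch: the stand-in model and the file name urban_model.tflite are placeholders, and in practice you would load the model trained earlier instead.

```python
import tensorflow as tf

# In practice, load the model trained earlier, e.g.:
#   model = tf.keras.models.load_model("urban_model")  # hypothetical path
# A stand-in model with the same output shape keeps this sketch runnable.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(256, activation="relu", input_shape=(180,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Convert the Keras model to a TensorFlow Lite flatbuffer
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()

# Write the flatbuffer to disk; copy this file to the Pi
with open("urban_model.tflite", "wb") as f:
    f.write(tflite_model)
```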
Recording Audio on Raspberry Pi
We have now trained our NN model to classify sounds into one of the 10 categories of the UrbanSound8k dataset. Next, we need to feed this model sounds so that it can classify them in real time. This requires recording audio on the Raspberry Pi. I connected a USB microphone to my Raspberry Pi (4B with 2GB of RAM) running the latest Raspberry Pi OS and placed it outside my apartment window, as shown in the picture below.
I used the pyaudio Python package to continuously record audio, along with the audioop package to extract some features from the raw audio. The latter is used to compute the power in the recorded audio, which serves three purposes: estimating the ambient background urban noise, the audio power during 'silent' periods, and the urban-noise power of each recording during the day relative to the ambient noise (in dB). The final computed noise power of each recording is compared against a manually set threshold to decide whether the recording is indeed urban noise. If so, the audio is saved as a WAV file for classification. The code can be found in the sound_record.py script in the GitHub repository.
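The record-then-threshold loop could look something like the sketch below. This is not the actual sound_record.py; the baseline RMS and threshold values are made-up placeholders you would calibrate against your own 'silent' period.

```python
import audioop
import math
import wave

def noise_power_db(audio_bytes, silent_rms, width=2):
    """Urban-noise power of a 16-bit mono recording, in dB over the
    'silent period' baseline (assumed measured at night)."""
    rms = audioop.rms(audio_bytes, width)
    return 20 * math.log10(max(rms, 1) / silent_rms)

def record_clip(seconds=4, rate=44100, chunk=1024, device_index=None):
    """Record one clip from the USB microphone (requires pyaudio and a mic)."""
    import pyaudio
    pa = pyaudio.PyAudio()
    stream = pa.open(format=pyaudio.paInt16, channels=1, rate=rate, input=True,
                     frames_per_buffer=chunk, input_device_index=device_index)
    frames = [stream.read(chunk) for _ in range(int(rate / chunk * seconds))]
    stream.stop_stream(); stream.close(); pa.terminate()
    return b"".join(frames)

if __name__ == "__main__":
    SILENT_RMS = 50.0      # baseline RMS from a quiet night recording (placeholder)
    THRESHOLD_DB = 10.0    # manually set detection threshold (placeholder)
    audio = record_clip()
    # Save the clip only if it is loud enough relative to the silent baseline
    if noise_power_db(audio, SILENT_RMS) > THRESHOLD_DB:
        with wave.open("clip.wav", "wb") as wf:
            wf.setnchannels(1); wf.setsampwidth(2); wf.setframerate(44100)
            wf.writeframes(audio)
```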
Note: You may have to find the device ID of your USB microphone to make the above code work out of the box; plenty of tutorials cover this.
Classification and visualization of results
At this stage, we have a Python script that continuously records audio and calculates the urban noise power of each recording. If the urban noise exceeds a pre-defined threshold, the recorded audio is saved for classification. We had already converted the NN model to its TensorFlow Lite version to classify on the Raspberry Pi. The classify.py script can be run to classify the recorded audio using the TensorFlow Lite model. I have also uploaded the Lite model to the GitHub repository so that you can easily get started. Of course, you can train, fine-tune, and update your own model and use it instead.
We have built an NN model using TensorFlow to classify sounds, recorded urban noises, and used the Lite version of TensorFlow to classify those noises. But how do we visualize the results? I am partial to Dash and Plotly, so I will use them to spin up a Python web server on the Raspberry Pi and view the results from the local network.
While my script classifies each recorded urban noise, usually 3–4 seconds long, like engine idling or children playing, it also continuously monitors the background urban noise power over a longer duration. This background urban noise power, the average audio power over 10–15 minutes, is compared against the average audio power during a 'silent' period, which was recorded at night, around 11 PM–12 AM. I therefore now have both the individual noise classification results and the background urban noise calculated every 10–15 minutes. The script writes these into two CSV files respectively (you can observe this in the classify.py script). I have written a simple web app, which can be found in index.py in the GitHub repository. Running this script with the command python3 index.py will spin up a Python web app at http://0.0.0.0:5000. This page can be opened directly on the Raspberry Pi, provided the Pi is connected to a monitor or accessed via VNC viewer. Otherwise, you can view the page from the browser of any device connected to the same network (WiFi or LAN) as the Raspberry Pi. Assume the Pi has the network address 192.168.0.10: by visiting http://192.168.0.10:5000 in the browser, you can observe the results.
The visualization results of our Urban Noise Monitor can be seen in the first figure at the beginning of this article. The webpage lists all the audio files that have been classified in a table, along with the top 2 predictions (and the output 'probability' of each prediction, called the Certainty). Above this table, a bar plot visualizes the background urban noise power at 15-minute intervals. This noise power, calculated in dB, is compared to the urban noise at night; one can see that the urban noise is at least 10 dB higher during the daytime. The classification results might not be as intuitive as you expect: apart from engine idling or children playing, I was hard-pressed to find the rest of the categories from the urban sound dataset in my neighborhood. However, you can re-train your network on newly collected data with new labels, update your NN model, and get a better classification of the sounds! This requires no change to the setup apart from adding the new labels, re-training, and using the updated model file for classification.
This was my first foray into EdgeAI and TinyML. The system is far from perfect. The network itself was crudely developed, as basic as it gets. The scripts were adapted to my needs from off-the-shelf projects. However, in its entirety, it was a fun exercise and a great peek into the possibilities in this area.