Quinn Mair

Signal Processing · Algorithm Development

Audio Note Recognition System

A MATLAB system to identify and label individual notes in an input audio file.

MATLAB · Fast Fourier Transform · Frequency Domain Analysis · Data Parsing
MATLAB windowing output graph

Abstract

In this project, I set out to design a system that identifies individual musical notes in a larger audio file. The system parses an input audio file into individual notes, then analyzes the frequency content of each note to find its dominant frequency. That dominant frequency determines the pitch and octave reported for the note. I tested the system on audio files of varying types and lengths, with varying success. With smaller audio inputs, the system correctly identified the pitch and octave of every note window.

To identify the musical notes in a larger audio file, I created a MATLAB program that reads the file, parses it into individual notes, converts each note to the frequency domain, and uses the dominant frequency to identify the corresponding musical note. Throughout the program, I added graphing functions to troubleshoot individual functions and better visualize the recognition process. The code uses the audioread command to get an array of samples along with the file's sample rate, then passes these samples through the rest of the pipeline.

A basic overview of the process is outlined in the block diagram below:

Algorithm Overview
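The first stage of the overview, reading the file, can be sketched as follows. This is a minimal illustration rather than the project's code: it writes a synthetic A4 tone to a temporary file so that audioread has something to read. The temporary file name, the 8 kHz sample rate, and the stereo-to-mono mixdown are all assumptions made for this example.

```matlab
% Build a one-second stereo test tone (assumed stand-in for a real recording).
fs   = 8000;                           % assumed sample rate for this demo
t    = (0:fs-1)' / fs;                 % one second of time stamps
tone = 0.5 * sin(2*pi*440*t);          % A4 at half amplitude
demoFile = [tempname() '.wav'];        % hypothetical temporary file
audiowrite(demoFile, [tone tone], fs); % two identical channels

% audioread returns the sample matrix (one column per channel)
% together with the sample rate stored in the file.
[samples, sampleRate] = audioread(demoFile);

% Assumption: average the channels so later stages see one column.
mono = mean(samples, 2);

delete(demoFile);                      % clean up the temporary file
```

Every later stage only needs `mono` and `sampleRate`, so the rest of the process does not depend on how the file was produced.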

Results

For most of the debugging and testing of my audio recognition system, I used a three-second clip of the Star Wars Cantina Song republished by the University of Illinois. The original waveform of this audio file is shown on the left of Figure 1 below, with amplitude plotted against time on the x-axis, which makes individual notes easy to see. This file was then parsed using the previously described parsing function. With an upper threshold at 25% and a lower threshold at 15% of the maximum amplitude, the signal is separated into distinct notes, illustrated on the right side of Figure 1.

Audio note parsing results
Figure 1: Note parsing for input audio signal.
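The two-threshold parsing described above can be sketched roughly as follows. The source gives only the 25% and 15% levels, so this example assumes a hysteresis rule: a note opens when the envelope rises above the upper threshold and closes when it falls below the lower one. The moving-average envelope, the test signal, and all variable names are illustrative assumptions, not the project's parsing function.

```matlab
% Synthetic input: two notes separated by silence (assumed test data).
fs   = 8000;
t    = (0:round(0.2*fs)-1)' / fs;
note = sin(2*pi*440*t);
gap  = zeros(round(0.1*fs), 1);
signal = [gap; note; gap; 0.6*note; gap];

% Assumption: smooth |signal| with a 10 ms moving average so the
% thresholds see the note envelope rather than each oscillation.
w   = round(0.01 * fs);
env = conv(abs(signal), ones(w, 1) / w, 'same');

upper = 0.25 * max(env);   % a note starts above this level
lower = 0.15 * max(env);   % a note ends below this level

starts = [];
stops  = [];
inNote = false;
for n = 1:numel(env)
    if ~inNote && env(n) >= upper
        inNote = true;
        starts(end+1) = n;          % first sample of the note window
    elseif inNote && env(n) < lower
        inNote = false;
        stops(end+1) = n - 1;       % last sample of the note window
    end
end
if inNote                           % close a window still open at the end
    stops(end+1) = numel(env);
end
```

Using two levels instead of one keeps a note from being split when its envelope briefly dips between the thresholds.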

For this file, my program identified 13 separate windows: 12 distinct notes and one empty edge-case window produced at the end of the loop. For each window, I used the MATLAB fft function to calculate the frequency content. Plotting each spectrum in a subplot alongside its audio window produced figures like those below for every note.

Figure 2: Time vs. frequency of note windows.
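As a rough illustration of the per-window frequency analysis, the sketch below takes the FFT of one synthetic note window and reads off the frequency of the largest peak. The window contents, its length, and the variable names are assumptions for this example. Only the one-sided half of the spectrum is kept, since the other half mirrors it for real-valued audio.

```matlab
% One synthetic note window: a 440 Hz fundamental plus a weaker overtone.
fs = 8000;
N  = 2000;                                  % 0.25 s of samples
t  = (0:N-1)' / fs;
noteWindow = sin(2*pi*440*t) + 0.3*sin(2*pi*880*t);

% Magnitude spectrum; keep bins 1 .. N/2+1 (0 Hz up to fs/2).
spectrum  = abs(fft(noteWindow));
half      = (1:floor(N/2)+1)';
magnitude = spectrum(half);

% MATLAB indices are 1-based, so bin k sits at (k-1)*fs/N Hz.
freqAxis = (half - 1) * fs / N;

% Dominant frequency = location of the largest magnitude.
[~, peakBin] = max(magnitude);
dominantHz   = freqAxis(peakBin);
```

With these values the bin spacing is fs/N = 4 Hz, which bounds how precisely a dominant frequency can be located without interpolation.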

After storing each window's frequency content, the program identified the dominant frequency (the one with the largest magnitude) for each window and recorded it. Following another band-pass filter to remove inaudible noise, the findOctave and findNote functions labeled the octave and note of each window based on established frequency ranges. The final output of the program for this three-second clip was twelve labeled notes, as shown below.

Figure 3: Program Output for 3 Second Version of "Star Wars Cantina Song".
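The labeling step can be illustrated with a short sketch. The project's findOctave and findNote functions use lookup ranges that are not shown here, so this example swaps in a different but equivalent-in-spirit technique: the standard equal-temperament relation between frequency and MIDI note number, with A4 = 440 Hz. The note-name table, the sample frequencies, and the variable names are assumptions.

```matlab
% Sharps-only note names, indexed from C (assumed naming convention).
noteNames = {'C','C#','D','D#','E','F','F#','G','G#','A','A#','B'};

% A few dominant frequencies to label (illustrative values in Hz).
freqs  = [261.63 440 466.16 880];
labels = cell(1, numel(freqs));

for k = 1:numel(freqs)
    % Nearest equal-temperament semitone, as a MIDI number (A4 = 69).
    midi = round(69 + 12 * log2(freqs(k) / 440));
    name   = noteNames{mod(midi, 12) + 1};   % pitch class
    octave = floor(midi / 12) - 1;           % scientific pitch octave
    labels{k} = sprintf('%s%d', name, octave);
end
```

Rounding to the nearest semitone plays the same role as a frequency-range table: each note owns the band of frequencies closer to it than to its neighbors.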

As demonstrated in the figure, the audio recognition system correctly identified the pitch and octave of each audio window's dominant frequency. These results closely match the sheet music for the same song, with variations most often occurring for notes with many conflicting instruments or significant background noise. While these results demonstrate that the program is highly effective with short audio samples, they also highlight potential shortcomings when handling larger files.

Working with larger files

Note Windows for Jingle Bell Rock.

While the process behind this note recognition program works well for smaller audio files, substantial indexing and memory issues in MATLAB keep it from being as effective with larger data sets. Even with properly chosen parsing thresholds, larger audio files require significantly more note windows. The accompanying figure illustrates the result of the note parsing function for "Jingle Bell Rock," which is just shy of three minutes long. This file requires over 200 note windows and, unsurprisingly, leads to indexing and memory issues, particularly in the recursive parts of the program.

With better edge-case handling for recursive indexing and a larger memory allotment, the process outlined in this program could be adapted to better suit larger files, albeit with a proportional increase in required memory. Moreover, finer tuning of the parsing thresholds for larger files would make the program more efficient by keeping the dimensions of the stored data consistent. Without these adaptations, the program requires excessive memory and raises several errors when working with large files.
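One possible shape for that adaptation is sketched below. It is an assumption about how the windows could be handled, not a description of the project's code: window boundaries are stored as index pairs in one matrix, the result vector is preallocated, and a plain loop visits each window in turn while skipping the empty edge-case window. The test signal and names are illustrative.

```matlab
% Three back-to-back test tones (assumed stand-in for a parsed recording).
fs = 8000;
n  = 2000;
t  = (0:n-1)' / fs;
signal = [sin(2*pi*220*t); sin(2*pi*440*t); sin(2*pi*880*t)];

% One row per window: [firstSample lastSample]. The last row is an
% empty window, like the edge case left over at the end of parsing.
bounds = [1 2000; 2001 4000; 4001 6000; 6001 6000];

numWindows = size(bounds, 1);
dominantHz = nan(numWindows, 1);       % preallocated, NaN = no note

for k = 1:numWindows
    first = bounds(k, 1);
    last  = bounds(k, 2);
    if last < first
        continue;                      % skip the empty window
    end
    segment = signal(first:last);      % only one window in memory at a time
    len = numel(segment);
    mag = abs(fft(segment));
    mag = mag(1:floor(len/2) + 1);     % one-sided spectrum
    [~, peak] = max(mag);
    dominantHz(k) = (peak - 1) * fs / len;
end
```

Storing two indices per window instead of a copy of its samples keeps the bookkeeping small even when a file produces hundreds of windows.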

Conclusion

Using MATLAB's built-in fast Fourier transform function for discrete signals in conjunction with a note parsing function, my program was able to identify individual musical notes in a small audio file very effectively. Despite initially appearing straightforward, this project was complicated by complex, changing data types and structures resulting from the Fourier transform and repeated loop operations. The most difficult parts of the project came from troubleshooting algorithmic issues with frequency conversions and working with arrays and vectors to store note windows at various stages of processing. For example, much of the debugging stage was dedicated to fixing array index errors resulting from Hz-to-index conversion immediately after the FFT process, and the subsequent edge-case issues raised by looping.
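The Hz-to-index conversion mentioned above is a common source of off-by-one errors, because MATLAB arrays start at index 1 while FFT bin 0 corresponds to 0 Hz. The sketch below shows one way to write the conversion with clamping so that out-of-range frequencies cannot produce an invalid index. The helper names hzToIndex and indexToHz, the FFT length, and the sample rate are hypothetical choices for this example.

```matlab
fs = 44100;     % assumed sample rate
N  = 4096;      % assumed FFT length

% Frequency in Hz -> 1-based index into the one-sided spectrum,
% clamped to the valid range 1 .. N/2+1.
hzToIndex = @(hz) min(max(round(hz * N / fs) + 1, 1), floor(N/2) + 1);

% 1-based index -> center frequency of that bin in Hz.
indexToHz = @(idx) (idx - 1) * fs / N;

% Example: bin range covering roughly the audible band (20 Hz to 20 kHz).
lowBin  = hzToIndex(20);
highBin = hzToIndex(20000);
```

Converting a frequency to an index and back lands within half a bin width, fs/(2N), of the original frequency, which is a quick sanity check for this kind of helper.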

If given more time on the project, I would improve the data storage and iteration techniques used throughout the program by adding helper methods to clean up the file. If I were to redo the project, I would reevaluate my strategy for choosing note parsing threshold values. By incorporating an embedded function that evaluates how far the maximum amplitude of the input file deviates as an outlier, and setting the thresholds accordingly, I could ensure that the note windows correspond more closely to the notes in the input file. These changes would also help the program handle larger data sets and reduce overall runtime.
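One way such an outlier-aware threshold might look is sketched below. This is only an illustration of the idea, with assumed data and names: instead of scaling the thresholds from the single largest sample, it scales them from a high percentile of the envelope, computed by sorting so that no toolbox function is needed.

```matlab
% Assumed envelope: steady note level of 0.2 plus a few loud spikes.
env = [0.2 * ones(995, 1); 5 * ones(5, 1)];

% Thresholds scaled from the raw maximum are dominated by the spikes.
naiveUpper = 0.25 * max(env);

% Assumption: use the 99th percentile (nearest-rank) as the reference
% level so a handful of outlier samples cannot set the scale.
sorted   = sort(env);
rank     = ceil(0.99 * numel(sorted));
refLevel = sorted(rank);

upper = 0.25 * refLevel;   % same 25% / 15% ratios as before
lower = 0.15 * refLevel;
```

In this example the naive upper threshold sits above every ordinary note sample, so no note would ever open, while the percentile-based thresholds stay below the steady note level.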