Audio Feature Extraction

Audio feature extraction has been a significant focus of machine learning over the years. The most common form of data is text, where feature extraction is fairly straightforward. Feature extraction for images is a more challenging task.

Now I will show you audio feature extraction, which is a somewhat more complicated task. Feature extraction is the process of reducing the number of features in the data by creating new features from the existing ones. The newly extracted features must summarise most of the information contained in the original set of features.

Audio Feature Extraction

Audio feature extraction plays a significant part in analyzing audio. The idea is to extract features that characterize the complex nature of audio signals and that help identify the discriminatory subspaces of audio needed to analyze sound signals. Let's start by importing the libraries we need for this task:

import tensorflow as tf
import numpy as np
import pandas as pd
from pyAudioAnalysis import audioBasicIO
from pyAudioAnalysis import audioFeatureExtraction
import matplotlib.pyplot as plt
import os

audioBasicIO reads an audio file and returns its sampling rate together with the samples as a NumPy array. audioFeatureExtraction computes the features that we need from the audio signal.

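Now I will define a utility function that takes a file name as its argument, reads the audio, converts stereo to mono by averaging the two channels, and extracts short-term features using a 50 ms window and a 25 ms step: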

def preProcess( fileName ):
    [Fs, x] = audioBasicIO.readAudioFile(fileName)
    if( len( x.shape ) > 1 and x.shape[1] == 2 ):
        x = np.mean( x, axis = 1, keepdims = True )
    else:
        x = x.reshape( x.shape[0], 1 )
    F, f_names = audioFeatureExtraction.stFeatureExtraction( x[ :, 0 ], Fs, 0.050*Fs, 0.025*Fs )
    return (f_names, F)
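The call above passes a window of 0.050*Fs samples and a step of 0.025*Fs samples, so each frame is 50 ms long and overlaps the next by half. As a rough sketch of what that framing means (this is not pyAudioAnalysis's own code; the `frame_signal` helper and the 16 kHz sampling rate are made up for illustration), a signal can be cut into overlapping frames like this:

```python
import numpy as np

def frame_signal(x, fs, win_sec=0.050, step_sec=0.025):
    """Split a 1-D signal into overlapping frames (illustrative helper)."""
    win = int(round(win_sec * fs))    # samples per frame
    step = int(round(step_sec * fs))  # samples between frame starts
    if len(x) < win:
        return np.empty((0, win))
    n_frames = 1 + (len(x) - win) // step
    return np.stack([x[i * step : i * step + win] for i in range(n_frames)])

fs = 16000            # assumed sampling rate for this example
x = np.zeros(fs)      # one second of silence
frames = frame_signal(x, fs)
print(frames.shape)   # (number of frames, samples per frame)
```

Each column of the feature matrix returned by the extraction step corresponds to one such frame.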

Now I would like to use only the chromagram features from the audio signals, so I will separate those rows from the output of our function:

def getChromagram( audioData ):
    temp_data = audioData[ 21 ].reshape( 1, audioData[ 21 ].shape[0] )
    chromagram = temp_data
    for i in range( 22, 33 ):
        temp_data = audioData[ i ].reshape( 1, audioData[ i ].shape[0] )
        chromagram = np.vstack( [ chromagram, temp_data ] )
    return chromagram
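The function stacks rows 21 to 32 of the feature matrix one at a time. Assuming the features are a 2-D NumPy array with one row per feature (as the indexing above implies), plain slicing should give the same twelve rows. In this sketch `fake_features` is random stand-in data, not real audio features, and `get_chromagram_sliced` is a hypothetical name:

```python
import numpy as np

def get_chromagram_sliced(audio_data):
    """Return rows 21..32 (12 rows) of the feature matrix."""
    return audio_data[21:33]

rng = np.random.default_rng(0)
fake_features = rng.random((34, 5))   # 34 feature rows, 5 windows (made up)

# Row-by-row stacking, as in getChromagram above
stacked = np.vstack([fake_features[i].reshape(1, -1) for i in range(21, 33)])
sliced = get_chromagram_sliced(fake_features)
print(sliced.shape, np.array_equal(stacked, sliced))
```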

Now I will create a function that finds the strongest note in each window and returns the fraction of windows in which each note is the strongest:

def getNoteFrequency( chromagram ):
    numberOfWindows = chromagram.shape[1]
    # Index of the strongest chroma row in each window
    freqVal = chromagram.argmax( axis = 0 )
    histogram, bin_edges = np.histogram( freqVal, bins = 12 )
    normalized_hist = histogram.reshape( 1, 12 ).astype( float ) / numberOfWindows
    return normalized_hist
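One thing worth checking here: np.histogram with bins = 12 and no range spreads its 12 bins between the smallest and largest value in freqVal, so a bin lines up with a note index only when the data spans 0 to 11. A possible alternative, shown as a sketch and not the code that produced the results in this article, is np.bincount with minlength = 12. The `get_note_frequency_bincount` name and the toy chromagram are made up for illustration:

```python
import numpy as np

def get_note_frequency_bincount(chromagram):
    """Fraction of windows in which each of the 12 chroma rows is strongest."""
    n_windows = chromagram.shape[1]
    strongest = chromagram.argmax(axis=0)          # row index per window
    counts = np.bincount(strongest, minlength=12)  # one count per chroma row
    return counts.reshape(1, 12).astype(float) / n_windows

# Toy chromagram: row 3 is the strongest in all 4 windows
toy = np.zeros((12, 4))
toy[3, :] = 1.0

result = get_note_frequency_bincount(toy)
print(result[0])                     # all of the mass sits at index 3

hist, _ = np.histogram(toy.argmax(axis=0), bins=12)
print(hist / 4)                      # the mass does not land at index 3
```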

Now I will create a function that iterates over the files in our directory. Here I will use a pandas data frame to store the feature vectors:

fileList = []

def getDataset( filePath ):
    X = pd.DataFrame( )
    columns=[ "G#", "G", "F#", "F", "E", "D#", "D", "C#", "C", "B", "A#", "A" ]
    for root, dirs, filenames in os.walk( filePath ):
        for file in filenames:
            fileList.append( file )
            # Join with root so that files in subdirectories resolve correctly
            feature_name, features = preProcess( os.path.join( root, file ) )
            chromagram = getChromagram( features )
            noteFrequency = getNoteFrequency( chromagram )
            x_new = pd.Series(noteFrequency[ 0, : ])
            X = pd.concat( [ X, x_new ], axis = 1 )
    data = X.T.copy()
    data.columns = columns
    data.index = [ i for i in range( 0, data.shape[ 0 ] ) ]
    return data

# data is assumed to hold the result of calling getDataset on the audio directory
data

         G#    G   F#         F    E   D#         D   C#    C    B   A#         A
0  0.000000  0.0  0.0  0.000000  0.0  0.0  1.000000  0.0  0.0  0.0  0.0  0.000000
1  0.991713  0.0  0.0  0.000000  0.0  0.0  0.000000  0.0  0.0  0.0  0.0  0.008287
2  0.021834  0.0  0.0  0.000000  0.0  0.0  0.000000  0.0  0.0  0.0  0.0  0.978166
3  0.015385  0.0  0.0  0.000000  0.0  0.0  0.000000  0.0  0.0  0.0  0.0  0.984615
4  0.090909  0.0  0.0  0.090909  0.0  0.0  0.545455  0.0  0.0  0.0  0.0  0.272727
5  0.027027  0.0  0.0  0.000000  0.0  0.0  0.000000  0.0  0.0  0.0  0.0  0.972973
6  0.000000  0.0  0.0  0.000000  0.0  0.0  1.000000  0.0  0.0  0.0  0.0  0.000000
7  0.097561  0.0  0.0  0.000000  0.0  0.0  0.000000  0.0  0.0  0.0  0.0  0.902439
8  0.000000  0.0  0.0  0.000000  0.0  0.0  1.000000  0.0  0.0  0.0  0.0  0.000000
9  0.000000  0.0  0.0  0.000000  0.0  0.0  1.000000  0.0  0.0  0.0  0.0  0.000000

In the data frame above, each row represents one audio file and each column represents one feature. So we have 10 files with 12 features each.


Machine Learning Algorithm for Audio Feature Extraction

Here I will use the K-means clustering algorithm. First I will define its hyperparameters: k is the number of clusters, and epochs is the number of iterations the algorithm will run for:

k = 4
epochs = 1000

Now I will make a function that selects the first k data points as the initial centroids:

def initilizeCentroids( data, k ):
    centroids = data[ 0: k ]
    return centroids
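Taking the first k rows makes the starting centroids depend on the order in which the files were read. A common alternative is to pick k distinct rows at random. The sketch below assumes the data is a NumPy array (a data frame could be converted with `to_numpy()` first); `initialize_centroids_random` and the toy data are hypothetical, not part of the original code:

```python
import numpy as np

def initialize_centroids_random(data, k, seed=0):
    """Pick k distinct rows of `data` at random as starting centroids."""
    rng = np.random.default_rng(seed)
    rows = rng.choice(data.shape[0], size=k, replace=False)
    return data[rows]

data = np.arange(20, dtype=float).reshape(10, 2)   # 10 toy points
centroids = initialize_centroids_random(data, k=4)
print(centroids.shape)
```

Fixing the seed keeps the run repeatable, which helps when comparing cluster assignments between runs.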

Now I will define the placeholder tensors for our data (tf.placeholder and tf.Session are TensorFlow 1.x APIs). Here X represents the data, C is the list of k centroids, and C_labels holds the index of the centroid assigned to each data point:

X = tf.placeholder( dtype = tf.float32 )
C = tf.placeholder( dtype = tf.float32 )
C_labels = tf.placeholder( dtype = tf.int32 )

Now I will compute the squared Euclidean distance between every data point and every centroid, and assign each data point to its nearest centroid:

expanded_vectors = tf.expand_dims( X, 0 )
expanded_centroids = tf.expand_dims( C, 1 )
distance = tf.reduce_sum( tf.square( tf.subtract( expanded_vectors, expanded_centroids ) ), axis = 2 )
getCentroidsOp = tf.argmin( distance, 0 )
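The same assignment step can be traced in NumPy, which makes the broadcasting shapes easier to inspect. This is a sketch that mirrors the TensorFlow operations above; the points and centroids are toy values chosen for illustration:

```python
import numpy as np

points = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]])   # shape (n, d)
centroids = np.array([[0.0, 0.0], [1.0, 1.0]])            # shape (k, d)

expanded_vectors = points[np.newaxis, :, :]       # shape (1, n, d)
expanded_centroids = centroids[:, np.newaxis, :]  # shape (k, 1, d)

# Squared Euclidean distance from every centroid to every point: shape (k, n)
distance = np.sum((expanded_vectors - expanded_centroids) ** 2, axis=2)

# Index of the nearest centroid for each point: shape (n,)
labels = np.argmin(distance, axis=0)
print(distance.shape, labels)
```

Subtracting a (1, n, d) array from a (k, 1, d) array broadcasts to (k, n, d), so summing over the last axis leaves one distance per centroid and point pair.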

Now I will compute the new centroids from our assigned labels and data values:

sums = tf.unsorted_segment_sum( X, C_labels, k )
counts = tf.unsorted_segment_sum( tf.ones_like( X ), C_labels, k )
reCalculateCentroidsOp = tf.divide( sums, counts )
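To see what the update step computes, here is a NumPy sketch of the same idea: sum the points in each cluster, count them, and divide. The `recalculate_centroids` helper and the toy values are assumptions for illustration. Note that a cluster with no assigned points yields 0/0, which is NaN:

```python
import numpy as np

def recalculate_centroids(points, labels, k):
    """Mean of the points assigned to each of the k clusters."""
    sums = np.zeros((k, points.shape[1]))
    counts = np.zeros((k, points.shape[1]))
    np.add.at(sums, labels, points)                  # per-cluster sums
    np.add.at(counts, labels, np.ones_like(points))  # per-cluster counts
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts                         # NaN for empty clusters

points = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 1.0]])
labels = np.array([0, 0, 1])
centroids = recalculate_centroids(points, labels, k=3)
print(centroids)   # the third cluster is empty, so its row is NaN
```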

Driver Code

Now I will define the driver code for our algorithm. It feeds the data to the operations defined above and alternates between assigning labels and recomputing centroids for the given number of epochs:

data_labels = []
centroids = []
with tf.Session() as sess:
    sess.run( tf.global_variables_initializer() )
    centroids = initilizeCentroids( data, k )
    for epoch in range( epochs ):
        data_labels = sess.run( getCentroidsOp, feed_dict = { X: data, C: centroids } )
        centroids = sess.run( reCalculateCentroidsOp, feed_dict = { X: data, C_labels: data_labels } )
print( data_labels )
print( centroids )

[0 1 2 2 0 2 0 2 0 0]
[[0.01818182 0.         0.         0.01818182 0.         0.
  0.9090909  0.         0.         0.         0.         0.05454545]
 [0.9917127  0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.00828729]
 [0.04045167 0.         0.         0.         0.         0.
  0.         0.         0.         0.         0.         0.95954835]]
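The driver above always runs the full number of epochs. On a small dataset the labels usually stop changing after a few iterations, so the loop can stop early. The following is a self-contained NumPy sketch of that idea, not a drop-in replacement for the TensorFlow code: the `kmeans` function and the toy points are made up, and unlike the division above it keeps the previous centroid when a cluster is empty:

```python
import numpy as np

def kmeans(points, k, max_epochs=1000):
    """Plain k-means: first k points as initial centroids, stop when labels settle."""
    centroids = points[:k].copy()
    labels = None
    for epoch in range(max_epochs):
        # Assignment step: nearest centroid by squared Euclidean distance
        distance = ((points[np.newaxis] - centroids[:, np.newaxis]) ** 2).sum(axis=2)
        new_labels = distance.argmin(axis=0)
        if labels is not None and np.array_equal(new_labels, labels):
            break                                  # labels unchanged: converged
        labels = new_labels
        # Update step: mean of each cluster; keep the old centroid if it is empty
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return labels, centroids, epoch

points = np.array([[0.0, 0.0], [1.0, 1.0], [0.1, 0.0], [0.9, 1.0]])
labels, centroids, last_epoch = kmeans(points, k=2)
print(labels, last_epoch)
print(centroids)
```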

Now the clustering has finished. Let’s have a look at our output:

final_labels = pd.DataFrame( { "Labels": data_labels, "File Names": fileList } )
final_labels

  File Names  Labels
0   c3_1.wav       0
1   c4_1.wav       1
2   c1_2.wav       2
3   c2_2.wav       2
4   c3_2.wav       0
5   c2_3.wav       2
6   c2_4.wav       0
7   c1_1.wav       2
8   c2_1.wav       0
9   c4_2.wav       0
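To read the result cluster by cluster, the file names can be grouped by label. This small sketch hard-codes the file names and labels from the output above so that it runs on its own:

```python
from collections import defaultdict

file_names = ["c3_1.wav", "c4_1.wav", "c1_2.wav", "c2_2.wav", "c3_2.wav",
              "c2_3.wav", "c2_4.wav", "c1_1.wav", "c2_1.wav", "c4_2.wav"]
labels = [0, 1, 2, 2, 0, 2, 0, 2, 0, 0]

# Collect the files that share a cluster label
clusters = defaultdict(list)
for name, label in zip(file_names, labels):
    clusters[label].append(name)

for label in sorted(clusters):
    print(label, clusters[label])
```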

I hope you liked this article on audio feature extraction and the k-means clustering algorithm. Feel free to ask your questions in the comments section below.

Aman Kharwal
