How to plot MFCC in Python using Matplotlib?
MFCCs (Mel-Frequency Cepstral Coefficients) are widely used features in audio processing and speech recognition. The python_speech_features library extracts them from a signal, and Matplotlib can display the result as a heatmap.
What are MFCCs?
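MFCCs describe the shape of the short-term spectral envelope of an audio signal. Each short frame of audio is summarized by a small set of coefficients (13 by default in python_speech_features), which makes them a compact input for speech recognition and other audio analysis tasks.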
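The "mel" in the name refers to a perceptual frequency scale on which equal steps sound roughly equally spaced to a listener. As a rough illustration, the sketch below uses one common formula for converting between hertz and mels. The function names are illustrative, and other variants of the mel formula exist.

```python
import math

def hz_to_mel(hz):
    """Convert a frequency in Hz to mels (one common formula; others exist)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

# Doubling the frequency in Hz less than doubles the mel value
for hz in (250, 500, 1000, 2000, 4000):
    print(f"{hz:>5} Hz -> {hz_to_mel(hz):7.1f} mel")
```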
Installing Required Libraries
First, install the necessary packages:
pip install python_speech_features matplotlib scipy numpy
Step-by-Step Implementation
Here's how to extract and plot MFCC features from an audio file:
from python_speech_features import mfcc
import scipy.io.wavfile as wav
import matplotlib.pyplot as plt
import numpy as np
# Set figure size and layout
plt.rcParams["figure.figsize"] = [10, 6]
plt.rcParams["figure.autolayout"] = True
# Read the audio file
(sample_rate, audio_signal) = wav.read("audio_file.wav")
# Extract MFCC features
mfcc_features = mfcc(audio_signal, sample_rate)
# Create the plot
fig, ax = plt.subplots()
# Transpose the MFCC data for proper visualization
mfcc_features = np.swapaxes(mfcc_features, 0, 1)
# Display MFCC as a heatmap
cax = ax.imshow(mfcc_features, interpolation='nearest', cmap='viridis', origin='lower', aspect='auto')
# Add labels and title
ax.set_xlabel('Time Frames')
ax.set_ylabel('MFCC Coefficients')
ax.set_title('MFCC Features Visualization')
# Add colorbar
plt.colorbar(cax)
plt.show()
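The example above assumes a mono recording. For multi-channel files, scipy.io.wavfile.read returns a 2-D array of shape (samples, channels), so a stereo file would need to be reduced to a single channel before calling mfcc. A minimal sketch, assuming that averaging the channels is acceptable for your data (to_mono is a hypothetical helper, not part of either library):

```python
import numpy as np

def to_mono(audio_signal):
    """Return a 1-D signal; average channels if the input is (samples, channels)."""
    audio_signal = np.asarray(audio_signal)
    if audio_signal.ndim == 1:
        return audio_signal
    # Average the channels; the result is float64
    return audio_signal.mean(axis=1)

# Tiny stand-in for stereo data as returned by wav.read
stereo = np.array([[100, 300], [-200, 200], [50, -50]], dtype=np.int16)
mono = to_mono(stereo)
print(mono.shape)
```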
Creating Sample Audio for Testing
If you don't have an audio file, you can create a synthetic signal:
import numpy as np
import matplotlib.pyplot as plt
from python_speech_features import mfcc
# Create synthetic audio signal (sine wave)
sample_rate = 16000 # Hz
duration = 2.0 # seconds
t = np.linspace(0, duration, int(sample_rate * duration), endpoint=False)  # spacing of exactly 1/sample_rate
# Mix of different frequencies to simulate speech-like signal
frequencies = [440, 880, 1320] # Hz
audio_signal = np.zeros_like(t)
for freq in frequencies:
    audio_signal += np.sin(2 * np.pi * freq * t) * np.exp(-t/2)
# Add some noise
audio_signal += 0.1 * np.random.randn(len(t))
# Extract MFCC features
mfcc_features = mfcc(audio_signal, sample_rate)
# Plot the MFCC
plt.figure(figsize=(12, 8))
# Subplot 1: Original signal
plt.subplot(2, 1, 1)
plt.plot(t[:1000], audio_signal[:1000])
plt.title('Original Audio Signal (first 1000 samples)')
plt.xlabel('Time (s)')
plt.ylabel('Amplitude')
# Subplot 2: MFCC features
plt.subplot(2, 1, 2)
mfcc_transposed = np.swapaxes(mfcc_features, 0, 1)
plt.imshow(mfcc_transposed, interpolation='nearest', cmap='viridis', origin='lower', aspect='auto')
plt.title('MFCC Features')
plt.xlabel('Time Frames')
plt.ylabel('MFCC Coefficients')
plt.colorbar()
plt.tight_layout()
plt.show()
[Displays a two-panel plot showing the original audio signal and its corresponding MFCC visualization]
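To reuse the synthetic signal with the file-based example, it could be saved as a WAV file. The sketch below covers only the conversion to 16-bit PCM, a sample format that WAV readers commonly handle. float_to_int16 is a hypothetical helper, and normalizing to the peak value is an assumption about the desired loudness.

```python
import numpy as np

def float_to_int16(signal):
    """Scale a float signal so its peak maps to the int16 limit, then convert."""
    signal = np.asarray(signal, dtype=np.float64)
    peak = np.max(np.abs(signal))
    if peak == 0:
        return np.zeros(signal.shape, dtype=np.int16)  # silent input
    return np.round(signal / peak * 32767).astype(np.int16)

# Example: a short decaying sine, similar to the synthetic signal above
t = np.linspace(0, 0.01, 160, endpoint=False)
pcm = float_to_int16(np.sin(2 * np.pi * 440 * t) * np.exp(-t / 2))
print(pcm.dtype, pcm.min(), pcm.max())
```

The converted array could then be written with scipy.io.wavfile.write("audio_file.wav", sample_rate, pcm).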
Understanding the Visualization
In the MFCC plot:
- X-axis: Represents time frames (windows of audio)
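- Y-axis: Represents the MFCC coefficients (13 by default, set by numcep)
- Colors: Intensity represents the coefficient values
- Lower coefficients: Capture overall spectral shape (by default, python_speech_features replaces coefficient 0 with the log of the frame energy)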
- Higher coefficients: Capture finer spectral details
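The time-frame axis can be converted to seconds, because consecutive frames start winstep seconds apart (0.01 by default). A small sketch, where frame_times is an illustrative helper name and the values are approximate frame start times:

```python
import numpy as np

def frame_times(num_frames, winstep=0.01):
    """Start time in seconds of each analysis frame."""
    return np.arange(num_frames) * winstep

# For an MFCC array of shape (num_frames, numcep), e.g. (199, 13)
times = frame_times(199)
print(times[:3], times[-1])
```

These values could be used to label the x-axis in seconds, for instance through the extent argument of imshow.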
Key Parameters
| Parameter | Description | Default |
|---|---|---|
| numcep | Number of cepstral coefficients | 13 |
| nfilt | Number of filters in filterbank | 26 |
| winlen | Window length in seconds | 0.025 |
| winstep | Step between windows in seconds | 0.01 |
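Together with the sample rate, winlen and winstep determine how many frames (columns in the plot) a signal produces, while numcep sets the number of rows. The sketch below estimates the output shape. It follows how the library appears to frame the signal (window lengths rounded to whole samples, last partial window zero-padded), so treat the result as an estimate and compare it with mfcc_features.shape. expected_mfcc_shape is a hypothetical helper.

```python
import math

def expected_mfcc_shape(num_samples, sample_rate, winlen=0.025, winstep=0.01, numcep=13):
    """Estimate (num_frames, numcep) for a signal of num_samples samples."""
    frame_len = int(math.floor(winlen * sample_rate + 0.5))    # samples per window
    frame_step = int(math.floor(winstep * sample_rate + 0.5))  # samples between window starts
    if num_samples <= frame_len:
        num_frames = 1
    else:
        # The final partial window is assumed to be zero-padded, hence ceil
        num_frames = 1 + math.ceil((num_samples - frame_len) / frame_step)
    return num_frames, numcep

# 2 seconds at 16 kHz with the default parameters
print(expected_mfcc_shape(32000, 16000))  # (199, 13)
```

A larger winstep gives fewer, coarser time frames; a smaller one gives a wider, smoother heatmap at the cost of more computation.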
Conclusion
MFCCs provide a compact representation of audio signals that is well suited as input to speech recognition and other machine learning models. Plotting them as a heatmap shows how the spectral characteristics of your audio evolve over time.
