Last modified: Apr 16, 2026 By Alexander Williams
Generate Spectrogram from Audio in Python
A spectrogram is a powerful visual tool. It shows how the frequencies in a sound change over time.
Think of it as a picture of sound. This guide will show you how to create one using Python.
You will learn the essential steps. We will use popular libraries to make the process simple.
What is a Spectrogram?
A spectrogram is a two-dimensional plot. Time is on the x-axis. Frequency is on the y-axis.
The color or intensity at each point shows the amplitude of a frequency at a specific time. It turns audio data into a visual map.
This is crucial for audio analysis. It helps in music information retrieval, speech processing, and bioacoustics.
Prerequisites and Libraries
You need Python installed on your system. Basic knowledge of Python is helpful.
We will use three main libraries. Install them using pip if you haven't already.
pip install librosa matplotlib numpy
Librosa is the star for audio loading and analysis. Matplotlib will create the visual plot. NumPy handles the numerical data.
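To confirm the install worked, a quick check along these lines may help. It is only a sketch: it uses the standard library's importlib.metadata to look up each package's installed version, and the helper name installed_versions is made up for this guide.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


# installed_versions is a hypothetical helper, not part of any library.
for name, ver in installed_versions(["librosa", "matplotlib", "numpy"]).items():
    print(f"{name}: {ver or 'not installed'}")
```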
For a broader look at tools, see our Python Audio Libraries: Play, Record, Process guide.
Step 1: Load an Audio File
The first step is to get your audio data into Python. We use the librosa.load() function.
This function returns two values: the audio time series (y) and the sample rate (sr).
import librosa
import librosa.display
import matplotlib.pyplot as plt
# Load an audio file. Use your own file path.
audio_path = 'your_audio_file.wav'
y, sr = librosa.load(audio_path)
print(f"Audio length: {len(y)} samples")
print(f"Sample rate: {sr} Hz")
print(f"Duration: {len(y)/sr:.2f} seconds")
Audio length: 220500 samples
Sample rate: 22050 Hz
Duration: 10.00 seconds
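The sample rate (sr) defines how many samples represent one second of audio. By default, librosa.load() resamples the audio to 22050 Hz and mixes it down to mono. Pass sr=None to keep the file's native rate.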
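It can help to check the rate stored in the file itself, since librosa.load() may resample and return a different sr. The sketch below uses only the standard library's wave module. It writes a one-second test tone (the file name, the 440 Hz tone, and the 44100 Hz rate are arbitrary choices for illustration) and then reads the sample rate and duration back from the WAV header.

```python
import math
import os
import struct
import tempfile
import wave

native_sr = 44100          # assumed native rate for the test tone
n_samples = native_sr      # one second of audio

# Build a 440 Hz sine tone as 16-bit PCM samples.
samples = [
    int(0.5 * 32767 * math.sin(2 * math.pi * 440 * n / native_sr))
    for n in range(n_samples)
]

path = os.path.join(tempfile.mkdtemp(), "tone.wav")
with wave.open(path, "wb") as wf:
    wf.setnchannels(1)      # mono
    wf.setsampwidth(2)      # 2 bytes per sample = 16-bit
    wf.setframerate(native_sr)
    wf.writeframes(struct.pack(f"<{n_samples}h", *samples))

# Read the header back: this is the rate stored in the file, before any resampling.
with wave.open(path, "rb") as wf:
    file_sr = wf.getframerate()
    duration = wf.getnframes() / file_sr

print(f"Native sample rate: {file_sr} Hz")
print(f"Duration: {duration:.2f} seconds")
```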
Step 2: Compute the STFT
The core of a spectrogram is the Short-Time Fourier Transform (STFT). It analyzes small, overlapping windows of the audio.
We use Librosa's librosa.stft() function. It converts the time-domain signal to a frequency-domain representation.
import numpy as np
# Compute the Short-Time Fourier Transform (STFT)
D = librosa.stft(y)
# Take the magnitude of the complex values and convert it to decibels
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)
print(f"STFT shape: {D.shape}")
print(f"Magnitude Spectrogram shape: {S_db.shape}")
STFT shape: (1025, 431)
Magnitude Spectrogram shape: (1025, 431)
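The output shape (1025, 431) means 1025 frequency bins and 431 time frames. With the defaults n_fft=2048 and hop_length=512, that is 1 + 2048/2 = 1025 bins and one frame for every 512 samples. The amplitude_to_db function converts amplitude to decibels, which compresses the dynamic range so quieter detail stays visible.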
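To see how windowing and the FFT produce an array of this shape, here is a rough NumPy-only sketch of an STFT. It is not librosa's implementation. It assumes n_fft=2048 and hop_length=512, zero-pads the signal by n_fft // 2 on each side so frames are centered, and uses np.hanning, which may differ slightly from the window librosa applies. The function name simple_stft and the 440 Hz test tone are made up for illustration.

```python
import numpy as np


def simple_stft(y, n_fft=2048, hop_length=512):
    """Illustrative STFT: complex array of shape (1 + n_fft // 2, n_frames)."""
    # Pad so that frame t is centered on sample t * hop_length.
    padded = np.pad(y, n_fft // 2, mode="constant")
    n_frames = 1 + (len(padded) - n_fft) // hop_length
    # Row t of idx holds the sample indices that belong to frame t.
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    frames = padded[idx] * np.hanning(n_fft)
    # rfft keeps only the non-negative frequencies: 1 + n_fft // 2 bins.
    return np.fft.rfft(frames, axis=1).T


sr = 22050
t = np.arange(10 * sr) / sr              # 10 seconds -> 220500 samples
y = 0.5 * np.sin(2 * np.pi * 440 * t)    # hypothetical 440 Hz test tone

D = simple_stft(y)
print(D.shape)                           # (1025, 431)

# Bin k corresponds to k * sr / n_fft Hz, so locate the strongest bin.
peak_bin = int(np.argmax(np.abs(D).mean(axis=1)))
print(f"Strongest bin is near {peak_bin * sr / 2048:.1f} Hz")
```

The strongest bin should land close to 440 Hz, within one bin width of about 10.8 Hz.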
Step 3: Plot the Spectrogram
Now we visualize the computed STFT data. We use librosa.display.specshow() with Matplotlib.
This function is designed specifically for displaying spectrograms.
import numpy as np
# Create a figure
plt.figure(figsize=(10, 6))
# Display the spectrogram
librosa.display.specshow(S_db, sr=sr, x_axis='time', y_axis='log', cmap='viridis')
# Add a color bar to show amplitude scale
plt.colorbar(format='%+2.0f dB')
# Set the title and labels
plt.title('Spectrogram of Audio File')
plt.xlabel('Time (s)')
plt.ylabel('Frequency (Hz)')
# Show the plot
plt.tight_layout()
plt.show()
This code generates a complete spectrogram plot. The y-axis uses a log scale, which is common for audio.
We perceive pitch roughly logarithmically, so a log axis spreads out the low frequencies where much of the musical detail sits. The 'viridis' colormap provides clear contrast.
Complete Example Code
Here is the full script from start to finish. It loads an audio file and creates a labeled spectrogram.
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np
# 1. Load Audio
audio_path = 'example.wav'
y, sr = librosa.load(audio_path)
# 2. Compute STFT and Convert to dB
D = librosa.stft(y)
S_db = librosa.amplitude_to_db(np.abs(D), ref=np.max)
# 3. Plot
plt.figure(figsize=(12, 8))
img = librosa.display.specshow(S_db, sr=sr, x_axis='time', y_axis='log', cmap='magma')
plt.colorbar(img, format='%+2.0f dB')
plt.title('Log-Frequency Power Spectrogram')
plt.tight_layout()
plt.show()
Customizing Your Spectrogram
The basic method works well. But you can adjust many parameters for different needs.
You can change the STFT window size with the n_fft parameter. A larger window gives better frequency resolution but poorer time resolution.
You can change the hop length with hop_length. A smaller hop means more overlap between windows and more time frames.
# STFT parameters (these values match librosa's defaults; change them to experiment)
n_fft = 2048  # Window size for FFT
hop_length = 512  # Number of samples between successive frames
D_custom = librosa.stft(y, n_fft=n_fft, hop_length=hop_length)
S_db_custom = librosa.amplitude_to_db(np.abs(D_custom), ref=np.max)
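The effect of n_fft and hop_length can be put into numbers. The following sketch only does the arithmetic for a few window sizes. It assumes the 22050 Hz sample rate from Step 1 and a hop of a quarter window, and the helper name stft_resolution is hypothetical.

```python
def stft_resolution(sr, n_fft, hop_length):
    """Return (bin spacing in Hz, frame step in ms, overlap fraction)."""
    freq_res_hz = sr / n_fft
    time_step_ms = 1000 * hop_length / sr
    overlap = 1 - hop_length / n_fft
    return freq_res_hz, time_step_ms, overlap


sr = 22050  # assumed sample rate
for n_fft in (512, 2048, 8192):
    hop_length = n_fft // 4  # quarter-window hop, i.e. 75% overlap
    freq_res, step_ms, overlap = stft_resolution(sr, n_fft, hop_length)
    print(f"n_fft={n_fft:5d}  hop={hop_length:5d}  "
          f"{freq_res:6.2f} Hz/bin  {step_ms:6.2f} ms/frame  overlap={overlap:.0%}")
```

Larger windows give narrower frequency bins, but each frame then covers more time.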
You can also use a linear frequency scale. Just set y_axis='linear' in the specshow call.
Experiment with different colormaps like 'plasma', 'inferno', or 'coolwarm'.
Why Use Spectrograms?
Spectrograms are not just pretty pictures. They are fundamental for many audio tasks.
In music, they can identify notes, chords, and instruments. In speech processing, they help with speech recognition and speaker identification.
They are also used in machine learning. Convolutional Neural Networks (CNNs) can treat spectrograms as images for classification.
For more foundational knowledge, check out our Python Audio Processing Guide for Beginners.
Common Issues and Tips
You might encounter a silent or blank spectrogram. This often means the audio signal is too quiet or the dB scaling is off.
Ensure your audio file is not corrupted. Use librosa.get_duration(y=y, sr=sr) to check the length. Recent librosa versions require these to be passed as keyword arguments.
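A quick numeric check can tell a near-silent signal apart from a plotting problem. This sketch assumes y is a floating-point array scaled to the range [-1, 1], as librosa.load() returns. The helper name signal_levels, the stand-in test tone, and the -60 dB threshold are arbitrary choices for illustration.

```python
import numpy as np


def signal_levels(y, eps=1e-10):
    """Return (peak, RMS) levels in dB relative to full scale (1.0)."""
    y = np.asarray(y, dtype=float)
    peak = np.max(np.abs(y))
    rms = np.sqrt(np.mean(y ** 2))
    # eps avoids log10(0) for an all-zero signal.
    return 20 * np.log10(peak + eps), 20 * np.log10(rms + eps)


# Hypothetical stand-in for loaded audio: a very quiet 440 Hz tone.
sr = 22050
y = 0.0005 * np.sin(2 * np.pi * 440 * np.arange(sr) / sr)

peak_db, rms_db = signal_levels(y)
print(f"Peak: {peak_db:.1f} dBFS, RMS: {rms_db:.1f} dBFS")
if peak_db < -60:  # arbitrary threshold
    print("Signal is very quiet; check the file or normalize before plotting.")
```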
If the plot looks pixelated, increase the figure size, lower hop_length for more time frames, or raise n_fft for more frequency bins.
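Always remember to convert amplitude to decibels. On a linear amplitude scale a few loud peaks dominate and most of the plot looks empty, while our ears perceive loudness on a roughly logarithmic scale.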
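To make the conversion concrete, here is a simplified NumPy sketch of an amplitude-to-decibel conversion measured relative to the loudest value. It is an approximation for illustration, not librosa's code. The function name to_db is made up, and the amin floor and 80 dB range are assumptions chosen to mirror librosa's documented defaults.

```python
import numpy as np


def to_db(magnitude, amin=1e-5, top_db=80.0):
    """Convert magnitudes to dB relative to the largest value."""
    # The amin floor avoids log10(0) for silent bins.
    magnitude = np.maximum(np.abs(magnitude), amin)
    db = 20.0 * np.log10(magnitude / magnitude.max())
    # Limit the dynamic range so near-silent bins do not stretch the color scale.
    return np.maximum(db, db.max() - top_db)


S = np.array([1.0, 0.1, 0.01, 0.0])
print(to_db(S))
```

Each factor of 10 in amplitude becomes a step of 20 dB, and the silent bin is clipped at the bottom of the 80 dB range.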
Conclusion
Generating a spectrogram in Python is straightforward. The combination of Librosa and Matplotlib is powerful.
You learned to load audio, compute the STFT, and create a visual plot. You also saw how to customize the output.
Spectrograms are a key step in understanding audio data. They bridge the gap between raw sound and actionable insights.
Start by analyzing different types of audio files. Experiment with the parameters to see how they change the visualization.