miniDSP
A small C library for audio DSP
Mel Filterbank and MFCCs

This tutorial introduces two classic speech/audio front-end features:

  • Mel filterbank energies — triangular spectral bands spaced on the mel scale.
  • MFCCs — DCT of log mel energies.

miniDSP provides both as frame-level APIs in minidsp.h.

Why mel and MFCC?

The linear FFT frequency axis gives high frequencies the same resolution as low ones, which over-resolves the highs relative to human pitch perception. Mel filterbanks instead place their bands densely at low frequencies and sparsely at high frequencies.

MFCCs then decorrelate log mel energies via a DCT, producing compact features used widely in speech recognition and audio classification.

Step 1: Build mel energies

miniDSP uses:

  • HTK mel mapping:

    \[ m(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right) \]

  • Internal Hann windowing
  • One-sided PSD bins: \(|X(k)|^2 / N\)

Reading the formula in C:

// f -> freq_hz, m(f) -> mel
double mel = 2595.0 * log10(1.0 + freq_hz / 700.0);

Compute one frame of mel energies:

MD_mel_energies(signal, N, sample_rate, num_mels,
                min_freq_hz, max_freq_hz, mel);

Signature: void MD_mel_energies(const double *signal, unsigned N, double sample_rate, unsigned num_mels, double min_freq_hz, double max_freq_hz, double *mel_out)
Computes mel-band energies from a single frame; mel_out receives num_mels values.

Step 2: Compute MFCCs

MFCCs are computed from log mel energies with a DCT-II:

\[ c_n = \alpha_n \sum_{m=0}^{M-1} \log(\max(E_m, 10^{-12})) \cos\!\left(\frac{\pi n (m + 1/2)}{M}\right) \]

Reading the formula in C:

// E_m -> mel_energy[m], c_n -> mfcc[n], M -> num_mels, n -> coeff index
// Requires <math.h>; M_PI is a POSIX extension, not ISO C.
for (unsigned n = 0; n < num_coeffs; n++) {
    // Orthonormal DCT-II scale factor alpha_n
    double alpha = (n == 0)
        ? sqrt(1.0 / (double)num_mels)
        : sqrt(2.0 / (double)num_mels);
    double acc = 0.0;
    for (unsigned m = 0; m < num_mels; m++) {
        // Floor at 1e-12 so a zero-energy band still has a finite log
        double log_em = log(fmax(mel_energy[m], 1e-12));
        double basis = cos(M_PI * (double)n * ((double)m + 0.5) / (double)num_mels);
        acc += log_em * basis;
    }
    mfcc[n] = alpha * acc;
}

where:

  • \(M\) is the number of mel bands
  • \(\alpha_0 = \sqrt{1/M}\)
  • \(\alpha_n = \sqrt{2/M}\) for \(n > 0\)
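A worked special case shows what the coefficients measure. If every band has the same energy \(E_m = E \ge 10^{-12}\), the DCT-II formula above gives

\[ c_0 = \sqrt{\tfrac{1}{M}} \sum_{m=0}^{M-1} \log E = \sqrt{M}\,\log E \]

\[ c_n = \sqrt{\tfrac{2}{M}}\,\log E \sum_{m=0}^{M-1} \cos\!\left(\frac{\pi n (m + 1/2)}{M}\right) = 0, \qquad 1 \le n \le M-1 \]

The second line is zero because the \(M\) equally spaced cosine samples cancel for every nonzero \(n\) below \(2M\). C0 therefore tracks the overall log level of the frame. The higher coefficients respond only to how the log mel energies vary across bands.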

miniDSP keeps C0 (the \(n = 0\) term, equal to \(\sqrt{M}\) times the mean floored log mel energy) and returns it in mfcc_out[0].

MD_mfcc(signal, N, sample_rate, num_mels, num_coeffs,
        min_freq_hz, max_freq_hz, mfcc);

Signature: void MD_mfcc(const double *signal, unsigned N, double sample_rate, unsigned num_mels, unsigned num_coeffs, double min_freq_hz, double max_freq_hz, double *mfcc_out)
Computes mel-frequency cepstral coefficients (MFCCs) from a single frame; mfcc_out receives num_coeffs values.

Visualisations

These plots are generated from one deterministic signal: \(x[n] = 0.7\sin(2\pi\cdot440t) + 0.2\cos(2\pi\cdot1000t) + 0.1\sin(2\pi\cdot3000t)\), with \(t=n/f_s\) and \(f_s=8192\) Hz. Mel energies and MFCCs are computed from the first 1024-sample analysis frame.

Practical notes

  • Requested frequency bounds are runtime-clamped to [0, Nyquist].
  • If the clamped band is empty, mel energies are zero and MFCCs remain finite via the log floor.
  • MD_shutdown() should be called when done with FFT-based APIs.

API reference