Why mel and MFCC?

An FFT spaces its bins uniformly in hertz, which resolves high frequencies far more finely than human pitch perception does. A mel filterbank instead spaces its bands uniformly on the mel scale, so the bands are narrow and dense at low frequencies and wide and sparse at high frequencies.

MFCCs then decorrelate log mel energies via a DCT, producing compact features used widely in speech recognition and audio classification.

Step 1: Build mel energies

miniDSP uses:

HTK mel mapping:
\[ m(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right) \]
Internal Hann windowing
One-sided PSD bins: \(|X(k)|^2 / N\)

Reading the formula in C:

// f -> freq_hz, m(f) -> mel
double mel = 2595.0 * log10(1.0 + freq_hz / 700.0);

Compute one frame of mel energies:

MD_mel_energies(signal, N, sample_rate, num_mels,
                min_freq_hz, max_freq_hz, mel);

void MD_mel_energies(const double *signal, unsigned N, double sample_rate,
                     unsigned num_mels, double min_freq_hz, double max_freq_hz,
                     double *mel_out)

Compute mel-band energies from a single frame.

Definition: minidsp_spectrum.c:531

Step 2: Compute MFCCs

MFCCs are computed from the natural log of the mel energies, floored at \(10^{-12}\), with an orthonormal DCT-II:

\[ c_n = \alpha_n \sum_{m=0}^{M-1} \log(\max(E_m, 10^{-12})) \cos\!\left(\frac{\pi n (m + 1/2)}{M}\right) \]

Reading the formula in C:

// E_m -> mel_energy[m], c_n -> mfcc[n], M -> num_mels, n -> coeff index
for (unsigned n = 0; n < num_coeffs; n++) {
    double alpha = (n == 0)
        ? sqrt(1.0 / (double)num_mels)
        : sqrt(2.0 / (double)num_mels);
 
    double acc = 0.0;
    for (unsigned m = 0; m < num_mels; m++) {
        double log_em = log(fmax(mel_energy[m], 1e-12));
        double basis = cos(M_PI * (double)n * ((double)m + 0.5) / (double)num_mels);
        acc += log_em * basis;
    }
    mfcc[n] = alpha * acc;
}

where:

\(M\) is the number of mel bands
\(\alpha_0 = \sqrt{1/M}\)
\(\alpha_n = \sqrt{2/M}\) for \(n > 0\)

miniDSP returns C0 in mfcc_out[0].

MD_mfcc(signal, N, sample_rate, num_mels, num_coeffs,
        min_freq_hz, max_freq_hz, mfcc);

void MD_mfcc(const double *signal, unsigned N, double sample_rate,
             unsigned num_mels, unsigned num_coeffs, double min_freq_hz,
             double max_freq_hz, double *mfcc_out)

Compute mel-frequency cepstral coefficients (MFCCs) from a single frame.

Definition: minidsp_spectrum.c:567

Visualisations

These plots are generated from one deterministic signal: \(x[n] = 0.7\sin(2\pi\cdot440t) + 0.2\cos(2\pi\cdot1000t) + 0.1\sin(2\pi\cdot3000t)\), with \(t=n/f_s\) and \(f_s=8192\) Hz. Mel energies and MFCCs are computed from the first 1024-sample analysis frame.

Practical notes

Requested frequency bounds are runtime-clamped to [0, Nyquist].
If the clamped band is empty, mel energies are zero and MFCCs remain finite via the log floor.
Call MD_shutdown() once you are done with the FFT-based APIs to release the cached FFT resources.
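The clamping rule in the notes above can be pictured with a small sketch. `sketch_clamp_band` is a hypothetical helper, and treating a clamped band with `max <= min` as empty is an assumption about how the empty case is detected, not a statement about miniDSP's internals.

```c
/* Hypothetical helper: clamp a requested band to [0, Nyquist] in place.
 * Returns 1 if a non-empty band remains, 0 if the clamped band is empty. */
static int sketch_clamp_band(double sample_rate,
                             double *min_freq_hz, double *max_freq_hz)
{
    double nyquist = 0.5 * sample_rate;

    if (*min_freq_hz < 0.0)     *min_freq_hz = 0.0;
    if (*min_freq_hz > nyquist) *min_freq_hz = nyquist;
    if (*max_freq_hz < 0.0)     *max_freq_hz = 0.0;
    if (*max_freq_hz > nyquist) *max_freq_hz = nyquist;

    /* Assumption: an empty band is one whose upper edge does not exceed the lower. */
    return *max_freq_hz > *min_freq_hz;
}
```

For example, at a sample rate of 8192 Hz a request for 5000 to 6000 Hz clamps both edges to 4096 Hz. Under the assumption above that band is empty, which is the case where the notes say the mel energies are zero and the log floor keeps the MFCCs finite.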

API reference

MD_mel_filterbank() — build mel triangular weight matrix
MD_mel_energies() — compute mel-band energies from one frame
MD_mfcc() — compute MFCC vector (C0 included)
MD_shutdown() — release cached FFT resources