miniDSP
A small C library for audio DSP
This tutorial introduces two classic speech/audio front-end features: mel filterbank energies and mel-frequency cepstral coefficients (MFCCs).
miniDSP provides both as frame-level APIs in minidsp.h.
The linear FFT axis over-resolves high frequencies compared to human pitch perception. Mel filterbanks correct for this by spacing their bands densely at low frequencies and coarsely at high frequencies.
MFCCs then decorrelate log mel energies via a DCT, producing compact features used widely in speech recognition and audio classification.
miniDSP maps a frequency \(f\) in Hz to a mel value \(m\) using:
\[ m(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right) \]
Reading the formula in C:
Compute one frame of mel energies:
MFCCs are computed from log mel energies with a DCT-II:
\[c_n = \alpha_n \sum_{m=0}^{M-1} \log(\max(E_m, 10^{-12})) \cos\!\left(\frac{\pi n (m + 1/2)}{M}\right) \]
Reading the formula in C:
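In the formula above, \(E_m\) is the energy in mel band \(m\), \(M\) is the number of mel bands, and \(\alpha_n\) is the DCT normalization factor for coefficient \(n\).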
miniDSP returns C0 in mfcc_out[0].
These plots are generated from one deterministic signal: \(x[n] = 0.7\sin(2\pi\cdot440t) + 0.2\cos(2\pi\cdot1000t) + 0.1\sin(2\pi\cdot3000t)\), with \(t=n/f_s\) and \(f_s=8192\) Hz. Mel energies and MFCCs are computed from the first 1024-sample analysis frame.