This pitch detection tutorial compares two classic fundamental-frequency (F0) estimators: autocorrelation peak picking and FFT peak picking.
Both are implemented in miniDSP and demonstrated in examples/pitch_detection.c.
Build and run the example from the repository root:
make -C examples pitch_detection
cd examples && ./pitch_detection
open pitch_detection.html
Autocorrelation F0
For a voiced frame, the fundamental period shows up as a strong peak in the autocorrelation function:
\[
R[\tau] = \frac{\sum_{n=0}^{N-1-\tau} x[n]x[n+\tau]}
{\sum_{n=0}^{N-1} x[n]^2},
\qquad
f_0 = \frac{f_s}{\tau_{\text{peak}}}
\]
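We search only the lags that correspond to the desired F0 range (min_freq_hz..max_freq_hz); a frequency f maps to a lag of f_s / f samples, so the highest frequency gives the shortest lag. Within that range we choose the lag with the strongest normalized autocorrelation.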
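The mapping from an F0 range to a lag range can be sketched as follows. This is a minimal illustration, not miniDSP's actual implementation: the helper name `f0_range_to_lags`, its rounding choices (round the shortest lag up and the longest lag down so both stay inside the requested range), and the clamp to N - 1 are all assumptions.

```c
/* Hypothetical helper (not part of miniDSP): convert an F0 search range in Hz
 * into an inclusive lag range in samples for a frame of N samples.
 * Returns 1 on success, 0 if the inputs are invalid or the range is empty. */
int f0_range_to_lags(double sample_rate, unsigned N,
                     double min_freq_hz, double max_freq_hz,
                     unsigned *lag_min, unsigned *lag_max)
{
    if (N < 2 || sample_rate <= 0.0 ||
        min_freq_hz <= 0.0 || max_freq_hz < min_freq_hz) {
        return 0;
    }

    double limit = (double)(N - 1);            /* longest lag the frame allows */
    double lo = sample_rate / max_freq_hz;     /* shortest period, in samples */
    double hi = sample_rate / min_freq_hz;     /* longest period, in samples */

    if (lo > limit) return 0;                  /* frame too short for this range */
    if (hi > limit) hi = limit;

    unsigned lmin = (unsigned)lo;              /* truncate toward zero ... */
    if ((double)lmin < lo) lmin++;             /* ... then round up */
    if (lmin < 1) lmin = 1;                    /* lag 0 is the trivial peak */
    unsigned lmax = (unsigned)hi;              /* round down */

    if (lmin > lmax) return 0;
    *lag_min = lmin;
    *lag_max = lmax;
    return 1;
}
```

With an assumed 16 kHz sample rate and a 1024-sample frame, a search range of 80 to 400 Hz maps to lags 40 through 200.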
Reading the formula in C:
/* Frame energy: the denominator of R[tau]. */
double r0 = 0.0;
for (unsigned n = 0; n < N; n++) {
    r0 += frame[n] * frame[n];
}

/* Scan the allowed lags and keep the one with the largest normalized value. */
double best_r = -1.0;
unsigned best_lag = 0;
for (unsigned tau = lag_min; tau <= lag_max; tau++) {
    double num = 0.0;
    for (unsigned n = 0; n < N - tau; n++) {
        num += frame[n] * frame[n + tau];
    }
    double r_tau = (r0 > 0.0) ? (num / r0) : 0.0;
    if (r_tau > best_r) {
        best_r = r_tau;
        best_lag = tau;
    }
}

/* Convert the winning lag (in samples) to a frequency in Hz. */
double f0_hz = (best_lag > 0) ? (sample_rate / (double)best_lag) : 0.0;
FFT-based F0
This method applies a Hann window, computes the one-sided FFT magnitude, and picks the dominant peak in a frequency range:
\[
f_0 = \frac{k_{\text{peak}} f_s}{N}
\]
It is simple and fast, but more sensitive to noise and harmonic dominance than autocorrelation.
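One concrete limit of peak picking is that the estimate can only land on a bin center, so it is quantized in steps of f_s / N. The sketch below illustrates this for a clean sinusoid, whose windowed magnitude peak falls in the nearest bin. Both helper names are hypothetical and are not part of miniDSP.

```c
/* Hypothetical helper: spacing between adjacent FFT bins in Hz. */
double fft_bin_spacing_hz(double sample_rate, unsigned N)
{
    return (N > 0) ? sample_rate / (double)N : 0.0;
}

/* Hypothetical helper: the frequency a plain peak pick would report for a
 * clean tone at true_f0_hz, i.e. the center of the nearest FFT bin. */
double fft_quantized_f0_hz(double true_f0_hz, double sample_rate, unsigned N)
{
    if (N == 0 || sample_rate <= 0.0 || true_f0_hz < 0.0) {
        return 0.0;
    }
    /* Nearest bin index: round (f * N / f_s) to the closest integer. */
    unsigned k = (unsigned)(true_f0_hz * (double)N / sample_rate + 0.5);
    return (double)k * sample_rate / (double)N;
}
```

Assuming a 16 kHz sample rate and N = 1024, the bin spacing is 15.625 Hz, and a 140 Hz tone is reported as 140.625 Hz (bin 9). Halving the frame length doubles the spacing.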
Reading the formula in C:
/* Apply a Hann window to reduce spectral leakage. */
for (unsigned n = 0; n < N; n++) {
    double w = 0.5 * (1.0 - cos(2.0 * M_PI * (double)n / (double)(N - 1)));
    xw[n] = frame[n] * w;
}

/* One-sided DFT magnitude, written as a direct sum for readability. */
for (unsigned k = 0; k <= N / 2; k++) {
    double re = 0.0, im = 0.0;
    for (unsigned n = 0; n < N; n++) {
        double phase = 2.0 * M_PI * (double)k * (double)n / (double)N;
        re += xw[n] * cos(phase);
        im -= xw[n] * sin(phase);
    }
    mag[k] = sqrt(re * re + im * im);
}

/* Map the F0 range to bin indices and keep the search inside the spectrum. */
unsigned k_min = (unsigned)ceil(min_freq_hz * (double)N / sample_rate);
unsigned k_max = (unsigned)floor(max_freq_hz * (double)N / sample_rate);
if (k_max > N / 2) k_max = N / 2;

/* Pick the bin with the largest magnitude and convert it to Hz. */
unsigned k_peak = k_min;
for (unsigned k = k_min; k <= k_max; k++) {
    if (mag[k] > mag[k_peak]) k_peak = k;
}
double f0_hz = (double)k_peak * sample_rate / (double)N;
Frame-Wise Tracking
In practice, pitch is estimated frame-by-frame over time:
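for (unsigned f = 0; f < num_frames; f++) {
    unsigned start = f * hop;
    const double *frame = signal + start;

    /* Time-stamp each frame at its center sample, clamped to the signal length. */
    unsigned center = start + frame_len / 2;
    if (center >= N) center = N - 1;

    /* Ground-truth F0 of the segment that contains the frame center. */
    f0_true[f] = (center < seg1) ? 140.0 : (center < seg2 ? 220.0 : 320.0);
    times[f] = (double)center / sample_rate;

    /* Autocorrelation estimate (the output array name f0_acf is assumed here). */
    f0_acf[f] =
        MD_f0_autocorrelation(frame, frame_len, sample_rate,
                              min_f0, max_f0);
    f0_fft[f] =
        MD_f0_fft(frame, frame_len, sample_rate,
                  min_f0, max_f0);
}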
Functions used above:
double MD_f0_fft(const double *signal, unsigned N, double sample_rate, double min_freq_hz, double max_freq_hz): estimates the fundamental frequency (F0) using FFT peak picking.
double MD_f0_autocorrelation(const double *signal, unsigned N, double sample_rate, double min_freq_hz, double max_freq_hz): estimates the fundamental frequency (F0) using autocorrelation.
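The tracking loop assumes that every frame lies fully inside the signal, which constrains num_frames. A minimal sketch of one way to derive it is shown below. The helper name `count_full_frames` is hypothetical, and the example program may compute the frame count differently.

```c
/* Hypothetical helper (not part of miniDSP): number of frames of length
 * frame_len, spaced hop samples apart, that fit entirely inside a signal
 * of num_samples samples. Returns 0 if not even one frame fits. */
unsigned count_full_frames(unsigned num_samples, unsigned frame_len, unsigned hop)
{
    if (frame_len == 0 || hop == 0 || num_samples < frame_len) {
        return 0;
    }
    /* The first frame starts at 0; each further frame needs hop more samples. */
    return 1u + (num_samples - frame_len) / hop;
}
```

For example, one second of audio at an assumed 16 kHz with frame_len = 1024 and hop = 256 yields 59 frames. The last one starts at sample 14848 and ends at 15872, leaving the final 128 samples unused.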
Visual Comparison
Ground truth vs estimated tracks (entire signal)
Autocorrelation peak (single frame)
FFT peak pick (single frame)
Failure Modes and Trade-offs
- Autocorrelation can fail on weakly voiced/noisy frames where no clear lag peak exists.
- FFT peak pick can lock onto harmonics (e.g. 2f0, 3f0) when the fundamental is weak.
- Restricting the search range (min_freq_hz, max_freq_hz) is critical for both methods.
- Short frames improve time resolution but reduce frequency/lag resolution.
Both miniDSP APIs return 0.0 when no reliable F0 peak is found.
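Because 0.0 marks a frame without a reliable estimate, downstream statistics should skip those frames rather than average them in. The sketch below shows one way to do that on an array of per-frame estimates. The helper name `mean_voiced_f0` is hypothetical, and treating every 0.0 entry as "no estimate" is an assumption based on the return convention described above.

```c
/* Hypothetical helper (not part of miniDSP): mean F0 over the frames that
 * produced an estimate. Frames equal to 0.0 are treated as "no reliable F0"
 * and skipped. The number of frames used is written to *num_voiced.
 * Returns 0.0 if no frame produced an estimate. */
double mean_voiced_f0(const double *f0, unsigned num_frames, unsigned *num_voiced)
{
    double sum = 0.0;
    unsigned count = 0;

    for (unsigned f = 0; f < num_frames; f++) {
        if (f0[f] > 0.0) {      /* 0.0 means no peak was found in this frame */
            sum += f0[f];
            count++;
        }
    }

    if (num_voiced) *num_voiced = count;
    return (count > 0) ? sum / (double)count : 0.0;
}
```

Averaging the track {140.0, 0.0, 220.0, 0.0} this way gives 180.0 over 2 frames, whereas a plain mean over all four entries would give a misleading 90.0.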
API Reference