miniDSP
A small C library for audio DSP
Pitch Detection

This tutorial compares two classic fundamental-frequency (F0) estimators:

  • Autocorrelation peak picking (MD_f0_autocorrelation)
  • FFT peak picking (MD_f0_fft)

Both are implemented in miniDSP and demonstrated in examples/pitch_detection.c.

Build and run the example from the repository root:

make -C examples pitch_detection
cd examples && ./pitch_detection
open pitch_detection.html

Autocorrelation F0

For a voiced frame, the fundamental period shows up as a strong peak in the autocorrelation function:

\[ R[\tau] = \frac{\sum_{n=0}^{N-1-\tau} x[n]x[n+\tau]} {\sum_{n=0}^{N-1} x[n]^2}, \qquad f_0 = \frac{f_s}{\tau_{\text{peak}}} \]

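The search covers only the lags that correspond to the desired F0 range (min_freq_hz..max_freq_hz), that is, lags from about sample_rate / max_freq_hz up to about sample_rate / min_freq_hz samples. The strongest local peak within that range is chosen.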

Reading the formula in C:

// x[n] -> frame[n], fs -> sample_rate
// lag_min/lag_max come from the requested min/max F0 range,
// with 1 <= lag_min and lag_max + 1 < N.
// R[] is scratch storage for at least lag_max + 2 doubles.
double r0 = 0.0;  // denominator: SUM x[n]^2
for (unsigned n = 0; n < N; n++) {
    r0 += frame[n] * frame[n];
}
// normalised R[tau] for the search range plus one neighbor on each side
for (unsigned tau = lag_min - 1; tau <= lag_max + 1; tau++) {
    double num = 0.0;  // numerator: SUM x[n] * x[n+tau]
    for (unsigned n = 0; n < N - tau; n++) {
        num += frame[n] * frame[n + tau];
    }
    R[tau] = (r0 > 0.0) ? (num / r0) : 0.0;
}
double best_r = -1.0;
unsigned best_lag = 0;
for (unsigned tau = lag_min; tau <= lag_max; tau++) {
    // local-max check using neighbors (R[tau-1], R[tau], R[tau+1]);
    // the tie-breaking rule here is illustrative
    int is_local_peak = (R[tau] > R[tau - 1]) && (R[tau] >= R[tau + 1]);
    if (is_local_peak && R[tau] > best_r) {
        best_r = R[tau];
        best_lag = tau;
    }
}
double f0_hz = (best_lag > 0) ? (sample_rate / (double)best_lag) : 0.0;

FFT-based F0

This method applies a Hann window, computes the one-sided FFT magnitude, and picks the dominant peak in a frequency range:

\[ f_0 = \frac{k_{\text{peak}} f_s}{N} \]

It is simple and fast, but more sensitive to noise and harmonic dominance than autocorrelation.

Reading the formula in C:

// x[n] -> frame[n], fs -> sample_rate
// M_PI comes from <math.h> on POSIX systems; it is not part of ISO C.
// 1) Apply Hann window:
for (unsigned n = 0; n < N; n++) {
    double w = 0.5 * (1.0 - cos(2.0 * M_PI * (double)n / (double)(N - 1)));
    xw[n] = frame[n] * w;
}
// 2) Compute magnitude spectrum directly from DFT definition (educational form):
for (unsigned k = 0; k <= N / 2; k++) {
    double re = 0.0, im = 0.0;
    for (unsigned n = 0; n < N; n++) {
        double phase = 2.0 * M_PI * (double)k * (double)n / (double)N;
        re += xw[n] * cos(phase);
        im -= xw[n] * sin(phase);
    }
    mag[k] = sqrt(re * re + im * im);
}
// 3) Search bins mapped from requested F0 range:
unsigned k_min = (unsigned)ceil(min_freq_hz * (double)N / sample_rate);
unsigned k_max = (unsigned)floor(max_freq_hz * (double)N / sample_rate);
if (k_max > N / 2) k_max = N / 2;  // mag[] only holds bins 0..N/2
unsigned k_peak = k_min;
for (unsigned k = k_min; k <= k_max; k++) {
    if (mag[k] > mag[k_peak]) k_peak = k;
}
double f0_hz = (double)k_peak * sample_rate / (double)N;

Frame-Wise Tracking

In practice, pitch is estimated frame-by-frame over time:

for (unsigned f = 0; f < num_frames; f++) {
    unsigned start = f * hop;
    const double *frame = signal + start;
    unsigned center = start + frame_len / 2;
    if (center >= N) center = N - 1;
    /* Ground truth for this frame (piecewise-constant by construction). */
    f0_true[f] = (center < seg1) ? 140.0 : (center < seg2 ? 220.0 : 320.0);
    times[f] = (double)center / sample_rate;
    f0_acf[f] = MD_f0_autocorrelation(frame, frame_len, sample_rate,
                                      min_f0, max_f0);
    f0_fft[f] = MD_f0_fft(frame, frame_len, sample_rate,
                          min_f0, max_f0);
}
double MD_f0_fft(const double *signal, unsigned N, double sample_rate, double min_freq_hz, double max_freq_hz)
Estimate the fundamental frequency (F0) using FFT peak picking.
double MD_f0_autocorrelation(const double *signal, unsigned N, double sample_rate, double min_freq_hz, double max_freq_hz)
Estimate the fundamental frequency (F0) using autocorrelation.
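
The loop above takes num_frames, hop, and frame_len as given. A minimal sketch of how the frame count and frame-center times could be computed follows; count_frames and frame_center_time are hypothetical helper names, and the choice to use only complete frames is an assumption rather than something the example is known to do.

```c
/* Hypothetical helper: number of complete frames of frame_len samples,
 * spaced hop samples apart, that fit in a signal of N samples. */
unsigned count_frames(unsigned N, unsigned frame_len, unsigned hop)
{
    if (frame_len == 0 || hop == 0 || N < frame_len) return 0;
    return 1 + (N - frame_len) / hop;
}

/* Hypothetical helper: time in seconds of the center sample of frame f,
 * matching the start + frame_len / 2 convention used in the loop. */
double frame_center_time(unsigned f, unsigned hop, unsigned frame_len,
                         double sample_rate)
{
    unsigned center = f * hop + frame_len / 2;
    return (double)center / sample_rate;
}
```

For a one-second signal at 16 kHz with frame_len = 1024 and hop = 256, this gives 59 frames, and the first frame is centered at 0.032 s.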

Visual Comparison

[Figure: Ground truth vs estimated tracks (entire signal)]

[Figure: Autocorrelation peak (single frame)]

[Figure: FFT peak pick (single frame)]


Failure Modes and Trade-offs

  • Autocorrelation can fail on weakly voiced/noisy frames where no clear lag peak exists.
  • FFT peak pick can lock onto harmonics (e.g. 2f0, 3f0) when the fundamental is weak.
  • Restricting the search range (min_freq_hz, max_freq_hz) is critical for both methods.
  • Short frames improve time resolution but reduce frequency/lag resolution.
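
The frame-length trade-off in the last bullet can be put into numbers. The helpers below are hypothetical and only evaluate standard relations: a frame of N samples lasts N / f_s seconds, its DFT bins are f_s / N Hz apart, and it contains f_0 N / f_s periods of a tone at f_0.

```c
/* Hypothetical helpers for reasoning about frame length. */

/* Duration of one frame in seconds (time resolution). */
double frame_duration_s(double sample_rate, unsigned N)
{
    return (double)N / sample_rate;
}

/* Spacing between adjacent DFT bins in Hz (frequency resolution). */
double fft_bin_spacing_hz(double sample_rate, unsigned N)
{
    return sample_rate / (double)N;
}

/* Number of periods of a tone at f0_hz that fit in one frame.
 * Fewer periods leave fewer products in the autocorrelation sum
 * at the lag of interest. */
double periods_in_frame(double f0_hz, double sample_rate, unsigned N)
{
    return f0_hz * (double)N / sample_rate;
}
```

At 16 kHz, a 512-sample frame lasts 32 ms, has 31.25 Hz bin spacing, and holds about 4.5 periods of a 140 Hz tone. A 2048-sample frame narrows the bin spacing to about 7.8 Hz but averages over 128 ms of signal.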

Both miniDSP APIs return 0.0 when no reliable F0 peak is found.
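
Because 0.0 acts as a "no estimate" marker, downstream code should skip those frames instead of treating them as a 0 Hz pitch. A minimal sketch, assuming per-frame arrays like f0_acf and f0_true from the tracking loop; the function mean_abs_error_voiced is hypothetical and not part of miniDSP.

```c
#include <math.h>

/* Hypothetical helper: mean absolute error in Hz between estimated and
 * true F0, computed only over frames where an estimate was returned.
 * Frames with f0_est <= 0.0 are treated as "no reliable F0" and skipped.
 * If num_voiced is non-NULL it receives the number of frames used.
 * Returns 0.0 when no frame has an estimate. */
double mean_abs_error_voiced(const double *f0_est, const double *f0_true,
                             unsigned num_frames, unsigned *num_voiced)
{
    double sum = 0.0;
    unsigned count = 0;
    for (unsigned f = 0; f < num_frames; f++) {
        if (f0_est[f] <= 0.0) continue;  /* sentinel: no estimate */
        sum += fabs(f0_est[f] - f0_true[f]);
        count++;
    }
    if (num_voiced) *num_voiced = count;
    return (count > 0) ? (sum / (double)count) : 0.0;
}
```

Reporting the number of frames used alongside the error keeps a method from looking accurate merely because it declined to estimate most frames.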


API Reference