Dual-Tone Multi-Frequency (DTMF) is the signalling system used by touch-tone telephones. Each keypad button is encoded as the sum of two sinusoids – one from a low-frequency row group and one from a high-frequency column group. The receiver decodes the button by identifying both frequencies.
miniDSP provides ITU-T Q.24-compliant detection and generation in src/minidsp_dtmf.c, demonstrated in examples/dtmf_detector.c.
Build and run the self-test from the repository root:
make -C examples dtmf_detector
cd examples && ./dtmf_detector
The DTMF frequency table
Each button sits at the intersection of one row and one column frequency:
| | 1209 Hz | 1336 Hz | 1477 Hz | 1633 Hz |
| 697 Hz | 1 | 2 | 3 | A |
| 770 Hz | 4 | 5 | 6 | B |
| 852 Hz | 7 | 8 | 9 | C |
| 941 Hz | * | 0 | # | D |
The frequencies were chosen so that no tone is an integer multiple (harmonic) of another, which reduces false triggers from harmonically rich signals like speech.
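The table can be captured as a small lookup. The following is a minimal sketch, not the library's implementation: `dtmf_lookup` and the array names are hypothetical and chosen only for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Row and column frequencies in Hz, in table order. */
static const double dtmf_row_hz[4] = { 697.0, 770.0, 852.0, 941.0 };
static const double dtmf_col_hz[4] = { 1209.0, 1336.0, 1477.0, 1633.0 };

/* Keypad layout, one string per row, matching the table above. */
static const char *const dtmf_keys[4] = { "123A", "456B", "789C", "*0#D" };

/* Hypothetical helper: look up the frequency pair for a button.
 * Accepts lower-case a-d. Returns 0 on success, -1 for a non-DTMF char. */
static int dtmf_lookup(char ch, double *row_hz, double *col_hz)
{
    if (ch == '\0')
        return -1;              /* strchr() would match the terminator */
    ch = (char)toupper((unsigned char)ch);
    for (size_t r = 0; r < 4; r++) {
        const char *hit = strchr(dtmf_keys[r], ch);
        if (hit != NULL) {
            *row_hz = dtmf_row_hz[r];
            *col_hz = dtmf_col_hz[(size_t)(hit - dtmf_keys[r])];
            return 0;
        }
    }
    return -1;
}
```

For example, `dtmf_lookup('5', &row, &col)` yields 770 Hz and 1336 Hz.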
Figure: spectrogram of the sequence "159#" (70 ms tones, 70 ms pauses, 8 kHz). Each digit appears as a pair of horizontal bands, one at its row frequency and one at its column frequency. Dashed lines mark the eight DTMF frequencies.
ITU-T Q.24 timing constraints
ITU-T Recommendation Q.24 specifies minimum timing for reliable DTMF signalling:
| Parameter | Minimum |
| Tone duration for valid digit | 40 ms |
| Inter-digit pause | 40 ms |
In practice, telephone systems use 70–120 ms tones and pauses. The miniDSP detector enforces the 40 ms minimums via a frame-counting state machine; the generator asserts that requested durations meet the minimums.
Signal model
A single DTMF digit is the sum of two sinusoids at equal amplitude:
\[
x[n] = A\,\sin\!\bigl(2\pi\, f_{\text{row}}\, n / f_s\bigr)
+ A\,\sin\!\bigl(2\pi\, f_{\text{col}}\, n / f_s\bigr),
\qquad n = 0, 1, \ldots, N_{\text{tone}}-1
\]
where \(A = 0.5\) so that the combined amplitude never exceeds 1.0, \(f_s\) is the sampling rate, and \(N_{\text{tone}}\) is the number of samples per tone.
Reading the formula in C:
for (unsigned i = 0; i < tone_samples; i++) {
    double t = (double)i / sample_rate;
    output[offset + i] = 0.5 * sin(2 * M_PI * row_freq * t)
                       + 0.5 * sin(2 * M_PI * col_freq * t);
}
The library implementation uses MD_sine_wave() to generate each component separately, then sums them.
Generation
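API:
unsigned MD_dtmf_signal_length(unsigned num_digits, double sample_rate, unsigned tone_ms, unsigned pause_ms)
    Calculate the number of samples needed for MD_dtmf_generate().
void MD_dtmf_generate(double *output, const char *digits, double sample_rate, unsigned tone_ms, unsigned pause_ms)
    Generate a DTMF tone sequence.
Typical call sequence (argument names follow the signatures above):
unsigned len = MD_dtmf_signal_length(num_digits, sample_rate,
                                     tone_ms, pause_ms);
double *sig = malloc(len * sizeof(double));
MD_dtmf_generate(sig, digits, sample_rate, tone_ms, pause_ms);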
The total signal length in samples is:
\[
N = D \cdot \left\lfloor \frac{t_{\text{tone}} \cdot f_s}{1000} \right\rfloor
+ (D - 1) \cdot \left\lfloor \frac{t_{\text{pause}} \cdot f_s}{1000} \right\rfloor
\]
where \(D\) is the number of digits.
Reading the formula in C:
unsigned tone_samples  = (unsigned)(tone_ms * sample_rate / 1000.0);
unsigned pause_samples = (unsigned)(pause_ms * sample_rate / 1000.0);
unsigned N = num_digits * tone_samples
           + (num_digits - 1) * pause_samples;
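As a worked check, the four-digit sequence "159#" with 70 ms tones and pauses at 8 kHz has 560 samples per tone and per pause, so \(N = 4 \cdot 560 + 3 \cdot 560 = 3920\) samples (0.49 s). The sketch below restates the formula as a stand-alone function. The name `dtmf_length_sketch` is hypothetical, and the guard for zero digits is an assumption added here, since the library's handling of that case is not described.

```c
/* Hypothetical stand-alone restatement of the length formula. */
static unsigned dtmf_length_sketch(unsigned num_digits, double sample_rate,
                                   unsigned tone_ms, unsigned pause_ms)
{
    if (num_digits == 0)
        return 0; /* assumption: keeps (num_digits - 1) from wrapping around */

    /* The casts truncate, which implements the floor in the formula. */
    unsigned tone_samples  = (unsigned)(tone_ms  * sample_rate / 1000.0);
    unsigned pause_samples = (unsigned)(pause_ms * sample_rate / 1000.0);

    /* D tones separated by D - 1 pauses; no trailing pause. */
    return num_digits * tone_samples + (num_digits - 1) * pause_samples;
}
```

With the same settings, the ten-digit self-test sequence needs \(10 \cdot 560 + 9 \cdot 560 = 10640\) samples.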
Quick example – generate a DTMF sequence and save as WAV:
/* Headers needed: <stdio.h>, <stdlib.h>, <string.h>, plus the miniDSP
 * and file-I/O headers. */
static int valid_dtmf_char(char ch)
{
    return (ch >= '0' && ch <= '9') || ch == '*' || ch == '#'
        || ch == 'A' || ch == 'a' || ch == 'B' || ch == 'b'
        || ch == 'C' || ch == 'c' || ch == 'D' || ch == 'd';
}

static int generate_wav(const char *digits, const char *outfile)
{
    const double   sample_rate = 8000.0;
    const unsigned tone_ms     = 70;
    const unsigned pause_ms    = 70;

    /* Reject empty input and anything outside the 16-button keypad. */
    if (digits[0] == '\0') {
        fprintf(stderr, "Digit string must not be empty\n");
        return 1;
    }
    for (const char *p = digits; *p; p++) {
        if (!valid_dtmf_char(*p)) {
            fprintf(stderr, "Invalid DTMF character '%c'. "
                            "Valid: 0-9, A-D, *, #\n", *p);
            return 1;
        }
    }

    /* Size the buffer, then synthesise the tone sequence into it. */
    unsigned num_digits = (unsigned)strlen(digits);
    unsigned signal_len = MD_dtmf_signal_length(num_digits, sample_rate,
                                                tone_ms, pause_ms);
    double *signal = malloc(signal_len * sizeof(double));
    if (!signal) { fprintf(stderr, "allocation failed\n"); return 1; }
    MD_dtmf_generate(signal, digits, sample_rate, tone_ms, pause_ms);

    /* FIO_write_wav() takes float samples, so convert from double. */
    float *fdata = malloc(signal_len * sizeof(float));
    if (!fdata) { free(signal); fprintf(stderr, "allocation failed\n"); return 1; }
    for (unsigned i = 0; i < signal_len; i++)
        fdata[i] = (float)signal[i];

    int ret = FIO_write_wav(outfile, fdata, signal_len, (unsigned)sample_rate);
    if (ret == 0)
        printf("Generated DTMF \"%s\" -> %s (%u samples, %.3f s)\n",
               digits, outfile, signal_len,
               (double)signal_len / sample_rate);
    else
        fprintf(stderr, "Error writing %s\n", outfile);

    free(fdata);
    free(signal);
    return ret;
}
FIO_write_wav(), called above, has this signature:
int FIO_write_wav(const char *outfile, const float *data, size_t datalen, unsigned samprate)
    Write mono float audio to a WAV file.
Detection algorithm
Detection slides a Hanning-windowed FFT frame across the audio signal:
- FFT size is the largest power of two whose window fits within 35 ms (e.g. \(N = 256\) at 8 kHz, giving \(\Delta f = 31.25\) Hz). Keeping the window shorter than the 40 ms Q.24 minimum pause ensures the state machine can resolve inter-digit gaps.
- Hop is \(N/4\) (75 % overlap).
- Per frame: apply Hanning window, compute MD_magnitude_spectrum(), normalise to single-sided amplitude, then check the magnitude at each of the eight DTMF frequency bins.
- A digit is detected when both the strongest row and strongest column exceed a threshold (8 \(\times\) the mean spectral magnitude, roughly 18 dB above the noise floor).
- A state machine enforces ITU-T Q.24 timing:
| State | Transition condition | Action |
| IDLE | Digit detected | Enter PENDING, start counter |
| PENDING | Same digit for \(\geq\) 40 ms | Enter ACTIVE (confirmed) |
| PENDING | Different digit or silence | Return to IDLE |
| ACTIVE | Same digit continues | Update end time |
| ACTIVE | Silence / different for \(\geq\) 40 ms | Emit tone, return to IDLE |
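The transition table can be sketched as a small frame-driven state machine. This is an illustration under stated assumptions, not the library's code: all names (`dtmf_sm`, `dtmf_sm_step`, `min_frames_for`) are hypothetical, `seen` is the per-frame decision (0 meaning no digit), and the 40 ms minimum is assumed to be converted to a frame count by rounding up against the hop size.

```c
/* Hypothetical sketch of the IDLE / PENDING / ACTIVE state machine. */
typedef enum { ST_IDLE, ST_PENDING, ST_ACTIVE } dtmf_state;

typedef struct {
    char digit;
    unsigned start_frame;
    unsigned end_frame;
} dtmf_event;

typedef struct {
    dtmf_state state;
    char       digit;       /* candidate (PENDING) or confirmed (ACTIVE) */
    unsigned   on_count;    /* consecutive frames showing `digit` */
    unsigned   off_count;   /* consecutive frames without it while ACTIVE */
    unsigned   min_frames;  /* frames needed to span the 40 ms minimum */
    unsigned   start_frame;
    unsigned   end_frame;
} dtmf_sm;

/* Assumption: frames needed = ceil(40 ms / hop duration). */
static unsigned min_frames_for(double sample_rate, unsigned hop)
{
    unsigned need = (unsigned)(40.0 * sample_rate / 1000.0);
    return (need + hop - 1) / hop;
}

static void dtmf_sm_init(dtmf_sm *sm, unsigned min_frames)
{
    sm->state = ST_IDLE;
    sm->digit = 0;
    sm->on_count = sm->off_count = 0;
    sm->min_frames = min_frames;
    sm->start_frame = sm->end_frame = 0;
}

/* Feed one frame's decision. Returns 1 and fills *out when a tone is emitted. */
static int dtmf_sm_step(dtmf_sm *sm, char seen, unsigned frame, dtmf_event *out)
{
    switch (sm->state) {
    case ST_IDLE:
        if (seen) {                       /* digit detected: start counting */
            sm->state = ST_PENDING;
            sm->digit = seen;
            sm->on_count = 1;
            sm->start_frame = frame;
        }
        break;
    case ST_PENDING:
        if (seen == sm->digit) {
            if (++sm->on_count >= sm->min_frames) {
                sm->state = ST_ACTIVE;    /* held long enough: confirmed */
                sm->end_frame = frame;
                sm->off_count = 0;
            }
        } else {
            sm->state = ST_IDLE;          /* different digit or silence */
        }
        break;
    case ST_ACTIVE:
        if (seen == sm->digit) {
            sm->end_frame = frame;        /* tone continues */
            sm->off_count = 0;
        } else if (++sm->off_count >= sm->min_frames) {
            out->digit = sm->digit;       /* gap long enough: emit tone */
            out->start_frame = sm->start_frame;
            out->end_frame = sm->end_frame;
            sm->state = ST_IDLE;
            return 1;
        }
        break;
    }
    return 0;
}
```

At 8 kHz with a hop of 64 samples (8 ms per frame), `min_frames_for(8000.0, 64)` gives 5 frames under this rounding assumption.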
Single-sided amplitude normalisation:
\[
\hat{X}[k] = \begin{cases}
|X[k]| / N & k = 0 \text{ or } k = N/2 \\[4pt]
2\,|X[k]| / N & 0 < k < N/2
\end{cases}
\]
Reading the normalisation in C:
for (unsigned k = 0; k < num_bins; k++) {
    mag[k] /= (double)N;
    if (k > 0 && k < N / 2)
        mag[k] *= 2.0;
}
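Once the spectrum is normalised, the per-frame decision described in the list above (strongest row and strongest column both above 8 × the mean magnitude) might look like the sketch below. The function names are hypothetical, and details such as the strict `>` comparison are assumptions rather than the library's documented behaviour.

```c
/* Hypothetical per-frame decision on a normalised magnitude spectrum. */
static const double frame_row_hz[4] = { 697.0, 770.0, 852.0, 941.0 };
static const double frame_col_hz[4] = { 1209.0, 1336.0, 1477.0, 1633.0 };
static const char   frame_keys[4][4] = {
    { '1', '2', '3', 'A' },
    { '4', '5', '6', 'B' },
    { '7', '8', '9', 'C' },
    { '*', '0', '#', 'D' },
};

/* Largest magnitude among the nearest bin and its two neighbours. */
static double peak_near(const double *mag, unsigned num_bins,
                        double freq, double sample_rate, unsigned N)
{
    unsigned k  = (unsigned)(freq * N / sample_rate + 0.5);
    unsigned lo = (k > 0) ? k - 1 : 0;
    unsigned hi = (k + 1 < num_bins) ? k + 1 : num_bins - 1;
    double best = 0.0;
    for (unsigned i = lo; i <= hi; i++)
        if (mag[i] > best)
            best = mag[i];
    return best;
}

/* Returns the detected button, or 0 if the frame holds no valid digit.
 * mag must hold num_bins >= 1 normalised magnitudes. */
static char dtmf_frame_decide(const double *mag, unsigned num_bins,
                              double sample_rate, unsigned N)
{
    double mean = 0.0;
    for (unsigned k = 0; k < num_bins; k++)
        mean += mag[k];
    mean /= (double)num_bins;
    const double threshold = 8.0 * mean;   /* about 18 dB above the mean */

    unsigned best_r = 0, best_c = 0;
    double r_mag = 0.0, c_mag = 0.0;
    for (unsigned i = 0; i < 4; i++) {
        double r = peak_near(mag, num_bins, frame_row_hz[i], sample_rate, N);
        double c = peak_near(mag, num_bins, frame_col_hz[i], sample_rate, N);
        if (r > r_mag) { r_mag = r; best_r = i; }
        if (c > c_mag) { c_mag = c; best_c = i; }
    }

    /* Both groups must clear the threshold; one tone alone is not a digit. */
    if (r_mag > threshold && c_mag > threshold)
        return frame_keys[best_r][best_c];
    return 0;
}
```

Requiring both groups to exceed the threshold is what rejects a single stray tone, since a valid digit always contains exactly one row and one column frequency.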
Quick example – detect DTMF tones in a WAV file:
/* Uses UINT_MAX from <limits.h>. */
static int detect_file(const char *infile)
{
    float   *fdata    = NULL;
    size_t   datalen  = 0;
    unsigned samprate = 0;

    /* donorm = 0 (no normalisation) is assumed here; a nonzero return
     * is treated as failure. */
    if (FIO_read_audio(infile, &fdata, &datalen, &samprate, 0) != 0) {
        fprintf(stderr, "Error reading %s\n", infile);
        return 1;
    }
    if (datalen == 0) {
        fprintf(stderr, "File contains no audio samples\n");
        free(fdata);
        return 1;
    }
    if (samprate < 4000) {
        fprintf(stderr, "Sample rate %u Hz is too low for DTMF detection "
                        "(minimum 4000 Hz)\n", samprate);
        free(fdata);
        return 1;
    }
    /* MD_dtmf_detect() takes an unsigned length, so reject longer files. */
    if (datalen > UINT_MAX) {
        fprintf(stderr, "File too large (%zu samples, max %u)\n",
                datalen, UINT_MAX);
        free(fdata);
        return 1;
    }
    printf("Read %s: %zu samples at %u Hz (%.3f s)\n",
           infile, datalen, samprate, (double)datalen / (double)samprate);

    /* The detector works on double samples, so convert from float. */
    double *signal = malloc(datalen * sizeof(double));
    if (!signal) {
        free(fdata);
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (size_t i = 0; i < datalen; i++)
        signal[i] = (double)fdata[i];
    free(fdata);

    MD_DTMFTone tones[256];
    unsigned n = MD_dtmf_detect(signal, (unsigned)datalen,
                                (double)samprate, tones, 256);
    printf("\nDetected %u DTMF tone%s:\n", n, n == 1 ? "" : "s");
    if (n > 0) {
        printf(" %-6s %-12s %-12s\n", "Digit", "Start (s)", "End (s)");
        for (unsigned i = 0; i < n; i++)
            printf(" %-6c %-12.3f %-12.3f\n",
                   tones[i].digit, tones[i].start_s, tones[i].end_s);
    }

    MD_shutdown();   /* release cached FFT plans and buffers */
    free(signal);
    return 0;
}
Functions and types referenced by this example:
int FIO_read_audio(const char *infile, float **indata, size_t *datalen, unsigned *samprate, unsigned donorm)
    Read a single-channel audio file into memory.
unsigned MD_dtmf_detect(const double *signal, unsigned signal_len, double sample_rate, MD_DTMFTone *tones_out, unsigned max_tones)
    Detect DTMF tones in an audio signal.
void MD_shutdown(void)
    Free all internally cached FFT plans and buffers.
MD_DTMFTone
    A single detected DTMF tone with timing information (the example prints its digit, start_s and end_s fields).
Self-test mode
Running the example with no arguments generates a known digit sequence, detects it, and verifies correctness:
static int self_test(void)
{
    const char    *test_digits = "14*258039#";
    const double   sample_rate = 8000.0;
    const unsigned tone_ms     = 70;
    const unsigned pause_ms    = 70;

    unsigned num_digits = (unsigned)strlen(test_digits);
    unsigned signal_len = MD_dtmf_signal_length(num_digits, sample_rate,
                                                tone_ms, pause_ms);
    printf("Self-test: generating DTMF sequence \"%s\"\n", test_digits);
    printf(" sample_rate = %.0f Hz, tone = %u ms, pause = %u ms\n",
           sample_rate, tone_ms, pause_ms);

    /* Generate the known sequence, then run the detector over it. */
    double *signal = malloc(signal_len * sizeof(double));
    if (!signal) { fprintf(stderr, "allocation failed\n"); return 1; }
    MD_dtmf_generate(signal, test_digits, sample_rate, tone_ms, pause_ms);

    MD_DTMFTone tones[64];
    unsigned n =
        MD_dtmf_detect(signal, signal_len, sample_rate, tones, 64);
    printf("\nDetected %u DTMF tone%s:\n", n, n == 1 ? "" : "s");
    printf(" %-6s %-12s %-12s\n", "Digit", "Start (s)", "End (s)");
    for (unsigned i = 0; i < n; i++)
        printf(" %-6c %-12.3f %-12.3f\n",
               tones[i].digit, tones[i].start_s, tones[i].end_s);

    /* Pass only if the count and every digit match, in order. */
    int pass = 1;
    if (n != num_digits) {
        printf("\nSelf-test FAILED: expected %u digits, detected %u\n",
               num_digits, n);
        pass = 0;
    } else {
        for (unsigned i = 0; i < num_digits; i++) {
            if (tones[i].digit != test_digits[i]) {
                printf("\nSelf-test FAILED: digit %u expected '%c' got '%c'\n",
                       i, test_digits[i], tones[i].digit);
                pass = 0;
                break;
            }
        }
    }
    if (pass)
        printf("\nSelf-test PASSED: all %u digits detected correctly\n",
               num_digits);

    free(signal);
    return pass ? 0 : 1;
}
Frequency resolution and bin mapping
For a given FFT size \(N\) and sampling rate \(f_s\), each bin \(k\) corresponds to frequency:
\[
f_k = k \cdot \frac{f_s}{N}
\]
Reading the formula in C:
double freq = (double)k * sample_rate / (double)N;
The nearest bin for a DTMF frequency \(f\) is:
\[
k = \mathrm{round}\!\left(\frac{f \cdot N}{f_s}\right)
\]
Reading the formula in C:
unsigned bin = (unsigned)(dtmf_freq * N / sample_rate + 0.5);
The detector checks bins \(k-1\), \(k\), and \(k+1\) and takes the maximum magnitude, compensating for the slight frequency mismatch when the DTMF frequency does not fall exactly on a bin centre.
At 8 kHz with \(N = 256\):
| DTMF freq | Nearest bin | Bin freq | Error (Hz) |
| 697 Hz | 22 | 687.5 Hz | -9.5 |
| 770 Hz | 25 | 781.3 Hz | +11.3 |
| 852 Hz | 27 | 843.8 Hz | -8.2 |
| 941 Hz | 30 | 937.5 Hz | -3.5 |
| 1209 Hz | 39 | 1218.8 Hz | +9.8 |
| 1336 Hz | 43 | 1343.8 Hz | +7.8 |
| 1477 Hz | 47 | 1468.8 Hz | -8.2 |
| 1633 Hz | 52 | 1625.0 Hz | -8.0 |
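Every offset is smaller than half a bin width (\(\Delta f / 2 \approx 15.6\) Hz), so each tone's energy lands mainly in its nearest bin. Relative to the nominal frequency, all offsets fall within the ±1.5 % tolerance specified by ITU-T, although 770 Hz (about 1.46 %) does so only narrowly. The detector also checks the two adjacent bins (±1) to handle residual frequency mismatch.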
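The table can be reproduced with a few lines of C. This is a sketch: the helper names are hypothetical, and `fft_size_for` simply applies the 35 ms rule from the detection section rather than calling anything in the library.

```c
#include <stdio.h>

/* Largest power of two N such that N samples last at most 35 ms. */
static unsigned fft_size_for(double sample_rate)
{
    unsigned max_samples = (unsigned)(0.035 * sample_rate);
    unsigned n = 1;
    while (n * 2 <= max_samples)
        n *= 2;
    return n;
}

/* Nearest FFT bin for a frequency: round(f * N / fs). */
static unsigned nearest_bin(double freq, unsigned N, double sample_rate)
{
    return (unsigned)(freq * N / sample_rate + 0.5);
}

/* Centre frequency of bin k: k * fs / N. */
static double bin_freq(unsigned k, unsigned N, double sample_rate)
{
    return (double)k * sample_rate / (double)N;
}

/* Print nearest bin, bin centre and offset for the eight DTMF frequencies. */
static void print_bin_table(double sample_rate)
{
    static const double dtmf_hz[8] = {
        697.0, 770.0, 852.0, 941.0, 1209.0, 1336.0, 1477.0, 1633.0
    };
    unsigned N = fft_size_for(sample_rate);

    printf("N = %u, bin width = %.2f Hz\n", N, sample_rate / (double)N);
    for (unsigned i = 0; i < 8; i++) {
        unsigned k  = nearest_bin(dtmf_hz[i], N, sample_rate);
        double   fk = bin_freq(k, N, sample_rate);
        printf("%6.0f Hz  bin %3u  %8.2f Hz  %+6.2f Hz\n",
               dtmf_hz[i], k, fk, fk - dtmf_hz[i]);
    }
}
```

Calling `print_bin_table(8000.0)` selects \(N = 256\) and bins 22, 25, 27, 30, 39, 43, 47 and 52, matching the table. The same helper can be used to inspect bin alignment at other sampling rates.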