miniDSP
A small C library for audio DSP
DTMF Tone Detection and Generation

Dual-Tone Multi-Frequency (DTMF) is the signalling system used by touch-tone telephones. Each keypad button is encoded as the sum of two sinusoids – one from a low-frequency row group and one from a high-frequency column group. The receiver decodes the button by identifying both frequencies.

miniDSP provides ITU-T Q.24-compliant detection and generation in src/minidsp_dtmf.c, demonstrated in examples/dtmf_detector.c.

Build and run the self-test from the repository root:

make -C examples dtmf_detector
cd examples && ./dtmf_detector

The DTMF frequency table

Each button sits at the intersection of one row and one column frequency:

         1209 Hz  1336 Hz  1477 Hz  1633 Hz
697 Hz      1        2        3        A
770 Hz      4        5        6        B
852 Hz      7        8        9        C
941 Hz      *        0        #        D
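
The table maps directly onto two small arrays and a keypad string. The sketch below is illustrative only: `dtmf_lookup()` and the array names are hypothetical and are not part of the miniDSP API; only the frequencies and key layout come from the table.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Row and column frequencies in Hz, in keypad order (values from the table). */
static const double dtmf_row_hz[4] = { 697.0, 770.0, 852.0, 941.0 };
static const double dtmf_col_hz[4] = { 1209.0, 1336.0, 1477.0, 1633.0 };

/* Keypad layout as one string: index = row * 4 + column. */
static const char dtmf_keys[] = "123A456B789C*0#D";

/* Hypothetical helper (not a miniDSP function): look up the tone pair for a key.
 * Returns 1 and fills *row_hz / *col_hz on success, 0 for a non-DTMF character. */
static int dtmf_lookup(char key, double *row_hz, double *col_hz)
{
    assert(row_hz != NULL && col_hz != NULL);
    if (key == '\0')
        return 0;                 /* strchr() would otherwise match the terminator */

    const char *p = strchr(dtmf_keys, toupper((unsigned char)key));
    if (p == NULL)
        return 0;

    size_t idx = (size_t)(p - dtmf_keys);
    *row_hz = dtmf_row_hz[idx / 4];
    *col_hz = dtmf_col_hz[idx % 4];
    return 1;
}
```

For example, `dtmf_lookup('5', &row, &col)` yields 770 Hz and 1336 Hz, the second row and second column of the table.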

The frequencies were chosen so that no DTMF frequency is an integer multiple (harmonic) of another, which reduces false triggers from harmonically rich signals such as speech.

Figure: spectrogram of the sequence "159#" (70 ms tones, 70 ms pauses, 8 kHz sampling rate). Each digit appears as a pair of horizontal bands, one at its row frequency and one at its column frequency. Dashed lines mark the eight DTMF frequencies.


ITU-T Q.24 timing constraints

ITU-T Recommendation Q.24 specifies minimum timing for reliable DTMF signalling:

Parameter                      Minimum
Tone duration for valid digit  40 ms
Inter-digit pause              40 ms

In practice, telephone systems use 70–120 ms tones and pauses. The miniDSP detector enforces the 40 ms minimums via a frame-counting state machine; the generator asserts that requested durations meet the minimums.
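
To make the frame counting concrete, the sketch below converts a 40 ms minimum into a number of analysis hops. It is an illustration under stated assumptions, not the library's code: `frames_for_ms()` and `meets_q24_minimums()` are hypothetical names, and rounding up to a whole hop is an assumption about how such a counter might be sized.

```c
#include <assert.h>

#define Q24_MIN_TONE_MS  40u
#define Q24_MIN_PAUSE_MS 40u

/* Hypothetical helper: smallest number of hops whose total span is >= min_ms.
 * Rounding up is an assumption; the library may size its counter differently. */
static unsigned frames_for_ms(unsigned min_ms, double sample_rate,
                              unsigned hop_samples)
{
    assert(sample_rate > 0.0 && hop_samples > 0);
    double hop_ms = 1000.0 * (double)hop_samples / sample_rate;
    unsigned frames = (unsigned)((double)min_ms / hop_ms);
    if ((double)frames * hop_ms < (double)min_ms)
        frames++;                 /* round up to cover the full interval */
    return frames;
}

/* Hypothetical check of the two Q.24 minimums from the table above. */
static int meets_q24_minimums(unsigned tone_ms, unsigned pause_ms)
{
    return tone_ms >= Q24_MIN_TONE_MS && pause_ms >= Q24_MIN_PAUSE_MS;
}
```

With a hop of 64 samples at 8 kHz (8 ms per hop), `frames_for_ms(40, 8000.0, 64)` gives 5 frames.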


Signal model

A single DTMF digit is the sum of two sinusoids at equal amplitude:

\[ x[n] = A\,\sin\!\bigl(2\pi\, f_{\text{row}}\, n / f_s\bigr) + A\,\sin\!\bigl(2\pi\, f_{\text{col}}\, n / f_s\bigr), \qquad n = 0, 1, \ldots, N_{\text{tone}}-1 \]

where \(A = 0.5\) keeps the combined amplitude within \(\pm 1.0\), \(f_s\) is the sampling rate, and \(N_{\text{tone}}\) is the number of samples per tone.

Reading the formula in C:

// A -> 0.5, f_row/f_col -> row_freq/col_freq, fs -> sample_rate
// n -> i, x[n] -> output[offset + i]
for (unsigned i = 0; i < tone_samples; i++) {
    double t = (double)i / sample_rate;
    output[offset + i] = 0.5 * sin(2 * M_PI * row_freq * t)
                       + 0.5 * sin(2 * M_PI * col_freq * t);
}

The library implementation uses MD_sine_wave() to generate each component separately, then sums them.


Generation

API:

unsigned len = MD_dtmf_signal_length(num_digits, sample_rate,
                                     tone_ms, pause_ms);
double *sig = malloc(len * sizeof(double));
MD_dtmf_generate(sig, "5551234", sample_rate, tone_ms, pause_ms);

Functions used:

void MD_dtmf_generate(double *output, const char *digits, double sample_rate, unsigned tone_ms, unsigned pause_ms)
    Generate a DTMF tone sequence.
unsigned MD_dtmf_signal_length(unsigned num_digits, double sample_rate, unsigned tone_ms, unsigned pause_ms)
    Calculate the number of samples needed for MD_dtmf_generate().

The total signal length in samples is:

\[ N = D \cdot \left\lfloor \frac{t_{\text{tone}} \cdot f_s}{1000} \right\rfloor + (D - 1) \cdot \left\lfloor \frac{t_{\text{pause}} \cdot f_s}{1000} \right\rfloor \]

where \(D\) is the number of digits.

Reading the formula in C:

// D -> num_digits, t_tone -> tone_ms, t_pause -> pause_ms, fs -> sample_rate
// floor(t_tone * fs / 1000) -> tone_samples
unsigned tone_samples = (unsigned)(tone_ms * sample_rate / 1000.0);
unsigned pause_samples = (unsigned)(pause_ms * sample_rate / 1000.0);
unsigned N = num_digits * tone_samples
           + (num_digits - 1) * pause_samples;
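
As a worked instance of the formula, the sketch below wraps the same arithmetic in a function. `dtmf_length_sketch()` is a hypothetical stand-in for illustration, not the library's `MD_dtmf_signal_length()`; the zero-digit guard is an assumption added here to avoid unsigned wrap-around in `num_digits - 1`.

```c
#include <assert.h>

/* Hypothetical re-statement of the length formula (not the miniDSP function). */
static unsigned dtmf_length_sketch(unsigned num_digits, double sample_rate,
                                   unsigned tone_ms, unsigned pause_ms)
{
    assert(sample_rate > 0.0);
    if (num_digits == 0)
        return 0;   /* assumption: no digits means no samples */

    /* The casts truncate toward zero, matching the floor in the formula. */
    unsigned tone_samples  = (unsigned)(tone_ms  * sample_rate / 1000.0);
    unsigned pause_samples = (unsigned)(pause_ms * sample_rate / 1000.0);

    return num_digits * tone_samples + (num_digits - 1) * pause_samples;
}
```

For ten digits at 8 kHz with 70 ms tones and pauses, each tone and each pause is 560 samples, so the total is 10 × 560 + 9 × 560 = 10640 samples (1.33 s).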

Quick example – generate a DTMF sequence and save as WAV:

static int valid_dtmf_char(char ch)
{
    return (ch >= '0' && ch <= '9') || ch == '*' || ch == '#'
        || ch == 'A' || ch == 'a' || ch == 'B' || ch == 'b'
        || ch == 'C' || ch == 'c' || ch == 'D' || ch == 'd';
}

static int generate_wav(const char *digits, const char *outfile)
{
    const double sample_rate = 8000.0;
    const unsigned tone_ms = 70;
    const unsigned pause_ms = 70;

    if (digits[0] == '\0') {
        fprintf(stderr, "Digit string must not be empty\n");
        return 1;
    }
    for (const char *p = digits; *p; p++) {
        if (!valid_dtmf_char(*p)) {
            fprintf(stderr, "Invalid DTMF character '%c'. "
                    "Valid: 0-9, A-D, *, #\n", *p);
            return 1;
        }
    }

    unsigned num_digits = (unsigned)strlen(digits);
    unsigned signal_len = MD_dtmf_signal_length(num_digits, sample_rate,
                                                tone_ms, pause_ms);
    double *signal = malloc(signal_len * sizeof(double));
    if (!signal) { fprintf(stderr, "allocation failed\n"); return 1; }
    MD_dtmf_generate(signal, digits, sample_rate, tone_ms, pause_ms);

    /* Convert double -> float for WAV writing. */
    float *fdata = malloc(signal_len * sizeof(float));
    if (!fdata) { free(signal); fprintf(stderr, "allocation failed\n"); return 1; }
    for (unsigned i = 0; i < signal_len; i++)
        fdata[i] = (float)signal[i];

    int ret = FIO_write_wav(outfile, fdata, signal_len, (unsigned)sample_rate);
    if (ret == 0)
        printf("Generated DTMF \"%s\" -> %s (%u samples, %.3f s)\n",
               digits, outfile, signal_len,
               (double)signal_len / sample_rate);
    else
        fprintf(stderr, "Error writing %s\n", outfile);

    free(fdata);
    free(signal);
    return ret;
}

Function used:

int FIO_write_wav(const char *outfile, const float *data, size_t datalen, unsigned samprate)
    Write mono float audio to a WAV file. Defined at fileio.c:282.

Detection algorithm

Detection slides a Hanning-windowed FFT frame across the audio signal:

  1. FFT size is the largest power of two whose window fits within 35 ms (e.g. \(N = 256\) at 8 kHz, giving \(\Delta f = 31.25\) Hz). Keeping the window shorter than the 40 ms Q.24 minimum pause ensures the state machine can resolve inter-digit gaps.
  2. Hop is \(N/4\) (75 % overlap).
  3. Per frame: apply Hanning window, compute MD_magnitude_spectrum(), normalise to single-sided amplitude, then check the magnitude at each of the eight DTMF frequency bins.
  4. A digit is detected when both the strongest row and strongest column exceed a threshold (8 \(\times\) the mean spectral magnitude, roughly 18 dB above the noise floor).
  5. A state machine enforces ITU-T Q.24 timing:
     State    Transition condition                    Action
     IDLE     Digit detected                          Enter PENDING, start counter
     PENDING  Same digit for \(\geq\) 40 ms           Enter ACTIVE (confirmed)
     PENDING  Different digit or silence              Return to IDLE
     ACTIVE   Same digit continues                    Update end time
     ACTIVE   Silence / different for \(\geq\) 40 ms  Emit tone, return to IDLE
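
The transition table can be sketched as a small per-frame state machine. This is a minimal illustration, not the library's implementation: the names `dtmf_fsm`, `fsm_init()` and `fsm_step()` are hypothetical, the 40 ms requirement is expressed as a frame count `min_frames`, and start/end timestamps are left out for brevity.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ST_IDLE, ST_PENDING, ST_ACTIVE } dtmf_state;

typedef struct {
    dtmf_state state;
    char       digit;        /* candidate (PENDING) or confirmed (ACTIVE) key */
    unsigned   run_frames;   /* consecutive frames showing `digit` */
    unsigned   gap_frames;   /* consecutive frames not showing `digit` */
    unsigned   min_frames;   /* number of frames that spans 40 ms */
} dtmf_fsm;

static void fsm_init(dtmf_fsm *m, unsigned min_frames)
{
    assert(m != NULL && min_frames > 0);
    m->state = ST_IDLE;
    m->digit = 0;
    m->run_frames = 0;
    m->gap_frames = 0;
    m->min_frames = min_frames;
}

/* Feed one frame's result: `seen` is the detected key, or 0 for none.
 * Returns the confirmed key when its tone has ended, otherwise 0. */
static char fsm_step(dtmf_fsm *m, char seen)
{
    switch (m->state) {
    case ST_IDLE:
        if (seen) {
            m->digit = seen;
            m->run_frames = 1;
            m->gap_frames = 0;
            m->state = (m->min_frames <= 1) ? ST_ACTIVE : ST_PENDING;
        }
        break;
    case ST_PENDING:
        if (seen == m->digit) {
            if (++m->run_frames >= m->min_frames)
                m->state = ST_ACTIVE;      /* tone held long enough */
        } else {
            m->state = ST_IDLE;            /* different digit or silence */
        }
        break;
    case ST_ACTIVE:
        if (seen == m->digit) {
            m->gap_frames = 0;             /* tone continues */
        } else if (++m->gap_frames >= m->min_frames) {
            char done = m->digit;
            m->state = ST_IDLE;
            return done;                   /* emit once the gap is long enough */
        }
        break;
    }
    return 0;
}
```

A caller would run `fsm_step()` once per hop, passing the key found in that frame, and record a tone whenever the return value is non-zero.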

Single-sided amplitude normalisation:

\[ \hat{X}[k] = \begin{cases} |X[k]| / N & k = 0 \text{ or } k = N/2 \\[4pt] 2\,|X[k]| / N & 0 < k < N/2 \end{cases} \]

Reading the normalisation in C:

// X[k] -> mag[k] (raw FFTW output), N -> FFT size
for (unsigned k = 0; k < num_bins; k++) {
    mag[k] /= (double)N;       // divide by FFT size
    if (k > 0 && k < N / 2)
        mag[k] *= 2.0;         // fold negative frequencies
}
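
Once the spectrum is normalised, the threshold test from step 4 can be sketched as follows. This is an assumption-laden illustration rather than the library's code: `strongest_above_threshold()` is a hypothetical name, and the mean is taken over all `num_bins` bins, which the text does not spell out. A ratio of 8 corresponds to \(20 \log_{10} 8 \approx 18\) dB.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: among four candidate bins (one per row or column
 * frequency), return the index 0..3 of the strongest, or -1 if even the
 * strongest does not exceed ratio * mean(mag). */
static int strongest_above_threshold(const double *mag, unsigned num_bins,
                                     const unsigned bins[4], double ratio)
{
    assert(mag != NULL && bins != NULL && num_bins > 0);
    for (int i = 0; i < 4; i++)
        assert(bins[i] < num_bins);

    /* Mean magnitude over the whole spectrum (assumed noise-floor estimate). */
    double sum = 0.0;
    for (unsigned k = 0; k < num_bins; k++)
        sum += mag[k];
    double threshold = ratio * sum / (double)num_bins;

    int best = 0;
    for (int i = 1; i < 4; i++)
        if (mag[bins[i]] > mag[bins[best]])
            best = i;

    return (mag[bins[best]] > threshold) ? best : -1;
}
```

A frame would count as a digit only if the call succeeds for both the four row bins and the four column bins.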

Quick example – detect DTMF tones in a WAV file:

static int detect_file(const char *infile)
{
    float *fdata = nullptr;
    size_t datalen = 0;
    unsigned samprate = 0;

    if (FIO_read_audio(infile, &fdata, &datalen, &samprate, 1) != 0) {
        fprintf(stderr, "Error reading %s\n", infile);
        return 1;
    }
    if (datalen == 0) {
        fprintf(stderr, "File contains no audio samples\n");
        free(fdata);
        return 1;
    }
    if (samprate < 4000) {
        fprintf(stderr, "Sample rate %u Hz is too low for DTMF detection "
                "(minimum 4000 Hz)\n", samprate);
        free(fdata);
        return 1;
    }
    if (datalen > UINT_MAX) {
        fprintf(stderr, "File too large (%zu samples, max %u)\n",
                datalen, UINT_MAX);
        free(fdata);
        return 1;
    }
    printf("Read %s: %zu samples at %u Hz (%.3f s)\n",
           infile, datalen, samprate, (double)datalen / (double)samprate);

    /* Convert float -> double for the library. */
    double *signal = malloc(datalen * sizeof(double));
    if (!signal) {
        free(fdata);
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (size_t i = 0; i < datalen; i++)
        signal[i] = (double)fdata[i];
    free(fdata);

    /* Detect. */
    MD_DTMFTone tones[256];
    unsigned n = MD_dtmf_detect(signal, (unsigned)datalen,
                                (double)samprate, tones, 256);
    printf("\nDetected %u DTMF tone%s:\n", n, n == 1 ? "" : "s");
    if (n > 0) {
        printf(" %-6s %-12s %-12s\n", "Digit", "Start (s)", "End (s)");
        for (unsigned i = 0; i < n; i++)
            printf(" %-6c %-12.3f %-12.3f\n",
                   tones[i].digit, tones[i].start_s, tones[i].end_s);
    }

    free(signal);
    return 0;
}

Functions and types used:

int FIO_read_audio(const char *infile, float **indata, size_t *datalen, unsigned *samprate, unsigned donorm)
    Read a single-channel audio file into memory. Defined at fileio.c:83.
unsigned MD_dtmf_detect(const double *signal, unsigned signal_len, double sample_rate, MD_DTMFTone *tones_out, unsigned max_tones)
    Detect DTMF tones in an audio signal.
void MD_shutdown(void)
    Free all internally cached FFT plans and buffers.
MD_DTMFTone
    A single detected DTMF tone with timing information. Defined at minidsp.h:1112.

Self-test mode

Running the example with no arguments generates a known digit sequence, detects it, and verifies correctness:

static int self_test(void)
{
    const char *test_digits = "14*258039#";
    const double sample_rate = 8000.0;
    const unsigned tone_ms = 70;
    const unsigned pause_ms = 70;

    unsigned num_digits = (unsigned)strlen(test_digits);
    unsigned signal_len = MD_dtmf_signal_length(num_digits, sample_rate,
                                                tone_ms, pause_ms);
    printf("Self-test: generating DTMF sequence \"%s\"\n", test_digits);
    printf(" sample_rate = %.0f Hz, tone = %u ms, pause = %u ms\n",
           sample_rate, tone_ms, pause_ms);

    double *signal = malloc(signal_len * sizeof(double));
    if (!signal) { fprintf(stderr, "allocation failed\n"); return 1; }
    MD_dtmf_generate(signal, test_digits, sample_rate, tone_ms, pause_ms);

    MD_DTMFTone tones[64];
    unsigned n = MD_dtmf_detect(signal, signal_len, sample_rate, tones, 64);
    printf("\nDetected %u DTMF tone%s:\n", n, n == 1 ? "" : "s");
    printf(" %-6s %-12s %-12s\n", "Digit", "Start (s)", "End (s)");
    for (unsigned i = 0; i < n; i++)
        printf(" %-6c %-12.3f %-12.3f\n",
               tones[i].digit, tones[i].start_s, tones[i].end_s);

    /* Verify. */
    int pass = 1;
    if (n != num_digits) {
        printf("\nSelf-test FAILED: expected %u digits, detected %u\n",
               num_digits, n);
        pass = 0;
    } else {
        for (unsigned i = 0; i < num_digits; i++) {
            if (tones[i].digit != test_digits[i]) {
                printf("\nSelf-test FAILED: digit %u expected '%c' got '%c'\n",
                       i, test_digits[i], tones[i].digit);
                pass = 0;
                break;
            }
        }
    }
    if (pass)
        printf("\nSelf-test PASSED: all %u digits detected correctly\n",
               num_digits);

    free(signal);
    return pass ? 0 : 1;
}

Frequency resolution and bin mapping

For a given FFT size \(N\) and sampling rate \(f_s\), each bin \(k\) corresponds to frequency:

\[ f_k = k \cdot \frac{f_s}{N} \]

Reading the formula in C:

// k -> bin index, fs -> sample_rate, N -> FFT size
// f_k -> freq (the frequency that bin k represents)
double freq = (double)k * sample_rate / (double)N;

The nearest bin for a DTMF frequency \(f\) is:

\[ k = \mathrm{round}\!\left(\frac{f \cdot N}{f_s}\right) \]

Reading the formula in C:

// f -> dtmf_freq, N -> FFT size, fs -> sample_rate
// k -> bin (nearest FFT bin for the DTMF frequency)
unsigned bin = (unsigned)(dtmf_freq * N / sample_rate + 0.5);

The detector checks bins \(k-1\), \(k\), and \(k+1\) and takes the maximum magnitude, compensating for the slight frequency mismatch when the DTMF frequency does not fall exactly on a bin centre.
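
The three-bin search can be sketched as below. `peak_near_bin()` is a hypothetical helper written for illustration, not a miniDSP function; clamping at the ends of the spectrum is an assumption added so the sketch is safe for any valid `k`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: largest magnitude among bins k-1, k and k+1,
 * clamped to the valid range [0, num_bins - 1]. */
static double peak_near_bin(const double *mag, unsigned num_bins, unsigned k)
{
    assert(mag != NULL && num_bins > 0 && k < num_bins);

    unsigned lo = (k > 0) ? k - 1 : 0;
    unsigned hi = (k + 1 < num_bins) ? k + 1 : num_bins - 1;

    double best = mag[lo];
    for (unsigned i = lo + 1; i <= hi; i++)
        if (mag[i] > best)
            best = mag[i];
    return best;
}
```

For a DTMF frequency whose nearest bin is `bin`, the value compared against the threshold would then be `peak_near_bin(mag, num_bins, bin)`.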

At 8 kHz with \(N = 256\):

DTMF freq  Nearest bin  Bin freq   Error
697 Hz     22           687.5 Hz    -9.5 Hz
770 Hz     25           781.3 Hz   +11.3 Hz
852 Hz     27           843.8 Hz    -8.2 Hz
941 Hz     30           937.5 Hz    -3.5 Hz
1209 Hz    39           1218.8 Hz   +9.8 Hz
1336 Hz    43           1343.8 Hz   +7.8 Hz
1477 Hz    47           1468.8 Hz   -8.2 Hz
1633 Hz    52           1625.0 Hz   -8.0 Hz

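Every offset is within ±1.5 % of the nominal frequency, the tolerance ITU-T specifies, although the 770 Hz tone comes close to the limit at about 1.46 %. Each offset is also less than half the 31.25 Hz bin spacing, so checking the two adjacent bins (±1) covers the residual mismatch.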
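
The table can be reproduced with a few helper functions. The sketch below is illustrative: `fft_size_for_window()`, `nearest_bin()`, `bin_error_hz()` and `print_bin_table()` are hypothetical names, and the 35 ms limit is the window rule described in the detection algorithm.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: largest power of two whose window fits in max_window_ms. */
static unsigned fft_size_for_window(double sample_rate, double max_window_ms)
{
    assert(sample_rate > 0.0 && max_window_ms > 0.0);
    unsigned max_samples = (unsigned)(sample_rate * max_window_ms / 1000.0);
    unsigned n = 1;
    while (n * 2 <= max_samples)
        n *= 2;
    return n;
}

/* k = round(f * N / fs) */
static unsigned nearest_bin(double freq_hz, unsigned n, double sample_rate)
{
    return (unsigned)(freq_hz * n / sample_rate + 0.5);
}

/* Bin-centre frequency minus the DTMF frequency, in Hz. */
static double bin_error_hz(double freq_hz, unsigned n, double sample_rate)
{
    unsigned k = nearest_bin(freq_hz, n, sample_rate);
    return (double)k * sample_rate / (double)n - freq_hz;
}

/* Print one row per DTMF frequency: bin, bin centre, error in Hz and percent. */
static void print_bin_table(double sample_rate)
{
    static const double dtmf_hz[8] = {
        697.0, 770.0, 852.0, 941.0, 1209.0, 1336.0, 1477.0, 1633.0
    };
    unsigned n = fft_size_for_window(sample_rate, 35.0);

    printf("N = %u, bin spacing = %.2f Hz\n", n, sample_rate / n);
    for (int i = 0; i < 8; i++) {
        unsigned k = nearest_bin(dtmf_hz[i], n, sample_rate);
        double err = bin_error_hz(dtmf_hz[i], n, sample_rate);
        printf("%6.0f Hz  bin %3u  %7.1f Hz  %+6.2f Hz  (%+.2f %%)\n",
               dtmf_hz[i], k, k * sample_rate / n, err,
               100.0 * err / dtmf_hz[i]);
    }
}
```

Calling `print_bin_table(8000.0)` lists the same bins as the table and adds the percentage offsets, which makes it easy to repeat the check at other sampling rates.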