Dual-Tone Multi-Frequency (DTMF) is the signalling system used by touch-tone telephones. Each keypad button is encoded as the sum of two sinusoids – one from a low-frequency row group and one from a high-frequency column group. The receiver decodes the button by identifying both frequencies.

miniDSP provides ITU-T Q.24-compliant detection and generation in src/minidsp_dtmf.c, demonstrated in examples/dtmf_detector.c.

Build and run the self-test from the repository root:

make -C examples dtmf_detector

cd examples && ./dtmf_detector

The DTMF frequency table

Each button sits at the intersection of one row and one column frequency:

	1209 Hz	1336 Hz	1477 Hz	1633 Hz
697 Hz	1	2	3	A
770 Hz	4	5	6	B
852 Hz	7	8	9	C
941 Hz	*	0	#	D
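The table translates directly into a small lookup. The following is a minimal sketch for illustration only: `dtmf_lookup` and the array names are hypothetical and are not part of the miniDSP API.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Row and column frequencies in Hz, in table order. */
static const int dtmf_rows[4] = { 697, 770, 852, 941 };
static const int dtmf_cols[4] = { 1209, 1336, 1477, 1633 };

/* Keypad layout: dtmf_keypad[r][c] is the button at row r, column c. */
static const char dtmf_keypad[4][5] = { "123A", "456B", "789C", "*0#D" };

/*
 * Hypothetical helper: look up the two frequencies for a button.
 * Lower-case a-d are accepted. Returns 0 on success, -1 if the
 * character is not a DTMF button.
 */
static int dtmf_lookup(char ch, int *row_hz, int *col_hz)
{
    if (ch == '\0')
        return -1;                  /* strchr would match the terminator */
    ch = (char)toupper((unsigned char)ch);
    for (size_t r = 0; r < 4; r++) {
        const char *hit = strchr(dtmf_keypad[r], ch);
        if (hit != NULL) {
            *row_hz = dtmf_rows[r];
            *col_hz = dtmf_cols[hit - dtmf_keypad[r]];
            return 0;
        }
    }
    return -1;
}
```

For example, looking up '5' yields 770 Hz and 1336 Hz, the second row and second column of the table.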

The frequencies were chosen so that no tone is a harmonic (an integer multiple) of another. This prevents false triggers from harmonically rich signals such as speech.
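The harmonic property can be checked numerically. A minimal sketch, with a hypothetical function name, that tests every pair of the eight frequencies for an exact integer ratio:

```c
#include <stddef.h>

/* The eight DTMF frequencies in Hz, in ascending order. */
static const int dtmf_freqs[8] = { 697, 770, 852, 941, 1209, 1336, 1477, 1633 };

/*
 * Hypothetical check: returns 1 if any frequency is an exact integer
 * multiple (a harmonic) of a lower one, 0 otherwise.
 */
static int dtmf_has_harmonic_pair(void)
{
    for (size_t i = 0; i < 8; i++)
        for (size_t j = i + 1; j < 8; j++)
            if (dtmf_freqs[j] % dtmf_freqs[i] == 0)
                return 1;
    return 0;
}
```

The function returns 0 for this frequency set. The second harmonics of the three lowest row tones (1394, 1540 and 1704 Hz) also miss every column frequency by more than 4 %.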

Figure: spectrogram of the sequence "159#" (70 ms tones, 70 ms pauses, 8 kHz sample rate). Each digit appears as a pair of horizontal bands, one at its row frequency and one at its column frequency. Dashed lines mark the eight DTMF frequencies.

ITU-T Q.24 timing constraints

ITU-T Recommendation Q.24 specifies minimum timing for reliable DTMF signalling:

Parameter	Minimum
Tone duration for valid digit	40 ms
Inter-digit pause	40 ms

In practice, telephone systems use 70–120 ms tones and pauses. The miniDSP detector enforces the 40 ms minimums via a frame-counting state machine; the generator asserts that requested durations meet the minimums.

Signal model

A single DTMF digit is the sum of two sinusoids at equal amplitude:

\[ x[n] = A\,\sin\!\bigl(2\pi\, f_{\text{row}}\, n / f_s\bigr) + A\,\sin\!\bigl(2\pi\, f_{\text{col}}\, n / f_s\bigr), \qquad n = 0, 1, \ldots, N_{\text{tone}}-1 \]

where \(A = 0.5\) so that the combined amplitude never exceeds 1.0, \(f_s\) is the sampling rate, and \(N_{\text{tone}}\) is the number of samples per tone.

Reading the formula in C:

// A -> 0.5, f_row/f_col -> row_freq/col_freq, fs -> sample_rate
// n -> i, x[n] -> output[offset + i]
for (unsigned i = 0; i < tone_samples; i++) {
    double t = (double)i / sample_rate;
    output[offset + i] = 0.5 * sin(2 * M_PI * row_freq * t)
                        + 0.5 * sin(2 * M_PI * col_freq * t);
}

The library implementation uses MD_sine_wave() to generate each component separately, then sums them.

Generation

API:

unsigned len = MD_dtmf_signal_length(num_digits, sample_rate,
                                     tone_ms, pause_ms);
double *sig = malloc(len * sizeof(double));
MD_dtmf_generate(sig, "5551234", sample_rate, tone_ms, pause_ms);

The total signal length in samples is:

\[ N = D \cdot \left\lfloor \frac{t_{\text{tone}} \cdot f_s}{1000} \right\rfloor + (D - 1) \cdot \left\lfloor \frac{t_{\text{pause}} \cdot f_s}{1000} \right\rfloor \]

where \(D\) is the number of digits.

Reading the formula in C:

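// D -> num_digits, t_tone -> tone_ms, t_pause -> pause_ms, fs -> sample_rate
// floor(t_tone * fs / 1000) -> tone_samples
// (the cast truncates, which equals floor for non-negative values)
unsigned tone_samples  = (unsigned)(tone_ms * sample_rate / 1000.0);
unsigned pause_samples = (unsigned)(pause_ms * sample_rate / 1000.0);
// Assumes num_digits >= 1: in unsigned arithmetic, (num_digits - 1)
// would wrap around for an empty digit string.
unsigned N = num_digits * tone_samples
           + (num_digits - 1) * pause_samples;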
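As a worked example, take a seven-digit string such as "5551234" with 70 ms tones, 70 ms pauses and \(f_s = 8000\) Hz (these timing values are assumed here for illustration):

\[ \left\lfloor \frac{70 \cdot 8000}{1000} \right\rfloor = 560, \qquad N = 7 \cdot 560 + 6 \cdot 560 = 7280 \text{ samples} = 0.91 \text{ s} \]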

Quick example – generate a DTMF sequence and save as WAV:

static int valid_dtmf_char(char ch)
{
    return (ch >= '0' && ch <= '9') || ch == '*' || ch == '#'
        || ch == 'A' || ch == 'a' || ch == 'B' || ch == 'b'
        || ch == 'C' || ch == 'c' || ch == 'D' || ch == 'd';
}
 
static int generate_wav(const char *digits, const char *outfile)
{
    const double sample_rate = 8000.0;
    const unsigned tone_ms   = 70;
    const unsigned pause_ms  = 70;
 
    if (digits[0] == '\0') {
        fprintf(stderr, "Digit string must not be empty\n");
        return 1;
    }
 
    for (const char *p = digits; *p; p++) {
        if (!valid_dtmf_char(*p)) {
            fprintf(stderr, "Invalid DTMF character '%c'. "
                            "Valid: 0-9, A-D, *, #\n", *p);
            return 1;
        }
    }
 
    unsigned num_digits = (unsigned)strlen(digits);
    unsigned signal_len = MD_dtmf_signal_length(num_digits, sample_rate,
                                                tone_ms, pause_ms);
 
    double *signal = malloc(signal_len * sizeof(double));
    if (!signal) { fprintf(stderr, "allocation failed\n"); return 1; }
 
    MD_dtmf_generate(signal, digits, sample_rate, tone_ms, pause_ms);
 
    /* Convert double -> float for WAV writing. */
    float *fdata = malloc(signal_len * sizeof(float));
    if (!fdata) { free(signal); fprintf(stderr, "allocation failed\n"); return 1; }
    for (unsigned i = 0; i < signal_len; i++)
        fdata[i] = (float)signal[i];
 
    int ret = FIO_write_wav(outfile, fdata, signal_len, (unsigned)sample_rate);
    if (ret == 0)
        printf("Generated DTMF \"%s\" -> %s  (%u samples, %.3f s)\n",
               digits, outfile, signal_len,
               (double)signal_len / sample_rate);
    else
        fprintf(stderr, "Error writing %s\n", outfile);
 
    free(fdata);
    free(signal);
    return ret;
}

Detection algorithm

Detection slides a Hanning-windowed FFT frame across the audio signal:

FFT size is the largest power of two whose window fits within 35 ms (e.g. \(N = 256\) at 8 kHz, giving \(\Delta f = 31.25\) Hz). Keeping the window shorter than the 40 ms Q.24 minimum pause ensures the state machine can resolve inter-digit gaps.
Hop is \(N/4\) (75 % overlap).
Per frame: apply Hanning window, compute MD_magnitude_spectrum(), normalise to single-sided amplitude, then check the magnitude at each of the eight DTMF frequency bins.
A digit is detected when both the strongest row and strongest column exceed a threshold (8 \(\times\) the mean spectral magnitude, roughly 18 dB above the noise floor).
A state machine enforces ITU-T Q.24 timing:

State	Transition condition	Action
IDLE	Digit detected	Enter PENDING, start counter
PENDING	Same digit for \(\geq\) 40 ms	Enter ACTIVE (confirmed)
PENDING	Different digit or silence	Return to IDLE
ACTIVE	Same digit continues	Update end time
ACTIVE	Silence / different for \(\geq\) 40 ms	Emit tone, return to IDLE
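The transitions in the table can be sketched as a small frame-driven state machine. This is an illustrative reading of the table, not the library's implementation: all names are hypothetical, the 40 ms requirement is converted to a whole number of hops (rounded up), and the start and end timestamps are omitted for brevity.

```c
typedef enum { ST_IDLE, ST_PENDING, ST_ACTIVE } dtmf_state;

typedef struct {
    dtmf_state state;
    char       digit;       /* candidate (PENDING) or confirmed (ACTIVE) digit */
    unsigned   on_frames;   /* consecutive frames showing `digit` while PENDING */
    unsigned   off_frames;  /* consecutive frames without `digit` while ACTIVE */
} dtmf_fsm;

/* Number of hops needed to span 40 ms, rounded up. Requires hop > 0. */
static unsigned dtmf_min_frames(unsigned sample_rate, unsigned hop)
{
    unsigned min_samples = 40u * sample_rate / 1000u;
    return (min_samples + hop - 1u) / hop;
}

/*
 * Feed one frame's decision: `seen` is the detected digit, or 0 if the
 * frame contained no valid digit. Returns the confirmed digit on the
 * frame where its tone is declared finished, otherwise 0.
 */
static char dtmf_fsm_step(dtmf_fsm *m, char seen, unsigned min_frames)
{
    switch (m->state) {
    case ST_IDLE:
        if (seen) {                         /* digit detected: start counting */
            m->state = ST_PENDING;
            m->digit = seen;
            m->on_frames = 1;
        }
        break;
    case ST_PENDING:
        if (seen != m->digit) {             /* silence or a different digit */
            m->state = ST_IDLE;
        } else if (++m->on_frames >= min_frames) {
            m->state = ST_ACTIVE;           /* held for at least 40 ms */
            m->off_frames = 0;
        }
        break;
    case ST_ACTIVE:
        if (seen == m->digit) {
            m->off_frames = 0;              /* tone continues */
        } else if (++m->off_frames >= min_frames) {
            m->state = ST_IDLE;             /* gap of at least 40 ms: emit */
            return m->digit;
        }
        break;
    }
    return 0;
}
```

At 8 kHz with \(N = 256\) the hop is 64 samples (8 ms), so `dtmf_min_frames(8000, 64)` gives 5 frames. A machine initialised with `dtmf_fsm m = { ST_IDLE, 0, 0, 0 };` then needs five consecutive matching frames to confirm a digit and five non-matching frames to emit it.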

Single-sided amplitude normalisation:

\[ \hat{X}[k] = \begin{cases} |X[k]| / N & k = 0 \text{ or } k = N/2 \\[4pt] 2\,|X[k]| / N & 0 < k < N/2 \end{cases} \]

Reading the normalisation in C:

// X[k] -> mag[k] (raw FFTW output), N -> FFT size
for (unsigned k = 0; k < num_bins; k++) {
    mag[k] /= (double)N;                // divide by FFT size
    if (k > 0 && k < N / 2)
        mag[k] *= 2.0;                  // fold negative frequencies
}

Quick example – detect DTMF tones in a WAV file:

static int detect_file(const char *infile)
{
    float  *fdata    = nullptr;
    size_t  datalen  = 0;
    unsigned samprate = 0;
 
    if (FIO_read_audio(infile, &fdata, &datalen, &samprate, 1) != 0) {
        fprintf(stderr, "Error reading %s\n", infile);
        return 1;
    }
 
    if (datalen == 0) {
        fprintf(stderr, "File contains no audio samples\n");
        free(fdata);
        return 1;
    }
 
    if (samprate < 4000) {
        fprintf(stderr, "Sample rate %u Hz is too low for DTMF detection "
                        "(minimum 4000 Hz)\n", samprate);
        free(fdata);
        return 1;
    }
 
    if (datalen > UINT_MAX) {
        fprintf(stderr, "File too large (%zu samples, max %u)\n",
                datalen, UINT_MAX);
        free(fdata);
        return 1;
    }
 
    printf("Read %s: %zu samples at %u Hz (%.3f s)\n",
           infile, datalen, samprate, (double)datalen / (double)samprate);
 
    /* Convert float -> double for the library. */
    double *signal = malloc(datalen * sizeof(double));
    if (!signal) {
        free(fdata);
        fprintf(stderr, "allocation failed\n");
        return 1;
    }
    for (size_t i = 0; i < datalen; i++)
        signal[i] = (double)fdata[i];
    free(fdata);
 
    /* Detect. */
    MD_DTMFTone tones[256];
    unsigned n = MD_dtmf_detect(signal, (unsigned)datalen,
                                (double)samprate, tones, 256);
 
    printf("\nDetected %u DTMF tone%s:\n", n, n == 1 ? "" : "s");
    if (n > 0) {
        printf("  %-6s  %-12s  %-12s\n", "Digit", "Start (s)", "End (s)");
        for (unsigned i = 0; i < n; i++)
            printf("  %-6c  %-12.3f  %-12.3f\n",
                   tones[i].digit, tones[i].start_s, tones[i].end_s);
    }
 
    free(signal);
    MD_shutdown();
    return 0;
}

Self-test mode

Running the example with no arguments generates a known digit sequence, detects it, and verifies correctness:

static int self_test(void)
{
    const char    *test_digits = "14*258039#";
    const double   sample_rate = 8000.0;
    const unsigned tone_ms     = 70;
    const unsigned pause_ms    = 70;
 
    unsigned num_digits = (unsigned)strlen(test_digits);
    unsigned signal_len = MD_dtmf_signal_length(num_digits, sample_rate,
                                                tone_ms, pause_ms);
 
    printf("Self-test: generating DTMF sequence \"%s\"\n", test_digits);
    printf("  sample_rate = %.0f Hz, tone = %u ms, pause = %u ms\n",
           sample_rate, tone_ms, pause_ms);
 
    double *signal = malloc(signal_len * sizeof(double));
    if (!signal) { fprintf(stderr, "allocation failed\n"); return 1; }
 
    MD_dtmf_generate(signal, test_digits, sample_rate, tone_ms, pause_ms);
 
    MD_DTMFTone tones[64];
    unsigned n = MD_dtmf_detect(signal, signal_len, sample_rate, tones, 64);
 
    printf("\nDetected %u DTMF tone%s:\n", n, n == 1 ? "" : "s");
    printf("  %-6s  %-12s  %-12s\n", "Digit", "Start (s)", "End (s)");
    for (unsigned i = 0; i < n; i++)
        printf("  %-6c  %-12.3f  %-12.3f\n",
               tones[i].digit, tones[i].start_s, tones[i].end_s);
 
    /* Verify. */
    int pass = 1;
    if (n != num_digits) {
        printf("\nSelf-test FAILED: expected %u digits, detected %u\n",
               num_digits, n);
        pass = 0;
    } else {
        for (unsigned i = 0; i < num_digits; i++) {
            if (tones[i].digit != test_digits[i]) {
                printf("\nSelf-test FAILED: digit %u expected '%c' got '%c'\n",
                       i, test_digits[i], tones[i].digit);
                pass = 0;
                break;
            }
        }
    }
 
    if (pass)
        printf("\nSelf-test PASSED: all %u digits detected correctly\n",
               num_digits);
 
    free(signal);
    MD_shutdown();
    return pass ? 0 : 1;
}

Frequency resolution and bin mapping

For a given FFT size \(N\) and sampling rate \(f_s\), each bin \(k\) corresponds to frequency:

\[ f_k = k \cdot \frac{f_s}{N} \]

Reading the formula in C:

// k -> bin index, fs -> sample_rate, N -> FFT size
// f_k -> freq (the frequency that bin k represents)
double freq = (double)k * sample_rate / (double)N;

The nearest bin for a DTMF frequency \(f\) is:

\[ k = \mathrm{round}\!\left(\frac{f \cdot N}{f_s}\right) \]

Reading the formula in C:

// f -> dtmf_freq, N -> FFT size, fs -> sample_rate
// k -> bin (nearest FFT bin for the DTMF frequency)
unsigned bin = (unsigned)(dtmf_freq * N / sample_rate + 0.5);

The detector checks bins \(k-1\), \(k\), and \(k+1\) and takes the maximum magnitude, compensating for the slight frequency mismatch when the DTMF frequency does not fall exactly on a bin centre.
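A minimal sketch of that neighbour search follows. The helper name is hypothetical, and clamping at the spectrum edges is an assumption made here so the function is safe for any bin index.

```c
#include <stddef.h>

/*
 * Hypothetical helper: largest magnitude among bins k-1, k and k+1,
 * clamped to the valid range [0, num_bins - 1].
 * `mag` holds num_bins normalised magnitudes; requires num_bins > 0.
 */
static double peak_near_bin(const double *mag, size_t num_bins, size_t k)
{
    if (k >= num_bins)
        k = num_bins - 1;
    size_t lo = (k > 0) ? k - 1 : 0;
    size_t hi = (k + 1 < num_bins) ? k + 1 : num_bins - 1;

    double peak = mag[lo];
    for (size_t i = lo + 1; i <= hi; i++)
        if (mag[i] > peak)
            peak = mag[i];
    return peak;
}
```

Taking the maximum rather than the sum keeps the result comparable to a single-bin amplitude, so the same threshold can be applied whether or not the tone sits on a bin centre.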

At 8 kHz with \(N = 256\):

DTMF freq	Nearest bin	Bin freq	Error (Hz)
697 Hz	22	687.5 Hz	-9.5
770 Hz	25	781.3 Hz	+11.3
852 Hz	27	843.8 Hz	-8.2
941 Hz	30	937.5 Hz	-3.5
1209 Hz	39	1218.8 Hz	+9.8
1336 Hz	43	1343.8 Hz	+7.8
1477 Hz	47	1468.8 Hz	-8.2
1633 Hz	52	1625.0 Hz	-8.0

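All errors are within the ±1.5 % tolerance specified by ITU-T, though only narrowly for the two lowest tones: 770 Hz is off by about 1.46 % and 697 Hz by about 1.36 %. The remaining six are below 1 %. The detector also checks the two adjacent bins (±1) to handle the residual frequency mismatch.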
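The table can be reproduced with a few lines of C. This is a sketch under stated assumptions: the function names are hypothetical, and `fft_size_for` is one straightforward reading of the "largest power of two within 35 ms" rule rather than the library's own code.

```c
#include <stdio.h>

/* Hypothetical helper: largest power-of-two frame no longer than max_ms. */
static unsigned fft_size_for(double sample_rate, double max_ms)
{
    double limit = sample_rate * max_ms / 1000.0;   /* frame limit in samples */
    unsigned n = 1;
    while ((double)(n * 2u) <= limit)
        n *= 2u;
    return n;
}

/* Nearest FFT bin for frequency f, rounding to the closest bin centre. */
static unsigned nearest_bin(double f, unsigned n, double sample_rate)
{
    return (unsigned)(f * n / sample_rate + 0.5);
}

/* Print bin index, bin frequency and error (Hz and percent) per tone. */
static void print_bin_table(double sample_rate)
{
    static const double freqs[8] = { 697, 770, 852, 941,
                                     1209, 1336, 1477, 1633 };
    unsigned n = fft_size_for(sample_rate, 35.0);

    printf("N = %u, bin width = %.2f Hz\n", n, sample_rate / n);
    for (int i = 0; i < 8; i++) {
        unsigned k = nearest_bin(freqs[i], n, sample_rate);
        double bin_hz = k * sample_rate / n;
        double err = bin_hz - freqs[i];
        printf("%6.0f Hz  bin %3u  %8.2f Hz  %+7.2f Hz  %+6.2f %%\n",
               freqs[i], k, bin_hz, err, 100.0 * err / freqs[i]);
    }
}
```

Calling `print_bin_table(8000.0)` should report \(N = 256\) and a 31.25 Hz bin width. Passing a different sample rate shows how the bin errors shift as the frame size changes.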