# miniDSP - Complete Reference > A small C library for audio DSP This document contains the complete API reference and tutorial content for the miniDSP library. It is generated automatically from the source code documentation. Source: https://github.com/wooters/miniDSP --- # miniDSP API Reference C library for audio DSP. Header: `#include "minidsp.h"` ## `void MD_set_error_handler(MD_ErrorHandler handler)` Install a custom error handler. When a miniDSP function detects a precondition violation (NULL pointer, invalid size, etc.) it calls the active handler instead of aborting. The default handler prints a message to `stderr`. Pass `NULL` to restore the default handler. **Thread safety** This function must be called **once, before any other MD_* call**. It must not be called concurrently with any other miniDSP function. ```c // Example: redirect miniDSP errors to an application log static void my_handler(MD_ErrorCode code, const char *fn, const char *msg) { app_log(LOG_WARN, "miniDSP error %d in %s: %s", code, fn, msg); } int main(void) { MD_set_error_handler(my_handler); // ... use miniDSP ... } ``` --- ## `double MD_dot(const double *a, const double *b, unsigned N)` Compute the dot product of two vectors. The dot product is the sum of element-wise products: a[0]*b[0] + a[1]*b[1] + ... --- ## `double MD_entropy(const double *a, unsigned N, bool clip)` Compute the normalized entropy of a distribution. Returns a value between 0.0 (all energy concentrated in one bin) and 1.0 (energy spread equally across all bins). **Parameters:** - `clip` - If true, ignore negative values. If false, square all values first. --- ## `double MD_energy(const double *a, unsigned N)` Compute signal energy: sum of squared samples. N-1 E = SUM a[n]^2 n=0 Energy tells you "how loud" a signal is overall. A silent signal has zero energy; a loud one has high energy. Notice that squaring makes all values positive, so negative samples contribute too. --- ## `double MD_power(const double *a, unsigned N)` Compute signal power: energy divided by the number of samples. P = E / N Power is the "average energy per sample". It lets you compare signals of different lengths on an equal footing. --- ## `double MD_power_db(const double *a, unsigned N)` Compute signal power in decibels: 10 * log10(power). P_dB = 10 * log10(P) Decibels are a logarithmic scale commonly used in audio engineering. Every +10 dB means the power is 10x larger. A floor of 1e-10 is used to avoid log(0), which would be negative infinity. --- ## `double MD_rms(const double *a, unsigned N)` Compute the root mean square (RMS) of a signal. RMS is the standard measure of signal "loudness": $$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{n=0}^{N-1} x[n]^2}$$ Equivalently, `sqrt(MD_power(a, N))`. A unit-amplitude sine wave has RMS = 1/sqrt(2) ~ 0.707. A DC signal of value c has RMS = |c|. **Parameters:** - `a` - Input signal of length N. - `N` - Number of samples. Must be > 0. **Returns:** RMS value (always >= 0). ```c double sig[1024]; MD_sine_wave(sig, 1024, 1.0, 440.0, 44100.0); double rms = MD_rms(sig, 1024); // ~ 0.707 ``` --- ## `double MD_zero_crossing_rate(const double *a, unsigned N)` Compute the zero-crossing rate of a signal. The zero-crossing rate counts how often the signal changes sign, normalised by the number of adjacent pairs: $$\mathrm{ZCR} = \frac{1}{N-1}\sum_{n=1}^{N-1} \mathbf{1}\!\bigl[\mathrm{sgn}(x[n]) \ne \mathrm{sgn}(x[n-1])\bigr]$$ Returns a value in [0.0, 1.0]. High ZCR indicates noise or high-frequency content; low ZCR indicates tonal or low-frequency content. Zero is treated as non-negative (standard convention). **Parameters:** - `a` - Input signal of length N. - `N` - Number of samples. Must be > 1. **Returns:** Zero-crossing rate in [0.0, 1.0]. ```c double sig[4096]; MD_sine_wave(sig, 4096, 1.0, 1000.0, 16000.0); double zcr = MD_zero_crossing_rate(sig, 4096); // zcr ~ 2 * 1000 / 16000 = 0.125 ``` --- ## `void MD_autocorrelation(const double *a, unsigned N, double *out, unsigned max_lag)` Compute the normalised autocorrelation of a signal. The autocorrelation measures the similarity between a signal and a delayed copy of itself. This function computes the normalised (biased) autocorrelation for lags 0 through max_lag-1: $$R[\tau] = \frac{1}{R[0]} \sum_{n=0}^{N-1-\tau} x[n]\,x[n+\tau]$$ where $R[0] = \sum x[n]^2$ is the signal energy. The output satisfies out[0] = 1.0 and |out[tau]| <= 1.0. A silent signal (all zeros) produces all-zero output. **Parameters:** - `a` - Input signal of length N. - `N` - Number of samples. Must be > 0. - `out` - Output array of length max_lag (caller-allocated). - `max_lag` - Number of lag values to compute. Must be > 0 and < N. ```c double sig[1024]; MD_sine_wave(sig, 1024, 1.0, 100.0, 1000.0); double acf[50]; MD_autocorrelation(sig, 1024, acf, 50); // acf[0] = 1.0, acf[10] ~ 1.0 (lag = one period of 100 Hz at 1 kHz) ``` --- ## `void MD_peak_detect(const double *a, unsigned N, double threshold, unsigned min_distance, unsigned *peaks_out, unsigned *num_peaks_out)` Detect peaks (local maxima) in a signal. A sample a[i] is a peak if it is strictly greater than both immediate neighbours and above the given threshold. The min_distance parameter suppresses nearby secondary peaks: after accepting a peak at index i, any candidate within min_distance indices is skipped. Peaks are found left-to-right (deterministic tie-breaking). Endpoint samples (i=0, i=N-1) are never peaks because they lack two neighbours. **Parameters:** - `a` - Input signal of length N. - `N` - Number of samples. - `threshold` - Minimum value for a peak. - `min_distance` - Minimum index gap between accepted peaks (>= 1). - `peaks_out` - Output array of peak indices (caller-allocated, worst case N elements). - `num_peaks_out` - Receives the number of peaks found. ```c double sig[] = {0, 1, 3, 1, 0, 2, 5, 2, 0}; unsigned peaks[9], n; MD_peak_detect(sig, 9, 0.0, 1, peaks, &n); // n = 2, peaks = {2, 6} (values 3 and 5) ``` --- ## `double MD_f0_autocorrelation(const double *signal, unsigned N, double sample_rate, double min_freq_hz, double max_freq_hz)` Estimate the fundamental frequency (F0) using autocorrelation. This method searches for the strongest local peak in the normalised autocorrelation sequence over lags corresponding to the requested frequency range: $$f_0 = \frac{f_s}{\tau_\text{peak}}$$ where $\tau_\text{peak}$ is the selected lag and $f_s$ is the sample rate. **Parameters:** - `signal` - Input signal of length N. - `N` - Number of samples (must be >= 2). - `sample_rate` - Sampling rate in Hz (must be > 0). - `min_freq_hz` - Minimum search frequency in Hz (must be > 0). - `max_freq_hz` - Maximum search frequency in Hz (must be > min_freq_hz). **Returns:** Estimated F0 in Hz, or 0.0 if no reliable F0 is found. **Note:** Invalid API usage is guarded by assertions. A return value of 0.0 means the call was valid but no usable pitch peak was found (for example, silence or an unvoiced/noisy frame). ```c double f0 = MD_f0_autocorrelation(frame, frame_len, 16000.0, 80.0, 400.0); ``` --- ## `double MD_f0_fft(const double *signal, unsigned N, double sample_rate, double min_freq_hz, double max_freq_hz)` Estimate the fundamental frequency (F0) using FFT peak picking. The signal is internally Hann-windowed, transformed with a one-sided FFT, then the dominant magnitude peak in the requested frequency range is mapped back to Hz: $$f_0 = \frac{k_\text{peak} f_s}{N}$$ where $k_\text{peak}$ is the selected FFT bin. This method is simple and fast but generally less robust than MD_f0_autocorrelation() for noisy or strongly harmonic signals. **Parameters:** - `signal` - Input signal of length N. - `N` - Number of samples (must be >= 2). - `sample_rate` - Sampling rate in Hz (must be > 0). - `min_freq_hz` - Minimum search frequency in Hz (must be > 0). - `max_freq_hz` - Maximum search frequency in Hz (must be > min_freq_hz). **Returns:** Estimated F0 in Hz, or 0.0 if no reliable F0 is found. **Note:** This function uses the internal FFT cache. Call MD_shutdown() when done with FFT-based APIs. **Note:** Invalid API usage is guarded by assertions. A return value of 0.0 means the call was valid but no usable pitch peak was found. ```c double f0 = MD_f0_fft(frame, frame_len, 16000.0, 80.0, 400.0); ``` --- ## `void MD_mix(const double *a, const double *b, double *out, unsigned N, double w_a, double w_b)` Mix (weighted sum) two signals. Computes the element-wise weighted sum: $$\mathrm{out}[n] = w_a \cdot a[n] + w_b \cdot b[n]$$ In-place safe: the output buffer may alias either input. **Parameters:** - `a` - First input signal of length N. - `b` - Second input signal of length N. - `out` - Output signal of length N (caller-allocated). - `N` - Number of samples. - `w_a` - Weight for signal a. - `w_b` - Weight for signal b. ```c double sine[1024], noise[1024], mix[1024]; MD_sine_wave(sine, 1024, 1.0, 440.0, 44100.0); MD_white_noise(noise, 1024, 0.1, 42); MD_mix(sine, noise, mix, 1024, 0.8, 0.2); ``` --- ## `void MD_delay_echo(const double *in, double *out, unsigned N, unsigned delay_samples, double feedback, double dry, double wet)` Delay line / echo effect using a circular buffer with feedback. Let D = delay_samples. The internal delay state follows: $$s[n] = x[n] + feedback \cdot s[n-D]$$ and output is: $$y[n] = dry \cdot x[n] + wet \cdot s[n-D]$$ This creates repeating echoes decaying geometrically when $|feedback| < 1$. In-place safe: `out` may alias `in`. **Parameters:** - `in` - Input signal of length N. - `out` - Output signal of length N (caller-allocated). - `N` - Number of samples. - `delay_samples` - Delay length in samples (must be > 0). - `feedback` - Echo feedback gain (must satisfy |feedback| < 1). - `dry` - Dry (original) mix weight. - `wet` - Wet (delayed) mix weight. --- ## `void MD_tremolo(const double *in, double *out, unsigned N, double rate_hz, double depth, double sample_rate)` Tremolo effect (amplitude modulation by a sinusoidal LFO). The modulation gain is: $$g[n] = (1 - depth) + depth \cdot \frac{1 + \sin(2\pi f_{LFO} n / f_s)}{2}$$ so $g[n]$ ranges from $1-depth$ to $1$. Output: $$y[n] = g[n] \cdot x[n]$$ In-place safe: `out` may alias `in`. **Parameters:** - `in` - Input signal of length N. - `out` - Output signal of length N (caller-allocated). - `N` - Number of samples. - `rate_hz` - LFO rate in Hz (must be >= 0). - `depth` - Modulation depth in [0, 1]. - `sample_rate` - Sampling rate in Hz (must be > 0). --- ## `void MD_comb_reverb(const double *in, double *out, unsigned N, unsigned delay_samples, double feedback, double dry, double wet)` Comb-filter reverb (feedback comb filter with dry/wet mix). Internal comb section: $$c[n] = x[n] + feedback \cdot c[n-D]$$ Final output: $$y[n] = dry \cdot x[n] + wet \cdot c[n]$$ where $D$ is `delay_samples`. In-place safe: `out` may alias `in`. **Parameters:** - `in` - Input signal of length N. - `out` - Output signal of length N (caller-allocated). - `N` - Number of samples. - `delay_samples` - Comb delay in samples (must be > 0). - `feedback` - Feedback gain (must satisfy |feedback| < 1). - `dry` - Dry (original) mix weight. - `wet` - Wet (comb output) mix weight. --- ## `unsigned MD_convolution_num_samples(unsigned signal_len, unsigned kernel_len)` Compute the output length of a full linear convolution. For input length N and kernel length M, full convolution length is N+M-1. --- ## `void MD_convolution_time(const double *signal, unsigned signal_len, const double *kernel, unsigned kernel_len, double *out)` Time-domain full linear convolution (direct sum-of-products). Computes: out[n] = sum_{k=0}^{kernel_len-1} signal[n-k] * kernel[k] with out-of-range signal samples treated as zero. **Parameters:** - `signal` - Input signal of length signal_len. - `signal_len` - Number of input samples (must be > 0). - `kernel` - FIR kernel of length kernel_len. - `kernel_len` - Number of FIR taps (must be > 0). - `out` - Output buffer of length signal_len + kernel_len - 1. --- ## `void MD_moving_average(const double *signal, unsigned signal_len, unsigned window_len, double *out)` Causal moving-average FIR filter with zero-padded startup. Computes: out[n] = (1/window_len) * sum_{k=0}^{window_len-1} signal[n-k] where out-of-range samples (n-k < 0) are treated as zero. **Parameters:** - `signal` - Input signal of length signal_len. - `signal_len` - Number of input samples (must be > 0). - `window_len` - Moving-average window length (must be > 0). - `out` - Output buffer of length signal_len. --- ## `void MD_fir_filter(const double *signal, unsigned signal_len, const double *coeffs, unsigned num_taps, double *out)` Apply a causal FIR filter with arbitrary coefficients. Computes: out[n] = sum_{k=0}^{num_taps-1} coeffs[k] * signal[n-k] with out-of-range signal samples treated as zero. **Parameters:** - `signal` - Input signal of length signal_len. - `signal_len` - Number of input samples (must be > 0). - `coeffs` - FIR coefficients of length num_taps. - `num_taps` - Number of FIR taps (must be > 0). - `out` - Output buffer of length signal_len. --- ## `void MD_convolution_fft_ola(const double *signal, unsigned signal_len, const double *kernel, unsigned kernel_len, double *out)` Full linear convolution using FFT overlap-add (offline). Produces the same output as MD_convolution_time() but is faster for longer kernels by processing blocks in the frequency domain. **Parameters:** - `signal` - Input signal of length signal_len. - `signal_len` - Number of input samples (must be > 0). - `kernel` - FIR kernel of length kernel_len. - `kernel_len` - Number of FIR taps (must be > 0). - `out` - Output buffer of length signal_len + kernel_len - 1. --- ## `void MD_design_lowpass_fir(double *coeffs, unsigned num_taps, double cutoff_freq, double sample_rate, double kaiser_beta)` Design a Kaiser-windowed sinc lowpass FIR filter. Generates a linear-phase lowpass filter with the specified cutoff frequency and Kaiser window shape. The coefficients are normalized to unity DC gain (they sum to 1.0). $$h[i] = 2\,f_c\;\mathrm{sinc}\!\bigl(2\,f_c\,(i - (N-1)/2)\bigr) \;\cdot\; w_{\mathrm{Kaiser}}[i]$$ where $f_c = \mathrm{cutoff\_freq} / \mathrm{sample\_rate}$. **Parameters:** - `coeffs` - Output buffer of length num_taps (caller-allocated). - `num_taps` - Filter length (must be > 0). - `cutoff_freq` - -6 dB cutoff frequency in Hz (must be > 0 and < sample_rate / 2). - `sample_rate` - Sampling rate in Hz (must be > 0). - `kaiser_beta` - Kaiser window $\beta$ parameter (e.g. 10.0). ```c double h[65]; MD_design_lowpass_fir(h, 65, 4000.0, 48000.0, 10.0); // 4 kHz LPF ``` --- ## `double MD_scale(double in, double oldmin, double oldmax, double newmin, double newmax)` Map a single value from one range to another. Example: MD_scale(5, 0, 10, 0, 100) returns 50. --- ## `void MD_scale_vec(double *in, double *out, unsigned N, double oldmin, double oldmax, double newmin, double newmax)` Map every element of a vector from one range to another. --- ## `void MD_fit_within_range(double *in, double *out, unsigned N, double newmin, double newmax)` Fit values within [newmin, newmax]. If all values already fit, they are copied unchanged. Otherwise the entire vector is rescaled. --- ## `void MD_adjust_dblevel(const double *in, double *out, unsigned N, double dblevel)` Automatic Gain Control: scale a signal so its power matches the requested dB level, then clip to [-1, 1]. The input signal is assumed to be in the range [-1.0, 1.0]. The function computes a gain factor so that the output signal has the desired power level (in dB), then clips to [-1.0, 1.0] if any sample exceeds that range. Based on Jaydeep Dhole's AGC implementation for MATLAB. --- ## `void MD_magnitude_spectrum(const double *signal, unsigned N, double *mag_out)` Compute the magnitude spectrum of a real-valued signal. Given a signal of length N, this function computes the FFT and returns the magnitude |X(k)| for each frequency bin. Because the input is real-valued, the FFT output is conjugate-symmetric, so only the first N/2 + 1 bins are unique (from DC through Nyquist). To convert bin index k to a frequency in Hz: freq_k = k * sample_rate / N The output is **not** normalised by N -- divide each value by N to get the "standard" DFT magnitude, or by N/2 (except DC and Nyquist) for single-sided amplitude. **Parameters:** - `signal` - Input signal of length N. - `N` - Number of samples in the signal. - `mag_out` - Output array, must be pre-allocated to at least N/2 + 1 doubles. On return, mag_out[k] = |X(k)|. **Note:** The caller must allocate mag_out. The required size is (N / 2 + 1) * sizeof(double). Example: ```c double signal[1024]; // ... fill signal with audio samples ... unsigned num_bins = 1024 / 2 + 1; // = 513 double *mag = malloc(num_bins * sizeof(double)); MD_magnitude_spectrum(signal, 1024, mag); // mag[0] = DC component magnitude // mag[k] = magnitude at frequency k * sample_rate / 1024 // mag[512] = Nyquist frequency magnitude free(mag); MD_shutdown(); // free cached FFT plans when done ``` This performs a real-to-complex FFT using FFTW, then computes the absolute value (magnitude) of each complex frequency bin. For a real signal of length N, the FFT is conjugate-symmetric, so only the first N/2 + 1 bins are unique: - Bin 0: DC component (zero frequency) - Bin k: frequency = k * sample_rate / N - Bin N/2: Nyquist frequency (sample_rate / 2) The magnitude is computed as: |X(k)| = sqrt( Re(X(k))^2 + Im(X(k))^2 ) The FFT plan is cached and reused across calls of the same length, following the same pattern as the GCC functions. **Parameters:** - `signal` - Input signal of length N. - `N` - Number of samples (must be >= 2). - `mag_out` - Output: magnitudes for bins 0..N/2. Must be pre-allocated to N/2 + 1 doubles. --- ## `void MD_power_spectral_density(const double *signal, unsigned N, double *psd_out)` Compute the power spectral density (PSD) of a real-valued signal. The PSD describes how a signal's power is distributed across frequencies. While the magnitude spectrum tells you the *amplitude* at each frequency, the PSD tells you the *power* -- useful for noise analysis, SNR estimation, and comparing signals of different lengths. This function computes the "periodogram" estimator: PSD[k] = |X(k)|^2 / N = (Re(X(k))^2 + Im(X(k))^2) / N where X(k) is the DFT of the input signal (unnormalised, as computed by FFTW). **Relationship to the magnitude spectrum:** PSD[k] = |X(k)|^2 / N = (magnitude[k])^2 / N **Parseval's theorem (energy conservation):** The one-sided PSD sums to the total signal energy: PSD[0] + 2 * sum(PSD[1..N/2-1]) + PSD[N/2] = sum(x[n]^2) **Parameters:** - `signal` - Input signal of length N. - `N` - Number of samples in the signal (must be >= 2). - `psd_out` - Output array, must be pre-allocated to at least N/2 + 1 doubles. On return, psd_out[k] = |X(k)|^2 / N. **Note:** The caller must allocate psd_out. The required size is (N / 2 + 1) * sizeof(double). Example: ```c double signal[1024]; // ... fill signal with audio samples ... unsigned num_bins = 1024 / 2 + 1; // = 513 double *psd = malloc(num_bins * sizeof(double)); MD_power_spectral_density(signal, 1024, psd); // psd[0] = DC power // psd[k] = power at frequency k * sample_rate / 1024 // psd[512] = Nyquist frequency power free(psd); MD_shutdown(); // free cached FFT plans when done ``` The PSD is the "periodogram" estimator: PSD[k] = |X(k)|^2 / N. It reuses the same FFT cache as MD_magnitude_spectrum() -- both perform the same real-to-complex FFT, only the post-processing differs. We compute |X(k)|^2 = re^2 + im^2 directly from the real and imaginary parts, rather than calling cabs() (which computes sqrt(re^2 + im^2)) and then squaring. This avoids a redundant sqrt and is both faster and more numerically precise. **Parameters:** - `signal` - Input signal of length N. - `N` - Number of samples (must be >= 2). - `psd_out` - Output: PSD for bins 0..N/2. Must be pre-allocated to N/2 + 1 doubles. --- ## `void MD_phase_spectrum(const double *signal, unsigned N, double *phase_out)` Compute the one-sided phase spectrum of a real signal. Returns the instantaneous phase angle $\phi(k) = \arg X(k)$ for each DFT bin using an unnormalised real-to-complex FFT (FFTW r2c). The phase is expressed in **radians** in the range $[-\pi,\,\pi]$. For a real signal of length $N$, only the first $N/2+1$ bins carry unique information (bins $N/2+1\ldots N-1$ are conjugate-symmetric mirrors). Accordingly, `phase_out` must be pre-allocated to hold at least $N/2+1$ doubles. **Interpretation:**- A pure cosine at an integer bin $k_0$ (exact period in $N$ samples) produces $\phi(k_0) = 0$. - A pure sine at the same bin produces $\phi(k_0) = -\pi/2$. - A time-delayed signal exhibits **linear phase**: $\phi(k) = -2\pi k d/N$, where $d$ is the delay in samples. **Note:** Phase values at bins where the magnitude is near zero are numerically unreliable. Always examine MD_magnitude_spectrum() alongside the phase spectrum to identify significant bins. **Parameters:** - `signal` - Input signal of length `N`. - `N` - Signal length (must be >= 2). - `phase_out` - Pre-allocated array of at least `N/2+1` doubles that receives the phase in radians. **Example** ```c unsigned N = 1024; double *sig = malloc(N * sizeof(double)); // ... fill sig ... unsigned num_bins = N / 2 + 1; double *phase = malloc(num_bins * sizeof(double)); MD_phase_spectrum(sig, N, phase); // phase[k] in [-M_PI, M_PI] free(sig); free(phase); MD_shutdown(); ``` **See also:** MD_magnitude_spectrum(), MD_power_spectral_density(), MD_shutdown() --- ## `unsigned MD_stft_num_frames(unsigned signal_len, unsigned N, unsigned hop)` Compute the number of STFT frames for the given signal length and parameters. The formula is: num_frames = (signal_len - N) / hop + 1 when signal_len >= N num_frames = 0 when signal_len < N Use this function to size the output buffer before calling MD_stft(). **Parameters:** - `signal_len` - Total number of samples in the signal. - `N` - FFT window size (samples per frame). - `hop` - Hop size (samples between successive frame starts). **Returns:** Number of complete frames that fit in the signal. Example: ```c // 1 s of audio at 16 kHz, 32 ms window, 8 ms hop unsigned signal_len = 16000; unsigned N = 512; // 32 ms at 16 kHz unsigned hop = 128; // 8 ms (75% overlap) unsigned num_frames = MD_stft_num_frames(signal_len, N, hop); // num_frames = (16000 - 512) / 128 + 1 = 121 ``` --- ## `void MD_stft(const double *signal, unsigned signal_len, unsigned N, unsigned hop, double *mag_out)` Compute the Short-Time Fourier Transform (STFT) of a real-valued signal. The STFT slides a Hanning-windowed FFT over the signal in steps of `hop` samples, producing a time-frequency magnitude matrix. For each frame `f` starting at sample `f` * hop, the function:1. Multiplies the frame by a Hanning window. 2. Computes the real-to-complex FFT using FFTW. 3. Stores the magnitudes |X(k)| for bins 0..N/2 in the output. The STFT formula for frame `f` and bin `k` is: X_f(k) = SUM_{n=0}^{N-1} w[n] * x[f*hop + n] * e^{-j2pi*k*n/N} mag_out[f * (N/2+1) + k] = |X_f(k)| where `w`[n] is the Hanning window. The output is **not** normalised by N -- divide each value by N to get the "standard" DFT magnitude, consistent with MD_magnitude_spectrum(). The STFT reuses the same cached FFT plan as MD_magnitude_spectrum() and MD_power_spectral_density(). Only the Hanning window buffer is separate. **Parameters:** - `signal` - Input signal. - `signal_len` - Total number of samples in the signal. - `N` - FFT window size (must be >= 2). - `hop` - Hop size in samples (must be >= 1). - `mag_out` - Output array (row-major). Must be pre-allocated to at least MD_stft_num_frames(signal_len, N, hop) * (N/2+1) doubles. On return, mag_out[f*(N/2+1) + k] = |X_f(k)|. **Note:** If signal_len < N, the function returns immediately without writing any output (zero frames fit). Use MD_stft_num_frames() to check in advance. **Note:** The caller must allocate mag_out. A typical pattern: ```c unsigned N = 512; unsigned hop = 128; unsigned num_frames = MD_stft_num_frames(signal_len, N, hop); unsigned num_bins = N / 2 + 1; double *mag_out = malloc(num_frames * num_bins * sizeof(double)); MD_stft(signal, signal_len, N, hop, mag_out); // mag_out[f * num_bins + k] = |X_f(k)| // Convert bin k to Hz: freq_hz = k * sample_rate / N // Convert frame f to seconds: time_s = (double)(f * hop) / sample_rate free(mag_out); MD_shutdown(); // free cached FFT plans when done ``` --- ## `void MD_mel_filterbank(unsigned N, double sample_rate, unsigned num_mels, double min_freq_hz, double max_freq_hz, double *filterbank_out)` Build a mel-spaced triangular filterbank matrix. This function creates `num_mels` triangular filters over one-sided FFT bins (0..N/2), using the HTK mel mapping: mel(f) = 2595 * log10(1 + f / 700) The requested frequency range [`min_freq_hz`, `max_freq_hz`] is clamped at runtime to [0, sample_rate/2]. If the clamped range is empty, all output weights are zero. Output layout is row-major: filterbank_out[m * (N/2 + 1) + k] where `m` is the mel band index and `k` is the FFT bin index. **Parameters:** - `N` - FFT size (must be >= 2). - `sample_rate` - Sampling rate in Hz (must be > 0). - `num_mels` - Number of mel filters (must be > 0). - `min_freq_hz` - Requested lower frequency bound in Hz. - `max_freq_hz` - Requested upper frequency bound in Hz (must be > min_freq_hz). - `filterbank_out` - Output matrix, caller-allocated, length num_mels * (N/2 + 1) doubles. --- ## `void MD_mel_energies(const double *signal, unsigned N, double sample_rate, unsigned num_mels, double min_freq_hz, double max_freq_hz, double *mel_out)` Compute mel-band energies from a single frame. Processing steps: 1) Apply an internal Hann window. 2) Compute one-sided PSD bins via FFT: |X(k)|^2 / N. 3) Apply mel filterbank weights and sum per band. This function uses the same internal FFT cache as MD_stft() and spectrum APIs. **Parameters:** - `signal` - Input frame of length N. - `N` - Frame length / FFT size (must be >= 2). - `sample_rate` - Sampling rate in Hz (must be > 0). - `num_mels` - Number of mel bands (must be > 0). - `min_freq_hz` - Requested lower frequency bound in Hz. - `max_freq_hz` - Requested upper frequency bound in Hz (must be > min_freq_hz). - `mel_out` - Output mel energies, caller-allocated length num_mels. --- ## `void MD_mfcc(const double *signal, unsigned N, double sample_rate, unsigned num_mels, unsigned num_coeffs, double min_freq_hz, double max_freq_hz, double *mfcc_out)` Compute mel-frequency cepstral coefficients (MFCCs) from a single frame. Conventions used by this API:- HTK mel mapping for filter placement. - Internal Hann windowing and one-sided PSD mel energies. - Natural-log compression: log(max(E_mel, 1e-12)). - DCT-II with HTK-C0 profile: c0 uses sqrt(1/M) normalization, c_n (n>0) uses sqrt(2/M), where M is num_mels. - Coefficient 0 (C0) is written to mfcc_out[0]. **Parameters:** - `signal` - Input frame of length N. - `N` - Frame length / FFT size (must be >= 2). - `sample_rate` - Sampling rate in Hz (must be > 0). - `num_mels` - Number of mel bands (must be > 0). - `num_coeffs` - Number of cepstral coefficients to output (must be in [1, num_mels]). - `min_freq_hz` - Requested lower frequency bound in Hz. - `max_freq_hz` - Requested upper frequency bound in Hz (must be > min_freq_hz). - `mfcc_out` - Output array, caller-allocated length num_coeffs. --- ## `void MD_lowpass_brickwall(double *signal, unsigned len, double cutoff_hz, double sample_rate)` Apply a brickwall lowpass filter to a signal in-place. Performs an FFT, zeroes all frequency bins above the cutoff frequency, and inverse-FFTs the result. This is a "brickwall" filter: it has zero transition bandwidth, meaning content at or below the cutoff is preserved exactly and content above the cutoff is eliminated completely. The filter operation in the frequency domain: $$X'(k) = \begin{cases} X(k) & \text{if } k \leq \lfloor f_c \cdot N / f_s \rfloor \\ 0 & \text{otherwise} \end{cases}$$ where $X(k)$ is the DFT of the input, $f_c$ is the cutoff frequency, $N$ is the signal length, and $f_s$ is the sample rate. **Note:** Uses one-off FFTW plans (not the cached spectrum analysis plans). Suitable for offline/batch processing. Gibbs ringing occurs at the cutoff frequency; this is negligible when the cutoff is far from any frequency band of interest. **Parameters:** - `signal` - Signal buffer, modified in-place (must not be NULL). - `len` - Number of samples (must be > 0). - `cutoff_hz` - Cutoff frequency in Hz (must be in (0, sample_rate/2)). - `sample_rate` - Sample rate in Hz (must be > 0). ```c // Remove all content above 8 kHz from a 48 kHz signal double buf[48000]; MD_sine_wave(buf, 48000, 1.0, 440.0, 48000.0); MD_lowpass_brickwall(buf, 48000, 8000.0, 48000.0); ``` --- ## `double MD_bessel_i0(double x)` Zeroth-order modified Bessel function of the first kind, $I_0(x)$. Computed via the power series: $$I_0(x) = \sum_{k=0}^{\infty} \left[\frac{(x/2)^k}{k!}\right]^2$$ Converges until the term is below $10^{-15}$ relative tolerance. **Parameters:** - `x` - Input value (any real number). **Returns:** $I_0(x)$. Always $\geq 1$ since $I_0(0)=1$. ```c double val = MD_bessel_i0(5.0); // ≈ 27.2399 ``` --- ## `double MD_sinc(double x)` Normalized sinc function: $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$. $$\mathrm{sinc}(x) = \begin{cases} 1 & \text{if } |x| < 10^{-12} \\ \dfrac{\sin(\pi x)}{\pi x} & \text{otherwise} \end{cases}$$ **Parameters:** - `x` - Input value. **Returns:** sinc(x). Equals 1 at zero, 0 at nonzero integers. ```c double v = MD_sinc(0.5); // ≈ 0.6366 (2/π) ``` --- ## `void MD_Gen_Hann_Win(double *out, unsigned n)` Generate a Hanning (Hann) window of length n. Tapers to zero at both ends and is a common default for FFT analysis. --- ## `void MD_Gen_Hamming_Win(double *out, unsigned n)` Generate a Hamming window of length n. Similar to Hanning, but with a slightly lower first sidelobe. --- ## `void MD_Gen_Blackman_Win(double *out, unsigned n)` Generate a Blackman window of length n. Much lower sidelobes than Hanning/Hamming, with a wider main lobe. --- ## `void MD_Gen_Rect_Win(double *out, unsigned n)` Generate a rectangular window of length n (all ones). Useful as a baseline reference (equivalent to no tapering). --- ## `void MD_Gen_Kaiser_Win(double *out, unsigned n, double beta)` Generate a Kaiser window of length n with shape parameter beta. $$w[i] = \frac{I_0\!\left(\beta\,\sqrt{1 - \left(\frac{2i}{n-1}-1\right)^2}\right)} {I_0(\beta)}$$ The $\beta$ parameter controls the sidelobe/mainlobe tradeoff:- $\beta \approx 5$ : ~45 dB stopband attenuation (fast/draft) - $\beta \approx 10$ : ~100 dB stopband attenuation (high quality) - $\beta \approx 14$ : ~120 dB stopband attenuation (mastering) **Parameters:** - `out` - Output buffer of length n (caller-allocated). - `n` - Window length. Must be > 0. For n == 1, outputs 1.0. - `beta` - Shape parameter (> 0 for useful windows). ```c double win[256]; MD_Gen_Kaiser_Win(win, 256, 10.0); // high-quality Kaiser window ``` --- ## `void MD_sine_wave(double *output, unsigned N, double amplitude, double freq, double sample_rate)` Generate a sine wave. Fills `output[i] = amplitude * sin(2π * freq * i / sample_rate)` for i in [0, N). This is the simplest test signal in DSP — a pure tone at a single frequency. Use it to verify filter responses, check FFT bin alignment, or provide a clean input for any processing chain. **Parameters:** - `output` - Output buffer of length N (caller-allocated). - `N` - Number of samples to generate. Must be > 0. - `amplitude` - Peak amplitude of the sine wave (e.g. 1.0). - `freq` - Frequency in Hz. - `sample_rate` - Sampling rate in Hz. Must be > 0. ```c double sig[1024]; MD_sine_wave(sig, 1024, 1.0, 440.0, 44100.0); // 440 Hz A4 tone ``` --- ## `void MD_white_noise(double *output, unsigned N, double amplitude, unsigned seed)` Generate Gaussian white noise. Fills `output` with normally distributed random samples (mean 0, standard deviation `amplitude`) using the Box-Muller transform. White noise has equal energy at all frequencies -- its power spectral density is approximately flat. Use white noise to test filters, measure impulse responses, or as an additive noise source for SNR experiments. **Parameters:** - `output` - Output buffer of length N (caller-allocated). - `N` - Number of samples to generate. Must be > 0. - `amplitude` - Standard deviation of the noise (e.g. 1.0). - `seed` - Seed for the random number generator. Using the same seed produces the same output sequence, which is useful for reproducible tests. ```c double noise[4096]; MD_white_noise(noise, 4096, 1.0, 42); // reproducible Gaussian noise ``` --- ## `void MD_impulse(double *output, unsigned N, double amplitude, unsigned position)` Generate a discrete impulse (Kronecker delta). Fills the output buffer with zeros except at `position`, where the value is set to `amplitude`. A unit impulse (amplitude 1.0 at position 0) is the identity element of convolution and has a perfectly flat magnitude spectrum. Common uses:- Measure a system's impulse response by feeding it through a filter. - Verify that MD_magnitude_spectrum() returns a flat spectrum. - Create delayed spikes for testing convolution and delay estimation. **Parameters:** - `output` - Output buffer of length N (caller-allocated). - `N` - Number of samples to generate. Must be > 0. - `amplitude` - Value of the impulse spike (e.g. 1.0 for unit impulse). - `position` - Sample index of the spike (0-based). Must be < N. ```c double sig[1024]; MD_impulse(sig, 1024, 1.0, 0); // unit impulse at sample 0 ``` --- ## `void MD_chirp_linear(double *output, unsigned N, double amplitude, double f_start, double f_end, double sample_rate)` Generate a linear chirp (swept sine with linearly increasing frequency). The instantaneous frequency sweeps linearly from `f_start` to `f_end` over N samples: f(t) = f_start + (f_end - f_start) * t / T where T = (N-1) / sample_rate is the sweep duration. The output is: output[i] = amplitude * sin(2*pi * (f_start*t + 0.5*chirp_rate*t^2)) A linear chirp is the standard test signal for spectrograms -- its instantaneous frequency traces a straight diagonal line in the time-frequency plane. **Parameters:** - `output` - Output buffer of length N (caller-allocated). - `N` - Number of samples to generate. Must be > 0. - `amplitude` - Peak amplitude of the chirp (e.g. 1.0). - `f_start` - Starting frequency in Hz. - `f_end` - Ending frequency in Hz. - `sample_rate` - Sampling rate in Hz. Must be > 0. ```c double sig[16000]; // 1-second linear chirp from 200 Hz to 4000 Hz at 16 kHz sample rate MD_chirp_linear(sig, 16000, 1.0, 200.0, 4000.0, 16000.0); ``` --- ## `void MD_chirp_log(double *output, unsigned N, double amplitude, double f_start, double f_end, double sample_rate)` Generate a logarithmic chirp (swept sine with exponentially increasing frequency). The instantaneous frequency sweeps exponentially from `f_start` to `f_end` over N samples: f(t) = f_start * (f_end / f_start)^(t / T) where T = (N-1) / sample_rate is the sweep duration. The output is: output[i] = amplitude * sin(2*pi * f_start * T * (r^(t/T) - 1) / ln(r)) where r = f_end / f_start. A logarithmic chirp spends equal time per octave, making it ideal for measuring systems whose behaviour is best described on a log-frequency axis (e.g. audio equaliser response). **Parameters:** - `output` - Output buffer of length N (caller-allocated). - `N` - Number of samples to generate. Must be > 0. - `amplitude` - Peak amplitude of the chirp (e.g. 1.0). - `f_start` - Starting frequency in Hz. Must be > 0. - `f_end` - Ending frequency in Hz. Must be > 0 and != f_start. - `sample_rate` - Sampling rate in Hz. Must be > 0. ```c double sig[44100]; // 1-second log chirp from 20 Hz to 20 kHz at 44.1 kHz sample rate MD_chirp_log(sig, 44100, 1.0, 20.0, 20000.0, 44100.0); ``` --- ## `void MD_square_wave(double *output, unsigned N, double amplitude, double freq, double sample_rate)` Generate a square wave. Fills the output buffer with a bipolar square wave that alternates between +amplitude and −amplitude at the given frequency: $$x[n] = \begin{cases} +A & 0 < \phi < \pi \\ -A & \pi < \phi < 2\pi \\ 0 & \phi = 0 \text{ or } \phi = \pi \end{cases}$$ where $\phi = 2\pi f n / f_s \pmod{2\pi}$. A square wave contains only odd harmonics (1f, 3f, 5f, …) whose amplitudes decay as 1/k — a classic demonstration of the Fourier series and the Gibbs phenomenon. **Parameters:** - `output` - Output buffer of length N (caller-allocated). - `N` - Number of samples to generate. Must be > 0. - `amplitude` - Peak amplitude of the square wave (e.g. 1.0). - `freq` - Frequency in Hz. - `sample_rate` - Sampling rate in Hz. Must be > 0. ```c double sig[1024]; MD_square_wave(sig, 1024, 1.0, 440.0, 44100.0); ``` --- ## `void MD_sawtooth_wave(double *output, unsigned N, double amplitude, double freq, double sample_rate)` Generate a sawtooth wave. Fills the output buffer with a sawtooth wave that ramps linearly from −amplitude to +amplitude over each period: $$x[n] = A \left(\frac{\phi}{\pi} - 1\right)$$ where $\phi = 2\pi f n / f_s \pmod{2\pi}$. A sawtooth wave contains all integer harmonics (1f, 2f, 3f, …) whose amplitudes decay as 1/k — useful for demonstrating richer harmonic content compared to the square wave's odd-only series. **Parameters:** - `output` - Output buffer of length N (caller-allocated). - `N` - Number of samples to generate. Must be > 0. - `amplitude` - Peak amplitude of the sawtooth wave (e.g. 1.0). - `freq` - Frequency in Hz. - `sample_rate` - Sampling rate in Hz. Must be > 0. ```c double sig[1024]; MD_sawtooth_wave(sig, 1024, 1.0, 440.0, 44100.0); ``` --- ## `void MD_shepard_tone(double *output, unsigned N, double amplitude, double base_freq, double sample_rate, double rate_octaves_per_sec, unsigned num_octaves)` Generate a Shepard tone — the auditory illusion of endlessly rising or falling pitch. A [Shepard tone](https://en.wikipedia.org/wiki/Shepard_tone) superimposes several sine waves spaced one octave apart. Each tone glides continuously upward (or downward) in pitch while a Gaussian spectral envelope — fixed in log-frequency space — fades tones in at one end and out at the other. The result is a sound that seems to rise (or fall) forever without ever actually leaving its frequency range. **Signal model.** At time $t = n / f_s$ the output sample is: $$x[n] \;=\; A_\mathrm{norm}\!\sum_k \underbrace{ \exp\!\Bigl(-\frac{d_k(t)^2}{2\sigma^2}\Bigr) }_{\text{Gaussian envelope}} \;\sin\!\bigl(\varphi_k(n)\bigr)$$ where the octave distance from the Gaussian centre is $$d_k(t) = k - c + R\,t, \qquad c = \frac{L-1}{2}, \qquad \sigma = \frac{L}{4}$$ and the instantaneous frequency of layer $k$ is $f_k(t) = f_\text{base}\cdot 2^{d_k(t)}$. The phase $\varphi_k$ is accumulated sample-by-sample from $f_k$ so that each tone glides smoothly. $R$ is `rate_octaves_per_sec` and $L$ is `num_octaves`. $A_\mathrm{norm}$ scales the peak to `amplitude`. The sum ranges over all integer indices $k$ for which $|d_k(t)|$ is within $5\sigma$ at some point during the signal, ensuring enough layers are present to sustain the illusion for the full duration. **Parameters:** - `output` - Output buffer of length `N` (caller-allocated). - `N` - Number of samples to generate. Must be > 0. - `amplitude` - Peak amplitude of the output signal. - `base_freq` - Centre frequency of the Gaussian envelope in Hz. Typical values: 200–600 Hz. - `sample_rate` - Sampling rate in Hz. Must be > 0. - `rate_octaves_per_sec` - Glissando rate in octaves per second. Positive = rising, negative = falling, 0 = static chord. - `num_octaves` - Number of audible octave layers (width of the Gaussian bell). Typical values: 6–10. Must be > 0. ```c // 5 seconds of endlessly rising Shepard tone at 44.1 kHz unsigned N = 5 * 44100; double *sig = malloc(N * sizeof(double)); MD_shepard_tone(sig, N, 0.8, 440.0, 44100.0, 0.5, 8); // sig[] now sounds like it rises forever free(sig); ``` --- ## `unsigned MD_dtmf_detect(const double *signal, unsigned signal_len, double sample_rate, MD_DTMFTone *tones_out, unsigned max_tones)` Detect DTMF tones in an audio signal. Slides a Hanning-windowed FFT frame across the signal and checks each frame for a valid row + column frequency pair. A tone is reported only after it persists for at least 40 ms (ITU-T Q.24), and a new tone is not accepted until at least 40 ms of silence separates it from the previous one. The FFT size is the largest power of two whose window fits within 35 ms, keeping it shorter than the 40 ms Q.24 minimum pause so the state machine can resolve inter-digit gaps. The hop size is N/4 (75 % overlap). **Parameters:** - `signal` - Audio samples (mono). - `signal_len` - Number of samples. Must be > 0. - `sample_rate` - Sampling rate in Hz. Must be >= 4000 (Nyquist for the highest DTMF frequency, 1633 Hz). - `tones_out` - Output array for detected tones (caller-allocated). - `max_tones` - Size of `tones_out`. Must be > 0. **Returns:** Number of tones detected (<= `max_tones`). ```c // Generate a short DTMF sequence and detect it const char *digits = "5551234"; unsigned len = MD_dtmf_signal_length(7, 8000.0, 70, 70); double *sig = malloc(len * sizeof(double)); MD_dtmf_generate(sig, digits, 8000.0, 70, 70); MD_DTMFTone tones[16]; unsigned n = MD_dtmf_detect(sig, len, 8000.0, tones, 16); for (unsigned i = 0; i < n; i++) printf("%c %.3f--%.3f s\n", tones[i].digit, tones[i].start_s, tones[i].end_s); free(sig); ``` --- ## `void MD_dtmf_generate(double *output, const char *digits, double sample_rate, unsigned tone_ms, unsigned pause_ms)` Generate a DTMF tone sequence. Each digit is rendered as the sum of its row and column sinusoids, each at amplitude 0.5 (peak sum = 1.0). Digits are separated by silent gaps. Use MD_dtmf_signal_length() to pre-compute the required output buffer size. Timing constraints follow ITU-T Q.24:- `tone_ms` >= 40 (minimum recognisable tone) - `pause_ms` >= 40 (minimum inter-digit pause) **Parameters:** - `output` - Output buffer (caller-allocated, length from MD_dtmf_signal_length()). - `digits` - Null-terminated string of valid DTMF characters: '0'--'9', 'A'--'D' (case-insensitive), '*', '#'. An invalid character triggers an assertion failure. - `sample_rate` - Sampling rate in Hz. Must be > 0. - `tone_ms` - Duration of each tone in milliseconds (>= 40). - `pause_ms` - Duration of silence between tones in ms (>= 40). ```c unsigned len = MD_dtmf_signal_length(3, 8000.0, 70, 70); double *sig = malloc(len * sizeof(double)); MD_dtmf_generate(sig, "911", 8000.0, 70, 70); // sig now contains 3 tones of 70 ms each with 70 ms gaps free(sig); ``` --- ## `unsigned MD_dtmf_signal_length(unsigned num_digits, double sample_rate, unsigned tone_ms, unsigned pause_ms)` Calculate the number of samples needed for MD_dtmf_generate(). $$N = D \cdot \left\lfloor \frac{t_{\text{tone}} \cdot f_s}{1000} \right\rfloor + (D - 1) \cdot \left\lfloor \frac{t_{\text{pause}} \cdot f_s}{1000} \right\rfloor$$ where $D$ is the number of digits. **Parameters:** - `num_digits` - Number of digits in the sequence. - `sample_rate` - Sampling rate in Hz. Must be > 0. - `tone_ms` - Tone duration in milliseconds. - `pause_ms` - Pause duration in milliseconds. **Returns:** Total number of samples (0 if num_digits == 0). --- ## `void MD_shutdown(void)` Free all internally cached FFT plans and buffers. Call when done. --- ## `void MD_get_multiple_delays(const double **sigs, unsigned M, unsigned N, unsigned margin, int weightfunc, int *outdelays)` Estimate delays between a reference signal and M-1 other signals. **Parameters:** - `sigs` - Array of M pointers to signals (sigs[0] is the reference). - `M` - Number of signals. - `N` - Length of each signal (all must be the same length). - `margin` - Search +/- this many samples around zero-lag. - `weightfunc` - SIMP or PHAT (see MD_GCC_WEIGHTING_TYPE). - `outdelays` - Output array of M-1 delay values (must be pre-allocated). --- ## `int MD_get_delay(const double *siga, const double *sigb, unsigned N, double *ent, unsigned margin, int weightfunc)` Estimate the delay between two signals. **Parameters:** - `siga` - First signal. - `sigb` - Second signal. - `N` - Length of both signals. - `ent` - If non-null, receives the normalised entropy of the correlation peak region (closer to 1.0 = less trustworthy). - `margin` - Search +/- this many samples around zero-lag. - `weightfunc` - SIMP or PHAT. **Returns:** Delay in samples (positive = sigb lags siga). --- ## `void MD_gcc(const double *siga, const double *sigb, unsigned N, double *lagvals, int weightfunc)` Compute the full generalized cross-correlation between two signals. **Parameters:** - `siga` - First signal. - `sigb` - Second signal. - `N` - Length of both signals. - `lagvals` - Output array of N doubles (pre-allocated). The zero-lag value is at index ceil(N/2). - `weightfunc` - SIMP or PHAT. --- ## `unsigned MD_spectrogram_text(double *output, unsigned max_len, const char *text, double freq_lo, double freq_hi, double duration_sec, double sample_rate)` Synthesise audio that displays readable text in a spectrogram. Given a short string, the function rasterises it with a built-in 5x7 bitmap font. Each bitmap column becomes a time slice; each "on" pixel becomes a sine wave at the corresponding frequency between `freq_lo` and `freq_hi`. A 3 ms raised-cosine crossfade suppresses clicks at column boundaries. The output is normalised to 0.9 peak amplitude. **Parameters:** - `output` - Output buffer (caller-allocated). - `max_len` - Size of `output` in samples (must be >= returned value). - `text` - Printable ASCII string to render (must be non-empty). - `freq_lo` - Lowest frequency in Hz (bottom of text). - `freq_hi` - Highest frequency in Hz (top of text, must be < Nyquist). - `duration_sec` - Total duration in seconds. - `sample_rate` - Sample rate in Hz. **Returns:** Number of samples written to `output`. ```c double buf[64000]; unsigned n = MD_spectrogram_text(buf, 64000, "HELLO", 200.0, 7500.0, 2.0, 16000.0); // buf[0..n-1] now contains synthesised audio; view its spectrogram // to see "HELLO" spelled out in the frequency domain. ``` --- ## `unsigned MD_steg_capacity(unsigned signal_len, double sample_rate, int method)` Compute the maximum message length (in bytes) that can be hidden. **Parameters:** - `signal_len` - Length of the host signal in samples. - `sample_rate` - Sample rate in Hz. - `method` - MD_STEG_LSB, MD_STEG_FREQ_BAND, or MD_STEG_SPECTEXT. **Returns:** Maximum message bytes that fit (excluding null terminator). --- ## `unsigned MD_steg_encode(const double *host, double *output, unsigned signal_len, double sample_rate, const char *message, int method)` Encode a secret message into a host audio signal. The host signal is copied to `output`, then the message is embedded using the selected method. A 32-bit length header is prepended so that MD_steg_decode() can recover the message without knowing its length in advance. - **MD_STEG_LSB** — flips the least-significant bit of a 16-bit PCM representation of each sample. Distortion is at most ±1/32768 (≈ −90 dB). Works at any sample rate. - **MD_STEG_FREQ_BAND** — adds a low-amplitude BFSK tone in the near-ultrasonic band (18.5 / 19.5 kHz). Requires `sample_rate` >= 40000 Hz so the carriers remain below Nyquist. - **MD_STEG_SPECTEXT** — hybrid: LSB data + spectrogram text art in the 18--23.5 kHz band. Auto-upsamples to 48 kHz; output buffer must be sized for the 48 kHz signal length when input rate < 48 kHz. **Parameters:** - `host` - Input host signal (not modified). - `output` - Output stego signal (caller-allocated; for MD_STEG_SPECTEXT with sample_rate < 48 kHz, must hold MD_resample_output_len(signal_len, sample_rate, 48000) samples). - `signal_len` - Length of host and output in samples. - `sample_rate` - Sample rate in Hz. - `message` - Null-terminated message string to hide. - `method` - MD_STEG_LSB, MD_STEG_FREQ_BAND, or MD_STEG_SPECTEXT. **Returns:** Number of message bytes encoded (0 on failure). ```c double host[44100], stego[44100]; MD_sine_wave(host, 44100, 0.8, 440.0, 44100.0); unsigned n = MD_steg_encode(host, stego, 44100, 44100.0, "secret", MD_STEG_LSB); ``` --- ## `unsigned MD_steg_decode(const double *stego, unsigned signal_len, double sample_rate, char *message_out, unsigned max_msg_len, int method)` Decode a secret message from a stego audio signal. Reads the 32-bit length header, then extracts message bytes using the same method that was used for encoding. **Parameters:** - `stego` - The stego signal containing the hidden message. - `signal_len` - Length of the stego signal in samples. - `sample_rate` - Sample rate in Hz. - `message_out` - Output buffer for the decoded message (caller-allocated). - `max_msg_len` - Size of `message_out` buffer (including null terminator). - `method` - MD_STEG_LSB, MD_STEG_FREQ_BAND, or MD_STEG_SPECTEXT. **Returns:** Number of message bytes decoded (0 if none found). ```c char recovered[256]; unsigned n = MD_steg_decode(stego, 44100, 44100.0, recovered, 256, MD_STEG_LSB); printf("Hidden message: %s\n", recovered); ``` --- ## `unsigned MD_steg_encode_bytes(const double *host, double *output, unsigned signal_len, double sample_rate, const unsigned char *data, unsigned data_len, int method)` Encode arbitrary binary data into a host audio signal. Works like MD_steg_encode() but accepts a raw byte buffer instead of a null-terminated string, so it can embed data containing `0x00` bytes (e.g. images, compressed archives, cryptographic keys). **Parameters:** - `host` - Input host signal (not modified). - `output` - Output stego signal (caller-allocated, same length). - `signal_len` - Length of host and output in samples. - `sample_rate` - Sample rate in Hz. - `data` - Pointer to the binary data to hide. - `data_len` - Length of `data` in bytes. - `method` - MD_STEG_LSB, MD_STEG_FREQ_BAND, or MD_STEG_SPECTEXT. **Returns:** Number of data bytes encoded (0 on failure). ```c double host[44100], stego[44100]; MD_sine_wave(host, 44100, 0.8, 440.0, 44100.0); unsigned char png_data[332]; // ... read PNG file into png_data ... unsigned n = MD_steg_encode_bytes(host, stego, 44100, 44100.0, png_data, 332, MD_STEG_LSB); ``` --- ## `unsigned MD_steg_decode_bytes(const double *stego, unsigned signal_len, double sample_rate, unsigned char *data_out, unsigned max_len, int method)` Decode binary data from a stego audio signal. Works like MD_steg_decode() but writes raw bytes without null termination, suitable for recovering binary payloads. **Parameters:** - `stego` - The stego signal containing the hidden data. - `signal_len` - Length of the stego signal in samples. - `sample_rate` - Sample rate in Hz. - `data_out` - Output buffer for the decoded bytes (caller-allocated). - `max_len` - Maximum number of bytes to write to `data_out`. - `method` - MD_STEG_LSB, MD_STEG_FREQ_BAND, or MD_STEG_SPECTEXT. **Returns:** Number of data bytes decoded (0 if none found). ```c unsigned char recovered[1024]; unsigned n = MD_steg_decode_bytes(stego, 44100, 44100.0, recovered, 1024, MD_STEG_LSB); // recovered[0..n-1] contains the original binary data ``` --- ## `int MD_steg_detect(const double *signal, unsigned signal_len, double sample_rate, int *payload_type_out)` Detect which steganography method (if any) was used to encode a signal. Probes the signal for a valid steganographic header using both LSB and frequency-band (BFSK) methods. If a valid header is found, returns the method identifier and optionally the payload type (text or binary). BFSK detection is only attempted when `sample_rate` >= 40000 Hz. If both methods appear valid, BFSK is preferred (lower false-positive rate). **Parameters:** - `signal` - The signal to inspect. - `signal_len` - Length of the signal in samples. - `sample_rate` - Sample rate in Hz. - `payload_type_out` - If non-null, receives MD_STEG_TYPE_TEXT or MD_STEG_TYPE_BINARY when a method is detected. **Returns:** MD_STEG_LSB, MD_STEG_FREQ_BAND, MD_STEG_SPECTEXT, or -1 if no steganographic content is detected. ```c int payload_type; int method = MD_steg_detect(signal, signal_len, 44100.0, &payload_type); if (method >= 0) { printf("Detected %s payload (%s)\n", payload_type == MD_STEG_TYPE_BINARY ? "binary" : "text", method == MD_STEG_LSB ? "LSB" : "BFSK"); } ``` --- ## `unsigned MD_resample_output_len(unsigned input_len, double in_rate, double out_rate)` Compute the output buffer size needed for resampling. Returns $\lceil \mathrm{input\_len} \times \mathrm{out\_rate} / \mathrm{in\_rate} \rceil$. **Parameters:** - `input_len` - Number of input samples (must be > 0). - `in_rate` - Input sample rate in Hz (must be > 0). - `out_rate` - Output sample rate in Hz (must be > 0). **Returns:** Required output buffer length. ```c unsigned n = MD_resample_output_len(44100, 44100.0, 48000.0); // 48000 ``` --- ## `unsigned MD_resample(const double *input, unsigned input_len, double *output, unsigned max_output_len, double in_rate, double out_rate, unsigned num_zero_crossings, double kaiser_beta)` Resample a signal from one sample rate to another using polyphase sinc interpolation. Uses a fixed table of 512 sub-phases with linear interpolation and a Kaiser-windowed sinc anti-aliasing filter. Anti-aliasing cutoff is automatically set to $\min(\mathrm{in\_rate}, \mathrm{out\_rate})/2$. **Parameters:** - `input` - Input signal of length input_len. - `input_len` - Number of input samples (must be > 0). - `output` - Output buffer (caller-allocated). Use MD_resample_output_len() to size it. - `max_output_len` - Capacity of output buffer. Must be >= MD_resample_output_len(input_len, in_rate, out_rate). - `in_rate` - Input sample rate in Hz (must be > 0). - `out_rate` - Output sample rate in Hz (must be > 0). - `num_zero_crossings` - Number of sinc zero-crossings per side. Higher = better quality. Typical: 32. - `kaiser_beta` - Kaiser window shape parameter. Typical: 10.0 for ~100 dB stopband. **Returns:** Number of samples written to output. ```c double in[44100], out[48000]; MD_sine_wave(in, 44100, 1.0, 440.0, 44100.0); unsigned n = MD_resample(in, 44100, out, 48000, 44100.0, 48000.0, 32, 10.0); ``` --- ## `void MD_vad_default_params(MD_vad_params *params)` Populate a VAD params struct with sensible defaults. Default values:- weights: 0.2 each (equal) - threshold: 0.5 - onset_frames: 3 - hangover_frames: 15 - adaptation_rate: 0.01 - band: 300–3400 Hz **Parameters:** - `params` - Output params struct. Must not be NULL. ```c MD_vad_params p; MD_vad_default_params(&p); p.threshold = 0.4; // lower threshold for noisy environments ``` **See also:** MD_vad_init(), MD_vad_process_frame() --- ## `void MD_vad_init(MD_vad_state *state, const MD_vad_params *params)` Initialize VAD state from params. If `params` is NULL, default params are used (equivalent to calling MD_vad_default_params() first). After initialization the detector is in the SILENCE state with all counters at zero. **Parameters:** - `state` - Output state struct. Must not be NULL. - `params` - Parameters to copy, or NULL for defaults. ```c MD_vad_state st; MD_vad_init(&st, NULL); // use defaults ``` **See also:** MD_vad_default_params(), MD_vad_process_frame() --- ## `void MD_vad_calibrate(MD_vad_state *state, const double *signal, unsigned N, double sample_rate)` Feed a known-silence frame to seed the adaptive normalization. Computes all five features and updates the EMA min/max estimates without running the state machine or producing a decision. Call this on several silence frames before processing live audio to improve initial normalization accuracy. **Parameters:** - `state` - VAD state (must be initialized). - `signal` - Frame samples of length `N`. - `N` - Frame length in samples (must be >= 2). - `sample_rate` - Sample rate in Hz (must be > 0). ```c // Calibrate on 10 frames of silence double silence[256] = {0}; for (int i = 0; i < 10; i++) MD_vad_calibrate(&st, silence, 256, 16000.0); ``` **See also:** MD_vad_init(), MD_vad_process_frame() --- ## `int MD_vad_process_frame(MD_vad_state *state, const double *signal, unsigned N, double sample_rate, double *score_out, double *features_out)` Process one audio frame and return a binary speech decision. Processing pipeline:1. Extract five raw features (energy, ZCR, spectral entropy, spectral flatness, band energy ratio). 2. Update adaptive normalization (EMA min/max). 3. Normalize features to [0.0, 1.0]. 4. Compute weighted score: $$S = \sum_{i=0}^{4} w_i \cdot \hat{f}_i$$ 5. Apply onset/hangover state machine. **Parameters:** - `state` - VAD state (must be initialized). - `signal` - Frame samples of length `N`. - `N` - Frame length in samples (must be >= 2). - `sample_rate` - Sample rate in Hz (must be > 0). - `score_out` - If non-NULL, receives the combined score. - `features_out` - If non-NULL, receives MD_VAD_NUM_FEATURES normalized feature values in [0.0, 1.0]. **Returns:** 1 if speech detected, 0 if silence. ```c double frame[256]; double score; double feats[MD_VAD_NUM_FEATURES]; // ... fill frame ... int decision = MD_vad_process_frame(&st, frame, 256, 16000.0, &score, feats); ``` **See also:** MD_energy(), MD_zero_crossing_rate(), MD_power_spectral_density() --- --- # Tutorials # Signal Generators Signal generators produce test signals from scratch — no audio file or microphone required. They are the "hello world" of DSP: a pure sine wave, an impulse, or a burst of noise gives you a known input that you can trace through any processing chain and verify at every step. miniDSP ships generators as simple, stateless functions. They take an output buffer and a handful of parameters; no setup or teardown is needed. --- ## Sine wave A sine wave at frequency $f$ Hz, amplitude $A$, and sample rate $f_s$: $$x[n] = A \sin\!\left(\frac{2\pi f n}{f_s}\right), \quad n = 0, 1, \ldots, N-1$$ **Reading the formula in C:** ```c // A -> amplitude, f -> freq, fs -> sample_rate, n -> i, x[n] -> output[i] double phase_step = 2.0 * M_PI * freq / sample_rate; // precompute 2*pi*f / fs for (unsigned i = 0; i < N; i++) output[i] = amplitude * sin(phase_step * (double)i); // A * sin(2*pi*f*n / fs) ``` **API:** ```c void MD_sine_wave(double *output, unsigned N, double amplitude, double freq, double sample_rate); ``` **Listen** — 440 Hz, 2 seconds: **Quick example** — generate one second of A4 (440 Hz): ```c MD_sine_wave(signal, N, amplitude, freq_hz, sample_rate); ``` **Verifying via spectrum** The clearest way to confirm the generator is correct is to feed its output to `MD_magnitude_spectrum()` and check that the peak lands on the expected bin: ```c unsigned N = 1024; double fs = 16000.0, freq = 1000.0; /* bin 64 */ double sig[N], mag[N/2 + 1]; MD_sine_wave(sig, N, 1.0, freq, fs); MD_magnitude_spectrum(sig, N, mag); /* mag[64] should be the largest value */ ``` See `examples/sine_wave.c` for a full runnable program that generates the spectrum and writes an interactive HTML plot. --- ## Impulse A discrete impulse ([Kronecker delta](https://en.wikipedia.org/wiki/Kronecker_delta)) at position $n_0$ with amplitude $A$: $$x[n] = \begin{cases} A & \text{if } n = n_0 \\ 0 & \text{otherwise} \end{cases}$$ **Reading the formula in C:** ```c // A -> amplitude, n0 -> position, x[n] -> output[n] memset(output, 0, N * sizeof(double)); // set all samples to 0 output[position] = amplitude; // except at n = n0, where x[n] = A ``` The unit impulse ($A = 1$, $n_0 = 0$) is the identity element of [convolution](https://en.wikipedia.org/wiki/Convolution) and has a perfectly flat magnitude spectrum — every frequency bin equals 1.0. **API:** ```c void MD_impulse(double *output, unsigned N, double amplitude, unsigned position); ``` **Listen** — impulse train (four clicks at 0.5-second intervals), 2 seconds: **Quick example** — generate a unit impulse at sample 0: ```c MD_impulse(signal, N, amplitude, position); ``` **Verifying via spectrum** Feed the impulse to `MD_magnitude_spectrum()` and confirm every bin has the same magnitude: ```c unsigned N = 1024; double sig[N], mag[N/2 + 1]; MD_impulse(sig, N, 1.0, 0); MD_magnitude_spectrum(sig, N, mag); /* Every element of mag[] should be 1.0 */ ``` See `examples/impulse.c` for a full runnable program that generates both time-domain and frequency-domain plots. --- ## Chirp (swept sine) A **[chirp](https://en.wikipedia.org/wiki/Chirp)** sweeps frequency over time — either linearly or logarithmically. Chirps are the standard test signal for spectrograms and for measuring filter magnitude response. ### Linear chirp A linear chirp sweeps instantaneous frequency from $f_0$ to $f_1$ at a constant rate over duration $T = (N-1)/f_s$: $$x[n] = A \sin\!\left(2\pi\!\left(f_0\,t + \frac{1}{2}\,\frac{f_1 - f_0}{T}\,t^2\right)\right), \quad t = n / f_s$$ **Reading the formula in C:** ```c // A -> amplitude, f0 -> f_start, f1 -> f_end, fs -> sample_rate // t = n/fs, T = (N-1)/fs, x[n] -> output[i] double T = (double)(N - 1) / sample_rate; double chirp_rate = (f_end - f_start) / T; // (f1 - f0) / T for (unsigned i = 0; i < N; i++) { double t = (double)i / sample_rate; // t = n / fs double phase = 2.0 * M_PI * (f_start * t + 0.5 * chirp_rate * t * t); output[i] = amplitude * sin(phase); // A * sin(2*pi * (f0*t + 1/2 * (f1-f0)/T * t^2)) } ``` **API:** ```c void MD_chirp_linear(double *output, unsigned N, double amplitude, double f_start, double f_end, double sample_rate); ``` **Listen** — linear sweep from 20 Hz to 4000 Hz, 2 seconds: ### Logarithmic chirp A logarithmic chirp sweeps frequency exponentially, spending equal time per octave. This is ideal for audio systems whose response is best viewed on a log-frequency axis. $$x[n] = A \sin\!\left(\frac{2\pi f_0 T}{\ln r}\!\left(r^{t/T} - 1\right)\right), \quad r = f_1 / f_0,\quad t = n / f_s$$ **Reading the formula in C:** ```c // A -> amplitude, f0 -> f_start, f1 -> f_end, fs -> sample_rate // r = f1/f0, t = n/fs, T = (N-1)/fs, x[n] -> output[i] double T = (double)(N - 1) / sample_rate; double ratio = f_end / f_start; // r = f1 / f0 double log_ratio = log(ratio); // ln(r) for (unsigned i = 0; i < N; i++) { double t = (double)i / sample_rate; // t = n / fs double phase = 2.0 * M_PI * f_start * T * (pow(ratio, t / T) - 1.0) / log_ratio; // 2*pi*f0*T / ln(r) * (r^(t/T) - 1) output[i] = amplitude * sin(phase); } ``` **API:** ```c void MD_chirp_log(double *output, unsigned N, double amplitude, double f_start, double f_end, double sample_rate); ``` Requires $f_0 > 0$, $f_1 > 0$, and $f_0 \ne f_1$. **Listen** — logarithmic sweep from 20 Hz to 4000 Hz, 2 seconds: **Quick example** — generate both chirp types and compare: ```c MD_chirp_linear(sig_lin, N, amplitude, f_start, f_end, sample_rate); MD_chirp_log(sig_log, N, amplitude, f_start, f_end, sample_rate); ``` See `examples/chirp_wave.c` for a full runnable program that generates the magnitude spectra of both chirp types and writes an interactive HTML plot. --- ## Square wave A square wave at frequency $f$ Hz alternates between $+A$ and $-A$: $$x[n] = \begin{cases} +A & 0 < \phi < \pi \\ -A & \pi < \phi < 2\pi \\ 0 & \phi = 0 \text{ or } \phi = \pi \end{cases}$$ where $\phi = 2\pi f n / f_s \pmod{2\pi}$. **Reading the formula in C:** ```c // A -> amplitude, f -> freq, fs -> sample_rate // phi -> phase, x[n] -> output[i] double phase_step = 2.0 * M_PI * freq / sample_rate; // 2*pi*f / fs for (unsigned i = 0; i < N; i++) { double phase = fmod(phase_step * (double)i, 2.0 * M_PI); // phi = 2*pi*f*n/fs (mod 2*pi) if (phase < M_PI) output[i] = amplitude; // +A when 0 < phi < pi else output[i] = -amplitude; // -A when pi < phi < 2*pi } ``` Its [Fourier series](https://en.wikipedia.org/wiki/Fourier_series) contains **only odd harmonics** (1f, 3f, 5f, …) with amplitudes decaying as $1/k$ — a textbook demonstration of the [Gibbs phenomenon](https://en.wikipedia.org/wiki/Gibbs_phenomenon). **API:** ```c void MD_square_wave(double *output, unsigned N, double amplitude, double freq, double sample_rate); ``` **Listen** — 440 Hz, 2 seconds: **Quick example:** ```c MD_square_wave(sq_sig, N, amplitude, freq_hz, sample_rate); ``` See `examples/square_sawtooth.c` for a full program that compares the square and sawtooth spectra side by side. --- ## Sawtooth wave A sawtooth wave ramps linearly from $-A$ to $+A$ over each period: $$x[n] = A \left(\frac{\phi}{\pi} - 1\right)$$ where $\phi = 2\pi f n / f_s \pmod{2\pi}$. **Reading the formula in C:** ```c // A -> amplitude, f -> freq, fs -> sample_rate // phi -> phase, x[n] -> output[i] double phase_step = 2.0 * M_PI * freq / sample_rate; // 2*pi*f / fs for (unsigned i = 0; i < N; i++) { double phase = fmod(phase_step * (double)i, 2.0 * M_PI); // phi = 2*pi*f*n/fs (mod 2*pi) output[i] = amplitude * (phase / M_PI - 1.0); // A * (phi/pi - 1) } ``` Unlike the square wave, the sawtooth contains **all integer harmonics** (1f, 2f, 3f, …), each decaying as $1/k$. Comparing the two spectra shows how waveform shape determines harmonic content. **API:** ```c void MD_sawtooth_wave(double *output, unsigned N, double amplitude, double freq, double sample_rate); ``` **Listen** — 440 Hz, 2 seconds: **Quick example:** ```c MD_sawtooth_wave(saw_sig, N, amplitude, freq_hz, sample_rate); ``` See `examples/square_sawtooth.c` for the full runnable program. --- ## White noise [Gaussian white noise](https://en.wikipedia.org/wiki/White_noise) has **equal power at all frequencies** — its power spectral density (PSD) is approximately flat across the entire band. It is the standard broadband test signal for filter characterisation, impulse response measurement, and SNR experiments. Each sample is drawn independently from a [normal distribution](https://en.wikipedia.org/wiki/Normal_distribution) with mean 0 and standard deviation $\sigma$: $$x[n] \sim \mathrm{N}(0,\, \sigma^2), \quad n = 0, 1, \ldots, N-1$$ Samples are generated with the [Box-Muller transform](https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform) seeded by `seed`, so the same seed always produces the same sequence — useful for reproducible tests. **Reading the formula in C** — the [Box-Muller transform](https://en.wikipedia.org/wiki/Box%E2%80%93Muller_transform) turns uniform random numbers into Gaussian samples: ```c // sigma -> amplitude, x[n] -> output[i] // u1, u2 are uniform random numbers in (0, 1) double r = sqrt(-2.0 * log(u1)); // sqrt(-2 * ln(u1)) double theta = 2.0 * M_PI * u2; // 2 * pi * u2 output[i] = amplitude * r * cos(theta); // sigma * sqrt(-2*ln(u1)) * cos(2*pi*u2) output[i + 1] = amplitude * r * sin(theta); // sigma * sqrt(-2*ln(u1)) * sin(2*pi*u2) ``` **API:** ```c void MD_white_noise(double *output, unsigned N, double amplitude, unsigned seed); ``` `amplitude` is the standard deviation $\sigma$ of the distribution. **Listen** — Gaussian white noise (sigma 0.25), 2 seconds: **Quick example** — generate 4096 samples of unit-variance noise: ```c MD_white_noise(signal, N, amplitude, seed); ``` **Verifying via PSD** Feed the noise to `MD_power_spectral_density()` and confirm the spectrum is approximately flat — no bin should dominate: ```c unsigned N = 4096; double *sig = malloc(N * sizeof(double)); double *psd = malloc((N/2 + 1) * sizeof(double)); MD_white_noise(sig, N, 1.0, 42); MD_power_spectral_density(sig, N, psd); /* psd[] should fluctuate around a constant level */ ``` See `examples/white_noise.c` for a full runnable program that computes and plots the power spectral density. --- ## Further reading - [Chirp](https://en.wikipedia.org/wiki/Chirp) — Wikipedia - [Fourier series](https://en.wikipedia.org/wiki/Fourier_series) — Wikipedia - [Gibbs phenomenon](https://en.wikipedia.org/wiki/Gibbs_phenomenon) — Wikipedia - [White noise](https://en.wikipedia.org/wiki/White_noise) — Wikipedia - Julius O. Smith, [Mathematics of the DFT](https://ccrma.stanford.edu/~jos/mdft/) — free online textbook --- # Window Functions [Window functions](https://en.wikipedia.org/wiki/Window_function) taper a finite signal block before an [FFT](https://en.wikipedia.org/wiki/Fast_Fourier_transform) so the block edges do not create a large discontinuity. That discontinuity causes **[spectral leakage](https://en.wikipedia.org/wiki/Spectral_leakage)**: energy spreads into neighboring bins. miniDSP provides five windows so you can compare the trade-off between main-lobe width (frequency resolution) and [sidelobe](https://en.wikipedia.org/wiki/Sidelobes) level (leakage suppression). --- ## Hanning window The Hanning (Hann) window is a common default: $$w[n] = 0.5 \left(1 - \cos\!\left(\frac{2\pi n}{N-1}\right)\right), \quad n = 0, 1, \ldots, N-1$$ It tapers smoothly to zero at both ends and gives good all-around performance for FFT analysis. **Reading the formula in C:** ```c // n -> N (window length), i -> n (sample index), out[i] -> w[n] if (n == 1) { out[0] = 1.0; } else { double n_minus_1 = (double)(n - 1); for (unsigned i = 0; i < n; i++) { out[i] = 0.5 * (1.0 - cos(2.0 * M_PI * (double)i / n_minus_1)); } } ``` **API:** ```c void MD_Gen_Hann_Win(double *out, unsigned n); ``` **Visuals** — window taps and magnitude response: The smooth taper to zero at both ends reduces leakage compared with a rectangular window. **Quick example:** ```c MD_Gen_Hann_Win(hann, N); ``` --- ## Hamming window The Hamming window keeps a similar shape to Hanning, but with non-zero endpoints and a lower first sidelobe: $$w[n] = 0.54 - 0.46 \cos\!\left(\frac{2\pi n}{N-1}\right)$$ **Reading the formula in C:** ```c // n -> N (window length), i -> n (sample index), out[i] -> w[n] if (n == 1) { out[0] = 1.0; } else { double n_minus_1 = (double)(n - 1); for (unsigned i = 0; i < n; i++) { out[i] = 0.54 - 0.46 * cos(2.0 * M_PI * (double)i / n_minus_1); } } ``` **API:** ```c void MD_Gen_Hamming_Win(double *out, unsigned n); ``` **Visuals** — window taps and magnitude response: Compared with Hanning, the non-zero endpoints and coefficients shift the sidelobe pattern while keeping a similar main-lobe width. **Quick example:** ```c MD_Gen_Hamming_Win(hamming, N); ``` --- ## Blackman window The Blackman window strongly suppresses sidelobes by adding another cosine term: $$w[n] = 0.42 - 0.5 \cos\!\left(\frac{2\pi n}{N-1}\right) + 0.08 \cos\!\left(\frac{4\pi n}{N-1}\right)$$ Compared with Hanning/Hamming, it has much lower sidelobes but a wider main lobe. **Reading the formula in C:** ```c // n -> N (window length), i -> n (sample index), // p -> 2*pi*n/(N-1), out[i] -> w[n] if (n == 1) { out[0] = 1.0; } else { double n_minus_1 = (double)(n - 1); for (unsigned i = 0; i < n; i++) { double p = 2.0 * M_PI * (double)i / n_minus_1; out[i] = 0.42 - 0.5 * cos(p) + 0.08 * cos(2.0 * p); } } ``` **API:** ```c void MD_Gen_Blackman_Win(double *out, unsigned n); ``` **Visuals** — window taps and magnitude response: You should see much lower sidelobes than Hanning/Hamming, with a wider main lobe in the response plot. **Quick example:** ```c MD_Gen_Blackman_Win(blackman, N); ``` --- ## Rectangular window The rectangular window is the no-taper baseline: $$w[n] = 1$$ It preserves the narrowest main lobe but has the highest sidelobes. **Reading the formula in C:** ```c // i -> n (sample index), out[i] -> w[n] for (unsigned i = 0; i < n; i++) { out[i] = 1.0; } ``` **API:** ```c void MD_Gen_Rect_Win(double *out, unsigned n); ``` **Visuals** — window taps and magnitude response: As the no-taper baseline, rectangular gives the narrowest main lobe and the highest sidelobes. **Quick example:** ```c MD_Gen_Rect_Win(rect, N); ``` --- ## Kaiser window The [Kaiser window](https://en.wikipedia.org/wiki/Kaiser_window) uses the zeroth-order modified Bessel function $I_0$ to provide continuous control over the sidelobe/mainlobe tradeoff via a single parameter $\beta$: $$w[n] = \frac{I_0\!\left(\beta\,\sqrt{1 - \left(\frac{2n}{N-1}-1\right)^2}\right)} {I_0(\beta)}, \quad n = 0, 1, \ldots, N-1$$ Higher $\beta$ values produce lower sidelobes (better leakage suppression) at the cost of a wider main lobe: - $\beta \approx 5$ : ~45 dB stopband attenuation - $\beta \approx 10$ : ~100 dB stopband attenuation - $\beta \approx 14$ : ~120 dB stopband attenuation **Reading the formula in C:** ```c // n -> N (window length), i -> n (sample index), out[i] -> w[n] // beta -> shape parameter, inv_i0_beta -> 1/I₀(β) if (n == 1) { out[0] = 1.0; } else { double inv_i0_beta = 1.0 / MD_bessel_i0(beta); double n_minus_1 = (double)(n - 1); for (unsigned i = 0; i < n; i++) { double t = 2.0 * (double)i / n_minus_1 - 1.0; double arg = 1.0 - t * t; if (arg < 0.0) arg = 0.0; out[i] = MD_bessel_i0(beta * sqrt(arg)) * inv_i0_beta; } } ``` **API:** ```c void MD_Gen_Kaiser_Win(double *out, unsigned n, double beta); ``` **Visuals** — window taps and magnitude response ($\beta = 10$): With $\beta = 10$, the Kaiser window achieves ~100 dB stopband attenuation — much deeper suppression than Blackman, with a comparable main lobe width. **Quick example:** ```c MD_Gen_Kaiser_Win(kaiser, N, 10.0); ``` --- ## Quick comparison | Window | Edge values | Sidelobes | Main lobe | Tunable? | |--------|-------------|-----------|-----------|----------| | Rectangular | 1.0 | Highest | Narrowest | No | | Hanning | 0.0 | Low | Medium | No | | Hamming | 0.08 | Lower first sidelobe | Medium | No | | Blackman | 0.0 | Very low | Widest | No | | Kaiser | > 0 | Configurable via $\beta$ | Configurable | Yes | If you are unsure where to start, Hanning is a good default. Use Blackman when leakage suppression matters more than peak sharpness. Use Kaiser when you need precise control over the sidelobe level (e.g., for FIR filter design or resampling). All response plots above use the same tap length and zero-padded FFT size, so sidelobe and main-lobe differences are directly comparable. ## Further reading - [Window function](https://en.wikipedia.org/wiki/Window_function) -- Wikipedia - [Hann function](https://en.wikipedia.org/wiki/Hann_function) -- Wikipedia - [Hamming window](https://en.wikipedia.org/wiki/Window_function#Hann_and_Hamming_windows) -- Wikipedia - [Blackman window](https://en.wikipedia.org/wiki/Window_function#Blackman_window) -- Wikipedia - [Kaiser window](https://en.wikipedia.org/wiki/Kaiser_window) -- Wikipedia ## API reference - MD_Gen_Hann_Win() -- generate a Hanning window - MD_Gen_Hamming_Win() -- generate a Hamming window - MD_Gen_Blackman_Win() -- generate a Blackman window - MD_Gen_Rect_Win() -- generate a rectangular window - MD_Gen_Kaiser_Win() -- generate a Kaiser window with configurable $\beta$ --- # Computing the Magnitude Spectrum This tutorial walks through the `examples/magnitude_spectrum.c` program, which generates a multi-tone signal, windows it, computes the magnitude spectrum with MD_magnitude_spectrum(), and visualises the result. ## What is a magnitude spectrum? Every real-valued signal can be decomposed into a sum of sinusoids at different frequencies. The **magnitude spectrum** tells you the amplitude of each sinusoidal component. The tool for this decomposition is the **[Discrete Fourier Transform (DFT)](https://en.wikipedia.org/wiki/Discrete_Fourier_transform)**. Given $N$ samples $x[n]$, the DFT produces $N$ complex coefficients: $$X(k) = \sum_{n=0}^{N-1} x[n] \, e^{-j\,2\pi\,k\,n/N}, \qquad k = 0, 1, \ldots, N-1$$ **Reading the formula in C:** ```c // x[n] -> signal[n], X(k) -> (X_re[k], X_im[k]), N -> num_samples for (unsigned k = 0; k < num_samples; k++) { double re = 0.0, im = 0.0; for (unsigned n = 0; n < num_samples; n++) { double theta = 2.0 * M_PI * (double)k * (double)n / (double)num_samples; re += signal[n] * cos(theta); im -= signal[n] * sin(theta); } X_re[k] = re; X_im[k] = im; } ``` The **magnitude spectrum** is simply the absolute value of these coefficients: $|X(k)|$. Each bin $k$ corresponds to frequency $f_k = k \cdot f_s / N$, where $f_s$ is the sample rate. For a real signal, the spectrum is symmetric around $N/2$, so only the first $N/2 + 1$ bins (the "one-sided" spectrum) carry unique information. ## Step 1: Generate a test signal We create a signal with three known sinusoidal components (440 Hz, 1000 Hz, and 2500 Hz) at different amplitudes, plus a small DC offset. This lets us verify that the spectrum shows peaks at exactly those frequencies. ```c double *tone = malloc(N * sizeof(double)); for (unsigned i = 0; i < N; i++) signal[i] = dc; MD_sine_wave(tone, N, amp1, freq1, sample_rate); for (unsigned i = 0; i < N; i++) signal[i] += tone[i]; MD_sine_wave(tone, N, amp2, freq2, sample_rate); for (unsigned i = 0; i < N; i++) signal[i] += tone[i]; MD_sine_wave(tone, N, amp3, freq3, sample_rate); for (unsigned i = 0; i < N; i++) signal[i] += tone[i]; free(tone); ``` ## Step 2: Apply a window function Before computing the FFT, we multiply the signal by a **[Hanning window](https://en.wikipedia.org/wiki/Hann_function)**. Why? The DFT assumes the input is one period of a periodic signal. In reality, our signal chunk rarely starts and ends at a perfect zero crossing. This mismatch creates an artificial discontinuity at the boundaries, which causes energy to "leak" from the true frequency into neighbouring bins -- an effect called **[spectral leakage](https://en.wikipedia.org/wiki/Spectral_leakage)**. A window function tapers the signal smoothly to zero at both edges, eliminating the discontinuity: $$w[n] = 0.5 \left(1 - \cos\!\left(\frac{2\pi\,n}{N-1}\right)\right)$$ **Reading the formula in C:** ```c // w[n] -> window[n], x[n] -> signal[n], x_w[n] -> windowed[n], N -> num_samples for (unsigned n = 0; n < num_samples; n++) { window[n] = 0.5 * (1.0 - cos(2.0 * M_PI * (double)n / (double)(num_samples - 1))); windowed[n] = signal[n] * window[n]; } ``` ```c MD_Gen_Hann_Win(window, N); for (unsigned i = 0; i < N; i++) { windowed[i] = signal[i] * window[i]; } ``` The trade-off: windowing widens the main lobe of each peak slightly (reducing frequency resolution) but dramatically suppresses the side lobes (improving dynamic range). ## Step 3: Compute the magnitude spectrum With the windowed signal ready, a single call to MD_magnitude_spectrum() computes $|X(k)|$ for bins $k = 0, 1, \ldots, N/2$: ```c MD_magnitude_spectrum(windowed, N, mag); ``` Internally, miniDSP uses FFTW to compute the real-to-complex FFT, then takes the absolute value of each complex coefficient. ## Step 4: Convert to a one-sided amplitude spectrum The raw magnitudes $|X(k)|$ are unnormalised (they scale with $N$). To recover the actual signal amplitudes, we: 1. **Divide every bin by $N$** to undo the DFT scaling. 2. **Double the interior bins** ($k = 1 \ldots N/2-1$) because the energy from the discarded negative-frequency bins folds onto the positive side. 3. **Leave DC and [Nyquist](https://en.wikipedia.org/wiki/Nyquist_frequency) alone** -- they have no mirror image. ```c for (unsigned k = 0; k < num_bins; k++) { freqs[k] = (double)k * sample_rate / (double)N; mag[k] /= (double)N; if (k > 0 && k < N / 2) { mag[k] *= 2.0; } } ``` After this normalisation, a pure sine wave at amplitude $A$ produces a peak of height $A$ in the spectrum. ## Results ### Linear scale The linear-scale plot shows clear spikes at the three input frequencies. The 440 Hz tone (amplitude 1.0) is tallest, followed by 1000 Hz (0.6) and 2500 Hz (0.3). The tiny spike at 0 Hz is the DC offset (0.1). \image html magnitude-spectrum-linear.png "Magnitude spectrum -- linear scale" width=700px ### Logarithmic (dB) scale The [dB](https://en.wikipedia.org/wiki/Decibel) plot reveals low-level details invisible on the linear scale. The Hanning window's side lobes appear as the gradually decaying skirt around each peak. Without windowing, these lobes would be much higher and wider, obscuring nearby weak signals. \image html magnitude-spectrum-db.png "Magnitude spectrum -- dB scale" width=700px ## Key takeaways - The DFT converts a time-domain signal into its frequency components. - Always **window** the signal before the FFT to control spectral leakage. - **Normalise** the output by dividing by $N$ and doubling interior bins to get a one-sided amplitude spectrum. - Use a **dB scale** ($20 \log_{10}$) to see low-level details. ## Further reading - [Discrete Fourier transform](https://en.wikipedia.org/wiki/Discrete_Fourier_transform) -- Wikipedia - [Spectral leakage](https://en.wikipedia.org/wiki/Spectral_leakage) -- Wikipedia - Julius O. Smith, [Mathematics of the DFT](https://ccrma.stanford.edu/~jos/mdft/) -- free online textbook - Next tutorial: power-spectral-density ## API reference - MD_magnitude_spectrum() -- compute the magnitude spectrum - MD_Gen_Hann_Win() -- generate a Hanning window - MD_shutdown() -- free cached FFTW plans --- # Computing the Power Spectral Density This tutorial walks through the `examples/power_spectral_density.c` program, which generates a multi-tone signal, windows it, computes the PSD with MD_power_spectral_density(), and visualises the result. If you haven't already, read the magnitude-spectrum tutorial first -- it covers the DFT fundamentals that this tutorial builds on. ## What is the power spectral density? The **magnitude spectrum** (covered in the magnitude-spectrum tutorial) shows *amplitude* at each frequency. The **[power spectral density (PSD)](https://en.wikipedia.org/wiki/Spectral_density)** shows how the signal's *power* is distributed across frequencies. Given the DFT coefficients $X(k)$ of an $N$-sample signal, the [periodogram](https://en.wikipedia.org/wiki/Periodogram) estimate of the PSD is: $$\text{PSD}(k) = \frac{|X(k)|^2}{N}$$ **Reading the formula in C:** ```c // X(k) -> (X_re[k], X_im[k]), PSD(k) -> psd[k], N -> num_samples for (unsigned k = 0; k <= num_samples / 2; k++) { double mag2 = X_re[k] * X_re[k] + X_im[k] * X_im[k]; psd[k] = mag2 / (double)num_samples; } ``` The key difference from the magnitude spectrum is the **squaring** -- we use $|X(k)|^2$ (power) instead of $|X(k)|$ (amplitude). ## Magnitude spectrum vs. PSD | Property | Magnitude spectrum | PSD | |----------|-------------------|-----| | Formula | $\|X(k)\|$ | $\|X(k)\|^2 / N$ | | Units | Amplitude | Power | | dB conversion | $20 \log_{10}$ | $10 \log_{10}$ | | Use case | Identify frequency components | Noise analysis, [SNR](https://en.wikipedia.org/wiki/Signal-to-noise_ratio) estimation | Why the different dB formulas? Power is proportional to the *square* of amplitude. Since $10 \log_{10}(A^2) = 20 \log_{10}(A)$, using $10 \log_{10}$ for power and $20 \log_{10}$ for amplitude produces the same dB values for equivalent signals. ## Parseval's theorem [Parseval's theorem](https://en.wikipedia.org/wiki/Parseval%27s_theorem) states that the total energy in the time domain equals the total energy in the frequency domain: $$\sum_{n=0}^{N-1} |x[n]|^2 = \frac{1}{N} \sum_{k=0}^{N-1} |X(k)|^2 = \sum_{k=0}^{N-1} \text{PSD}(k)$$ **Reading the formula in C:** ```c // sum_n |x[n]|^2 -> time_energy, sum_k PSD(k) -> freq_energy double time_energy = 0.0; for (unsigned n = 0; n < num_samples; n++) { time_energy += signal[n] * signal[n]; } double freq_energy = 0.0; for (unsigned k = 0; k < num_samples; k++) { freq_energy += psd_full[k]; } ``` This is a useful sanity check: the sum of all PSD bins should equal $\sum x[n]^2$ (the signal's total energy). The miniDSP test suite verifies this property. ## Step 1: Generate a test signal The test signal is identical to the magnitude spectrum example: three sinusoids at 440 Hz, 1000 Hz, and 2500 Hz, plus a DC offset. ```c double *tone = malloc(N * sizeof(double)); for (unsigned i = 0; i < N; i++) signal[i] = dc; MD_sine_wave(tone, N, amp1, freq1, sample_rate); for (unsigned i = 0; i < N; i++) signal[i] += tone[i]; MD_sine_wave(tone, N, amp2, freq2, sample_rate); for (unsigned i = 0; i < N; i++) signal[i] += tone[i]; MD_sine_wave(tone, N, amp3, freq3, sample_rate); for (unsigned i = 0; i < N; i++) signal[i] += tone[i]; free(tone); ``` ## Step 2: Apply a window function Windowing is just as important for PSD estimation as for the magnitude spectrum. Without it, spectral leakage smears power across bins, making it hard to distinguish signal from noise. ```c MD_Gen_Hann_Win(window, N); for (unsigned i = 0; i < N; i++) { windowed[i] = signal[i] * window[i]; } ``` ## Step 3: Compute the PSD A single call to MD_power_spectral_density() computes $\text{PSD}(k) = |X(k)|^2 / N$ for bins $k = 0, 1, \ldots, N/2$. The $/N$ normalisation is handled internally. ```c MD_power_spectral_density(windowed, N, psd); ``` ## Step 4: Convert to a one-sided PSD Just like the magnitude spectrum, we convert to a one-sided representation by doubling the interior bins. DC and Nyquist bins are not doubled since they have no mirror image. Note that we do **not** divide by $N$ again -- MD_power_spectral_density() already includes the $/N$ normalisation. ```c for (unsigned k = 0; k < num_bins; k++) { freqs[k] = (double)k * sample_rate / (double)N; if (k > 0 && k < N / 2) { psd[k] *= 2.0; } } ``` ## Results ### Linear scale The linear PSD plot shows power concentrated at the three input frequencies. Because PSD squares the amplitude, the ratio between peaks is exaggerated compared to the magnitude spectrum: the 440 Hz tone (amplitude 1.0, power 1.0) towers over the 2500 Hz tone (amplitude 0.3, power 0.09). \image html power-spectral-density-linear.png "Power spectral density -- linear scale" width=700px ### Logarithmic (dB) scale The dB plot uses $10 \log_{10}(\text{PSD})$ -- note the factor of 10 (not 20) because PSD is already a power quantity. This reveals the noise floor and window side lobes clearly. \image html power-spectral-density-db.png "Power spectral density -- dB scale" width=700px ## Key takeaways - PSD measures how **power** distributes across frequencies; the magnitude spectrum measures **amplitude**. - Use $10 \log_{10}$ for PSD (power) and $20 \log_{10}$ for magnitude spectrum (amplitude) -- both give the same dB values for equivalent signals. - **Parseval's theorem** provides a sanity check: the sum of all PSD bins equals the signal's total energy $\sum x[n]^2$. - PSD is widely used in noise analysis, SNR estimation, and anywhere you need to quantify the power distribution of a signal. ## Further reading - [Spectral density](https://en.wikipedia.org/wiki/Spectral_density) -- Wikipedia - [Parseval's theorem](https://en.wikipedia.org/wiki/Parseval%27s_theorem) -- Wikipedia - Julius O. Smith, [Spectral Audio Signal Processing](https://ccrma.stanford.edu/~jos/sasp/) -- free online textbook - Previous tutorial: magnitude-spectrum ## API reference - MD_power_spectral_density() -- compute the PSD (periodogram) - MD_magnitude_spectrum() -- compute the magnitude spectrum - MD_Gen_Hann_Win() -- generate a Hanning window - MD_shutdown() -- free cached FFTW plans --- # Spectrogram and the STFT This tutorial walks through the `examples/spectrogram.c` program, which generates a linear chirp, computes its STFT with MD_stft(), converts to dB, and writes an interactive HTML heatmap. If you haven't already, read the magnitude-spectrum tutorial first — it covers the DFT fundamentals that the STFT builds on. ## What is the STFT? The **magnitude spectrum** shows the frequency content of an *entire* signal. But audio signals are rarely [stationary](https://en.wikipedia.org/wiki/Stationary_process) — a speech sentence, a musical phrase, or a chirp sweep all have frequency content that changes over time. The **[Short-Time Fourier Transform (STFT)](https://en.wikipedia.org/wiki/Short-time_Fourier_transform)** solves this by dividing the signal into short, overlapping frames and computing the DFT of each one. The result is a two-dimensional time-frequency representation called a **[spectrogram](https://en.wikipedia.org/wiki/Spectrogram)**. For frame $f$ and frequency bin $k$, the STFT is: $$X_f(k) = \sum_{n=0}^{N-1} w[n]\, x[f \cdot H + n]\, e^{-j 2\pi k n / N}$$ **Reading the formula in C:** ```c // f -> frame index, k -> bin index, n -> sample index // x[f*H+n] -> signal[sample_idx], w[n] -> window[n], |X_f(k)| -> mag_out[f*(N/2+1)+k] for (unsigned f = 0; f < num_frames; f++) { for (unsigned k = 0; k <= N / 2; k++) { double re = 0.0, im = 0.0; for (unsigned n = 0; n < N; n++) { unsigned sample_idx = f * hop + n; double xw = signal[sample_idx] * window[n]; double theta = 2.0 * M_PI * (double)k * (double)n / (double)N; re += xw * cos(theta); im -= xw * sin(theta); } mag_out[f * (N / 2 + 1) + k] = sqrt(re * re + im * im); } } ``` where: - $N$ is the FFT window size (samples per frame) - $H$ is the hop size (samples between successive frames) - $w[n]$ is the [Hanning window](https://en.wikipedia.org/wiki/Hann_function) - $x[\cdot]$ is the input signal MD_stft() stores $|X_f(k)|$ in a row-major matrix: `mag_out[f * (N/2+1) + k]`. ## Time-frequency resolution trade-off Choosing $N$ involves a fundamental trade-off: | | Narrow window (small N) | Wide window (large N) | |-|------------------------|-----------------------| | Time resolution | High | Low | | Frequency resolution | Low | High | | Bins | Few | Many | A common starting point for speech and music at 16 kHz is $N = 512$ (32 ms), which gives adequate resolution in both dimensions. The hop size $H$ controls frame overlap. 75% overlap ($H = N/4$) is a standard choice that produces smooth spectrograms without excessive computation. ## Step 1: Generate a chirp signal A **[linear chirp](https://en.wikipedia.org/wiki/Chirp)** sweeps instantaneous frequency from $f_0$ to $f_1$ linearly over a duration $T$: $$x(t) = \sin\!\left(2\pi \left(f_0 + \frac{f_1 - f_0}{2T}\,t\right) t\right)$$ **Reading the formula in C:** ```c // t -> time_s, x(t) -> out[n], f0/f1 -> f0_hz/f1_hz, T -> duration_s for (unsigned n = 0; n < num_samples; n++) { double time_s = (double)n / sample_rate; double slope = (f1_hz - f0_hz) / (2.0 * duration_s); double phase = 2.0 * M_PI * (f0_hz + slope * time_s) * time_s; out[n] = sin(phase); } ``` A chirp is the ideal test signal for a spectrogram because its instantaneous frequency changes over time, producing a clearly visible diagonal stripe across the time-frequency plane. ```c MD_chirp_linear(signal, signal_len, 1.0, chirp_f0, chirp_f1, sample_rate); ``` ## Step 2: Compute the STFT MD_stft_num_frames() tells you how many complete frames fit in the signal, so you can allocate the output buffer before calling MD_stft(). ```c MD_stft(signal, signal_len, N, hop, mag_out); ``` MD_stft() applies the Hanning window internally, so the signal does not need to be pre-windowed. The output `mag_out` is a row-major matrix with `num_frames` rows and `N/2+1` columns. Magnitudes are raw FFTW output — not normalised by $N$. ## Step 3: Convert to dB Raw magnitudes span many orders of magnitude. A log scale compresses the [dynamic range](https://en.wikipedia.org/wiki/Dynamic_range) and makes both loud and quiet features visible simultaneously. Normalise by $N$ before taking the log so that a full-scale sine (amplitude 1) reads near 0 dB. Floor at $10^{-6}$ to avoid $\log(0)$. ```c for (unsigned i = 0; i < num_frames * num_bins; i++) { spec_db[i] = 20.0 * log10(fmax(mag_out[i] / (double)N, 1e-6)); } ``` The formula 20 log$_{10}$ is used because the input is a magnitude (amplitude), not a power. ## Results The resulting spectrogram shows the chirp as a diagonal stripe rising from 200 Hz at $t = 0$ to 4000 Hz at $t = 2$ s. Hover over the interactive plot to read exact time, frequency, and dB values. \image html spectrogram.png "STFT spectrogram of a 200 → 4000 Hz linear chirp" width=700px ## Key takeaways - The STFT slides a windowed FFT over a signal to reveal how its frequency content evolves over time. - Window size $N$ controls the time-frequency trade-off: larger $N$ gives finer frequency resolution but coarser time resolution. - Hop size $H < N$ creates overlapping frames. 75% overlap ($H = N/4$) is a common default. - MD_stft() reuses the same cached FFT plan as MD_magnitude_spectrum() and MD_power_spectral_density(), so mixing calls with the same $N$ incurs no extra plan-rebuild overhead. - Divide magnitudes by $N$ before dB conversion so that a unit-amplitude sine reads near 0 dB. ## Further reading - [Short-time Fourier transform](https://en.wikipedia.org/wiki/Short-time_Fourier_transform) — Wikipedia - [Spectrogram](https://en.wikipedia.org/wiki/Spectrogram) — Wikipedia - Julius O. Smith, [Spectral Audio Signal Processing](https://ccrma.stanford.edu/~jos/sasp/) — free online textbook - Previous tutorial: power-spectral-density ## API reference - MD_chirp_linear() — generate the linear chirp test signal - MD_stft() — compute the STFT magnitude matrix - MD_stft_num_frames() — compute the number of output frames - MD_magnitude_spectrum() — single-frame magnitude spectrum - MD_power_spectral_density() — single-frame PSD - MD_Gen_Hann_Win() — generate a Hanning window - MD_shutdown() — free cached FFTW plans --- # Mel Filterbank and MFCCs This tutorial introduces two classic speech/audio front-end features: - **Mel filterbank energies** — triangular spectral bands spaced on the [mel scale](https://en.wikipedia.org/wiki/Mel_scale). - **[MFCCs](https://en.wikipedia.org/wiki/Mel-frequency_cepstrum)** — DCT of log mel energies. miniDSP provides both as frame-level APIs in `minidsp.h`. ## Why mel and MFCC? The linear FFT axis over-resolves high frequencies compared to human pitch perception. Mel filterbanks compress frequency spacing to be denser at low frequencies and coarser at high frequencies. MFCCs then decorrelate log mel energies via a DCT, producing compact features used widely in speech recognition and audio classification. ## Step 1: Build mel energies miniDSP uses: - HTK mel mapping: $$m(f) = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right)$$ - Internal Hann windowing - One-sided PSD bins: $|X(k)|^2 / N$ **Reading the formula in C:** ```c // f -> freq_hz, m(f) -> mel double mel = 2595.0 * log10(1.0 + freq_hz / 700.0); ``` Compute one frame of mel energies: ```c MD_mel_energies(signal, N, sample_rate, num_mels, min_freq_hz, max_freq_hz, mel); ``` ## Step 2: Compute MFCCs MFCCs are computed from log mel energies with a DCT-II: $$c_n = \alpha_n \sum_{m=0}^{M-1} \log(\max(E_m, 10^{-12})) \cos\!\left(\frac{\pi n (m + 1/2)}{M}\right)$$ **Reading the formula in C:** ```c // E_m -> mel_energy[m], c_n -> mfcc[n], M -> num_mels, n -> coeff index for (unsigned n = 0; n < num_coeffs; n++) { double alpha = (n == 0) ? sqrt(1.0 / (double)num_mels) : sqrt(2.0 / (double)num_mels); double acc = 0.0; for (unsigned m = 0; m < num_mels; m++) { double log_em = log(fmax(mel_energy[m], 1e-12)); double basis = cos(M_PI * (double)n * ((double)m + 0.5) / (double)num_mels); acc += log_em * basis; } mfcc[n] = alpha * acc; } ``` where: - $M$ is the number of mel bands - $\alpha_0 = \sqrt{1/M}$ - $\alpha_n = \sqrt{2/M}$ for $n > 0$ miniDSP returns **C0 in `mfcc_out[0]`**. ```c MD_mfcc(signal, N, sample_rate, num_mels, num_coeffs, min_freq_hz, max_freq_hz, mfcc); ``` ## Visualisations These plots are generated from one deterministic signal: $x[n] = 0.7\sin(2\pi\cdot440t) + 0.2\cos(2\pi\cdot1000t) + 0.1\sin(2\pi\cdot3000t)$, with $t=n/f_s$ and $f_s=8192$ Hz. Mel energies and MFCCs are computed from the first 1024-sample analysis frame. ## Practical notes - Requested frequency bounds are runtime-clamped to `[0, Nyquist]`. - If the clamped band is empty, mel energies are zero and MFCCs remain finite via the log floor. - `MD_shutdown()` should be called when done with FFT-based APIs. ## API reference - MD_mel_filterbank() — build mel triangular weight matrix - MD_mel_energies() — compute mel-band energies from one frame - MD_mfcc() — compute MFCC vector (C0 included) - MD_shutdown() — release cached FFT resources --- # Phase Spectrum This tutorial walks through `examples/phase_spectrum.c`, which generates a three-component test signal, computes its phase spectrum with MD_phase_spectrum(), and visualises both the magnitude and phase with an interactive Plotly chart. If you haven't read magnitude-spectrum yet, start there -- it covers the DFT fundamentals this tutorial builds on. ## What is the phase spectrum? Every DFT coefficient $X(k)$ is a [complex number](https://en.wikipedia.org/wiki/Complex_number) with a magnitude and an angle. The **magnitude spectrum** $|X(k)|$ tells you how much energy is at frequency $k$. The **phase spectrum** $\phi(k)$ tells you the *timing* of that frequency component: $$\phi(k) = \arg X(k) = \mathrm{atan2}\!\bigl(\mathrm{Im}\, X(k),\,\mathrm{Re}\, X(k)\bigr) \qquad \phi(k) \in [-\pi,\, \pi]$$ **Reading the formula in C:** ```c // Re X(k) -> X_re[k], Im X(k) -> X_im[k], phi(k) -> phase[k] for (unsigned k = 0; k <= num_samples / 2; k++) { phase[k] = atan2(X_im[k], X_re[k]); } ``` ### Intuitive meaning | Signal | Phase at its bin | |--------|-----------------| | $\cos(2\pi k_0 n / N)$ | $\phi(k_0) = 0$ | | $\sin(2\pi k_0 n / N)$ | $\phi(k_0) = -\pi/2$ | | $\cos(2\pi k_0 n / N + \varphi)$ | $\phi(k_0) = \varphi$ | | Impulse delayed by $d$ samples | $\phi(k) = -2\pi k d / N$ (linear) | A **[time-delayed signal](https://en.wikipedia.org/wiki/Time_delay)** is the most important case: the [DFT shift theorem](https://en.wikipedia.org/wiki/Discrete_Fourier_transform#Shift_theorem) says delaying a signal by $d$ samples adds a linear phase ramp of slope $-2\pi d / N$ radians per bin. This is the basis of the [GCC-PHAT](https://en.wikipedia.org/wiki/Generalized_cross-correlation) delay estimator (see MD_get_delay()). ### When to trust the phase Phase is only meaningful at bins where the magnitude is significant. At bins dominated by noise or leakage, $\phi(k)$ is numerically unreliable -- always examine MD_magnitude_spectrum() alongside the phase spectrum. ## Step 1: Generate a test signal The test signal has three components at **exact integer-bin frequencies** (N = 1024, sample rate = 1024 Hz, so bin $k$ corresponds to exactly $k$ Hz). With exact bins there is no spectral leakage, so no windowing is needed and the phase values are bit-exact. ```c for (unsigned i = 0; i < N; i++) { double n = (double)i; signal[i] = cos(2.0 * M_PI * freq1 * n / sample_rate) + sin(2.0 * M_PI * freq2 * n / sample_rate) + cos(2.0 * M_PI * freq3 * n / sample_rate + phi3); } ``` The three components and their expected phase values: | Component | Bin | Expected $\phi$ | |-----------|-----|---------------------| | `cos(50 Hz)` | 50 | 0 | | `sin(100 Hz)` | 100 | $-\pi/2$ | | `cos(200 Hz, $\pi/4$ offset)` | 200 | $\pi/4$ | ## Step 2: Compute the phase spectrum MD_phase_spectrum() reuses the same cached FFT plan as MD_magnitude_spectrum() and MD_power_spectral_density(). Calling both in sequence costs only one plan lookup: ```c MD_magnitude_spectrum(signal, N, mag); MD_phase_spectrum(signal, N, phase); ``` The output `phase[k]` is in radians, range $[-\pi, \pi]$. The magnitude output `mag[k]` lets you distinguish signal bins from noise bins when interpreting the phase. ## Results \image html phase-spectrum.png "Phase spectrum (top: magnitude, bottom: phase)" width=700px The magnitude plot (top) confirms that energy is concentrated at bins 50, 100, and 200. The phase plot (bottom) shows: - Bin 50: $\phi \approx 0$ (pure cosine) - Bin 100: $\phi \approx -\pi/2$ (pure sine) - Bin 200: $\phi \approx \pi/4$ (phase-shifted cosine) At all other bins the phase is numerically arbitrary (magnitude is zero). ## Key takeaways - $\phi(k) = \arg X(k)$ is computed by `carg()`, which calls `atan2(Im, Re)` and returns values in $[-\pi, \pi]$. - Phase is **scale-invariant**: multiplying a signal by a positive constant does not change its phase. - **No windowing** is needed when all frequency components land on exact integer bins. Windowing smears phase and should be omitted for clean phase measurements. - Always look at the magnitude spectrum alongside the phase spectrum. Phase at low-magnitude bins is meaningless noise. - A linear phase ramp ($\phi(k) = -2\pi k d/N$) indicates a time delay of $d$ samples -- the foundation of GCC-PHAT delay estimation. ## Connection to phase vocoder The [phase vocoder](https://en.wikipedia.org/wiki/Phase_vocoder) uses phase differences between consecutive STFT frames to track the instantaneous frequency of each bin, enabling high-quality time-stretching and pitch-shifting of audio. MD_phase_spectrum() provides the per-frame phase needed as input to such an algorithm. ## Further reading - [Phase (waves)](https://en.wikipedia.org/wiki/Phase_(waves)) -- Wikipedia - [DFT shift theorem](https://en.wikipedia.org/wiki/Discrete_Fourier_transform#Shift_theorem) -- Wikipedia - [Phase vocoder](https://en.wikipedia.org/wiki/Phase_vocoder) -- Wikipedia - Julius O. Smith, [Spectral Audio Signal Processing](https://ccrma.stanford.edu/~jos/sasp/) -- free online textbook - Previous tutorial: power-spectral-density - Next: stft-spectrogram ## API reference - MD_phase_spectrum() -- compute the one-sided phase spectrum - MD_magnitude_spectrum() -- compute the magnitude spectrum - MD_power_spectral_density() -- compute the PSD (periodogram) - MD_shutdown() -- free cached FFTW plans --- # Basic Signal Operations Five fundamental [time-domain](https://en.wikipedia.org/wiki/Time_domain) operations that complement the existing signal measurements (energy, power, entropy) and FFT-based spectrum analysis. Together they give you a toolkit for analysing and manipulating signals before or instead of going to the [frequency domain](https://en.wikipedia.org/wiki/Frequency_domain). --- ## RMS (Root Mean Square) [RMS](https://en.wikipedia.org/wiki/Root_mean_square) is the standard measure of signal "loudness" — the square root of the mean squared amplitude: $$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{n=0}^{N-1} x[n]^2}$$ **Reading the formula in C:** ```c // x[n]^2 -> a[i] * a[i] square each sample // sum -> accumulate in a loop sum of squares // 1/N -> / (double)N divide by number of samples // sqrt() -> sqrt() take the square root double sum = 0.0; for (unsigned i = 0; i < N; i++) sum += a[i] * a[i]; double rms = sqrt(sum / (double)N); ``` A unit-amplitude sine wave has RMS = $1/\sqrt{2} \approx 0.707$. A DC signal of value $c$ has RMS = $|c|$. **API:** ```c double MD_rms(const double *a, unsigned N); ``` **Quick example** — measure the RMS of three signals: ```c double rms_sine = MD_rms(sine, N); double rms_noise = MD_rms(noise, N); double rms_mixed = MD_rms(mixed, N); ``` **Verification tip:** `MD_rms(a, N)` should always equal `sqrt(MD_power(a, N))` to within floating-point precision. --- ## Zero-Crossing Rate The [zero-crossing rate](https://en.wikipedia.org/wiki/Zero-crossing_rate) (ZCR) counts how often the signal changes sign, normalised by the number of adjacent pairs: $$\mathrm{ZCR} = \frac{1}{N-1}\sum_{n=1}^{N-1} \mathbf{1}\!\bigl[\mathrm{sgn}(x[n]) \ne \mathrm{sgn}(x[n-1])\bigr]$$ **Reading the formula in C:** ```c // Walk through the signal and count sign changes. // Zero is treated as non-negative (standard convention). unsigned crossings = 0; for (unsigned i = 1; i < N; i++) if ((a[i] < 0.0) != (a[i - 1] < 0.0)) crossings++; double zcr = (double)crossings / (double)(N - 1); ``` A pure sine at frequency $f$ with sample rate $f_s$ has ZCR $\approx 2f/f_s$. White noise has a higher ZCR than a low-frequency tone — this makes ZCR a simple proxy for distinguishing noise from tonal content. **API:** ```c double MD_zero_crossing_rate(const double *a, unsigned N); ``` **Quick example** — compare ZCR of sine, noise, and their mix: ```c double zcr_sine = MD_zero_crossing_rate(sine, N); double zcr_noise = MD_zero_crossing_rate(noise, N); double zcr_mixed = MD_zero_crossing_rate(mixed, N); ``` **Verification tip:** for a sine wave with an integer number of cycles in the buffer, the ZCR should be very close to $2f/f_s$. --- ## Autocorrelation The [autocorrelation](https://en.wikipedia.org/wiki/Autocorrelation) measures the similarity between a signal and a delayed copy of itself. The normalised form divides by $R[0]$ (the signal energy) so the output is bounded: $$R[\tau] = \frac{\displaystyle\sum_{n=0}^{N-1-\tau} x[n]\,x[n+\tau]} {\displaystyle\sum_{n=0}^{N-1} x[n]^2}$$ **Reading the formula in C:** ```c // Denominator: sum of x[n]^2 (the signal energy, R[0]) double r0 = 0.0; for (unsigned n = 0; n < N; n++) r0 += a[n] * a[n]; // Numerator for each lag tau: sum of x[n] * x[n + tau] for (unsigned tau = 0; tau < max_lag; tau++) { double sum = 0.0; for (unsigned n = 0; n < N - tau; n++) sum += a[n] * a[n + tau]; out[tau] = sum / r0; // normalise so out[0] = 1.0 } ``` $R[0] = 1.0$ always, and $|R[\tau]| \le 1.0$. For a periodic signal, the autocorrelation peaks at the fundamental period (the inverse of the [fundamental frequency](https://en.wikipedia.org/wiki/Fundamental_frequency)) — this is the basis of time-domain pitch detection. **API:** ```c void MD_autocorrelation(const double *a, unsigned N, double *out, unsigned max_lag); ``` **Quick example** — autocorrelation of a noisy sine: ```c MD_autocorrelation(mixed, N, acf, max_lag); ``` **Verification tip:** for a 200 Hz sine at 8000 Hz sample rate, the autocorrelation should peak at lag 40 (= 8000 / 200). --- ## Peak Detection Peak detection finds local maxima in a signal. A sample $a[i]$ is a peak if it is strictly greater than both immediate neighbours and above a given threshold. The `min_distance` parameter suppresses nearby secondary peaks. **Reading the algorithm:** ```c // Left-to-right scan: // 1. a[i] > a[i-1] AND a[i] > a[i+1] (strict local maximum) // 2. a[i] >= threshold (above noise floor) // 3. distance from last peak >= min_distance (suppress echoes) ``` Endpoints (i=0 and i=N-1) are never peaks because they lack two neighbours. A flat signal (all values equal) produces zero peaks. **API:** ```c void MD_peak_detect(const double *a, unsigned N, double threshold, unsigned min_distance, unsigned *peaks_out, unsigned *num_peaks_out); ``` **Quick example** — find peaks in the autocorrelation to estimate pitch: ```c MD_peak_detect(acf, max_lag, 0.3, 10, peaks, &num_peaks); ``` **Verification tip:** pass a hand-crafted signal like `{0, 1, 3, 1, 0, 2, 5, 2, 0}` with threshold 0 and min_distance 1. You should get peaks at indices 2 and 6 (values 3 and 5). --- ## Signal Mixing Mixing computes the element-wise weighted sum of two signals: $$\mathrm{out}[n] = w_a \cdot a[n] + w_b \cdot b[n]$$ **Reading the formula in C:** ```c for (unsigned i = 0; i < N; i++) out[i] = w_a * a[i] + w_b * b[i]; ``` Equal weights of 0.5 produce the average. A weight of 1.0/0.0 passes one signal through unchanged. The output buffer may alias either input (in-place safe). **API:** ```c void MD_mix(const double *a, const double *b, double *out, unsigned N, double w_a, double w_b); ``` **Quick example** — mix 80% sine with 20% noise: ```c MD_mix(sine, noise, mixed, N, 0.8, 0.2); ``` **Verification tip:** for uncorrelated signals (e.g. a sine and white noise), the energy of the mix is approximately the sum of energies: $E(\mathrm{mix}) \approx w_a^2 E(a) + w_b^2 E(b)$. --- # Simple Effects Three classic audio effects built from short delay lines: - [Delay/Echo](https://en.wikipedia.org/wiki/Delay_%28audio_effect%29) ([circular buffer](https://en.wikipedia.org/wiki/Circular_buffer) + feedback) - [Tremolo](https://en.wikipedia.org/wiki/Tremolo) (low-frequency [amplitude modulation](https://en.wikipedia.org/wiki/Amplitude_modulation)) - [Comb-filter](https://en.wikipedia.org/wiki/Comb_filter) [reverb](https://en.wikipedia.org/wiki/Reverberation) (feedback comb) These are simple enough to study sample-by-sample, but they are also the building blocks of many larger audio effects. --- ## Delay line / echo Delay/echo reads an older sample from a circular delay line and mixes it with the current input: $$s[n] = x[n] + feedback \cdot s[n-D]$$ The output mixes dry input with the delayed state: $$y[n] = dry \cdot x[n] + wet \cdot s[n-D]$$ where $D$ is the delay in samples and $|feedback| < 1$. **Reading the formula in C:** ```c // x[n] -> in[n], s[n-D] -> d (delay[idx]), y[n] -> out[n], D -> delay_samples double d = delay[idx]; out[n] = dry * in[n] + wet * d; delay[idx] = in[n] + feedback * d; idx = (idx + 1) % delay_samples; ``` **API:** ```c void MD_delay_echo(const double *in, double *out, unsigned N, unsigned delay_samples, double feedback, double dry, double wet); ``` **Quick example:** ```c MD_delay_echo(delay_src, delay_out, N, 400, 0.45, 1.0, 0.6); ``` **Audio (before/after):** **Spectrograms (before/after):** --- ## Tremolo Tremolo is amplitude modulation by a low-frequency oscillator (LFO). The gain is: $$g[n] = (1-depth) + depth \cdot \frac{1 + \sin(2\pi f_{LFO}n/f_s)}{2}$$ and output is: $$y[n] = g[n] \cdot x[n]$$ So gain moves between $1-depth$ and $1$. **Reading the formula in C:** ```c // f_LFO -> rate_hz, fs -> sample_rate, g[n] -> gain, x[n] -> in[n], y[n] -> out[n] double lfo = 0.5 * (1.0 + sin(2.0 * M_PI * rate_hz * n / sample_rate)); double gain = (1.0 - depth) + depth * lfo; out[n] = in[n] * gain; ``` **API:** ```c void MD_tremolo(const double *in, double *out, unsigned N, double rate_hz, double depth, double sample_rate); ``` **Quick example:** ```c MD_tremolo(trem_src, trem_out, N, 5.0, 0.8, sample_rate); ``` **Audio (before/after):** **Spectrograms (before/after):** --- ## Comb-filter reverb A feedback comb filter reuses a delayed copy of its own output: $$c[n] = x[n] + feedback \cdot c[n-D]$$ Then we blend dry input and comb output: $$y[n] = dry \cdot x[n] + wet \cdot c[n]$$ The repeated, closely spaced echoes create a reverb-like resonant tail. **Reading the formula in C:** ```c // c[n-D] -> delayed (comb[idx]), c[n] -> c, x[n] -> in[n], y[n] -> out[n], D -> delay_samples double delayed = comb[idx]; double c = in[n] + feedback * delayed; comb[idx] = c; out[n] = dry * in[n] + wet * c; idx = (idx + 1) % delay_samples; ``` **API:** ```c void MD_comb_reverb(const double *in, double *out, unsigned N, unsigned delay_samples, double feedback, double dry, double wet); ``` **Quick example:** ```c MD_comb_reverb(comb_src, comb_out, N, 120, 0.75, 0.7, 0.6); ``` **Audio (before/after):** **Spectrograms (before/after):** --- ## Verification tips - Delay echo impulse input should repeat every `delay_samples`, decaying by `feedback^m`. - Tremolo with `depth = 0` should exactly match the input. - Comb reverb with `dry = 0` and impulse input should produce a geometric series at delay multiples. ## API reference - MD_delay_echo() -- circular-buffer echo with feedback - MD_tremolo() -- sinusoidal amplitude modulation - MD_comb_reverb() -- feedback comb filter with dry/wet mix --- # FIR Filters and Convolution [Finite impulse response (FIR)](https://en.wikipedia.org/wiki/Finite_impulse_response) filtering and [convolution](https://en.wikipedia.org/wiki/Convolution) are core DSP tools: you shape a signal by summing delayed, weighted copies of it. miniDSP provides five related APIs so users can start with direct time-domain sums, compare against the FFT overlap-add fast method, and design lowpass filters from scratch. --- ## Time-domain convolution For input $x[n]$ (length $N$) and kernel $h[k]$ (length $M$), full linear convolution is: $$y[n] = \sum_{k=0}^{M-1} h[k]\,x[n-k], \quad n = 0,1,\ldots,N+M-2$$ Out-of-range samples of $x[\cdot]$ are treated as zero. **Reading the formula in C:** ```c // n -> output index, k -> kernel index, x[n-k] -> signal[si], h[k] -> kernel[k] for (unsigned n = 0; n < out_len; n++) { double acc = 0.0; for (unsigned k = 0; k < kernel_len; k++) { if (k > n) break; // n-k would be negative unsigned si = n - k; if (si >= signal_len) continue; acc += signal[si] * kernel[k]; } out[n] = acc; } ``` **API:** ```c unsigned MD_convolution_num_samples(unsigned signal_len, unsigned kernel_len); void MD_convolution_time(const double *signal, unsigned signal_len, const double *kernel, unsigned kernel_len, double *out); ``` **Visuals** — response and magnitude spectrum: **Quick example:** ```c MD_convolution_time(signal, N, kernel, kernel_len, y_conv_time); ``` --- ## Moving-average FIR filter The moving-average filter is the simplest FIR [low-pass](https://en.wikipedia.org/wiki/Low-pass_filter): $$y[n] = \frac{1}{M}\sum_{k=0}^{M-1} x[n-k]$$ This implementation is causal and uses zero-padded startup (fixed divisor $M$ at every sample). **Reading the formula in C:** ```c // M -> window_len, x[n] -> signal[n], y[n] -> out[n] double sum = 0.0; for (unsigned n = 0; n < signal_len; n++) { sum += signal[n]; if (n >= window_len) sum -= signal[n - window_len]; out[n] = sum / (double)window_len; } ``` **API:** ```c void MD_moving_average(const double *signal, unsigned signal_len, unsigned window_len, double *out); ``` **Visuals** — response and magnitude spectrum: **Quick example:** ```c MD_moving_average(signal, N, 8, y_ma); ``` --- ## General FIR filter With arbitrary coefficients $b[k]$ (length $M$), causal FIR filtering is: $$y[n] = \sum_{k=0}^{M-1} b[k]\,x[n-k], \quad n = 0,1,\ldots,N-1$$ This is the direct form used for most textbook FIR designs. **Reading the formula in C:** ```c // b[k] -> coeffs[k], x[n-k] -> signal[n-k], y[n] -> out[n] for (unsigned n = 0; n < signal_len; n++) { double acc = 0.0; for (unsigned k = 0; k < num_taps; k++) { if (k > n) break; // zero-padded startup acc += coeffs[k] * signal[n - k]; } out[n] = acc; } ``` **API:** ```c void MD_fir_filter(const double *signal, unsigned signal_len, const double *coeffs, unsigned num_taps, double *out); ``` **Visuals** — response and magnitude spectrum: **Quick example:** ```c MD_fir_filter(signal, N, coeffs, num_taps, y_fir); ``` --- ## FFT overlap-add fast convolution For long filters, direct convolution is $O(NM)$. [Overlap-add](https://en.wikipedia.org/wiki/Overlap%E2%80%93add_method) computes the same full linear convolution in blocks using FFTs: $$Y_b(k) = X_b(k)\,H(k), \quad y_b[n] = \mathrm{IFFT}(Y_b(k))$$ Each block's time-domain result is added into the output with overlap between adjacent blocks, after an [inverse FFT (IFFT)](https://en.wikipedia.org/wiki/Fast_Fourier_transform#Inverse_transform) per block. **Reading the formula in C:** ```c // x_b[n] -> block_time[n], H(k) -> (H_re[k], H_im[k]), y[n] -> out[n] // Y_b(k) = X_b(k) * H(k), then overlap-add IFFT(Y_b) into out[] for (unsigned start = 0; start < signal_len; start += block_len) { for (unsigned n = 0; n < nfft; n++) block_time[n] = 0.0; for (unsigned n = 0; n < block_len && start + n < signal_len; n++) { block_time[n] = signal[start + n]; } // Educational DFT form for X_b(k) (FFT is used in production code). for (unsigned k = 0; k < nfft; k++) { double x_re = 0.0, x_im = 0.0; for (unsigned n = 0; n < nfft; n++) { double theta = 2.0 * M_PI * (double)k * (double)n / (double)nfft; x_re += block_time[n] * cos(theta); x_im -= block_time[n] * sin(theta); } // Complex multiply: Y_b(k) = X_b(k) * H(k) Y_re[k] = x_re * H_re[k] - x_im * H_im[k]; Y_im[k] = x_re * H_im[k] + x_im * H_re[k]; } // Educational inverse DFT + overlap-add. for (unsigned n = 0; n < nfft; n++) { double yb = 0.0; for (unsigned k = 0; k < nfft; k++) { double theta = 2.0 * M_PI * (double)k * (double)n / (double)nfft; yb += Y_re[k] * cos(theta) - Y_im[k] * sin(theta); } yb /= (double)nfft; if (start + n < out_len) out[start + n] += yb; } } ``` **API:** ```c void MD_convolution_fft_ola(const double *signal, unsigned signal_len, const double *kernel, unsigned kernel_len, double *out); ``` **Visuals** — response and magnitude spectrum: **Quick example:** ```c MD_convolution_fft_ola(signal, N, kernel, kernel_len, y_conv_fft); ``` --- ## Lowpass FIR filter design The [windowed-sinc method](https://en.wikipedia.org/wiki/Sinc_filter) designs a lowpass FIR filter by sampling the ideal sinc impulse response and tapering it with a window function. Using a [Kaiser window](https://en.wikipedia.org/wiki/Kaiser_window) gives precise control over stopband attenuation via the $\beta$ parameter. For a normalized cutoff $f_c = \mathrm{cutoff\_freq} / \mathrm{sample\_rate}$ and filter length $N$: $$h[i] = 2\,f_c\;\mathrm{sinc}\!\bigl(2\,f_c\,(i - (N-1)/2)\bigr) \;\cdot\; w_{\mathrm{Kaiser}}[i], \quad i = 0, 1, \ldots, N-1$$ The coefficients are then normalized so they sum to 1.0 (unity DC gain). **Reading the formula in C:** ```c // num_taps -> N, cutoff_freq -> Hz, sample_rate -> Hz // fc -> normalized cutoff, center -> (N-1)/2, coeffs[i] -> h[i] double fc = cutoff_freq / sample_rate; double center = (double)(num_taps - 1) / 2.0; double kaiser[num_taps]; MD_Gen_Kaiser_Win(kaiser, num_taps, kaiser_beta); double sum = 0.0; for (unsigned i = 0; i < num_taps; i++) { double x = (double)i - center; coeffs[i] = 2.0 * fc * sin(M_PI * 2.0 * fc * x) / (M_PI * 2.0 * fc * x) * kaiser[i]; // (the sinc function handles x == 0 -> 1.0) sum += coeffs[i]; } // Normalize for unity DC gain for (unsigned i = 0; i < num_taps; i++) { coeffs[i] /= sum; } ``` **API:** ```c void MD_design_lowpass_fir(double *coeffs, unsigned num_taps, double cutoff_freq, double sample_rate, double kaiser_beta); ``` **Quick example:** ```c double lpf[65]; MD_design_lowpass_fir(lpf, 65, 2000.0, sample_rate, 10.0); ``` --- ## Quick comparison | Method | Output length | Typical cost | Best use | |--------|---------------|--------------|----------| | Time-domain convolution | $N+M-1$ | $O(NM)$ | Teaching and short kernels | | Moving average | $N$ | $O(N)$ (running sum) | First FIR low-pass example | | General FIR | $N$ | $O(NM)$ | Arbitrary FIR taps | | FFT overlap-add | $N+M-1$ | $O(N\log N)$ blocks | Long kernels / fast offline convolution | | Lowpass FIR design | $N$ taps | Design: $O(N)$ | Anti-aliasing, band-limiting | Use direct methods to build intuition, then compare with overlap-add to see why FFT-based convolution matters for longer filters. Use the lowpass designer when you need a specific cutoff frequency with controlled stopband attenuation. ## Further reading - [Convolution](https://en.wikipedia.org/wiki/Convolution) -- Wikipedia - [Finite impulse response](https://en.wikipedia.org/wiki/Finite_impulse_response) -- Wikipedia - [Overlap-add method](https://en.wikipedia.org/wiki/Overlap%E2%80%93add_method) -- Wikipedia - [Sinc filter](https://en.wikipedia.org/wiki/Sinc_filter) -- Wikipedia ## API reference - MD_convolution_num_samples() -- compute full-convolution output length - MD_convolution_time() -- direct time-domain full convolution - MD_moving_average() -- causal moving-average FIR filter - MD_fir_filter() -- causal arbitrary-coefficient FIR filter - MD_convolution_fft_ola() -- full convolution via FFT overlap-add - MD_design_lowpass_fir() -- design a Kaiser-windowed sinc lowpass FIR filter --- # Brickwall Lowpass Filter A [brickwall lowpass filter](https://en.wikipedia.org/wiki/Low-pass_filter) removes all frequency content above a cutoff frequency with zero transition bandwidth. Unlike FIR or IIR filters that roll off gradually, a brickwall filter has a perfectly sharp boundary: everything at or below the cutoff passes through unchanged, and everything above is eliminated completely. miniDSP implements this by transforming the signal into the frequency domain with an FFT, zeroing the bins above the cutoff, and inverse-FFTing back to the time domain. --- ## Frequency-domain operation Given a real signal $x[n]$ of length $N$ with DFT $X(k)$, the brickwall lowpass at cutoff frequency $f_c$ produces: $$X'(k) = \begin{cases} X(k) & \text{if } k \leq \lfloor f_c \cdot N / f_s \rfloor \\ 0 & \text{otherwise} \end{cases}$$ where $f_s$ is the sample rate. The filtered signal is $x'[n] = \mathrm{IFFT}(X'(k)) / N$ (the $1/N$ factor normalizes FFTW's unnormalized inverse transform). **Reading the formula in C:** ```c // k -> frequency bin index, cutoff_bin -> floor(fc * N / fs) // freq[k] -> X(k), zeroed when k > cutoff_bin unsigned cutoff_bin = (unsigned)(cutoff_hz * N / sample_rate); for (unsigned k = cutoff_bin + 1; k < num_bins; k++) { freq[k] = 0.0; // zero both real and imaginary parts } // After inverse FFT, normalize every sample by N for (unsigned n = 0; n < N; n++) { signal[n] /= (double)N; } ``` --- ## API **Quick example:** ```c double buf[4096]; MD_sine_wave(buf, 4096, 1.0, 440.0, 48000.0); MD_lowpass_brickwall(buf, 4096, 8000.0, 48000.0); // buf now contains only content at or below 8 kHz ``` The full signature is: ```c void MD_lowpass_brickwall(double *signal, unsigned len, double cutoff_hz, double sample_rate); ``` The filter modifies the buffer in-place. It creates one-off FFTW plans (`FFTW_ESTIMATE`) internally, so there is no plan caching overhead — but it is best suited for offline/batch use rather than real-time sample-by-sample processing. --- ## Gibbs ringing Because the brickwall filter has an infinitely sharp cutoff in the frequency domain, the time-domain impulse response has infinite length and exhibits [Gibbs ringing](https://en.wikipedia.org/wiki/Gibbs_phenomenon) — oscillatory overshoot and undershoot near the cutoff frequency. This is the fundamental tradeoff: zero transition bandwidth comes at the cost of time-domain ringing. In practice, this matters only when the cutoff is close to a frequency band you care about. When the cutoff is far from the band of interest (e.g., filtering at 8 kHz when your signal content is below 4 kHz), the ringing is inaudible and invisible. --- ## Example output The plot below shows a 400 Hz + 12000 Hz signal before and after applying a brickwall lowpass at 4000 Hz. The 12000 Hz component is completely eliminated while the 400 Hz component passes through unchanged. ```c MD_lowpass_brickwall(signal, N, cutoff_hz, sample_rate); ``` --- # Pitch Detection This [pitch detection](https://en.wikipedia.org/wiki/Pitch_detection_algorithm) tutorial compares two classic [fundamental-frequency (F0)](https://en.wikipedia.org/wiki/Fundamental_frequency) estimators: - **[Autocorrelation](https://en.wikipedia.org/wiki/Autocorrelation) peak picking** (time domain) - **[FFT](https://en.wikipedia.org/wiki/Fast_Fourier_transform) peak picking** (frequency domain) Both are implemented in miniDSP and demonstrated in `examples/pitch_detection.c`. Build and run the example from the repository root: ```sh make -C examples pitch_detection cd examples && ./pitch_detection open pitch_detection.html ``` --- ## Autocorrelation F0 For a voiced frame, the fundamental period shows up as a strong peak in the autocorrelation function: $$R[\tau] = \frac{\sum_{n=0}^{N-1-\tau} x[n]x[n+\tau]} {\sum_{n=0}^{N-1} x[n]^2}, \qquad f_0 = \frac{f_s}{\tau_{\text{peak}}}$$ We search only lags mapped from a desired F0 range (`min_freq_hz..max_freq_hz`), then choose the strongest local peak. **Reading the formula in C:** ```c // x[n] -> frame[n], fs -> sample_rate // lag_min/lag_max come from requested min/max F0 range. double r0 = 0.0; // denominator: SUM x[n]^2 for (unsigned n = 0; n < N; n++) { r0 += frame[n] * frame[n]; } double best_r = -1.0; unsigned best_lag = 0; for (unsigned tau = lag_min; tau <= lag_max; tau++) { // numerator: SUM x[n] * x[n+tau] double num = 0.0; for (unsigned n = 0; n < N - tau; n++) { num += frame[n] * frame[n + tau]; } double r_tau = (r0 > 0.0) ? (num / r0) : 0.0; // normalised R[tau] // local-max check using neighbors (R[tau-1], R[tau], R[tau+1]) if (r_tau > best_r /* and is_local_peak */) { best_r = r_tau; best_lag = tau; } } double f0_hz = (best_lag > 0) ? (sample_rate / (double)best_lag) : 0.0; ``` --- ## FFT-based F0 This method applies a [Hann window](https://en.wikipedia.org/wiki/Hann_function), computes the one-sided FFT magnitude, and picks the dominant peak in a frequency range: $$f_0 = \frac{k_{\text{peak}} f_s}{N}$$ It is simple and fast, but more sensitive to noise and harmonic dominance than autocorrelation. **Reading the formula in C:** ```c // x[n] -> frame[n], fs -> sample_rate // 1) Apply Hann window: for (unsigned n = 0; n < N; n++) { double w = 0.5 * (1.0 - cos(2.0 * M_PI * (double)n / (double)(N - 1))); xw[n] = frame[n] * w; } // 2) Compute magnitude spectrum directly from DFT definition (educational form): for (unsigned k = 0; k <= N / 2; k++) { double re = 0.0, im = 0.0; for (unsigned n = 0; n < N; n++) { double phase = 2.0 * M_PI * (double)k * (double)n / (double)N; re += xw[n] * cos(phase); im -= xw[n] * sin(phase); } mag[k] = sqrt(re * re + im * im); } // 3) Search bins mapped from requested F0 range: unsigned k_min = (unsigned)ceil(min_freq_hz * (double)N / sample_rate); unsigned k_max = (unsigned)floor(max_freq_hz * (double)N / sample_rate); unsigned k_peak = k_min; for (unsigned k = k_min; k <= k_max; k++) { if (mag[k] > mag[k_peak]) k_peak = k; } double f0_hz = (double)k_peak * sample_rate / (double)N; ``` --- ## Frame-Wise Tracking In practice, pitch is estimated frame-by-frame over time: ```c for (unsigned f = 0; f < num_frames; f++) { unsigned start = f * hop; const double *frame = signal + start; unsigned center = start + frame_len / 2; if (center >= N) center = N - 1; /* Ground truth for this frame (piecewise-constant by construction). */ f0_true[f] = (center < seg1) ? 140.0 : (center < seg2 ? 220.0 : 320.0); times[f] = (double)center / sample_rate; //! [acf-reading-algorithm] f0_acf[f] = MD_f0_autocorrelation(frame, frame_len, sample_rate, min_f0, max_f0); //! [acf-reading-algorithm] //! [fft-reading-algorithm] f0_fft[f] = MD_f0_fft(frame, frame_len, sample_rate, min_f0, max_f0); //! [fft-reading-algorithm] } ``` --- ## Visual Comparison --- ## Failure Modes and Trade-offs - **Autocorrelation** can fail on weakly voiced/noisy frames where no clear lag peak exists. - **FFT peak pick** can lock onto [harmonics](https://en.wikipedia.org/wiki/Harmonic) (e.g. 2f0, 3f0) when the fundamental is weak. - Restricting the search range (`min_freq_hz`, `max_freq_hz`) is critical for both methods. - Short frames improve time resolution but reduce frequency/lag resolution. Both miniDSP APIs return `0.0` when no reliable F0 peak is found. --- ## API Reference - MD_f0_autocorrelation() -- F0 from autocorrelation peak - MD_f0_fft() -- F0 from Hann-windowed FFT magnitude peak - MD_shutdown() -- release cached FFT plans/resources when done --- # DTMF Tone Detection and Generation [Dual-Tone Multi-Frequency (DTMF)](https://en.wikipedia.org/wiki/Dual-tone_multi-frequency_signaling) is the signalling system used by touch-tone telephones. Each keypad button is encoded as the sum of two sinusoids -- one from a low-frequency **row** group and one from a high-frequency **column** group. The receiver decodes the button by identifying both frequencies. miniDSP provides ITU-T Q.24-compliant detection and generation in `src/minidsp_dtmf.c`, demonstrated in `examples/dtmf_detector.c`. Build and run the self-test from the repository root: ```sh make -C examples dtmf_detector cd examples && ./dtmf_detector ``` --- ## The DTMF frequency table Each button sits at the intersection of one row and one column frequency: | | 1209 Hz | 1336 Hz | 1477 Hz | 1633 Hz | |:----------:|:-------:|:-------:|:-------:|:-------:| | **697 Hz** | 1 | 2 | 3 | A | | **770 Hz** | 4 | 5 | 6 | B | | **852 Hz** | 7 | 8 | 9 | C | | **941 Hz** | * | 0 | # | D | The frequencies were chosen so that no tone is a harmonic of another (ratios are never simple integers), preventing false triggers from harmonically rich signals like speech. **Spectrogram** of the sequence "159#" (70 ms tones, 70 ms pauses, 8 kHz). Each digit appears as a pair of horizontal bands — one row frequency and one column frequency. Dashed lines mark the eight DTMF frequencies: --- ## ITU-T Q.24 timing constraints [ITU-T Recommendation Q.24](https://en.wikipedia.org/wiki/Q.24) specifies minimum timing for reliable DTMF signalling: | Parameter | Minimum | |:---------------------------------|:-------:| | Tone duration for valid digit | 40 ms | | Inter-digit pause | 40 ms | In practice, telephone systems use 70--120 ms tones and pauses. The miniDSP detector enforces the 40 ms minimums via a frame-counting state machine; the generator asserts that requested durations meet the minimums. --- ## Signal model A single DTMF digit is the sum of two sinusoids at equal amplitude: $$x[n] = A\,\sin\!\bigl(2\pi\, f_{\text{row}}\, n / f_s\bigr) + A\,\sin\!\bigl(2\pi\, f_{\text{col}}\, n / f_s\bigr), \qquad n = 0, 1, \ldots, N_{\text{tone}}-1$$ where $A = 0.5$ so the peak combined amplitude is 1.0, $f_s$ is the sampling rate, and $N_{\text{tone}}$ is the number of samples per tone. **Reading the formula in C:** ```c // A -> 0.5, f_row/f_col -> row_freq/col_freq, fs -> sample_rate // n -> i, x[n] -> output[offset + i] for (unsigned i = 0; i < tone_samples; i++) { double t = (double)i / sample_rate; output[offset + i] = 0.5 * sin(2 * M_PI * row_freq * t) + 0.5 * sin(2 * M_PI * col_freq * t); } ``` The library implementation uses MD_sine_wave() to generate each component separately, then sums them. --- ## Generation **API:** ```c unsigned len = MD_dtmf_signal_length(num_digits, sample_rate, tone_ms, pause_ms); double *sig = malloc(len * sizeof(double)); MD_dtmf_generate(sig, "5551234", sample_rate, tone_ms, pause_ms); ``` The total signal length in samples is: $$N = D \cdot \left\lfloor \frac{t_{\text{tone}} \cdot f_s}{1000} \right\rfloor + (D - 1) \cdot \left\lfloor \frac{t_{\text{pause}} \cdot f_s}{1000} \right\rfloor$$ where $D$ is the number of digits. **Reading the formula in C:** ```c // D -> num_digits, t_tone -> tone_ms, t_pause -> pause_ms, fs -> sample_rate // floor(t_tone * fs / 1000) -> tone_samples unsigned tone_samples = (unsigned)(tone_ms * sample_rate / 1000.0); unsigned pause_samples = (unsigned)(pause_ms * sample_rate / 1000.0); unsigned N = num_digits * tone_samples + (num_digits - 1) * pause_samples; ``` **Quick example** -- generate a DTMF sequence and save as WAV: ```c static int valid_dtmf_char(char ch) { return (ch >= '0' && ch <= '9') || ch == '*' || ch == '#' || ch == 'A' || ch == 'a' || ch == 'B' || ch == 'b' || ch == 'C' || ch == 'c' || ch == 'D' || ch == 'd'; } static int generate_wav(const char *digits, const char *outfile) { const double sample_rate = 8000.0; const unsigned tone_ms = 70; const unsigned pause_ms = 70; if (digits[0] == '\0') { fprintf(stderr, "Digit string must not be empty\n"); return 1; } for (const char *p = digits; *p; p++) { if (!valid_dtmf_char(*p)) { fprintf(stderr, "Invalid DTMF character '%c'. " "Valid: 0-9, A-D, *, #\n", *p); return 1; } } unsigned num_digits = (unsigned)strlen(digits); unsigned signal_len = MD_dtmf_signal_length(num_digits, sample_rate, tone_ms, pause_ms); double *signal = malloc(signal_len * sizeof(double)); if (!signal) { fprintf(stderr, "allocation failed\n"); return 1; } MD_dtmf_generate(signal, digits, sample_rate, tone_ms, pause_ms); /* Convert double -> float for WAV writing. */ float *fdata = malloc(signal_len * sizeof(float)); if (!fdata) { free(signal); fprintf(stderr, "allocation failed\n"); return 1; } for (unsigned i = 0; i < signal_len; i++) fdata[i] = (float)signal[i]; int ret = FIO_write_wav(outfile, fdata, signal_len, (unsigned)sample_rate); if (ret == 0) printf("Generated DTMF \"%s\" -> %s (%u samples, %.3f s)\n", digits, outfile, signal_len, (double)signal_len / sample_rate); else fprintf(stderr, "Error writing %s\n", outfile); free(fdata); free(signal); return ret; } ``` --- ## Detection algorithm Detection slides a Hanning-windowed FFT frame across the audio signal: 1. **FFT size** is the largest power of two whose window fits within 35 ms (e.g. $N = 256$ at 8 kHz, giving $\Delta f = 31.25$ Hz). Keeping the window shorter than the 40 ms Q.24 minimum pause ensures the state machine can resolve inter-digit gaps. 2. **Hop** is $N/4$ (75 % overlap). 3. **Per frame:** apply Hanning window, compute MD_magnitude_spectrum(), normalise to single-sided amplitude, then check the magnitude at each of the eight DTMF frequency bins. 4. A digit is detected when both the strongest row and strongest column exceed a threshold (8$\times$ the mean spectral magnitude, roughly 18 dB above the noise floor). 5. A **state machine** enforces ITU-T Q.24 timing: | State | Transition condition | Action | |:------------|:----------------------------------------|:-----------------------------| | **IDLE** | Digit detected | Enter PENDING, start counter | | **PENDING** | Same digit for $\geq$ 40 ms | Enter ACTIVE (confirmed) | | **PENDING** | Different digit or silence | Return to IDLE | | **ACTIVE** | Same digit continues | Update end time | | **ACTIVE** | Silence / different for $\geq$ 40 ms | Emit tone, return to IDLE | **Single-sided amplitude normalisation:** $$\hat{X}[k] = \begin{cases} |X[k]| / N & k = 0 \text{ or } k = N/2 \\[4pt] 2\,|X[k]| / N & 0 < k < N/2 \end{cases}$$ **Reading the normalisation in C:** ```c // X[k] -> mag[k] (raw FFTW output), N -> FFT size for (unsigned k = 0; k < num_bins; k++) { mag[k] /= (double)N; // divide by FFT size if (k > 0 && k < N / 2) mag[k] *= 2.0; // fold negative frequencies } ``` **Quick example** -- detect DTMF tones in a WAV file: ```c static int detect_file(const char *infile) { float *fdata = NULL; size_t datalen = 0; unsigned samprate = 0; if (FIO_read_audio(infile, &fdata, &datalen, &samprate, 1) != 0) { fprintf(stderr, "Error reading %s\n", infile); return 1; } if (datalen == 0) { fprintf(stderr, "File contains no audio samples\n"); free(fdata); return 1; } if (samprate < 4000) { fprintf(stderr, "Sample rate %u Hz is too low for DTMF detection " "(minimum 4000 Hz)\n", samprate); free(fdata); return 1; } if (datalen > UINT_MAX) { fprintf(stderr, "File too large (%zu samples, max %u)\n", datalen, UINT_MAX); free(fdata); return 1; } printf("Read %s: %zu samples at %u Hz (%.3f s)\n", infile, datalen, samprate, (double)datalen / (double)samprate); /* Convert float -> double for the library. */ double *signal = malloc(datalen * sizeof(double)); if (!signal) { free(fdata); fprintf(stderr, "allocation failed\n"); return 1; } for (size_t i = 0; i < datalen; i++) signal[i] = (double)fdata[i]; free(fdata); /* Detect. */ MD_DTMFTone tones[256]; unsigned n = MD_dtmf_detect(signal, (unsigned)datalen, (double)samprate, tones, 256); printf("\nDetected %u DTMF tone%s:\n", n, n == 1 ? "" : "s"); if (n > 0) { printf(" %-6s %-12s %-12s\n", "Digit", "Start (s)", "End (s)"); for (unsigned i = 0; i < n; i++) printf(" %-6c %-12.3f %-12.3f\n", tones[i].digit, tones[i].start_s, tones[i].end_s); } free(signal); MD_shutdown(); return 0; } ``` --- ## Self-test mode Running the example with no arguments generates a known digit sequence, detects it, and verifies correctness: ```c static int self_test(void) { const char *test_digits = "14*258039#"; const double sample_rate = 8000.0; const unsigned tone_ms = 70; const unsigned pause_ms = 70; unsigned num_digits = (unsigned)strlen(test_digits); unsigned signal_len = MD_dtmf_signal_length(num_digits, sample_rate, tone_ms, pause_ms); printf("Self-test: generating DTMF sequence \"%s\"\n", test_digits); printf(" sample_rate = %.0f Hz, tone = %u ms, pause = %u ms\n", sample_rate, tone_ms, pause_ms); double *signal = malloc(signal_len * sizeof(double)); if (!signal) { fprintf(stderr, "allocation failed\n"); return 1; } MD_dtmf_generate(signal, test_digits, sample_rate, tone_ms, pause_ms); MD_DTMFTone tones[64]; unsigned n = MD_dtmf_detect(signal, signal_len, sample_rate, tones, 64); printf("\nDetected %u DTMF tone%s:\n", n, n == 1 ? "" : "s"); printf(" %-6s %-12s %-12s\n", "Digit", "Start (s)", "End (s)"); for (unsigned i = 0; i < n; i++) printf(" %-6c %-12.3f %-12.3f\n", tones[i].digit, tones[i].start_s, tones[i].end_s); /* Verify. */ int pass = 1; if (n != num_digits) { printf("\nSelf-test FAILED: expected %u digits, detected %u\n", num_digits, n); pass = 0; } else { for (unsigned i = 0; i < num_digits; i++) { if (tones[i].digit != test_digits[i]) { printf("\nSelf-test FAILED: digit %u expected '%c' got '%c'\n", i, test_digits[i], tones[i].digit); pass = 0; break; } } } if (pass) printf("\nSelf-test PASSED: all %u digits detected correctly\n", num_digits); free(signal); MD_shutdown(); return pass ? 0 : 1; } ``` --- ## Frequency resolution and bin mapping For a given FFT size $N$ and sampling rate $f_s$, each bin $k$ corresponds to frequency: $$f_k = k \cdot \frac{f_s}{N}$$ **Reading the formula in C:** ```c // k -> bin index, fs -> sample_rate, N -> FFT size // f_k -> freq (the frequency that bin k represents) double freq = (double)k * sample_rate / (double)N; ``` The nearest bin for a DTMF frequency $f$ is: $$k = \mathrm{round}\!\left(\frac{f \cdot N}{f_s}\right)$$ **Reading the formula in C:** ```c // f -> dtmf_freq, N -> FFT size, fs -> sample_rate // k -> bin (nearest FFT bin for the DTMF frequency) unsigned bin = (unsigned)(dtmf_freq * N / sample_rate + 0.5); ``` The detector checks bins $k-1$, $k$, and $k+1$ and takes the maximum magnitude, compensating for the slight frequency mismatch when the DTMF frequency does not fall exactly on a bin centre. At 8 kHz with $N = 256$: | DTMF freq | Nearest bin | Bin freq | Error | |:----------|:-----------:|:---------:|:-------:| | 697 Hz | 22 | 687.5 Hz | -9.5 | | 770 Hz | 25 | 781.3 Hz | +11.3 | | 852 Hz | 27 | 843.8 Hz | -8.2 | | 941 Hz | 30 | 937.5 Hz | -3.5 | | 1209 Hz | 39 | 1218.8 Hz | +9.8 | | 1336 Hz | 43 | 1343.8 Hz | +7.8 | | 1477 Hz | 47 | 1468.8 Hz | -8.2 | | 1633 Hz | 52 | 1625.0 Hz | -8.0 | All errors are well within the ±1.5 % tolerance specified by ITU-T. The detector also checks the two adjacent bins (±1) to handle residual frequency mismatch. --- # Spectrogram Text Art Generate audio that displays readable text when viewed as a spectrogram. ## The idea A spectrogram is a time-frequency picture of a signal: the x-axis is time, the y-axis is frequency, and brightness encodes magnitude. If we place sine waves at specific frequencies during specific time intervals, we can "draw" in the spectrogram. Given a short string like `"HELLO"`, the library rasterises it with a tiny bitmap font, maps each pixel to a frequency band, and synthesises the corresponding tones. The result is a WAV file that sounds like a buzzy chord but reveals the hidden message visually. ## How it works **Bitmap font** — Each printable ASCII character (32–126) is stored as a 5-column, 7-row bitmap. One byte per column, one bit per row. Characters are spaced three columns apart, giving a grid width of $ W = 8 \times \mathrm{len} - 3 $ columns and 7 rows. **Frequency mapping** — Row 0 maps to `freq_hi` (top of the spectrogram), row 6 maps to `freq_lo` (bottom). Intermediate rows are linearly interpolated: $$f_r = f_\mathrm{hi} - \frac{r}{6} \, (f_\mathrm{hi} - f_\mathrm{lo})$$ **Reading the formula in C:** ```c // f_hi -> freq_hi, f_lo -> freq_lo, r -> r, f_r -> row_freq[r] // The font has 7 rows, so the denominator is 6 (= 7 − 1). double row_freq[7]; for (unsigned r = 0; r < 7; r++) row_freq[r] = freq_hi - (double)r / 6.0 * (freq_hi - freq_lo); ``` **Synthesis** — Each bitmap column occupies `col_samples = duration / W * sample_rate` audio samples. For every "on" pixel in that column a sine wave at frequency $ f_r $ is added. Phase is carried continuously across columns so that sustained tones (adjacent "on" pixels in the same row) produce a smooth sinusoid. **Crossfade** — A 3 ms raised-cosine ramp is applied at tone onsets and offsets (transitions between adjacent "on" and "off" columns in the same row) to suppress broadband clicks. **Normalisation** — After all columns are synthesised the entire signal is scaled so that the peak absolute amplitude is 0.9. ## API ```c unsigned MD_spectrogram_text(double *output, unsigned max_len, const char *text, double freq_lo, double freq_hi, double duration_sec, double sample_rate); ``` **Parameters:** | Parameter | Description | |----------------|-------------| | `output` | Caller-allocated buffer for the synthesised audio. | | `max_len` | Size of `output` in samples (must be >= returned value). | | `text` | Printable ASCII string to render (non-empty). | | `freq_lo` | Lowest frequency in Hz (bottom of text in spectrogram). | | `freq_hi` | Highest frequency in Hz (top of text in spectrogram). | | `duration_sec` | Total signal duration in seconds. | | `sample_rate` | Sample rate in Hz (`freq_hi` must be < `sample_rate / 2`). | **Returns** the number of samples written. ## Quick example ```c #include "minidsp.h" double buf[64000]; unsigned n = MD_spectrogram_text(buf, 64000, "HELLO", 200.0, 7500.0, 2.0, 16000.0); /* buf[0..n-1] contains the audio. Compute its STFT to see "HELLO". */ ``` ## Example program The example `examples/spectrogram_text.c` generates a WAV file, an interactive HTML spectrogram, and a static PNG image. **Usage:** ```sh ./spectrogram_text [TEXT] [--colormap hot|grayscale] ``` Default text is `"HELLO"`, default colormap is `hot`. **Synthesis and STFT parameters** used by the example (so you can replicate the spectrogram with your own tools): | Parameter | Value | |----------------|-------------------| | Sample rate | 16000 Hz | | Frequency range| 400–7300 Hz | | Duration | 2.25 s | | Silence padding| 0.5 s before/after (default) | | STFT FFT size | 1024 | | STFT hop | 16 samples (1 ms) | | dB range | −80 to 0 dB | | Colorscale | Hot (or grayscale) | Outputs: `spectrogram_text.wav`, `spectrogram_text.html`, `spectrogram_text.png`. ## Viewing the result **Listen** — the synthesised "HELLO" (buzzy chord): **Spectrogram** — each letter is visible as a cluster of horizontal bands in the time-frequency plane: **Build and run:** ```sh make -C examples spectrogram_text cd examples && ./spectrogram_text "HELLO" open spectrogram_text.html # interactive spectrogram open spectrogram_text.png # static image ``` --- # Shepard Tone Generator A [Shepard tone](https://en.wikipedia.org/wiki/Shepard_tone) is the auditory equivalent of an [M.C. Escher](https://en.wikipedia.org/wiki/M._C._Escher) staircase: a sound that seems to rise (or fall) in pitch forever without ever actually leaving its frequency range. The effect was first described by cognitive scientist [Roger Shepard](https://en.wikipedia.org/wiki/Roger_Shepard) in 1964. miniDSP provides a single-call generator in `src/minidsp_generators.c`, demonstrated in `examples/shepard_tone.c`. Build and run from the repository root: ```sh make -C examples shepard_tone cd examples && ./shepard_tone ``` --- ## How it works The illusion rests on two ideas: 1. **Octave equivalence** — the human ear perceives tones one octave apart as the "same note" at a different pitch height. 2. **Spectral envelope** — a fixed [Gaussian](https://en.wikipedia.org/wiki/Gaussian_function) bell curve in log-frequency space controls how loud each tone is. Tones near the centre are loud; tones at the edges are nearly silent. Several sine waves are sounded simultaneously, each separated by one octave. All of them glide upward (or downward) in pitch at the same rate. As a tone approaches the upper edge of the Gaussian, it fades to silence. Meanwhile, a new tone enters at the lower edge, fading in. Because the only thing the ear can latch onto (the **loudest** tones) are always in the middle and always going up, the sound appears to ascend indefinitely. This diagram shows the principle for a **rising** Shepard tone: ``` Amplitude | ╭───╮ | ╱ ╲ ← Gaussian spectral envelope | ╱ ● ╲ (fixed in log-frequency) | ╱ ● ● ╲ | ╱ ● ● ╲ | ╱ ● ● ╲ | ╱ ● ● ╲ +--●─────────────────────────────●──→ log₂(frequency) ↑ ↑ low edge high edge (fading in) (fading out) ● = individual octave-spaced tones, all gliding → ``` --- ## Signal model At time $t = n / f_s$, the output sample is: $$x[n] \;=\; A_\text{norm}\,\sum_k\; \underbrace{ \exp\!\Bigl(-\frac{d_k(t)^2}{2\sigma^2}\Bigr) }_{\text{Gaussian envelope}} \;\sin\!\bigl(\varphi_k(n)\bigr)$$ where: | Symbol | Meaning | |--------|---------| | $k$ | Layer index (one per octave) | | $d_k(t) = k - c + R\,t$ | Octave distance from the Gaussian centre at time $t$ | | $c = (L-1)/2$ | Centre of the layer range | | $\sigma = L/4$ | Gaussian width in octaves | | $R$ | Glissando rate (`rate_octaves_per_sec`): positive = rising | | $L$ | Number of audible octave layers (`num_octaves`) | | $f_k(t) = f_\text{base} \cdot 2^{d_k(t)}$ | Instantaneous frequency of layer $k$ | | $\varphi_k(n)$ | Phase accumulated sample-by-sample from $f_k$ | | $A_\text{norm}$ | Normalisation factor so peak amplitude equals the requested `amplitude` | The Gaussian is fixed in log-frequency space while the tones glide through it. Only layers within $\pm 5\sigma$ are computed; layers whose frequency exceeds the [Nyquist frequency](https://en.wikipedia.org/wiki/Nyquist_frequency) or falls below 20 Hz are silently skipped. --- ## Reading the formula in C The core synthesis loop — what `MD_shepard_tone()` does internally: ```c // R -> rate, L -> num_octaves, f_base -> base_freq, fs -> sample_rate // c = (L - 1) / 2, sigma = L / 4 // d_k(t) -> d, f_k(t) -> freq, phi_k(n) -> phases[k] // x[n] -> output[i] double c = (double)(num_octaves - 1) / 2.0; double sigma = (double)num_octaves / 4.0; for (unsigned i = 0; i < N; i++) { double t = (double)i / sample_rate; double sample = 0.0; for (int k = k_min; k <= k_max; k++) { double d = (double)k - c + rate * t; // octave distance from centre double freq = base_freq * pow(2.0, d); // instantaneous frequency double gauss = exp(-0.5 * d * d / (sigma * sigma)); // Gaussian weight phases[k] += 2.0 * M_PI * freq / sample_rate; // accumulate phase sample += gauss * sin(phases[k]); // add weighted sine } output[i] = sample; // (later normalised to peak amplitude) } ``` --- ## Parameters and their effect ### Glissando rate (rate_octaves_per_sec) Controls how fast the tones rise or fall. | Rate | Effect | |------|--------| | 0.0 | Static chord — no movement, just octave-spaced sines | | 0.25 | Slow, dreamy ascent (4 seconds per octave) | | 0.5 | Moderate rise (default) — 2 seconds per octave | | 1.0 | Fast rise — 1 second per octave | | −0.5 | Moderate descent — the "falling" Shepard tone | **Listen** — rising at 0.5 oct/s (5 seconds): **Listen** — falling at 0.5 oct/s (5 seconds): **Listen** — static chord (5 seconds): ### Rising spectrogram The spectrogram below shows the characteristic pattern of a rising Shepard tone: parallel diagonal lines (one per octave layer) sweeping upward through the Gaussian bell curve. Tones fade in at the bottom and fade out at the top. ### Falling spectrogram The falling variant mirrors the rising pattern — diagonal lines sweep **downward**. --- ### Number of octaves (num_octaves) Controls how many simultaneous octave layers are present and the width of the Gaussian envelope ($\sigma = L/4$). | Value | Typical use | |-------|-------------| | 4–6 | Narrow bell — prominent entry/exit of tones; more "organ-like" | | 8 | Default — smooth, balanced illusion | | 10–12 | Wide bell — very gradual fading; ethereal, diffuse texture | More octaves means more layers span the audible range at any instant, making the transitions smoother at the expense of a busier spectrum. ### Base frequency (base_freq) The centre of the Gaussian bell curve. Tones above this frequency are treated the same as those below it (the Gaussian is symmetric in log-frequency space). Typical values: 200–600 Hz. --- ## API ```c void MD_shepard_tone(double *output, unsigned N, double amplitude, double base_freq, double sample_rate, double rate_octaves_per_sec, unsigned num_octaves); ``` **Parameters:** | Parameter | Description | |--------------------------|-------------| | `output` | Caller-allocated buffer for the synthesised audio. | | `N` | Number of samples to generate. Must be > 0. | | `amplitude` | Peak amplitude of the output signal. | | `base_freq` | Centre frequency of the Gaussian envelope in Hz (e.g. 440). | | `sample_rate` | Sample rate in Hz. Must be > 0. | | `rate_octaves_per_sec` | Glissando rate: positive = rising, negative = falling, 0 = static. | | `num_octaves` | Number of audible octave layers (Gaussian width). Must be > 0. | --- ## Quick example ```c #include "minidsp.h" #include // 5 seconds of endlessly rising Shepard tone at 44.1 kHz unsigned N = 5 * 44100; double *sig = malloc(N * sizeof(double)); MD_shepard_tone(sig, N, 0.8, 440.0, 44100.0, 0.5, 8); // sig[] now sounds like it rises forever free(sig); ``` --- ## Example program The example `examples/shepard_tone.c` generates a WAV file and an interactive HTML spectrogram. **Usage:** ```sh ./shepard_tone [--rising | --falling | --static] [--rate OCTAVES_PER_SEC] [--octaves NUM] [--base FREQ_HZ] [--duration SEC] ``` Default: rising at 0.5 oct/s, 440 Hz base, 8 octaves, 5 seconds. **Generate and listen:** *(snippet: shepard_tone.c [generate-signal])* **Build and run:** ```sh make -C examples shepard_tone cd examples && ./shepard_tone open shepard_tone.html # interactive spectrogram ``` --- ## Why it works — the psychoacoustics The Shepard tone exploits a fundamental ambiguity in pitch perception. Pitch has two dimensions: - **[Pitch chroma](https://en.wikipedia.org/wiki/Pitch_class)** — which note it is (C, D, E, …), determined by the position within the octave. - **Pitch height** — how high or low it sounds overall. The Gaussian envelope removes pitch-height cues: there is no single "highest" or "lowest" tone to anchor the percept. All the listener hears is the chroma — and the chroma is always going up. The effect is even more striking when a rising Shepard tone is followed by a falling one. Despite the falling version being physically the mirror image, many listeners perceive *both* as rising — a dramatic demonstration of how expectation shapes perception. [Jean-Claude Risset](https://en.wikipedia.org/wiki/Jean-Claude_Risset) later extended the idea to **continuous glissando** (the Shepard–Risset glissando), which is exactly what `MD_shepard_tone()` implements: instead of discrete steps, the tones slide smoothly. --- ## Further reading - [Roger N. Shepard, "Circularity in Judgments of Relative Pitch" (1964)](https://doi.org/10.1121/1.1908239) — the original paper. - [Shepard tone — Wikipedia](https://en.wikipedia.org/wiki/Shepard_tone) - [Shepard–Risset glissando — Wikipedia](https://en.wikipedia.org/wiki/Shepard_tone#Shepard%E2%80%93Risset_glissando) - Diana Deutsch, [Musical Illusions and Phantom Words](https://deutsch.ucsd.edu/psychology/pages.php?i=201) — demonstrations and further reading on auditory illusions. --- # Audio Steganography [Steganography](https://en.wikipedia.org/wiki/Steganography) is the practice of hiding a secret message inside an innocuous-looking cover medium. **Audio steganography** hides data inside an audio signal so that a casual listener hears only the original sound, while a decoder can extract the hidden payload. miniDSP provides three complementary methods in `src/minidsp_steg.c`, demonstrated in `tools/audio_steg/audio_steg.c`: | Method | Identifier | Capacity | Robustness | Audibility | |:-------|:-----------|:---------|:-----------|:-----------| | **LSB** (Least Significant Bit) | `MD_STEG_LSB` | High (~1 bit/sample) | Fragile | Inaudible (~-90 dB) | | **Frequency-band** (BFSK) | `MD_STEG_FREQ_BAND` | Lower (~2.6 kbit/s) | Moderate | Near-inaudible (ultrasonic) | | **Spectrogram text** (hybrid) | `MD_STEG_SPECTEXT` | ~4 chars/sec visual | Fragile (like LSB) | Inaudible (ultrasonic) | Build and run the self-test from the repository root: ```sh make -C tools/audio_steg cd tools/audio_steg && ./audio_steg ``` --- ## Message framing Both methods prepend a **32-bit little-endian header** before the payload. Bits 0–30 hold the message byte count; bit 31 is a **payload type flag** (0 = text, 1 = binary). This allows the decoder to recover the message without knowing its length in advance, and enables `MD_steg_detect()` to identify the payload type: ``` [ bit 31: type flag | bits 0-30: msg_len (LE) ] [ 8 * msg_len bits: payload ] ``` Each bit of the header and payload is encoded independently using the chosen method. Bits within each byte are transmitted LSB-first. --- ## Method 1: Least Significant Bit (LSB) ### The idea Audio samples are typically stored as 16-bit integers (-32768 to +32767). The least significant bit of each sample contributes only ±1 to a range of 65536 — a change of about -90 dB relative to full scale. By replacing the LSB of each sample with a message bit, we embed data that is completely inaudible. ### Signal model The host signal $x[n] \in [-1, 1]$ is quantised to 16-bit PCM: $$p[n] = \mathrm{round}(x[n] \times 32767)$$ The LSB is then overwritten with message bit $b_k$: $$p'[n] = (p[n] \mathbin{\&} \sim 1) \mathbin{|} b_k$$ and the stego sample is converted back to double: $$y[n] = p'[n] \;/\; 32767$$ The maximum distortion per sample is: $$|y[n] - x[n]| \leq \frac{1}{32767} \approx 3.05 \times 10^{-5}$$ **Reading the formula in C:** ```c // x[n] -> host[i], p[n] -> pcm, b_k -> bit, y[n] -> output[i] int pcm = (int)(host[i] * 32767.0); // quantise to 16-bit pcm = (pcm & ~1) | bit; // overwrite LSB output[i] = (double)pcm / 32767.0; // convert back ``` ### Capacity One bit per sample, minus the 32-bit header: $$C_{\text{LSB}} = \frac{N - 32}{8} \text{ bytes}$$ where $N$ is the signal length in samples. **Reading the formula in C:** ```c // N -> signal_len, C_LSB -> capacity unsigned capacity = (signal_len - 32) / 8; ``` For a 3-second signal at 44.1 kHz ($N = 132300$): $$C_{\text{LSB}} = \frac{132300 - 32}{8} = 16533 \text{ bytes} \approx 16 \text{ KB}$$ **LSB capacity at 44.1 kHz by audio duration:** | Audio duration | Samples | Max payload | Equivalent | |:---------------|--------:|------------:|:-----------| | 1 s | 44 100 | 5 508 B | ~5 KB (small config file) | | 5 s | 220 500 | 27 558 B | ~27 KB (web thumbnail) | | 30 s | 1 323 000 | 165 371 B | ~161 KB (high-res photo) | | 1 min | 2 646 000 | 330 746 B | ~323 KB (short PDF) | | 5 min | 13 230 000 | 1 653 746 B | ~1.6 MB (multi-page document) | | 10 min | 26 460 000 | 3 307 496 B | ~3.2 MB (zip archive) | | 30 min | 79 380 000 | 9 922 496 B | ~9.5 MB (high-res image set) | | 1 hour | 158 760 000 | 19 844 996 B | ~18.9 MB (small software package) | ### Listening comparison **Original host signal** (440 Hz sine, 3 seconds): **After LSB encoding** (message hidden inside): The two are perceptually identical. The difference signal (host minus stego) is pure quantisation noise at -90 dB: ### Trade-offs | Advantage | Disadvantage | |:----------|:-------------| | Very high capacity | Destroyed by any lossy compression (MP3, AAC, Opus) | | Zero audible distortion | Destroyed by resampling or sample-rate conversion | | Simple, fast implementation | Destroyed by amplitude scaling or normalisation | | Works at any sample rate | Requires lossless transport (WAV, FLAC) | --- ## Method 2: Frequency-Band Modulation (BFSK) ### The idea Human hearing sensitivity falls off sharply above ~16 kHz, and most adults cannot hear tones above 18 kHz. By adding low-amplitude tones in the 18–20 kHz "near-ultrasonic" band, we can encode data that is effectively inaudible. The encoding uses [Binary Frequency-Shift Keying (BFSK)](https://en.wikipedia.org/wiki/Frequency-shift_keying): each bit is represented by a short burst ("chip") of a sinusoidal tone at one of two carrier frequencies. ### Carrier frequencies | Bit value | Carrier frequency | |:---------:|:-----------------:| | 0 | 18500 Hz | | 1 | 19500 Hz | Both carriers are above the typical hearing threshold, and the 1 kHz separation provides reliable discrimination during decoding. ### Chip duration Each bit occupies a **3 ms chip** — a burst of $C$ samples: $$C = \left\lfloor \frac{3.0 \times f_s}{1000} \right\rfloor$$ **Reading the formula in C:** ```c // C -> chip_samples, fs -> sample_rate unsigned chip_samples = (unsigned)(3.0 * sample_rate / 1000.0); ``` At 44.1 kHz, $C = 132$ samples per chip. ### Encoding For each bit $b_k$, a sine burst at the selected carrier frequency is added to the host signal at amplitude $A = 0.02$ (-34 dB): $$y[n] = x[n] + A \sin\!\bigl(2\pi\, f_{b_k}\, (n - n_0) / f_s\bigr), \qquad n \in [n_0,\; n_0 + C)$$ where $n_0 = k \cdot C$ is the start sample of chip $k$ and $f_{b_k}$ is 18500 Hz (bit 0) or 19500 Hz (bit 1). **Reading the formula in C:** ```c // A -> TONE_AMP (0.02), f_bk -> freq, n0 -> start, fs -> sample_rate // x[n] -> output[start+s] (already contains host), y[n] -> output[start+s] for (unsigned s = 0; s < chip_samples; s++) { double t = (double)s / sample_rate; output[start + s] += 0.02 * sin(2.0 * M_PI * freq * t); } ``` ### Decoding Each chip is correlated against both carrier frequencies. The carrier with the larger absolute correlation determines the bit value: $$r_f = \sum_{s=0}^{C-1} y[n_0 + s] \,\sin\!\bigl(2\pi\, f \, s / f_s\bigr)$$ $$b_k = \begin{cases} 1 & |r_{19500}| > |r_{18500}| \\ 0 & \text{otherwise} \end{cases}$$ **Reading the formula in C:** ```c // r_f -> corr_lo / corr_hi, y[n0+s] -> stego[start+s] double corr_lo = 0.0, corr_hi = 0.0; for (unsigned s = 0; s < chip_samples; s++) { double t = (double)s / sample_rate; corr_lo += stego[start + s] * sin(2.0 * M_PI * 18500.0 * t); corr_hi += stego[start + s] * sin(2.0 * M_PI * 19500.0 * t); } unsigned bit = (fabs(corr_hi) > fabs(corr_lo)) ? 1 : 0; ``` ### Capacity $$C_{\text{freq}} = \frac{\lfloor N / C \rfloor - 32}{8} \text{ bytes}$$ **Reading the formula in C:** ```c // N -> signal_len, C -> chip_samples unsigned total_chips = signal_len / chip_samples; unsigned capacity = (total_chips - 32) / 8; ``` At 44.1 kHz with a 3-second signal ($N = 132300$, $C = 132$): $$C_{\text{freq}} = \frac{\lfloor 132300 / 132 \rfloor - 32}{8} = \frac{1002 - 32}{8} = 121 \text{ bytes}$$ ### Listening comparison **After frequency-band encoding** (same host, message hidden via BFSK): The added carriers at 18.5/19.5 kHz are above most listeners' hearing range. **Spectrogram** showing the hidden BFSK signal above the 440 Hz host tone: The faint horizontal bands near the top of the spectrogram are the BFSK carriers. The main 440 Hz tone dominates the audible range. ### Trade-offs | Advantage | Disadvantage | |:----------|:-------------| | Survives mild additive noise | Lower capacity than LSB | | Frequency-domain robustness | Requires sample_rate >= 40 kHz | | Inaudible to most listeners | May be audible to young listeners with excellent high-frequency hearing | | Amenable to spectral analysis | Vulnerable to low-pass filtering above 18 kHz | --- ## Method 3: Spectrogram Text (spectext) ### The idea What if a hidden message were visible to the *human eye* as well as recoverable by machine? The **spectext** method combines LSB data encoding (for reliable machine decode) with **spectrogram text art** in the 18--23.5 kHz ultrasonic band (for visual verification). Open the stego file in any spectrogram viewer and the message is spelled out in the high frequencies — while a listener hears nothing unusual. The spectrogram art also acts as a **tamper indicator**: if the text is intact, the LSB data likely is too. ### Encode pipeline ``` host.wav ──┐ (any SR) │ ▼ ┌──────────────┐ ┌────────────────────┐ │ MD_resample()│────▶│ host @ 48 kHz │ │ to 48 kHz │ └────────┬───────────┘ └──────────────┘ │ ▼ ┌───────────────────────────┐ │ MD_lowpass_brickwall() │ │ cutoff = original_SR / 2 │ │ (eliminates resampler │ │ spectral images) │ └────────────┬──────────────┘ │ "SECRET" ──┐ │ ▼ │ ┌──────────────────────┐ │ │MD_spectrogram_text() │ │ │ freq: 18–23.5 kHz │ │ │ 30 ms / column │ │ │ amplitude: 0.02 │ │ └──────────┬───────────┘ │ │ mix (add) │ ▼ ▼ ┌──────────────────────────┐ │ host + spectrogram art │ └────────────┬─────────────┘ │ LSB encode (last step) ▼ stego.wav (48 kHz) ``` The spectrogram art is mixed into the host **before** LSB encoding, so the LSB bits remain undisturbed. Decode simply reads the LSB channel. ### Automatic upsampling and spectral cleanup The spectrogram art uses the 18--23.5 kHz band, which requires a Nyquist frequency of at least 23.5 kHz (sample rate >= 47 kHz). If the input host is below 48 kHz, the encoder automatically upsamples it using `MD_resample()`. After upsampling, `MD_lowpass_brickwall()` is applied at the original Nyquist frequency to eliminate any residual spectral images from the resampler's transition band. This ensures the 18--23.5 kHz band is completely clean before the spectrogram text is mixed in, so the hidden message is clearly readable in any spectrogram viewer. The output is always 48 kHz. ### Fixed column width and capacity Each character in the bitmap font is 8 columns wide (5 data + 3 spacing). Each column occupies a fixed **30 ms** of audio, giving 240 ms per character: $$C_{\text{spectext}} = \left\lfloor \frac{D}{0.24} \right\rfloor \text{ characters}$$ where $D$ is the host signal duration in seconds. **Reading the formula in C:** ```c // D -> duration_sec, C_spectext -> vis_chars double duration_sec = (double)signal_len / sample_rate; unsigned vis_chars = (unsigned)(duration_sec / 0.24); ``` **Visual capacity by audio duration:** | Audio duration | Max visible chars | |:---------------|------------------:| | 3 s | 12 | | 10 s | 41 | | 30 s | 125 | | 60 s | 250 | ### Frequency mapping and amplitude The 7 rows of the 5x7 bitmap font are mapped linearly across the 18--23.5 kHz band. Row 0 (top of character) maps to 23.5 kHz; row 6 (bottom) maps to 18 kHz. The spectrogram text is generated at full amplitude by `MD_spectrogram_text()` (normalised to 0.9 peak), then scaled to **0.02** (~-34 dB) before mixing. This is loud enough to be clearly visible in a spectrogram but completely inaudible — most adults cannot hear above 18 kHz. ### Visual truncation If the message is longer than the visual capacity, the spectrogram art shows only the first N characters that fit. The full message is always recoverable via the LSB data channel, which has much higher capacity (~5.5 KB/sec at 48 kHz). For binary payloads, the spectrogram art shows `[BIN B]` as a label. ### Listening comparison **After spectext encoding** (same host, message "miniDSP" hidden via spectext): The ultrasonic tones at 18--23.5 kHz are far above the audible range. **Spectrogram** showing "miniDSP" rendered as text art in the ultrasonic band: The host audio — a [TIMIT](https://en.wikipedia.org/wiki/TIMIT) sentence ("Don't ask me to carry an oily rag like that.") — is visible at the bottom of the spectrogram. The text "miniDSP" is rendered in the 18--23.5 kHz band near the top, using the 5x7 bitmap font from `MD_spectrogram_text()`. ### Trade-offs | Advantage | Disadvantage | |:----------|:-------------| | Human-readable visual watermark | Lower visual capacity than LSB data capacity | | Machine-readable round-trip via LSB | Requires 48 kHz output (auto-upsampled) | | Visual tamper indicator | Destroyed by lossy compression (like LSB) | | Completely inaudible | Destroyed by low-pass filtering above 18 kHz | --- ## Embedding binary data The string-based API (`MD_steg_encode` / `MD_steg_decode`) uses null-terminated C strings, which cannot represent binary data containing `0x00` bytes. To hide arbitrary binary payloads — images, compressed archives, cryptographic keys — use the byte-oriented API: - `MD_steg_encode_bytes()` — accepts a raw byte buffer and length - `MD_steg_decode_bytes()` — returns raw bytes without null termination ### Visual demo **Space invader** — a tiny 110x80 RGB PNG (332 bytes) hidden inside a short 440 Hz sine using LSB. The recovered image is bit-identical to the original: **QR code** — a 165x165 grayscale PNG (486 bytes) encoding this repository's URL, doubly encoded: data → QR → audio: ### Minimum samples For LSB encoding, each data byte requires 8 samples, plus a 32-bit header: $$N_{\min} = 8L + 32$$ where $L$ is the data length in bytes. **Reading the formula in C:** ```c // L -> data_len, N_min -> min_samples unsigned min_samples = data_len * 8 + 32; ``` For a 332-byte PNG image: $N_{\min} = 8 \times 332 + 32 = 2688$ samples — well under a second at any common sample rate. ### Example: hiding an image in audio **Space invader** — a tiny 110x80 RGB PNG (332 bytes) makes a good test payload. It is small enough to embed in a fraction of a second of audio using LSB: ```sh # Encode the image (auto-generates a host signal) ./audio_steg --encode-image lsb space_invader.png -o steg_invader.wav # Decode to recover the image ./audio_steg --decode-image lsb steg_invader.wav -o recovered_invader.png # Verify cmp space_invader.png recovered_invader.png && echo "Identical" ``` **QR code** — a 165x165 1-bit grayscale PNG (486 bytes) containing a URL to this repository. This is a "double encoding": data encoded as a QR code, then the QR code hidden inside audio: ```sh ./audio_steg --encode-image lsb minidsp_qr.png -o steg_qr.wav ./audio_steg --decode-image lsb steg_qr.wav -o recovered_qr.png cmp minidsp_qr.png recovered_qr.png && echo "Identical" ``` **Quick example** — encode and decode a PNG in C: ```c #include "minidsp.h" #include // Read the image file FILE *fp = fopen("space_invader.png", "rb"); fseek(fp, 0, SEEK_END); unsigned len = (unsigned)ftell(fp); fseek(fp, 0, SEEK_SET); unsigned char *img = malloc(len); fread(img, 1, len, fp); fclose(fp); // Encode into audio unsigned N = len * 8 + 32 + 1024; // payload + header + margin double *host = malloc(N * sizeof(double)); double *stego = malloc(N * sizeof(double)); MD_sine_wave(host, N, 0.8, 440.0, 44100.0); MD_steg_encode_bytes(host, stego, N, 44100.0, img, len, MD_STEG_LSB); // Decode unsigned char *recovered = malloc(len); MD_steg_decode_bytes(stego, N, 44100.0, recovered, len, MD_STEG_LSB); // recovered[0..len-1] == img[0..len-1] ``` --- ## API ### Capacity ```c unsigned MD_steg_capacity(unsigned signal_len, double sample_rate, int method); ``` Returns the maximum number of message bytes that can be hidden. ### Encode ```c unsigned MD_steg_encode(const double *host, double *output, unsigned signal_len, double sample_rate, const char *message, int method); ``` | Parameter | Description | |:--------------|:------------| | `host` | Input host signal (not modified). | | `output` | Output stego signal (caller-allocated, same length). | | `signal_len` | Number of samples. | | `sample_rate` | Sample rate in Hz. | | `message` | Null-terminated secret message. | | `method` | `MD_STEG_LSB`, `MD_STEG_FREQ_BAND`, or `MD_STEG_SPECTEXT`. | Returns the number of message bytes encoded (0 on failure). ### Decode ```c unsigned MD_steg_decode(const double *stego, unsigned signal_len, double sample_rate, char *message_out, unsigned max_msg_len, int method); ``` | Parameter | Description | |:--------------|:------------| | `stego` | The stego signal containing the hidden message. | | `signal_len` | Number of samples. | | `sample_rate` | Sample rate in Hz. | | `message_out` | Output buffer (caller-allocated, null-terminated on return). | | `max_msg_len` | Size of buffer including null terminator. | | `method` | `MD_STEG_LSB`, `MD_STEG_FREQ_BAND`, or `MD_STEG_SPECTEXT`. | Returns the number of message bytes decoded (0 if none found). ### Encode bytes ```c unsigned MD_steg_encode_bytes(const double *host, double *output, unsigned signal_len, double sample_rate, const unsigned char *data, unsigned data_len, int method); ``` | Parameter | Description | |:--------------|:------------| | `host` | Input host signal (not modified). | | `output` | Output stego signal (caller-allocated, same length). | | `signal_len` | Number of samples. | | `sample_rate` | Sample rate in Hz. | | `data` | Pointer to the binary data to hide. | | `data_len` | Length of data in bytes. | | `method` | `MD_STEG_LSB`, `MD_STEG_FREQ_BAND`, or `MD_STEG_SPECTEXT`. | Returns the number of data bytes encoded (0 on failure). ### Decode bytes ```c unsigned MD_steg_decode_bytes(const double *stego, unsigned signal_len, double sample_rate, unsigned char *data_out, unsigned max_len, int method); ``` | Parameter | Description | |:--------------|:------------| | `stego` | The stego signal containing the hidden data. | | `signal_len` | Number of samples. | | `sample_rate` | Sample rate in Hz. | | `data_out` | Output buffer for the decoded bytes (caller-allocated). | | `max_len` | Maximum number of bytes to write to buffer. | | `method` | `MD_STEG_LSB`, `MD_STEG_FREQ_BAND`, or `MD_STEG_SPECTEXT`. | Returns the number of data bytes decoded (0 if none found). ### Detect ```c int MD_steg_detect(const double *signal, unsigned signal_len, double sample_rate, int *payload_type_out); ``` Inspects a signal and determines which steganography method (if any) was used. Returns `MD_STEG_LSB`, `MD_STEG_FREQ_BAND`, `MD_STEG_SPECTEXT`, or `-1` if no hidden payload is found. The optional `payload_type_out` receives `MD_STEG_TYPE_TEXT` (0) or `MD_STEG_TYPE_BINARY` (1). | Parameter | Description | |:-------------------|:------------| | `signal` | The signal to inspect. | | `signal_len` | Length of the signal in samples. | | `sample_rate` | Sample rate in Hz. | | `payload_type_out` | If non-null, receives the payload type flag. | **How it works:** The function probes the first 32 samples (LSB) or 32 BFSK chips (frequency-band) to extract the header. A header is considered valid when the decoded length is positive and fits the signal capacity. For BFSK, the average correlation must also exceed a minimum threshold to avoid false positives. If both methods claim a valid header, BFSK wins (harder to trigger by accident). **Quick example:** ```c int payload_type; int method = MD_steg_detect(signal, signal_len, 44100.0, &payload_type); if (method == MD_STEG_LSB) printf("LSB-encoded %s payload detected\n", payload_type == MD_STEG_TYPE_BINARY ? "binary" : "text"); ``` --- ## Quick example **Encode and decode with LSB:** ```c #include "minidsp.h" double host[44100], stego[44100]; MD_sine_wave(host, 44100, 0.8, 440.0, 44100.0); // Encode unsigned n = MD_steg_encode(host, stego, 44100, 44100.0, "secret message", MD_STEG_LSB); // Decode char recovered[256]; MD_steg_decode(stego, 44100, 44100.0, recovered, 256, MD_STEG_LSB); printf("Hidden: %s\n", recovered); // "secret message" ``` **Encode and decode with frequency band:** ```c double host[132300], stego[132300]; // 3 s at 44.1 kHz MD_sine_wave(host, 132300, 0.8, 440.0, 44100.0); unsigned n = MD_steg_encode(host, stego, 132300, 44100.0, "hidden!", MD_STEG_FREQ_BAND); char recovered[256]; MD_steg_decode(stego, 132300, 44100.0, recovered, 256, MD_STEG_FREQ_BAND); printf("Hidden: %s\n", recovered); // "hidden!" ``` **Encode and decode with spectrogram text (spectext):** ```c double host[132300]; // 3 s at 44.1 kHz MD_sine_wave(host, 132300, 0.8, 440.0, 44100.0); // Output at 48 kHz — compute required buffer size unsigned out_len = MD_resample_output_len(132300, 44100.0, 48000.0); double *stego = malloc(out_len * sizeof(double)); MD_steg_encode(host, stego, 132300, 44100.0, "miniDSP", MD_STEG_SPECTEXT); // Decode from the 48 kHz output char recovered[256]; MD_steg_decode(stego, out_len, 48000.0, recovered, 256, MD_STEG_SPECTEXT); printf("Hidden: %s\n", recovered); // "miniDSP" // View stego in a spectrogram to see "miniDSP" in the 18-23.5 kHz band free(stego); ``` --- ## Example program The tool `tools/audio_steg/audio_steg.c` provides a command-line program for encoding and decoding steganographic messages and binary data in WAV files. **Self-test** (no arguments): ```c static int self_test(void) { const double sr = 44100.0; const unsigned N = (unsigned)(sr * 3.0); /* 3 seconds */ const char *secret = "The quick brown fox jumps over the lazy dog."; double *host = malloc(N * sizeof(double)); double *stego = malloc(N * sizeof(double)); char recovered[256]; if (!host || !stego) { fprintf(stderr, "allocation failed\n"); free(stego); free(host); return 1; } MD_sine_wave(host, N, 0.8, 440.0, sr); int pass = 1; /* --- LSB test --- */ printf("=== LSB steganography ===\n"); printf(" Host: 3 s sine wave at 440 Hz, %.0f Hz sample rate\n", sr); printf(" Capacity: %u bytes\n", MD_steg_capacity(N, sr, MD_STEG_LSB)); printf(" Message (%zu bytes): \"%s\"\n", strlen(secret), secret); unsigned enc_lsb = MD_steg_encode(host, stego, N, sr, secret, MD_STEG_LSB); printf(" Encoded: %u bytes\n", enc_lsb); unsigned dec_lsb = MD_steg_decode(stego, N, sr, recovered, sizeof(recovered), MD_STEG_LSB); printf(" Decoded: %u bytes -> \"%s\"\n", dec_lsb, recovered); if (dec_lsb != enc_lsb || strcmp(recovered, secret) != 0) { printf(" LSB FAILED: decoded message does not match!\n"); pass = 0; } else { /* Compute distortion. */ double max_diff = 0.0; for (unsigned i = 0; i < N; i++) { double d = fabs(host[i] - stego[i]); if (d > max_diff) max_diff = d; } printf(" Max distortion: %.2e (%.1f dB)\n", max_diff, 20.0 * log10(max_diff + 1e-30)); printf(" LSB PASSED\n"); } /* --- Frequency-band test --- */ printf("\n=== Frequency-band steganography (BFSK) ===\n"); printf(" Host: 3 s sine wave at 440 Hz, %.0f Hz sample rate\n", sr); printf(" Capacity: %u bytes\n", MD_steg_capacity(N, sr, MD_STEG_FREQ_BAND)); printf(" Message (%zu bytes): \"%s\"\n", strlen(secret), secret); unsigned enc_freq = MD_steg_encode(host, stego, N, sr, secret, MD_STEG_FREQ_BAND); printf(" Encoded: %u bytes\n", enc_freq); unsigned dec_freq = MD_steg_decode(stego, N, sr, recovered, sizeof(recovered), MD_STEG_FREQ_BAND); printf(" Decoded: %u bytes -> \"%s\"\n", dec_freq, recovered); if (dec_freq != enc_freq || strcmp(recovered, secret) != 0) { printf(" Frequency-band FAILED: decoded message does not match!\n"); pass = 0; } else { printf(" Frequency-band PASSED\n"); } /* --- Spectext test (uses 48 kHz output) --- */ printf("\n=== Spectrogram text steganography (spectext) ===\n"); const char *spec_secret = "miniDSP"; printf(" Host: 3 s sine wave at 440 Hz, %.0f Hz sample rate\n", sr); printf(" Capacity: %u chars\n", MD_steg_capacity(N, sr, MD_STEG_SPECTEXT)); printf(" Message (%zu bytes): \"%s\"\n", strlen(spec_secret), spec_secret); /* Spectext may upsample to 48 kHz — allocate for larger output. */ unsigned spec_out_len = MD_resample_output_len(N, sr, 48000.0); double *stego_spec = malloc(spec_out_len * sizeof(double)); if (!stego_spec) { fprintf(stderr, "allocation failed\n"); free(stego); free(host); return 1; } unsigned enc_spec = MD_steg_encode(host, stego_spec, N, sr, spec_secret, MD_STEG_SPECTEXT); printf(" Encoded: %u bytes (output: %u samples at 48 kHz)\n", enc_spec, spec_out_len); memset(recovered, 0, sizeof(recovered)); unsigned dec_spec = MD_steg_decode(stego_spec, spec_out_len, 48000.0, recovered, sizeof(recovered), MD_STEG_SPECTEXT); printf(" Decoded: %u bytes -> \"%s\"\n", dec_spec, recovered); if (dec_spec != enc_spec || strcmp(recovered, spec_secret) != 0) { printf(" Spectext FAILED: decoded message does not match!\n"); pass = 0; } else { printf(" Spectext PASSED\n"); } free(stego_spec); if (pass) printf("\nSelf-test PASSED: all methods recovered the message.\n"); else printf("\nSelf-test FAILED.\n"); free(stego); free(host); return pass ? 0 : 1; } ``` **Encode a message into a WAV file:** ```c static int encode_wav(int method, const char *message, const char *infile, const char *outfile) { double *host = NULL; unsigned signal_len; unsigned samprate; if (infile) { if (read_wav_to_double(infile, &host, &signal_len, &samprate) != 0) return 1; } else { /* Generate a default host signal: 3 s sine at 44.1 kHz. */ samprate = 44100; signal_len = samprate * 3; host = malloc(signal_len * sizeof(double)); if (!host) return 1; MD_sine_wave(host, signal_len, 0.8, 440.0, (double)samprate); printf("No host file specified; using 3 s, 440 Hz sine at %u Hz\n", samprate); } if (method == MD_STEG_FREQ_BAND && samprate < 40000) { fprintf(stderr, "Frequency-band method requires sample rate >= 40 kHz " "(file is %u Hz).\n" "Use a 44.1 kHz or 48 kHz host, or use the LSB method.\n", samprate); free(host); return 1; } unsigned capacity = MD_steg_capacity(signal_len, (double)samprate, method); unsigned msg_len = (unsigned)strlen(message); printf("Method: %s | Capacity: %u bytes | Message: %u bytes\n", method_name(method), capacity, msg_len); if (msg_len > capacity) { fprintf(stderr, "Message too long (%u bytes) for host signal capacity (%u bytes).\n" "Use a longer host signal or a shorter message.\n", msg_len, capacity); free(host); return 1; } /* Spectext outputs at 48 kHz — allocate for the larger signal. */ unsigned out_len = signal_len; unsigned out_sr = samprate; if (method == MD_STEG_SPECTEXT) { if (samprate < 48000) { out_len = MD_resample_output_len(signal_len, (double)samprate, 48000.0); } out_sr = 48000; } double *stego = malloc(out_len * sizeof(double)); if (!stego) { free(host); return 1; } unsigned encoded = MD_steg_encode(host, stego, signal_len, (double)samprate, message, method); int ret = write_double_as_wav(outfile, stego, out_len, out_sr); if (ret == 0) printf("Encoded %u bytes -> %s (%u samples, %.3f s, %u Hz)\n", encoded, outfile, out_len, (double)out_len / (double)out_sr, out_sr); else fprintf(stderr, "Error writing %s\n", outfile); free(stego); free(host); return ret; } ``` **Decode a message from a WAV file:** ```c static int decode_wav(int method, const char *infile) { double *stego = NULL; unsigned signal_len; unsigned samprate; if (read_wav_to_double(infile, &stego, &signal_len, &samprate) != 0) return 1; printf("Read %s: %u samples at %u Hz (%.3f s)\n", infile, signal_len, samprate, (double)signal_len / (double)samprate); char message[4096]; unsigned decoded = MD_steg_decode(stego, signal_len, (double)samprate, message, sizeof(message), method); free(stego); if (decoded == 0) { printf("No hidden message found (method: %s).\n", method_name(method)); return 1; } printf("\nDecoded %u bytes (method: %s):\n \"%s\"\n", decoded, method_name(method), message); return 0; } ``` **Encode a binary file (e.g. image) into a WAV file:** ```c static int encode_image_wav(int method, const char *image_path, const char *infile, const char *outfile) { /* Read the image file into memory. */ FILE *fp = fopen(image_path, "rb"); if (!fp) { fprintf(stderr, "Cannot open image file: %s\n", image_path); return 1; } fseek(fp, 0, SEEK_END); long fsize = ftell(fp); fseek(fp, 0, SEEK_SET); if (fsize <= 0 || (unsigned long)fsize > UINT_MAX) { fprintf(stderr, "Invalid file size: %s\n", image_path); fclose(fp); return 1; } unsigned data_len = (unsigned)fsize; unsigned char *data = malloc(data_len); if (!data) { fclose(fp); return 1; } if (fread(data, 1, data_len, fp) != data_len) { fprintf(stderr, "Failed to read %s\n", image_path); free(data); fclose(fp); return 1; } fclose(fp); printf("Image: %s (%u bytes)\n", image_path, data_len); double *host = NULL; unsigned signal_len; unsigned samprate; if (infile) { if (read_wav_to_double(infile, &host, &signal_len, &samprate) != 0) { free(data); return 1; } } else { /* Generate a host signal sized for the payload. */ samprate = 44100; unsigned min_samples = data_len * 8 + HEADER_BITS; if (method == MD_STEG_FREQ_BAND) { unsigned cs = (unsigned)(3.0 * samprate / 1000.0); min_samples = (data_len * 8 + HEADER_BITS) * cs; } unsigned half_sec = samprate / 2; if (min_samples < half_sec) min_samples = half_sec; signal_len = min_samples + min_samples / 10; /* 10% margin */ host = malloc(signal_len * sizeof(double)); if (!host) { free(data); return 1; } MD_sine_wave(host, signal_len, 0.8, 440.0, (double)samprate); printf("Generated host: %u samples (%.3f s) at %u Hz\n", signal_len, (double)signal_len / (double)samprate, samprate); } if (method == MD_STEG_FREQ_BAND && samprate < 40000) { fprintf(stderr, "Frequency-band method requires sample rate >= 40 kHz " "(got %u Hz).\n", samprate); free(host); free(data); return 1; } unsigned capacity = MD_steg_capacity(signal_len, (double)samprate, method); printf("Method: %s | Capacity: %u bytes | Payload: %u bytes\n", method_name(method), capacity, data_len); if (data_len > capacity) { fprintf(stderr, "Payload too large (%u bytes) for host capacity (%u bytes).\n", data_len, capacity); free(host); free(data); return 1; } /* Spectext outputs at 48 kHz — allocate for the larger signal. */ unsigned out_len = signal_len; unsigned out_sr = samprate; if (method == MD_STEG_SPECTEXT) { if (samprate < 48000) { out_len = MD_resample_output_len(signal_len, (double)samprate, 48000.0); } out_sr = 48000; } double *stego = malloc(out_len * sizeof(double)); if (!stego) { free(host); free(data); return 1; } unsigned encoded = MD_steg_encode_bytes(host, stego, signal_len, (double)samprate, data, data_len, method); int ret = write_double_as_wav(outfile, stego, out_len, out_sr); if (ret == 0) printf("Encoded %u bytes -> %s (%u samples, %.3f s, %u Hz)\n", encoded, outfile, out_len, (double)out_len / (double)out_sr, out_sr); else fprintf(stderr, "Error writing %s\n", outfile); free(stego); free(host); free(data); return ret; } ``` **Decode a binary file from a WAV file:** ```c static int decode_image_wav(int method, const char *infile, const char *outfile) { double *stego = NULL; unsigned signal_len; unsigned samprate; if (read_wav_to_double(infile, &stego, &signal_len, &samprate) != 0) return 1; printf("Read %s: %u samples at %u Hz (%.3f s)\n", infile, signal_len, samprate, (double)signal_len / (double)samprate); /* Allocate a buffer large enough for the maximum capacity (capped at * 16 MB to avoid unbounded allocation from corrupted headers). */ unsigned capacity = MD_steg_capacity(signal_len, (double)samprate, method); if (capacity > 16u * 1024 * 1024) capacity = 16u * 1024 * 1024; unsigned char *buf = malloc(capacity > 0 ? capacity : 1); if (!buf) { free(stego); return 1; } unsigned decoded = MD_steg_decode_bytes(stego, signal_len, (double)samprate, buf, capacity, method); free(stego); if (decoded == 0) { printf("No hidden data found (method: %s).\n", method_name(method)); free(buf); return 1; } printf("Decoded %u bytes (method: %s)\n", decoded, method_name(method)); FILE *fp = fopen(outfile, "wb"); if (!fp) { fprintf(stderr, "Cannot open output file: %s\n", outfile); free(buf); return 1; } fwrite(buf, 1, decoded, fp); fclose(fp); free(buf); printf("Written to %s\n", outfile); return 0; } ``` **Auto-detect and decode** (no method needed): ```c static int auto_decode_wav(const char *infile) { double *stego = NULL; unsigned signal_len; unsigned samprate; if (read_wav_to_double(infile, &stego, &signal_len, &samprate) != 0) return 1; printf("Read %s: %u samples at %u Hz (%.3f s)\n", infile, signal_len, samprate, (double)signal_len / (double)samprate); int payload_type = -1; int method = MD_steg_detect(stego, signal_len, (double)samprate, &payload_type); if (method < 0) { printf("No hidden payload detected.\n"); free(stego); return 1; } printf("Detected: %s method, %s payload\n", method_name(method), payload_type == MD_STEG_TYPE_BINARY ? "binary" : "text"); if (payload_type == MD_STEG_TYPE_BINARY) { /* Probe the payload size without a full decode. */ unsigned capacity = MD_steg_capacity(signal_len, (double)samprate, method); if (capacity > 16u * 1024 * 1024) capacity = 16u * 1024 * 1024; unsigned char *buf = malloc(capacity > 0 ? capacity : 1); if (!buf) { free(stego); return 1; } unsigned decoded = MD_steg_decode_bytes(stego, signal_len, (double)samprate, buf, capacity, method); free(buf); free(stego); printf("Binary payload: %u bytes\n", decoded); printf("Use --decode-image %s %s -o OUTPUT to extract.\n", method_cli_name(method), infile); return 0; } /* Text payload — decode and print. */ char message[4096]; unsigned decoded = MD_steg_decode(stego, signal_len, (double)samprate, message, sizeof(message), method); free(stego); if (decoded == 0) { printf("Decode returned 0 bytes.\n"); return 1; } printf("\nDecoded %u bytes:\n \"%s\"\n", decoded, message); return 0; } ``` **Usage:** ```sh # Self-test (encode + decode with both methods, verify round-trip) ./audio_steg # Encode a text message using LSB into a default 440 Hz host ./audio_steg --encode lsb "my secret" -o stego.wav # Encode using frequency-band into an existing WAV host ./audio_steg --encode freq "hidden" -i music.wav -o stego.wav # Auto-detect and decode (just pass the file) ./audio_steg stego.wav # Decode with explicit method ./audio_steg --decode lsb stego.wav ./audio_steg --decode freq stego.wav # Decode without specifying method (auto-detect) ./audio_steg --decode stego.wav # Encode using spectext (hybrid LSB + spectrogram art) ./audio_steg --encode spectext "miniDSP" -i music.wav -o stego.wav # Encode a binary file (image) using LSB ./audio_steg --encode-image lsb space_invader.png -o steg_invader.wav # Decode a binary file (auto-detect method) ./audio_steg --decode-image steg_invader.wav -o recovered.png ``` --- ## Choosing a method | Criterion | LSB | Frequency-band | Spectext | |:----------|:----|:---------------|:---------| | **Message size** | Up to ~16 KB/s of audio | Up to ~121 B per 3 s | ~4 chars/s (visual); LSB capacity for data | | **Audio quality** | Imperceptible (-90 dB) | Near-imperceptible (-34 dB) | Imperceptible (ultrasonic, -34 dB) | | **Survives lossy compression** | No | No (but tolerates noise) | No | | **Survives additive noise** | No (bit errors) | Yes (mild noise) | No (LSB channel) | | **Sample rate requirement** | Any | >= 40 kHz | Output always 48 kHz | | **Visual verification** | No | No (spectrogram shows carriers) | Yes — text readable in spectrogram | | **Best for** | Lossless pipelines (WAV/FLAC) | Light interference environments | Visual watermarking + machine decode | For maximum capacity and fidelity in lossless pipelines, use **LSB**. For slightly more robust hiding in near-ultrasonic bands, use **frequency-band**. For a human-readable visual watermark with machine-readable data recovery, use **spectext**. --- # Sample Rate Conversion [Sample rate conversion](https://en.wikipedia.org/wiki/Sample-rate_conversion) (resampling) changes a signal from one sample rate to another. This is essential when combining audio from different sources, preparing data for systems that expect a specific rate, or reducing storage by downsampling. miniDSP provides a high-quality offline [polyphase](https://en.wikipedia.org/wiki/Polyphase_filter) sinc resampler that handles arbitrary rate ratios (e.g., 44100 Hz to 48000 Hz) with >100 dB stopband attenuation using default parameters. --- ## Mathematical building blocks ### Bessel I₀ The zeroth-order modified Bessel function of the first kind appears in the [Kaiser window](https://en.wikipedia.org/wiki/Kaiser_window) formula. It is computed via the convergent power series: $$I_0(x) = \sum_{k=0}^{\infty} \left[\frac{(x/2)^k}{k!}\right]^2$$ **Reading the formula in C:** ```c // x -> input value, sum -> I₀(x), term -> (x/2)^k / k! double sum = 1.0; double term = 1.0; double half_x = x / 2.0; for (unsigned k = 1; k < 300; k++) { term *= (half_x / (double)k); double term_sq = term * term; sum += term_sq; if (term_sq < 1e-15 * sum) break; // converged } // sum now holds I₀(x) ``` **API:** ```c double MD_bessel_i0(double x); ``` ### Normalized sinc The [sinc function](https://en.wikipedia.org/wiki/Sinc_function) is the ideal lowpass interpolation kernel. The normalized form is: $$\mathrm{sinc}(x) = \begin{cases} 1 & \text{if } |x| < 10^{-12} \\ \dfrac{\sin(\pi x)}{\pi x} & \text{otherwise} \end{cases}$$ **Reading the formula in C:** ```c // x -> input value, result -> sinc(x) double result; if (fabs(x) < 1e-12) { result = 1.0; } else { double px = M_PI * x; result = sin(px) / px; } ``` **API:** ```c double MD_sinc(double x); ``` --- ## The polyphase sinc resampler ### How it works The resampler converts a signal from sample rate $f_{\mathrm{in}}$ to $f_{\mathrm{out}}$ by treating each output sample as a fractional-position lookup into the input signal, filtered through a windowed sinc kernel. For each output sample $n$: 1. Compute the fractional input position: $p = n \cdot f_{\mathrm{in}} / f_{\mathrm{out}}$ 2. Split into integer index $\lfloor p \rfloor$ and fractional offset 3. Select two adjacent filter sub-phases from a precomputed table 4. Linearly interpolate the sub-phase coefficients 5. Dot product with surrounding input samples The filter table contains 512 sub-phases, each with $2 \times \mathrm{num\_zero\_crossings}$ taps of a Kaiser-windowed sinc. Anti-aliasing is handled automatically: for downsampling, the sinc cutoff is scaled to $\min(f_{\mathrm{in}}, f_{\mathrm{out}})/2$. ### Output buffer sizing $$N_{\mathrm{out}} = \left\lceil N_{\mathrm{in}} \cdot \frac{f_{\mathrm{out}}}{f_{\mathrm{in}}} \right\rceil$$ **Reading the formula in C:** ```c // input_len -> N_in, in_rate -> f_in, out_rate -> f_out unsigned out_len = (unsigned)ceil((double)input_len * out_rate / in_rate); ``` **API:** ```c unsigned MD_resample_output_len(unsigned input_len, double in_rate, double out_rate); ``` ### Resampling a signal **API:** ```c unsigned MD_resample(const double *input, unsigned input_len, double *output, unsigned max_output_len, double in_rate, double out_rate, unsigned num_zero_crossings, double kaiser_beta); ``` **Parameters:** - `num_zero_crossings` controls filter quality. Range: 8 (fast) to 64 (high quality). Default recommendation: 32. - `kaiser_beta` controls stopband attenuation. Recommendation: 10.0 for ~100 dB. - Returns the number of samples written to `output`. **Quick example:** ```c const double out_rate = 48000.0; unsigned N_out = MD_resample_output_len(N_in, in_rate, out_rate); double *output = malloc(N_out * sizeof(double)); unsigned n_written = MD_resample(input, N_in, output, N_out, in_rate, out_rate, 32, 10.0); ``` --- ## Common rate pairs | Conversion | Ratio | Use case | |---|---|---| | 44100 to 48000 | 160/147 | CD audio to professional video/DAW | | 48000 to 44100 | 147/160 | Professional to CD quality | | 48000 to 16000 | 1/3 | Wideband to narrowband speech | | 44100 to 22050 | 1/2 | Half-rate for reduced storage | | 16000 to 8000 | 1/2 | Wideband to telephone bandwidth | --- ## Further reading - [Sample-rate conversion](https://en.wikipedia.org/wiki/Sample-rate_conversion) -- Wikipedia - [Sinc filter](https://en.wikipedia.org/wiki/Sinc_filter) -- Wikipedia - [Polyphase filter](https://en.wikipedia.org/wiki/Polyphase_filter) -- Wikipedia - [Kaiser window](https://en.wikipedia.org/wiki/Kaiser_window) -- Wikipedia ## API reference - MD_bessel_i0() -- zeroth-order modified Bessel function I₀ - MD_sinc() -- normalized sinc function - MD_resample_output_len() -- compute required output buffer size - MD_resample() -- polyphase sinc resampler --- # Voice Activity Detection This tutorial builds a [voice activity detector](https://en.wikipedia.org/wiki/Voice_activity_detection) (VAD) that combines five normalized audio features into a weighted score, applies a threshold, and smooths the binary decision with onset gating and hangover. VAD is a fundamental preprocessing step for streaming audio, [ASR](https://en.wikipedia.org/wiki/Speech_recognition) pipelines, and noise reduction. All features are computed from existing miniDSP primitives. The detector adapts to different microphones, gain settings, and noise floors without manual tuning. Build and run the example from the repository root: ```sh make -C examples vad cd examples && ./vad open vad_plot.html ``` --- ## Energy Frame energy measures overall loudness — the simplest voice activity cue. Speech frames carry more energy than silence or low-level background noise. $$E = \sum_{n=0}^{N-1} x[n]^2$$ **Reading the formula in C:** ```c // x[n] -> signal[n], N -> frame length double energy = 0.0; for (unsigned n = 0; n < N; n++) { energy += signal[n] * signal[n]; // x[n]^2 } ``` **Intuition:** silence has near-zero energy; speech has high energy. Energy alone fails in moderate noise because the noise floor raises the baseline. --- ## Zero-crossing rate The [zero-crossing rate](https://en.wikipedia.org/wiki/Zero-crossing_rate) counts how often the signal changes sign, normalized by the number of adjacent pairs: $$\mathrm{ZCR} = \frac{1}{N-1}\sum_{n=1}^{N-1} \mathbf{1}\!\bigl[\mathrm{sgn}(x[n]) \ne \mathrm{sgn}(x[n-1])\bigr]$$ **Reading the formula in C:** ```c // x[n] -> signal[n], N -> frame length unsigned crossings = 0; for (unsigned n = 1; n < N; n++) { // 1[sgn(x[n]) != sgn(x[n-1])] if ((signal[n] >= 0.0) != (signal[n - 1] >= 0.0)) crossings++; } double zcr = (double)crossings / (double)(N - 1); // normalize to [0, 1] ``` **Intuition:** voiced speech has a low ZCR (periodic, low-frequency fundamental); unvoiced fricatives have high ZCR; silence has low ZCR. ZCR helps distinguish voiced speech from broadband noise. --- ## Spectral entropy [Spectral entropy](https://en.wikipedia.org/wiki/Spectral_density#Spectral_entropy) measures how uniformly energy is spread across frequency bins. Low entropy means energy is concentrated (tonal); high entropy means energy is diffuse (noise-like). $$H = -\frac{1}{\ln(K)} \sum_{k=0}^{K-1} p_k \ln(p_k), \qquad p_k = \frac{\mathrm{PSD}[k]}{\sum_{j=0}^{K-1} \mathrm{PSD}[j]}$$ where $K = N/2 + 1$ is the number of one-sided PSD bins. **Reading the formula in C:** ```c // PSD[k] -> psd[k], K -> num_bins double total = 0.0; for (unsigned k = 0; k < num_bins; k++) total += psd[k]; // denominator of p_k double entropy = 0.0; for (unsigned k = 0; k < num_bins; k++) { double p_k = psd[k] / total; // p_k = PSD[k] / sum(PSD) if (p_k > 0.0) entropy -= p_k * log(p_k); // -sum(p_k * ln(p_k)) } entropy /= log((double)num_bins); // normalize by ln(K) -> [0, 1] ``` **Intuition:** speech has lower spectral entropy (energy concentrated at harmonics); noise has higher spectral entropy (energy spread uniformly). --- ## Spectral flatness [Spectral flatness](https://en.wikipedia.org/wiki/Spectral_flatness) (also called the Wiener entropy) is the ratio of the geometric mean to the arithmetic mean of PSD bins: $$\mathrm{SF} = \frac{\left(\prod_{k=0}^{K-1} \mathrm{PSD}[k]\right)^{1/K}} {\frac{1}{K}\sum_{k=0}^{K-1} \mathrm{PSD}[k]}$$ **Reading the formula in C:** ```c // PSD[k] -> psd[k], K -> num_bins // Compute in log domain to avoid overflow in the product double log_sum = 0.0; double arith_sum = 0.0; for (unsigned k = 0; k < num_bins; k++) { double val = psd[k] > 0.0 ? psd[k] : 1e-30; log_sum += log(val); // sum of ln(PSD[k]) arith_sum += psd[k]; // sum of PSD[k] } double log_geo_mean = log_sum / (double)num_bins; // (1/K) * sum(ln(PSD[k])) double geo_mean = exp(log_geo_mean); // geometric mean double arith_mean = arith_sum / (double)num_bins; // arithmetic mean double flatness = geo_mean / arith_mean; // SF in [0, 1] ``` **Intuition:** SF = 1.0 for white noise (perfectly flat spectrum); SF approaches 0 for a pure tone (all energy in one bin). Speech falls between these extremes, with lower flatness than background noise. --- ## Band energy ratio The band energy ratio measures the concentration of energy in the speech band (typically 300–3400 Hz, the [telephone bandwidth](https://en.wikipedia.org/wiki/Voice_frequency)): $$\mathrm{BER} = \frac{\sum_{k : f_k \in [f_{\mathrm{lo}},\, f_{\mathrm{hi}}]} \mathrm{PSD}[k]} {\sum_{k=0}^{K-1} \mathrm{PSD}[k]}$$ where $f_k = k \cdot f_s / N$ is the frequency of bin $k$. **Reading the formula in C:** ```c // PSD[k] -> psd[k], K -> num_bins // f_lo -> band_low_hz, f_hi -> band_high_hz double freq_per_bin = sample_rate / (double)N; // f_s / N double total = 0.0; double band = 0.0; for (unsigned k = 0; k < num_bins; k++) { double freq = k * freq_per_bin; // f_k total += psd[k]; if (freq >= band_low_hz && freq <= band_high_hz) band += psd[k]; // sum PSD in speech band } double ber = band / total; // BER in [0, 1] ``` **Intuition:** speech concentrates energy in the 300–3400 Hz band. Background noise typically has a flatter distribution, yielding a lower BER. --- ## Adaptive normalization Raw feature values vary by orders of magnitude across microphones and gain settings. The VAD normalizes each feature to [0, 1] using per-feature min/max estimates tracked via [exponential moving average](https://en.wikipedia.org/wiki/Moving_average#Exponential_moving_average) (EMA): $$m_i \leftarrow m_i + \alpha \cdot (f_i - m_i), \quad M_i \leftarrow M_i + \alpha \cdot (f_i - M_i)$$ where $m_i$ and $M_i$ are the running minimum and maximum for feature $i$, $f_i$ is the current raw value, and $\alpha$ is the adaptation rate. The normalized feature is: $$\hat{f}_i = \frac{f_i - m_i}{M_i - m_i}$$ clamped to [0, 1]. **Reading the formula in C:** ```c // alpha -> adaptation_rate, f_i -> raw[i] // m_i -> feat_min[i], M_i -> feat_max[i] // Update min/max via EMA (only when raw value exceeds current bound) if (raw[i] < feat_min[i]) feat_min[i] = feat_min[i] + alpha * (raw[i] - feat_min[i]); if (raw[i] > feat_max[i]) feat_max[i] = feat_max[i] + alpha * (raw[i] - feat_max[i]); // Normalize to [0, 1] double range = feat_max[i] - feat_min[i]; if (range < 1e-12) range = 1e-12; // prevent division by zero double norm = (raw[i] - feat_min[i]) / range; if (norm < 0.0) norm = 0.0; if (norm > 1.0) norm = 1.0; ``` --- ## Weighted scoring The five normalized features are combined into a single score via weighted sum: $$S = \sum_{i=0}^{4} w_i \cdot \hat{f}_i$$ where $w_i$ are caller-tunable weights (default: 0.2 each for equal weighting). **Reading the formula in C:** ```c // w_i -> weights[i], f_hat_i -> norm[i] double score = 0.0; for (int i = 0; i < 5; i++) { score += weights[i] * norm[i]; // S = sum(w_i * f_hat_i) } ``` **Intuition:** equal weights give each feature equal vote. Setting one weight to 1.0 and the rest to 0.0 reduces the detector to a single-feature VAD. --- ## State machine The detector uses an onset + hangover mechanism to smooth the binary decision: | State | Condition | Action | |:------------|:---------------------------------------|:---------------------------------| | **SILENCE** | score >= threshold | Increment onset counter | | **SILENCE** | onset counter >= onset_frames | Transition to SPEECH | | **SILENCE** | score < threshold | Reset onset counter | | **SPEECH** | score >= threshold | Reset hangover to hangover_frames| | **SPEECH** | score < threshold | Decrement hangover counter | | **SPEECH** | hangover counter == 0 | Transition to SILENCE | **Onset gating** prevents transient clicks from triggering false positives — the score must exceed the threshold for `onset_frames` consecutive frames before the detector declares speech. **Hangover** bridges brief dips mid-utterance — after the score drops below threshold, the speech decision holds for `hangover_frames` additional frames. --- ## Visualization The interactive plot below shows the VAD processing a synthesized signal with two speech segments (sine bursts at 1000 Hz) separated by silence: The four panels show: 1. **Waveform** — peak envelope per frame, showing speech and silence regions. 2. **Normalized features** — all five features mapped to [0, 1], showing how they respond differently to speech vs. silence. 3. **Combined score** — the weighted sum of features compared against the threshold (dashed red line). 4. **Decision** — the final binary output after onset gating and hangover smoothing. --- ## API summary **API:** - `MD_vad_default_params(MD_vad_params *params)` — populate with sensible defaults. - `MD_vad_init(MD_vad_state *state, const MD_vad_params *params)` — initialize state (NULL params for defaults). - `MD_vad_calibrate(MD_vad_state *state, const double *signal, unsigned N, double sample_rate)` — feed known-silence frames to seed normalization. - `MD_vad_process_frame(MD_vad_state *state, const double *signal, unsigned N, double sample_rate, double *score_out, double *features_out)` — process one frame, return 0 (silence) or 1 (speech). **Quick example** — initialize and process: ```c MD_vad_params params; MD_vad_default_params(¶ms); params.threshold = 0.3; MD_vad_state state; MD_vad_init(&state, ¶ms); ``` ```c FILE *csv = fopen("vad.csv", "w"); fprintf(csv, "frame,time,decision,score,energy,zcr," "spectral_entropy,spectral_flatness,band_energy_ratio\n"); for (unsigned i = 0; i < num_frames; i++) { double *frame = signal + i * FRAME_SIZE; double score; double features[MD_VAD_NUM_FEATURES]; int decision = MD_vad_process_frame(&state, frame, FRAME_SIZE, SAMPLE_RATE, &score, features); double t = (double)(i * FRAME_SIZE) / SAMPLE_RATE; times[i] = t; decisions[i] = decision; scores[i] = score; /* Peak absolute amplitude in this frame */ double peak = 0.0; for (unsigned s = 0; s < FRAME_SIZE; s++) { double a = fabs(frame[s]); if (a > peak) peak = a; } peak_env[i] = peak; for (int f = 0; f < MD_VAD_NUM_FEATURES; f++) feat_matrix[i * MD_VAD_NUM_FEATURES + f] = features[f]; fprintf(csv, "%u,%.4f,%d,%.6f,%.6f,%.6f,%.6f,%.6f,%.6f\n", i, t, decision, score, features[0], features[1], features[2], features[3], features[4]); } fclose(csv); printf(" CSV: vad.csv (%u frames)\n", num_frames); ``` **Calibration** — optional, improves accuracy if silence frames are available: ```c unsigned cal_frames = 10; for (unsigned i = 0; i < cal_frames; i++) { double *frame = signal + i * FRAME_SIZE; MD_vad_calibrate(&state, frame, FRAME_SIZE, SAMPLE_RATE); } ``` **Custom weights** — use a single feature: ```c MD_vad_params energy_only_params; MD_vad_default_params(&energy_only_params); for (int i = 0; i < MD_VAD_NUM_FEATURES; i++) energy_only_params.weights[i] = 0.0; energy_only_params.weights[MD_VAD_FEAT_ENERGY] = 1.0; MD_vad_state energy_only_state; MD_vad_init(&energy_only_state, &energy_only_params); ``` --- # Error Handling miniDSP uses a configurable error handler to report precondition violations. The library **never** calls `abort()` or terminates the host process. When a function detects bad input (NULL pointer, invalid size, out-of-range parameter), it: 1. Calls the active error handler with an error code, function name, and human-readable message. 2. Returns a **safe default** value appropriate to the function's return type. ## Error codes | Code | Meaning | |------|---------| | `MD_ERR_NULL_POINTER` | A required pointer argument is NULL | | `MD_ERR_INVALID_SIZE` | A size or count argument is invalid (e.g. N == 0) | | `MD_ERR_INVALID_RANGE` | A range or bound is invalid (e.g. min >= max) | | `MD_ERR_ALLOC_FAILED` | A memory allocation failed | ## Default behavior If no custom handler is installed, errors are printed to `stderr`: ``` miniDSP: error 1 in MD_energy(): a is NULL ``` The function then returns a safe default: - `void` functions: early return (no-op) - `double`-returning functions: `0.0` - `unsigned`-returning functions: `0` - `int`-returning functions: `-1` ## Installing a custom handler Use MD_set_error_handler() to install your own handler. The handler receives the error code, function name, and a descriptive message. ```c static void my_handler(MD_ErrorCode code, const char *fn, const char *msg) { app_log(LOG_WARN, "miniDSP error %d in %s: %s", code, fn, msg); } int main(void) { MD_set_error_handler(my_handler); /* ... use miniDSP ... */ } ``` Pass `NULL` to restore the default stderr handler: ```c MD_set_error_handler(NULL); /* back to default */ ``` ## Thread safety MD_set_error_handler() must be called **once, before any other `MD_*` function**. It must not be called concurrently with any other miniDSP function. This is the same threading contract as FFTW's initialization functions. ## Sentinel returns vs errors Some functions return sentinel values for valid-but-unresolved runtime outcomes. For example, MD_f0_autocorrelation() returns `0.0` when no pitch peak is found. These are **not errors** and do not trigger the error handler. The error handler fires only for **precondition violations** (bad arguments that represent programmer mistakes). --- # mel_viz -- Mel-Spectrum Audio Visualizer **mel_viz** turns any WAV file into a browser-based radial animation driven by mel-spectrum analysis. Concentric rings pulse, wobble, and glow in response to the audio — bass hits zoom the whole canvas. \image html mel-viz-screenshot.png "mel_viz running on a bass-heavy track (Plasma palette)" width=600px ## Visualize your own audio Follow these steps to create a visualization from any WAV file. **1. Build mel_viz** From the repository root: ```sh make tools ``` This compiles `tools/mel_viz/mel_viz` (requires FFTW3 and libsndfile). **2. Run mel_viz on your WAV file** ```sh ./tools/mel_viz/mel_viz your_song.wav -o /tmp/viz ``` Replace `your_song.wav` with the path to your audio file. The tool reads the WAV, computes per-frame mel energies, and writes a self-contained visualization folder to the output directory. Common options: | Flag | Default | What it does | |------|---------|--------------| | `-o ` | `mel_viz_out/` | Output directory | | `--mels ` | 24 | Number of mel bands | | `--fft-size ` | 2048 | FFT window size | | `--fps ` | 30 | Frames per second | | `--min-freq ` | 40 | Low frequency bound | | `--max-freq ` | 16000 | High frequency bound | **3. Open in a browser** The output folder needs to be served over HTTP (browsers block local `file://` audio playback). The easiest way: ```sh cd /tmp/viz python3 -m http.server 8000 ``` Then open in your browser and press play. **4. Tweak the visual controls** The side panel on the left has real-time knobs — no recomputation needed: - **Palette** — Plasma, Ocean, Fire, or Neon color themes - **Rings** — 8, 12, or 24 concentric ring groups - **Smoothing** — Temporal smoothing from snappy to flowing - **Bass** — How much kick/bass hits zoom the entire canvas - **Wobble** — Ring edge deformation amount - **Glow** — Neon bloom intensity around each ring Experiment with different palettes and cranking up the glow and wobble for a more psychedelic look, or dial them down for a clean, minimal animation. **5. Export as video** Want to share your visualization? Click **Export MP4** in the side panel. The browser renders every frame offline and muxes it into an MP4 file (Chrome or Edge required — uses the WebCodecs API). ## Live mic mode The web renderer also works standalone without the C program — it captures microphone audio and visualizes it in real time. ```sh cd tools/mel_viz python3 -m http.server 8000 # open http://localhost:8000/web/ ``` Click **Mic**, grant microphone access, and the visualization responds to whatever you play or say. ## Architecture ``` File mode: WAV file --> mel_viz (C) --> data.js + web assets --> Browser (Canvas 2D) | | | MD_mel_energies() audio player sync | (per-frame mel analysis) + radial renderer Mic mode: Microphone --> Web Audio API --> JS mel weighting --> Same renderer getUserMedia() AnalyserNode ``` The C program reads audio via libsndfile, computes per-frame mel energies using `MD_mel_energies()`, and writes the data as a JavaScript file. The browser renders concentric rings whose size, color, glow, and wobble are driven by the mel band energies. For the full list of CLI flags, see the [mel_viz README](https://github.com/wooters/miniDSP/blob/main/tools/mel_viz/README.md). --- # resample -- Sample Rate Converter **resample** converts a mono audio file from one sample rate to another using the miniDSP polyphase sinc resampler. It reads any format supported by libsndfile (WAV, FLAC, AIFF, OGG, etc.) and writes WAV output (IEEE float). ## Build From the repository root: ```sh make tools ``` This compiles `tools/resample/resample` (requires FFTW3 and libsndfile). ## Usage ``` resample [-z N] [-b F] ``` | Argument | Description | |----------|-------------| | `` | Input audio file (must be mono) | | `` | Target sample rate in Hz (positive integer) | | `` | Output file path (WAV format only) | ## Options | Flag | Default | What it does | |------|---------|--------------| | `-z N` | 32 | Number of sinc zero-crossings per side. More zero-crossings produce a sharper cutoff and better stopband rejection at the cost of speed. | | `-b F` | 10.0 | Kaiser window beta parameter. Higher values widen the mainlobe but deepen stopband attenuation (10.0 gives >100 dB rejection). | The defaults work well for most use cases. Bump them up (e.g. `-z 64 -b 14.0`) for mastering-quality conversion where you want even cleaner anti-aliasing. ## Examples **Upsample from 16 kHz to 48 kHz:** ```sh ./tools/resample/resample speech.wav 48000 speech_48k.wav ``` **Downsample to 8 kHz (telephone quality):** ```sh ./tools/resample/resample recording.wav 8000 recording_8k.wav ``` The resampler automatically applies an anti-aliasing lowpass filter when downsampling — no separate filtering step is needed. **High-quality conversion with custom parameters:** ```sh ./tools/resample/resample -z 64 -b 14.0 master.wav 96000 master_96k.wav ``` ## How it works ``` WAV/FLAC/... --> FIO_read_audio() --> float* | float-to-double conversion | MD_resample() (polyphase sinc) | double-to-float conversion | FIO_write_wav() --> output.wav ``` The tool reads the input file via libsndfile, converts the samples to double precision for the DSP pipeline, resamples using `MD_resample()`, converts back to float, and writes the output as IEEE float WAV. Buffers are freed eagerly after each stage to keep peak memory at roughly 2x the larger of the input or output signal. **Same-rate detection:** if the input is already at the target rate, the tool copies the audio directly and prints a note — no resampling overhead. **Constraints:** - Input must be mono (single-channel). Use `sox` to split multi-channel files. - Output is always WAV format. A warning is printed if the output filename does not end in `.wav`. ## The resampling core The core algorithm is a 512-phase polyphase sinc interpolation filter with Kaiser windowing. For a detailed explanation of the math, including the Bessel function, normalised sinc, and filter design, see the resampling "Sample Rate Conversion" tutorial. ```c /* Resample */ unsigned out_len = MD_resample_output_len( (unsigned)in_len, (double)in_rate, (double)target_rate); double *out_double = malloc(out_len * sizeof(double)); if (out_double == NULL) { fprintf(stderr, "Error: failed to allocate memory\n"); free(in_double); return 1; } MD_resample(in_double, (unsigned)in_len, out_double, out_len, (double)in_rate, (double)target_rate, zero_crossings, kaiser_beta); ``` ---