# pyminidsp > Python bindings to the miniDSP C library — a comprehensive DSP toolkit providing signal generation, spectral analysis, filtering, effects, and more. All functions accept and return NumPy arrays. Source: https://github.com/wooters/pyminidsp Documentation: https://wooters.github.io/pyminidsp/ # API Reference ## Signal measurement, analysis, and scaling functions ### `bessel_i0(x: 'float') -> 'float'` Compute the zeroth-order modified Bessel function of the first kind. --- ### `sinc(x: 'float') -> 'float'` Compute the normalized sinc function: sin(pi*x) / (pi*x), with sinc(0) = 1. --- ### `dot(a: 'npt.ArrayLike', b: 'npt.ArrayLike') -> 'float'` Compute the dot product of two vectors. --- ### `entropy(a: 'npt.ArrayLike', clip: 'bool' = False) -> 'float'` Compute the normalized entropy of a distribution. --- ### `energy(a: 'npt.ArrayLike') -> 'float'` Compute signal energy: sum of squared samples. --- ### `power(a: 'npt.ArrayLike') -> 'float'` Compute signal power: energy / N. --- ### `power_db(a: 'npt.ArrayLike') -> 'float'` Compute signal power in decibels. --- ### `rms(a: 'npt.ArrayLike') -> 'float'` Compute the root mean square (RMS) of a signal. --- ### `zero_crossing_rate(a: 'npt.ArrayLike') -> 'float'` Compute the zero-crossing rate of a signal. --- ### `autocorrelation(a: 'npt.ArrayLike', max_lag: 'int') -> 'npt.NDArray[np.float64]'` Compute the normalised autocorrelation of a signal. Args: a: Input signal. max_lag: Number of lag values to compute. Returns: numpy array of autocorrelation values, length max_lag. --- ### `peak_detect(a: 'npt.ArrayLike', threshold: 'float' = 0.0, min_distance: 'int' = 1) -> 'npt.NDArray[np.uint32]'` Detect peaks (local maxima) in a signal. Args: a: Input signal. threshold: Minimum value for a peak. min_distance: Minimum index gap between peaks. Returns: numpy array of peak indices. 
--- ### `f0_autocorrelation(signal: 'npt.ArrayLike', sample_rate: 'float', min_freq_hz: 'float' = 80.0, max_freq_hz: 'float' = 400.0) -> 'float'` Estimate F0 using autocorrelation. --- ### `f0_fft(signal: 'npt.ArrayLike', sample_rate: 'float', min_freq_hz: 'float' = 80.0, max_freq_hz: 'float' = 400.0) -> 'float'` Estimate F0 using FFT peak picking. --- ### `mix(a: 'npt.ArrayLike', b: 'npt.ArrayLike', w_a: 'float' = 0.5, w_b: 'float' = 0.5) -> 'npt.NDArray[np.float64]'` Mix (weighted sum) two signals. Args: a, b: Input signals of the same length. w_a, w_b: Weights for signals a and b. Returns: numpy array of the mixed signal. --- ### `scale(value: 'float', oldmin: 'float', oldmax: 'float', newmin: 'float', newmax: 'float') -> 'float'` Map a single value from one range to another. --- ### `scale_vec(a: 'npt.ArrayLike', oldmin: 'float', oldmax: 'float', newmin: 'float', newmax: 'float') -> 'npt.NDArray[np.float64]'` Map every element of a vector from one range to another. --- ### `fit_within_range(a: 'npt.ArrayLike', newmin: 'float', newmax: 'float') -> 'npt.NDArray[np.float64]'` Fit values within [newmin, newmax]. --- ### `adjust_dblevel(signal: 'npt.ArrayLike', dblevel: 'float') -> 'npt.NDArray[np.float64]'` Automatic Gain Control: scale signal to target dB level, clip to [-1, 1]. --- --- ## DTMF tone generation and detection ### `dtmf_detect(signal: 'npt.ArrayLike', sample_rate: 'float' = 8000.0, max_tones: 'int' = 64) -> 'list[tuple[str, float, float]]'` Detect DTMF tones in an audio signal. Args: signal: Audio samples. sample_rate: Sampling rate in Hz. max_tones: Maximum number of tones to detect. Returns: List of (digit, start_s, end_s) tuples. --- ### `dtmf_generate(digits: 'str', sample_rate: 'float' = 8000.0, tone_ms: 'int' = 70, pause_ms: 'int' = 70) -> 'npt.NDArray[np.float64]'` Generate a DTMF tone sequence. Args: digits: String of DTMF characters ('0'-'9', 'A'-'D', '*', '#'). sample_rate: Sampling rate in Hz. tone_ms: Duration of each tone in ms (>= 40). 
pause_ms: Duration of silence between tones in ms (>= 40). Returns: numpy array of audio samples. --- ### `dtmf_signal_length(num_digits: 'int', sample_rate: 'float' = 8000.0, tone_ms: 'int' = 70, pause_ms: 'int' = 70) -> 'int'` Calculate the number of samples needed for dtmf_generate(). --- --- ## Simple audio effects: delay/echo, tremolo, comb reverb ### `delay_echo(signal: 'npt.ArrayLike', delay_samples: 'int', feedback: 'float' = 0.5, dry: 'float' = 1.0, wet: 'float' = 0.5) -> 'npt.NDArray[np.float64]'` Apply a delay/echo effect. Args: signal: Input signal. delay_samples: Delay length in samples. feedback: Echo feedback gain (abs(feedback) < 1). dry: Dry mix weight. wet: Wet mix weight. Returns: numpy array of the processed signal. --- ### `tremolo(signal: 'npt.ArrayLike', rate_hz: 'float', depth: 'float' = 0.5, sample_rate: 'float' = 44100.0) -> 'npt.NDArray[np.float64]'` Apply a tremolo effect (amplitude modulation). Args: signal: Input signal. rate_hz: LFO rate in Hz. depth: Modulation depth in [0, 1]. sample_rate: Sampling rate in Hz. Returns: numpy array of the processed signal. --- ### `comb_reverb(signal: 'npt.ArrayLike', delay_samples: 'int', feedback: 'float' = 0.5, dry: 'float' = 1.0, wet: 'float' = 0.3) -> 'npt.NDArray[np.float64]'` Apply a comb-filter reverb effect. Args: signal: Input signal. delay_samples: Comb delay in samples. feedback: Feedback gain (abs(feedback) < 1). dry: Dry mix weight. wet: Wet mix weight. Returns: numpy array of the processed signal. --- --- ## FIR filters, convolution, and biquad (IIR) filtering ### `convolution_num_samples(signal_len: 'int', kernel_len: 'int') -> 'int'` Compute the output length of a full linear convolution. --- ### `convolution_time(signal: 'npt.ArrayLike', kernel: 'npt.ArrayLike') -> 'npt.NDArray[np.float64]'` Time-domain full linear convolution. Returns: numpy array of length signal_len + kernel_len - 1. 
--- ### `moving_average(signal: 'npt.ArrayLike', window_len: 'int') -> 'npt.NDArray[np.float64]'` Causal moving-average FIR filter. Returns: numpy array of the same length as the input. --- ### `fir_filter(signal: 'npt.ArrayLike', coeffs: 'npt.ArrayLike') -> 'npt.NDArray[np.float64]'` Apply a causal FIR filter with arbitrary coefficients. Returns: numpy array of the same length as the input. --- ### `design_lowpass_fir(num_taps: 'int', cutoff_freq: 'float', sample_rate: 'float', kaiser_beta: 'float' = 5.0) -> 'npt.NDArray[np.float64]'` Design a Kaiser-windowed sinc lowpass FIR filter. Args: num_taps: Number of filter coefficients (filter order + 1). cutoff_freq: Cutoff frequency in Hz. sample_rate: Sampling rate in Hz. kaiser_beta: Kaiser window shape parameter (default 5.0). Returns: numpy array of length num_taps containing the filter coefficients. --- ### `convolution_fft_ola(signal: 'npt.ArrayLike', kernel: 'npt.ArrayLike') -> 'npt.NDArray[np.float64]'` Full linear convolution using FFT overlap-add. Returns: numpy array of length signal_len + kernel_len - 1. --- ### `class BiquadFilter(filter_type: 'int', freq: 'float', sample_rate: 'float', db_gain: 'float' = 0.0, bandwidth: 'float' = 1.0) -> 'None'` Biquad (second-order IIR) filter. Supports low-pass, high-pass, band-pass, notch, peaking EQ, low shelf, and high shelf filter types. Example: >>> filt = BiquadFilter(LPF, freq=1000.0, sample_rate=44100.0) >>> for sample in signal: ... output = filt.process(sample) --- --- ## Generalized Cross-Correlation (GCC) delay estimation ### `get_delay(sig_a: 'npt.ArrayLike', sig_b: 'npt.ArrayLike', margin: 'int', weighting: 'int' = 1) -> 'tuple[int, float]'` Estimate the delay between two signals using GCC. Args: sig_a: First signal. sig_b: Second signal. margin: Search +/- this many samples around zero-lag. weighting: GCC_SIMP or GCC_PHAT. Returns: (delay, entropy) tuple. Delay in samples (positive = sig_b lags sig_a). 
--- ### `get_multiple_delays(signals: 'list[npt.ArrayLike]', margin: 'int', weighting: 'int' = 1) -> 'npt.NDArray[np.int32]'` Estimate delays between a reference signal and M-1 other signals. Args: signals: List of numpy arrays (signals[0] is reference). margin: Search window in samples. weighting: GCC_SIMP or GCC_PHAT. Returns: numpy array of M-1 delay values. --- ### `gcc(sig_a: 'npt.ArrayLike', sig_b: 'npt.ArrayLike', weighting: 'int' = 1) -> 'npt.NDArray[np.float64]'` Compute the full generalized cross-correlation between two signals. Args: sig_a: First signal. sig_b: Second signal. weighting: GCC_SIMP or GCC_PHAT. Returns: numpy array of N doubles (zero-lag at index ceil(N/2)). --- --- ## Signal generators: sine, noise, impulse, chirps, and spectrogram text ### `sine_wave(n: 'int', amplitude: 'float' = 1.0, freq: 'float' = 440.0, sample_rate: 'float' = 44100.0) -> 'npt.NDArray[np.float64]'` Generate a sine wave. Args: n: Number of samples. amplitude: Peak amplitude. freq: Frequency in Hz. sample_rate: Sampling rate in Hz. Returns: numpy array of length n. --- ### `white_noise(n: 'int', amplitude: 'float' = 1.0, seed: 'int' = 42) -> 'npt.NDArray[np.float64]'` Generate Gaussian white noise. Args: n: Number of samples. amplitude: Standard deviation. seed: Random seed for reproducibility. Returns: numpy array of length n. --- ### `impulse(n: 'int', amplitude: 'float' = 1.0, position: 'int' = 0) -> 'npt.NDArray[np.float64]'` Generate a discrete impulse (Kronecker delta). Args: n: Number of samples. amplitude: Spike amplitude. position: Sample index of the spike. Returns: numpy array of length n. --- ### `chirp_linear(n: 'int', amplitude: 'float' = 1.0, f_start: 'float' = 200.0, f_end: 'float' = 4000.0, sample_rate: 'float' = 16000.0) -> 'npt.NDArray[np.float64]'` Generate a linear chirp (swept sine). Args: n: Number of samples. amplitude: Peak amplitude. f_start: Starting frequency in Hz. f_end: Ending frequency in Hz. sample_rate: Sampling rate in Hz. 
Returns: numpy array of length n. --- ### `chirp_log(n: 'int', amplitude: 'float' = 1.0, f_start: 'float' = 20.0, f_end: 'float' = 20000.0, sample_rate: 'float' = 44100.0) -> 'npt.NDArray[np.float64]'` Generate a logarithmic chirp. Args: n: Number of samples. amplitude: Peak amplitude. f_start: Starting frequency in Hz (must be > 0). f_end: Ending frequency in Hz (must be > 0, != f_start). sample_rate: Sampling rate in Hz. Returns: numpy array of length n. --- ### `square_wave(n: 'int', amplitude: 'float' = 1.0, freq: 'float' = 440.0, sample_rate: 'float' = 44100.0) -> 'npt.NDArray[np.float64]'` Generate a square wave. --- ### `sawtooth_wave(n: 'int', amplitude: 'float' = 1.0, freq: 'float' = 440.0, sample_rate: 'float' = 44100.0) -> 'npt.NDArray[np.float64]'` Generate a sawtooth wave. --- ### `shepard_tone(n: 'int', amplitude: 'float' = 0.8, base_freq: 'float' = 440.0, sample_rate: 'float' = 44100.0, rate_octaves_per_sec: 'float' = 0.5, num_octaves: 'int' = 8) -> 'npt.NDArray[np.float64]'` Generate a Shepard tone (auditory illusion of endlessly rising/falling pitch). Args: n: Number of samples. amplitude: Peak amplitude. base_freq: Centre frequency of the Gaussian envelope in Hz. sample_rate: Sampling rate in Hz. rate_octaves_per_sec: Glissando rate (positive=rising, negative=falling). num_octaves: Number of audible octave layers. Returns: numpy array of length n. --- ### `spectrogram_text(text: 'str', freq_lo: 'float' = 200.0, freq_hi: 'float' = 7500.0, duration_sec: 'float' = 2.0, sample_rate: 'float' = 16000.0) -> 'npt.NDArray[np.float64]'` Synthesise audio that displays readable text in a spectrogram. Args: text: ASCII string to render. freq_lo: Lowest frequency in Hz. freq_hi: Highest frequency in Hz. duration_sec: Total duration in seconds. sample_rate: Sample rate in Hz. Returns: numpy array of audio samples. 
--- --- ## Shared constants, CFFI helpers, and cleanup for pyminidsp submodules ### `class MiniDSPError(code: 'int', func_name: 'str', message: 'str') -> 'None'` Raised when the miniDSP C library reports an error. --- - `ERR_NULL_POINTER = 1` - `ERR_INVALID_SIZE = 2` - `ERR_INVALID_RANGE = 3` - `ERR_ALLOC_FAILED = 4` ### `shutdown() -> 'None'` Free all internally cached FFT plans and buffers. --- - `LPF = 0` - `HPF = 1` - `BPF = 2` - `NOTCH = 3` - `PEQ = 4` - `LSH = 5` - `HSH = 6` - `STEG_LSB = 0` - `STEG_FREQ_BAND = 1` - `STEG_SPECTEXT = 2` - `STEG_TYPE_TEXT = 0` - `STEG_TYPE_BINARY = 1` - `GCC_SIMP = 0` - `GCC_PHAT = 1` - `VAD_NUM_FEATURES = 5` --- ## Polyphase sinc resampling (sample rate conversion) ### `resample_output_len(input_len: 'int', in_rate: 'float', out_rate: 'float') -> 'int'` Compute the number of output samples for a given resampling operation. --- ### `resample(signal: 'npt.ArrayLike', in_rate: 'float', out_rate: 'float', num_zero_crossings: 'int' = 13, kaiser_beta: 'float' = 5.0) -> 'npt.NDArray[np.float64]'` Resample a signal using a polyphase sinc resampler with Kaiser-windowed anti-aliasing filter. Args: signal: Input signal. in_rate: Input sample rate in Hz. out_rate: Output sample rate in Hz. num_zero_crossings: Number of zero crossings in the sinc kernel (default 13). kaiser_beta: Kaiser window shape parameter (default 5.0). Returns: numpy array of resampled signal. --- --- ## FFT-based spectrum analysis, STFT, mel filterbanks, MFCCs, and window functions ### `lowpass_brickwall(signal: 'npt.ArrayLike', cutoff_hz: 'float', sample_rate: 'float') -> 'npt.NDArray[np.float64]'` Apply an FFT-based ideal (brickwall) lowpass filter. Zeroes all frequency bins above the cutoff frequency. Operates by copying the signal, applying the filter in-place, and returning the result. Args: signal: Input signal. cutoff_hz: Cutoff frequency in Hz. sample_rate: Sampling rate in Hz. Returns: numpy array of the same length as the input. 
--- ### `magnitude_spectrum(signal: 'npt.ArrayLike') -> 'npt.NDArray[np.float64]'` Compute the magnitude spectrum of a real-valued signal. Returns: numpy array of length N/2 + 1 containing magnitudes. --- ### `power_spectral_density(signal: 'npt.ArrayLike') -> 'npt.NDArray[np.float64]'` Compute the power spectral density (PSD) of a signal. Returns: numpy array of length N/2 + 1 containing power values. --- ### `phase_spectrum(signal: 'npt.ArrayLike') -> 'npt.NDArray[np.float64]'` Compute the one-sided phase spectrum in radians. Returns: numpy array of length N/2 + 1 with phase in [-pi, pi]. --- ### `stft_num_frames(signal_len: 'int', n: 'int', hop: 'int') -> 'int'` Compute the number of STFT frames. --- ### `stft(signal: 'npt.ArrayLike', n: 'int', hop: 'int') -> 'npt.NDArray[np.float64]'` Compute the Short-Time Fourier Transform (STFT). Args: signal: Input signal. n: FFT window size. hop: Hop size in samples. Returns: 2D numpy array of shape (num_frames, n//2+1) containing magnitudes. --- ### `mel_filterbank(n: 'int', sample_rate: 'float', num_mels: 'int' = 26, min_freq_hz: 'float' = 0.0, max_freq_hz: 'float | None' = None) -> 'npt.NDArray[np.float64]'` Build a mel-spaced triangular filterbank matrix. Args: n: FFT size. sample_rate: Sampling rate in Hz. num_mels: Number of mel filters. min_freq_hz: Lower frequency bound. max_freq_hz: Upper frequency bound (defaults to sample_rate/2). Returns: 2D numpy array of shape (num_mels, n//2+1). --- ### `mel_energies(signal: 'npt.ArrayLike', sample_rate: 'float', num_mels: 'int' = 26, min_freq_hz: 'float' = 0.0, max_freq_hz: 'float | None' = None) -> 'npt.NDArray[np.float64]'` Compute mel-band energies from a single frame. Returns: numpy array of length num_mels. --- ### `mfcc(signal: 'npt.ArrayLike', sample_rate: 'float', num_mels: 'int' = 26, num_coeffs: 'int' = 13, min_freq_hz: 'float' = 0.0, max_freq_hz: 'float | None' = None) -> 'npt.NDArray[np.float64]'` Compute MFCCs from a single frame. Args: signal: Input frame. 
sample_rate: Sampling rate in Hz. num_mels: Number of mel bands. num_coeffs: Number of cepstral coefficients to output. min_freq_hz: Lower frequency bound. max_freq_hz: Upper frequency bound (defaults to sample_rate/2). Returns: numpy array of length num_coeffs. --- ### `hann_window(n: 'int') -> 'npt.NDArray[np.float64]'` Generate a Hanning (Hann) window of length n. --- ### `hamming_window(n: 'int') -> 'npt.NDArray[np.float64]'` Generate a Hamming window of length n. --- ### `blackman_window(n: 'int') -> 'npt.NDArray[np.float64]'` Generate a Blackman window of length n. --- ### `rect_window(n: 'int') -> 'npt.NDArray[np.float64]'` Generate a rectangular window of length n (all ones). --- ### `kaiser_window(n: 'int', beta: 'float') -> 'npt.NDArray[np.float64]'` Generate a Kaiser window of length n with shape parameter beta. Unlike other window functions, Kaiser windows require a beta parameter that controls the trade-off between main-lobe width and side-lobe level. Higher beta gives lower sidelobes but a wider main lobe. Args: n: Window length. beta: Shape parameter (typical values: 5-14). Returns: numpy array of length n. --- --- ## Audio steganography: hide and recover data within audio signals ### `steg_capacity(signal_len: 'int', sample_rate: 'float', method: 'int' = 0) -> 'int'` Compute maximum message length that can be hidden. --- ### `steg_encode(host: 'npt.ArrayLike', message: 'str', sample_rate: 'float' = 44100.0, method: 'int' = 0) -> 'tuple[npt.NDArray[np.float64], int]'` Encode a secret text message into a host audio signal. Args: host: Host signal (not modified). message: String message to hide. sample_rate: Sample rate in Hz. method: STEG_LSB or STEG_FREQ_BAND. Returns: (stego_signal, num_bytes_encoded) tuple. --- ### `steg_decode(stego: 'npt.ArrayLike', sample_rate: 'float' = 44100.0, method: 'int' = 0, max_msg_len: 'int' = 4096) -> 'str'` Decode a secret text message from a stego audio signal. Returns: Decoded string message. 
--- ### `steg_encode_bytes(host: 'npt.ArrayLike', data: 'bytes', sample_rate: 'float' = 44100.0, method: 'int' = 0) -> 'tuple[npt.NDArray[np.float64], int]'` Encode arbitrary binary data into a host audio signal. Args: host: Host signal. data: bytes-like object to hide. sample_rate: Sample rate in Hz. method: STEG_LSB or STEG_FREQ_BAND. Returns: (stego_signal, num_bytes_encoded) tuple. --- ### `steg_decode_bytes(stego: 'npt.ArrayLike', sample_rate: 'float' = 44100.0, method: 'int' = 0, max_len: 'int' = 4096) -> 'bytes'` Decode binary data from a stego audio signal. Returns: bytes object containing the decoded data. --- ### `steg_detect(signal: 'npt.ArrayLike', sample_rate: 'float' = 44100.0) -> 'tuple[int | None, int | None]'` Detect which steganography method was used. Returns: (method, payload_type) tuple, or (None, None) if no steg detected. method is STEG_LSB, STEG_FREQ_BAND, or None. payload_type is STEG_TYPE_TEXT, STEG_TYPE_BINARY, or None. --- --- ## Voice activity detection (VAD) with adaptive normalization ### `class VAD(*, threshold: 'float | None' = None, onset_frames: 'int | None' = None, hangover_frames: 'int | None' = None, adaptation_rate: 'float | None' = None, band_low_hz: 'float | None' = None, band_high_hz: 'float | None' = None, weights: 'Sequence[float] | None' = None) -> 'None'` Voice activity detector with adaptive feature normalization and onset/hangover smoothing. Wraps the miniDSP C library's stateful VAD API. The detector extracts five features per frame (energy, ZCR, spectral entropy, spectral flatness, band energy ratio), normalizes them adaptively, computes a weighted score, and applies an onset/hangover state machine. Example: >>> detector = VAD(threshold=0.4) >>> detector.calibrate(silence_frame, sample_rate=16000.0) >>> decision, score, features = detector.process_frame(frame, 16000.0) --- --- # Tutorials # Signal Generators pyminidsp provides stateless signal generators for creating test signals. 
No audio input or microphone source is needed — just specify the parameters and get a NumPy array back.

## Sine wave

The fundamental test signal — a pure tone at a single frequency:

$$ x[n] = A \sin(2\pi f \, n / f_s) $$

```python
import pyminidsp as md

signal = md.sine_wave(44100, amplitude=1.0, freq=440.0, sample_rate=44100.0)

# Verify: the FFT peak should align with the expected frequency bin
mag = md.magnitude_spectrum(signal)
```

## Impulse (Kronecker delta)

A single spike at a given position, zeros everywhere else. The unit impulse (amplitude 1.0 at position 0) is the identity element of convolution and has a perfectly flat magnitude spectrum.

```python
imp = md.impulse(1024, amplitude=1.0, position=0)

# Flat spectrum — all bins have equal magnitude
mag = md.magnitude_spectrum(imp)
```

## Chirp (swept sine)

Two varieties:

**Linear chirp** — frequency sweeps at a constant rate. The instantaneous frequency traces a straight diagonal in the spectrogram.

```python
# 1-second sweep from 200 Hz to 4 kHz at 16 kHz sample rate
chirp = md.chirp_linear(16000, amplitude=1.0, f_start=200.0, f_end=4000.0,
                        sample_rate=16000.0)
```

**Logarithmic chirp** — exponential sweep, spending equal time per octave. Ideal for measuring systems on a log-frequency axis.

```python
# Full audible range sweep: 20 Hz to 20 kHz
chirp = md.chirp_log(44100, amplitude=1.0, f_start=20.0, f_end=20000.0,
                     sample_rate=44100.0)
```

## Square wave

Alternates between +amplitude and −amplitude. Its Fourier series contains only **odd harmonics** (1f, 3f, 5f, …) with amplitudes decaying as 1/k — a textbook demonstration of the Gibbs phenomenon.

```python
sq = md.square_wave(4096, amplitude=1.0, freq=440.0, sample_rate=44100.0)
```

## Sawtooth wave

Ramps linearly from −amplitude to +amplitude each period.
Contains **all integer harmonics** (1f, 2f, 3f, …) decaying as 1/k — richer harmonic content than the square wave's odd-only series. ```python saw = md.sawtooth_wave(4096, amplitude=1.0, freq=440.0, sample_rate=44100.0) ``` ## White noise Gaussian white noise has equal power at all frequencies — its PSD is approximately flat. Samples follow N(0, σ²) via the Box-Muller transform. A fixed seed gives reproducible output. ```python noise = md.white_noise(4096, amplitude=1.0, seed=42) # Same seed → same output noise2 = md.white_noise(4096, amplitude=1.0, seed=42) assert (noise == noise2).all() ``` ## Shepard tone See `shepard-tone` for a dedicated guide on this auditory illusion. --- # Basic Signal Operations Five fundamental time-domain analysis techniques that work alongside `pyminidsp.energy`, `pyminidsp.power`, and `pyminidsp.entropy`. ## RMS (Root Mean Square) The standard measure of signal "loudness": $$ \text{RMS} = \sqrt{\frac{1}{N}\sum_{n=0}^{N-1} x[n]^2} $$ A unit sine wave yields ≈ 0.707; a DC signal of value *c* has RMS = $|c|$. ```python import pyminidsp as md signal = md.sine_wave(44100, amplitude=1.0, freq=440.0, sample_rate=44100.0) print(md.rms(signal)) # ≈ 0.707 ``` ## Zero-crossing rate Counts how often the signal changes sign, normalised by the number of adjacent pairs. High ZCR → noise or high-frequency content. Low ZCR → tonal or low-frequency content. ```python signal = md.sine_wave(16000, freq=1000.0, sample_rate=16000.0) zcr = md.zero_crossing_rate(signal) # zcr ≈ 2 * 1000 / 16000 = 0.125 ``` ## Autocorrelation Measures the similarity between a signal and a delayed copy of itself. Periodic signals produce a strong peak at the fundamental period — the basis of autocorrelation-based pitch detection. 
```python signal = md.sine_wave(1024, freq=100.0, sample_rate=1000.0) acf = md.autocorrelation(signal, max_lag=50) # acf[0] = 1.0 # acf[10] ≈ 1.0 (lag 10 = one period of 100 Hz at 1 kHz sample rate) ``` ## Peak detection Finds local maxima above a threshold with a minimum distance constraint to suppress secondary peaks. ```python import numpy as np signal = np.array([0, 1, 3, 1, 0, 2, 5, 2, 0], dtype=float) peaks = md.peak_detect(signal, threshold=0.0, min_distance=1) print(peaks) # [2, 6] (values 3 and 5) ``` ## Signal mixing Element-wise weighted sum of two signals: $$ \text{out}[n] = w_a \cdot a[n] + w_b \cdot b[n] $$ ```python sine = md.sine_wave(1024, amplitude=1.0, freq=440.0, sample_rate=44100.0) noise = md.white_noise(1024, amplitude=0.1, seed=42) mixed = md.mix(sine, noise, w_a=0.8, w_b=0.2) ``` --- # Window Functions Window functions taper finite signal blocks before FFT processing to prevent **spectral leakage** — the spreading of energy into neighbouring frequency bins caused by discontinuities at block edges. The DFT assumes the input is one period of a periodic signal. When the signal doesn't have an integer number of cycles in the block, the endpoints are mismatched. A window smoothly tapers the signal to zero at the edges, greatly reducing this leakage. ## Four window types **Hanning (Hann)** — the default choice for FFT analysis. $$ w[n] = 0.5\bigl(1 - \cos(2\pi n / (N-1))\bigr) $$ ```python import pyminidsp as md win = md.hann_window(256) ``` **Hamming** — similar to Hanning but with a lower first sidelobe. $$ w[n] = 0.54 - 0.46\cos(2\pi n / (N-1)) $$ ```python win = md.hamming_window(256) ``` **Blackman** — strongest sidelobe suppression, widest main lobe. $$ w[n] = 0.42 - 0.5\cos(2\pi n/(N-1)) + 0.08\cos(4\pi n/(N-1)) $$ ```python win = md.blackman_window(256) ``` **Rectangular** — all ones (no tapering). Narrowest main lobe but maximum sidelobe leakage. 
```python win = md.rect_window(256) ``` ## Comparison | Window | Edge values | Sidelobe level | Main lobe width | |---|---|---|---| | Rectangular | 1.0 | Highest | Narrowest | | Hanning | 0.0 | Low | Medium | | Hamming | 0.08 | Lower first sidelobe | Medium | | Blackman | 0.0 | Lowest | Widest | **Rule of thumb:** start with Hanning. Use Blackman when minimising leakage matters more than frequency resolution. --- # Computing the Magnitude Spectrum The **magnitude spectrum** tells you the amplitude of each sinusoidal component present in a signal. ## Workflow 1. Generate (or load) a signal. 2. Apply a window function to reduce spectral leakage. 3. Compute the magnitude spectrum via `pyminidsp.magnitude_spectrum`. 4. Normalise if needed. ## Example ```python import pyminidsp as md import numpy as np sr = 44100.0 N = 1024 # Build a test signal: 440 Hz + 1000 Hz + 2500 Hz + DC offset t = np.arange(N) / sr signal = (0.1 + 1.0 * np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(2 * np.pi * 1000.0 * t) + 0.3 * np.sin(2 * np.pi * 2500.0 * t)) mag = md.magnitude_spectrum(signal) # mag has N//2 + 1 = 513 bins # bin k → frequency = k * sr / N ``` ## Normalisation The raw output is **not** normalised by *N*. Three steps to get single-sided amplitudes: 1. Divide all bins by *N*. 2. Double interior bins (k = 1 to N/2 − 1) to account for folded negative frequencies. 3. Leave DC (k = 0) and Nyquist (k = N/2) unchanged. ```python amp = mag / N amp[1:-1] *= 2 # double interior bins ``` ## Visualisation The linear plot shows distinct peaks at the input frequencies. The logarithmic (dB) scale reveals the Hanning window's sidelobes and low-level details that are invisible on a linear axis. ```python # Convert to dB (for plotting) mag_db = 20 * np.log10(amp + 1e-12) md.shutdown() ``` --- # Power Spectral Density The Power Spectral Density (PSD) measures how a signal's **power** is distributed across frequencies. 
While the magnitude spectrum tells you the *amplitude* at each frequency, the PSD tells you the *power* — useful for noise analysis, SNR estimation, and comparing signals of different lengths. ## Formula The periodogram estimator: $$ \text{PSD}[k] = \frac{|X(k)|^2}{N} $$ **Relationship to the magnitude spectrum:** ``PSD[k] = magnitude[k]**2 / N`` **dB conversion:** use ``10 * log10()`` for power (not ``20 * log10()`` as with amplitude), because power scales with amplitude squared: ``10 * log10(A²) = 20 * log10(A)``. ## Example ```python import pyminidsp as md import numpy as np sr = 44100.0 N = 1024 # Multi-tone test signal t = np.arange(N) / sr signal = (0.1 + 1.0 * np.sin(2 * np.pi * 440.0 * t) + 0.5 * np.sin(2 * np.pi * 1000.0 * t) + 0.3 * np.sin(2 * np.pi * 2500.0 * t)) psd = md.power_spectral_density(signal) ``` ## Parseval's theorem Total time-domain energy equals frequency-domain energy (validation): ```python time_energy = np.sum(signal ** 2) freq_energy = psd[0] + 2 * np.sum(psd[1:-1]) + psd[-1] np.testing.assert_allclose(time_energy, freq_energy, rtol=1e-10) ``` ## Visualisation ```python psd_db = 10 * np.log10(psd + 1e-12) md.shutdown() ``` --- # Phase Spectrum The phase spectrum describes the **timing** of frequency components. Each DFT coefficient is a complex number; while magnitude reveals energy distribution, phase reveals the angle or shift of that frequency component: $$ \phi(k) = \arg X(k) = \text{atan2}(\text{Im}\,X(k),\;\text{Re}\,X(k)) $$ Values span $[-\pi, \pi]$. ## Key intuitions - A **cosine** at an integer bin produces $\phi \approx 0$. - A **sine** at the same bin produces $\phi \approx -\pi/2$. - A **time-delayed** signal exhibits **linear phase**: $\phi(k) = -2\pi k d / N$, a principle underlying delay estimation (GCC-PHAT). 
## Example

```python
import pyminidsp as md
import numpy as np

N = 1024
sr = 44100.0
t = np.arange(N) / sr

# Two tones with known phases
signal = (1.0 * np.cos(2 * np.pi * 440.0 * t)      # phase ≈ 0
          + 0.5 * np.sin(2 * np.pi * 1000.0 * t))  # phase ≈ -π/2

phase = md.phase_spectrum(signal)
# phase has N//2 + 1 = 513 bins, values in [-π, π]
```

**IMPORTANT**: Phase is only meaningful at bins where the magnitude is significant. Always examine `pyminidsp.magnitude_spectrum` alongside the phase to identify significant bins.

## Visualisation

```python
md.shutdown()
```

---

# STFT & Spectrogram

The magnitude spectrum reveals frequency content across an entire signal, but cannot show how that content **changes over time**. The Short-Time Fourier Transform (STFT) solves this by dividing the signal into short, overlapping frames and computing the DFT of each one, producing a 2-D time-frequency representation called a **spectrogram**.

## Key parameters

**Window size** (*n*) — larger windows give better frequency resolution but worse time resolution. For audio at 16 kHz, ``n=512`` (32 ms) is a balanced starting point.

**Hop size** (*hop*) — controls frame overlap. 75% overlap (``hop = n // 4``) is the standard choice: smooth spectrograms without excessive computation.
## Example

```python
import pyminidsp as md
import numpy as np

sr = 16000.0
N = 16000  # 1 second

# Linear chirp — frequency rises from 200 Hz to 4 kHz
signal = md.chirp_linear(N, amplitude=1.0, f_start=200.0, f_end=4000.0,
                         sample_rate=sr)

n = 512
hop = 128
spec = md.stft(signal, n=n, hop=hop)
# spec.shape == (num_frames, n // 2 + 1)

num_frames = md.stft_num_frames(N, n, hop)

# Convert bin k to Hz:        freq_hz = k * sr / n
# Convert frame f to seconds: time_s = f * hop / sr
```

## Converting to dB

Normalise by *n* before taking the log so that a full-scale sine (amplitude 1) reads near 0 dB:

```python
spec_db = 20 * np.log10(spec / n + 1e-12)
```

## Visualisation

The linear chirp appears as a diagonal stripe rising across the time-frequency plane.

```python
md.shutdown()
```

---

# Mel Filterbanks & MFCCs

Two essential features for speech and audio machine learning:

1. **Mel filterbank energies** — triangular spectral bands spaced on the mel scale, which compresses frequency representation to match human hearing.
2. **MFCCs** — decorrelated coefficients derived from the log mel energies via a DCT, widely used in speech recognition and audio classification.

## Mel scale

The HTK mapping:

$$ \text{mel}(f) = 2595 \cdot \log_{10}\!\left(1 + \frac{f}{700}\right) $$

This densifies low frequencies and coarsens high frequencies, reflecting how humans perceive pitch.

## Building a mel filterbank

```python
import pyminidsp as md

fb = md.mel_filterbank(512, sample_rate=16000.0, num_mels=26)
# fb.shape == (26, 257) — 26 triangular filters over 257 FFT bins
```

## Computing mel energies

From a single frame:

```python
signal = md.sine_wave(512, freq=440.0, sample_rate=16000.0)
mel = md.mel_energies(signal, sample_rate=16000.0, num_mels=26)
# mel.shape == (26,)
```

Processing steps (internally):

1. Apply a Hann window.
2. Compute one-sided PSD bins via FFT: ``|X(k)|² / N``.
3. Apply mel filterbank weights and sum per band.
## Computing MFCCs ```python coeffs = md.mfcc(signal, sample_rate=16000.0, num_mels=26, num_coeffs=13) # coeffs.shape == (13,) ``` Conventions: - HTK mel mapping for filter placement. - Natural-log compression: ``log(max(E_mel, 1e-12))``. - DCT-II with HTK-C0 normalisation. - Coefficient C0 is in ``coeffs[0]``. ## Processing a full utterance To extract MFCCs from a longer signal, use the STFT to break it into frames first: ```python import numpy as np sr = 16000.0 frame_size = 512 hop = 128 # Load or generate a signal signal = md.chirp_linear(int(sr), f_start=200.0, f_end=4000.0, sample_rate=sr) num_frames = md.stft_num_frames(len(signal), frame_size, hop) all_mfcc = np.zeros((num_frames, 13)) for f in range(num_frames): start = f * hop frame = signal[start:start + frame_size] all_mfcc[f] = md.mfcc(frame, sample_rate=sr, num_mels=26, num_coeffs=13) md.shutdown() ``` --- # Pitch Detection Two methods for estimating the fundamental frequency (F0) of a signal. ## Autocorrelation method Searches for the strongest peak in the normalised autocorrelation: $$ f_0 = \frac{f_s}{\tau_\text{peak}} $$ More robust for noisy or strongly harmonic signals. ```python import pyminidsp as md signal = md.sine_wave(4096, freq=200.0, sample_rate=16000.0) f0 = md.f0_autocorrelation(signal, sample_rate=16000.0, min_freq_hz=80.0, max_freq_hz=400.0) print(f"Estimated F0: {f0:.1f} Hz") # ≈ 200.0 ``` ## FFT peak-picking method Applies a Hann window, computes the magnitude spectrum, and identifies the dominant peak in the requested frequency range: $$ f_0 = \frac{k_\text{peak} \cdot f_s}{N} $$ Simple and fast, but can lock onto harmonics (2f0, 3f0) when the fundamental is weak. ```python f0 = md.f0_fft(signal, sample_rate=16000.0, min_freq_hz=80.0, max_freq_hz=400.0) print(f"Estimated F0: {f0:.1f} Hz") # ≈ 200.0 ``` ## Practical notes - **Search range** is critical for both methods. Use prior knowledge of the expected pitch range (e.g. 80–400 Hz for speech). 
- A return value of **0.0** means no reliable F0 was found — typically silence, unvoiced speech, or noisy frames. - Longer frames improve frequency resolution but reduce time resolution. ```python md.shutdown() ``` --- # FIR Filters & Convolution Four complementary methods for filtering and convolution, from educational time-domain approaches to efficient FFT-based processing. ## Time-domain convolution For signals of length *N* and kernels of length *M*, computes the full linear convolution. Output length is ``N + M - 1``. ```python import pyminidsp as md import numpy as np signal = md.impulse(100, amplitude=1.0, position=0) kernel = np.array([1.0, 2.0, 3.0]) out = md.convolution_time(signal, kernel) # out[:3] == [1.0, 2.0, 3.0] # len(out) == 102 ``` ## Moving-average filter A simple low-pass filter that computes the running mean over a window. Output matches input length with zero-padded startup. ```python signal = md.sine_wave(1024, freq=440.0, sample_rate=44100.0) smoothed = md.moving_average(signal, window_len=5) ``` ## General FIR filter Apply a causal FIR filter with arbitrary coefficients: $$ \text{out}[n] = \sum_{k=0}^{T-1} \text{coeffs}[k] \cdot \text{signal}[n-k] $$ where *T* is the number of filter taps (``len(coeffs)``). Output matches input length. ```python coeffs = np.array([0.25, 0.5, 0.25]) filtered = md.fir_filter(signal, coeffs) ``` ## FFT overlap-add Same result as time-domain convolution but **much faster for long kernels** by processing blocks in the frequency domain. 
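The overlap-add scheme can be sketched in plain NumPy — an illustration of the idea, not the library's implementation (`convolution_fft_ola_sketch` and its block size handling are assumptions):

```python
import numpy as np

def convolution_fft_ola_sketch(signal, kernel, block_len=256):
    """Overlap-add FFT convolution: convolve each block in the
    frequency domain and add the overlapping tails together."""
    signal = np.asarray(signal, dtype=np.float64)
    kernel = np.asarray(kernel, dtype=np.float64)
    n, m = len(signal), len(kernel)
    fft_len = 1
    while fft_len < block_len + m - 1:  # next power of two
        fft_len *= 2
    H = np.fft.rfft(kernel, fft_len)    # kernel spectrum, computed once
    out = np.zeros(n + m - 1)
    for start in range(0, n, block_len):
        block = signal[start:start + block_len]
        # circular convolution of length fft_len >= len(block) + m - 1
        # equals the linear convolution of the block with the kernel
        seg = np.fft.irfft(np.fft.rfft(block, fft_len) * H, fft_len)
        end = min(start + len(block) + m - 1, len(out))
        out[start:end] += seg[:end - start]  # overlap-add the tail
    return out
```

Because `fft_len >= block_len + m - 1`, each block's circular convolution is free of wrap-around, and the tails that spill past each block are simply added into the next block's region.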
```python kernel = md.hann_window(256) out_time = md.convolution_time(signal, kernel) out_fft = md.convolution_fft_ola(signal, kernel) np.testing.assert_allclose(out_time, out_fft, atol=1e-10) ``` ## Comparison | Method | Complexity | Output length | Best for | |---|---|---|---| | ``convolution_time`` | O(NM) | N + M − 1 | Teaching, short kernels | | ``moving_average`` | O(N) | N | Simple smoothing | | ``fir_filter`` | O(NM) | N | Standard FIR design | | ``convolution_fft_ola`` | O(N log N) | N + M − 1 | Long kernels, production | ```python md.shutdown() ``` --- # Simple Audio Effects Three foundational audio effects built on delay lines. ## Delay / echo A circular buffer with feedback creates repeating echoes that decay geometrically: $$ \begin{aligned} s[n] &= x[n] + \text{feedback} \cdot s[n - D] \\ y[n] &= \text{dry} \cdot x[n] + \text{wet} \cdot s[n - D] \end{aligned} $$ ```python import pyminidsp as md signal = md.sine_wave(44100, freq=440.0, sample_rate=44100.0) echoed = md.delay_echo(signal, delay_samples=4410, feedback=0.5, dry=1.0, wet=0.5) ``` ## Tremolo Amplitude modulation by a sinusoidal LFO. The gain oscillates between ``1 - depth`` and ``1``: $$ g[n] = (1 - d) + d \cdot \frac{1 + \sin(2\pi f_\text{LFO} n / f_s)}{2} $$ ```python tremmed = md.tremolo(signal, rate_hz=5.0, depth=0.5, sample_rate=44100.0) ``` ## Comb-filter reverb Feeds delayed output back into itself, creating closely-spaced echoes that simulate reverberation: $$ \begin{aligned} c[n] &= x[n] + \text{feedback} \cdot c[n - D] \\ y[n] &= \text{dry} \cdot x[n] + \text{wet} \cdot c[n] \end{aligned} $$ ```python reverbed = md.comb_reverb(signal, delay_samples=1000, feedback=0.5, dry=1.0, wet=0.3) ``` ## Verification tips - **Impulse response:** feed an impulse through each effect. Echoes should decay predictably based on the feedback value. - **Parameter extremes:** ``depth=0`` for tremolo should return the original signal unchanged. 
- **Feedback = 0:** all effects should produce a single delayed copy (no ringing). ```python md.shutdown() ``` --- # DTMF Tone Detection & Generation Dual-Tone Multi-Frequency (DTMF) is the signalling system used by touch-tone telephones. Each keypad button is encoded as a pair of sinusoids — one from a low-frequency "row" group and one from a high-frequency "column" group: | | 1209 Hz | 1336 Hz | 1477 Hz | 1633 Hz | |---|---|---|---|---| | 697 Hz | 1 | 2 | 3 | A | | 770 Hz | 4 | 5 | 6 | B | | 852 Hz | 7 | 8 | 9 | C | | 941 Hz | \* | 0 | # | D | The frequencies were chosen to avoid harmonic relationships, preventing false detections from speech. ## Timing standards ITU-T Q.24 specifies: - Minimum tone duration: **40 ms** - Minimum inter-digit pause: **40 ms** Practical systems typically use 70–120 ms for both. ## Generating tones ```python import pyminidsp as md # Generate "5551234" at 8 kHz with 70 ms tones and pauses sig = md.dtmf_generate("5551234", sample_rate=8000.0, tone_ms=70, pause_ms=70) ``` Each digit is rendered as the sum of its row and column sinusoids at amplitude 0.5 (peak combined amplitude = 1.0). ## Detecting tones ```python tones = md.dtmf_detect(sig, sample_rate=8000.0) for digit, start_s, end_s in tones: print(f"{digit} {start_s:.3f}–{end_s:.3f} s") ``` Detection uses a sliding Hann-windowed FFT with a state machine that enforces ITU-T Q.24 minimum timing. The FFT size is the largest power of two fitting within 35 ms (e.g. 256 at 8 kHz, giving 31.25 Hz resolution). ## Round-trip verification ```python digits = "5551234" sig = md.dtmf_generate(digits, sample_rate=8000.0) detected = md.dtmf_detect(sig, sample_rate=8000.0) result = "".join(t[0] for t in detected) assert result == digits md.shutdown() ``` --- # Shepard Tone A Shepard tone is an acoustic illusion — a sound that appears to continuously rise (or fall) in pitch without ever actually leaving its frequency range. 
Cognitive scientist Roger Shepard first described this effect in 1964. It mirrors an M.C. Escher staircase: listeners perceive endless ascending motion that never reaches its destination. ## How it works The illusion relies on two principles: 1. **Octave equivalence** — the human ear perceives tones one octave apart as the "same note" at a different pitch height. 2. **Spectral envelope** — a fixed Gaussian curve in log-frequency space controls loudness. Tones near the centre are loud; those at edges fade nearly silent. Multiple sine waves — each separated by one octave — sound simultaneously while gliding upward. As tones fade at the upper edge, new tones enter at the bottom, fading in. The loudest tones always occupy the middle and move upward, so the sound seems to ascend perpetually. ## Signal model $$ x[n] = A_\text{norm}\sum_k \exp\!\left(-\frac{d_k(t)^2}{2\sigma^2}\right) \sin(\varphi_k(n)) $$ where the octave distance from the Gaussian centre is $$ d_k(t) = k - c + R\,t, \quad c = \frac{L-1}{2}, \quad \sigma = \frac{L}{4} $$ and the instantaneous frequency of layer *k* is $f_k(t) = f_\text{base} \cdot 2^{d_k(t)}$. Phase is accumulated sample-by-sample for smooth glides. ## Example ```python import pyminidsp as md # 5 seconds of endlessly rising Shepard tone at 44.1 kHz sig = md.shepard_tone(5 * 44100, amplitude=0.8, base_freq=440.0, sample_rate=44100.0, rate_octaves_per_sec=0.5, num_octaves=8) ``` ## Key parameters **Glissando rate** (``rate_octaves_per_sec``): - ``0.0`` — static chord (no motion) - ``0.5`` — moderate rise (default) - Negative values → falling Shepard tone **Number of octaves** (``num_octaves``): - 4–6 — narrow, organ-like quality - 8 — balanced (default) - 10–12 — ethereal, diffuse texture **Base frequency** (``base_freq``): centres the Gaussian envelope. Typical values: 200–600 Hz. 
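The signal model translates almost line-for-line into NumPy. The sketch below (hypothetical `shepard_tone_sketch`, not the library code) adds one assumption the equations leave implicit: each layer's octave distance wraps from the top edge back to the bottom so the glide can continue indefinitely.

```python
import numpy as np

def shepard_tone_sketch(num_samples, amplitude=0.8, base_freq=440.0,
                        sample_rate=44100.0, rate_octaves_per_sec=0.5,
                        num_octaves=8):
    L = num_octaves
    c = (L - 1) / 2.0          # Gaussian centre, in octaves
    sigma = L / 4.0            # Gaussian width
    t = np.arange(num_samples) / sample_rate
    out = np.zeros(num_samples)
    for k in range(L):
        d = k - c + rate_octaves_per_sec * t   # octave distance from centre
        # wrap layers past the top edge back to the bottom (assumption)
        d = ((d + L / 2.0) % L) - L / 2.0
        f = base_freq * 2.0 ** d               # instantaneous frequency
        phase = 2.0 * np.pi * np.cumsum(f) / sample_rate  # accumulated phase
        out += np.exp(-d ** 2 / (2.0 * sigma ** 2)) * np.sin(phase)
    return amplitude * out / np.max(np.abs(out))  # normalise to peak amplitude
```

Accumulating phase with `cumsum` keeps the waveform continuous even though the frequency of a wrapping layer jumps; the Gaussian envelope is smallest at the wrap point, which is what hides the transition.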
```python # Slowly falling Shepard tone falling = md.shepard_tone(44100 * 3, amplitude=0.8, base_freq=300.0, sample_rate=44100.0, rate_octaves_per_sec=-0.3, num_octaves=10) md.shutdown() ``` --- # Spectrogram Text Art Synthesise audio that displays **readable text** when viewed as a spectrogram — time runs horizontally, frequency vertically. ## How it works 1. Each ASCII character (32–126) is rasterised with a built-in 5 × 7 bitmap font, spaced 3 columns apart. 2. Each bitmap column becomes a time slice. 3. Each "on" pixel becomes a sine wave at the corresponding frequency between *freq_lo* and *freq_hi* (top row → highest frequency, bottom row → lowest, linearly interpolated). 4. A 3 ms raised-cosine crossfade at column boundaries suppresses clicks. 5. The output is normalised to 0.9 peak amplitude. ## Example ```python import pyminidsp as md sig = md.spectrogram_text("HELLO", freq_lo=200.0, freq_hi=7500.0, duration_sec=2.0, sample_rate=16000.0) # View the spectrogram of `sig` to see "HELLO" spelled out # in the frequency domain. ``` The result sounds like a buzzy chord, but when analysed with a spectrogram viewer (1024-point FFT at 16 kHz), the text is clearly visible. ## Tips - Use a sample rate of at least 16 kHz and keep *freq_hi* below Nyquist. - Longer *duration_sec* stretches the text horizontally — easier to read in spectrograms. - Short strings work best (the 5 × 7 font has limited resolution). ```python md.shutdown() ``` --- # Voice Activity Detection Voice activity detection (VAD) is the task of determining whether an audio frame contains speech or silence. It is a fundamental building block in speech processing pipelines — from automatic speech recognition to noise-aware audio analysis. pyminidsp provides a frame-level VAD that extracts five features per frame, normalizes them adaptively, computes a weighted score, and applies an onset/hangover state machine. 
## Features The detector extracts five features from each audio frame: **Energy** — sum of squared samples. Silence has near-zero energy; speech has high energy. Energy alone fails in moderate noise. $$ E = \sum_{n=0}^{N-1} x[n]^{2} $$ **Zero-crossing rate (ZCR)** — fraction of consecutive samples that cross zero. Voiced speech has low ZCR; unvoiced fricatives have high ZCR; silence has low ZCR. $$ \text{ZCR} = \frac{1}{N-1} \sum_{n=1}^{N-1} \mathbf{1}\!\bigl[\operatorname{sgn}(x[n]) \neq \operatorname{sgn}(x[n-1])\bigr] $$ **Spectral entropy** — how spread out the energy is across frequency bins. Speech has lower spectral entropy (energy concentrated in harmonics); noise has higher spectral entropy (energy spread evenly). $$ H = -\frac{1}{\ln K} \sum_{k=0}^{K-1} p_k \ln p_k \qquad\text{where } p_k = \frac{\text{PSD}[k]}{\sum_j \text{PSD}[j]} $$ **Spectral flatness** — ratio of the geometric mean to the arithmetic mean of the power spectrum. White noise gives SF ≈ 1; a pure tone gives SF ≈ 0. Speech falls between. $$ \text{SF} = \frac{\bigl(\prod_{k=0}^{K-1} \text{PSD}[k]\bigr)^{1/K}} {\frac{1}{K}\sum_{k=0}^{K-1} \text{PSD}[k]} $$ **Band energy ratio** — fraction of total energy that falls within the speech band (default 300–3400 Hz, telephone bandwidth). $$ \text{BER} = \frac{\sum_{k:\,f_k \in [f_\text{lo},\,f_\text{hi}]} \text{PSD}[k]} {\sum_k \text{PSD}[k]} $$ ## Adaptive normalization Raw feature values vary widely across recordings. The detector tracks per-feature minimums and maximums using an exponential moving average (EMA) and normalizes each feature to [0, 1]: $$ m_i \leftarrow m_i + \alpha\,(f_i - m_i) \qquad M_i \leftarrow M_i + \alpha\,(f_i - M_i) $$ $$ \hat{f}_i = \text{clamp}\!\Bigl(\frac{f_i - m_i}{M_i - m_i},\; 0,\; 1\Bigr) $$ The adaptation rate α (default 0.01) controls how fast the normalization adjusts. Calling `pyminidsp.VAD.calibrate` with known silence seeds the EMA estimates for faster convergence. 
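The adaptive normalization can be sketched as a standalone class. This is an illustration only: the snap-to-new-extremes behaviour on fresh minima/maxima is an assumption, and the C implementation's exact update policy may differ.

```python
class EmaNormalizer:
    """Track running min/max estimates with an EMA and map values to [0, 1]."""

    def __init__(self, alpha=0.01):
        self.alpha = alpha
        self.m = None   # running minimum estimate
        self.M = None   # running maximum estimate

    def normalize(self, f):
        if self.m is None:              # seed on the first frame
            self.m, self.M = f, f
        # drift toward the new value; snap outward on a new extreme
        self.m = min(f, self.m + self.alpha * (f - self.m))
        self.M = max(f, self.M + self.alpha * (f - self.M))
        if self.M <= self.m:
            return 0.0
        return min(max((f - self.m) / (self.M - self.m), 0.0), 1.0)
```

Calibrating with known silence corresponds to calling `normalize` on silence-frame feature values first, so the minimum estimate is seeded before real audio arrives.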
## Weighted scoring The five normalized features are combined into a single score: $$ S = \sum_{i=0}^{4} w_i \cdot \hat{f}_i $$ By default all weights are equal (0.2 each). You can emphasize specific features — for example, weighting energy heavily for clean environments, or spectral entropy for noisy ones. ## State machine A raw score above the threshold does not immediately trigger speech. An onset/hangover state machine smooths the decision: | Current State | Condition | Action | |---|---|---| | SILENCE | score ≥ threshold | Increment onset counter | | SILENCE | onset counter ≥ ``onset_frames`` | Transition to SPEECH | | SILENCE | score < threshold | Reset onset counter | | SPEECH | score ≥ threshold | Reset hangover counter to ``hangover_frames`` | | SPEECH | score < threshold | Decrement hangover counter | | SPEECH | hangover counter reaches 0 | Transition to SILENCE | **Onset gating** prevents transient clicks from triggering false positives — the score must exceed the threshold for ``onset_frames`` consecutive frames (default 3). **Hangover** bridges brief dips mid-utterance, holding the speech state for ``hangover_frames`` frames (default 15) after activity drops. ## Creating a detector The `pyminidsp.VAD` class wraps the stateful C implementation. All parameters are optional — omitted values use sensible defaults. ```python import pyminidsp as md # Default parameters detector = md.VAD() # Custom threshold and hangover detector = md.VAD(threshold=0.4, hangover_frames=20) # Custom feature weights (energy, ZCR, spectral entropy, # spectral flatness, band energy ratio) detector = md.VAD(weights=[0.4, 0.1, 0.1, 0.1, 0.3]) ``` ## Calibrating with silence Before processing live audio, feed a few frames of known silence to seed the adaptive normalization. This improves accuracy, especially in the first few frames. 
```python import numpy as np sr = 16000.0 frame_len = 320 # 20 ms at 16 kHz silence = np.zeros(frame_len) for _ in range(10): detector.calibrate(silence, sample_rate=sr) ``` ## Frame-by-frame processing `pyminidsp.VAD.process_frame` processes a single frame and returns a ``(decision, score, features)`` tuple. ```python frame = md.sine_wave(frame_len, amplitude=1.0, freq=1000.0, sample_rate=sr) decision, score, features = detector.process_frame(frame, sr) print(f"Decision: {'speech' if decision else 'silence'}") print(f"Score: {score:.3f}") print(f"Features: {features}") ``` - **decision** — ``1`` for speech, ``0`` for silence. - **score** — weighted combination of normalized features in [0.0, 1.0]. - **features** — float64 array of length 5 with normalized feature values. ## Batch processing `pyminidsp.VAD.process` segments a signal into non-overlapping frames and processes each one, returning arrays. ```python # 1 second of signal at 16 kHz signal = md.sine_wave(16000, amplitude=1.0, freq=1000.0, sample_rate=sr) decisions, scores, features = detector.process(signal, sr, frame_len=320) print(f"Frames processed: {len(decisions)}") print(f"Speech frames: {decisions.sum()}") print(f"Features shape: {features.shape}") # (50, 5) ``` ## End-to-end example The following example creates a synthetic signal with two speech-like bursts separated by silence, runs the VAD, and prints per-frame results: ```python import numpy as np import pyminidsp as md sr = 16000.0 frame_len = 320 # 20 ms # Build signal: silence → tone → silence → tone → silence seg = int(0.3 * sr) # 300 ms segments signal = np.concatenate([ np.zeros(seg), md.sine_wave(seg, amplitude=0.8, freq=1000.0, sample_rate=sr), np.zeros(seg), md.sine_wave(seg, amplitude=0.8, freq=1000.0, sample_rate=sr), np.zeros(seg), ]) detector = md.VAD() # Calibrate with leading silence for i in range(10): frame = signal[i * frame_len:(i + 1) * frame_len] detector.calibrate(frame, sample_rate=sr) # Process decisions, scores, features = detector.process(signal, sr, frame_len) for i in range(len(decisions)): t = (i * frame_len + frame_len / 2) / sr label = "SPEECH" if decisions[i] else "silence" print(f" {t:6.3f} s score={scores[i]:.3f} {label}") ``` ## Visualisation Plotting the example above is the easiest way to inspect the detector. Four panels are useful: the per-frame peak envelope, all five normalized features, the combined score against the threshold, and the final binary decision. ## Tuning parameters The default parameters work well for clean speech at 16 kHz. For noisy environments, you may need to adjust: - **threshold** (default 0.5) — lower values increase sensitivity. - **onset_frames** (default 3) — number of consecutive above-threshold frames required to confirm speech; raise it to reject more transients. - **hangover_frames** (default 15) — how long to hold the speech state after activity drops. - **adaptation_rate** (default 0.01) — EMA learning rate for normalization. Lower values track slower-changing environments. - **band_low_hz / band_high_hz** (default 300–3400 Hz) — frequency band for the band energy ratio feature. - **weights** (default 0.2 each) — per-feature weights. Weight energy heavily for clean environments, or spectral entropy for noisy ones. --- # Audio Steganography Hide secret messages or binary data within audio signals so that casual listeners hear only the original sound, while decoders can extract the hidden payload. 
## Three methods | Method | Capacity | Audibility | Robustness | Requirement | |---|---|---|---|---| | **LSB** | ~1 bit/sample (~16 KB / 3 s @ 44.1 kHz) | Inaudible (≈ −90 dB) | Fragile (destroyed by lossy compression, resampling) | Any sample rate | | **Frequency-band** | ~2.6 kbit/s (~121 bytes / 3 s @ 44.1 kHz) | Above most listeners' hearing | Moderate (survives mild noise) | sample_rate ≥ 40 kHz | | **Spectrogram text** | ~1 bit/sample (same as LSB) | Audible as buzzy tones; visually readable in spectrogram | Fragile (same as LSB) | Any sample rate | **LSB** flips the least-significant bit of a 16-bit PCM representation — distortion ≈ −90 dB. Best for lossless pipelines (WAV, FLAC). **Frequency-band** encodes data as brief BFSK tone bursts at 18.5 kHz (bit 0) or 19.5 kHz (bit 1). Choose this when light interference is expected. **Spectrogram text** is a hybrid method that hides data via LSB encoding *and* renders the message as readable text in a spectrogram view. The message is rasterised with a built-in bitmap font, and sine waves at corresponding frequencies produce visible characters when viewed with a spectrogram analyser. **See also:** the Spectrogram Text Art guide for details on the synthesis function, and the upstream miniDSP C library's audio steganography documentation for algorithm details and C-level examples. ## Message structure All three methods prepend a **32-bit little-endian header**: bits 0–30 hold the byte count, bit 31 indicates payload type (0 = text, 1 = binary). This lets the decoder recover messages without prior knowledge of length. 
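The header layout can be illustrated with Python's `struct` module. This is a sketch of the bit layout only — the library handles this internally, and the helper names here are hypothetical:

```python
import struct

TYPE_TEXT, TYPE_BINARY = 0, 1

def pack_header(num_bytes, payload_type):
    """Bits 0-30: payload byte count; bit 31: payload type flag."""
    assert 0 <= num_bytes < (1 << 31)
    word = (payload_type << 31) | num_bytes
    return struct.pack("<I", word)            # little-endian 32-bit word

def unpack_header(raw):
    (word,) = struct.unpack("<I", raw)
    return word & 0x7FFFFFFF, word >> 31      # (byte count, payload type)
```
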
## Hiding text ```python import pyminidsp as md host = md.sine_wave(44100, amplitude=0.8, freq=440.0, sample_rate=44100.0) stego, n = md.steg_encode(host, "secret message", sample_rate=44100.0, method=md.STEG_LSB) print(f"Encoded {n} bytes") ``` The LSB-encoded output sounds identical to the 440 Hz host; frequency-band encoding adds faint high-frequency tones. Spectrogram text encoding works the same way — just pass ``method=md.STEG_SPECTEXT``: ```python stego_st, n = md.steg_encode(host, "HELLO", sample_rate=44100.0, method=md.STEG_SPECTEXT) print(f"Encoded {n} bytes (visible in spectrogram)") ``` ## Recovering text ```python recovered = md.steg_decode(stego, sample_rate=44100.0, method=md.STEG_LSB) print(recovered) # "secret message" # Recover from spectrogram-text encoded signal recovered_st = md.steg_decode(stego_st, sample_rate=44100.0, method=md.STEG_SPECTEXT) print(recovered_st) # "HELLO" ``` ## Binary data ```python data = b"\x00\x01\x02\xff\xfe\xfd" stego, n = md.steg_encode_bytes(host, data, sample_rate=44100.0) recovered = md.steg_decode_bytes(stego, sample_rate=44100.0) assert recovered == data ``` ## Automatic detection ```python method, payload_type = md.steg_detect(stego, sample_rate=44100.0) if method is not None: names = {md.STEG_LSB: "LSB", md.STEG_FREQ_BAND: "Freq-band", md.STEG_SPECTEXT: "Spectrogram-text"} print(f"Method: {names[method]}") print(f"Type: {'text' if payload_type == md.STEG_TYPE_TEXT else 'binary'}") ``` ## Capacity check ```python cap = md.steg_capacity(44100, sample_rate=44100.0, method=md.STEG_LSB) print(f"Can hide up to {cap} bytes") md.shutdown() ``` ---
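To see why LSB embedding is inaudible yet fragile, here is the principle in plain NumPy, independent of pyminidsp — quantise to 16-bit PCM, overwrite the least-significant bits with message bits, and convert back. `lsb_embed`/`lsb_extract` are hypothetical names, and the library's bit ordering and header handling are not reproduced here.

```python
import numpy as np

def lsb_embed(host, payload: bytes):
    """Hide payload bits in the LSBs of a 16-bit PCM rendering of host."""
    pcm = np.round(np.clip(host, -1.0, 1.0) * 32767).astype(np.int32)
    bits = np.unpackbits(np.frombuffer(payload, dtype=np.uint8))
    assert len(bits) <= len(pcm), "payload too large for host"
    pcm[:len(bits)] = (pcm[:len(bits)] & ~1) | bits   # overwrite the LSBs
    return pcm.astype(np.float64) / 32767.0

def lsb_extract(stego, num_bytes):
    """Re-quantise and read the LSBs back out."""
    pcm = np.round(np.asarray(stego) * 32767.0).astype(np.int32)
    bits = (pcm[:num_bytes * 8] & 1).astype(np.uint8)
    return np.packbits(bits).tobytes()
```

Each sample changes by at most one quantisation step (~1/32767, on the order of −90 dB), which is why the output sounds identical; any resampling or lossy re-encoding perturbs samples by more than that step, destroying the payload.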