miniDSP
A small C library for audio DSP
Voice Activity Detection (VAD) with adaptive feature normalization.
Macros

| Type | Definition |
|---|---|
| #define | RANGE_FLOOR 1e-12 |

Functions

| Return type | Function | Description |
|---|---|---|
| static double | compute_spectral_entropy (const double *psd, unsigned num_bins) | Spectral entropy: normalize PSD to a probability distribution, return -sum(p * log(p)) / log(num_bins). |
| static double | compute_spectral_flatness (const double *psd, unsigned num_bins) | Spectral flatness: geometric mean / arithmetic mean of PSD bins. |
| static double | compute_band_energy_ratio (const double *psd, unsigned num_bins, double sample_rate, unsigned N, double band_low_hz, double band_high_hz) | Band energy ratio: sum of PSD bins in [band_low_hz, band_high_hz] divided by total PSD sum. |
| static void | update_normalization (MD_vad_state *state, const double *raw) | |
| static void | normalize_features (const MD_vad_state *state, const double *raw, double *norm_out) | |
| static void | extract_features (const double *signal, unsigned N, double sample_rate, double band_low_hz, double band_high_hz, double *raw_out) | |
| void | MD_vad_default_params (MD_vad_params *params) | Populate a VAD params struct with optimized defaults. |
| void | MD_vad_init (MD_vad_state *state, const MD_vad_params *params) | Initialize VAD state from params. |
| void | MD_vad_calibrate (MD_vad_state *state, const double *signal, unsigned N, double sample_rate) | Feed a known-silence frame to seed the adaptive normalization. |
| int | MD_vad_process_frame (MD_vad_state *state, const double *signal, unsigned N, double sample_rate, double *score_out, double *features_out) | Process one audio frame and return a binary speech decision. |
Definition in file minidsp_vad.c.
#define RANGE_FLOOR 1e-12
Definition at line 98 of file minidsp_vad.c.
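static double compute_band_energy_ratio (const double *psd, unsigned num_bins, double sample_rate, unsigned N, double band_low_hz, double band_high_hz)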
Band energy ratio: sum of PSD bins in [band_low_hz, band_high_hz] divided by total PSD sum.
Result in [0, 1].
Definition at line 73 of file minidsp_vad.c.
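The ratio described above can be illustrated with a minimal, self-contained sketch. It is not the library's static function: the name `band_energy_ratio_sketch`, the bin-to-frequency mapping (bin `k` at `k * sample_rate / N` Hz), and the zero return for an all-zero spectrum are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of miniDSP).
 * Assumption: psd[k] holds the power at frequency k * sample_rate / N. */
double band_energy_ratio_sketch(const double *psd, unsigned num_bins,
                                double sample_rate, unsigned N,
                                double band_low_hz, double band_high_hz)
{
    assert(psd != NULL && N > 0 && sample_rate > 0.0);

    double total = 0.0; /* sum over all bins */
    double band  = 0.0; /* sum over bins inside the band */

    for (unsigned k = 0; k < num_bins; k++) {
        double freq_hz = (double)k * sample_rate / (double)N;
        total += psd[k];
        if (freq_hz >= band_low_hz && freq_hz <= band_high_hz)
            band += psd[k];
    }

    /* Assumed convention: an all-zero spectrum yields a ratio of 0. */
    return (total > 0.0) ? band / total : 0.0;
}
```

With a flat five-bin PSD, `N = 8`, and `sample_rate = 8000`, the bins sit at 0, 1000, 2000, 3000, and 4000 Hz, so a band of [1000, 2000] Hz covers two of five equal bins and the ratio is 0.4.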
static double compute_spectral_entropy (const double *psd, unsigned num_bins)
Spectral entropy: normalize PSD to a probability distribution, return -sum(p * log(p)) / log(num_bins).
Result in [0, 1].
Definition at line 17 of file minidsp_vad.c.
static double compute_spectral_flatness (const double *psd, unsigned num_bins)
Spectral flatness: geometric mean / arithmetic mean of PSD bins.
Result in [0, 1]. 1.0 = white noise, 0.0 = pure tone.
Definition at line 44 of file minidsp_vad.c.
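static void extract_features (const double *signal, unsigned N, double sample_rate, double band_low_hz, double band_high_hz, double *raw_out)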
Definition at line 138 of file minidsp_vad.c.
void MD_vad_calibrate (MD_vad_state *state, const double *signal, unsigned N, double sample_rate)
Feed a known-silence frame to seed the adaptive normalization.
Computes all five features and updates the EMA min/max estimates without running the state machine or producing a decision. Call this on several silence frames before processing live audio to improve initial normalization accuracy.
| Parameter | Description |
|---|---|
| state | VAD state (must be initialized). |
| signal | Frame samples of length N. |
| N | Frame length in samples (must be >= 2). |
| sample_rate | Sample rate in Hz (must be > 0). |
Definition at line 208 of file minidsp_vad.c.
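To make the idea of EMA min/max estimates concrete, here is one plausible scheme for tracking a feature's range and normalizing against it. Everything in it is an assumption for illustration: the `sk_range` type, the update rule (snap to a new extreme, otherwise drift toward the sample at the adaptation rate), and the use of a floor equal to `RANGE_FLOOR` are not taken from the library source.

```c
#include <assert.h>
#include <stddef.h>

#define SK_RANGE_FLOOR 1e-12 /* same value as RANGE_FLOOR; its role here is assumed */

/* Hypothetical per-feature range tracker (not a miniDSP type). */
typedef struct {
    double min_est; /* running minimum estimate */
    double max_est; /* running maximum estimate */
    int    seeded;  /* 0 until the first sample arrives */
} sk_range;

/* Assumed rule: a new extreme is adopted at once; otherwise the
 * estimate moves toward x by a fraction `rate` (an EMA step). */
void sk_range_update(sk_range *r, double x, double rate)
{
    assert(r != NULL && rate >= 0.0 && rate <= 1.0);
    if (!r->seeded) {
        r->min_est = r->max_est = x;
        r->seeded = 1;
        return;
    }
    if (x < r->min_est) r->min_est = x;
    else                r->min_est += rate * (x - r->min_est);
    if (x > r->max_est) r->max_est = x;
    else                r->max_est += rate * (x - r->max_est);
}

/* Map x into [0, 1] using the tracked range, with a floor on the
 * range to avoid dividing by zero. */
double sk_range_normalize(const sk_range *r, double x)
{
    assert(r != NULL);
    double range = r->max_est - r->min_est;
    if (range < SK_RANGE_FLOOR)
        range = SK_RANGE_FLOOR;
    double v = (x - r->min_est) / range;
    if (v < 0.0) v = 0.0;
    if (v > 1.0) v = 1.0;
    return v;
}
```

Under this scheme, feeding several silence frames before live audio gives the tracker a realistic lower bound, which is consistent with the calibration advice above.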
void MD_vad_default_params (MD_vad_params *params)
Populate a VAD params struct with optimized defaults.
Default values (F2-optimized, recall-biased):
| Parameter | Value |
|---|---|
| weight (energy) | 0.723068 |
| weight (zcr) | 0.063948 |
| weight (entropy) | 0.005964 |
| weight (flatness) | 0.048865 |
| weight (band ratio) | 0.158156 |
| threshold | 0.245332 |
| onset_frames | 1 |
| hangover_frames | 22 |
| adaptation_rate | 0.012755 |
| band_low_hz | 126.4 |
| band_high_hz | 2899.3 |

| Parameter | Description |
|---|---|
| params | Output params struct. Must not be NULL. |
Definition at line 166 of file minidsp_vad.c.
void MD_vad_init (MD_vad_state *state, const MD_vad_params *params)
Initialize VAD state from params.
If params is NULL, default params are used (equivalent to calling MD_vad_default_params() first). After initialization the detector is in the SILENCE state with all counters at zero.
| Parameter | Description |
|---|---|
| state | Output state struct. Must not be NULL. |
| params | Parameters to copy, or NULL for defaults. |
Definition at line 187 of file minidsp_vad.c.
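The onset_frames and hangover_frames parameters suggest a two-state detector that starts in SILENCE. The sketch below shows a common onset/hangover pattern for such a detector. It is an assumption about the general technique, not a transcription of the library's state machine: the `sk_vad` type, the `sk_step` function, and the exact counting rules are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { SK_SILENCE = 0, SK_SPEECH = 1 } sk_state;

/* Hypothetical detector state (not MD_vad_state). */
typedef struct {
    sk_state state;          /* starts in SK_SILENCE */
    unsigned onset_count;    /* consecutive above-threshold frames */
    unsigned hangover_count; /* below-threshold frames still tolerated */
} sk_vad;

/* Advance the detector by one frame; returns 1 for speech, 0 for silence.
 * Assumed rules: SILENCE -> SPEECH after `onset_frames` consecutive
 * active frames; SPEECH is held for `hangover_frames` inactive frames
 * after the last active one. */
int sk_step(sk_vad *v, double score, double threshold,
            unsigned onset_frames, unsigned hangover_frames)
{
    assert(v != NULL);
    int active = (score >= threshold);

    if (v->state == SK_SILENCE) {
        v->onset_count = active ? v->onset_count + 1 : 0;
        if (active && v->onset_count >= onset_frames) {
            v->state = SK_SPEECH;
            v->hangover_count = hangover_frames;
            v->onset_count = 0;
        }
    } else {
        if (active)
            v->hangover_count = hangover_frames; /* refresh the hold */
        else if (v->hangover_count > 0)
            v->hangover_count--;                 /* spend one hold frame */
        else
            v->state = SK_SILENCE;               /* hold exhausted */
    }
    return v->state == SK_SPEECH;
}
```

With `onset_frames = 1` and `hangover_frames = 2`, a single active frame switches the sketch to speech, and the decision stays at speech for two further inactive frames before falling back to silence. A short onset with a long hangover, as in the default parameters, favors recall over precision.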
int MD_vad_process_frame (MD_vad_state *state, const double *signal, unsigned N, double sample_rate, double *score_out, double *features_out)
Process one audio frame and return a binary speech decision.
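Processing pipeline: the combined score \(S\) is the weighted sum of the five normalized features \(\hat{f}_i\), each scaled by its weight \(w_i\):
\[ S = \sum_{i=0}^{4} w_i \cdot \hat{f}_i \]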
| Parameter | Description |
|---|---|
| state | VAD state (must be initialized). |
| signal | Frame samples of length N. |
| N | Frame length in samples (must be >= 2). |
| sample_rate | Sample rate in Hz (must be > 0). |
| score_out | If non-NULL, receives the combined score. |
| features_out | If non-NULL, receives MD_VAD_NUM_FEATURES normalized feature values in [0.0, 1.0]. |
Definition at line 224 of file minidsp_vad.c.
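The weighted sum can be worked through with the documented default weights. The sketch below is self-contained and does not call the library. The names prefixed with `sk_` are hypothetical, and the feature order (energy, zcr, entropy, flatness, band ratio) is assumed to follow the order of the defaults table.

```c
#include <assert.h>
#include <stddef.h>

#define SK_NUM_FEATURES 5

/* Default weights as documented for MD_vad_default_params.
 * Assumed index order: energy, zcr, entropy, flatness, band ratio. */
const double sk_default_weights[SK_NUM_FEATURES] = {
    0.723068, 0.063948, 0.005964, 0.048865, 0.158156
};

/* Documented default decision threshold. */
const double sk_default_threshold = 0.245332;

/* S = sum over i of weights[i] * features[i], features already in [0, 1]. */
double sk_combined_score(const double *weights, const double *features,
                         unsigned num_features)
{
    assert(weights != NULL && features != NULL);
    double score = 0.0;
    for (unsigned i = 0; i < num_features; i++)
        score += weights[i] * features[i];
    return score;
}
```

The default weights sum to 1.000001, so with features in [0, 1] the score also lies in approximately [0, 1]. Because the energy weight alone (0.723068) exceeds the default threshold (0.245332), a frame with a normalized energy of 1 would score above the threshold regardless of the other features, assuming the decision compares the score against the threshold.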
static void normalize_features (const MD_vad_state *state, const double *raw, double *norm_out)
Definition at line 119 of file minidsp_vad.c.
static void update_normalization (MD_vad_state *state, const double *raw)
Definition at line 100 of file minidsp_vad.c.