miniDSP
A small C library for audio DSP
minidsp_vad.c File Reference

Voice Activity Detection (VAD) with adaptive feature normalization. More...

#include "minidsp.h"
#include "minidsp_internal.h"


Macros

#define RANGE_FLOOR   1e-12

Functions

static double compute_spectral_entropy (const double *psd, unsigned num_bins)
 Spectral entropy: normalize PSD to a probability distribution, return -sum(p * log(p)) / log(num_bins).
static double compute_spectral_flatness (const double *psd, unsigned num_bins)
 Spectral flatness: geometric mean / arithmetic mean of PSD bins.
static double compute_band_energy_ratio (const double *psd, unsigned num_bins, double sample_rate, unsigned N, double band_low_hz, double band_high_hz)
 Band energy ratio: sum of PSD bins in [band_low_hz, band_high_hz] divided by total PSD sum.
static void update_normalization (MD_vad_state *state, const double *raw)
static void normalize_features (const MD_vad_state *state, const double *raw, double *norm_out)
static void extract_features (const double *signal, unsigned N, double sample_rate, double band_low_hz, double band_high_hz, double *raw_out)
void MD_vad_default_params (MD_vad_params *params)
 Populate a VAD params struct with optimized defaults.
void MD_vad_init (MD_vad_state *state, const MD_vad_params *params)
 Initialize VAD state from params.
void MD_vad_calibrate (MD_vad_state *state, const double *signal, unsigned N, double sample_rate)
 Feed a known-silence frame to seed the adaptive normalization.
int MD_vad_process_frame (MD_vad_state *state, const double *signal, unsigned N, double sample_rate, double *score_out, double *features_out)
 Process one audio frame and return a binary speech decision.

Detailed Description

Voice Activity Detection (VAD) with adaptive feature normalization.

Definition in file minidsp_vad.c.

Macro Definition Documentation

◆ RANGE_FLOOR

#define RANGE_FLOOR   1e-12

Definition at line 98 of file minidsp_vad.c.

Function Documentation

◆ compute_band_energy_ratio()

double compute_band_energy_ratio ( const double * psd,
unsigned num_bins,
double sample_rate,
unsigned N,
double band_low_hz,
double band_high_hz )
static

Band energy ratio: sum of PSD bins in [band_low_hz, band_high_hz] divided by total PSD sum.

Result in [0, 1].
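
A minimal sketch of the ratio, not the library's implementation. It assumes that bin k of a one-sided PSD sits at k * sample_rate / N Hz, which this page does not state; the function name is hypothetical.

```c
/* Hypothetical sketch: fraction of total PSD energy inside
 * [low_hz, high_hz]. Assumes bin k is centred at k * sample_rate / N,
 * which is an assumption and not taken from minidsp_vad.c. */
static double band_energy_ratio_sketch(const double *psd, unsigned num_bins,
                                       double sample_rate, unsigned N,
                                       double low_hz, double high_hz)
{
    double total = 0.0;
    double band = 0.0;

    for (unsigned k = 0; k < num_bins; k++) {
        double freq = (double)k * sample_rate / (double)N;
        total += psd[k];
        if (freq >= low_hz && freq <= high_hz)
            band += psd[k];
    }

    /* An all-zero frame has no defined ratio; report 0 instead of NaN. */
    return (total > 0.0) ? band / total : 0.0;
}
```

Because the band sum is a subset of the total sum of non-negative bins, the result stays in [0, 1].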

Definition at line 73 of file minidsp_vad.c.

◆ compute_spectral_entropy()

double compute_spectral_entropy ( const double * psd,
unsigned num_bins )
static

Spectral entropy: normalize PSD to a probability distribution, return -sum(p * log(p)) / log(num_bins).

Result in [0, 1].

Definition at line 17 of file minidsp_vad.c.

◆ compute_spectral_flatness()

double compute_spectral_flatness ( const double * psd,
unsigned num_bins )
static

Spectral flatness: geometric mean / arithmetic mean of PSD bins.

Result in [0, 1]. Values near 1.0 indicate a flat, noise-like spectrum (white noise); values near 0.0 indicate a tonal spectrum (pure tone).

Definition at line 44 of file minidsp_vad.c.

◆ extract_features()

void extract_features ( const double * signal,
unsigned N,
double sample_rate,
double band_low_hz,
double band_high_hz,
double * raw_out )
static

Definition at line 138 of file minidsp_vad.c.

◆ MD_vad_calibrate()

void MD_vad_calibrate ( MD_vad_state * state,
const double * signal,
unsigned N,
double sample_rate )

Feed a known-silence frame to seed the adaptive normalization.

Computes all five features and updates the EMA min/max estimates without running the state machine or producing a decision. Call this on several silence frames before processing live audio to improve initial normalization accuracy.
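
This page does not spell out the EMA min/max update rule, so the following is only one plausible form, shown for a single feature. All names are hypothetical, the snap-to-extreme behaviour is an assumption, and `RANGE_FLOOR_SKETCH` merely mirrors the value of `RANGE_FLOOR` documented above.

```c
#define RANGE_FLOOR_SKETCH 1e-12   /* same value as RANGE_FLOOR above */

/* Hypothetical per-feature range tracker. */
typedef struct {
    double min;
    double max;
} ema_range_sketch;

/* Assumed rule: jump to a new extreme at once, otherwise let each bound
 * drift toward the current value at the given adaptation rate. */
static void ema_range_update(ema_range_sketch *r, double x, double rate)
{
    r->min = (x < r->min) ? x : r->min + rate * (x - r->min);
    r->max = (x > r->max) ? x : r->max + rate * (x - r->max);
}

/* Map x into [0, 1] using the tracked range, flooring the range so a
 * constant feature does not cause a division by zero. */
static double ema_range_normalize(const ema_range_sketch *r, double x)
{
    double range = r->max - r->min;
    if (range < RANGE_FLOOR_SKETCH)
        range = RANGE_FLOOR_SKETCH;

    double y = (x - r->min) / range;
    if (y < 0.0) return 0.0;
    if (y > 1.0) return 1.0;
    return y;
}
```

Under this assumed rule, calibrating on silence pulls the tracked minimum toward the noise floor before any speech frame is scored.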

Parameters
state        VAD state (must be initialized).
signal       Frame samples of length N.
N            Frame length in samples (must be >= 2).
sample_rate  Sample rate in Hz (must be > 0).
// Calibrate on 10 frames of silence
MD_vad_state st;
MD_vad_init(&st, NULL);      // state must be initialized first
double silence[256] = {0};
for (int i = 0; i < 10; i++)
    MD_vad_calibrate(&st, silence, 256, 16000.0);
See also
MD_vad_init(), MD_vad_process_frame()

Definition at line 208 of file minidsp_vad.c.

◆ MD_vad_default_params()

void MD_vad_default_params ( MD_vad_params * params)

Populate a VAD params struct with optimized defaults.

Default values (F2-optimized, recall-biased):

Parameter             Value
weight (energy)       0.723068
weight (zcr)          0.063948
weight (entropy)      0.005964
weight (flatness)     0.048865
weight (band ratio)   0.158156
threshold             0.245332
onset_frames          1
hangover_frames       22
adaptation_rate       0.012755
band_low_hz           126.4
band_high_hz          2899.3
Note
These defaults were optimized via a 300-trial Optuna search on LibriVAD train-clean-100 (all noise types, all SNRs), maximizing F2 (beta=2). Baseline F2=0.837 improved to F2=0.933 (P=0.782, R=0.981). See the VAD tutorial guide for full methodology and per-condition results.
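
As a consistency check on the figures in the note: the F-beta score with beta = 2 weights recall more heavily than precision, and the reported precision and recall reproduce the reported F2.

    \[ F_\beta = \frac{(1 + \beta^2)\, P R}{\beta^2 P + R}, \qquad F_2 = \frac{5 \cdot 0.782 \cdot 0.981}{4 \cdot 0.782 + 0.981} = \frac{3.8357}{4.109} \approx 0.933 \]
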
Parameters
params  Output params struct. Must not be NULL.
MD_vad_params p;
MD_vad_default_params(&p);
p.threshold = 0.4; // raise threshold for more precision
See also
MD_vad_init(), MD_vad_process_frame()

Definition at line 166 of file minidsp_vad.c.

◆ MD_vad_init()

void MD_vad_init ( MD_vad_state * state,
const MD_vad_params * params )

Initialize VAD state from params.

If params is NULL, default params are used (equivalent to calling MD_vad_default_params() first). After initialization the detector is in the SILENCE state with all counters at zero.

Parameters
state   Output state struct. Must not be NULL.
params  Parameters to copy, or NULL for defaults.
MD_vad_state st;
MD_vad_init(&st, NULL); // use defaults
See also
MD_vad_default_params(), MD_vad_process_frame()

Definition at line 187 of file minidsp_vad.c.

◆ MD_vad_process_frame()

int MD_vad_process_frame ( MD_vad_state * state,
const double * signal,
unsigned N,
double sample_rate,
double * score_out,
double * features_out )

Process one audio frame and return a binary speech decision.

Processing pipeline:

  1. Extract five raw features (energy, ZCR, spectral entropy, spectral flatness, band energy ratio).
  2. Update adaptive normalization (EMA min/max).
  3. Normalize features to [0.0, 1.0].
  4. Compute weighted score:

    \[ S = \sum_{i=0}^{4} w_i \cdot \hat{f}_i \]

  5. Apply onset/hangover state machine.
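
Steps 4 and 5 can be sketched as follows. The weighted sum matches the formula above; the state machine is only one plausible reading of "onset/hangover", since this page does not describe the transitions. All type and function names are hypothetical.

```c
#define NUM_FEATURES_SKETCH 5

/* Hypothetical state: starts in silence with all counters at zero. */
typedef struct {
    int speaking;
    unsigned onset_count;
    unsigned hangover_count;
} vad_sm_sketch;

/* Step 4: S = sum over i of w[i] * f[i], with f already normalized. */
static double weighted_score_sketch(const double *w, const double *f)
{
    double s = 0.0;
    for (int i = 0; i < NUM_FEATURES_SKETCH; i++)
        s += w[i] * f[i];
    return s;
}

/* Step 5 (assumed behaviour): enter speech after onset_frames consecutive
 * frames at or above threshold; leave it only after the score has stayed
 * below threshold for more than hangover_frames frames. */
static int vad_sm_step_sketch(vad_sm_sketch *sm, double score,
                              double threshold, unsigned onset_frames,
                              unsigned hangover_frames)
{
    int above = score >= threshold;

    if (!sm->speaking) {
        sm->onset_count = above ? sm->onset_count + 1 : 0;
        if (above && sm->onset_count >= onset_frames) {
            sm->speaking = 1;
            sm->onset_count = 0;
            sm->hangover_count = hangover_frames;
        }
    } else if (above) {
        sm->hangover_count = hangover_frames;   /* refresh the hangover */
    } else if (sm->hangover_count > 0) {
        sm->hangover_count--;                   /* coast through a dip */
    } else {
        sm->speaking = 0;
    }
    return sm->speaking;
}
```

With the documented defaults (onset_frames 1, hangover_frames 22), a scheme like this would flag speech on the first frame above threshold and hold the decision through short pauses, which is consistent with the recall-biased tuning described for MD_vad_default_params().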
Parameters
state         VAD state (must be initialized).
signal        Frame samples of length N.
N             Frame length in samples (must be >= 2).
sample_rate   Sample rate in Hz (must be > 0).
score_out     If non-NULL, receives the combined score.
features_out  If non-NULL, receives MD_VAD_NUM_FEATURES normalized feature values in [0.0, 1.0].
Returns
1 if speech detected, 0 if silence.
MD_vad_state st;
MD_vad_init(&st, NULL);      // state must be initialized first
double frame[256];
double score;
double feats[MD_VAD_NUM_FEATURES];
// ... fill frame ...
int decision = MD_vad_process_frame(&st, frame, 256, 16000.0,
                                    &score, feats);
See also
MD_energy(), MD_zero_crossing_rate(), MD_power_spectral_density()

Definition at line 224 of file minidsp_vad.c.

◆ normalize_features()

void normalize_features ( const MD_vad_state * state,
const double * raw,
double * norm_out )
static

Definition at line 119 of file minidsp_vad.c.

◆ update_normalization()

void update_normalization ( MD_vad_state * state,
const double * raw )
static

Definition at line 100 of file minidsp_vad.c.