Voice Activity Detection
16000 Hz sample rate | frame size 256 | duration 2.0 s
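
The parameters above imply a specific frame layout, which can be worked out with a little arithmetic. The sketch below assumes that the frame size of 256 is given in samples and that frames do not overlap; neither is stated in the original line. The function name `frame_layout` is hypothetical and used only for illustration.

```python
SAMPLE_RATE_HZ = 16000  # from the parameter line
FRAME_SIZE = 256        # assumption: measured in samples
DURATION_S = 2.0        # from the parameter line


def frame_layout(sample_rate_hz, frame_size, duration_s):
    """Return (total_samples, frame_ms, n_frames, leftover_samples).

    Assumes non-overlapping frames (hop size equal to frame size).
    """
    total_samples = int(round(sample_rate_hz * duration_s))
    frame_ms = 1000.0 * frame_size / sample_rate_hz
    n_frames = total_samples // frame_size
    leftover = total_samples - n_frames * frame_size
    return total_samples, frame_ms, n_frames, leftover


print(frame_layout(SAMPLE_RATE_HZ, FRAME_SIZE, DURATION_S))
# → (32000, 16.0, 125, 0)
```

Under these assumptions, the 2.0 s signal holds 32000 samples, each frame spans 16 ms, and the signal divides into exactly 125 frames with no samples left over. With overlapping frames the frame count would be higher.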