miniDSP
A small C library for audio DSP
Loading...
Searching...
No Matches
Audio Steganography

Steganography is the practice of hiding a secret message inside an innocuous-looking cover medium. Audio steganography hides data inside an audio signal so that a casual listener hears only the original sound, while a decoder can extract the hidden payload.

miniDSP provides three complementary methods in src/minidsp_steg.c, demonstrated in tools/audio_steg/audio_steg.c:

Method Identifier Capacity Robustness Audibility
LSB (Least Significant Bit) MD_STEG_LSB High (~1 bit/sample) Fragile Inaudible (~-90 dB)
Frequency-band (BFSK) MD_STEG_FREQ_BAND Lower (~2.6 kbit/s) Moderate Near-inaudible (ultrasonic)
Spectrogram text (hybrid) MD_STEG_SPECTEXT ~4 chars/sec visual Fragile (like LSB) Inaudible (ultrasonic)

Build and run the self-test from the repository root:

make -C tools/audio_steg
cd tools/audio_steg && ./audio_steg

Message framing

Both methods prepend a 32-bit little-endian header before the payload. Bits 0–30 hold the message byte count; bit 31 is a payload type flag (0 = text, 1 = binary). This allows the decoder to recover the message without knowing its length in advance, and enables MD_steg_detect() to identify the payload type:

[ bit 31: type flag | bits 0-30: msg_len (LE) ] [ 8 * msg_len bits: payload ]

Each bit of the header and payload is encoded independently using the chosen method. Bits within each byte are transmitted LSB-first.


Method 1: Least Significant Bit (LSB)

The idea

Audio samples are typically stored as 16-bit integers (-32768 to +32767). The least significant bit of each sample contributes only ±1 to a range of 65536 — a change of about -90 dB relative to full scale. By replacing the LSB of each sample with a message bit, we embed data that is completely inaudible.

Signal model

The host signal \(x[n] \in [-1, 1]\) is quantised to 16-bit PCM:

\[p[n] = \mathrm{round}(x[n] \times 32767) \]

The LSB is then overwritten with message bit \(b_k\):

\[p'[n] = (p[n] \mathbin{\&} \sim 1) \mathbin{|} b_k \]

and the stego sample is converted back to double:

\[y[n] = p'[n] \;/\; 32767 \]

The maximum distortion per sample is:

\[|y[n] - x[n]| \leq \frac{1}{32767} \approx 3.05 \times 10^{-5} \]

Reading the formula in C:

// x[n] -> host[i], p[n] -> pcm, b_k -> bit, y[n] -> output[i]
int pcm = (int)(host[i] * 32767.0); // quantise to 16-bit
pcm = (pcm & ~1) | bit; // overwrite LSB
output[i] = (double)pcm / 32767.0; // convert back

Capacity

One bit per sample, minus the 32-bit header:

\[C_{\text{LSB}} = \frac{N - 32}{8} \text{ bytes} \]

where \(N\) is the signal length in samples.

Reading the formula in C:

// N -> signal_len, C_LSB -> capacity
unsigned capacity = (signal_len - 32) / 8;

For a 3-second signal at 44.1 kHz ( \(N = 132300\)):

\[C_{\text{LSB}} = \frac{132300 - 32}{8} = 16533 \text{ bytes} \approx 16 \text{ KB} \]

LSB capacity at 44.1 kHz by audio duration:

Audio duration Samples Max payload Equivalent
1 s 44 100 5 508 B ~5 KB (small config file)
5 s 220 500 27 558 B ~27 KB (web thumbnail)
30 s 1 323 000 165 371 B ~161 KB (high-res photo)
1 min 2 646 000 330 746 B ~323 KB (short PDF)
5 min 13 230 000 1 653 746 B ~1.6 MB (multi-page document)
10 min 26 460 000 3 307 496 B ~3.2 MB (zip archive)
30 min 79 380 000 9 922 496 B ~9.5 MB (high-res image set)
1 hour 158 760 000 19 844 996 B ~18.9 MB (small software package)

Listening comparison

Original host signal (440 Hz sine, 3 seconds):

After LSB encoding (message hidden inside):

The two are perceptually identical. The difference signal (host minus stego) is pure quantisation noise at -90 dB:

Trade-offs

Advantage Disadvantage
Very high capacity Destroyed by any lossy compression (MP3, AAC, Opus)
Zero audible distortion Destroyed by resampling or sample-rate conversion
Simple, fast implementation Destroyed by amplitude scaling or normalisation
Works at any sample rate Requires lossless transport (WAV, FLAC)

Method 2: Frequency-Band Modulation (BFSK)

The idea

Human hearing sensitivity falls off sharply above ~16 kHz, and most adults cannot hear tones above 18 kHz. By adding low-amplitude tones in the 18–20 kHz "near-ultrasonic" band, we can encode data that is effectively inaudible.

The encoding uses Binary Frequency-Shift Keying (BFSK): each bit is represented by a short burst ("chip") of a sinusoidal tone at one of two carrier frequencies.

Carrier frequencies

Bit value Carrier frequency
0 18500 Hz
1 19500 Hz

Both carriers are above the typical hearing threshold, and the 1 kHz separation provides reliable discrimination during decoding.

Chip duration

Each bit occupies a 3 ms chip — a burst of \(C\) samples:

\[C = \left\lfloor \frac{3.0 \times f_s}{1000} \right\rfloor \]

Reading the formula in C:

// C -> chip_samples, fs -> sample_rate
unsigned chip_samples = (unsigned)(3.0 * sample_rate / 1000.0);

At 44.1 kHz, \(C = 132\) samples per chip.

Encoding

For each bit \(b_k\), a sine burst at the selected carrier frequency is added to the host signal at amplitude \(A = 0.02\) (-34 dB):

\[y[n] = x[n] + A \sin\!\bigl(2\pi\, f_{b_k}\, (n - n_0) / f_s\bigr), \qquad n \in [n_0,\; n_0 + C) \]

where \(n_0 = k \cdot C\) is the start sample of chip \(k\) and \(f_{b_k}\) is 18500 Hz (bit 0) or 19500 Hz (bit 1).

Reading the formula in C:

// A -> TONE_AMP (0.02), f_bk -> freq, n0 -> start, fs -> sample_rate
// x[n] -> output[start+s] (already contains host), y[n] -> output[start+s]
for (unsigned s = 0; s < chip_samples; s++) {
double t = (double)s / sample_rate;
output[start + s] += 0.02 * sin(2.0 * M_PI * freq * t);
}

Decoding

Each chip is correlated against both carrier frequencies. The carrier with the larger absolute correlation determines the bit value:

\[r_f = \sum_{s=0}^{C-1} y[n_0 + s] \,\sin\!\bigl(2\pi\, f \, s / f_s\bigr) \]

\[b_k = \begin{cases} 1 & |r_{19500}| > |r_{18500}| \\ 0 & \text{otherwise} \end{cases} \]

Reading the formula in C:

// r_f -> corr_lo / corr_hi, y[n0+s] -> stego[start+s]
double corr_lo = 0.0, corr_hi = 0.0;
for (unsigned s = 0; s < chip_samples; s++) {
double t = (double)s / sample_rate;
corr_lo += stego[start + s] * sin(2.0 * M_PI * 18500.0 * t);
corr_hi += stego[start + s] * sin(2.0 * M_PI * 19500.0 * t);
}
unsigned bit = (fabs(corr_hi) > fabs(corr_lo)) ? 1 : 0;

Capacity

\[C_{\text{freq}} = \frac{\lfloor N / C \rfloor - 32}{8} \text{ bytes} \]

Reading the formula in C:

// N -> signal_len, C -> chip_samples
unsigned total_chips = signal_len / chip_samples;
unsigned capacity = (total_chips - 32) / 8;

At 44.1 kHz with a 3-second signal ( \(N = 132300\), \(C = 132\)):

\[C_{\text{freq}} = \frac{\lfloor 132300 / 132 \rfloor - 32}{8} = \frac{1002 - 32}{8} = 121 \text{ bytes} \]

Listening comparison

After frequency-band encoding (same host, message hidden via BFSK):

The added carriers at 18.5/19.5 kHz are above most listeners' hearing range.

Spectrogram showing the hidden BFSK signal above the 440 Hz host tone:

The faint horizontal bands near the top of the spectrogram are the BFSK carriers. The main 440 Hz tone dominates the audible range.

Trade-offs

Advantage Disadvantage
Survives mild additive noise Lower capacity than LSB
Frequency-domain robustness Requires sample_rate >= 40 kHz
Inaudible to most listeners May be audible to young listeners with excellent high-frequency hearing
Amenable to spectral analysis Vulnerable to low-pass filtering above 18 kHz

Method 3: Spectrogram Text (spectext)

The idea

What if a hidden message were visible to the human eye as well as recoverable by machine? The spectext method combines LSB data encoding (for reliable machine decode) with spectrogram text art in the 18–23.5 kHz ultrasonic band (for visual verification). Open the stego file in any spectrogram viewer and the message is spelled out in the high frequencies — while a listener hears nothing unusual.

The spectrogram art also acts as a tamper indicator: if the text is intact, the LSB data likely is too.

Encode pipeline

host.wav ──┐
(any SR) │
┌──────────────┐ ┌────────────────────┐
│ MD_resample()│────▶│ host @ 48 kHz │
│ to 48 kHz │ └────────┬───────────┘
└──────────────┘ │
┌───────────────────────────┐
│ MD_lowpass_brickwall() │
│ cutoff = original_SR / 2 │
│ (eliminates resampler │
│ spectral images) │
└────────────┬──────────────┘
"SECRET" ──┐ │
▼ │
┌──────────────────────┐ │
│MD_spectrogram_text() │ │
│ freq: 18–23.5 kHz │ │
│ 30 ms / column │ │
│ amplitude: 0.02 │ │
└──────────┬───────────┘ │
│ mix (add) │
▼ ▼
┌──────────────────────────┐
│ host + spectrogram art │
└────────────┬─────────────┘
│ LSB encode (last step)
stego.wav (48 kHz)

The spectrogram art is mixed into the host before LSB encoding, so the LSB bits remain undisturbed. Decode simply reads the LSB channel.

Automatic upsampling and spectral cleanup

The spectrogram art uses the 18–23.5 kHz band, which requires a Nyquist frequency of at least 23.5 kHz (sample rate >= 47 kHz). If the input host is below 48 kHz, the encoder automatically upsamples it using MD_resample(). After upsampling, MD_lowpass_brickwall() is applied at the original Nyquist frequency to eliminate any residual spectral images from the resampler's transition band. This ensures the 18–23.5 kHz band is completely clean before the spectrogram text is mixed in, so the hidden message is clearly readable in any spectrogram viewer. The output is always 48 kHz.

Fixed column width and capacity

Each character in the bitmap font is 8 columns wide (5 data + 3 spacing). Each column occupies a fixed 30 ms of audio, giving 240 ms per character:

\[C_{\text{spectext}} = \left\lfloor \frac{D}{0.24} \right\rfloor \text{ characters} \]

where \(D\) is the host signal duration in seconds.

Reading the formula in C:

// D -> duration_sec, C_spectext -> vis_chars
double duration_sec = (double)signal_len / sample_rate;
unsigned vis_chars = (unsigned)(duration_sec / 0.24);

Visual capacity by audio duration:

Audio duration Max visible chars
3 s 12
10 s 41
30 s 125
60 s 250

Frequency mapping and amplitude

The 7 rows of the 5x7 bitmap font are mapped linearly across the 18–23.5 kHz band. Row 0 (top of character) maps to 23.5 kHz; row 6 (bottom) maps to 18 kHz.

The spectrogram text is generated at full amplitude by MD_spectrogram_text() (normalised to 0.9 peak), then scaled to 0.02 (~-34 dB) before mixing. This is loud enough to be clearly visible in a spectrogram but completely inaudible — most adults cannot hear above 18 kHz.

Visual truncation

If the message is longer than the visual capacity, the spectrogram art shows only the first N characters that fit. The full message is always recoverable via the LSB data channel, which has much higher capacity (~5.5 KB/sec at 48 kHz). For binary payloads, the spectrogram art shows [BIN <N>B] as a label.

Listening comparison

After spectext encoding (same host, message "miniDSP" hidden via spectext):

The ultrasonic tones at 18–23.5 kHz are far above the audible range.

Spectrogram showing "miniDSP" rendered as text art in the ultrasonic band:

The host audio — a TIMIT sentence ("Don't ask me to carry an oily rag like that.") — is visible at the bottom of the spectrogram. The text "miniDSP" is rendered in the 18–23.5 kHz band near the top, using the 5x7 bitmap font from MD_spectrogram_text().

Trade-offs

Advantage Disadvantage
Human-readable visual watermark Lower visual capacity than LSB data capacity
Machine-readable round-trip via LSB Requires 48 kHz output (auto-upsampled)
Visual tamper indicator Destroyed by lossy compression (like LSB)
Completely inaudible Destroyed by low-pass filtering above 18 kHz

Embedding binary data

The string-based API (MD_steg_encode / MD_steg_decode) uses null-terminated C strings, which cannot represent binary data containing 0x00 bytes. To hide arbitrary binary payloads — images, compressed archives, cryptographic keys — use the byte-oriented API:

Visual demo

Space invader — a tiny 110x80 RGB PNG (332 bytes) hidden inside a short 440 Hz sine using LSB. The recovered image is bit-identical to the original:

Original
Space invader sprite
Recovered from audio
Recovered space invader
Stego audio (image hidden inside):

QR code — a 165x165 grayscale PNG (486 bytes) encoding this repository's URL, doubly encoded: data → QR → audio:

Original
QR code
Recovered from audio
Recovered QR code
Stego audio (QR hidden inside):

Minimum samples

For LSB encoding, each data byte requires 8 samples, plus a 32-bit header:

\[N_{\min} = 8L + 32 \]

where \(L\) is the data length in bytes.

Reading the formula in C:

// L -> data_len, N_min -> min_samples
unsigned min_samples = data_len * 8 + 32;

For a 332-byte PNG image: \(N_{\min} = 8 \times 332 + 32 = 2688\) samples — well under a second at any common sample rate.

Example: hiding an image in audio

Space invader — a tiny 110x80 RGB PNG (332 bytes) makes a good test payload. It is small enough to embed in a fraction of a second of audio using LSB:

# Encode the image (auto-generates a host signal)
./audio_steg --encode-image lsb space_invader.png -o steg_invader.wav
# Decode to recover the image
./audio_steg --decode-image lsb steg_invader.wav -o recovered_invader.png
# Verify
cmp space_invader.png recovered_invader.png && echo "Identical"

QR code — a 165x165 1-bit grayscale PNG (486 bytes) containing a URL to this repository. This is a "double encoding": data encoded as a QR code, then the QR code hidden inside audio:

./audio_steg --encode-image lsb minidsp_qr.png -o steg_qr.wav
./audio_steg --decode-image lsb steg_qr.wav -o recovered_qr.png
cmp minidsp_qr.png recovered_qr.png && echo "Identical"

Quick example — encode and decode a PNG in C:

#include "minidsp.h"
#include <stdio.h>
// Read the image file
FILE *fp = fopen("space_invader.png", "rb");
fseek(fp, 0, SEEK_END);
unsigned len = (unsigned)ftell(fp);
fseek(fp, 0, SEEK_SET);
unsigned char *img = malloc(len);
fread(img, 1, len, fp);
fclose(fp);
// Encode into audio
unsigned N = len * 8 + 32 + 1024; // payload + header + margin
double *host = malloc(N * sizeof(double));
double *stego = malloc(N * sizeof(double));
MD_sine_wave(host, N, 0.8, 440.0, 44100.0);
MD_steg_encode_bytes(host, stego, N, 44100.0, img, len, MD_STEG_LSB);
// Decode
unsigned char *recovered = malloc(len);
MD_steg_decode_bytes(stego, N, 44100.0, recovered, len, MD_STEG_LSB);
// recovered[0..len-1] == img[0..len-1]
A mini library of DSP (Digital Signal Processing) routines.
unsigned MD_steg_encode_bytes(const double *host, double *output, unsigned signal_len, double sample_rate, const unsigned char *data, unsigned data_len, int method)
Encode arbitrary binary data into a host audio signal.
void MD_sine_wave(double *output, unsigned N, double amplitude, double freq, double sample_rate)
Generate a sine wave.
unsigned MD_steg_decode_bytes(const double *stego, unsigned signal_len, double sample_rate, unsigned char *data_out, unsigned max_len, int method)
Decode binary data from a stego audio signal.
#define MD_STEG_LSB
Steganography method: least-significant-bit encoding.
Definition minidsp.h:1538

API

Capacity

unsigned MD_steg_capacity(unsigned signal_len, double sample_rate, int method);
unsigned MD_steg_capacity(unsigned signal_len, double sample_rate, int method)
Compute the maximum message length (in bytes) that can be hidden.

Returns the maximum number of message bytes that can be hidden.

Encode

unsigned MD_steg_encode(const double *host, double *output,
unsigned signal_len, double sample_rate,
const char *message, int method);
unsigned MD_steg_encode(const double *host, double *output, unsigned signal_len, double sample_rate, const char *message, int method)
Encode a secret message into a host audio signal.
Parameter Description
host Input host signal (not modified).
output Output stego signal (caller-allocated, same length).
signal_len Number of samples.
sample_rate Sample rate in Hz.
message Null-terminated secret message.
method MD_STEG_LSB, MD_STEG_FREQ_BAND, or MD_STEG_SPECTEXT.

Returns the number of message bytes encoded (0 on failure).

Decode

unsigned MD_steg_decode(const double *stego, unsigned signal_len,
double sample_rate,
char *message_out, unsigned max_msg_len,
int method);
unsigned MD_steg_decode(const double *stego, unsigned signal_len, double sample_rate, char *message_out, unsigned max_msg_len, int method)
Decode a secret message from a stego audio signal.
Parameter Description
stego The stego signal containing the hidden message.
signal_len Number of samples.
sample_rate Sample rate in Hz.
message_out Output buffer (caller-allocated, null-terminated on return).
max_msg_len Size of buffer including null terminator.
method MD_STEG_LSB, MD_STEG_FREQ_BAND, or MD_STEG_SPECTEXT.

Returns the number of message bytes decoded (0 if none found).

Encode bytes

unsigned MD_steg_encode_bytes(const double *host, double *output,
unsigned signal_len, double sample_rate,
const unsigned char *data, unsigned data_len,
int method);
Parameter Description
host Input host signal (not modified).
output Output stego signal (caller-allocated, same length).
signal_len Number of samples.
sample_rate Sample rate in Hz.
data Pointer to the binary data to hide.
data_len Length of data in bytes.
method MD_STEG_LSB, MD_STEG_FREQ_BAND, or MD_STEG_SPECTEXT.

Returns the number of data bytes encoded (0 on failure).

Decode bytes

unsigned MD_steg_decode_bytes(const double *stego, unsigned signal_len,
double sample_rate,
unsigned char *data_out, unsigned max_len,
int method);
Parameter Description
stego The stego signal containing the hidden data.
signal_len Number of samples.
sample_rate Sample rate in Hz.
data_out Output buffer for the decoded bytes (caller-allocated).
max_len Maximum number of bytes to write to buffer.
method MD_STEG_LSB, MD_STEG_FREQ_BAND, or MD_STEG_SPECTEXT.

Returns the number of data bytes decoded (0 if none found).

Detect

int MD_steg_detect(const double *signal, unsigned signal_len,
double sample_rate, int *payload_type_out);
int MD_steg_detect(const double *signal, unsigned signal_len, double sample_rate, int *payload_type_out)
Detect which steganography method (if any) was used to encode a signal.

Inspects a signal and determines which steganography method (if any) was used. Returns MD_STEG_LSB, MD_STEG_FREQ_BAND, MD_STEG_SPECTEXT, or -1 if no hidden payload is found. The optional payload_type_out receives MD_STEG_TYPE_TEXT (0) or MD_STEG_TYPE_BINARY (1).

Parameter Description
signal The signal to inspect.
signal_len Length of the signal in samples.
sample_rate Sample rate in Hz.
payload_type_out If non-null, receives the payload type flag.

How it works: The function probes the first 32 samples (LSB) or 32 BFSK chips (frequency-band) to extract the header. A header is considered valid when the decoded length is positive and fits the signal capacity. For BFSK, the average correlation must also exceed a minimum threshold to avoid false positives. If both methods claim a valid header, BFSK wins (harder to trigger by accident).

Quick example:

int payload_type;
int method = MD_steg_detect(signal, signal_len, 44100.0, &payload_type);
if (method == MD_STEG_LSB)
printf("LSB-encoded %s payload detected\n",
payload_type == MD_STEG_TYPE_BINARY ? "binary" : "text");
#define MD_STEG_TYPE_BINARY
Payload type flag: binary (raw byte buffer).
Definition minidsp.h:1561

Quick example

Encode and decode with LSB:

#include "minidsp.h"
double host[44100], stego[44100];
MD_sine_wave(host, 44100, 0.8, 440.0, 44100.0);
// Encode
unsigned n = MD_steg_encode(host, stego, 44100, 44100.0,
"secret message", MD_STEG_LSB);
// Decode
char recovered[256];
MD_steg_decode(stego, 44100, 44100.0, recovered, 256, MD_STEG_LSB);
printf("Hidden: %s\n", recovered); // "secret message"

Encode and decode with frequency band:

double host[132300], stego[132300]; // 3 s at 44.1 kHz
MD_sine_wave(host, 132300, 0.8, 440.0, 44100.0);
unsigned n = MD_steg_encode(host, stego, 132300, 44100.0,
"hidden!", MD_STEG_FREQ_BAND);
char recovered[256];
MD_steg_decode(stego, 132300, 44100.0, recovered, 256, MD_STEG_FREQ_BAND);
printf("Hidden: %s\n", recovered); // "hidden!"
#define MD_STEG_FREQ_BAND
Steganography method: near-ultrasonic frequency-band modulation (BFSK).
Definition minidsp.h:1540

Encode and decode with spectrogram text (spectext):

double host[132300]; // 3 s at 44.1 kHz
MD_sine_wave(host, 132300, 0.8, 440.0, 44100.0);
// Output at 48 kHz — compute required buffer size
unsigned out_len = MD_resample_output_len(132300, 44100.0, 48000.0);
double *stego = malloc(out_len * sizeof(double));
MD_steg_encode(host, stego, 132300, 44100.0, "miniDSP", MD_STEG_SPECTEXT);
// Decode from the 48 kHz output
char recovered[256];
MD_steg_decode(stego, out_len, 48000.0, recovered, 256, MD_STEG_SPECTEXT);
printf("Hidden: %s\n", recovered); // "miniDSP"
// View stego in a spectrogram to see "miniDSP" in the 18-23.5 kHz band
free(stego);
unsigned MD_resample_output_len(unsigned input_len, double in_rate, double out_rate)
Compute the output buffer size needed for resampling.
#define MD_STEG_SPECTEXT
Steganography method: hybrid LSB + spectrogram text art.
Definition minidsp.h:1556

Example program

The tool tools/audio_steg/audio_steg.c provides a command-line program for encoding and decoding steganographic messages and binary data in WAV files.

Self-test (no arguments):

static int self_test(void)
{
const double sr = 44100.0;
const unsigned N = (unsigned)(sr * 3.0); /* 3 seconds */
const char *secret = "The quick brown fox jumps over the lazy dog.";
double *host = malloc(N * sizeof(double));
double *stego = malloc(N * sizeof(double));
char recovered[256];
if (!host || !stego) {
fprintf(stderr, "allocation failed\n");
free(stego); free(host);
return 1;
}
MD_sine_wave(host, N, 0.8, 440.0, sr);
int pass = 1;
/* --- LSB test --- */
printf("=== LSB steganography ===\n");
printf(" Host: 3 s sine wave at 440 Hz, %.0f Hz sample rate\n", sr);
printf(" Capacity: %u bytes\n",
printf(" Message (%zu bytes): \"%s\"\n", strlen(secret), secret);
unsigned enc_lsb = MD_steg_encode(host, stego, N, sr,
secret, MD_STEG_LSB);
printf(" Encoded: %u bytes\n", enc_lsb);
unsigned dec_lsb = MD_steg_decode(stego, N, sr,
recovered, sizeof(recovered),
printf(" Decoded: %u bytes -> \"%s\"\n", dec_lsb, recovered);
if (dec_lsb != enc_lsb || strcmp(recovered, secret) != 0) {
printf(" LSB FAILED: decoded message does not match!\n");
pass = 0;
} else {
/* Compute distortion. */
double max_diff = 0.0;
for (unsigned i = 0; i < N; i++) {
double d = fabs(host[i] - stego[i]);
if (d > max_diff) max_diff = d;
}
printf(" Max distortion: %.2e (%.1f dB)\n",
max_diff, 20.0 * log10(max_diff + 1e-30));
printf(" LSB PASSED\n");
}
/* --- Frequency-band test --- */
printf("\n=== Frequency-band steganography (BFSK) ===\n");
printf(" Host: 3 s sine wave at 440 Hz, %.0f Hz sample rate\n", sr);
printf(" Capacity: %u bytes\n",
printf(" Message (%zu bytes): \"%s\"\n", strlen(secret), secret);
unsigned enc_freq = MD_steg_encode(host, stego, N, sr,
printf(" Encoded: %u bytes\n", enc_freq);
unsigned dec_freq = MD_steg_decode(stego, N, sr,
recovered, sizeof(recovered),
printf(" Decoded: %u bytes -> \"%s\"\n", dec_freq, recovered);
if (dec_freq != enc_freq || strcmp(recovered, secret) != 0) {
printf(" Frequency-band FAILED: decoded message does not match!\n");
pass = 0;
} else {
printf(" Frequency-band PASSED\n");
}
/* --- Spectext test (uses 48 kHz output) --- */
printf("\n=== Spectrogram text steganography (spectext) ===\n");
const char *spec_secret = "miniDSP";
printf(" Host: 3 s sine wave at 440 Hz, %.0f Hz sample rate\n", sr);
printf(" Capacity: %u chars\n",
printf(" Message (%zu bytes): \"%s\"\n", strlen(spec_secret), spec_secret);
/* Spectext may upsample to 48 kHz — allocate for larger output. */
unsigned spec_out_len = MD_resample_output_len(N, sr, 48000.0);
double *stego_spec = malloc(spec_out_len * sizeof(double));
if (!stego_spec) {
fprintf(stderr, "allocation failed\n");
free(stego); free(host);
return 1;
}
unsigned enc_spec = MD_steg_encode(host, stego_spec, N, sr,
spec_secret, MD_STEG_SPECTEXT);
printf(" Encoded: %u bytes (output: %u samples at 48 kHz)\n",
enc_spec, spec_out_len);
memset(recovered, 0, sizeof(recovered));
unsigned dec_spec = MD_steg_decode(stego_spec, spec_out_len, 48000.0,
recovered, sizeof(recovered),
printf(" Decoded: %u bytes -> \"%s\"\n", dec_spec, recovered);
if (dec_spec != enc_spec || strcmp(recovered, spec_secret) != 0) {
printf(" Spectext FAILED: decoded message does not match!\n");
pass = 0;
} else {
printf(" Spectext PASSED\n");
}
free(stego_spec);
if (pass)
printf("\nSelf-test PASSED: all methods recovered the message.\n");
else
printf("\nSelf-test FAILED.\n");
free(stego);
free(host);
return pass ? 0 : 1;
}

Encode a message into a WAV file:

static int encode_wav(int method, const char *message,
const char *infile, const char *outfile)
{
double *host = NULL;
unsigned signal_len;
unsigned samprate;
if (infile) {
if (read_wav_to_double(infile, &host, &signal_len, &samprate) != 0)
return 1;
} else {
/* Generate a default host signal: 3 s sine at 44.1 kHz. */
samprate = 44100;
signal_len = samprate * 3;
host = malloc(signal_len * sizeof(double));
if (!host) return 1;
MD_sine_wave(host, signal_len, 0.8, 440.0, (double)samprate);
printf("No host file specified; using 3 s, 440 Hz sine at %u Hz\n",
samprate);
}
if (method == MD_STEG_FREQ_BAND && samprate < 40000) {
fprintf(stderr,
"Frequency-band method requires sample rate >= 40 kHz "
"(file is %u Hz).\n"
"Use a 44.1 kHz or 48 kHz host, or use the LSB method.\n",
samprate);
free(host);
return 1;
}
unsigned capacity = MD_steg_capacity(signal_len, (double)samprate, method);
unsigned msg_len = (unsigned)strlen(message);
printf("Method: %s | Capacity: %u bytes | Message: %u bytes\n",
method_name(method), capacity, msg_len);
if (msg_len > capacity) {
fprintf(stderr,
"Message too long (%u bytes) for host signal capacity (%u bytes).\n"
"Use a longer host signal or a shorter message.\n",
msg_len, capacity);
free(host);
return 1;
}
/* Spectext outputs at 48 kHz — allocate for the larger signal. */
unsigned out_len = signal_len;
unsigned out_sr = samprate;
if (method == MD_STEG_SPECTEXT) {
if (samprate < 48000) {
out_len = MD_resample_output_len(signal_len, (double)samprate,
48000.0);
}
out_sr = 48000;
}
double *stego = malloc(out_len * sizeof(double));
if (!stego) { free(host); return 1; }
unsigned encoded = MD_steg_encode(host, stego, signal_len,
(double)samprate, message, method);
int ret = write_double_as_wav(outfile, stego, out_len, out_sr);
if (ret == 0)
printf("Encoded %u bytes -> %s (%u samples, %.3f s, %u Hz)\n",
encoded, outfile, out_len,
(double)out_len / (double)out_sr, out_sr);
else
fprintf(stderr, "Error writing %s\n", outfile);
free(stego);
free(host);
return ret;
}

Decode a message from a WAV file:

static int decode_wav(int method, const char *infile)
{
double *stego = NULL;
unsigned signal_len;
unsigned samprate;
if (read_wav_to_double(infile, &stego, &signal_len, &samprate) != 0)
return 1;
printf("Read %s: %u samples at %u Hz (%.3f s)\n",
infile, signal_len, samprate,
(double)signal_len / (double)samprate);
char message[4096];
unsigned decoded = MD_steg_decode(stego, signal_len, (double)samprate,
message, sizeof(message), method);
free(stego);
if (decoded == 0) {
printf("No hidden message found (method: %s).\n", method_name(method));
return 1;
}
printf("\nDecoded %u bytes (method: %s):\n \"%s\"\n",
decoded, method_name(method), message);
return 0;
}

Encode a binary file (e.g. image) into a WAV file:

static int encode_image_wav(int method, const char *image_path,
const char *infile, const char *outfile)
{
/* Read the image file into memory. */
FILE *fp = fopen(image_path, "rb");
if (!fp) {
fprintf(stderr, "Cannot open image file: %s\n", image_path);
return 1;
}
fseek(fp, 0, SEEK_END);
long fsize = ftell(fp);
fseek(fp, 0, SEEK_SET);
if (fsize <= 0 || (unsigned long)fsize > UINT_MAX) {
fprintf(stderr, "Invalid file size: %s\n", image_path);
fclose(fp);
return 1;
}
unsigned data_len = (unsigned)fsize;
unsigned char *data = malloc(data_len);
if (!data) { fclose(fp); return 1; }
if (fread(data, 1, data_len, fp) != data_len) {
fprintf(stderr, "Failed to read %s\n", image_path);
free(data);
fclose(fp);
return 1;
}
fclose(fp);
printf("Image: %s (%u bytes)\n", image_path, data_len);
double *host = NULL;
unsigned signal_len;
unsigned samprate;
if (infile) {
if (read_wav_to_double(infile, &host, &signal_len, &samprate) != 0) {
free(data);
return 1;
}
} else {
/* Generate a host signal sized for the payload. */
samprate = 44100;
unsigned min_samples = data_len * 8 + HEADER_BITS;
if (method == MD_STEG_FREQ_BAND) {
unsigned cs = (unsigned)(3.0 * samprate / 1000.0);
min_samples = (data_len * 8 + HEADER_BITS) * cs;
}
unsigned half_sec = samprate / 2;
if (min_samples < half_sec)
min_samples = half_sec;
signal_len = min_samples + min_samples / 10; /* 10% margin */
host = malloc(signal_len * sizeof(double));
if (!host) { free(data); return 1; }
MD_sine_wave(host, signal_len, 0.8, 440.0, (double)samprate);
printf("Generated host: %u samples (%.3f s) at %u Hz\n",
signal_len, (double)signal_len / (double)samprate, samprate);
}
if (method == MD_STEG_FREQ_BAND && samprate < 40000) {
fprintf(stderr,
"Frequency-band method requires sample rate >= 40 kHz "
"(got %u Hz).\n", samprate);
free(host);
free(data);
return 1;
}
unsigned capacity = MD_steg_capacity(signal_len, (double)samprate, method);
printf("Method: %s | Capacity: %u bytes | Payload: %u bytes\n",
method_name(method), capacity, data_len);
if (data_len > capacity) {
fprintf(stderr,
"Payload too large (%u bytes) for host capacity (%u bytes).\n",
data_len, capacity);
free(host);
free(data);
return 1;
}
/* Spectext outputs at 48 kHz — allocate for the larger signal. */
unsigned out_len = signal_len;
unsigned out_sr = samprate;
if (method == MD_STEG_SPECTEXT) {
if (samprate < 48000) {
out_len = MD_resample_output_len(signal_len, (double)samprate,
48000.0);
}
out_sr = 48000;
}
double *stego = malloc(out_len * sizeof(double));
if (!stego) { free(host); free(data); return 1; }
unsigned encoded = MD_steg_encode_bytes(host, stego, signal_len,
(double)samprate,
data, data_len, method);
int ret = write_double_as_wav(outfile, stego, out_len, out_sr);
if (ret == 0)
printf("Encoded %u bytes -> %s (%u samples, %.3f s, %u Hz)\n",
encoded, outfile, out_len,
(double)out_len / (double)out_sr, out_sr);
else
fprintf(stderr, "Error writing %s\n", outfile);
free(stego);
free(host);
free(data);
return ret;
}

Decode a binary file from a WAV file:

static int decode_image_wav(int method, const char *infile,
const char *outfile)
{
double *stego = NULL;
unsigned signal_len;
unsigned samprate;
if (read_wav_to_double(infile, &stego, &signal_len, &samprate) != 0)
return 1;
printf("Read %s: %u samples at %u Hz (%.3f s)\n",
infile, signal_len, samprate,
(double)signal_len / (double)samprate);
/* Allocate a buffer large enough for the maximum capacity (capped at
* 16 MB to avoid unbounded allocation from corrupted headers). */
unsigned capacity = MD_steg_capacity(signal_len, (double)samprate, method);
if (capacity > 16u * 1024 * 1024)
capacity = 16u * 1024 * 1024;
unsigned char *buf = malloc(capacity > 0 ? capacity : 1);
if (!buf) { free(stego); return 1; }
unsigned decoded = MD_steg_decode_bytes(stego, signal_len,
(double)samprate,
buf, capacity, method);
free(stego);
if (decoded == 0) {
printf("No hidden data found (method: %s).\n", method_name(method));
free(buf);
return 1;
}
printf("Decoded %u bytes (method: %s)\n", decoded, method_name(method));
FILE *fp = fopen(outfile, "wb");
if (!fp) {
fprintf(stderr, "Cannot open output file: %s\n", outfile);
free(buf);
return 1;
}
fwrite(buf, 1, decoded, fp);
fclose(fp);
free(buf);
printf("Written to %s\n", outfile);
return 0;
}

Auto-detect and decode (no method needed):

static int auto_decode_wav(const char *infile)
{
double *stego = NULL;
unsigned signal_len;
unsigned samprate;
if (read_wav_to_double(infile, &stego, &signal_len, &samprate) != 0)
return 1;
printf("Read %s: %u samples at %u Hz (%.3f s)\n",
infile, signal_len, samprate,
(double)signal_len / (double)samprate);
int payload_type = -1;
int method = MD_steg_detect(stego, signal_len, (double)samprate,
&payload_type);
if (method < 0) {
printf("No hidden payload detected.\n");
free(stego);
return 1;
}
printf("Detected: %s method, %s payload\n",
method_name(method),
payload_type == MD_STEG_TYPE_BINARY ? "binary" : "text");
if (payload_type == MD_STEG_TYPE_BINARY) {
/* Probe the payload size without a full decode. */
unsigned capacity = MD_steg_capacity(signal_len, (double)samprate,
method);
if (capacity > 16u * 1024 * 1024)
capacity = 16u * 1024 * 1024;
unsigned char *buf = malloc(capacity > 0 ? capacity : 1);
if (!buf) { free(stego); return 1; }
unsigned decoded = MD_steg_decode_bytes(stego, signal_len,
(double)samprate,
buf, capacity, method);
free(buf);
free(stego);
printf("Binary payload: %u bytes\n", decoded);
printf("Use --decode-image %s %s -o OUTPUT to extract.\n",
method_cli_name(method), infile);
return 0;
}
/* Text payload — decode and print. */
char message[4096];
unsigned decoded = MD_steg_decode(stego, signal_len, (double)samprate,
message, sizeof(message), method);
free(stego);
if (decoded == 0) {
printf("Decode returned 0 bytes.\n");
return 1;
}
printf("\nDecoded %u bytes:\n \"%s\"\n", decoded, message);
return 0;
}

Usage:

# Self-test (encode + decode with both methods, verify round-trip)
./audio_steg
# Encode a text message using LSB into a default 440 Hz host
./audio_steg --encode lsb "my secret" -o stego.wav
# Encode using frequency-band into an existing WAV host
./audio_steg --encode freq "hidden" -i music.wav -o stego.wav
# Auto-detect and decode (just pass the file)
./audio_steg stego.wav
# Decode with explicit method
./audio_steg --decode lsb stego.wav
./audio_steg --decode freq stego.wav
# Decode without specifying method (auto-detect)
./audio_steg --decode stego.wav
# Encode using spectext (hybrid LSB + spectrogram art)
./audio_steg --encode spectext "miniDSP" -i music.wav -o stego.wav
# Encode a binary file (image) using LSB
./audio_steg --encode-image lsb space_invader.png -o steg_invader.wav
# Decode a binary file (auto-detect method)
./audio_steg --decode-image steg_invader.wav -o recovered.png

Choosing a method

Criterion LSB Frequency-band Spectext
Message size Up to ~16 KB/s of audio Up to ~121 B per 3 s ~4 chars/s (visual); LSB capacity for data
Audio quality Imperceptible (-90 dB) Near-imperceptible (-34 dB) Imperceptible (ultrasonic, -34 dB)
Survives lossy compression No No (but tolerates noise) No
Survives additive noise No (bit errors) Yes (mild noise) No (LSB channel)
Sample rate requirement Any >= 40 kHz Output always 48 kHz
Visual verification No No (spectrogram shows carriers) Yes — text readable in spectrogram
Best for Lossless pipelines (WAV/FLAC) Light interference environments Visual watermarking + machine decode

For maximum capacity and fidelity in lossless pipelines, use LSB. For slightly more robust hiding in near-ultrasonic bands, use frequency-band. For a human-readable visual watermark with machine-readable data recovery, use spectext.