AudioFrameProcessor interface

Per-frame audio transform. It has the same shape as VideoFrameProcessor but operates on audio frames (AudioData). Used for noise cancellation, audio effects, mixing, file playback in place of the microphone, TTS, and other source-style processors.

Attach it with VideoSDK.applyAudioProcessor(). The processor is sticky: set it once and it applies to every microphone stream of the local participant.

Blueprint

Single-method contract with no lifecycle hooks. The app owns its resources (DSP models, decoded audio buffers, workers); the SDK never touches them.

interface AudioFrameProcessor {
  // Called once per captured audio frame. Return the (possibly modified) frame.
  process(frame: AudioData): AudioData | Promise<AudioData>;
}

The app owns the entire resource lifecycle. Load DSP models, decode audio files, and allocate AudioWorklet buffers before creating the processor. After removeAudioProcessor(), clean up those resources yourself; the SDK doesn't manage them.
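One way to keep setup, attachment, and teardown in a single place is a small wrapper that acquires resources before applying the processor and releases them only after removing it. A minimal sketch; `createManagedProcessor` and its `setup` / `makeProcessor` / `teardown` callbacks are hypothetical app code, and `sdk` is assumed to expose the `applyAudioProcessor()` / `removeAudioProcessor()` methods described on this page.

```javascript
// Hypothetical helper: ties app-owned resources to processor attachment.
// `sdk` must provide applyAudioProcessor(p) and removeAudioProcessor().
function createManagedProcessor(sdk, { setup, makeProcessor, teardown }) {
  let resources = null; // whatever setup() returned (models, buffers, workers)

  return {
    async enable() {
      if (resources !== null) return; // already attached
      const r = await setup(); // acquire resources BEFORE creating the processor
      try {
        await sdk.applyAudioProcessor(makeProcessor(r));
      } catch (err) {
        await teardown(r); // don't leak resources if attaching fails
        throw err;
      }
      resources = r;
    },
    async disable() {
      if (resources === null) return; // nothing attached
      await sdk.removeAudioProcessor(); // SDK stops calling process
      await teardown(resources); // then the app frees its own resources
      resources = null;
    },
  };
}
```

With the RNNoise example's helpers, `setup` could be `loadRNNoise` and `teardown` could be `(model) => model.dispose()`.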

Members

process

Called once per captured audio frame. The AudioData you return is what gets encoded and published. The method can be synchronous or return a Promise.

process(frame: AudioData): AudioData | Promise<AudioData>
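The signature allows both return shapes. As a minimal sketch, these two pass-through processors (hypothetical names) show a synchronous and an async `process`. Neither closes the input, because the input is the frame being returned.

```javascript
// Synchronous: returns the captured frame unchanged.
const passthrough = {
  process(frame) {
    return frame; // not closed: this is the frame the SDK will encode
  },
};

// Async: an async method always returns a Promise, which the signature allows.
const asyncPassthrough = {
  async process(frame) {
    // e.g. await a worker's result here before returning
    return frame;
  },
};
```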

For source-style processors (file playback, TTS, generative audio), ignore the input samples (but still close the frame) and return generated frames.

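Frame handling. Call frame.close() once you have read the samples you need, and return a new AudioData as output. Read any metadata you reuse (sampleRate, numberOfFrames, timestamp) before closing, because a closed AudioData reports a sampleRate and numberOfFrames of 0. If you return the input frame unchanged, do not close it.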
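A small pair of helpers can bundle the read-then-close steps. Because a closed AudioData reports a sampleRate and numberOfFrames of 0 under WebCodecs, the sketch captures metadata first. `readMonoAndClose` and `monoAudioData` are hypothetical helpers, and like the examples on this page they keep only channel 0.

```javascript
// Copy channel 0 into a Float32Array, capture metadata, then close the input.
function readMonoAndClose(frame) {
  const { sampleRate, numberOfFrames, timestamp } = frame; // read BEFORE close()
  const samples = new Float32Array(numberOfFrames);
  frame.copyTo(samples, { planeIndex: 0, format: "f32-planar" });
  frame.close();
  return { samples, sampleRate, timestamp };
}

// Build a mono f32-planar output frame (browser only: uses WebCodecs AudioData).
function monoAudioData(samples, sampleRate, timestamp) {
  return new AudioData({
    format: "f32-planar",
    sampleRate,
    numberOfFrames: samples.length,
    numberOfChannels: 1,
    timestamp,
    data: samples,
  });
}
```

Inside `process`, the two combine as `const { samples, sampleRate, timestamp } = readMonoAndClose(frame);`, then transform `samples` in place and return `monoAudioData(samples, sampleRate, timestamp)`.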

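Latency. Audio is real-time, so each call must finish well within the duration of its frame; a 480-sample frame at 48 kHz covers 10 ms. Aim to keep process well under 10 ms per frame. Run heavy DSP in an AudioWorklet thread, not on the main thread.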
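To check that budget during development, a processor can be wrapped and each call timed against its frame's duration (numberOfFrames / sampleRate). A minimal sketch; `withTimingBudget`, `frameDurationMs`, and the default budget of half a frame are hypothetical choices for illustration, not SDK APIs.

```javascript
// Duration of one frame in milliseconds.
function frameDurationMs(numberOfFrames, sampleRate) {
  return (numberOfFrames / sampleRate) * 1000;
}

// Wraps `processor` and calls onSlow(elapsedMs, budgetMs) when a call takes longer
// than budgetFraction of the frame's duration. For async processors the time is
// measured until the returned Promise settles.
function withTimingBudget(processor, onSlow, budgetFraction = 0.5) {
  return {
    process(frame) {
      // Compute the budget first: the inner processor may close the frame.
      const budget = frameDurationMs(frame.numberOfFrames, frame.sampleRate) * budgetFraction;
      const start = performance.now();
      const finish = (result) => {
        const elapsed = performance.now() - start;
        if (elapsed > budget) onSlow(elapsed, budget);
        return result;
      };
      const out = processor.process(frame);
      return out && typeof out.then === "function" ? out.then(finish) : finish(out);
    },
  };
}
```

Usage: `await VideoSDK.applyAudioProcessor(withTimingBudget(noiseSuppress, (ms, budget) => console.warn(ms, budget)));`.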

When the SDK stops calling `process`

Trigger	What happens
`VideoSDK.removeAudioProcessor()`	SDK stops calling `process`. Frames flow mic → encoder unmodified.
`VideoSDK.applyAudioProcessor(other)`	SDK stops calling the old processor and starts calling the new one on the next frame.
`me.unpublishAudio()`	Microphone stream ends — no more frames to process.
`room.leave()`	All local streams stop. Processor receives no further frames.
Mic unplugged / permission revoked	Underlying source ends. SDK stops calling `process`.

Examples

Noise suppression — RNNoise (app owns the model)

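// App-owned setup. loadRNNoise() stands in for your RNNoise binding.
// RNNoise works on mono blocks of 480 samples (10 ms at 48 kHz).
const FRAME_SIZE = 480;
const rnnoise   = await loadRNNoise();
const inputBuf  = new Float32Array(FRAME_SIZE);
const outputBuf = new Float32Array(FRAME_SIZE);

const noiseSuppress = {
  process(frame) {
    // This sketch only handles 480-sample, 48 kHz frames; anything else is
    // returned untouched (and therefore not closed).
    if (frame.numberOfFrames !== FRAME_SIZE || frame.sampleRate !== 48000) {
      return frame;
    }

    frame.copyTo(inputBuf, { planeIndex: 0, format: 'f32-planar' });
    rnnoise.process(inputBuf, outputBuf);

    const cleaned = new AudioData({
      format:           'f32-planar',
      sampleRate:       frame.sampleRate,
      numberOfFrames:   FRAME_SIZE,
      numberOfChannels: 1,
      timestamp:        frame.timestamp,
      data:             outputBuf.buffer, // copied, so outputBuf can be reused
    });

    frame.close(); // metadata was read above, so closing now is safe
    return cleaned;
  },
};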

await VideoSDK.applyAudioProcessor(noiseSuppress);

// Later
await VideoSDK.removeAudioProcessor();
rnnoise.dispose();   // app cleans its own resources

Mix mic with background music

// decodeAudioData resamples to the context's rate, so create the context at the
// mic's sample rate (48 kHz assumed here) to keep the music at the right pitch.
const audioCtx  = new AudioContext({ sampleRate: 48000 });
const arrBuf    = await fetch('/audio/intro.mp3').then(r => r.arrayBuffer());
const musicBuf  = await audioCtx.decodeAudioData(arrBuf);
const musicData = musicBuf.getChannelData(0);
let cursor = 0;

const micPlusMusic = {
  process(frame) {
    // Read metadata before close(): a closed AudioData reports sampleRate 0.
    const { sampleRate, numberOfFrames: n, timestamp } = frame;
    const mic = new Float32Array(n);
    const out = new Float32Array(n);

    frame.copyTo(mic, { planeIndex: 0, format: 'f32-planar' });
    frame.close();

    for (let i = 0; i < n; i++) {
      out[i] = mic[i] * 0.7 + musicData[(cursor + i) % musicData.length] * 0.3;
    }
    cursor = (cursor + n) % musicData.length;

    return new AudioData({
      format: 'f32-planar',
      sampleRate,
      numberOfFrames: n,
      numberOfChannels: 1,
      timestamp,
      data: out.buffer,
    });
  },
};

await VideoSDK.applyAudioProcessor(micPlusMusic);
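The per-sample mixing loop can be factored into a pure function, which also makes the gains adjustable. A minimal sketch; `mixLooping` is a hypothetical helper. With the 0.7 / 0.3 gains above and inputs in [-1, 1] the sum never leaves that range, so the clamp only matters once the gains add up to more than 1.

```javascript
// Mix `mic` with a looping `music` track starting at `cursor`, writing into `out`.
// The result is clamped to [-1, 1] so large gains clip instead of producing
// out-of-range float samples. Returns the cursor for the next frame.
function mixLooping(mic, music, cursor, out, micGain = 0.7, musicGain = 0.3) {
  const n = mic.length;
  for (let i = 0; i < n; i++) {
    const s = mic[i] * micGain + music[(cursor + i) % music.length] * musicGain;
    out[i] = Math.max(-1, Math.min(1, s));
  }
  return (cursor + n) % music.length;
}
```

In `process`, the loop and cursor update become `cursor = mixLooping(mic, musicData, cursor, out);`.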

Source-style — file playback (ignore mic, send file)

// loadAudioFile() is an app-defined helper that returns a decoded AudioBuffer
// (for example, built with AudioContext.decodeAudioData).
const fileBuffer = await loadAudioFile('/audio/jingle.mp3');
const fileData   = fileBuffer.getChannelData(0);
let cursor = 0;

const filePlayback = {
  process(frame) {
    // Read metadata before close(): a closed AudioData reports sampleRate 0.
    const { sampleRate, numberOfFrames: n, timestamp } = frame;
    frame.close(); // mic samples are ignored

    const out = new Float32Array(n);
    for (let i = 0; i < n; i++) {
      out[i] = fileData[(cursor + i) % fileData.length];
    }
    cursor = (cursor + n) % fileData.length;

    return new AudioData({
      format: 'f32-planar',
      sampleRate,
      numberOfFrames: n,
      numberOfChannels: 1,
      timestamp,
      data: out.buffer,
    });
  },
};

await VideoSDK.applyAudioProcessor(filePlayback);
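The example above loops the file forever. If the file should play once, one option is to zero-fill (send silence) after the last sample and report when playback has finished. A minimal sketch; `readOnce` is a hypothetical helper.

```javascript
// Copy up to n samples of `data` starting at `cursor` into a new buffer.
// Past the end of the file the remaining samples stay 0 (silence) instead of looping.
function readOnce(data, cursor, n) {
  const out = new Float32Array(n); // zero-initialized
  const available = Math.max(0, Math.min(n, data.length - cursor));
  out.set(data.subarray(cursor, cursor + available));
  const next = cursor + available;
  return { out, cursor: next, done: next >= data.length };
}
```

In `process`, the loop becomes `const r = readOnce(fileData, cursor, n); cursor = r.cursor;`, returning an AudioData built from `r.out`. What to do once `r.done` is true (keep sending silence, or remove the processor from app code) is the app's decision.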

TTS source — synthetic agent voice (frame queue)

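// openTtsStream() is a placeholder for your TTS client. This assumes it emits mono
// Float32Array chunks at the mic's sample rate, each exactly one frame long.
const queue = [];
const tts = openTtsStream({ voice: 'alice' });
tts.onFrame = (pcm) => queue.push(pcm);

const ttsSource = {
  process(frame) {
    // Read metadata before close(): a closed AudioData reports sampleRate 0.
    const { sampleRate, numberOfFrames, timestamp } = frame;
    frame.close();

    // Send queued speech, or silence when the queue is empty.
    const samples = queue.shift() ?? new Float32Array(numberOfFrames);
    return new AudioData({
      format: 'f32-planar',
      sampleRate,
      numberOfFrames: samples.length,
      numberOfChannels: 1,
      timestamp,
      data: samples,
    });
  },
};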

await VideoSDK.applyAudioProcessor(ttsSource);
tts.speak('Hello, I am an AI agent.');
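TTS engines rarely emit chunks that match the mic's frame size. A sample FIFO can absorb variable-size chunks and hand out exactly one frame's worth per call, padding with silence on underrun. A minimal sketch; `PcmFifo` is a hypothetical class, and it still assumes mono Float32Array chunks at the frame's sample rate.

```javascript
// Accumulates variable-size PCM chunks and returns exactly n samples per pull(),
// zero-filling when fewer than n samples are queued.
class PcmFifo {
  constructor() {
    this.chunks = []; // queued Float32Array chunks
    this.offset = 0;  // read position inside chunks[0]
    this.length = 0;  // total unread samples
  }

  push(chunk) {
    if (chunk.length === 0) return;
    this.chunks.push(chunk);
    this.length += chunk.length;
  }

  pull(n) {
    const out = new Float32Array(n); // zeros = silence if we run short
    let written = 0;
    while (written < n && this.chunks.length > 0) {
      const head = this.chunks[0];
      const take = Math.min(n - written, head.length - this.offset);
      out.set(head.subarray(this.offset, this.offset + take), written);
      written += take;
      this.offset += take;
      this.length -= take;
      if (this.offset === head.length) { // head fully consumed
        this.chunks.shift();
        this.offset = 0;
      }
    }
    return out;
  }
}
```

With a `fifo = new PcmFifo()`, the TTS handler becomes `tts.onFrame = (pcm) => fifo.push(pcm)`, and `process` uses `fifo.pull(numberOfFrames)` in place of `queue.shift()`.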
