AudioFrameProcessor interface

Per-frame audio transform. Same shape as VideoFrameProcessor, but operating on audio frames. Used for noise cancellation, audio FX, mixing, file playback (as mic), TTS, and source-style processors.

Attach via VideoSDK.applyAudioProcessor(). Sticky: set once, and it applies to every microphone stream of the local participant.

Blueprint

Single-method contract. No lifecycle hooks. App owns its own resources (DSP models, decoded audio buffers, workers); the SDK never touches them.

interface AudioFrameProcessor {
  // Called once per captured audio frame. Return the (possibly modified) frame.
  process(frame: AudioData): AudioData | Promise<AudioData>;
}

App owns all resource lifecycle. Load DSP models, decode audio files, allocate AudioWorklet buffers before creating the processor. Clean up your own resources after removeAudioProcessor(); the SDK doesn't manage them.
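
The ownership rule above can be sketched as a small helper that pairs attachment with teardown. This is only an illustration, not part of the SDK: attachWithCleanup and its cleanup callback are hypothetical names, and sdk stands for any object that exposes the applyAudioProcessor and removeAudioProcessor methods named on this page.

```javascript
// Hypothetical helper (not part of the SDK): attaches a processor and returns
// a detach function that removes it first, then releases app-owned resources.
async function attachWithCleanup(sdk, processor, cleanup) {
  await sdk.applyAudioProcessor(processor);
  let detached = false;
  return async function detach() {
    if (detached) return;      // repeated calls are harmless
    detached = true;
    await sdk.removeAudioProcessor();
    await cleanup();           // e.g. dispose a DSP model, free buffers
  };
}
```

For the RNNoise example further down, this could be used as const detach = await attachWithCleanup(VideoSDK, noiseSuppress, () => rnnoise.dispose()); followed later by await detach();. Removing the processor before running cleanup means process should no longer be called against a disposed model.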

Members

process

Called once per captured audio frame. Return an AudioData; that is what gets encoded and published. Can be sync or async.

process(frame: AudioData): AudioData | Promise<AudioData>

For source-style processors (file playback, TTS, generative): ignore the input frame and return generated frames.

Frame handling. Call frame.close() after reading samples. Return a new AudioData for output.
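
The read, close, return pattern above, as a minimal sketch. createGainProcessor is a hypothetical example that assumes mono input (only plane 0 is read); copyTo, close, and the AudioData constructor are standard WebCodecs calls.

```javascript
// Hypothetical example: scales every sample by a fixed gain.
// Assumes mono audio; only plane 0 is read.
function createGainProcessor(gain) {
  return {
    process(frame) {
      // Capture metadata first: a closed AudioData no longer exposes it.
      const { sampleRate, numberOfFrames, timestamp } = frame;
      const samples = new Float32Array(numberOfFrames);
      frame.copyTo(samples, { planeIndex: 0, format: 'f32-planar' });
      frame.close(); // input is fully read, so release it

      for (let i = 0; i < samples.length; i++) samples[i] *= gain;

      // Output is always a new AudioData built from the processed samples.
      return new AudioData({
        format: 'f32-planar',
        sampleRate,
        numberOfFrames,
        numberOfChannels: 1,
        timestamp,
        data: samples,
      });
    },
  };
}
```

Allocating a Float32Array per frame keeps the sketch short; a real processor may prefer to reuse a buffer, as the RNNoise example below does.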

Latency. Audio is real-time. Aim to keep process well under 10 ms per frame. Heavy DSP should run in an AudioWorklet thread, not on the main thread.
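
To check whether a processor keeps up, each call can be timed against the duration of the frame it handles (numberOfFrames / sampleRate; 480 frames at 48 kHz is 10 ms). A minimal sketch, treating the frame duration as a hard upper bound; withTiming, frameDurationMs, and onSlow are hypothetical names, not SDK API.

```javascript
// Duration of one frame in milliseconds.
function frameDurationMs(numberOfFrames, sampleRate) {
  return (numberOfFrames / sampleRate) * 1000;
}

// Hypothetical wrapper: times each process() call and reports calls that
// take longer than the duration of the frame being processed.
function withTiming(processor, onSlow) {
  return {
    process(frame) {
      // Read the budget before the inner processor closes the frame.
      const budget = frameDurationMs(frame.numberOfFrames, frame.sampleRate);
      const start = performance.now();
      const report = (result) => {
        const elapsed = performance.now() - start;
        if (elapsed > budget) onSlow(elapsed, budget);
        return result;
      };
      const out = processor.process(frame);
      // Support both sync and async processors.
      return out instanceof Promise ? out.then(report) : report(out);
    },
  };
}
```

A processor that trips onSlow regularly is a candidate for moving its DSP off the main thread.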

When the SDK stops calling process

| Trigger | What happens |
| --- | --- |
| VideoSDK.removeAudioProcessor() | SDK stops calling process. Frames flow mic → encoder unmodified. |
| VideoSDK.applyAudioProcessor(other) | SDK stops calling the old processor and starts calling the new one on the next frame. |
| me.unpublishAudio() | Microphone stream ends; no more frames to process. |
| room.leave() | All local streams stop. Processor receives no further frames. |
| Mic unplugged / permission revoked | Underlying source ends. SDK stops calling process. |

Examples

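Noise suppression: RNNoise (app owns the model)

// App-owned setup. loadRNNoise() stands for the app's own model loader.
const rnnoise = await loadRNNoise();
// Buffers are fixed at 480 samples (10 ms at 48 kHz), RNNoise's frame size.
// This assumes captured frames have exactly that length.
const inputBuf  = new Float32Array(480);
const outputBuf = new Float32Array(480);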

const noiseSuppress = {
  process(frame) {
    frame.copyTo(inputBuf, { planeIndex: 0, format: 'f32-planar' });
    rnnoise.process(inputBuf, outputBuf);

    const cleaned = new AudioData({
      format:           'f32-planar',
      sampleRate:       frame.sampleRate,
      numberOfFrames:   frame.numberOfFrames,
      numberOfChannels: 1,
      timestamp:        frame.timestamp,
      data:             outputBuf.buffer,
    });

    frame.close();
    return cleaned;
  },
};

await VideoSDK.applyAudioProcessor(noiseSuppress);

// Later
await VideoSDK.removeAudioProcessor();
rnnoise.dispose();   // app cleans its own resources

Mix mic with background music

const audioCtx  = new AudioContext();
const arrBuf    = await fetch('/audio/intro.mp3').then(r => r.arrayBuffer());
const musicBuf  = await audioCtx.decodeAudioData(arrBuf);
const musicData = musicBuf.getChannelData(0);
let cursor = 0;

const micPlusMusic = {
  process(frame) {
    const n   = frame.numberOfFrames;
    const mic = new Float32Array(n);
    const out = new Float32Array(n);

    frame.copyTo(mic, { planeIndex: 0, format: 'f32-planar' });
    for (let i = 0; i < n; i++) {
      out[i] = mic[i] * 0.7 + musicData[(cursor + i) % musicData.length] * 0.3;
    }
    cursor = (cursor + n) % musicData.length;

    // Read metadata before close(): a closed AudioData reports sampleRate 0.
    const { sampleRate, timestamp } = frame;
    frame.close();
    return new AudioData({
      format: 'f32-planar',
      sampleRate,
      numberOfFrames: n,
      numberOfChannels: 1,
      timestamp,
      data: out.buffer,
    });
  },
};

await VideoSDK.applyAudioProcessor(micPlusMusic);
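
One caveat with the example above: decodeAudioData resamples the file to the AudioContext's sample rate, which is not guaranteed to match frame.sampleRate. If the two differ, the music plays at the wrong speed and pitch. One option is to resample musicData once, before attaching the processor. The sketch below uses plain linear interpolation, which is simple but not high quality; resampleLinear is a hypothetical helper, not part of the SDK.

```javascript
// Hypothetical helper: linear-interpolation resampler for mono Float32 PCM.
function resampleLinear(input, fromRate, toRate) {
  if (fromRate === toRate || input.length === 0) return input;
  const outLength = Math.max(1, Math.round(input.length * toRate / fromRate));
  const out = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.min(Math.floor(pos), input.length - 1);
    const i1 = Math.min(i0 + 1, input.length - 1); // clamp at the last sample
    const t = pos - i0;
    out[i] = input[i0] * (1 - t) + input[i1] * t;
  }
  return out;
}
```

Assuming the microphone delivers 48 kHz frames, the music buffer could be prepared with resampleLinear(musicBuf.getChannelData(0), musicBuf.sampleRate, 48000). Creating the AudioContext with a matching sampleRate option is another way to avoid the mismatch.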

Source-style: file playback (ignore mic, send file)

const fileBuffer = await loadAudioFile('/audio/jingle.mp3');
const fileData   = fileBuffer.getChannelData(0);
let cursor = 0;

const filePlayback = {
  process(frame) {
    const n   = frame.numberOfFrames;
    const out = new Float32Array(n);

    for (let i = 0; i < n; i++) {
      out[i] = fileData[(cursor + i) % fileData.length];
    }
    cursor = (cursor + n) % fileData.length;

    // Read metadata before close(): a closed AudioData reports sampleRate 0.
    const { sampleRate, timestamp } = frame;
    frame.close();
    return new AudioData({
      format: 'f32-planar',
      sampleRate,
      numberOfFrames: n,
      numberOfChannels: 1,
      timestamp,
      data: out.buffer,
    });
  },
};

await VideoSDK.applyAudioProcessor(filePlayback);

TTS source: synthetic agent voice (frame queue)

const queue = [];
const tts = openTtsStream({ voice: 'alice' });
tts.onFrame = (pcm) => queue.push(pcm);

const ttsSource = {
  process(frame) {
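    // Resolve the output before close(): a fallback helper such as
    // silentFrame needs the input frame's metadata, which close() clears.
    const out = queue.shift() ?? silentFrame(frame);
    frame.close();
    return out;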
  },
};

await VideoSDK.applyAudioProcessor(ttsSource);
tts.speak('Hello, I am an AI agent.');
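
silentFrame is not defined on this page. One possible shape, as a sketch: build a zero-filled mono frame that copies the input frame's timing metadata. silentInit is a hypothetical helper, split out so the plain init object can be inspected without a WebCodecs runtime. Both functions read the input frame's fields, so they must be called before that frame is closed.

```javascript
// Hypothetical helper: init dictionary for a mono frame of silence that
// matches the timing of `frame` (any object with these three fields).
function silentInit(frame) {
  return {
    format: 'f32-planar',
    sampleRate: frame.sampleRate,
    numberOfFrames: frame.numberOfFrames,
    numberOfChannels: 1,
    timestamp: frame.timestamp,
    data: new Float32Array(frame.numberOfFrames), // zero-filled
  };
}

// One possible silentFrame: call it before frame.close().
function silentFrame(frame) {
  return new AudioData(silentInit(frame));
}
```

Returning silence when the queue is empty keeps the published stream continuous while the TTS engine has nothing to say.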

See also: VideoSDK.applyAudioProcessor, VideoFrameProcessor