AudioFrameProcessor interface
Per-frame audio transform. Same shape as VideoFrameProcessor with audio frames. Used for noise cancellation, audio FX, mixing, file playback (as mic), TTS, and source-style processors.
Attach via VideoSDK.applyAudioProcessor(). Sticky: set once, and it applies to every microphone stream of the local participant.
Blueprint
Single-method contract. No lifecycle hooks. The app owns its own resources (DSP models, decoded audio buffers, workers); the SDK never touches them.
Release those resources yourself after calling removeAudioProcessor(); the SDK doesn't manage them.
Members
process
Called once per captured audio frame. Return an AudioData; that is what gets encoded and published. Can be sync or async.
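Because process runs once per captured frame and may be sync or async, it can help to measure how long each call takes during development. The following is a minimal sketch, not part of the SDK: withTiming, its stats fields, and the budgetMs parameter are hypothetical names chosen for illustration. It only assumes the single process(frame) method described above.

```javascript
// Sketch: wrap a processor and count calls that exceed a time budget.
// `withTiming`, `stats` and `budgetMs` are hypothetical, not SDK API.
function withTiming(processor, budgetMs = 10) {
  const stats = { calls: 0, overBudget: 0, worstMs: 0 };

  const record = (start) => {
    const elapsed = performance.now() - start;
    stats.calls += 1;
    if (elapsed > budgetMs) stats.overBudget += 1;
    if (elapsed > stats.worstMs) stats.worstMs = elapsed;
  };

  return {
    stats,
    process(frame) {
      const start = performance.now();
      const result = processor.process(frame);
      if (result && typeof result.then === 'function') {
        // Async processor: measure until the promise settles.
        return Promise.resolve(result).finally(() => record(start));
      }
      // Sync processor: measure immediately.
      record(start);
      return result;
    },
  };
}
```

The wrapper exposes the same process method, so it could be passed wherever the wrapped processor would be; stats.overBudget then indicates how many frames took longer than the chosen budget.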
For source-style processors (file playback, TTS, generative): ignore the input frame and return generated frames.
Frame handling. Call frame.close() after reading samples. Return a new AudioData for output.
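The order of operations matters here: per the WebCodecs specification, a closed AudioData reports sampleRate, numberOfFrames, and numberOfChannels as 0, so anything needed for the output frame should be read before close(). A minimal sketch of that pattern follows, assuming mono f32-planar output; readAndClose, toAudioDataInit, and halveGain are hypothetical names used only for illustration.

```javascript
// Sketch: copy samples and metadata out of a frame, then release it.
// `readAndClose` and `toAudioDataInit` are hypothetical helpers, not SDK API.
function readAndClose(frame) {
  // Read everything needed BEFORE close().
  const meta = {
    sampleRate: frame.sampleRate,
    numberOfFrames: frame.numberOfFrames,
    timestamp: frame.timestamp,
  };
  const samples = new Float32Array(meta.numberOfFrames);
  frame.copyTo(samples, { planeIndex: 0, format: 'f32-planar' });
  frame.close();
  return { samples, meta };
}

// Build the init dictionary for a new mono AudioData from processed samples.
function toAudioDataInit(samples, meta) {
  return {
    format: 'f32-planar',
    sampleRate: meta.sampleRate,
    numberOfFrames: samples.length,
    numberOfChannels: 1,
    timestamp: meta.timestamp,
    data: samples,
  };
}

// Example processor: attenuate the microphone by half.
const halveGain = {
  process(frame) {
    const { samples, meta } = readAndClose(frame);
    for (let i = 0; i < samples.length; i++) samples[i] *= 0.5;
    return new AudioData(toAudioDataInit(samples, meta));
  },
};
```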
Keep process well under 10 ms per frame. Heavy DSP should run in an AudioWorklet thread, not on the main thread.
When the SDK stops calling process
| Trigger | What happens |
|---|---|
| VideoSDK.removeAudioProcessor() | SDK stops calling process. Frames flow from mic to encoder unmodified. |
| VideoSDK.applyAudioProcessor(other) | SDK stops calling the old processor and starts calling the new one on the next frame. |
| me.unpublishAudio() | Microphone stream ends; no more frames to process. |
| room.leave() | All local streams stop. Processor receives no further frames. |
| Mic unplugged / permission revoked | Underlying source ends. SDK stops calling process. |
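Since the interface has no lifecycle hooks, the app has to decide when it is safe to free a processor's resources. One possible approach, sketched below, pairs each processor with its own cleanup callback and runs that callback only after the SDK has been told to stop using the processor. createProcessorSlot and onDispose are hypothetical names, and the sketch assumes applyAudioProcessor() and removeAudioProcessor() can be awaited, as the examples on this page do.

```javascript
// Sketch: pair a processor with the cleanup for its app-owned resources.
// `createProcessorSlot` is a hypothetical helper, not SDK API. `sdk` is any
// object exposing applyAudioProcessor() and removeAudioProcessor().
function createProcessorSlot(sdk) {
  let dispose = null; // cleanup for the currently applied processor

  async function releaseCurrent() {
    if (!dispose) return;
    const fn = dispose;
    dispose = null;
    await fn();
  }

  return {
    // Apply a new processor, then free whatever the previous one owned.
    async apply(processor, onDispose) {
      await sdk.applyAudioProcessor(processor);
      await releaseCurrent();
      dispose = onDispose ?? null;
    },
    // Remove the processor, then free its resources.
    async remove() {
      await sdk.removeAudioProcessor();
      await releaseCurrent();
    },
  };
}
```

With a slot created as createProcessorSlot(VideoSDK), a call such as slot.apply(noiseSuppress, () => rnnoise.dispose()) would keep the model alive until the processor is replaced or removed.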
Examples
// Example: noise suppression
// App-owned setup
const rnnoise = await loadRNNoise();
// 480 samples is 10 ms at 48 kHz; this example assumes frames of that size.
const inputBuf = new Float32Array(480);
const outputBuf = new Float32Array(480);

const noiseSuppress = {
  process(frame) {
    // Copy channel 0 out of the frame as 32-bit float samples.
    frame.copyTo(inputBuf, { planeIndex: 0, format: 'f32-planar' });
    rnnoise.process(inputBuf, outputBuf);
    // Build the output while the input frame's metadata is still readable.
    const cleaned = new AudioData({
      format: 'f32-planar',
      sampleRate: frame.sampleRate,
      numberOfFrames: frame.numberOfFrames,
      numberOfChannels: 1,
      timestamp: frame.timestamp,
      data: outputBuf.buffer,
    });
    frame.close();
    return cleaned;
  },
};

await VideoSDK.applyAudioProcessor(noiseSuppress);

// Later
await VideoSDK.removeAudioProcessor();
rnnoise.dispose(); // app cleans its own resources

// Example: mix background music into the microphone
const audioCtx = new AudioContext();
const arrBuf = await fetch('/audio/intro.mp3').then(r => r.arrayBuffer());
// decodeAudioData resamples to audioCtx.sampleRate; this example assumes
// that rate matches the microphone frames.
const musicBuf = await audioCtx.decodeAudioData(arrBuf);
const musicData = musicBuf.getChannelData(0);
let cursor = 0; // read position in the music, wraps to loop

const micPlusMusic = {
  process(frame) {
    // Read metadata and samples before close(); a closed frame reports sampleRate 0.
    const n = frame.numberOfFrames;
    const { sampleRate, timestamp } = frame;
    const mic = new Float32Array(n);
    const out = new Float32Array(n);
    frame.copyTo(mic, { planeIndex: 0, format: 'f32-planar' });
    frame.close();

    // 70% microphone, 30% music.
    for (let i = 0; i < n; i++) {
      out[i] = mic[i] * 0.7 + musicData[(cursor + i) % musicData.length] * 0.3;
    }
    cursor = (cursor + n) % musicData.length;

    return new AudioData({
      format: 'f32-planar',
      sampleRate,
      numberOfFrames: n,
      numberOfChannels: 1,
      timestamp,
      data: out.buffer,
    });
  },
};

await VideoSDK.applyAudioProcessor(micPlusMusic);

// Example: file playback as the microphone (source-style)
const fileBuffer = await loadAudioFile('/audio/jingle.mp3');
const fileData = fileBuffer.getChannelData(0);
let fileCursor = 0; // read position in the file, wraps to loop

const filePlayback = {
  process(frame) {
    // The mic samples are ignored; only the frame's timing metadata is reused.
    // Read it before close(); a closed frame reports sampleRate 0.
    const n = frame.numberOfFrames;
    const { sampleRate, timestamp } = frame;
    frame.close();

    const out = new Float32Array(n);
    for (let i = 0; i < n; i++) {
      out[i] = fileData[(fileCursor + i) % fileData.length];
    }
    fileCursor = (fileCursor + n) % fileData.length;

    return new AudioData({
      format: 'f32-planar',
      sampleRate,
      numberOfFrames: n,
      numberOfChannels: 1,
      timestamp,
      data: out.buffer,
    });
  },
};

await VideoSDK.applyAudioProcessor(filePlayback);
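
// Example: TTS as the microphone (source-style)
// Queued items are returned from process as-is, so they must already be AudioData.
const queue = [];
const tts = openTtsStream({ voice: 'alice' });
tts.onFrame = (pcm) => queue.push(pcm);

const ttsSource = {
  process(frame) {
    // Publish queued TTS audio, or silence when the queue is empty.
    // Build the silent frame before close(), while the metadata is readable.
    const out = queue.shift() ?? silentFrame(frame);
    frame.close();
    return out;
  },
};

await VideoSDK.applyAudioProcessor(ttsSource);
tts.speak('Hello, I am an AI agent.');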
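The TTS example calls silentFrame without defining it. One possible shape is sketched below, assuming mono f32-planar output that mirrors the input frame's sample rate, length, and timestamp. silentInit is a hypothetical helper split out so the init dictionary can be inspected without a browser.

```javascript
// Sketch of a silentFrame helper; an assumption, not SDK API.
// Builds the init dictionary for a silent mono frame matching `frame`.
function silentInit(frame) {
  return {
    format: 'f32-planar',
    sampleRate: frame.sampleRate,
    numberOfFrames: frame.numberOfFrames,
    numberOfChannels: 1,
    timestamp: frame.timestamp,
    // A new Float32Array is zero-filled, which is digital silence.
    data: new Float32Array(frame.numberOfFrames),
  };
}

// Call while `frame` is still open: a closed AudioData reports
// sampleRate and numberOfFrames as 0.
function silentFrame(frame) {
  return new AudioData(silentInit(frame));
}
```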