Open questions

Decisions that aren't fully locked: either deferred for v1, awaiting prototype validation, or parked for re-evaluation if real apps surface a problem. Each entry summarizes the question, the current state, and what would trigger a revisit.

The full rationale and historical discussion for each item live in MEDIA_API_V1.md under "Open Questions" (Q1 onwards). This page is the index; open it and re-evaluate whenever a related design discussion comes up.

📥 Join flow

Questions about VideoSDK.join() and JoinOptions.

Q14 - One-step vs two-step join

Status: locked to one-step (VideoSDK.join()) for v1. Two-step (new Room() + room.connect()) parked.

The question

Should the SDK use one-step join (returns a connected Room) or two-step (construct Room locally, then connect() separately)?

Why this came up

v0 had a real bug: the app calls room.on('subscribe-topic', ...) immediately after join, the SDK socket isn't fully ready yet, the SDK fails to send the upstream subscribe, and messages are missed. Two-step would solve this naturally: listeners registered before connect() get bundled into the join handshake.

Why one-step still wins

The race is fixable with a proper SDK contract:

  1. The VideoSDK.join() Promise resolves only when the socket is fully ready (joined room, can send/receive).
  2. Late room.on('event') calls immediately send the upstream subscribe; the socket is ready by definition.
  3. Events that fired during the join handshake (already-present participants, queued pubsub messages) are retained and re-delivered when listeners register, the same retroactive pattern as participant-joined.
  4. JoinOptions.subscribeEvents pre-warms known-upfront events as part of the join handshake (single round-trip).
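
Point 3 of the contract (retain, then re-deliver) can be illustrated with a small sketch. ReplayEmitter is a hypothetical helper written for this page, not an SDK class, and the real SDK may bound or expire what it retains; the sketch only shows that a listener registered late still receives events that fired during the handshake.

```typescript
// Hypothetical sketch: ReplayEmitter is NOT part of the SDK.
type Listener<T> = (payload: T) => void;

class ReplayEmitter<Events extends Record<string, unknown>> {
  private listeners = new Map<keyof Events, Set<Listener<any>>>();
  private retained = new Map<keyof Events, unknown[]>();

  // SDK side: if nobody is listening yet, retain the payload instead of dropping it.
  emit<K extends keyof Events>(event: K, payload: Events[K]): void {
    const set = this.listeners.get(event);
    if (!set || set.size === 0) {
      const queue = this.retained.get(event) ?? [];
      queue.push(payload);
      this.retained.set(event, queue);
      return;
    }
    set.forEach((fn) => fn(payload));
  }

  // App side: the first listener for an event receives everything retained so far.
  on<K extends keyof Events>(event: K, listener: Listener<Events[K]>): void {
    const set = this.listeners.get(event) ?? new Set<Listener<any>>();
    set.add(listener);
    this.listeners.set(event, set);
    const queue = this.retained.get(event);
    if (queue) {
      this.retained.delete(event);
      queue.forEach((payload) => listener(payload as Events[K]));
    }
  }
}

type RoomEvents = { "participant-joined": { id: string } };

const room = new ReplayEmitter<RoomEvents>();
room.emit("participant-joined", { id: "alice" });        // fires during the handshake
const seen: string[] = [];
room.on("participant-joined", (p) => seen.push(p.id));   // listener registered late
room.emit("participant-joined", { id: "bob" });          // live event after registration
```

In this sketch the late listener sees the retained event first and the live event second, which is the ordering an app would expect from the retroactive pattern.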

What two-step would have given us beyond race-resolution

Cost that kept us on one-step

Revisit if

Q15 - VideoSDK.prepareConnection() for pre-warm

Status: deferred for v1. Additive non-breaking change later.

LiveKit's pattern: open the WebSocket + warm TLS during a loading screen, before the user clicks Join. Saves 200–500ms perceived join time. Most v1 apps go straight to join, so deferred until real demand surfaces.
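
A rough sketch of why this can be added later without breaking anything: the pre-warm step is optional and join falls back to opening its own connection. ConnectionWarmer and openSocket() are hypothetical stand-ins for illustration; nothing here reflects the actual SDK internals.

```typescript
// Hypothetical sketch: openSocket() stands in for the real WebSocket + TLS setup.
class ConnectionWarmer {
  socketsOpened = 0;
  private warm: Promise<string> | null = null;

  private openSocket(): Promise<string> {
    this.socketsOpened += 1;
    return Promise.resolve(`socket-${this.socketsOpened}`);
  }

  // Optional and idempotent: safe to call from a loading screen more than once.
  prepareConnection(): void {
    if (this.warm === null) this.warm = this.openSocket();
  }

  // Reuses the warm socket when there is one; otherwise opens a fresh one
  // (the un-warmed path, so apps that never pre-warm are unaffected).
  join(): Promise<string> {
    const socket = this.warm ?? this.openSocket();
    this.warm = null;
    return socket;
  }
}

const warmed = new ConnectionWarmer();
warmed.prepareConnection();
warmed.prepareConnection(); // second call is a no-op
void warmed.join();

const cold = new ConnectionWarmer();
void cold.join();           // never pre-warmed: join opens the socket itself
```

Both paths open exactly one socket; the only difference is when the cost is paid.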

Q16 - Boolean shortcut for JoinOptions.publishVideo / publishAudio

Status: deferred for v1. Locked on opts-only shape; revisit if real apps prefer the boolean form.

The question

Should JoinOptions.publishVideo / publishAudio accept a plain boolean in addition to a publish-opts object? For example, publishVideo: true as shorthand for "publish camera with defaults", mirroring v0's webcamEnabled / micEnabled.

What we picked for v1 (opts-only)

Single field per kind. Presence (even empty) means "publish on join":
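
A minimal sketch of the presence rule, under assumptions: the option fields shown (resolution, noiseSuppression) and the kindsToPublish helper are hypothetical, and the real PublishVideoOpts / PublishAudioOpts shapes are defined in MEDIA_API_V1.md.

```typescript
// Assumed, simplified shapes for illustration only.
interface PublishVideoOpts { resolution?: string }
interface PublishAudioOpts { noiseSuppression?: boolean }

interface JoinOptions {
  publishVideo?: PublishVideoOpts;
  publishAudio?: PublishAudioOpts;
}

// Hypothetical helper: presence of the field, even as {}, means "publish on join".
function kindsToPublish(opts: JoinOptions): string[] {
  const kinds: string[] = [];
  if (opts.publishVideo !== undefined) kinds.push("video");
  if (opts.publishAudio !== undefined) kinds.push("audio");
  return kinds;
}

const cameraWithDefaults = kindsToPublish({ publishVideo: {} });
const nothing = kindsToPublish({});
const both = kindsToPublish({ publishVideo: { resolution: "h720" }, publishAudio: {} });
```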

Default-constructor pattern across strongly-typed SDKs:

Why we didn't add the boolean shortcut

Revisit if

Q17 - When does the VideoSDK.join() Promise resolve? RESOLVED

Status: locked via decision 36 in MEDIA_API_V1.md. Resolve at phase 4 (connected) + phase 6 (pre-warm done). Phase 5 (publish) and phase 7 (retroactive events) run async after resolve. Publish lifecycle events on LocalParticipant: video-published / audio-published / screen-published / publish-failed. Original analysis preserved below for context.
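
The locked ordering can be simulated as follows. This is a sketch only: the timers stand in for real network work, joinSketch and backgroundDone are hypothetical names that exist so the ordering can be observed, and running phases 5 and 7 in parallel is an assumption (the status line only says both run after resolve).

```typescript
// Hypothetical simulation of the phase ordering; not SDK code.
const log: string[] = [];

function runPhase(name: string): Promise<void> {
  return new Promise((resolve) => {
    setTimeout(() => {
      log.push(name);
      resolve();
    }, 0);
  });
}

interface RoomSketch {
  backgroundDone: Promise<void>; // exists only so the sketch can be observed
}

async function joinSketch(): Promise<RoomSketch> {
  await runPhase("4:connected");
  await runPhase("6:pre-warm");
  // Not awaited: publish and retroactive events continue after join resolves.
  const backgroundDone = Promise.all([
    runPhase("5:publish"),
    runPhase("7:retroactive-events"),
  ]).then(() => {});
  return { backgroundDone };
}

const result = joinSketch().then(async (room) => {
  const atResolve = [...log];       // what had finished when join resolved
  await room.backgroundDone;
  return { atResolve, atEnd: [...log] };
});
```

In a real app the equivalent of backgroundDone is the publish lifecycle events on LocalParticipant (video-published, publish-failed, and so on) rather than a Promise.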

The question

At what point in the join handshake should the VideoSDK.join() Promise resolve? Three plausible boundaries:

| Option | Resolves on | Latency | Tradeoff |
|---|---|---|---|
| A | Server ack of join request | Lowest | App gets a Room object before media is flowing; publishVideo calls might queue or fail |
| B | Signaling open + ICE/DTLS complete | Mid | Connection is up but pre-warmed events / initial publish may not be done yet |
| C (current) | Fully ready: signaling + ICE/DTLS + initial publish + retroactive events delivered | Highest | App is guaranteed everything works the moment await resolves; no race windows |

Why we default to C (full readiness)

When this might bite

Alternatives to consider if C is too slow

Revisit if

Resolved in this group

| Q# | Question | Resolution |
|---|---|---|
| Q2 | Naming for entry point | Locked: VideoSDK.join({...}) |

🔔 Events

Questions about event surfaces: what fires on LocalParticipant / RemoteParticipant / Room and how.

Q1 - Per-kind events vs single event with kind param RESOLVED

Status: locked v1 as per-kind. Worth revisiting before public release.

The question

Should publish/subscribe lifecycle use 18 per-kind events (3 kinds × 6 actions) or 6 generic events with a kind field in the payload?

Current v1 (per-kind)

me.on('video-published',  stream => stream.attach(localEl));
me.on('audio-published',  stream => ...);
me.on('screen-published', stream => ...);
me.on('publish-failed',   ({ kind, error }) => ...);   // shared failure event

Alternative - single event with kind param

me.on('published',   ({ kind, stream }) => {
  if (kind === MediaKind.Video) stream.attach(localEl);
});
me.on('unpublished', ({ kind }) => ...);

Tradeoffs

| | Per-kind (locked) | Single event with kind |
|---|---|---|
| Event count | 18 | 6 |
| Handler signature | precise per kind | union: stream type varies per kind |
| Type safety | strong | requires discriminated union |
| "Any media change" listener | register 6+ | register 1 |
| Listener noise | low (only fires for the kind) | high (fires for every kind) |
| Discoverability | IDE autocomplete on 'video-' | one event, one signature |
| Refactor cost | low | adding a kind requires updating switch statements |

Why per-kind for v1

Type safety + listener focus + discoverability. The 18-event count sounds high, but it's six concepts × three kinds, so it is fully regular.
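
The type-safety difference can be shown with a small sketch. The stream shapes and handler tables below are simplified assumptions for illustration, not the SDK's real types: per-kind handlers receive an already-precise payload, while the generic handler must narrow a discriminated union first.

```typescript
// Assumed, simplified stream shapes for illustration only.
interface VideoStreamSketch { width: number }
interface AudioStreamSketch { level: number }

// Per-kind: each event name maps to exactly one payload type.
type PerKindEvents = {
  "video-published": VideoStreamSketch;
  "audio-published": AudioStreamSketch;
};

type PerKindHandlers = {
  [K in keyof PerKindEvents]: (stream: PerKindEvents[K]) => string;
};

const perKind: PerKindHandlers = {
  "video-published": (stream) => `video ${stream.width}px`,       // stream is VideoStreamSketch
  "audio-published": (stream) => `audio level ${stream.level}`,   // stream is AudioStreamSketch
};

// Single event: the payload is a union that every handler must narrow by kind.
type PublishedPayload =
  | { kind: "video"; stream: VideoStreamSketch }
  | { kind: "audio"; stream: AudioStreamSketch };

function onPublished(payload: PublishedPayload): string {
  switch (payload.kind) {
    case "video":
      return `video ${payload.stream.width}px`;
    case "audio":
      return `audio level ${payload.stream.level}`;
  }
}

const fromPerKind = perKind["video-published"]({ width: 1280 });
const fromGeneric = onPublished({ kind: "video", stream: { width: 1280 } });
```

Both forms produce the same result; the switch in onPublished is the part that grows whenever a kind is added.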

Revisit if

Q11 - Event mirroring: should publish/subscribe events also fire on room?

Status: locked v1 as participant-only. Under review; could be added as a backwards-compatible additive change later.

The question

Should the publish/subscribe events fire only on the participant they relate to, or also mirror to the Room object with a participant field added to the payload?

Current v1 behavior (participant-only)

me.on('video-published', stream => {});           // local: fires only on me
p.on('video-subscribed', stream => {});           // remote: fires only on that participant

// Cross-cutting: register per participant; SDK auto-cleans on leave
room.on('participant-joined', p => {
  p.on('video-subscribed', stream => analytics.log(stream));
});

Alternative - also mirror to room (LiveKit / Daily / current VideoSDK convention)

p.on('video-subscribed', stream => {});                       // on p
room.on('video-subscribed', ({ participant, stream }) => {}); // mirror

Why we picked participant-only for v1

What we'd give up by NOT mirroring

Revisit if

Resolved in this group

| Q# | Question | Resolution |
|---|---|---|
| Q12 | Event naming convention: kebab vs camelCase | Locked: kebab-case strings + enums (both forms work). E.g. 'video-published' or LocalEvent.VideoPublished |

🎥 Streams & publishing

Questions about VideoStream / AudioStream / ScreenStream shape, publish options, and the cross-platform abstraction.

Q4 - Custom / auxiliary tracks API

Status: was deferred to v2; under the "no v0 feature drops in v1" directive this is now in scope for v1. Currently TBD; needs design.

The question

How does v1 expose multiple simultaneous tracks of the same kind, e.g. multi-camera (dual-cam broadcast), document camera alongside webcam, multi-language audio feeds, multiple screen shares?

What v1 already covers via the frame-transform processor (decision 13)

VideoSDK.applyVideoProcessor() / applyAudioProcessor() cover virtual bg, noise cancel, audio mixing, file playback, TTS, canvas video: most "I want a custom source / effect" use cases.

What still needs v1 design

Use cases that genuinely need separate tracks (independent subscribe / render / quality control per consumer), where compositing into one track via a processor doesn't fit:

Likely v1 shape (sketch)

me.addCustomVideo({ name: 'doc-cam', device: docCam });   // local: publish an extra named video track
p.subscribeCustom(['doc-cam']);                           // remote: subscribe by track name
p.customTracks;                                           // type: Map<string, RemoteVideoStream>

Q13 - Stream object abstraction across platforms

Status: open. Needs prototype validation on React Native / Flutter / iOS / Android before treating as locked. Many other decisions assume a stable stream object.

The proposed abstraction

// Conceptually identical across web / RN / Flutter / iOS / Android
interface VideoStream {
  attach(target: PlatformElement): void;
  detach(target: PlatformElement): void;
  // ...plus dimensions, frameRate, getStats, getMediaStreamTrack, etc.
}

A stable handle per (participant, kind): same instance across calls, internally swaps tracks on device change, invalidated on unpublish/leave, lets users attach to one or many native rendering elements.
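
The "stable handle" guarantee can be sketched in a platform-neutral way. Everything below is hypothetical: TrackSketch and RenderTargetSketch stand in for a native track and a native view, and swapTrack / invalidate are illustrative internal hooks, not proposed public API. The point is that the handle and its attached targets survive a track swap.

```typescript
// Hypothetical stand-ins for a native track and a native rendering element.
interface TrackSketch { id: string }
interface RenderTargetSketch { showing: TrackSketch | null }

class StableStreamSketch {
  private track: TrackSketch;
  private targets = new Set<RenderTargetSketch>();
  private valid = true;

  constructor(track: TrackSketch) {
    this.track = track;
  }

  attach(target: RenderTargetSketch): void {
    if (!this.valid) throw new Error("stream invalidated");
    this.targets.add(target);
    target.showing = this.track;
  }

  detach(target: RenderTargetSketch): void {
    if (this.targets.delete(target)) target.showing = null;
  }

  // Device change: same handle, new track, every attached target follows.
  swapTrack(next: TrackSketch): void {
    this.track = next;
    this.targets.forEach((t) => { t.showing = next; });
  }

  // Unpublish / leave: clear all targets and refuse further attach calls.
  invalidate(): void {
    this.valid = false;
    this.targets.forEach((t) => { t.showing = null; });
    this.targets.clear();
  }
}

const elA: RenderTargetSketch = { showing: null };
const elB: RenderTargetSketch = { showing: null };
const stream = new StableStreamSketch({ id: "front-cam" });
stream.attach(elA);
stream.attach(elB);
stream.swapTrack({ id: "usb-cam" }); // both targets now show the new track
```

Whether swapTrack can be implemented without rebuilding the native view is exactly the per-platform question the feasibility table raises.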

Higher-level layer (also assumed)

Framework-specific components on top, such as <RemoteVideoView /> (React + RN), RemoteVideoView widget (Flutter), VideoSDKVideoView (iOS UIView, Android FrameLayout), that internally call stream.attach() so app code stays declarative.

Feasibility questions to validate before locking

| Platform | Concern | Confidence |
|---|---|---|
| Web (vanilla JS) | srcObject = new MediaStream([track]) | ✓ likely fine |
| React (web) | Component wraps stream.attach | ✓ likely fine |
| React Native | Bridge support for handing track ID to native view component, with stable lifecycle across re-renders | ⚠ needs prototype |
| Flutter | Texture-based rendering: does stream.attach(VideoRenderer) map cleanly to platform-channel + texture ID? | ⚠ needs prototype |
| iOS native | track.add(rtcMTLVideoView) | ✓ likely fine |
| Android native | track.addSink(surfaceViewRenderer) | ✓ likely fine |
| Hot-swap mid-attach | Replace underlying track on a connected view without rebuilding it (critical for "stable stream instance" guarantee) | ⚠ needs verification per platform |
| Audio auto-play | Internal <audio> (web) / AVAudioPlayer (iOS) / AudioTrack (Android) with output device routing | ✓ feasible |

Validation work needed

Fallbacks if abstraction breaks down

Risk if we don't validate

The abstraction reads clean on paper but might require platform-specific shims that break the "same conceptual API everywhere" promise. Better to find out now than after writing 200 pages of docs.

Q18 - Pre-call preview lifecycle: singleton + auto-promote vs render-only with auto-dispose

The question

How should the SDK expose a pre-call camera/mic stream β€” the one a user sees in a lobby UI before clicking Join?

Two competing shapes:

Approach A - Singleton + auto-promote (current)

// Lobby flow: the same physical MediaStreamTrack flows preview → publish
const preview = await VideoSDK.createVideoStream({ device, resolution: 'h720' });
preview.attach(previewEl);

// Singleton state is visible
VideoSDK.videoStream;              // = preview (until join or stop)

// Join silently adopts the singleton
await VideoSDK.join({});           // preview becomes me.video: no flicker, no re-acquire

// If NOT joining, manual cleanup required
await preview.stop();              // releases camera, clears VideoSDK.videoStream slot

Return type: LocalVideoStream (same type used in-call as me.video). The stop() method is pre-call only: it aborts the preview (releases the device + clears the singleton). Post-join, use me.unpublishVideo() for full teardown (unpublish + release device).
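
The singleton slot semantics, including the accidental-promotion footgun, can be modelled with a small synchronous sketch. SingletonSdkSketch is hypothetical and deliberately simplified (the real calls are async and acquire a device); it only tracks who owns the slot. The device names are made up.

```typescript
// Hypothetical, synchronous model of the Approach A singleton slot.
interface LocalVideoStreamSketch { device: string; resolution: string }

class SingletonSdkSketch {
  videoStream: LocalVideoStreamSketch | null = null;

  createVideoStream(opts: LocalVideoStreamSketch): LocalVideoStreamSketch {
    this.videoStream = { ...opts };
    return this.videoStream;
  }

  // Pre-call abort: releases the device and clears the slot.
  stopPreview(): void {
    this.videoStream = null;
  }

  // join() adopts whatever sits in the slot as me.video, then empties the slot.
  join(): { video: LocalVideoStreamSketch | null } {
    const adopted = this.videoStream;
    this.videoStream = null;
    return { video: adopted };
  }
}

// Lobby flow: the preview instance itself becomes me.video.
const lobbySdk = new SingletonSdkSketch();
const lobbyPreview = lobbySdk.createVideoStream({ device: "front-cam", resolution: "h720" });
const lobbyMe = lobbySdk.join();

// Footgun: a settings-page preview that was never stopped is promoted on the next join.
const settingsSdk = new SingletonSdkSketch();
settingsSdk.createVideoStream({ device: "usb-cam", resolution: "h720" });
const settingsMe = settingsSdk.join();

// Disciplined path: stop the preview first, and join publishes nothing.
const cleanSdk = new SingletonSdkSketch();
cleanSdk.createVideoStream({ device: "usb-cam", resolution: "h720" });
cleanSdk.stopPreview();
const cleanMe = cleanSdk.join();
```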

| Pros | Cons |
|---|---|
| Zero-flicker preview → publish (same physical track). | Hidden coupling: join() reads global singleton state. |
| One method, one type (LocalVideoStream) for preview AND publish. | Render-only footgun: settings-page preview lingers until next join, then accidentally promotes. |
| runPreCallTest() reuses the singleton: no double permission, no double acquire. | Manual stop() discipline required if app doesn't proceed to join. |
| Inspectable state via VideoSDK.videoStream getter. | Mid-call preview of a different camera fails: singleton conflicts with active publish. |
| Continuity with v0's createCameraVideoTrack mental model. | Singleton-per-kind doesn't scale to multi-track v2 (e.g., dual-camera previews). |
| No "render target" cross-platform problem: same .attach(el) pattern as in-call streams. | Preview configuration leaks into publish via the same instance ("why did my resolution change on join?"). |

Approach B - precallPreview with render + auto-dispose on join

// Pre-call preview: SDK owns the render binding AND the lifecycle
const preview = await VideoSDK.precallPreview({
  device:     cam,
  resolution: 'h720',
  render:     previewEl,           // SDK manages the binding
});

// Three terminal paths:
await VideoSDK.join({});                                          // preview disposed; no publish
await VideoSDK.join({ publishVideo: preview.getPublishOpts() });  // preview disposed; publish acquires fresh
                                                                  // with same device/resolution as preview.
                                                                  // (Processor, if any, is sticky on VideoSDK, applies automatically.)
                                                                  // SDK can swap render seamlessly (no visible flicker).
await preview.dispose();                                          // explicit (e.g. SPA navigation)

Return type: PreviewVideoStream (new pre-call-only type, distinct from LocalVideoStream). Blueprint:

interface PreviewVideoStream {
  // === Identity ===
  readonly id: string;
  readonly inputDevice: CameraDeviceInfo;

  // === Publish forwarding ===
  // Snapshot of current config β€” pass to VideoSDK.join({ publishVideo: preview.getPublishOpts() })
  // to publish with the same device / resolution as the preview.
  getPublishOpts(): PublishVideoOpts;

  // === Lifecycle ===
  // Releases the device. Auto-called by VideoSDK.join(). Manual only for abandon paths
  // (cancel button, SPA navigation, settings unmount). After dispose() all methods throw PREVIEW_DISPOSED.
  dispose(): Promise<void>;

  // === Device control (mid-preview swap) ===
  setInputDevice(device: CameraDeviceInfo): Promise<void>;
  updateSettings(opts: { resolution?: VideoResolution; frameRate?: number }): Promise<void>;
  getInputCapabilities(): Promise<VideoCapabilities>;

  // === Render binding ===
  // Original render target was passed at creation via { render }. Use rebind() to swap to a new target.
  rebind(target: PlatformRenderTarget): void;
  unbind(): void;

  // === Effects: handled globally via VideoSDK.applyVideoProcessor() (sticky), not per-preview ===

  // === Events ===
  //   'ended'    ({ timestamp })  - device unplugged / permission revoked
  //   'disposed' ()               - dispose() ran (manual or auto-on-join)
  on(event: "ended" | "disposed", listener: (payload?: unknown) => void): void;
}

The audio equivalent, PreviewAudioStream, mirrors this shape (with MicrophoneDeviceInfo, PublishAudioOpts, an audioLevel read for a VU meter, and no rebind, since audio doesn't have a render target).

Config doesn't auto-flow. Preview and publish are independent acquisitions. The device/resolution the user picked in the preview must be threaded into publishVideo. The preview.getPublishOpts() method is the one-line forward; without it, publishVideo: {} would publish the default camera (not the user's choice), and the lobby's device picker outcome is lost. (Frame processors are NOT in this list: they're sticky on VideoSDK and apply automatically across all streams.)

No captureFrame / getMediaStreamTrack / setProcessor on the preview type. Snapshot capture and raw track access are kept on LocalVideoStream (post-join). Processors are sticky on VideoSDK, not per-stream. The preview type intentionally stays focused on the lobby lifecycle.
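
The forwarding rule can be made concrete with a short sketch. PreviewSketch, resolvePublish, and the default values are hypothetical and synchronous for brevity; the real blueprint uses CameraDeviceInfo, VideoResolution, and async methods.

```typescript
// Hypothetical, simplified model of "config doesn't auto-flow".
interface CameraSketch { id: string }
interface PublishVideoOptsSketch { device?: CameraSketch; resolution?: string }

class PreviewSketch {
  private device: CameraSketch;
  private resolution: string;

  constructor(device: CameraSketch, resolution: string) {
    this.device = device;
    this.resolution = resolution;
  }

  setInputDevice(device: CameraSketch): void {
    this.device = device;
  }

  // Snapshot: later preview changes do not alter opts that were already handed out.
  getPublishOpts(): PublishVideoOptsSketch {
    return { device: { ...this.device }, resolution: this.resolution };
  }
}

// Assumed defaults, used when publish opts leave a field out.
const DEFAULTS = { device: { id: "default-cam" }, resolution: "h720" };

function resolvePublish(opts: PublishVideoOptsSketch) {
  return {
    device: opts.device ?? DEFAULTS.device,
    resolution: opts.resolution ?? DEFAULTS.resolution,
  };
}

const preview = new PreviewSketch({ id: "default-cam" }, "h720");
preview.setInputDevice({ id: "usb-cam" });                    // user picks a camera in the lobby

const forwarded = resolvePublish(preview.getPublishOpts());   // publishVideo: preview.getPublishOpts()
const forgotten = resolvePublish({});                         // publishVideo: {}
```

The forgotten case silently falls back to the default camera, which is the footgun described above.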

| Pros | Cons |
|---|---|
| Clear intent at call site: precallPreview means "I'm rendering for inspection only". | Either visible flicker on join OR SDK must orchestrate a render swap (acquire publish track → swap srcObject → release preview). More SDK complexity. |
| Render-only use cases (settings page, diagnostics) work natively: no accidental promotion. | Two methods to learn: precallPreview vs JoinOptions.publishVideo. |
| No singleton state, no global VideoSDK.videoStream to leak. | Different type from in-call streams (or fuzzy semantics if same type). |
| Auto-dispose on join eliminates the most common leak path. | render parameter needs platform-typed native view (HTMLVideoElement / UIView / RTCVideoView / etc.): same as .attach(el), just at creation time. |
| render as a creation-time parameter enables SDK-orchestrated transitions. | runPreCallTest() integration needs explicit wiring: preview must expose track access for reuse. |
| Independent acquisitions = independent configs (preview at 720p, publish at 1080p, no leak between). | Migration from v0 is more disruptive: different shape from createCameraVideoTrack. |
| Scales naturally to multi-track v2: multiple independent preview handles, no singleton conflict. | "Why is my preview gone after I joined?": documentation burden on the dispose contract. |
| Mid-call preview of a different camera works: independent handle doesn't touch the active publish. | Device / resolution from preview doesn't auto-flow to publish: app must thread explicitly via publishVideo: preview.getPublishOpts() (footgun if forgotten: defaults are used instead of user's choice). Frame processors are NOT affected; they're sticky on VideoSDK. |

Use cases this affects

Pre-call track/stream use cases divide cleanly by whether they lead to publishing:

Will publish on join (preview → publish flow):

| # | Use case | Approach A | Approach B |
|---|---|---|---|
| 1 | Green-room / lobby preview | ✅ Perfect (no flicker) | ✅ Works (needs SDK swap orchestration) |
| 2 | Device selection ("which camera?") | ✅ Implicit: preview's device flows to publish via the singleton | ⚠ Works but app must thread device into publishVideo (or use preview.getPublishOpts()); footgun if forgotten |
| 3 | Background blur / virtual bg setup | ✅ Same effect across preview → publish via sticky VideoSDK.applyVideoProcessor | ✅ Same: sticky global processor applies regardless of preview/publish split |
| 4 | Face filter / AR effect preview | ✅ Same as 3 | ⚠ Same as 3 |
| 5 | Resolution/quality selection | ✅ Resolution flows to publish via the singleton | ⚠ Resolution must be re-passed into publish opts |
| 6 | Pre-call network quality test | ✅ Reuses singleton | ⚠ Needs explicit wiring (runPreCallTest({ usePreview })) |

Won't publish (render-only):

| # | Use case | Approach A | Approach B |
|---|---|---|---|
| 7 | Settings page preview | ❌ Singleton lingers, accidentally promotes | ✅ Clean |
| 8 | "Camera works?" diagnostic | ❌ Same footgun | ✅ Clean |
| 9 | Mid-call preview of different camera | ❌ Broken: singleton conflicts with active publish | ✅ Independent handle |
| 10 | Browser/device support check | ❌ Footgun | ✅ Clean |

Use case 9 is the killer test. Mid-call camera preview before switching is a standard UX pattern (Slack, Zoom, Google Meet all have it). Under Approach A the singleton conflicts with the active publish, so the app cannot preview a second camera while publishing the first. Under Approach B, independent handles solve it naturally. This alone may force Approach B (or a hybrid).

Resolved in this group

| Q# | Question | Resolution |
|---|---|---|
| Q3 | Quality presets vs raw values for video options | Locked: both work. Presets ('h720') or raw heights (720) |
| Q5 | Pre-join preview | Originally resolved via decision 32 (VideoSDK.createVideoStream() + auto-promote). Re-opened; see Q18. |
| Q6 | Plugins (transform + source) | Resolved: collapsed into the frame-transform processor (decision 13). Source plugins = processor whose init loads the source |
| Q7 | MediaStreamTrack escape hatch: fully removed? | Locked: no escape hatch in v1. Custom sources flow through processor |
| Q8 | Audio rendering: auto-play vs explicit attach | Locked: remote auto-plays (decision 9), local doesn't (feedback avoidance), attach() available as escape hatch |

Q10 was never assigned.

See also: Media API v1 overview