Open questions
Decisions that aren't fully locked: either deferred for v1, awaiting prototype validation, or parked for re-evaluation if real apps surface a problem. Each entry summarizes the question, the current state, and what would trigger a revisit.
Source: MEDIA_API_V1.md under "Open Questions" (Q1 onwards). This page is the index: open and re-evaluate whenever a related design discussion comes up.
Join flow
Questions about VideoSDK.join() and JoinOptions.
Q14: One-step vs two-step join
Status: locked to one-step (VideoSDK.join()) for v1. Two-step (new Room() + room.connect()) parked.
The question
Should the SDK use one-step join (returns a connected Room) or two-step (construct Room locally, then connect() separately)?
Why this came up
v0 had a real bug: app calls room.on('subscribe-topic', ...) immediately after join; SDK socket isn't fully ready; SDK fails to send the upstream subscribe; messages are missed. Two-step would solve this naturally: listeners registered before connect() get bundled into the join handshake.
Why one-step still wins
The race is fixable with proper SDK contract:
- VideoSDK.join() Promise resolves only when socket is fully ready (joined room, can send/receive)
- Late room.on('event') calls immediately send the upstream subscribe; the socket is ready by definition
- Events that fired during the join handshake (already-present participants, queued pubsub messages) are retained and re-delivered when listeners register, the same retroactive pattern as participant-joined
- JoinOptions.subscribeEvents pre-warms known-upfront events as part of the join handshake (single round-trip)
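The retain-and-re-deliver behaviour described in the list above can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: `RetainingEmitter` and its methods are hypothetical names, and the sketch assumes the backlog is replayed once, to the first listener that registers.

```typescript
// Hypothetical sketch: none of these names are the SDK's real internals.
type Listener<T> = (payload: T) => void;

class RetainingEmitter<T> {
  private listeners: Listener<T>[] = [];
  private retained: T[] = [];

  // Called by the (imaginary) transport when an event arrives.
  emit(payload: T): void {
    if (this.listeners.length === 0) {
      // Nobody is listening yet: keep the event instead of dropping it.
      this.retained.push(payload);
      return;
    }
    for (const l of this.listeners) l(payload);
  }

  // Late registration replays everything retained so far, in arrival order.
  on(listener: Listener<T>): void {
    this.listeners.push(listener);
    const backlog = this.retained;
    this.retained = [];
    for (const payload of backlog) listener(payload);
  }
}

const joined = new RetainingEmitter<string>();
joined.emit("alice"); // fired during the join handshake
joined.emit("bob");

const seen: string[] = [];
joined.on((name) => { seen.push(name); }); // registered after join() resolved
joined.emit("carol"); // live event, delivered directly
```

The listener sees the two handshake-time events before the live one, which is the property that lets one-step join avoid the v0 race without a separate connect() step.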
What two-step would have given us beyond race-resolution
- Reusable Room instance: same object can disconnect() then connect() to a different room without re-wiring listeners. Useful for room-switching UIs but rare in real apps.
- React-friendly construction: useState(() => new Room()) + useEffect(connect) pattern.
- Industry alignment: LiveKit / mediasoup use two-step.
Cost that kept us on one-step
- Two-call API adds ceremony for the simple case
- State-gated methods: subscribe() / requestPublishX() etc. would need to error pre-connect (NOT_CONNECTED)
- Lifecycle-bug risk if app constructs Room then never connects (similar to the lobby leak we already rejected)
- Documentation rewrite: many decisions assume one-step semantics
Revisit if
- Real-world apps consistently want room-instance reuse (room-switching UIs at scale)
- The "Promise resolves only when fully ready" contract proves hard to deliver consistently across platforms
- Late-listener bundling latency becomes a measurable bottleneck in production telemetry
Q15: `VideoSDK.prepareConnection()` for pre-warm
Status: deferred for v1. Additive non-breaking change later.
LiveKit's pattern: open the WebSocket + warm TLS during a loading screen, before the user clicks Join. Saves 200-500ms perceived join time. Most v1 apps go straight to join, so deferred until real demand surfaces.
Q16: Boolean shortcut for JoinOptions.publishVideo / publishAudio
Status: deferred for v1. Locked on opts-only shape; revisit if real apps prefer the boolean form.
The question
Should JoinOptions.publishVideo / publishAudio accept a plain boolean in addition to a publish-opts object? i.e. publishVideo: true as a shorthand for "publish camera with defaults", mirroring v0's webcamEnabled / micEnabled.
What we picked for v1 (opts-only)
Single field per kind. Presence (even empty) means "publish on join":
- publishVideo: {} → publish camera with defaults (TS/JS: empty object literal)
- publishVideo: { resolution: 'h720' } → publish with custom config
- omit field → don't publish
Default-constructor pattern across strongly-typed SDKs:
- Kotlin: publishVideo = PublishVideoOpts()
- Swift: publishVideo: PublishVideoOpts()
- Dart: publishVideo: PublishVideoOpts()
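The presence-means-publish rule can be illustrated with a small resolver. This is a sketch under assumptions: `resolvePublishVideo` and `ASSUMED_DEFAULT_RESOLUTION` are hypothetical, and the actual SDK default resolution is not stated in this document.

```typescript
// Hypothetical types and defaults, for illustration only.
interface PublishVideoOpts {
  resolution?: string;
}
interface JoinOptions {
  publishVideo?: PublishVideoOpts;
}

// Assumed default; the real SDK default is not specified in this document.
const ASSUMED_DEFAULT_RESOLUTION = "h720";

// Returns the effective publish config, or null when the field is omitted.
function resolvePublishVideo(opts: JoinOptions): Required<PublishVideoOpts> | null {
  if (opts.publishVideo === undefined) return null; // omitted: don't publish
  return {
    resolution: opts.publishVideo.resolution ?? ASSUMED_DEFAULT_RESOLUTION,
  };
}

const omitted = resolvePublishVideo({}); // no publish on join
const defaults = resolvePublishVideo({ publishVideo: {} }); // publish with defaults
const custom = resolvePublishVideo({ publishVideo: { resolution: "h1080" } });
```

A single optional field keeps the three cases distinguishable without a boolean: absent, present-but-empty, and present-with-config.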
Why we didn't add the boolean shortcut
- boolean | PublishVideoOpts doesn't translate cleanly to Kotlin / Swift / Dart; it would force sealed-class wrappers in 3 SDKs to save a couple of characters in JS
- Two ways to express the same thing (boolean OR opts) creates ambiguity: what if both are set? What if the user sets publishVideo: true but expects per-stream defaults from somewhere else?
- Default-constructor pattern reads naturally in every SDK: same mental model uniformly
Revisit if
- v0 migrants consistently complain about losing webcamEnabled: true ergonomics
- Documentation / examples show the empty-object form confuses first-time readers
- A separate enable-only API surfaces (e.g. VideoSDK.join({ enable: ['video', 'audio'] })) that side-steps the union-type issue
Q17: When does VideoSDK.join() Promise resolve? RESOLVED
Status: locked via decision 36 in MEDIA_API_V1.md. Resolve at phase 4 (connected) + phase 6 (pre-warm done). Phase 5 (publish) and phase 7 (retroactive events) run async after resolve. Publish lifecycle events on LocalParticipant: video-published / audio-published / screen-published / publish-failed. Original analysis preserved below for context.
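The locked resolve point can be modelled as below. This is a hedged sketch of the ordering only: the phase numbers follow the status line above, while `join`, `JoinResult`, `phasesAtResolve`, and `background` are invented for illustration and `Promise.resolve()` stands in for real network round-trips.

```typescript
// Hypothetical sketch of the phase ordering; not the SDK's join implementation.
type Phase = "connected" | "publish" | "pre-warm" | "retroactive-events";

interface JoinResult {
  phasesAtResolve: Phase[]; // what had completed when join() resolved
  background: Promise<void>; // settles once the async phases finish
}

async function join(log: Phase[]): Promise<JoinResult> {
  log.push("connected"); // phase 4 gates the Promise
  await Promise.resolve(); // stand-in for network round-trips
  log.push("pre-warm"); // phase 6 gates the Promise

  // Phases 5 and 7 start here but are not awaited by join().
  const background = (async () => {
    await Promise.resolve();
    log.push("publish"); // phase 5: would surface as 'video-published' etc.
    await Promise.resolve();
    log.push("retroactive-events"); // phase 7
  })();

  return { phasesAtResolve: [...log], background };
}

const log: Phase[] = [];
const result = join(log);
```

In this model the app learns about publish completion through lifecycle events rather than through the join Promise, which is why the publish events on LocalParticipant exist.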
The question
At what point in the join handshake should the VideoSDK.join() Promise resolve? Three plausible boundaries:
| Option | Resolves on | Latency | Tradeoff |
|---|---|---|---|
| A | Server ack of join request | Lowest | App gets a Room object before media is flowing; publishVideo calls might queue or fail |
| B | Signaling open + ICE/DTLS complete | Mid | Connection is up but pre-warmed events / initial publish may not be done yet |
| C (current) | Fully ready: signaling + ICE/DTLS + initial publish + retroactive events delivered | Highest | App is guaranteed everything works the moment await resolves; no race windows |
Why we default to C (full readiness)
- One-step join's whole value proposition (see Q14) is "no race windows"; only C delivers that
- App code const room = await VideoSDK.join(); room.localParticipant.video.attach(el); works immediately, with no "wait for connected" guard
- Late room.on('event') subscriptions never miss anything: handshake events are already delivered or retained
When this might bite
- Slow initial publish: if the app passes publishVideo: { resolution: 'h1080' } over a slow uplink, the publish handshake adds 500ms+ to perceived join time. App can't show "connecting…" UI past this point.
- Slow retroactive event delivery: joining a 200-participant room means 200 participant-joined events fire before resolve. Could be hundreds of ms.
- App wants to show partial UI: e.g. show participant tiles while own publish is still negotiating
Alternatives to consider if C is too slow
- Resolve at B (connected); fire a separate 'ready' event when publish + retroactive events are done
- Resolve at C but expose progress via an onProgress callback in JoinOptions
- Resolve at C for happy path; expose VideoSDK.join({ resolveOn: 'connected' }) opt for apps wanting earlier resolution
Revisit if
- Telemetry shows publish-on-join is a measurable fraction of total join time, and apps want to render UI sooner
- Large-room joins (100+ participants) cause noticeable resolve delay due to retroactive event flood
- Real apps consistently want a "connected but not yet publishing" state to handle
Resolved in this group
| Q# | Question | Resolution |
|---|---|---|
| Q2 | Naming for entry point | Locked VideoSDK.join({...}) |
Events
Questions about event surfaces: what fires on LocalParticipant / RemoteParticipant / Room and how.
Q1: Per-kind events vs single event with kind param RESOLVED
Status: locked v1 as per-kind. Worth revisiting before public release.
The question
Should publish/subscribe lifecycle use 18 per-kind events (3 kinds × 6 actions) or 6 generic events with a kind field in the payload?
Current v1 (per-kind)
me.on('video-published', stream => stream.attach(localEl));
me.on('audio-published', stream => ...);
me.on('screen-published', stream => ...);
me.on('publish-failed', ({ kind, error }) => ...); // shared failure event
Alternative: single event with kind param
me.on('published', ({ kind, stream }) => {
if (kind === MediaKind.Video) stream.attach(localEl);
});
me.on('unpublished', ({ kind }) => ...);
Tradeoffs
| | Per-kind (locked) | Single event with kind |
|---|---|---|
| Event count | 18 | 6 |
| Handler signature | precise per kind | union: stream type varies per kind |
| Type safety | strong | requires discriminated union |
| "Any media change" listener | register 6+ | register 1 |
| Listener noise | low (only fires for the kind) | high (fires for every kind) |
| Discoverability | IDE autocomplete on 'video-' | one event, one signature |
| Refactor cost | low | adding a kind requires updating switch statements |
Why per-kind for v1
Type safety + listener focus + discoverability. The 18-event count sounds high, but it's six concepts × three kinds: fully regular.
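The type-safety argument can be made concrete with an event map, where each event name pins its own payload type. The sketch below is illustrative only: `TypedEmitter`, `LocalEvents`, and the two stream interfaces are hypothetical stand-ins, not the SDK's types.

```typescript
// Hypothetical stand-ins for the SDK's stream types.
interface VideoStream { kind: "video"; width: number }
interface AudioStream { kind: "audio"; level: number }

// Per-kind event map: each event name pins its own payload type.
interface LocalEvents {
  "video-published": VideoStream;
  "audio-published": AudioStream;
}

class TypedEmitter<E> {
  private handlers = new Map<keyof E, Array<(payload: unknown) => void>>();

  on<K extends keyof E>(event: K, handler: (payload: E[K]) => void): void {
    const list = this.handlers.get(event) ?? [];
    // Safe by construction: emit() only passes E[K] payloads for this key.
    list.push(handler as (payload: unknown) => void);
    this.handlers.set(event, list);
  }

  emit<K extends keyof E>(event: K, payload: E[K]): void {
    for (const h of this.handlers.get(event) ?? []) h(payload);
  }
}

const me = new TypedEmitter<LocalEvents>();
let width = 0;
// `stream` is inferred as VideoStream; no narrowing on `kind` is needed.
me.on("video-published", (stream) => { width = stream.width; });
me.emit("audio-published", { kind: "audio", level: 0.5 }); // video handler not called
const widthAfterAudio = width;
me.emit("video-published", { kind: "video", width: 1280 });
```

With the single-event alternative, the same handler would receive a union payload and have to branch on `kind` before touching `width`.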
Revisit if
- v2 multi-track (auxiliary cameras, custom-named sources) makes the event count blow up
- Apps consistently want a single "any media change" listener
Q11: Event mirroring: should publish/subscribe events also fire on room?
Status: locked v1 as participant-only. Under review: could be added as a backwards-compatible additive change later.
The question
Should the publish/subscribe events fire only on the participant they relate to, or also mirror to the Room object with a participant field added to the payload?
Current v1 behavior (participant-only)
me.on('video-published', stream => {}); // local: only on me
p.on('video-subscribed', stream => {}); // remote: only on that participant
// Cross-cutting: register per participant; SDK auto-cleans on leave
room.on('participant-joined', p => {
p.on('video-subscribed', stream => analytics.log(stream));
});
Alternative: also mirror to room (LiveKit / Daily / current VideoSDK convention)
p.on('video-subscribed', stream => {}); // on p
room.on('video-subscribed', ({ participant, stream }) => {}); // mirror
Why we picked participant-only for v1
- Smaller event surface to document and reason about
- One canonical place per event: no "did I register on the right level?" confusion
- SDK auto-cleanup of handlers on participant leave neutralizes verbosity for the cross-cutting case
- Adding room mirrors later is additive (non-breaking)
What we'd give up by NOT mirroring
- Cross-cutting concerns (analytics, logger, observability, room-wide UI badges) require the participant-joined-then-on pattern, which is slightly more verbose
- LiveKit explicitly states "Room events are generally a superset of participant events; this is intentional"
- Three of four major SDKs we surveyed (LiveKit, current VideoSDK, Daily) mirror events to room, so participant-only diverges from industry convention
Revisit if
- Real users complain about cross-cutting handler verbosity
- Analytics / logging customers find the per-participant pattern unergonomic
- React / Vue users hit issues with the participant-joined-then-on pattern at scale
Resolved in this group
| Q# | Question | Resolution |
|---|---|---|
| Q12 | Event naming convention: kebab vs camelCase | Locked: kebab-case strings + enums (both forms work). E.g. 'video-published' or LocalEvent.VideoPublished |
Streams & publishing
Questions about VideoStream / AudioStream / ScreenStream shape, publish options, and the cross-platform abstraction.
Q4: Custom / auxiliary tracks API
Status: was deferred to v2; under the "no v0 feature drops in v1" directive this is now in scope for v1. Currently TBD: needs design.
The question
How does v1 expose multiple simultaneous tracks of the same kind, e.g. multi-camera (dual-cam broadcast), document camera alongside webcam, multi-language audio feeds, multiple screen shares?
What v1 already covers via the frame-transform processor (decision 13)
VideoSDK.applyVideoProcessor() / applyAudioProcessor() cover virtual bg, noise cancel, audio mixing, file playback, TTS, canvas video β most "I want a custom source / effect" use cases.
What still needs v1 design
Use cases that genuinely need separate tracks (independent subscribe / render / quality control per consumer), where compositing into one track via processor doesn't fit:
- Multi-camera (dual-cam broadcast)
- Document camera alongside webcam
- Multi-language audio feeds
- Multiple screen shares
Likely v1 shape (sketch)
me.addCustomVideo({ name: 'doc-cam', device: docCam });
p.subscribeCustom(['doc-cam']);
p.customTracks: Map<string, RemoteVideoStream>
Q13: Stream object abstraction across platforms
Status: open. Needs prototype validation on React Native / Flutter / iOS / Android before treating as locked. Many other decisions assume a stable stream object.
The proposed abstraction
// Conceptually identical across web / RN / Flutter / iOS / Android
interface VideoStream {
attach(target: PlatformElement): void;
detach(target: PlatformElement): void;
// ...plus dimensions, frameRate, getStats, getMediaStreamTrack, etc.
}
A stable handle per (participant, kind): same instance across calls, internally swaps tracks on device change, invalidated on unpublish/leave, lets users attach to one or many native rendering elements.
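The "stable handle, swappable track" contract can be sketched in a platform-neutral way. Everything here is hypothetical: `Track` and `RenderTarget` stand in for platform objects, and `replaceTrack` is an invented internal hook, not a proposed public method.

```typescript
// Hypothetical sketch: Track and RenderTarget stand in for platform objects.
interface Track { id: string }
interface RenderTarget { showing: string | null }

class StableVideoStream {
  private targets = new Set<RenderTarget>();
  private track: Track;

  constructor(track: Track) {
    this.track = track;
  }

  attach(target: RenderTarget): void {
    this.targets.add(target);
    target.showing = this.track.id;
  }

  detach(target: RenderTarget): void {
    this.targets.delete(target);
    target.showing = null;
  }

  // Device change: swap the underlying track, keep every attachment alive.
  replaceTrack(next: Track): void {
    this.track = next;
    this.targets.forEach((t) => { t.showing = next.id; });
  }
}

const tile: RenderTarget = { showing: null };
const pip: RenderTarget = { showing: null };
const stream = new StableVideoStream({ id: "front-cam" });
stream.attach(tile);
stream.attach(pip);
stream.replaceTrack({ id: "back-cam" }); // same stream instance, new track
stream.detach(pip);
```

Whether each platform can rebind an attached view this cheaply is exactly what the prototype work is meant to establish.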
Higher-level layer (also assumed)
Framework-specific components on top, namely <RemoteVideoView /> (React + RN), RemoteVideoView widget (Flutter), and VideoSDKVideoView (iOS UIView, Android FrameLayout), that internally call stream.attach() so app code stays declarative.
Feasibility questions to validate before locking
| Platform | Concern | Confidence |
|---|---|---|
| Web (vanilla JS) | srcObject = new MediaStream([track]) | likely fine |
| React (web) | Component wraps stream.attach | likely fine |
| React Native | Bridge support for handing track ID to native view component, with stable lifecycle across re-renders | needs prototype |
| Flutter | Texture-based rendering: does stream.attach(VideoRenderer) map cleanly to platform-channel + texture ID? | needs prototype |
| iOS native | track.add(rtcMTLVideoView) | likely fine |
| Android native | track.addSink(surfaceViewRenderer) | likely fine |
| Hot-swap mid-attach | Replace underlying track on a connected view without rebuilding it (critical for "stable stream instance" guarantee) | needs verification per platform |
| Audio auto-play | Internal <audio> (web) / AVAudioPlayer (iOS) / AudioTrack (Android) with output device routing | feasible |
Validation work needed
- Build minimal prototype on each target platform that:
  - Receives a track from the SFU
  - Wraps it in a stream
  - Exposes attach(nativeElement)
  - Survives a hot-swap (device change → underlying track swapped, attached element keeps showing video)
- Validate the framework component layer can wrap the imperative stream.attach() cleanly per framework
Fallbacks if abstraction breaks down
- If a platform forces a fundamentally different rendering API (e.g., Flutter requires texture ID up front and the stream can't proxy), expose a platform-specific render API there and document the difference
- If hot-swap-without-rebuild isn't possible on some platform, change the contract: stream instance may be replaced on track change. App listens for an event to re-attach. Slightly worse DX but still workable.
- Worst case: drop the unified stream abstraction and adopt the current SDK pattern (participant.renderVideo() returns a fully-formed element). Trade-off: less control, but proven to work.
Risk if we don't validate
The abstraction reads clean on paper but might require platform-specific shims that break the "same conceptual API everywhere" promise. Better to find out now than after writing 200 pages of docs.
Q18: Pre-call preview lifecycle: singleton + auto-promote vs render-only with auto-dispose
The question
How should the SDK expose a pre-call camera/mic stream, i.e. the one a user sees in a lobby UI before clicking Join?
Two competing shapes:
- Approach A (current, decision 32): VideoSDK.createVideoStream() returns a LocalVideoStream tracked in an SDK-internal singleton slot. The next VideoSDK.join() auto-promotes it into me.video. App calls .attach(el) for rendering.
- Approach B: VideoSDK.precallPreview({ render }) takes the render target at creation; returns an independent handle (no singleton). On the next VideoSDK.join() the preview is auto-disposed; publish is always a fresh acquisition via JoinOptions.publishVideo (SDK can orchestrate a seamless render swap so there's no visible flicker).
Approach A: Singleton + auto-promote (current)
// Lobby flow: same physical MediaStreamTrack flows preview → publish
const preview = await VideoSDK.createVideoStream({ device, resolution: 'h720' });
preview.attach(previewEl);
// Singleton state is visible
VideoSDK.videoStream; // = preview (until join or stop)
// Join silently adopts the singleton
await VideoSDK.join({}); // preview becomes me.video: no flicker, no re-acquire
// If NOT joining, manual cleanup required
await preview.stop(); // releases camera, clears VideoSDK.videoStream slot
Return type: LocalVideoStream (same type used in-call as me.video). The stop() method is pre-call only: it aborts the preview (releases the device + clears the singleton). Post-join, use me.unpublishVideo() for full teardown (unpublish + release device).
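The singleton slot and its promotion on join can be modelled in a few lines. This is a hypothetical, synchronous model: `FakeVideoSDK` is invented, and stop() is placed on the SDK object here for brevity, whereas the approach above puts it on the stream.

```typescript
// Hypothetical model of the singleton slot; not the SDK's real implementation.
interface LocalVideoStream { device: string; stopped: boolean }

class FakeVideoSDK {
  videoStream: LocalVideoStream | null = null; // the singleton slot

  createVideoStream(device: string): LocalVideoStream {
    const stream: LocalVideoStream = { device, stopped: false };
    this.videoStream = stream;
    return stream;
  }

  // Pre-call abort: release the device and clear the slot.
  stop(stream: LocalVideoStream): void {
    stream.stopped = true;
    if (this.videoStream === stream) this.videoStream = null;
  }

  // join() silently adopts whatever sits in the slot.
  join(): { video: LocalVideoStream | null } {
    const promoted = this.videoStream;
    this.videoStream = null;
    return { video: promoted };
  }
}

const sdk = new FakeVideoSDK();

// Lobby flow: the preview is promoted, as intended.
const lobbyPreview = sdk.createVideoStream("front-cam");
const lobbyJoin = sdk.join();

// Settings-page flow: a render-only preview that was never stopped.
const settingsPreview = sdk.createVideoStream("usb-cam");
const accidentalJoin = sdk.join(); // promotes the settings preview

// Abandon flow: manual stop() keeps the next join clean.
const abandoned = sdk.createVideoStream("usb-cam");
sdk.stop(abandoned);
const cleanJoin = sdk.join();
```

The second flow is the render-only footgun: nothing in the call site signals that join() will pick up a stream created for a different purpose.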
| Pros | Cons |
|---|---|
| Zero-flicker preview → publish (same physical track). | Hidden coupling: join() reads global singleton state. |
| One method, one type (LocalVideoStream) for preview AND publish. | Render-only footgun: settings-page preview lingers until next join, then accidentally promotes. |
| runPreCallTest() reuses the singleton: no double permission, no double acquire. | Manual stop() discipline required if app doesn't proceed to join. |
| Inspectable state via VideoSDK.videoStream getter. | Mid-call preview of a different camera fails: singleton conflicts with active publish. |
| Continuity with v0's createCameraVideoTrack mental model. | Singleton-per-kind doesn't scale to multi-track v2 (e.g., dual-camera previews). |
| No "render target" cross-platform problem: same .attach(el) pattern as in-call streams. | Preview configuration leaks into publish via the same instance ("why did my resolution change on join?"). |
Approach B: precallPreview with render + auto-dispose on join
// Pre-call preview: SDK owns the render binding AND the lifecycle
const preview = await VideoSDK.precallPreview({
device: cam,
resolution: 'h720',
render: previewEl, // SDK manages the binding
});
// Three terminal paths:
await VideoSDK.join({}); // preview disposed; no publish
await VideoSDK.join({ publishVideo: preview.getPublishOpts() }); // preview disposed; publish acquires fresh
// with same device/resolution as preview.
// (Processor, if any, is sticky on VideoSDK, applies automatically.)
// SDK can swap render seamlessly (no visible flicker).
await preview.dispose(); // explicit (e.g. SPA navigation)
Return type: PreviewVideoStream (new pre-call-only type, distinct from LocalVideoStream). Blueprint:
interface PreviewVideoStream {
// === Identity ===
readonly id: string;
readonly inputDevice: CameraDeviceInfo;
// === Publish forwarding ===
// Snapshot of current config: pass to VideoSDK.join({ publishVideo: preview.getPublishOpts() })
// to publish with the same device / resolution as the preview.
getPublishOpts(): PublishVideoOpts;
// === Lifecycle ===
// Releases the device. Auto-called by VideoSDK.join(). Manual only for abandon paths
// (cancel button, SPA navigation, settings unmount). After dispose() all methods throw PREVIEW_DISPOSED.
dispose(): Promise<void>;
// === Device control (mid-preview swap) ===
setInputDevice(device: CameraDeviceInfo): Promise<void>;
updateSettings(opts: { resolution?: VideoResolution; frameRate?: number }): Promise<void>;
getInputCapabilities(): Promise<VideoCapabilities>;
// === Render binding ===
// Original render target was passed at creation via { render }. Use rebind() to swap to a new target.
rebind(target: PlatformRenderTarget): void;
unbind(): void;
// === Effects: handled globally via VideoSDK.applyVideoProcessor() (sticky), not per-preview ===
// === Events ===
// 'ended' ({ timestamp }): device unplugged / permission revoked
// 'disposed' (): dispose() ran (manual or auto-on-join)
on(event: "ended" | "disposed", listener: (...args: unknown[]) => void): void;
}
Audio equivalent PreviewAudioStream mirrors this shape (with MicrophoneDeviceInfo, PublishAudioOpts, an audioLevel read for VU meter, and no rebind, since audio doesn't have a render target).
Device / resolution must be forwarded explicitly via publishVideo. The preview.getPublishOpts() method is the one-line forward: without it, publishVideo: {} would publish the default camera (not the user's choice). The lobby's device picker outcome is lost if not forwarded. (Frame processors are NOT in this list; they're sticky on VideoSDK and apply automatically across all streams.)
No captureFrame / getMediaStreamTrack / setProcessor on the preview type. Snapshot capture and raw track access are kept on LocalVideoStream (post-join). Processors are sticky on VideoSDK, not per-stream. The preview type intentionally stays focused on the lobby lifecycle.
| Pros | Cons |
|---|---|
| Clear intent at call site: precallPreview means "I'm rendering for inspection only". | Either visible flicker on join OR SDK must orchestrate a render swap (acquire publish track → swap srcObject → release preview). More SDK complexity. |
| Render-only use cases (settings page, diagnostics) work natively: no accidental promotion. | Two methods to learn: precallPreview vs JoinOptions.publishVideo. |
| No singleton state, no global VideoSDK.videoStream to leak. | Different type from in-call streams (or fuzzy semantics if same type). |
| Auto-dispose on join eliminates the most common leak path. | render parameter needs platform-typed native view (HTMLVideoElement / UIView / RTCVideoView / etc.), same as .attach(el), just at creation time. |
| render as a creation-time parameter enables SDK-orchestrated transitions. | runPreCallTest() integration needs explicit wiring: preview must expose track access for reuse. |
| Independent acquisitions = independent configs (preview at 720p, publish at 1080p, no leak between). | Migration from v0 is more disruptive: different shape from createCameraVideoTrack. |
| Scales naturally to multi-track v2: multiple independent preview handles, no singleton conflict. | "Why is my preview gone after I joined?" is a documentation burden on the dispose contract. |
| Mid-call preview of a different camera works: independent handle doesn't touch the active publish. | Device / resolution from preview doesn't auto-flow to publish; app must thread explicitly via publishVideo: preview.getPublishOpts() (footgun if forgotten: defaults are used instead of user's choice). Frame processors are NOT affected; they're sticky on VideoSDK. |
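The dispose contract and the forwarding footgun from the table above can be modelled as follows. This is a simplified sketch under assumptions: it is synchronous where the blueprint is async, `join` takes the live previews as an argument where the SDK would track them itself, and `ASSUMED_DEFAULTS` is invented since the real defaults are not specified here.

```typescript
// Hypothetical, synchronous model of Approach B; the blueprint's methods are async.
interface PublishVideoOpts { device: string; resolution: string }

class PreviewHandle {
  private disposed = false;
  private opts: PublishVideoOpts;

  constructor(opts: PublishVideoOpts) {
    this.opts = { ...opts };
  }

  // Snapshot of the current config, to be forwarded into join().
  getPublishOpts(): PublishVideoOpts {
    if (this.disposed) throw new Error("PREVIEW_DISPOSED");
    return { ...this.opts };
  }

  setInputDevice(device: string): void {
    if (this.disposed) throw new Error("PREVIEW_DISPOSED");
    this.opts.device = device;
  }

  dispose(): void {
    this.disposed = true;
  }
}

// Assumed defaults used when publishVideo is an empty object.
const ASSUMED_DEFAULTS: PublishVideoOpts = { device: "default-cam", resolution: "h720" };

// join() disposes every live preview, then acquires fresh from the given opts.
function join(previews: PreviewHandle[], publishVideo?: Partial<PublishVideoOpts>): PublishVideoOpts | null {
  for (const p of previews) p.dispose();
  return publishVideo ? { ...ASSUMED_DEFAULTS, ...publishVideo } : null;
}

const preview = new PreviewHandle({ device: "usb-cam", resolution: "h720" });
preview.setInputDevice("doc-cam"); // user picks another camera in the lobby
const forwarded = join([preview], preview.getPublishOpts());
const forgotten = join([], {}); // footgun: lobby choice not threaded through

let errorAfterDispose = "";
try {
  preview.getPublishOpts(); // preview was auto-disposed by join()
} catch (e) {
  errorAfterDispose = (e as Error).message;
}
```

The `forgotten` case publishes the assumed default camera even though the user picked another one, which is the cost of keeping preview and publish as independent acquisitions.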
Use cases this affects
Pre-call track/stream use cases divide cleanly by whether they lead to publishing:
Will publish on join (preview → publish flow):
| # | Use case | Approach A | Approach B |
|---|---|---|---|
| 1 | Green-room / lobby preview | Perfect (no flicker) | Works (needs SDK swap orchestration) |
| 2 | Device selection ("which camera?") | Implicit: preview's device flows to publish via the singleton | Works, but app must thread device into publishVideo (or use preview.getPublishOpts()); footgun if forgotten |
| 3 | Background blur / virtual bg setup | Same effect across preview → publish via sticky VideoSDK.applyVideoProcessor | Same: sticky global processor applies regardless of preview/publish split |
| 4 | Face filter / AR effect preview | Same as 3 | Same as 3 |
| 5 | Resolution/quality selection | Resolution flows to publish via the singleton | Resolution must be re-passed into publish opts |
| 6 | Pre-call network quality test | Reuses singleton | Needs explicit wiring (runPreCallTest({ usePreview })) |
Won't publish (render-only):
| # | Use case | Approach A | Approach B |
|---|---|---|---|
| 7 | Settings page preview | Singleton lingers, accidentally promotes | Clean |
| 8 | "Camera works?" diagnostic | Same footgun | Clean |
| 9 | Mid-call preview of different camera | Broken: singleton conflicts with active publish | Independent handle |
| 10 | Browser/device support check | Footgun | Clean |
Resolved in this group
| Q# | Question | Resolution |
|---|---|---|
| Q3 | Quality presets vs raw values for video options | Locked: both work. Presets ('h720') or raw heights (720) |
| Q5 | Pre-join preview | Originally resolved via decision 32 (VideoSDK.createVideoStream() + auto-promote). Re-opened: see Q18. |
| Q6 | Plugins (transform + source) | Resolved: collapsed into the frame-transform processor (decision 13). Source plugins = processor whose init loads the source |
| Q7 | MediaStreamTrack escape hatch: fully removed? | Locked: no escape hatch in v1. Custom sources flow through processor |
| Q8 | Audio rendering: auto-play vs explicit attach | Locked: remote auto-plays (decision 9), local doesn't (feedback avoidance), attach() available as escape hatch |
Q10 was never assigned.
See also: Media API v1 overview