Authentication concept
An auth token is a JWT generated by your backend (issued under your API key, signed with your secret) and passed to VideoSDK.join() as the token field of JoinOptions. It is never generated on the client: minting requires the app secret, which must stay on your server.
v1 uses a grant-based model. The token carries a grant object (a set of capability flags) that the SFU enforces server-side. The SDK doesn't gate anything locally; the token is the source of truth.
Flow: your backend mints the token, the client passes it to VideoSDK.join({ token }), and the SFU validates the signature and enforces the grant on every publish / subscribe / moderate action.
Alternatives considered:
- Role-based: the token names a role (e.g. 'host') and the server expands it. Cleaner backend DX, but requires a server-side role catalog the customer manages. Kept as a planned non-breaking wrapper over the grant: a role just expands to { grant, isViewer }, so it layers on without changing the wire.
- is_owner + feature flags (Daily-style): one "owner" boolean plus a handful of flags. Too coarse to express our 11 distinct capabilities cleanly.
- Role + per-privilege expiry (Agora-style): every capability gets its own expiry. Complexity v1 doesn't need; one exp per token suffices.
- Hybrid role + permission overrides (v0-style): two ways to grant a capability → precedence confusion. One source of truth.
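To make the role-based alternative concrete: the planned wrapper would expand a role into the same { grant, isViewer } pair the token already carries. A minimal sketch of such a backend-side catalog follows; the role names (host, speaker, viewer), ROLE_CATALOG, and expandRole() are hypothetical illustrations, not SDK API.

```javascript
// Hypothetical role catalog (illustrative names, not SDK API).
// Each role expands to the wire-level { grant, isViewer } pair,
// so the token format itself never changes.

// All-false baseline: every one of the 11 grant flags, explicitly off.
const NO_CAPABILITIES = Object.freeze({
  canPublish: false,
  canPublishSources: [],
  canSubscribe: false,
  canPublishData: false,
  canSubscribeData: false,
  canRecord: false,
  canHls: false,
  canLivestream: false,
  canTranscribe: false,
  canWhiteboard: false,
  canModerate: false,
});

// Each role lists only the flags it turns on.
const ROLE_CATALOG = {
  host: {
    isViewer: false,
    grant: {
      canPublish: true,
      canPublishSources: ['camera', 'microphone', 'screen'],
      canSubscribe: true,
      canPublishData: true,
      canSubscribeData: true,
      canRecord: true,
      canHls: true,
      canLivestream: true,
      canTranscribe: true,
      canWhiteboard: true,
      canModerate: true,
    },
  },
  speaker: {
    isViewer: false,
    grant: {
      canPublish: true,
      canPublishSources: ['camera', 'microphone'],
      canSubscribe: true,
      canPublishData: true,
      canSubscribeData: true,
    },
  },
  viewer: {
    isViewer: true,
    grant: { canSubscribe: true, canSubscribeData: true },
  },
};

function expandRole(role) {
  const entry = ROLE_CATALOG[role];
  if (!entry) throw new Error(`Unknown role: ${role}`);
  return {
    isViewer: entry.isViewer,
    // Unspecified flags fall back to the all-false baseline; the sources
    // array is copied so callers can't mutate the shared catalog.
    grant: {
      ...NO_CAPABILITIES,
      ...entry.grant,
      canPublishSources: [...(entry.grant.canPublishSources ?? [])],
    },
  };
}
```

Because expandRole() only produces claims that already exist, the wire format is unchanged, which is what would let such a wrapper ship later without breaking anything.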
Token claims
The JWT payload. roomId and grant are the load-bearing claims; the rest are standard JWT registered claims (iss, exp, iat, nbf, jti) plus VideoSDK-specific identity, tier, and entry markers (participantId, isViewer, joinPolicy).
| Claim | Required | Description |
|---|---|---|
| roomId | No | The room this token authorizes. Optional: set → scoped to one room (joining a different roomId rejects with UNAUTHORIZED_ROOM); omitted → domain-wide (any room in the project), bounded by guardrails; see roomId scoping. |
| participantId | No | Stable identity for this participant. If omitted, the SDK generates one. If set, it pins the identity: joining with a mismatched participantId in JoinOptions rejects with UNAUTHORIZED_PARTICIPANT. |
| isViewer | No (default false) | Tier marker, not a capability. false = on-stage (rostered, fires participant-joined); true = audience (count-only). A sibling of grant; see Tier & participant events. |
| joinPolicy | No (default { mode: 'direct' }) | Entry behavior, not a capability. { mode: 'direct' } = valid token → straight in; { mode: 'ask', ttl?: number } = the joiner is held in a lobby until a moderator admits or denies. A sibling of grant. Setting mode: 'ask' on a token that also carries canModerate is rejected as INVALID_ENTRY_CLAIM. Full reference: Waiting Lobby. |
| grant | Yes | The capability object. Defines what this token may do: publish, subscribe, record, moderate, etc. See The grant. |
| iss | Yes | Your API key. Identifies which application minted the token. An unknown / revoked key rejects with INVALID_API_KEY. |
| exp | Yes | Expiry (unix seconds). An expired token rejects with INVALID_TOKEN. Keep tokens short-lived. |
| iat / nbf / jti | No | Standard JWT registered claims: issued-at, not-before, and a unique token id. |
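Several of the rejections in the claims table can be caught on your backend before you sign. A sketch of a pre-flight check for a claims object you assemble yourself (with iss and exp already filled in, rather than set through a signing library's options), assuming a hypothetical preflightClaims() helper. The SFU still performs the authoritative validation.

```javascript
// Hypothetical backend-side pre-flight check, run before signing.
// It mirrors rules documented in the claims table so a bad payload
// fails fast locally; the SFU remains the source of truth.
function preflightClaims(claims, nowSeconds = Math.floor(Date.now() / 1000)) {
  const problems = [];

  if (typeof claims.grant !== 'object' || claims.grant === null) {
    problems.push('grant is required');
  }
  if (typeof claims.iss !== 'string' || claims.iss.length === 0) {
    problems.push('iss (your API key) is required');
  }
  // exp is in unix seconds and must still be in the future.
  if (!Number.isInteger(claims.exp) || claims.exp <= nowSeconds) {
    problems.push('exp must be a future unix timestamp in seconds');
  }
  if (claims.isViewer !== undefined && typeof claims.isViewer !== 'boolean') {
    problems.push('isViewer must be a boolean');
  }

  // joinPolicy defaults to { mode: 'direct' } when absent.
  const mode = claims.joinPolicy?.mode ?? 'direct';
  if (mode !== 'direct' && mode !== 'ask') {
    problems.push(`unknown joinPolicy.mode: ${mode}`);
  }
  // Documented rule: mode 'ask' together with canModerate is INVALID_ENTRY_CLAIM.
  if (mode === 'ask' && claims.grant?.canModerate === true) {
    problems.push('INVALID_ENTRY_CLAIM: joinPolicy ask cannot be combined with canModerate');
  }
  return problems;
}
```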
The iss claim carries the API key (public); the secret is the signing key and never leaves your server.
joinPolicy. The joinPolicy claim is the single switch that opts a participant into the knock-and-admit flow. The lobby's joiner / moderator APIs, the EntryRequest shape, the timeout and retry semantics, and the full set of worked examples (telehealth, webinar green rooms, late-joining moderator) all live on Waiting Lobby.
roomId scoping: optional, with guardrails
roomId is optional. Present → the token is scoped to that one room (joining a different roomId rejects with UNAUTHORIZED_ROOM). Omitted → domain-wide (any room in the project). Two guardrails bound the risk of domain-wide tokens.
The guardrails
| Guardrail | What it does |
|---|---|
| Shorter max exp when roomless | A domain-wide token gets a tighter expiry ceiling than a room-scoped one, so a leaked any-room token can't live long. (Agora caps its wildcard tokens the same way, at 24h.) |
| No privileged grants when roomless | A domain-wide token may not carry canModerate, canRecord, canHls, or canLivestream; only join + publish / subscribe + data. Privileged actions stay room-scoped, so a leaked domain-wide token can never moderate or record every room. |
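The two guardrails can also be mirrored in your mint endpoint, so a misconfigured domain-wide token fails before it reaches a client. This sketch assumes hypothetical ceilings (ASSUMED_MAX_TTL: 1h roomless, 24h room-scoped); the actual limits the SFU enforces are not stated on this page, and checkScopeGuardrails() is an illustrative name.

```javascript
// Flags the guardrails table forbids on a domain-wide (roomless) token.
const ROOMLESS_FORBIDDEN = ['canModerate', 'canRecord', 'canHls', 'canLivestream'];

// Placeholder ceilings in seconds. ASSUMPTION: these numbers are for
// illustration only; they are not the documented server limits.
const ASSUMED_MAX_TTL = { roomScoped: 24 * 3600, roomless: 3600 };

function checkScopeGuardrails(
  claims,
  nowSeconds = Math.floor(Date.now() / 1000),
  limits = ASSUMED_MAX_TTL,
) {
  const roomless = claims.roomId === undefined || claims.roomId === null || claims.roomId === '';
  const errors = [];

  // Guardrail 2: no privileged grants without a roomId.
  if (roomless) {
    for (const flag of ROOMLESS_FORBIDDEN) {
      if (claims.grant?.[flag] === true) {
        errors.push(`${flag} is not allowed on a domain-wide token`);
      }
    }
  }

  // Guardrail 1: a roomless token gets the tighter expiry ceiling.
  const maxTtl = roomless ? limits.roomless : limits.roomScoped;
  if (claims.exp - nowSeconds > maxTtl) {
    const scope = roomless ? 'domain-wide' : 'room-scoped';
    errors.push(`exp is more than ${maxTtl}s away (ceiling for a ${scope} token)`);
  }
  return errors;
}
```

A single-mint audience token (no roomId, subscribe-only grant, short exp) passes both checks, while a roomless token carrying canModerate is refused regardless of its lifetime.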
Why this shape
- Room-scoped is the secure default: least privilege, so a leak hits one meeting. Required by LiveKit and 100ms for exactly this reason.
- Domain-wide is a real convenience: one token for a multi-room viewer / lobby experience, or a trusted server token; mint before the room is known. Daily (omit room_name) and Agora (wildcard "*") both support it.
- The guardrails close the blast-radius hole that makes the strict SDKs refuse it: short-lived + unprivileged means a domain-wide token is bounded in both time and power.
A domain-wide token (no roomId) combined with no participantId is the single-mint audience token: one JWT distributed to 1,000+ viewers, each getting a fresh SDK-generated identity. Because privileged grants are blocked, the worst-case leak is "anyone can subscribe to any public room," not "anyone can moderate every room."
The grant
The capability object. Each flag is enforced server-side by the SFU. A grant with everything false is a no-op participant (connected, but can't publish, subscribe, or moderate). Mint the narrowest grant that fits the role.
| Flag | Type | Controls |
|---|---|---|
| canPublish | boolean | Whether this participant may publish media at all. If false, every publish attempt is rejected server-side regardless of canPublishSources. |
| canPublishSources | ('camera' \| 'microphone' \| 'screen')[] | When canPublish is true, restricts which sources may be published. E.g. ['microphone'] = audio-only; ['camera','microphone','screen'] = full. |
| canSubscribe | boolean | Whether this participant may receive (subscribe to) other participants' media. |
| canPublishData | boolean | Gates sending data (room.pubsub.publish and room.dataStream.send). If false, all data sends are rejected. |
| canSubscribeData | boolean | Gates receiving data (room.pubsub.subscribe / getHistory and room.dataStream). Default true; set false for a send-only or data-cut-off participant. Per-message delivery is separately controlled at publish via PublishOpts.to. |
| canRecord | boolean | May start / stop server-side recording for the room. |
| canHls | boolean | May start / stop HLS output for the room. |
| canLivestream | boolean | May start / stop RTMP livestream output for the room. |
| canTranscribe | boolean | May start / stop transcription for the room. |
| canWhiteboard | boolean | May start / stop the server-hosted whiteboard for the room. |
| canModerate | boolean | May control other participants' media (request or force-unpublish their streams), remove() participants, and end the room. Acting on a target you lack moderate rights for rejects with INVALID_PERMISSIONS. |
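Since the SDK doesn't gate anything locally, an app that wants to grey out a button (Share Screen, say) has to evaluate the grant itself. A sketch of two hypothetical app-side helpers, mayPublish() and mayReceiveData(), following the table's semantics; how a missing canPublishSources is treated is an assumption, flagged in the code. These are UI hints only; the SFU still decides.

```javascript
// Hypothetical app-side helpers mirroring the SFU's documented rules.
// UI hints only: they do not replace server-side enforcement.

function mayPublish(grant, source) {
  // canPublish: false rejects every publish, whatever canPublishSources says.
  if (grant.canPublish !== true) return false;
  // ASSUMPTION: a missing canPublishSources is treated conservatively as
  // "no sources allowed"; the table does not define this case.
  return Array.isArray(grant.canPublishSources) && grant.canPublishSources.includes(source);
}

function mayReceiveData(grant) {
  // canSubscribeData defaults to true, so only an explicit false blocks it.
  return grant.canSubscribeData !== false;
}
```

For an audio-only grant ({ canPublish: true, canPublishSources: ['microphone'] }), the microphone control stays enabled while camera and screen controls would be disabled.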
isViewer is not a grant flag. It sits beside grant, not inside it, and decides the participant tier (on-stage vs audience), not a capability. An on-stage participant (isViewer: false) with an empty grant is still rostered and fires participant-joined; they just can't do anything. See Tier & participant events.
Sample token
A host token: on-stage, full media, data, all services, and moderation.
```json
{
  "roomId": "team-standup",
  "participantId": "alice-42",
  "isViewer": false,
  "joinPolicy": { "mode": "direct" },
  "grant": {
    "canPublish": true,
    "canPublishSources": ["camera", "microphone", "screen"],
    "canSubscribe": true,
    "canPublishData": true,
    "canSubscribeData": true,
    "canRecord": true,
    "canHls": true,
    "canLivestream": true,
    "canTranscribe": true,
    "canWhiteboard": true,
    "canModerate": true
  },
  "iss": "vsdk_live_a1b2c3d4",
  "iat": 1716800000,
  "nbf": 1716800000,
  "exp": 1716803600,
  "jti": "e8c1f0a2-7b3d-4e6f-9a01-2c3d4e5f6071"
}
```
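When debugging a rejected join, it helps to see what a token actually carries. A sketch, assuming Node.js 15.7+ (for the 'base64url' Buffer encoding), that decodes the payload segment without verifying the signature. decodeJwtPayload() and describeToken() are hypothetical helpers; nothing here should be used to make authorization decisions.

```javascript
// Decodes (does NOT verify) the payload segment of a JWT. Debugging only.
function decodeJwtPayload(token) {
  const parts = token.split('.');
  if (parts.length !== 3) {
    throw new Error('Not a JWT: expected header.payload.signature');
  }
  return JSON.parse(Buffer.from(parts[1], 'base64url').toString('utf8'));
}

// Summarizes the claims that most often explain an auth rejection.
function describeToken(token, nowSeconds = Math.floor(Date.now() / 1000)) {
  const c = decodeJwtPayload(token);
  return {
    scope: c.roomId ? `room ${c.roomId}` : 'domain-wide',
    identity: c.participantId ?? '(SDK-generated)',
    tier: c.isViewer ? 'audience' : 'on-stage', // isViewer defaults to false
    expiresInSeconds: c.exp - nowSeconds, // negative means already expired
    enabledFlags: Object.entries(c.grant ?? {})
      .filter(([, value]) => value === true)
      .map(([flag]) => flag),
  };
}
```

Decoding the sample host token above at its iat would report scope "room team-standup", tier "on-stage", and 3600 seconds to expiry.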
Tier & participant events
isViewer decides whether a participant is on-stage (speaker) or audience (viewer). The two tiers reach the client through different surfaces:
| Tier | isViewer | Per-participant events | Enumeration |
|---|---|---|---|
| On-stage (speaker) | false | Individual participant-joined / participant-left, lazy + retroactive on subscribe. | room.remoteParticipants (sync Map) |
| Audience (viewer) | true | None. Not rostered client-side as individual RemoteParticipant objects, which keeps large rooms scalable. | Live count via participant-count-changed; paginated pull via room.getParticipants({ tier: 'viewer' }) |
Net effect: a 10-person stage with a 10,000-person audience produces 10 participant events, not 10,010. The 10,000 viewers are reflected only in the coalesced participant-count-changed event ({ speaker, viewer, total }) and become enumerable on demand via getParticipants.
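The arithmetic in the net-effect paragraph can be checked with a toy model. This is not SDK code: the hypothetical modelTierFanOut() counts one participant-joined per on-stage joiner and folds every joiner into a { speaker, viewer, total } tally like the one participant-count-changed reports. The real event is coalesced; the model only computes the final totals and does not reproduce that batching.

```javascript
// Toy model (not SDK code) of how the two tiers reach a client.
// On-stage joiners each produce a participant-joined event;
// audience joiners only move the { speaker, viewer, total } tally.
function modelTierFanOut(joiners) {
  const participantJoined = [];
  const count = { speaker: 0, viewer: 0, total: 0 };
  for (const p of joiners) {
    if (p.isViewer) {
      count.viewer += 1;
    } else {
      count.speaker += 1;
      participantJoined.push(p.participantId);
    }
    count.total += 1;
  }
  return { participantJoined, count };
}
```

With 10 on-stage joiners and 10,000 audience joiners, participantJoined holds 10 ids while count reports { speaker: 10, viewer: 10000, total: 10010 }.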
If you need individual events for a participant, mint their token with isViewer: false; there's no client-side opt-in to get individual events out of a viewer-tier participant. Pushing the choice up to token-mint is intentional.
Errors
All auth failures surface on VideoSDK.join() as a rejected Promise with kind === ErrorKind.Auth. Catch the whole class with err.kind === ErrorKind.Auth and branch on err.code if you need specifics. Full inventory: Errors → VideoSDK.join().
| Code | Kind | Cause |
|---|---|---|
| INVALID_API_KEY | Auth | The iss API key is revoked or was never valid. |
| INVALID_TOKEN | Auth | Token missing, malformed, badly signed, or expired (past exp). |
| INVALID_PERMISSIONS | Auth | The grant doesn't permit the attempted action (e.g. publishing without canPublish, moderating without canModerate). |
| UNAUTHORIZED_ROOM | Auth | Token's roomId doesn't match the room being joined. |
| UNAUTHORIZED_PARTICIPANT | Auth | Token's participantId doesn't match the identity being joined. |
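Branching on err.code after the ErrorKind.Auth check lets an app map each documented code to a recovery step. A sketch with a local stand-in for ErrorKind (its string value is an assumption; in an app, import ErrorKind from '@videosdk/js' instead) and hypothetical action names returned by authRecovery().

```javascript
// Stand-in for the SDK's ErrorKind so this sketch runs on its own.
// ASSUMPTION: the real ErrorKind.Auth value may differ; import it from the SDK.
const ErrorKind = { Auth: 'Auth' };

// Hypothetical mapping from documented auth codes to app-level recovery steps.
function authRecovery(err) {
  if (err?.kind !== ErrorKind.Auth) return 'not-auth';
  switch (err.code) {
    case 'INVALID_TOKEN':
      return 'refetch-token'; // malformed, badly signed, or expired: mint a fresh one
    case 'UNAUTHORIZED_ROOM':
      return 'refetch-token-for-room'; // token is scoped to a different roomId
    case 'UNAUTHORIZED_PARTICIPANT':
      return 'rejoin-with-token-identity'; // JoinOptions participantId differs from the token's
    case 'INVALID_PERMISSIONS':
      return 'explain-missing-permission'; // the grant lacks the flag for this action
    case 'INVALID_API_KEY':
      return 'report-config-error'; // every token minted under this key will fail
    default:
      return 'generic-auth-failure';
  }
}
```

INVALID_API_KEY is the case where re-fetching a token won't help, since the problem is the key the backend mints with rather than the individual token.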
Open questions
Two token-related design questions still being decided. The shapes below are the options on the table; none are locked. Until they are, mint an exp comfortably longer than your expected session length, and rely on token re-mint + rejoin for grant changes.
Q1: What happens when the token expires mid-call?
Tokens have an exp claim; long meetings outlive a single mint. The SDK needs a way to obtain a fresh token without disconnecting the user.
| Option | Shape | Pros | Cons |
|---|---|---|---|
| Provider callback (Stream, Ably, Sendbird) | JoinOptions.tokenProvider: () => Promise<string>; the SDK calls it for initial join, pre-expiry refresh, and reconnect. | One function covers init + refresh + reconnect; SDK owns timing (proactive); reconnect just works. | Customer's mint endpoint gets called repeatedly (cacheable). |
| Event + method (Twilio, Agora) | token: string for init + room.on('token-expiring', …) + room.updateToken(newToken). | Token field stays a string; event-driven is familiar. | Two paths (init vs refresh); app must wire both; SDK can't be as proactive. |
| Server-driven refresh (LiveKit) | Server quietly re-issues short tokens; client mostly invisible. | Zero client code; transparent to the app. | Requires server machinery we'd have to build; doesn't cover permission changes; client must persist the refreshed token across reconnects. |
| No refresh (Daily) | Tokens are long-lived; mint with exp past your max meeting length. | Simplest: no SDK surface at all. | Larger blast radius if a token leaks. |
Current lean: provider callback, while keeping token: string for the simple short-call path. Sub-questions to settle if we go that way: how far before exp to pre-refresh (default 30-60s), the retry / backoff policy on provider failure, and the disconnect reason if refresh ultimately fails (likely DisconnectReason.TokenExpired).
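If the provider-callback option lands, the sub-questions reduce to a scheduling rule and a retry policy. A sketch of both under stated assumptions: a 45 s lead (inside the 30-60s range being discussed), exponential backoff starting at 1 s and capped at 30 s, and four attempts. tokenProvider, refreshDelayMs(), backoffDelayMs(), and refreshWithRetry() are hypothetical; none of this is a shipped API.

```javascript
// Sketch of the "provider callback" option under discussion, NOT a shipped API.
// tokenProvider is the app's () => Promise<string>; all names are hypothetical.

// Milliseconds to wait before pre-refreshing, given exp in unix seconds.
// ASSUMPTION: 45 s lead, chosen from the 30-60s range under discussion.
function refreshDelayMs(expSeconds, nowMs, leadSeconds = 45) {
  return Math.max(0, expSeconds * 1000 - leadSeconds * 1000 - nowMs);
}

// Exponential backoff with a ceiling: 1s, 2s, 4s, ... capped at capMs.
function backoffDelayMs(attempt, baseMs = 1000, capMs = 30000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Calls the provider until it succeeds or attempts run out.
// sleep is injectable so the policy can be exercised without real timers.
async function refreshWithRetry(
  tokenProvider,
  { maxAttempts = 4, sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)) } = {},
) {
  let lastError;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await tokenProvider();
    } catch (err) {
      lastError = err;
      // Back off between attempts, but not after the final one.
      if (attempt < maxAttempts - 1) await sleep(backoffDelayMs(attempt));
    }
  }
  // This is where the SDK would disconnect (likely with a TokenExpired reason).
  throw new Error('token refresh failed', { cause: lastError });
}
```

For the sample token (exp 1716803600) checked at its iat, the refresh would be scheduled 3,555 s out; a provider that fails twice and then succeeds is retried after 1 s and 2 s.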
Q2: How does a participant's grant change mid-call?
Today the grant is set at token-mint time and frozen for the session. Real cases need to change it: host promotes a viewer to speaker, demotes a speaker, revokes recording mid-meeting, flips isViewer. How does the new grant reach the running client?
| Option | Shape | Pros | Cons |
|---|---|---|---|
| Refresh-driven (Twilio, Agora) | Backend mints a new token with the new grant; the client receives it via whichever refresh mechanism Q1 picks (provider callback or updateToken). | One mechanism for both expiry and permission changes; nothing new on the wire. | Always a client round-trip (fetch the new token); permission changes are coupled to whatever Q1 lands on. |
| Server-side updateParticipant REST (LiveKit) | Backend calls a server REST endpoint; the SFU pushes the new grant to the target client and fires a grant-changed event. No new token. | Admin can flip a flag without involving the target's app code; immediate (no client round-trip); cleanly decoupled from token lifecycle. | A whole new surface (REST API + push event); overlaps with the refresh path conceptually. |
| Hybrid | Refresh-driven in v1; updateParticipant added later as an additive optimization. | Smallest v1 surface; non-breaking to add the REST path later. | Deferred work for v2 customers who want server-side flips. |
Current lean: hybrid, i.e. refresh-driven for v1 (one mechanism for both expiry and permission changes) with updateParticipant deferred. Sub-questions: should the client emit a grant-changed event when its own grant updates (so the app can re-render UI / disable buttons)? Should an isViewer flip mid-call retro-fire participant-joined / participant-left as the participant moves between on-stage and audience tiers?
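Under the refresh-driven path, the client briefly holds both the old and the new token claims for the same session. A sketch of how an app, or a future grant-changed event, might compute what changed. diffGrantClaims() is hypothetical, and it compares canPublishSources as a set, treating source order as insignificant (an assumption).

```javascript
// Hypothetical helper for the refresh-driven path: compare the claims of the
// old and new token so the app can re-render (e.g. disable a Record button).
// Not SDK API; whether the SDK itself emits grant-changed is still open.
function diffGrantClaims(prev, next) {
  const flags = new Set([
    ...Object.keys(prev.grant ?? {}),
    ...Object.keys(next.grant ?? {}),
  ]);
  const changed = [];
  for (const flag of flags) {
    const from = prev.grant?.[flag];
    const to = next.grant?.[flag];
    // canPublishSources is an array: compare sorted copies, order-insensitive.
    const same =
      Array.isArray(from) || Array.isArray(to)
        ? JSON.stringify([...(from ?? [])].sort()) === JSON.stringify([...(to ?? [])].sort())
        : from === to;
    if (!same) changed.push({ flag, from, to });
  }
  // An isViewer flip moves the participant between on-stage and audience tiers.
  const tierChanged = Boolean(prev.isViewer) !== Boolean(next.isViewer);
  return { changed, tierChanged };
}
```

Promoting a viewer to speaker, for example, reports tierChanged: true along with the newly enabled canPublish and canPublishSources entries, which is the information the open sub-questions would need to act on.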
Example
The two halves of the flow: your backend mints a scoped token with a grant; the client passes it to join().
```javascript
// === YOUR BACKEND (Node.js): never ship the secret to the client ===
import express from 'express';
import jwt from 'jsonwebtoken';

const app = express();
app.use(express.json()); // parse JSON bodies so req.body.roomId is populated

function mintToken({ roomId, participantId, role }) {
  const grant =
    role === 'host'
      ? {
          canPublish: true,
          canPublishSources: ['camera', 'microphone', 'screen'],
          canSubscribe: true,
          canPublishData: true,
          canSubscribeData: true,
          canRecord: true, canHls: true, canLivestream: true, canTranscribe: true,
          canWhiteboard: true,
          canModerate: true,
        }
      : {
          canPublish: true,
          canPublishSources: ['camera', 'microphone'],
          canSubscribe: true,
          canPublishData: true,
          canSubscribeData: true,
          canRecord: false, canHls: false, canLivestream: false, canTranscribe: false,
          canWhiteboard: false,
          canModerate: false,
        };

  return jwt.sign(
    {
      roomId,
      participantId,
      isViewer: false, // on-stage
      grant,
    },
    process.env.VIDEOSDK_APP_SECRET, // signing key: backend only
    {
      algorithm: 'HS256',
      issuer: process.env.VIDEOSDK_API_KEY, // sets iss
      expiresIn: '1h', // sets exp; jsonwebtoken also adds iat
    },
  );
}

// Expose behind your own auth, e.g. POST /api/meeting-token.
// requireLogin stands for your app's session middleware (not shown);
// it is assumed to populate req.user.
app.post('/api/meeting-token', requireLogin, (req, res) => {
  const token = mintToken({
    roomId: req.body.roomId,
    participantId: req.user.id,
    role: req.user.role,
  });
  res.json({ token });
});
```

```javascript
// === CLIENT: never mints a token; just fetches one and joins ===
// Runs as an ES module (top-level await).
import { VideoSDK, ErrorKind } from '@videosdk/js';

const res = await fetch('/api/meeting-token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ roomId: 'team-standup' }),
});
if (!res.ok) throw new Error(`Token request failed: HTTP ${res.status}`);
const { token } = await res.json();

try {
  // roomId must match the token's roomId claim, or join() rejects with UNAUTHORIZED_ROOM.
  const room = await VideoSDK.join({ token, roomId: 'team-standup', name: 'Alice' });
  console.log('connected', room.localParticipant.id);
} catch (err) {
  // showLoginUI / showGenericError are your app's own handlers.
  if (err.kind === ErrorKind.Auth) showLoginUI(); // any auth failure
  else showGenericError(err);
}
```
See also: Waiting Lobby · VideoSDK.join() · JoinOptions · Room → Events · Errors → VideoSDK.join()