Authentication concept

An auth token is a JWT that your backend generates and signs with your app secret; your API key travels inside it as the iss claim. The client passes the token to VideoSDK.join() as the token field of JoinOptions. It is never generated on the client: minting requires the app secret, which must stay on your server.

v1 uses a grant-based model. The token carries a grant object — a set of capability flags — that the SFU enforces server-side. The SDK doesn't gate anything locally; the token is the source of truth.

Token in, room out. The client never sees your API secret. Your backend signs a short-lived JWT scoped to one room with one grant, hands it to the client, and the client calls VideoSDK.join({ token }). The SFU validates the signature and enforces the grant on every publish / subscribe / moderate action.

Why grant-based, not role-based. The token carries the capabilities directly — there is no dashboard / role-definition step in the loop. Your backend states what this participant may do per token, and the SFU enforces it. Alternatives considered:

Role-based — token names a role (e.g. 'host') and the server expands it. Cleaner backend DX, but requires a server-side role catalog the customer manages. Kept as a planned non-breaking wrapper over the grant — a role just expands to { grant, isViewer }, so it layers on without changing the wire.
is_owner + feature flags (Daily-style) — one "owner" boolean plus a handful of flags. Too coarse to express our 11 distinct capabilities cleanly.
Role + per-privilege expiry (Agora-style) — every capability gets its own expiry. Complexity v1 doesn't need; one exp per token suffices.
Hybrid role + permission overrides (v0-style) — two ways to grant a capability ⇒ precedence confusion. One source of truth.
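Until a role wrapper ships, the same expansion can live in your own backend. A minimal sketch, assuming hypothetical role names (host, speaker, viewer) and a hypothetical expandRole helper; neither is an SDK API, and the flags chosen per role are illustrative only:

```typescript
// Hypothetical role catalog kept in your backend code (not part of the SDK).
interface Grant {
  canPublish: boolean;
  canPublishSources: ('camera' | 'microphone' | 'screen')[];
  canSubscribe: boolean;
  canPublishData: boolean;
  canSubscribeData: boolean;
  canRecord: boolean;
  canHls: boolean;
  canLivestream: boolean;
  canTranscribe: boolean;
  canWhiteboard: boolean;
  canModerate: boolean;
}

type Role = 'host' | 'speaker' | 'viewer';

// Start from "nothing allowed" and switch on only what a role needs.
const NO_GRANT: Grant = {
  canPublish: false, canPublishSources: [], canSubscribe: false,
  canPublishData: false, canSubscribeData: false,
  canRecord: false, canHls: false, canLivestream: false,
  canTranscribe: false, canWhiteboard: false, canModerate: false,
};

// A role is just a named { grant, isViewer } pair.
function expandRole(role: Role): { grant: Grant; isViewer: boolean } {
  switch (role) {
    case 'host':
      return {
        isViewer: false,
        grant: {
          canPublish: true, canPublishSources: ['camera', 'microphone', 'screen'],
          canSubscribe: true, canPublishData: true, canSubscribeData: true,
          canRecord: true, canHls: true, canLivestream: true,
          canTranscribe: true, canWhiteboard: true, canModerate: true,
        },
      };
    case 'speaker':
      return {
        isViewer: false,
        grant: {
          ...NO_GRANT,
          canPublish: true, canPublishSources: ['camera', 'microphone'],
          canSubscribe: true, canPublishData: true, canSubscribeData: true,
        },
      };
    case 'viewer':
      return {
        isViewer: true,
        grant: { ...NO_GRANT, canSubscribe: true, canSubscribeData: true },
      };
  }
}
```

Because the output is the same { grant, isViewer } pair the token already carries, nothing changes on the wire.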

Net effect: the SDK ships self-contained, granular, with zero dashboard setup, and we don't lose the role option later.

Token claims

The JWT payload. roomId and grant are the load-bearing claims; the rest are standard JWT registered claims plus three VideoSDK-specific markers: participantId (identity), isViewer (tier), and joinPolicy (entry behavior).

// JWT payload, signed with the app secret (HS256)
{
  "roomId": string,           // optional; scopes to one room; omit = domain-wide
  "participantId": string,    // optional; stable identity; SDK generates one if omitted
  "isViewer": boolean,        // default false; tier marker (on-stage vs audience)
  "joinPolicy": JoinPolicy,   // optional; entry behavior; default { mode: 'direct' }. See Waiting Lobby.
  "grant": Grant,             // the capability object; see below

  // === Standard JWT registered claims ===
  "iss": string,              // your API key
  "exp": number,              // expiry (unix seconds); keep short-lived
  "iat": number,              // issued-at
  "nbf": number,              // not-before (optional)
  "jti": string               // unique token id (optional)
}

Claim	Required	Description
`roomId`	No	The room this token authorizes. Optional: set → scoped to one room (joining a different `roomId` rejects with `UNAUTHORIZED_ROOM`); omitted → domain-wide (any room in the project), bounded by guardrails — see `roomId` scoping.
`participantId`	No	Stable identity for this participant. If omitted, the SDK generates one. If set, it pins the identity — joining with a mismatched `participantId` in JoinOptions rejects with `UNAUTHORIZED_PARTICIPANT`.
`isViewer`	No (default `false`)	Tier marker, not a capability. `false` = on-stage (rostered, fires `participant-joined`); `true` = audience (count-only). A sibling of `grant` — see Tier & participant events.
`joinPolicy`	No (default `{ mode: 'direct' }`)	Entry behavior, not a capability. `{ mode: 'direct' }` = valid token → straight in; `{ mode: 'ask', ttl?: number }` = the joiner is held in a lobby until a moderator admits or denies. A sibling of `grant`. Setting `mode: 'ask'` on a token that also carries `canModerate` is rejected as `INVALID_ENTRY_CLAIM`. Full reference: Waiting Lobby.
`grant`	Yes	The capability object. Defines what this token may do — publish, subscribe, record, moderate, etc. See The grant.
`iss`	Yes	Your API key. Identifies which application minted the token. An unknown / revoked key rejects with `INVALID_API_KEY`.
`exp`	Yes	Expiry (unix seconds). An expired token rejects with `INVALID_TOKEN`. Keep tokens short-lived.
`iat` / `nbf` / `jti`	No	Standard JWT registered claims — issued-at, not-before, and a unique token id.

Signed with the app secret (HS256). The token is an HS256 JWT signed with your application secret. Because signing needs the secret, tokens are always backend-generated — there is no client-side mint path. The iss claim carries the API key (public); the secret is the signing key (never leaves your server).

Waiting lobby lives behind joinPolicy. The joinPolicy claim is the single switch that opts a participant into the knock-and-admit flow. The lobby's joiner / moderator APIs, the EntryRequest shape, the timeout and retry semantics, and the full set of worked examples (telehealth, webinar green rooms, late-joining moderator) all live on Waiting Lobby.
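The one joinPolicy rule stated on this page (mode: 'ask' may not be combined with canModerate) is cheap to check before signing. A sketch, assuming a hypothetical checkEntryClaim helper in your backend; only the error code name and the JoinPolicy shape come from the claims table:

```typescript
type JoinPolicy = { mode: 'direct' } | { mode: 'ask'; ttl?: number };

interface EntryClaims {
  joinPolicy?: JoinPolicy;           // omitted means { mode: 'direct' }
  grant: { canModerate: boolean };   // only the flag this check needs
}

// Returns the code the server is documented to reject with, or null if the pair is fine.
function checkEntryClaim(claims: EntryClaims): 'INVALID_ENTRY_CLAIM' | null {
  const policy: JoinPolicy = claims.joinPolicy ?? { mode: 'direct' };
  if (policy.mode === 'ask' && claims.grant.canModerate) {
    return 'INVALID_ENTRY_CLAIM';
  }
  return null;
}
```

Running this at mint time turns a join-time rejection into a backend error you can log and fix.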

`roomId` scoping — optional, with guardrails

roomId is optional. Present → the token is scoped to that one room (joining a different roomId rejects with UNAUTHORIZED_ROOM). Omitted → domain-wide (any room in the project). Two guardrails bound the risk of domain-wide tokens.
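The scoping rule, together with the participantId pinning rule from the claims table, comes down to two comparisons. A sketch of the check the server is described as performing; checkScope and its argument shapes are hypothetical, and treating an omitted participantId at join time as "no mismatch" is an assumption:

```typescript
interface ScopeClaims { roomId?: string; participantId?: string }
interface JoinRequest { roomId: string; participantId?: string }

type ScopeError = 'UNAUTHORIZED_ROOM' | 'UNAUTHORIZED_PARTICIPANT';

function checkScope(claims: ScopeClaims, join: JoinRequest): ScopeError | null {
  // Room-scoped token: the requested room must match. No roomId means domain-wide.
  if (claims.roomId !== undefined && claims.roomId !== join.roomId) {
    return 'UNAUTHORIZED_ROOM';
  }
  // Pinned identity: an explicit, different participantId is rejected.
  if (
    claims.participantId !== undefined &&
    join.participantId !== undefined &&
    claims.participantId !== join.participantId
  ) {
    return 'UNAUTHORIZED_PARTICIPANT';
  }
  return null;
}
```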

The guardrails

Guardrail	What it does
Shorter max `exp` when roomless	A domain-wide token gets a tighter expiry ceiling than a room-scoped one — a leaked any-room token can't live long. (Agora applies the same cap to its wildcard tokens — 24h.)
No privileged grants when roomless	A domain-wide token may not carry `canModerate`, `canRecord`, `canHls`, or `canLivestream` — only join + publish / subscribe + data. Privileged actions stay room-scoped. So a leaked domain-wide token can never moderate-or-record every room.
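Both guardrails can be mirrored in a pre-mint check so that a bad domain-wide token never gets signed. A sketch under stated assumptions: roomlessViolations is a hypothetical helper, and the expiry ceiling is passed in as a parameter because this page does not state the actual value.

```typescript
interface RoomlessCheckInput {
  roomId?: string;
  exp: number; // unix seconds
  grant: { canModerate: boolean; canRecord: boolean; canHls: boolean; canLivestream: boolean };
}

// The four flags this page names as blocked on domain-wide tokens.
const PRIVILEGED_FLAGS = ['canModerate', 'canRecord', 'canHls', 'canLivestream'] as const;

function roomlessViolations(
  claims: RoomlessCheckInput,
  nowSeconds: number,
  maxRoomlessTtlSeconds: number, // assumption: supply the ceiling that applies to your project
): string[] {
  if (claims.roomId !== undefined) return []; // room-scoped: guardrails do not apply
  const problems: string[] = [];
  if (claims.exp - nowSeconds > maxRoomlessTtlSeconds) {
    problems.push('exp exceeds the roomless ceiling');
  }
  for (const flag of PRIVILEGED_FLAGS) {
    if (claims.grant[flag]) problems.push(`${flag} is not allowed without roomId`);
  }
  return problems;
}
```

An empty array means the token passes both guardrails; anything else is a reason to refuse the mint.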

Why this shape

Room-scoped is the secure default — least privilege; a leak hits one meeting. Required by LiveKit and 100ms for exactly this reason.
Domain-wide is a real convenience — one token for a multi-room viewer / lobby experience, or a trusted server token; mint before the room is known. Daily (omit room_name) and Agora (wildcard "*") both support it.
The guardrails close the blast-radius hole that makes the strict SDKs refuse it: short-lived + unprivileged means a domain-wide token is bounded in both time and power.

The audience-token simplification. A domain-wide token (no roomId) combined with no participantId is the single-mint audience token — one JWT distributed to 1,000+ viewers, each getting a fresh SDK-generated identity. Because privileged grants are blocked, the worst-case leak is "anyone can subscribe to any public room," not "anyone can moderate every room."
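As a concrete picture of that single-mint token, the payload could look like the sketch below. audienceClaims is a hypothetical helper that only builds the payload (signing still happens on your backend), and setting isViewer: true with a subscribe-only grant is an assumption about what an audience token would typically carry.

```typescript
// Builds the payload only; sign it on your backend with the app secret.
function audienceClaims(apiKey: string, nowSeconds: number, ttlSeconds: number) {
  return {
    // No roomId: domain-wide. No participantId: the SDK generates one per viewer.
    isViewer: true,
    grant: {
      canPublish: false,
      canPublishSources: [] as ('camera' | 'microphone' | 'screen')[],
      canSubscribe: true,
      canPublishData: false,
      canSubscribeData: true,
      // Privileged flags are blocked on roomless tokens, so they stay false.
      canRecord: false, canHls: false, canLivestream: false,
      canTranscribe: false, canWhiteboard: false,
      canModerate: false,
    },
    iss: apiKey,
    iat: nowSeconds,
    exp: nowSeconds + ttlSeconds, // keep short: roomless tokens get a tighter ceiling
  };
}
```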

The grant

The capability object. Each flag is enforced server-side by the SFU. A grant with everything false is a no-op participant (connected, but can't publish, subscribe, or moderate). Mint the narrowest grant that fits the role.

interface Grant {
  canPublish: boolean;          // may publish any media at all
  canPublishSources: ('camera' | 'microphone' | 'screen')[];   // which sources, when canPublish is true
  canSubscribe: boolean;        // may receive others' media
  canPublishData: boolean;      // send data: room.pubsub.publish + room.dataStream.send
  canSubscribeData: boolean;    // receive data: room.pubsub.subscribe/getHistory + room.dataStream (default true)
  canRecord: boolean;           // may start/stop recording
  canHls: boolean;              // may start/stop HLS
  canLivestream: boolean;       // may start/stop RTMP livestream
  canTranscribe: boolean;       // may start/stop transcription
  canWhiteboard: boolean;       // may start/stop the whiteboard
  canModerate: boolean;         // may control others; see below
}

Flag	Type	Controls
`canPublish`	`boolean`	Whether this participant may publish media at all. If `false`, every publish attempt is rejected server-side regardless of `canPublishSources`.
`canPublishSources`	`('camera' \| 'microphone' \| 'screen')[]`	When `canPublish` is `true`, restricts which sources may be published. E.g. `['microphone']` = audio-only; `['camera','microphone','screen']` = full.
`canSubscribe`	`boolean`	Whether this participant may receive (subscribe to) other participants' media.
`canPublishData`	`boolean`	Gates sending data — `room.pubsub.publish` and `room.dataStream.send`. If `false`, all data sends are rejected.
`canSubscribeData`	`boolean`	Gates receiving data — `room.pubsub.subscribe` / `getHistory` and `room.dataStream`. Default `true`; set `false` for a send-only or data-cut-off participant. Per-message delivery is separately controlled at publish via `PublishOpts.to`.
`canRecord`	`boolean`	May start / stop server-side recording for the room.
`canHls`	`boolean`	May start / stop HLS output for the room.
`canLivestream`	`boolean`	May start / stop RTMP livestream output for the room.
`canTranscribe`	`boolean`	May start / stop transcription for the room.
`canWhiteboard`	`boolean`	May start / stop the server-hosted whiteboard for the room.
`canModerate`	`boolean`	May control other participants' media — request / force-unpublish their streams — plus `remove()` participants and end the room. Acting on a target you lack moderate rights for rejects with `INVALID_PERMISSIONS`.
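The interaction between canPublish and canPublishSources is the one place where two flags combine, so it helps to see it as a predicate. A sketch of the rule as the table describes it; mayPublish is a hypothetical name, and the real enforcement happens in the SFU, not in your code:

```typescript
type Source = 'camera' | 'microphone' | 'screen';

interface PublishGrant {
  canPublish: boolean;
  canPublishSources: Source[];
}

function mayPublish(grant: PublishGrant, source: Source): boolean {
  // canPublish is the master switch; canPublishSources only narrows it.
  return grant.canPublish && grant.canPublishSources.indexOf(source) !== -1;
}
```

For example, an audio-only grant ({ canPublish: true, canPublishSources: ['microphone'] }) allows 'microphone' and refuses 'camera', while canPublish: false refuses every source regardless of the list.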

isViewer is not a grant flag. It sits beside grant, not inside it. It decides the participant tier (on-stage vs audience), not a capability. An on-stage participant (isViewer: false) with an empty grant is still rostered and fires participant-joined — they just can't do anything. See Tier & participant events.

Sample token

A host token — on-stage, full media, data, all services, and moderation.

Example — decoded host token payload

{
  "roomId": "team-standup",
  "participantId": "alice-42",
  "isViewer": false,
  "joinPolicy": { "mode": "direct" },

  "grant": {
    "canPublish": true,
    "canPublishSources": ["camera", "microphone", "screen"],
    "canSubscribe": true,
    "canPublishData": true,
    "canSubscribeData": true,
    "canRecord": true,
    "canHls": true,
    "canLivestream": true,
    "canTranscribe": true,
    "canWhiteboard": true,
    "canModerate": true
  },

  "iss": "vsdk_live_a1b2c3d4",
  "iat": 1716800000,
  "nbf": 1716800000,
  "exp": 1716803600,
  "jti": "e8c1f0a2-7b3d-4e6f-9a01-2c3d4e5f6071"
}

Tier & participant events

isViewer decides whether a participant is on-stage (speaker) or audience (viewer). The two tiers reach the client through different surfaces:

Tier	`isViewer`	Per-participant events	Enumeration
On-stage (speaker)	`false`	Individual `participant-joined` / `participant-left`, lazy + retroactive on subscribe.	`room.remoteParticipants` (sync Map)
Audience (viewer)	`true`	None. Not rostered client-side as individual RemoteParticipant objects — keeps large rooms scalable.	Live count via `participant-count-changed` · paginated pull via `room.getParticipants({ tier: 'viewer' })`

Net effect: a 10-person stage with a 10,000-person audience produces 10 participant events, not 10,010 — the 10,000 viewers are reflected only in the coalesced participant-count-changed event ({ speaker, viewer, total }) and become enumerable on demand via getParticipants.
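The arithmetic can be checked with a toy model of what a client observes. This is a simulation under simplifying assumptions, not SDK code: simulateJoins and its types are hypothetical, and the real count arrives coalesced through participant-count-changed rather than one update per joiner.

```typescript
interface Joiner { id: string; isViewer: boolean }

interface ClientView {
  joinedEvents: string[];                                      // ids that fired participant-joined
  count: { speaker: number; viewer: number; total: number };   // payload shape of participant-count-changed
}

function simulateJoins(joiners: Joiner[]): ClientView {
  const view: ClientView = { joinedEvents: [], count: { speaker: 0, viewer: 0, total: 0 } };
  for (const j of joiners) {
    if (j.isViewer) {
      view.count.viewer += 1;        // audience: counted only, never rostered
    } else {
      view.count.speaker += 1;
      view.joinedEvents.push(j.id);  // on-stage: one participant-joined each
    }
    view.count.total += 1;
  }
  return view;
}

const joiners: Joiner[] = [];
for (let i = 0; i < 10; i++) joiners.push({ id: `speaker-${i}`, isViewer: false });
for (let i = 0; i < 10000; i++) joiners.push({ id: `viewer-${i}`, isViewer: true });

const view = simulateJoins(joiners);
// view.joinedEvents.length is 10; view.count is { speaker: 10, viewer: 10000, total: 10010 }
```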

The tier choice has teeth. If a customer needs individual events for someone, mint that participant's token with isViewer: false — there's no client-side opt-in to get individual events out of a viewer-tier participant. Pushing the choice up to token-mint is intentional.

Audience visibility is a bandwidth concern, not a separate token grant. Provided the token allows receiving media (canSubscribe), which stage streams an audience member actually receives is decided client-side by lazy event subscription (subscribe to the streams you render); there is no audience-specific visibility flag in the grant. The grant governs what you may do; the tier governs how you're counted.

Errors

All auth failures surface on VideoSDK.join() as a rejected Promise with kind === ErrorKind.Auth. Catch the whole class with err.kind === ErrorKind.Auth and branch on err.code if you need specifics. Full inventory: Errors → VideoSDK.join().

Code	Kind	Cause
`INVALID_API_KEY`	`Auth`	The `iss` API key is unknown or has been revoked.
`INVALID_TOKEN`	`Auth`	Token missing, malformed, badly signed, or expired (past `exp`).
`INVALID_PERMISSIONS`	`Auth`	The `grant` doesn't permit the attempted action (e.g. publishing without `canPublish`, moderating without `canModerate`).
`UNAUTHORIZED_ROOM`	`Auth`	Token's `roomId` doesn't match the room being joined.
`UNAUTHORIZED_PARTICIPANT`	`Auth`	Token's `participantId` doesn't match the identity being joined.
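When branching on err.code, it can help to collapse the five codes into a few recovery paths. The mapping below is only one plausible policy: recoveryFor and the three recovery labels are hypothetical, and the choice of which code leads where is an application decision, not SDK behavior.

```typescript
type AuthCode =
  | 'INVALID_API_KEY'
  | 'INVALID_TOKEN'
  | 'INVALID_PERMISSIONS'
  | 'UNAUTHORIZED_ROOM'
  | 'UNAUTHORIZED_PARTICIPANT';

type Recovery = 'refetch-token' | 'report-misconfiguration' | 'show-forbidden';

function recoveryFor(code: AuthCode): Recovery {
  switch (code) {
    case 'INVALID_TOKEN':
      return 'refetch-token';            // possibly just expired: ask the backend for a fresh one
    case 'INVALID_API_KEY':
      return 'report-misconfiguration';  // the user cannot fix a bad or revoked key
    case 'INVALID_PERMISSIONS':
    case 'UNAUTHORIZED_ROOM':
    case 'UNAUTHORIZED_PARTICIPANT':
      return 'show-forbidden';           // the token is valid but does not cover this request
  }
}
```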

Open questions

Two token-related design questions still being decided. The shapes below are the options on the table; none are locked. Until they are, mint an exp comfortably longer than your expected session length, and rely on token re-mint + rejoin for grant changes.

Q1 — What happens when the token expires mid-call?

Tokens have an exp claim; long meetings outlive a single mint. The SDK needs a way to obtain a fresh token without disconnecting the user.

Option	Shape	Pros	Cons
Provider callback (Stream, Ably, Sendbird)	`JoinOptions.tokenProvider: () => Promise<string>` — SDK calls it for initial join, pre-expiry refresh, and reconnect.	One function covers init + refresh + reconnect; SDK owns timing (proactive); reconnect just works.	Customer's mint endpoint gets called repeatedly (cacheable).
Event + method (Twilio, Agora)	`token: string` for init + `room.on('token-expiring', …)` + `room.updateToken(newToken)`.	Token field stays a string; event-driven is familiar.	Two paths (init vs refresh); app must wire both; SDK can't be as proactive.
Server-driven refresh (LiveKit)	Server quietly re-issues short tokens; client mostly invisible.	Zero client code; transparent to the app.	Requires server machinery we'd have to build; doesn't cover permission changes; client must persist the refreshed token across reconnects.
No refresh (Daily)	Tokens are long-lived; mint with `exp` past your max meeting length.	Simplest — no SDK surface at all.	Larger blast radius if a token leaks.

Current lean: provider callback + keep token: string for the simple short-call path. Sub-questions to settle if we go that way: how far before exp to pre-refresh (default 30–60s), the retry / backoff policy on provider failure, and the disconnect reason if refresh ultimately fails (likely DisconnectReason.TokenExpired).
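The pre-refresh sub-question is mostly a timing calculation. A sketch of how a scheduler might compute the wait, assuming the provider-callback option is chosen; refreshDelayMs is a hypothetical helper, and the 45-second default is simply a value inside the 30–60s range mentioned above, not a decided number.

```typescript
// Milliseconds to wait before asking for a fresh token; 0 means "refresh now".
function refreshDelayMs(expSeconds: number, nowMs: number, leadSeconds: number = 45): number {
  const refreshAtMs = (expSeconds - leadSeconds) * 1000;
  return Math.max(0, refreshAtMs - nowMs);
}

// A token issued at 1716800000 with exp 1716803600 (one hour), refreshed 60s early:
const delay = refreshDelayMs(1716803600, 1716800000 * 1000, 60);
// delay is 3540000 ms, i.e. 59 minutes after issue
```

Clamping at zero covers the reconnect case, where the token may already be inside the lead window or past exp.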

Q2 — How does a participant's grant change mid-call?

Today the grant is set at token-mint time and frozen for the session. Real cases need to change it: host promotes a viewer to speaker, demotes a speaker, revokes recording mid-meeting, flips isViewer. How does the new grant reach the running client?

Option	Shape	Pros	Cons
Refresh-driven (Twilio, Agora)	Backend mints a new token with the new grant; client receives it via whichever refresh mechanism Q1 picks (provider callback or `updateToken`).	One mechanism for both expiry and permission changes — nothing new on the wire.	Always a client round-trip (fetch the new token); permission changes are coupled to whatever Q1 lands on.
Server-side `updateParticipant` REST (LiveKit)	Backend calls a server REST endpoint; the SFU pushes the new grant to the target client and fires a `grant-changed` event. No new token.	Admin can flip a flag without involving the target's app code; immediate (no client round-trip); cleanly decoupled from token lifecycle.	A whole new surface (REST API + push event); overlaps with the refresh path conceptually.
Hybrid	Refresh-driven in v1; `updateParticipant` added later as an additive optimization.	Smallest v1 surface; non-breaking to add the REST path later.	Deferred work for v2 customers who want server-side flips.

Current lean: hybrid — refresh-driven for v1 (one mechanism for both expiry and permission changes), updateParticipant deferred. Sub-questions: should the client emit a grant-changed event when its own grant updates (so the app can re-render UI / disable buttons)? Should an isViewer flip mid-call retro-fire participant-joined / participant-left as the participant moves between on-stage and audience tiers?
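If a grant-changed event is adopted, its most useful payload for UI code is probably the list of flags that flipped. A sketch of that diff, assuming both grants carry the same set of boolean flags; changedFlags is a hypothetical helper and nothing here is a committed event shape.

```typescript
type BooleanFlags = { [flag: string]: boolean };

// Names of the flags whose value differs between the old and the new grant, sorted.
function changedFlags(before: BooleanFlags, after: BooleanFlags): string[] {
  const changed: string[] = [];
  for (const flag of Object.keys(after)) {
    if (before[flag] !== after[flag]) changed.push(flag);
  }
  return changed.sort();
}

// A speaker loses recording rights and gains moderation after a token refresh:
const flipped = changedFlags(
  { canPublish: true, canRecord: true, canModerate: false },
  { canPublish: true, canRecord: false, canModerate: true },
);
// flipped is ['canModerate', 'canRecord']
```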

Example

The two halves of the flow: your backend mints a scoped token with a grant; the client passes it to join().

Example — backend: mint a scoped token (conceptual)

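// === YOUR BACKEND: never ship the secret to the client ===
// Express is assumed here for the HTTP layer; any server framework works.
import express from 'express';
import jwt from 'jsonwebtoken';

const app = express();
app.use(express.json());   // parse JSON bodies so req.body.roomId is available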

function mintToken({ roomId, participantId, role }) {
  const grant =
    role === 'host'
      ? {
          canPublish: true,
          canPublishSources: ['camera', 'microphone', 'screen'],
          canSubscribe: true,
          canPublishData: true, canSubscribeData: true,
          canRecord: true, canHls: true, canLivestream: true, canTranscribe: true,
          canWhiteboard: true,
          canModerate: true,
        }
      : {
          canPublish: true,
          canPublishSources: ['camera', 'microphone'],
          canSubscribe: true,
          canPublishData: true, canSubscribeData: true,
          canRecord: false, canHls: false, canLivestream: false, canTranscribe: false,
          canWhiteboard: false,
          canModerate: false,
        };

  return jwt.sign(
    {
      roomId,
      participantId,
      isViewer: false,        // on-stage
      grant,
    },
    process.env.VIDEOSDK_APP_SECRET,   // signing key — backend only
    {
      algorithm: 'HS256',
      issuer: process.env.VIDEOSDK_API_KEY,   // iss
      expiresIn: '1h',                        // exp
    },
  );
}

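// Expose behind your own auth, e.g. POST /api/meeting-token.
// requireLogin stands for your own session middleware (not part of VideoSDK); it is assumed to populate req.user.
// Before minting, also confirm that this user is allowed into req.body.roomId.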
app.post('/api/meeting-token', requireLogin, (req, res) => {
  const token = mintToken({
    roomId: req.body.roomId,
    participantId: req.user.id,
    role: req.user.role,
  });
  res.json({ token });
});

Example — client: fetch the token, then join

// === CLIENT — never mints a token; just fetches and joins ===
import { VideoSDK, ErrorKind } from '@videosdk/js';

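const res = await fetch('/api/meeting-token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ roomId: 'team-standup' }),
});
// Fail early on a non-2xx response instead of passing an undefined token to join()
if (!res.ok) throw new Error(`token request failed: ${res.status}`);
const { token } = await res.json();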

try {
  const room = await VideoSDK.join({ token, roomId: 'team-standup', name: 'Alice' });
  console.log('connected', room.localParticipant.id);
} catch (err) {
  if (err.kind === ErrorKind.Auth) showLoginUI();   // any auth failure
  else                             showGenericError(err);
}
