Authentication concept

An auth token is a JWT that your backend generates, signed with your app secret and carrying your API key as the iss claim. The client passes it to VideoSDK.join() as the token field of JoinOptions. It is never generated on the client: minting requires the app secret, which must stay on your server.

v1 uses a grant-based model. The token carries a grant object (a set of capability flags) that the SFU enforces server-side. The SDK doesn't gate anything locally; the token is the source of truth.

Token in, room out. The client never sees your app secret. Your backend signs a short-lived JWT scoped to one room with one grant, hands it to the client, and the client calls VideoSDK.join({ token }). The SFU validates the signature and enforces the grant on every publish / subscribe / moderate action.
Why grant-based, not role-based. The token carries the capabilities directly, so there is no dashboard or role-definition step in the loop. Your backend states what this participant may do in each token, and the SFU enforces it. Alternatives considered:
  • Role-based: the token names a role (e.g. 'host') and the server expands it. Cleaner backend DX, but requires a server-side role catalog the customer manages. Kept as a planned non-breaking wrapper over the grant: a role just expands to { grant, isViewer }, so it layers on without changing the wire format.
  • is_owner + feature flags (Daily-style): one "owner" boolean plus a handful of flags. Too coarse to express our 11 distinct capabilities cleanly.
  • Role + per-privilege expiry (Agora-style): every capability gets its own expiry. Complexity v1 doesn't need; one exp per token suffices.
  • Hybrid role + permission overrides (v0-style): two ways to grant a capability lead to precedence confusion. We keep one source of truth.
Net effect: the SDK ships self-contained, granular, with zero dashboard setup, and we don't lose the role option later.
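
To make the "role is a wrapper over the grant" point concrete, here is a minimal sketch of a backend-side role expansion. The role names, the ROLE_CATALOG map, and the expandRole helper are hypothetical and not part of the SDK; only the { grant, isViewer } output shape comes from the text above.

```javascript
// Hypothetical role catalog kept on the backend. Role names and the flags
// chosen for each role are illustrative assumptions, not SDK definitions.
const ROLE_CATALOG = new Map([
  ['host', {
    isViewer: false,
    grant: {
      canPublish: true,
      canPublishSources: ['camera', 'microphone', 'screen'],
      canSubscribe: true,
      canPublishData: true,
      canSubscribeData: true,
      canRecord: true,
      canHls: true,
      canLivestream: true,
      canTranscribe: true,
      canWhiteboard: true,
      canModerate: true,
    },
  }],
  ['viewer', {
    isViewer: true,
    grant: {
      canPublish: false,
      canPublishSources: [],
      canSubscribe: true,
      canPublishData: false,
      canSubscribeData: true,
      canRecord: false,
      canHls: false,
      canLivestream: false,
      canTranscribe: false,
      canWhiteboard: false,
      canModerate: false,
    },
  }],
]);

// Expands a role name to the { grant, isViewer } pair that goes into the token.
function expandRole(role) {
  const entry = ROLE_CATALOG.get(role);
  if (!entry) throw new Error(`unknown role: ${role}`);
  // Return copies so callers cannot mutate the shared catalog.
  return {
    isViewer: entry.isViewer,
    grant: { ...entry.grant, canPublishSources: [...entry.grant.canPublishSources] },
  };
}
```

Because the expansion happens before signing, the token on the wire still carries only a grant and isViewer, which is why a role layer can be added later without a breaking change.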

Token claims

The JWT payload. roomId and grant are the load-bearing claims; the rest are standard JWT registered claims plus the VideoSDK-specific identity, tier, and entry markers (participantId, isViewer, joinPolicy).

// JWT payload, signed with the app secret (HS256)
{
  "roomId": string,           // optional: scopes to one room; omit = domain-wide
  "participantId": string,    // optional: stable identity; SDK generates one if omitted
  "isViewer": boolean,        // default false; tier marker (on-stage vs audience)
  "joinPolicy": JoinPolicy,   // optional: entry behavior; default { mode: 'direct' }. See Waiting Lobby.
  "grant": Grant,             // the capability object, see below

  // === Standard JWT registered claims ===
  "iss": string,              // your API key
  "exp": number,              // expiry (unix seconds); keep short-lived
  "iat": number,              // issued-at
  "nbf": number,              // not-before (optional)
  "jti": string               // unique token id (optional)
}
Claim | Required | Description
roomId | No | The room this token authorizes. When set, the token is scoped to one room (joining a different roomId rejects with UNAUTHORIZED_ROOM); when omitted, it is domain-wide (any room in the project), bounded by guardrails. See roomId scoping.
participantId | No | Stable identity for this participant. If omitted, the SDK generates one. If set, it pins the identity: joining with a mismatched participantId in JoinOptions rejects with UNAUTHORIZED_PARTICIPANT.
isViewer | No (default false) | Tier marker, not a capability. false = on-stage (rostered, fires participant-joined); true = audience (count-only). A sibling of grant. See Tier & participant events.
joinPolicy | No (default { mode: 'direct' }) | Entry behavior, not a capability. { mode: 'direct' } = a valid token goes straight in; { mode: 'ask', ttl?: number } = the joiner is held in a lobby until a moderator admits or denies. A sibling of grant. Setting mode: 'ask' on a token that also carries canModerate is rejected as INVALID_ENTRY_CLAIM. Full reference: Waiting Lobby.
grant | Yes | The capability object. Defines what this token may do: publish, subscribe, record, moderate, etc. See The grant.
iss | Yes | Your API key. Identifies which application minted the token. An unknown / revoked key rejects with INVALID_API_KEY.
exp | Yes | Expiry (unix seconds). An expired token rejects with INVALID_TOKEN. Keep tokens short-lived.
iat / nbf / jti | No | Standard JWT registered claims: issued-at, not-before, and a unique token id.
Signed with the app secret (HS256). The token is an HS256 JWT signed with your application secret. Because signing needs the secret, tokens are always backend-generated; there is no client-side mint path. The iss claim carries the API key (public); the secret is the signing key (never leaves your server).
Waiting lobby lives behind joinPolicy. The joinPolicy claim is the single switch that opts a participant into the knock-and-admit flow. The lobby's joiner / moderator APIs, the EntryRequest shape, the timeout and retry semantics, and the full set of worked examples (telehealth, webinar green rooms, late-joining moderator) all live on Waiting Lobby.
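
The claim rules above can be checked on the backend before signing, so a bad token never leaves the mint endpoint. The sketch below is a hypothetical pre-mint lint (lintClaims is not an SDK function); it only mirrors the required-claim rules and the joinPolicy / canModerate conflict from the table, and the SFU remains the authority on what is actually accepted.

```javascript
// Hypothetical pre-mint lint. Returns a list of problems; an empty list means
// the claims pass the rules stated in the claims table.
function lintClaims(claims, nowSec) {
  const problems = [];

  // grant, iss, and exp are the required claims.
  if (!claims.grant || typeof claims.grant !== 'object') {
    problems.push('grant is required');
  }
  if (typeof claims.iss !== 'string' || claims.iss === '') {
    problems.push('iss (API key) is required');
  }
  if (typeof claims.exp !== 'number') {
    problems.push('exp is required');
  } else if (claims.exp <= nowSec) {
    problems.push('exp is already in the past');
  }

  // joinPolicy defaults to { mode: 'direct' } when omitted.
  const mode = claims.joinPolicy ? claims.joinPolicy.mode : 'direct';
  if (mode !== 'direct' && mode !== 'ask') {
    problems.push(`unknown joinPolicy.mode: ${mode}`);
  }
  // A token that must knock cannot also be a moderator.
  if (mode === 'ask' && claims.grant && claims.grant.canModerate === true) {
    problems.push('INVALID_ENTRY_CLAIM: mode "ask" cannot be combined with canModerate');
  }
  return problems;
}
```

A mint endpoint could call this with the current unix time and refuse to sign when the returned list is non-empty.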

roomId scoping: optional, with guardrails

roomId is optional. When present, the token is scoped to that one room (joining a different roomId rejects with UNAUTHORIZED_ROOM). When omitted, the token is domain-wide (valid for any room in the project). Two guardrails bound the risk of domain-wide tokens.

The guardrails

Guardrail | What it does
Shorter max exp when roomless | A domain-wide token gets a tighter expiry ceiling than a room-scoped one, so a leaked any-room token can't live long. (Agora applies the same cap to its wildcard tokens: 24h.)
No privileged grants when roomless | A domain-wide token may not carry canModerate, canRecord, canHls, or canLivestream; only join + publish / subscribe + data are allowed. Privileged actions stay room-scoped, so a leaked domain-wide token can never moderate or record every room.
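
The two guardrails can be expressed as a small check. This is a sketch under assumptions: checkRoomlessGuardrails is a hypothetical helper, and the expiry ceiling is passed in as maxRoomlessTtlSec because the source does not state the actual value VideoSDK applies.

```javascript
// The four flags the guardrail table says a domain-wide token may not carry.
const PRIVILEGED_WHEN_ROOMLESS = ['canModerate', 'canRecord', 'canHls', 'canLivestream'];

// Returns a list of guardrail violations for a claims object.
// maxRoomlessTtlSec is an assumed, caller-supplied ceiling (not a documented value).
function checkRoomlessGuardrails(claims, nowSec, maxRoomlessTtlSec) {
  // Room-scoped tokens are not subject to these guardrails.
  if (claims.roomId != null) return [];

  const violations = [];
  if (claims.exp - nowSec > maxRoomlessTtlSec) {
    violations.push('exp exceeds the roomless ceiling');
  }
  for (const flag of PRIVILEGED_WHEN_ROOMLESS) {
    if (claims.grant && claims.grant[flag] === true) {
      violations.push(`${flag} is not allowed without roomId`);
    }
  }
  return violations;
}
```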

Why this shape

The audience-token simplification. A domain-wide token (no roomId) combined with no participantId is the single-mint audience token: one JWT distributed to 1,000+ viewers, each getting a fresh SDK-generated identity. Because privileged grants are blocked, the worst-case leak is "anyone can subscribe to any public room," not "anyone can moderate every room."
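
As an illustration, the payload of such an audience token might be built as follows. audienceClaims is a hypothetical helper, and the choice of a subscribe-only grant is an assumption about what a typical audience needs; the resulting object would still have to be signed on the backend with the app secret.

```javascript
// Builds the claims for a single-mint audience token.
// apiKey, nowSec, and ttlSec are supplied by the caller; keep ttlSec short,
// since domain-wide tokens are subject to a tighter expiry ceiling.
function audienceClaims(apiKey, nowSec, ttlSec) {
  return {
    // No roomId: domain-wide. No participantId: the SDK generates a fresh
    // identity for every viewer who joins with this token.
    isViewer: true, // audience tier: counted, not rostered
    grant: {
      canPublish: false,
      canPublishSources: [],
      canSubscribe: true,
      canPublishData: false,
      canSubscribeData: true,
      canRecord: false,
      canHls: false,
      canLivestream: false,
      canTranscribe: false,
      canWhiteboard: false,
      canModerate: false, // privileged flags must stay off when roomless
    },
    iss: apiKey,
    iat: nowSec,
    exp: nowSec + ttlSec,
  };
}
```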

The grant

The capability object. Each flag is enforced server-side by the SFU. A grant with everything false is a no-op participant (connected, but can't publish, subscribe, or moderate). Mint the narrowest grant that fits the role.

interface Grant {
  canPublish: boolean;        // may publish any media at all
  canPublishSources: ('camera' | 'microphone' | 'screen')[];  // which sources, when canPublish is true
  canSubscribe: boolean;      // may receive others' media
  canPublishData: boolean;    // send data: room.pubsub.publish + room.dataStream.send
  canSubscribeData: boolean;  // receive data: room.pubsub.subscribe/getHistory + room.dataStream (default true)
  canRecord: boolean;         // may start/stop recording
  canHls: boolean;            // may start/stop HLS
  canLivestream: boolean;     // may start/stop RTMP livestream
  canTranscribe: boolean;     // may start/stop transcription
  canWhiteboard: boolean;     // may start/stop the whiteboard
  canModerate: boolean;       // may control others, see below
}
Flag | Type | Controls
canPublish | boolean | Whether this participant may publish media at all. If false, every publish attempt is rejected server-side regardless of canPublishSources.
canPublishSources | array of 'camera' / 'microphone' / 'screen' | When canPublish is true, restricts which sources may be published. E.g. ['microphone'] = audio-only; ['camera','microphone','screen'] = full.
canSubscribe | boolean | Whether this participant may receive (subscribe to) other participants' media.
canPublishData | boolean | Gates sending data: room.pubsub.publish and room.dataStream.send. If false, all data sends are rejected.
canSubscribeData | boolean | Gates receiving data: room.pubsub.subscribe / getHistory and room.dataStream. Default true; set false for a send-only or data-cut-off participant. Per-message delivery is separately controlled at publish via PublishOpts.to.
canRecord | boolean | May start / stop server-side recording for the room.
canHls | boolean | May start / stop HLS output for the room.
canLivestream | boolean | May start / stop RTMP livestream output for the room.
canTranscribe | boolean | May start / stop transcription for the room.
canWhiteboard | boolean | May start / stop the server-hosted whiteboard for the room.
canModerate | boolean | May control other participants' media (request / force-unpublish their streams), remove() participants, and end the room. Acting on a target you lack moderate rights for rejects with INVALID_PERMISSIONS.
isViewer is not a grant flag. It sits beside grant, not inside it. It decides the participant tier (on-stage vs audience), not a capability. An on-stage participant (isViewer: false) with an empty grant is still rostered and fires participant-joined; they just can't do anything. See Tier & participant events.
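
The interplay between canPublish and canPublishSources can be sketched as a small predicate. mayPublish is a hypothetical helper, useful for example to hide a "share screen" button in the UI; it is not enforcement, which happens on the SFU. Treating a missing canPublishSources as "no sources" is an assumption of this sketch.

```javascript
// Returns true when the grant allows publishing the given source
// ('camera', 'microphone', or 'screen').
function mayPublish(grant, source) {
  // canPublish is the master switch: when it is not true, the source list is irrelevant.
  if (grant.canPublish !== true) return false;
  // Assumption: a missing or non-array canPublishSources allows nothing.
  return Array.isArray(grant.canPublishSources) && grant.canPublishSources.includes(source);
}
```

For an audio-only grant ({ canPublish: true, canPublishSources: ['microphone'] }), the predicate is true for 'microphone' and false for 'camera' and 'screen'.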

Sample token

A host token: on-stage, full media, data, all services, and moderation.

Example: decoded host token payload
{
  "roomId": "team-standup",
  "participantId": "alice-42",
  "isViewer": false,
  "joinPolicy": { "mode": "direct" },

  "grant": {
    "canPublish": true,
    "canPublishSources": ["camera", "microphone", "screen"],
    "canSubscribe": true,
    "canPublishData": true,
    "canSubscribeData": true,
    "canRecord": true,
    "canHls": true,
    "canLivestream": true,
    "canTranscribe": true,
    "canWhiteboard": true,
    "canModerate": true
  },

  "iss": "vsdk_live_a1b2c3d4",
  "iat": 1716800000,
  "nbf": 1716800000,
  "exp": 1716803600,
  "jti": "e8c1f0a2-7b3d-4e6f-9a01-2c3d4e5f6071"
}

Tier & participant events

isViewer decides whether a participant is on-stage (speaker) or audience (viewer). The two tiers reach the client through different surfaces:

Tier | isViewer | Per-participant events | Enumeration
On-stage (speaker) | false | Individual participant-joined / participant-left, lazy + retroactive on subscribe. | room.remoteParticipants (sync Map)
Audience (viewer) | true | None. Not rostered client-side as individual RemoteParticipant objects, which keeps large rooms scalable. | Live count via participant-count-changed; paginated pull via room.getParticipants({ tier: 'viewer' })

Net effect: a 10-person stage with a 10,000-person audience produces 10 participant events, not 10,010. The 10,000 viewers are reflected only in the coalesced participant-count-changed event ({ speaker, viewer, total }) and become enumerable on demand via getParticipants.

The tier choice has teeth. If a customer needs individual events for someone, mint that participant's token with isViewer: false; there's no client-side opt-in to get individual events out of a viewer-tier participant. Pushing the choice up to token-mint is intentional.
Audience visibility is a bandwidth concern, not a token grant. Whether an audience member can see the stage is decided client-side by lazy event subscription (subscribe to the streams you render), not by a capability flag in the grant. The grant governs what you may do; the tier governs how you're counted.
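
A client might consume the two surfaces along these lines. This is a sketch under assumptions: trackRoom is a hypothetical helper, the handler for participant-joined / participant-left is assumed to receive an object with an id field, and participant-count-changed is assumed to deliver the { speaker, viewer, total } object directly. Only the event names and room.on come from this page.

```javascript
// Keeps a local view of the room: a roster for the on-stage tier and
// plain counters for everything else.
function trackRoom(room) {
  const state = {
    onStage: new Set(), // ids of on-stage participants (individual events)
    counts: { speaker: 0, viewer: 0, total: 0 }, // coalesced counts
  };

  // On-stage participants arrive and leave one by one.
  room.on('participant-joined', (participant) => state.onStage.add(participant.id));
  room.on('participant-left', (participant) => state.onStage.delete(participant.id));

  // Audience members never fire individual events; they only move the counters.
  room.on('participant-count-changed', (counts) => {
    state.counts = { ...counts };
  });

  return state;
}
```

With a 10-person stage and 10,000 viewers, state.onStage would hold 10 ids while state.counts.viewer reports 10,000.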

Errors

All auth failures surface on VideoSDK.join() as a rejected Promise with kind === ErrorKind.Auth. Catch the whole class by checking err.kind, and branch on err.code if you need specifics. Full inventory: Errors → VideoSDK.join().

Code | Kind | Cause
INVALID_API_KEY | Auth | The iss API key is unknown or revoked.
INVALID_TOKEN | Auth | Token missing, malformed, badly signed, or expired (past exp).
INVALID_PERMISSIONS | Auth | The grant doesn't permit the attempted action (e.g. publishing without canPublish, moderating without canModerate).
UNAUTHORIZED_ROOM | Auth | Token's roomId doesn't match the room being joined.
UNAUTHORIZED_PARTICIPANT | Auth | Token's participantId doesn't match the identity being joined.
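
Branching on err.code could look like the sketch below. authErrorAction and the action names it returns are hypothetical, and the sketch assumes err.code is a string equal to the code name in the table.

```javascript
// Maps an auth error code to a suggested app-level reaction.
// The returned action names are illustrative, not SDK constants.
function authErrorAction(code) {
  switch (code) {
    case 'INVALID_TOKEN':
      return 'refetch-token'; // missing, malformed, badly signed, or expired
    case 'UNAUTHORIZED_ROOM':
    case 'UNAUTHORIZED_PARTICIPANT':
      return 'remint-for-target'; // token scope does not match the join target
    case 'INVALID_PERMISSIONS':
      return 'show-not-allowed'; // the grant does not cover the action
    case 'INVALID_API_KEY':
      return 'report-misconfiguration'; // nothing the end user can fix
    default:
      return 'show-generic-error';
  }
}
```

In a catch block around VideoSDK.join(), this would be called only after confirming err.kind === ErrorKind.Auth.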

Open questions

Two token-related design questions still being decided. The shapes below are the options on the table; none are locked. Until they are, mint an exp comfortably longer than your expected session length, and rely on token re-mint + rejoin for grant changes.

Q1: What happens when the token expires mid-call?

Tokens have an exp claim; long meetings outlive a single mint. The SDK needs a way to obtain a fresh token without disconnecting the user.

Option | Shape | Pros | Cons
Provider callback (Stream, Ably, Sendbird) | JoinOptions.tokenProvider: () => Promise<string>. The SDK calls it for initial join, pre-expiry refresh, and reconnect. | One function covers init + refresh + reconnect; SDK owns timing (proactive); reconnect just works. | Customer's mint endpoint gets called repeatedly (cacheable).
Event + method (Twilio, Agora) | token: string for init + room.on('token-expiring', …) + room.updateToken(newToken). | Token field stays a string; event-driven is familiar. | Two paths (init vs refresh); app must wire both; SDK can't be as proactive.
Server-driven refresh (LiveKit) | Server quietly re-issues short tokens; client mostly invisible. | Zero client code; transparent to the app. | Requires server machinery we'd have to build; doesn't cover permission changes; client must persist the refreshed token across reconnects.
No refresh (Daily) | Tokens are long-lived; mint with exp past your max meeting length. | Simplest: no SDK surface at all. | Larger blast radius if a token leaks.

Current lean: provider callback + keep token: string for the simple short-call path. Sub-questions to settle if we go that way: how far before exp to pre-refresh (default 30-60s), the retry / backoff policy on provider failure, and the disconnect reason if refresh ultimately fails (likely DisconnectReason.TokenExpired).
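
Whichever option lands, the pre-refresh timing reduces to simple arithmetic on exp. The sketch below is illustrative only: readExp and refreshDelayMs are hypothetical helpers, the 60-second default lead is taken from the 30-60s range mentioned above, and Buffer makes this Node-specific (a browser client would decode the payload differently).

```javascript
// Reads exp from a JWT without verifying the signature.
// Fine for scheduling a refresh; never use an unverified payload for trust decisions.
function readExp(token) {
  const payloadPart = token.split('.')[1];
  const payload = JSON.parse(Buffer.from(payloadPart, 'base64url').toString('utf8'));
  return payload.exp; // unix seconds
}

// Milliseconds to wait before fetching a fresh token: leadSec before exp,
// clamped to 0 when that moment has already passed.
function refreshDelayMs(token, nowMs, leadSec = 60) {
  const refreshAtMs = (readExp(token) - leadSec) * 1000;
  return Math.max(0, refreshAtMs - nowMs);
}
```

A client could pass the result to setTimeout and re-run the calculation each time a new token arrives.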

Q2: How does a participant's grant change mid-call?

Today the grant is set at token-mint time and frozen for the session. Real cases need to change it: host promotes a viewer to speaker, demotes a speaker, revokes recording mid-meeting, flips isViewer. How does the new grant reach the running client?

Option | Shape | Pros | Cons
Refresh-driven (Twilio, Agora) | Backend mints a new token with the new grant; client receives it via whichever refresh mechanism Q1 picks (provider callback or updateToken). | One mechanism for both expiry and permission changes; nothing new on the wire. | Always a client round-trip (fetch the new token); permission changes are coupled to whatever Q1 lands on.
Server-side updateParticipant REST (LiveKit) | Backend calls a server REST endpoint; the SFU pushes the new grant to the target client and fires a grant-changed event. No new token. | Admin can flip a flag without involving the target's app code; immediate (no client round-trip); cleanly decoupled from token lifecycle. | A whole new surface (REST API + push event); overlaps with the refresh path conceptually.
Hybrid | Refresh-driven in v1; updateParticipant added later as an additive optimization. | Smallest v1 surface; non-breaking to add the REST path later. | Deferred work for v2 customers who want server-side flips.

Current lean: hybrid. Refresh-driven for v1 (one mechanism for both expiry and permission changes), with updateParticipant deferred. Sub-questions: should the client emit a grant-changed event when its own grant updates (so the app can re-render UI / disable buttons)? Should an isViewer flip mid-call retro-fire participant-joined / participant-left as the participant moves between on-stage and audience tiers?
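
If a grant-changed notification is added, the app mostly needs to know which flags moved so it can enable or disable the matching controls. A minimal sketch of that comparison, assuming the app keeps the previous and the new grant objects around (changedFlags is a hypothetical helper, not an SDK API):

```javascript
// Returns the sorted names of grant flags whose values differ between two grants.
// Values are compared via JSON so that the canPublishSources array is compared by content.
function changedFlags(prevGrant, nextGrant) {
  const keys = new Set([...Object.keys(prevGrant), ...Object.keys(nextGrant)]);
  const changed = [];
  for (const key of keys) {
    if (JSON.stringify(prevGrant[key]) !== JSON.stringify(nextGrant[key])) {
      changed.push(key);
    }
  }
  return changed.sort();
}
```

Promoting a viewer to an audio-only speaker, for example, would report canPublish and canPublishSources as changed.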

Example

The two halves of the flow: your backend mints a scoped token with a grant; the client passes it to join().

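Example (backend): mint a scoped token (conceptual)
// === YOUR BACKEND: never ship the secret to the client ===
// Express is assumed here because the route below uses app.post(...).
import express from 'express';
import jwt from 'jsonwebtoken';

const app = express();
app.use(express.json());   // parse JSON bodies so req.body.roomId is available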

function mintToken({ roomId, participantId, role }) {
  // Every flag from the Grant interface is set explicitly for both roles.
  const grant =
    role === 'host'
      ? {
          canPublish: true,
          canPublishSources: ['camera', 'microphone', 'screen'],
          canSubscribe: true,
          canPublishData: true,
          canSubscribeData: true,
          canRecord: true, canHls: true, canLivestream: true, canTranscribe: true,
          canWhiteboard: true,
          canModerate: true,
        }
      : {
          canPublish: true,
          canPublishSources: ['camera', 'microphone'],
          canSubscribe: true,
          canPublishData: true,
          canSubscribeData: true,
          canRecord: false, canHls: false, canLivestream: false, canTranscribe: false,
          canWhiteboard: false,
          canModerate: false,
        };

  return jwt.sign(
    {
      roomId,
      participantId,
      isViewer: false,        // on-stage
      grant,
    },
    process.env.VIDEOSDK_APP_SECRET,   // signing key, backend only
    {
      algorithm: 'HS256',
      issuer: process.env.VIDEOSDK_API_KEY,   // iss
      expiresIn: '1h',                        // exp
    },
  );
}

// Expose behind your own auth (requireLogin stands for your own middleware), e.g. POST /api/meeting-token
app.post('/api/meeting-token', requireLogin, (req, res) => {
  const token = mintToken({
    roomId: req.body.roomId,
    participantId: req.user.id,
    role: req.user.role,
  });
  res.json({ token });
});
Example (client): fetch the token, then join
// === CLIENT: never mints a token; just fetches and joins ===
import { VideoSDK, ErrorKind } from '@videosdk/js';

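const res = await fetch('/api/meeting-token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ roomId: 'team-standup' }),
});
// Fail early on a non-2xx response instead of joining with an undefined token.
if (!res.ok) throw new Error(`token request failed: ${res.status}`);
const { token } = await res.json();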

try {
  const room = await VideoSDK.join({ token, roomId: 'team-standup', name: 'Alice' });
  console.log('connected', room.localParticipant.id);
} catch (err) {
  if (err.kind === ErrorKind.Auth) showLoginUI();   // any auth failure
  else                             showGenericError(err);
}

See also: Waiting Lobby, VideoSDK.join(), JoinOptions, Room → Events, Errors → VideoSDK.join()