SSE vs WebSocket vs REST API for Live AI

June 2026 · Published by Amar Kumar

AI products need the server to push live updates — task progress, tool calls, LLM tokens, and “need your input” prompts. You can bolt that on with REST polling, but it feels broken fast. This guide compares regular REST APIs, SSE streaming endpoints, and WebSockets with working code, then shows how to build a browser agent that asks for input and continues on the same connection.

Already streaming chat tokens? See How SSE Streaming Works in Chatbots for LLM-specific token plumbing. This post focuses on when to pick REST vs SSE vs WebSocket and how they combine for agents and long tasks.

Mental model

Three transport patterns cover almost every AI dashboard, agent UI, and ops console:

REST: start a job, fetch state
SSE: stream tokens and logs
WebSocket: bidirectional control

Regular REST API

The default pattern: client sends a request, server returns complete JSON, connection closes.

When REST is enough

REST for long tasks — polling anti-pattern

A background job runs for two minutes. The browser polls every second:

// Client: polling (works, but wasteful)
async function waitForTask(taskId) {
  while (true) {
    const res = await fetch(`/api/tasks/${taskId}`);
    // Stop on HTTP errors (e.g. unknown task) instead of polling forever
    if (!res.ok) throw new Error(`Task lookup failed: HTTP ${res.status}`);
    const task = await res.json();
    renderStatus(task.status, task.progress);
    if (task.status === "done" || task.status === "failed") return task;
    await new Promise((r) => setTimeout(r, 1000));
  }
}

# Server: regular REST handler (FastAPI)
import uuid

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()
TASKS: dict[str, dict] = {}  # in-memory store, for illustration only

class CreateTask(BaseModel):
    prompt: str

@app.post("/api/tasks")
def create_task(body: CreateTask):
    task_id = f"task-{uuid.uuid4().hex[:8]}"  # unique ID per task
    TASKS[task_id] = {"status": "queued", "progress": 0, "result": None}
    # ... enqueue background worker ...
    return {"task_id": task_id}

@app.get("/api/tasks/{task_id}")
def get_task(task_id: str):
    # Return a real 404 rather than a 200 with an error body
    if task_id not in TASKS:
        raise HTTPException(status_code=404, detail="task not found")
    return TASKS[task_id]

Problems: updates arrive up to 1 s late; load grows as N users × poll rate even when nothing has changed; and the server cannot push need_input, so the client only sees it on its next poll.
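
To put numbers on the load problem, here is a small back-of-the-envelope sketch. The function name polling_cost and the figure of 500 concurrent users are assumptions for illustration; the 1-second interval and 2-minute job come from the example above.

```python
import math

def polling_cost(users: int, poll_interval_s: float, job_duration_s: float) -> dict:
    """Estimate the request volume generated by fixed-interval polling."""
    # Each client polls until the job finishes, so it sends roughly
    # duration / interval requests (rounded up to cover the final poll).
    requests_per_job = math.ceil(job_duration_s / poll_interval_s)
    return {
        "requests_per_job": requests_per_job,
        "total_requests": users * requests_per_job,
        "requests_per_second": users / poll_interval_s,
    }

# 500 users is a hypothetical figure, not taken from the article.
cost = polling_cost(users=500, poll_interval_s=1.0, job_duration_s=120.0)
print(cost)
```

Under these assumptions each job costs 120 requests per user, almost all of which return an unchanged status. A streaming connection replaces them with one request per job.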

SSE streaming API

Server-Sent Events keep a single HTTP response open. The server writes data: ...\n\n lines as events happen. Content-Type is text/event-stream.
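
The frame format is easy to get subtly wrong by hand, especially with multi-line payloads. The helper below is a minimal sketch of a serializer; format_sse is a hypothetical name, not part of FastAPI or any library.

```python
import json
from typing import Any, Optional

def format_sse(data: Any, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one event into SSE wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # A string payload is sent as-is; anything else is JSON-encoded.
    text = data if isinstance(data, str) else json.dumps(data)
    # Each line of the payload needs its own "data:" field.
    for part in text.split("\n"):
        lines.append(f"data: {part}")
    # A blank line terminates the event.
    return "\n".join(lines) + "\n\n"

frame = format_sse({"progress": 40}, event="progress", event_id="7")
print(frame, end="")
```

A generator can then yield format_sse(payload, event="progress") instead of assembling f-strings inline.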

When to use SSE

SSE API example (FastAPI)

import asyncio
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def task_event_stream(task_id: str):
    """Yield SSE frames as the worker makes progress."""
    yield f"event: status\ndata: {json.dumps({'task_id': task_id, 'status': 'running'})}\n\n"
    for step in range(1, 6):
        await asyncio.sleep(0.4)
        payload = {"task_id": task_id, "progress": step * 20, "message": f"Step {step}/5"}
        yield f"event: progress\ndata: {json.dumps(payload)}\n\n"
    yield f"event: done\ndata: {json.dumps({'task_id': task_id, 'result': 'ok'})}\n\n"

@app.get("/api/tasks/{task_id}/stream")
async def stream_task(task_id: str):
    return StreamingResponse(
        task_event_stream(task_id),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",
        },
    )

Browser client — EventSource (GET)

const es = new EventSource(`/api/tasks/${taskId}/stream`);

es.addEventListener("progress", (e) => {
  const data = JSON.parse(e.data);
  document.getElementById("bar").style.width = `${data.progress}%`;
  log(data.message);
});

es.addEventListener("done", (e) => {
  const data = JSON.parse(e.data);
  log("Finished:", data.result);
  es.close();
});

es.onerror = () => {
  log("Stream error — EventSource will retry");
};

Browser client — fetch + ReadableStream (POST + SSE)

Chat APIs usually POST the user message, then read an SSE body from the response — EventSource only supports GET, so use fetch:

async function streamChat(message) {
  const res = await fetch("/api/chat/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });
  if (!res.ok || !res.body) throw new Error(`Stream failed: HTTP ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parts = buffer.split("\n\n");
    buffer = parts.pop() || "";
    for (const block of parts) {
      const line = block.split("\n").find((l) => l.startsWith("data:"));
      if (!line) continue;
      const data = JSON.parse(line.slice(5).trim());
      if (data.type === "token") appendToken(data.content);
      if (data.type === "done") return;
    }
  }
}
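
The streamChat loop keeps only the first data: line of each block and ignores event names. For clients that need both, the same buffering idea can be sketched as a small incremental parser, shown in Python for illustration. SSEParser is a hypothetical class name, and the sketch handles only \n and \r\n line endings.

```python
import json

class SSEParser:
    """Incrementally parse SSE text that may arrive split across chunks."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, chunk: str) -> list:
        """Add a chunk and return every event completed so far."""
        # Normalize on the whole buffer so a "\r\n" split across chunks still matches.
        self._buffer = (self._buffer + chunk).replace("\r\n", "\n")
        # Everything before the last blank line is complete; keep the rest buffered.
        *blocks, self._buffer = self._buffer.split("\n\n")
        events = []
        for block in blocks:
            event, data_lines = "message", []
            for line in block.split("\n"):
                if line.startswith(":"):
                    continue  # comment line, often used as a keep-alive
                field, _, value = line.partition(":")
                value = value[1:] if value.startswith(" ") else value
                if field == "event":
                    event = value
                elif field == "data":
                    data_lines.append(value)
            if data_lines:
                events.append({"event": event, "data": "\n".join(data_lines)})
        return events

parser = SSEParser()
events = parser.feed('event: token\ndata: {"content": "The"}\n\nevent: to')
events += parser.feed('ken\ndata: {"content": " cat"}\n\n')
tokens = [json.loads(e["data"])["content"] for e in events if e["event"] == "token"]
print("".join(tokens))
```

The second event is deliberately split in the middle of its event: field to show that nothing is emitted until the terminating blank line arrives.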

SSE limitation: the server cannot receive user input on the same HTTP response. You need a separate POST (or a WebSocket) to send the next message.
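
One way to bridge that gap with a separate POST is a small registry that the stream waits on and the POST handler resolves. The sketch below uses plain asyncio so it runs without a web framework. PendingInputs, the run ID "run-1", and the single-process assumption are all illustrative; with several server processes the hand-off would need a shared broker instead.

```python
import asyncio

class PendingInputs:
    """Registry that lets a streaming task wait for input delivered elsewhere."""

    def __init__(self) -> None:
        self._waiting = {}

    async def wait_for(self, run_id: str, timeout: float = 300.0) -> str:
        """Called by the SSE generator after it emits a need_input event."""
        future = asyncio.get_running_loop().create_future()
        self._waiting[run_id] = future
        try:
            return await asyncio.wait_for(future, timeout)
        finally:
            self._waiting.pop(run_id, None)

    def submit(self, run_id: str, text: str) -> bool:
        """Called by the separate POST handler; False if nobody is waiting."""
        future = self._waiting.get(run_id)
        if future is None or future.done():
            return False
        future.set_result(text)
        return True

async def demo() -> list:
    inputs = PendingInputs()
    frames = []

    async def stream() -> None:
        frames.append("event: need_input")
        reply = await inputs.wait_for("run-1", timeout=1.0)
        frames.append(f"event: resumed with {reply}")

    task = asyncio.create_task(stream())
    await asyncio.sleep(0)                    # let the stream reach its wait point
    accepted = inputs.submit("run-1", "yes")  # what the POST handler would do
    await task
    frames.append(f"accepted={accepted}")
    return frames

result = asyncio.run(demo())
print(result)
```

The stream stays open the whole time; only the user's answer travels over a second request.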

WebSocket API

WebSocket upgrades an HTTP connection to a full-duplex channel over the same TCP connection. Either side can send a message at any time (JSON text frames in the examples here), which suits agents that pause and ask questions.

When to use WebSocket

WebSocket server (FastAPI)

import json
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws/agent/{run_id}")
async def agent_socket(ws: WebSocket, run_id: str):
    await ws.accept()
    await ws.send_json({"type": "status", "run_id": run_id, "state": "connected"})
    try:
        await ws.send_json({"type": "tool_call", "name": "search_docs", "args": {"q": "pricing"}})
        await ws.send_json({"type": "progress", "pct": 40, "message": "Searching..."})
        # Agent needs human approval
        await ws.send_json({
            "type": "need_input",
            "prompt": "Deploy to production? (yes/no)",
            "resume_token": "step-7",
        })
        while True:
            msg = await ws.receive_json()
            if msg.get("type") == "user_reply":
                answer = msg.get("text", "").strip().lower()
                if answer == "yes":
                    await ws.send_json({"type": "progress", "pct": 90, "message": "Deploying..."})
                    await ws.send_json({"type": "done", "result": "deployed"})
                else:
                    await ws.send_json({"type": "done", "result": "cancelled"})
                break
    except WebSocketDisconnect:
        pass

WebSocket client (browser)

const ws = new WebSocket(`wss://api.example.com/ws/agent/${runId}`);
const inputBox = document.getElementById("agent-input");
const logEl = document.getElementById("log");

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "progress") {
    logEl.textContent += `\n[${msg.pct}%] ${msg.message}`;
  }
  if (msg.type === "need_input") {
    inputBox.disabled = false;
    inputBox.placeholder = msg.prompt;
    inputBox.dataset.resumeToken = msg.resume_token;
  }
  if (msg.type === "done") {
    logEl.textContent += `\nDone: ${msg.result}`;
    inputBox.disabled = true;
  }
};

inputBox.addEventListener("keydown", (e) => {
  if (e.key !== "Enter" || inputBox.disabled) return;
  ws.send(JSON.stringify({
    type: "user_reply",
    text: inputBox.value,
    resume_token: inputBox.dataset.resumeToken,
  }));
  inputBox.value = "";
  inputBox.disabled = true;
});

Side-by-side comparison

| Criterion | REST | SSE | WebSocket |
|---|---|---|---|
| Direction | Request → response | Server → client | Bidirectional |
| Connection | Short-lived | One long HTTP response | Persistent WebSocket connection |
| Browser API | fetch | EventSource or fetch stream | WebSocket |
| LLM token stream | Poor (buffer full JSON) | Excellent | Good (but more code) |
| Mid-task user input | Awkward (new POST) | Awkward (separate POST) | Natural |
| Live task progress | Requires polling | Excellent | Excellent |
| Proxy / CDN | Easiest | Easy with no-buffer headers | Upgrade + sticky sessions |
| Reconnect | N/A | Built-in (EventSource) | You implement it, plus a heartbeat |
| Complexity | Lowest | Low | Medium |

Relative fit (0–10) for common AI product needs

Decision guide

| You need… | Pick |
|---|---|
| Create resources, fetch JSON once | REST |
| Stream tokens or logs server → browser | SSE |
| Agent asks questions mid-run on same session | WebSocket |
| Chat + agent tools + approvals | WebSocket + SSE (hybrid) |
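
The same guide can be written as a tiny function, which is sometimes handy in design reviews. This is only a sketch of the table's logic; the parameter names are assumptions, and real decisions also weigh proxies and team familiarity.

```python
def pick_transport(server_push: bool, mid_run_input: bool, token_stream: bool = False) -> str:
    """Map product needs onto the transports in the decision guide."""
    # Mid-run questions plus a token stream: split control and content.
    if mid_run_input and token_stream:
        return "WebSocket + SSE (hybrid)"
    # The user must answer on the same session: bidirectional channel.
    if mid_run_input:
        return "WebSocket"
    # One-way server-to-browser updates: SSE is enough.
    if server_push or token_stream:
        return "SSE"
    # Everything completes in one round trip.
    return "REST"

print(pick_transport(server_push=True, mid_run_input=False))
```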

The hybrid stack (WebSocket + SSE)

Production AI consoles often use both:

  1. POST /api/runs (REST) — create run, return run_id
  2. WebSocket /ws/runs/{id} — control plane: tool_call, need_input, error
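  3. GET /api/runs/{id}/tokens (SSE) — data plane: LLM token stream

// After POST /api/runs returns { run_id }
// `body` is the JSON request payload prepared by the caller
const { run_id } = await (await fetch("/api/runs", { method: "POST", body })).json();

// Build an absolute ws(s):// URL; older browsers reject relative WebSocket URLs
const wsScheme = location.protocol === "https:" ? "wss" : "ws";
const ws = new WebSocket(`${wsScheme}://${location.host}/ws/runs/${run_id}`);
ws.onmessage = (e) => handleControl(JSON.parse(e.data));

const es = new EventSource(`/api/runs/${run_id}/tokens`);
// Frames labelled "event: token" go to a named listener; onmessage would not fire for them
es.addEventListener("token", (e) => appendToken(JSON.parse(e.data).content));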

Separating control (WebSocket) from content (SSE) keeps token parsers simple. It also avoids interleaving token chunks with control messages on one socket, where ordering bugs are easy to introduce.

Browser agent: ask input and continue

A minimal agent loop on the server:

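from fastapi import WebSocket

# llm_plan, wait_for_user_reply, and execute_tool are application helpers defined elsewhere.
async def run_agent(ws: WebSocket, run_id: str):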
    await ws.send_json({"type": "status", "state": "planning"})
    plan = await llm_plan(run_id)
    for step in plan.steps:
        await ws.send_json({"type": "step", "name": step.name})
        if step.requires_approval:
            await ws.send_json({
                "type": "need_input",
                "prompt": step.approval_prompt,
            })
            reply = await wait_for_user_reply(ws)
            if reply != "yes":
                await ws.send_json({"type": "done", "state": "cancelled"})
                return
        result = await execute_tool(step)
        await ws.send_json({"type": "tool_result", "summary": str(result)[:200]})
    await ws.send_json({"type": "done", "state": "completed"})

The browser enables the input box only when need_input arrives — not before. That is the UX difference between a chatbot (one message in, stream out) and an agent (multi-turn control on one session).
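
The agent loop calls wait_for_user_reply without showing it. One possible shape is sketched below. The timeout value, the choice to treat a timeout as "no", and the FakeSocket test double are all assumptions; the only method the sketch needs from the socket is receive_json, which the FastAPI WebSocket object provides.

```python
import asyncio

async def wait_for_user_reply(ws, timeout: float = 120.0) -> str:
    """Read messages until a user_reply arrives; return its normalized text."""
    async def next_reply() -> str:
        while True:
            msg = await ws.receive_json()
            # Ignore pings or other client messages that are not replies.
            if msg.get("type") == "user_reply":
                return str(msg.get("text", "")).strip().lower()
    try:
        return await asyncio.wait_for(next_reply(), timeout)
    except asyncio.TimeoutError:
        # Assumption: an unanswered prompt counts as a refusal.
        return "no"

class FakeSocket:
    """Stand-in for a WebSocket that replays canned client messages."""
    def __init__(self, messages):
        self._messages = list(messages)

    async def receive_json(self):
        return self._messages.pop(0)

fake = FakeSocket([{"type": "ping"}, {"type": "user_reply", "text": " Yes "}])
answer = asyncio.run(wait_for_user_reply(fake))
print(answer)
```

Normalizing the text in one place means the loop's reply != "yes" check does not have to worry about case or stray whitespace.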

Wire formats

SSE frame

event: progress
data: {"pct": 60, "message": "Indexing chunk 120/200"}

event: token
data: {"content": "The"}

event: done
data: {"finish_reason": "stop"}

WebSocket JSON message

{"type": "need_input", "prompt": "Confirm delete?", "resume_token": "abc"}
{"type": "user_reply", "text": "yes", "resume_token": "abc"}
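
The resume_token in both messages gives the server a way to check that a reply answers the prompt currently pending, rather than an older one. That purpose is an inference from the message shapes, and accepts_reply is a hypothetical helper, so treat this as a sketch.

```python
import json

def accepts_reply(pending: dict, reply: dict) -> bool:
    """True only if `reply` answers the prompt that is currently pending."""
    return (
        pending.get("type") == "need_input"
        and reply.get("type") == "user_reply"
        and reply.get("resume_token") == pending.get("resume_token")
    )

pending = json.loads('{"type": "need_input", "prompt": "Confirm delete?", "resume_token": "abc"}')
on_time = json.loads('{"type": "user_reply", "text": "yes", "resume_token": "abc"}')
# Hypothetical stale reply carrying a token from an earlier prompt.
stale = json.loads('{"type": "user_reply", "text": "yes", "resume_token": "old"}')

print(accepts_reply(pending, on_time), accepts_reply(pending, stale))
```

A server that skips this check could apply a late "yes" from a previous prompt to a different, more destructive step.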

REST JSON (single shot)

{"task_id": "t-1", "status": "done", "progress": 100, "result": {"files": 42}}

Production checklist

FAQ

Should I stream LLM tokens over WebSocket?

You can, but SSE or chunked HTTP is simpler and more proxy-friendly. Put everything on a WebSocket only if tokens and control messages must share one connection.

Is SSE the same as HTTP streaming?

SSE is a standard format (data: lines) on top of HTTP streaming. Many APIs stream newline-delimited JSON without SSE headers — same idea, different parsing.
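
To make the parsing difference concrete, here is a minimal sketch of a newline-delimited JSON reader. The message shapes reuse the type and content fields from the chat example; parse_ndjson is a hypothetical name.

```python
import json

def parse_ndjson(chunks):
    """Yield one decoded object per complete line, buffering partial lines."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        # The last piece may be an incomplete line; keep it for the next chunk.
        *lines, buffer = buffer.split("\n")
        for line in lines:
            if line.strip():
                yield json.loads(line)
    if buffer.strip():
        yield json.loads(buffer)  # final line without a trailing newline

chunks = [
    '{"type": "token", "content": "The"}\n{"type": "to',
    'ken", "content": " cat"}\n{"type": "done"}\n',
]
events = list(parse_ndjson(chunks))
print("".join(e["content"] for e in events if e["type"] == "token"))
```

Here a single newline ends a record and there are no field names, whereas SSE ends an event with a blank line and prefixes each line with a field such as data: or event:.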

When is REST still correct?

Always for mutations and fetches that complete in one round trip. Streaming transports complement REST; they rarely replace it entirely.

REST to start, SSE to stream, WebSocket to converse — that is the live AI stack.