SSE vs WebSocket vs REST API for Live AI

June 2026 · Published by Amar Kumar

AI products need the server to push live updates — task progress, tool calls, LLM tokens, and “need your input” prompts. You can bolt that on with REST polling, but it feels broken fast. This guide compares regular REST APIs, SSE streaming endpoints, and WebSockets with working code, then shows how to build a browser agent that asks for input and continues on the same connection.

Already streaming chat tokens? See How SSE Streaming Works in Chatbots for LLM-specific token plumbing. This post focuses on when to pick REST vs SSE vs WebSocket and how they combine for agents and long tasks.

Mental model

Three transport patterns cover almost every AI dashboard, agent UI, and ops console:

REST (start job, fetch state) + SSE (stream tokens and logs) + WebSocket (bidirectional control)

REST — short request/response cycles. Create a run, upload a file, fetch JSON status when the client asks.
SSE — one HTTP response stays open; the server pushes lines of events. Perfect for LLM output and read-only progress.
WebSocket — persistent bidirectional channel. Perfect when the server must ask the user something mid-task and wait for an answer on the same session.

Regular REST API

The default pattern: client sends a request, server returns complete JSON, connection closes.

When REST is enough

CRUD on resources (POST /documents, GET /runs/{id})
Short LLM calls where time-to-first-token does not matter
Webhooks and server-to-server integrations

REST for long tasks — polling anti-pattern

A background job runs for two minutes. The browser polls every second:

// Client — polling (works, but wasteful)
async function waitForTask(taskId) {
  while (true) {
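    const res = await fetch(`/api/tasks/${taskId}`);
    // Stop on HTTP errors instead of polling a missing task forever
    if (!res.ok) throw new Error(`Task lookup failed: ${res.status}`);
    const task = await res.json();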
    renderStatus(task.status, task.progress);
    if (task.status === "done" || task.status === "failed") return task;
    await new Promise((r) => setTimeout(r, 1000));
  }
}

# Server — regular REST handler (FastAPI)
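import uuid

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()
TASKS: dict[str, dict] = {}  # in-memory store, for the demo only

class CreateTask(BaseModel):
    prompt: str

@app.post("/api/tasks")
def create_task(body: CreateTask):
    # Unique id per request (a hard-coded id would overwrite earlier tasks)
    task_id = f"task-{uuid.uuid4().hex[:8]}"
    TASKS[task_id] = {"status": "queued", "progress": 0, "result": None}
    # ... enqueue background worker ...
    return {"task_id": task_id}

@app.get("/api/tasks/{task_id}")
def get_task(task_id: str):
    task = TASKS.get(task_id)
    if task is None:
        # Return a real 404 so clients never mistake an error body for a task
        raise HTTPException(status_code=404, detail="task not found")
    return task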

Problems: updates arrive up to 1s late; load scales as N users × poll rate even when nothing has changed; and the server cannot push need_input, so the client only finds out on its next poll.
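To put rough numbers on that load, here is a back-of-the-envelope sketch. The helper name and the figure of 500 concurrent users are hypothetical, chosen only to illustrate the arithmetic for the two-minute task polled every second.

```python
def polling_requests(users: int, task_seconds: float, poll_interval: float) -> int:
    """Total status requests issued while every user polls one task to completion."""
    polls_per_user = int(task_seconds / poll_interval)
    return users * polls_per_user

# Hypothetical load: 500 concurrent users, one two-minute task each, polled every second.
total = polling_requests(users=500, task_seconds=120, poll_interval=1.0)
print(total)  # 60000
```

With a push transport, the same 500 users would hold 500 open connections and receive data only when something changes.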

SSE streaming API

Server-Sent Events keep a single HTTP response open. The server writes data: ...\n\n lines as events happen. Content-Type is text/event-stream.
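As a rough illustration of that framing, the sketch below builds one SSE frame by hand. The helper name format_sse and its parameters are assumptions made for this example, not part of FastAPI or any SSE library.

```python
import json
from typing import Any, Optional

def format_sse(data: Any, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one SSE frame: optional id/event fields, data line(s), blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # json.dumps (without indent) never emits raw newlines, so JSON fits on one data line.
    payload = data if isinstance(data, str) else json.dumps(data)
    # Plain-text payloads may span lines; each line needs its own "data:" prefix.
    for part in payload.split("\n"):
        lines.append(f"data: {part}")
    # The trailing blank line is what tells the client the event is complete.
    return "\n".join(lines) + "\n\n"

frame = format_sse({"progress": 40}, event="progress")
print(repr(frame))  # 'event: progress\ndata: {"progress": 40}\n\n'
```

Centralizing the framing in one function makes it harder to forget the terminating blank line, which is the most common reason a browser never fires an event.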

When to use SSE

Streaming LLM tokens to the browser
Live logs and progress bars (server → client only)
Works through most HTTP proxies and CDNs (with buffering disabled)

SSE API example (FastAPI)

import asyncio
import json
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def task_event_stream(task_id: str):
    """Yield SSE frames as the worker makes progress."""
    yield f"event: status\ndata: {json.dumps({'task_id': task_id, 'status': 'running'})}\n\n"
    for step in range(1, 6):
        await asyncio.sleep(0.4)
        payload = {"task_id": task_id, "progress": step * 20, "message": f"Step {step}/5"}
        yield f"event: progress\ndata: {json.dumps(payload)}\n\n"
    yield f"event: done\ndata: {json.dumps({'task_id': task_id, 'result': 'ok'})}\n\n"

@app.get("/api/tasks/{task_id}/stream")
async def stream_task(task_id: str):
    return StreamingResponse(
        task_event_stream(task_id),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",
        },
    )

Browser client — EventSource (GET)

const es = new EventSource(`/api/tasks/${taskId}/stream`);

es.addEventListener("progress", (e) => {
  const data = JSON.parse(e.data);
  document.getElementById("bar").style.width = `${data.progress}%`;
  log(data.message);
});

es.addEventListener("done", (e) => {
  const data = JSON.parse(e.data);
  log("Finished:", data.result);
  es.close();
});

es.onerror = () => {
  log("Stream error — EventSource will retry");
};

Browser client — fetch + ReadableStream (POST + SSE)

Chat APIs usually POST the user message, then read an SSE body from the response — EventSource only supports GET, so use fetch:

async function streamChat(message) {
  const res = await fetch("/api/chat/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });
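  // Fail fast on HTTP errors rather than trying to parse an error page as SSE
  if (!res.ok || !res.body) throw new Error(`Stream request failed: ${res.status}`);
  const reader = res.body.getReader();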
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    const parts = buffer.split("\n\n");
    buffer = parts.pop() || "";
    for (const block of parts) {
      const line = block.split("\n").find((l) => l.startsWith("data:"));
      if (!line) continue;
      const data = JSON.parse(line.slice(5).trim());
      if (data.type === "token") appendToken(data.content);
      if (data.type === "done") return;
    }
  }
}

SSE limitation: the server cannot receive user input on the same HTTP response. You need a separate POST (or a WebSocket) to send the next message.

WebSocket API

WebSocket starts as an HTTP request and upgrades the same TCP connection to a full-duplex channel. Either side can send a message at any time (JSON text frames in this guide), which is ideal for agents that pause and ask questions.

When to use WebSocket

Human-in-the-loop agents (need_input → user reply → resume)
Live collaboration, multiple event types both directions
Single connection policy (mobile apps, strict firewalls)

WebSocket server (FastAPI)

import json
from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws/agent/{run_id}")
async def agent_socket(ws: WebSocket, run_id: str):
    await ws.accept()
    await ws.send_json({"type": "status", "run_id": run_id, "state": "connected"})
    try:
        await ws.send_json({"type": "tool_call", "name": "search_docs", "args": {"q": "pricing"}})
        await ws.send_json({"type": "progress", "pct": 40, "message": "Searching..."})
        # Agent needs human approval
        await ws.send_json({
            "type": "need_input",
            "prompt": "Deploy to production? (yes/no)",
            "resume_token": "step-7",
        })
        while True:
            msg = await ws.receive_json()
            if msg.get("type") == "user_reply":
                answer = msg.get("text", "").strip().lower()
                if answer == "yes":
                    await ws.send_json({"type": "progress", "pct": 90, "message": "Deploying..."})
                    await ws.send_json({"type": "done", "result": "deployed"})
                else:
                    await ws.send_json({"type": "done", "result": "cancelled"})
                break
    except WebSocketDisconnect:
        pass

WebSocket client (browser)

const ws = new WebSocket(`wss://api.example.com/ws/agent/${runId}`);
const inputBox = document.getElementById("agent-input");
const logEl = document.getElementById("log");

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "progress") {
    logEl.textContent += `\n[${msg.pct}%] ${msg.message}`;
  }
  if (msg.type === "need_input") {
    inputBox.disabled = false;
    inputBox.placeholder = msg.prompt;
    inputBox.dataset.resumeToken = msg.resume_token;
  }
  if (msg.type === "done") {
    logEl.textContent += `\nDone: ${msg.result}`;
    inputBox.disabled = true;
  }
};

inputBox.addEventListener("keydown", (e) => {
  if (e.key !== "Enter" || inputBox.disabled) return;
  ws.send(JSON.stringify({
    type: "user_reply",
    text: inputBox.value,
    resume_token: inputBox.dataset.resumeToken,
  }));
  inputBox.value = "";
  inputBox.disabled = true;
});

Side-by-side comparison

	REST	SSE	WebSocket
Direction	Request → response	Server → client	Bidirectional
Connection	Short-lived	One long HTTP response	Persistent WS connection
Browser API	`fetch`	`EventSource` or `fetch` stream	`WebSocket`
LLM token stream	Poor (buffer full JSON)	Excellent	Good (but more code)
Mid-task user input	Awkward (new POST)	Awkward (separate POST)	Natural
Live task progress	Requires polling	Excellent	Excellent
Proxy / CDN	Easiest	Easy with no-buffer headers	Upgrade + sticky sessions
Reconnect	N/A	Built-in (EventSource only)	Manual (reconnect logic + heartbeat)
Complexity	Lowest	Low	Medium

[Chart: relative fit (0–10) of REST, SSE, and WebSocket for common AI product needs]

Decision guide

You need…	Pick
Create resources, fetch JSON once	REST
Stream tokens or logs server → browser	SSE
Agent asks questions mid-run on same session	WebSocket
Chat + agent tools + approvals	WebSocket + SSE (hybrid)

The hybrid stack (WebSocket + SSE)

Production AI consoles often use both:

POST /api/runs (REST) — create run, return run_id
WebSocket /ws/runs/{id} — control plane: tool_call, need_input, error
GET /api/runs/{id}/tokens (SSE) — data plane: LLM token stream

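const res = await fetch("/api/runs", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({ prompt }), // prompt: the user's request (assumed in scope)
});
const { run_id } = await res.json();

// Control plane: build an absolute ws(s):// URL; not every browser accepts relative ones
const scheme = location.protocol === "https:" ? "wss" : "ws";
const ws = new WebSocket(`${scheme}://${location.host}/ws/runs/${run_id}`);
ws.onmessage = (e) => handleControl(JSON.parse(e.data));

// Data plane: named events (event: token) need addEventListener; onmessage misses them
const es = new EventSource(`/api/runs/${run_id}/tokens`);
es.addEventListener("token", (e) => appendToken(JSON.parse(e.data).content));
es.addEventListener("done", () => es.close());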

Separating control (WebSocket) from content (SSE) keeps the token parser simple: token chunks never interleave with control messages on one socket, so the client does not have to demultiplex by type or reason about ordering between the two.
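On the server, one way to feed both connections from a single agent loop is to route each event into a per-run queue by type. This is a minimal, framework-agnostic sketch; the RunChannels class and its method names are hypothetical and not part of any library.

```python
import asyncio

class RunChannels:
    """Per-run queues: one drained by the WebSocket handler, one by the SSE handler."""

    def __init__(self) -> None:
        self.control: asyncio.Queue = asyncio.Queue()
        self.tokens: asyncio.Queue = asyncio.Queue()

    def publish(self, event: dict) -> None:
        """Route an agent event to the token stream or the control plane by its type."""
        if event.get("type") == "token":
            self.tokens.put_nowait(event)
        else:
            # tool_call, need_input, error, and anything else that is not a token
            self.control.put_nowait(event)

channels = RunChannels()
channels.publish({"type": "token", "content": "The"})
channels.publish({"type": "need_input", "prompt": "Deploy to production? (yes/no)"})
print(channels.tokens.qsize(), channels.control.qsize())  # 1 1
```

The WebSocket handler would await channels.control.get() and the SSE generator would await channels.tokens.get(), so neither transport sees the other's traffic.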

Browser agent: ask input and continue

A minimal agent loop on the server:

async def run_agent(ws: WebSocket, run_id: str):
    await ws.send_json({"type": "status", "state": "planning"})
    plan = await llm_plan(run_id)
    for step in plan.steps:
        await ws.send_json({"type": "step", "name": step.name})
        if step.requires_approval:
            await ws.send_json({
                "type": "need_input",
                "prompt": step.approval_prompt,
            })
            reply = await wait_for_user_reply(ws)
            if reply != "yes":
                await ws.send_json({"type": "done", "state": "cancelled"})
                return
        result = await execute_tool(step)
        await ws.send_json({"type": "tool_result", "summary": result[:200]})
    await ws.send_json({"type": "done", "state": "completed"})

The browser enables the input box only when need_input arrives — not before. That is the UX difference between a chatbot (one message in, stream out) and an agent (multi-turn control on one session).
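The loop above calls wait_for_user_reply without defining it. One possible shape is sketched here, assuming only that ws exposes an async receive_json() method (as a FastAPI WebSocket does). The resume_token and timeout parameters, and the "timeout" return value, are assumptions added for illustration.

```python
import asyncio
from typing import Optional

async def wait_for_user_reply(ws, resume_token: Optional[str] = None,
                              timeout: float = 300.0) -> str:
    """Wait for a matching user_reply and return its text, trimmed and lowercased.

    Returns "timeout" if no matching reply arrives within `timeout` seconds.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            return "timeout"
        try:
            msg = await asyncio.wait_for(ws.receive_json(), timeout=remaining)
        except asyncio.TimeoutError:
            return "timeout"
        if msg.get("type") != "user_reply":
            continue  # ignore pings or other client messages while waiting
        if resume_token is not None and msg.get("resume_token") != resume_token:
            continue  # stale reply to an earlier prompt
        return msg.get("text", "").strip().lower()
```

Because the agent loop treats anything other than "yes" as a cancellation, a timed-out prompt cancels the run rather than leaving it suspended indefinitely.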

Wire formats

SSE frame

event: progress
data: {"pct": 60, "message": "Indexing chunk 120/200"}

event: token
data: {"content": "The"}

event: done
data: {"finish_reason": "stop"}
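The frames above follow a small line-based grammar, which makes them easy to consume outside the browser as well. Below is a minimal parser sketch for a complete text buffer; the function name parse_sse is hypothetical, and it handles only the event and data fields plus comments, ignoring id and retry.

```python
import json
from typing import Iterator, Tuple

def parse_sse(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from a complete SSE text buffer."""
    # The format allows \r\n, \n, or \r as line endings; normalize to \n.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    for block in text.split("\n\n"):
        event, data_lines = "message", []
        for line in block.split("\n"):
            if not line or line.startswith(":"):
                continue  # blank line, or a comment (often used as a keep-alive)
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is dropped
            if field == "event":
                event = value or "message"
            elif field == "data":
                data_lines.append(value)
        if data_lines:
            # Multiple data lines in one frame are joined with newlines.
            yield event, "\n".join(data_lines)

raw = 'event: token\ndata: {"content": "The"}\n\nevent: done\ndata: {"finish_reason": "stop"}\n\n'
for name, data in parse_sse(raw):
    print(name, json.loads(data))
```

A streaming client would apply the same rules incrementally, keeping any trailing partial frame in a buffer until its blank line arrives.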

WebSocket JSON message

{"type": "need_input", "prompt": "Confirm delete?", "resume_token": "abc"}
{"type": "user_reply", "text": "yes", "resume_token": "abc"}

REST JSON (single shot)

{"task_id": "t-1", "status": "done", "progress": 100, "result": {"files": 42}}

Production checklist

SSE: Cache-Control: no-cache, X-Accel-Buffering: no, flush after each event
WebSocket: ping/pong or application heartbeat every 30s; auth on connect; close on idle timeout
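REST: return 202 Accepted + Location for async jobs; accept a client-supplied run_id or idempotency key so retried POSTs do not create duplicate runs
All: propagate cancellation end to end (AbortSignal in the browser, disconnect detection on the server) so a cancelled run stops generating billable tokens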
Agents: persist run state server-side so refresh can resume via GET /runs/{id}
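The WebSocket heartbeat item can be sketched as a small helper that pings on a fixed interval and closes the socket once the client has been silent too long. The Heartbeat class, the {"type": "ping"} message shape, and the 90-second idle limit are assumptions for illustration; ws only needs async send_json() and close() methods.

```python
import asyncio
import time

class Heartbeat:
    """Application-level heartbeat for one socket (duck-typed: send_json and close)."""

    def __init__(self, ws, interval: float = 30.0, idle_timeout: float = 90.0):
        self.ws = ws
        self.interval = interval
        self.idle_timeout = idle_timeout
        self.last_seen = time.monotonic()

    def touch(self) -> None:
        """Call whenever any message (including a pong) arrives from the client."""
        self.last_seen = time.monotonic()

    async def run(self) -> None:
        """Ping every `interval` seconds; close the socket once the client goes quiet."""
        self.last_seen = time.monotonic()
        while True:
            await asyncio.sleep(self.interval)
            if time.monotonic() - self.last_seen > self.idle_timeout:
                await self.ws.close()
                return
            await self.ws.send_json({"type": "ping"})
```

In a handler, this would typically run as a background task via asyncio.create_task(hb.run()), with hb.touch() called after each received message and the task cancelled when the socket disconnects.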

FAQ

Should I stream LLM tokens over WebSocket?

You can, but SSE or chunked HTTP is simpler and proxy-friendly. Use WebSocket for everything only if you must share one connection.

Is SSE the same as HTTP streaming?

SSE is a standard wire format (event: and data: fields, served as text/event-stream) layered on HTTP streaming. Many APIs instead stream newline-delimited JSON with no SSE framing: same idea, different parsing.

When is REST still correct?

Always for mutations and fetches that complete in one round trip. Streaming transports complement REST; they rarely replace it entirely.

REST to start, SSE to stream, WebSocket to converse — that is the live AI stack.