SSE vs WebSocket vs REST API for Live AI
AI products need the server to push live updates — task progress, tool calls, LLM tokens, and “need your input” prompts. You can bolt that on with REST polling, but it feels broken fast. This guide compares regular REST APIs, SSE streaming endpoints, and WebSockets with working code, then shows how to build a browser agent that asks for input and continues on the same connection.
Already streaming chat tokens? See How SSE Streaming Works in Chatbots for LLM-specific token plumbing. This post focuses on when to pick REST vs SSE vs WebSocket and how they combine for agents and long tasks.
Mental model
Three transport patterns cover almost every AI dashboard, agent UI, and ops console:
- REST — short request/response cycles. Create a run, upload a file, fetch JSON status when the client asks.
- SSE — one HTTP response stays open; the server pushes lines of events. Perfect for LLM output and read-only progress.
- WebSocket — persistent bidirectional channel. Perfect when the server must ask the user something mid-task and wait for an answer on the same session.
Regular REST API
The default pattern: client sends a request, server returns complete JSON, connection closes.
When REST is enough
- CRUD on resources (POST /documents, GET /runs/{id})
- Short LLM calls where time-to-first-token does not matter
- Webhooks and server-to-server integrations
REST for long tasks — polling anti-pattern
A background job runs for two minutes. The browser polls every second:
// Client: polling (works, but wasteful)
async function waitForTask(taskId) {
  while (true) {
    const res = await fetch(`/api/tasks/${taskId}`);
    // Stop on HTTP errors instead of polling a missing task forever
    if (!res.ok) throw new Error(`Task lookup failed: ${res.status}`);
    const task = await res.json();
    renderStatus(task.status, task.progress);
    if (task.status === "done" || task.status === "failed") return task;
    await new Promise((r) => setTimeout(r, 1000));
  }
}
# Server: regular REST handler (FastAPI)
from uuid import uuid4

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI()
TASKS: dict[str, dict] = {}

class CreateTask(BaseModel):
    prompt: str

@app.post("/api/tasks")
def create_task(body: CreateTask):
    task_id = f"task-{uuid4().hex[:8]}"  # unique per task, so runs do not overwrite each other
    TASKS[task_id] = {"status": "queued", "progress": 0, "result": None}
    # ... enqueue background worker ...
    return {"task_id": task_id}

@app.get("/api/tasks/{task_id}")
def get_task(task_id: str):
    if task_id not in TASKS:
        # Return a real 404 rather than a 200 response with an error body
        raise HTTPException(status_code=404, detail="task not found")
    return TASKS[task_id]
Problems: updates arrive up to 1 s late; load grows as N users × poll rate even when nothing has changed; and the server cannot push need_input, so the client only learns about it on its next poll.
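To put rough numbers on that load, here is a back-of-the-envelope sketch. The two-minute task and one-second interval come from the example above; the user count and the helper name `polling_cost` are illustrative assumptions, not measurements.

```python
def polling_cost(users: int, task_seconds: float, poll_interval: float) -> dict:
    """Estimate request volume for fixed-interval polling (illustrative only)."""
    polls_per_task = int(task_seconds / poll_interval)
    return {
        "requests_per_second": users / poll_interval,
        "requests_per_task": polls_per_task,
        "total_requests": users * polls_per_task,
        # An update waits half an interval on average, a full one at worst.
        "avg_added_latency_s": poll_interval / 2,
    }

# Hypothetical load: 500 concurrent users, each waiting on a 2-minute task.
cost = polling_cost(users=500, task_seconds=120, poll_interval=1.0)
print(cost)
```

With these assumed numbers the server handles 500 requests per second and 60,000 requests in total, most of which return an unchanged status.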
SSE streaming API
Server-Sent Events keep a single HTTP response open. The server writes data: ...\n\n lines as events happen. Content-Type is text/event-stream.
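The frame format is simple enough to build by hand. The helper below is a minimal sketch of one way to do it; the name `format_sse` and its parameters are assumptions for illustration, not part of any library. It also handles the optional `id:` field and payloads that contain newlines, which must be split across several `data:` lines.

```python
import json
from typing import Any, Optional

def format_sse(data: Any, event: Optional[str] = None, event_id: Optional[str] = None) -> str:
    """Serialize one event as an SSE frame terminated by a blank line."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    payload = data if isinstance(data, str) else json.dumps(data)
    # A payload containing newlines needs one data: line per text line.
    for chunk in payload.split("\n"):
        lines.append(f"data: {chunk}")
    return "\n".join(lines) + "\n\n"

frame = format_sse({"progress": 40}, event="progress", event_id="7")
print(frame, end="")
```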
When to use SSE
- Streaming LLM tokens to the browser
- Live logs and progress bars (server → client only)
- Works through most HTTP proxies and CDNs (with buffering disabled)
SSE API example (FastAPI)
import asyncio
import json

from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()

async def task_event_stream(task_id: str):
    """Yield SSE frames as the worker makes progress."""
    yield f"event: status\ndata: {json.dumps({'task_id': task_id, 'status': 'running'})}\n\n"
    for step in range(1, 6):
        await asyncio.sleep(0.4)
        payload = {"task_id": task_id, "progress": step * 20, "message": f"Step {step}/5"}
        yield f"event: progress\ndata: {json.dumps(payload)}\n\n"
    yield f"event: done\ndata: {json.dumps({'task_id': task_id, 'result': 'ok'})}\n\n"

@app.get("/api/tasks/{task_id}/stream")
async def stream_task(task_id: str):
    return StreamingResponse(
        task_event_stream(task_id),
        media_type="text/event-stream",
        headers={
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "X-Accel-Buffering": "no",
        },
    )
Browser client — EventSource (GET)
const es = new EventSource(`/api/tasks/${taskId}/stream`);

es.addEventListener("progress", (e) => {
  const data = JSON.parse(e.data);
  document.getElementById("bar").style.width = `${data.progress}%`;
  log(data.message);
});

es.addEventListener("done", (e) => {
  const data = JSON.parse(e.data);
  log("Finished:", data.result);
  es.close();
});

es.onerror = () => {
  log("Stream error: EventSource will retry");
};
Browser client — fetch + ReadableStream (POST + SSE)
Chat APIs usually POST the user message, then read an SSE body from the response — EventSource only supports GET, so use fetch:
async function streamChat(message) {
  const res = await fetch("/api/chat/stream", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });
  // Fail early on HTTP errors instead of parsing an error page as SSE
  if (!res.ok || !res.body) throw new Error(`Stream request failed: ${res.status}`);
  const reader = res.body.getReader();
  const decoder = new TextDecoder();
  let buffer = "";
  while (true) {
    const { value, done } = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, { stream: true });
    // Frames end with a blank line; keep any incomplete tail in the buffer
    const parts = buffer.split("\n\n");
    buffer = parts.pop() || "";
    for (const block of parts) {
      const line = block.split("\n").find((l) => l.startsWith("data:"));
      if (!line) continue;
      const data = JSON.parse(line.slice(5).trim());
      if (data.type === "token") appendToken(data.content);
      if (data.type === "done") return;
    }
  }
}
SSE limitation: the server cannot receive user input on the same HTTP response. You need a separate POST (or a WebSocket) to send the next message.
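If you stay on SSE, the usual workaround is to pair the stream with a second endpoint and join the two server-side. The sketch below shows one possible shape using only asyncio: `submit_input` stands in for what a POST handler would call, and `event_stream` for the generator an SSE handler would iterate. The endpoint path, function names, and the in-memory `INBOXES` registry are all assumptions for illustration.

```python
import asyncio
import json
from typing import AsyncIterator, Dict, List

# Hypothetical in-memory registry: one inbox per run. With several worker
# processes this would have to live in a shared store instead.
INBOXES: Dict[str, asyncio.Queue] = {}

def inbox_for(run_id: str) -> asyncio.Queue:
    if run_id not in INBOXES:
        INBOXES[run_id] = asyncio.Queue()
    return INBOXES[run_id]

async def submit_input(run_id: str, text: str) -> None:
    """What a handler for an assumed POST /api/runs/{run_id}/input would call."""
    await inbox_for(run_id).put(text)

async def event_stream(run_id: str) -> AsyncIterator[str]:
    """What the SSE handler would iterate: ask, wait for the POST, continue."""
    yield f"event: need_input\ndata: {json.dumps({'prompt': 'Continue?'})}\n\n"
    answer = await inbox_for(run_id).get()  # waits until submit_input delivers
    yield f"event: done\ndata: {json.dumps({'answer': answer})}\n\n"

async def demo() -> List[str]:
    stream = event_stream("run-1")
    frames = [await stream.__anext__()]      # server asks over SSE
    await submit_input("run-1", "yes")       # client answers via a separate POST
    frames.append(await stream.__anext__())  # the stream resumes
    return frames

frames = asyncio.run(demo())
print("".join(frames), end="")
```

This works, but the answer and the question travel on different connections, so the server has to correlate them by run ID. That bookkeeping is what a WebSocket removes.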
WebSocket API
WebSocket upgrades an HTTP connection to a full-duplex channel over the same TCP connection. Either side can send a message at any time (JSON text frames in these examples), which suits agents that pause and ask questions.
When to use WebSocket
- Human-in-the-loop agents (need_input → user reply → resume)
- Live collaboration, multiple event types both directions
- Single connection policy (mobile apps, strict firewalls)
WebSocket server (FastAPI)
import json

from fastapi import FastAPI, WebSocket, WebSocketDisconnect

app = FastAPI()

@app.websocket("/ws/agent/{run_id}")
async def agent_socket(ws: WebSocket, run_id: str):
    await ws.accept()
    await ws.send_json({"type": "status", "run_id": run_id, "state": "connected"})
    try:
        await ws.send_json({"type": "tool_call", "name": "search_docs", "args": {"q": "pricing"}})
        await ws.send_json({"type": "progress", "pct": 40, "message": "Searching..."})
        # Agent needs human approval
        await ws.send_json({
            "type": "need_input",
            "prompt": "Deploy to production? (yes/no)",
            "resume_token": "step-7",
        })
        while True:
            msg = await ws.receive_json()
            if msg.get("type") == "user_reply":
                answer = msg.get("text", "").strip().lower()
                if answer == "yes":
                    await ws.send_json({"type": "progress", "pct": 90, "message": "Deploying..."})
                    await ws.send_json({"type": "done", "result": "deployed"})
                else:
                    await ws.send_json({"type": "done", "result": "cancelled"})
                break
    except WebSocketDisconnect:
        pass
WebSocket client (browser)
const ws = new WebSocket(`wss://api.example.com/ws/agent/${runId}`);
const inputBox = document.getElementById("agent-input");
const logEl = document.getElementById("log");

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "progress") {
    logEl.textContent += `\n[${msg.pct}%] ${msg.message}`;
  }
  if (msg.type === "need_input") {
    inputBox.disabled = false;
    inputBox.placeholder = msg.prompt;
    inputBox.dataset.resumeToken = msg.resume_token;
  }
  if (msg.type === "done") {
    logEl.textContent += `\nDone: ${msg.result}`;
    inputBox.disabled = true;
  }
};

inputBox.addEventListener("keydown", (e) => {
  if (e.key !== "Enter" || inputBox.disabled) return;
  ws.send(JSON.stringify({
    type: "user_reply",
    text: inputBox.value,
    resume_token: inputBox.dataset.resumeToken,
  }));
  inputBox.value = "";
  inputBox.disabled = true;
});
Side-by-side comparison
| | REST | SSE | WebSocket |
|---|---|---|---|
| Direction | Request → response | Server → client | Bidirectional |
| Connection | Short-lived | One long HTTP response | Persistent WebSocket connection |
| Browser API | fetch | EventSource or fetch stream | WebSocket |
| LLM token stream | Poor (buffer full JSON) | Excellent | Good (but more code) |
| Mid-task user input | Awkward (new POST) | Awkward (separate POST) | Natural |
| Live task progress | Requires polling | Excellent | Excellent |
| Proxy / CDN | Easiest | Easy with no-buffer headers | Upgrade + sticky sessions |
| Reconnect | N/A | Built-in (EventSource) | You implement it (heartbeat + retry) |
| Complexity | Lowest | Low | Medium |
[Chart: relative fit (0–10) for common AI product needs]
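The Reconnect row leaves WebSocket retry logic to you. One common approach is exponential backoff with jitter; the sketch below computes only the delay schedule, and the base, cap, and function name are assumed values to tune for your own service.

```python
import random
from typing import Callable, Iterator

def backoff_delays(
    attempts: int,
    base: float = 0.5,
    cap: float = 30.0,
    rng: Callable[[], float] = random.random,
) -> Iterator[float]:
    """Yield reconnect delays: exponential growth, capped, with full jitter."""
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter spreads clients out so they do not reconnect in lockstep.
        yield rng() * ceiling

# Deterministic demo: with rng fixed at 1.0 each delay equals its upper bound.
ceilings = list(backoff_delays(8, rng=lambda: 1.0))
print(ceilings)
```

A client would sleep for each delay before the next connection attempt and reset the schedule once a connection succeeds.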
Decision guide
| You need… | Pick |
|---|---|
| Create resources, fetch JSON once | REST |
| Stream tokens or logs server → browser | SSE |
| Agent asks questions mid-run on same session | WebSocket |
| Chat + agent tools + approvals | WebSocket + SSE (hybrid) |
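The table can also be read as a small decision function. This is only a sketch of the rows above; the function name and the three boolean parameters are assumptions introduced for illustration.

```python
def pick_transport(server_push: bool, mid_task_input: bool, token_stream: bool) -> str:
    """Map the decision table's rows onto a single recommendation."""
    if mid_task_input and token_stream:
        return "WebSocket + SSE (hybrid)"  # chat + agent tools + approvals
    if mid_task_input:
        return "WebSocket"                 # agent asks questions mid-run
    if server_push or token_stream:
        return "SSE"                       # tokens or logs, server to browser
    return "REST"                          # create resources, fetch JSON once

print(pick_transport(server_push=True, mid_task_input=False, token_stream=True))
```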
The hybrid stack (WebSocket + SSE)
Production AI consoles often use both:
- POST /api/runs (REST): create run, return run_id
- WebSocket /ws/runs/{id}: control plane (tool_call, need_input, error)
- GET /api/runs/{id}/tokens (SSE): data plane (LLM token stream)
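// After POST /api/runs returns { run_id }; `prompt` is an assumed variable
const res = await fetch("/api/runs", {
  method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ prompt }),
});
const { run_id } = await res.json();
// Use an absolute ws(s):// URL: older browsers reject relative WebSocket URLs
const scheme = location.protocol === "https:" ? "wss" : "ws";
const ws = new WebSocket(`${scheme}://${location.host}/ws/runs/${run_id}`);
ws.onmessage = (e) => handleControl(JSON.parse(e.data));
const es = new EventSource(`/api/runs/${run_id}/tokens`);
// onmessage fires only for frames without an event: field; use addEventListener("token", ...) for named events
es.onmessage = (e) => appendToken(JSON.parse(e.data).content);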
Separating control (WebSocket) from content (SSE) keeps token parsers simple. It also avoids interleaving token chunks with control messages on a single channel, where ordering bugs are easy to introduce.
Browser agent: ask input and continue
A minimal agent loop on the server:
from fastapi import WebSocket

# llm_plan, wait_for_user_reply, and execute_tool are application helpers not shown here.
async def run_agent(ws: WebSocket, run_id: str):
    await ws.send_json({"type": "status", "state": "planning"})
    plan = await llm_plan(run_id)
    for step in plan.steps:
        await ws.send_json({"type": "step", "name": step.name})
        if step.requires_approval:
            # Pause the run until the user answers on the same socket
            await ws.send_json({
                "type": "need_input",
                "prompt": step.approval_prompt,
            })
            reply = await wait_for_user_reply(ws)
            if reply != "yes":
                await ws.send_json({"type": "done", "state": "cancelled"})
                return
        result = await execute_tool(step)
        await ws.send_json({"type": "tool_result", "summary": result[:200]})
    await ws.send_json({"type": "done", "state": "completed"})
The browser enables the input box only when need_input arrives — not before. That is the UX difference between a chatbot (one message in, stream out) and an agent (multi-turn control on one session).
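The loop above calls a `wait_for_user_reply` helper that the post does not define. Here is one way it might look, as a sketch: it assumes `ws` exposes an async `receive_json()` (as FastAPI's WebSocket does), and the timeout and the `FakeSocket` test double are additions for illustration only.

```python
import asyncio
from typing import Optional

async def wait_for_user_reply(ws, timeout: float = 300.0) -> Optional[str]:
    """Wait for a user_reply message; return its normalized text, or None on timeout."""
    async def next_reply() -> str:
        while True:
            msg = await ws.receive_json()
            # Skip anything that is not an answer (pings, stray messages).
            if isinstance(msg, dict) and msg.get("type") == "user_reply":
                return str(msg.get("text", "")).strip().lower()

    try:
        return await asyncio.wait_for(next_reply(), timeout=timeout)
    except asyncio.TimeoutError:
        return None

class FakeSocket:
    """Test double standing in for a WebSocket; replays scripted messages."""
    def __init__(self, messages):
        self._messages = list(messages)

    async def receive_json(self):
        if not self._messages:
            await asyncio.sleep(3600)  # simulate a client that never answers
        return self._messages.pop(0)

scripted = [{"type": "ping"}, {"type": "user_reply", "text": " Yes "}]
reply = asyncio.run(wait_for_user_reply(FakeSocket(scripted)))
silent = asyncio.run(wait_for_user_reply(FakeSocket([]), timeout=0.05))
print(reply, silent)
```

Returning None on timeout means the `reply != "yes"` check in the agent loop treats an unanswered prompt as a cancellation, which is the safer default for approvals.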
Wire formats
SSE frame
event: progress
data: {"pct": 60, "message": "Indexing chunk 120/200"}

event: token
data: {"content": "The"}

event: done
data: {"finish_reason": "stop"}
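For non-browser clients, these frames are straightforward to decode. The parser below is a minimal sketch that assumes the whole body is already in memory and that every payload is JSON; a streaming client would buffer and split incrementally instead. The name `parse_sse` is an assumption.

```python
import json
from typing import Iterator, Tuple

def parse_sse(text: str) -> Iterator[Tuple[str, dict]]:
    """Yield (event_name, payload) pairs from a complete SSE text body."""
    for block in text.replace("\r\n", "\n").split("\n\n"):
        event, data_lines = "message", []  # "message" is the default event type
        for line in block.split("\n"):
            if line.startswith(":"):
                continue  # comment line, often used as a keep-alive
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]  # one leading space after the colon is not data
            if field == "event":
                event = value
            elif field == "data":
                data_lines.append(value)
        if data_lines:
            yield event, json.loads("\n".join(data_lines))

body = (
    'event: progress\ndata: {"pct": 60, "message": "Indexing chunk 120/200"}\n\n'
    'event: token\ndata: {"content": "The"}\n\n'
    'event: done\ndata: {"finish_reason": "stop"}\n\n'
)
events = list(parse_sse(body))
print(events)
```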
WebSocket JSON message
{"type": "need_input", "prompt": "Confirm delete?", "resume_token": "abc"}
{"type": "user_reply", "text": "yes", "resume_token": "abc"}
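The resume_token pairs a reply with the prompt it answers. A server might check it before resuming, roughly as sketched below; the function name `accept_reply` and the policy of rejecting stale or malformed replies are assumptions, not something the examples in this post enforce.

```python
import json
from typing import Optional

def accept_reply(raw: str, pending_token: Optional[str]) -> Optional[str]:
    """Return the reply text if it answers the pending prompt, else None."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(msg, dict) or msg.get("type") != "user_reply":
        return None
    # Reject replies to a prompt that is no longer pending (stale or duplicate).
    if pending_token is None or msg.get("resume_token") != pending_token:
        return None
    text = msg.get("text")
    return text if isinstance(text, str) else None

raw = '{"type": "user_reply", "text": "yes", "resume_token": "abc"}'
print(accept_reply(raw, "abc"), accept_reply(raw, "xyz"))
```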
REST JSON (single shot)
{"task_id": "t-1", "status": "done", "progress": 100, "result": {"files": 42}}
Production checklist
- SSE: Cache-Control: no-cache, X-Accel-Buffering: no, flush after each event
- WebSocket: ping/pong or application heartbeat every 30s; auth on connect; close on idle timeout
- REST: return 202 Accepted + Location for async jobs; use idempotent run_id
- All: propagate AbortSignal so cancel stops LLM billing
- Agents: persist run state server-side so refresh can resume via GET /runs/{id}
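AbortSignal is the browser half of cancellation. On an asyncio server, the matching step is to cancel the task that relays tokens so the upstream generator stops too. The sketch below demonstrates the idea with a fake token stream; `fake_llm_tokens` and `relay` are hypothetical stand-ins, and a real handler would call `task.cancel()` when it detects the client disconnect.

```python
import asyncio
from typing import AsyncIterator, List, Tuple

async def fake_llm_tokens(produced: List[str]) -> AsyncIterator[str]:
    """Stand-in for a billed upstream LLM stream (hypothetical)."""
    for i in range(1000):
        await asyncio.sleep(0.01)
        produced.append(f"tok{i}")
        yield f"tok{i}"

async def relay(produced: List[str], sink: List[str]) -> None:
    """Forward tokens to the client until this task is cancelled."""
    async for token in fake_llm_tokens(produced):
        sink.append(token)

async def demo() -> Tuple[int, int]:
    produced: List[str] = []
    sink: List[str] = []
    task = asyncio.create_task(relay(produced, sink))
    await asyncio.sleep(0.05)  # the client watches a few tokens...
    task.cancel()              # ...then aborts (tab closed, Stop clicked)
    try:
        await task
    except asyncio.CancelledError:
        pass
    count_at_cancel = len(produced)
    await asyncio.sleep(0.05)  # a leaked producer would keep generating here
    return count_at_cancel, len(produced)

at_cancel, later = asyncio.run(demo())
print(at_cancel, later)
```

The two counts match because cancelling the relay task raises inside the generator's pending sleep, which ends the upstream iteration instead of leaving it running in the background.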
FAQ
Should I stream LLM tokens over WebSocket?
You can, but SSE or chunked HTTP is simpler and proxy-friendly. Use WebSocket for everything only if you must share one connection.
Is SSE the same as HTTP streaming?
SSE is a standard format (data: lines) on top of HTTP streaming. Many APIs stream newline-delimited JSON without SSE headers — same idea, different parsing.
When is REST still correct?
Always for mutations and fetches that complete in one round trip. Streaming transports complement REST; they rarely replace it entirely.
REST to start, SSE to stream, WebSocket to converse — that is the live AI stack.