Autonomous Agent Orchestration Platform for Multi-Brand Operations

By Amar Kumar

This brief proposes an autonomous agent orchestration platform for a US-based operator running multiple brands: AI video production, real-estate technology, and regulated medical-compliance clinics. Instead of one-off gigs and copy-paste prompts, the architecture would treat the agent harness (routing, skills, MCP, hooks, observability) as the product, with Claude Code, Kimi, GLM, Minimax, and Qwen routed like sections of an orchestra rather than run as competing chatbots.

Proposed outcome: One standing platform where overnight planner / generator / evaluator workflows, Supabase data pipelines, custom MCP servers, and operator dashboards share a model router, security layer, and observability stack — so weekly briefs ship without rebuilding the harness each time.

Scenario

This brief describes a proposed solution — not a delivered engagement. It maps a recurring pattern: a multi-brand operator that needs a long-term AI engineering function, not project-by-project contractors.

Problem

Treating AI as “better autocomplete” does not scale across three regulated and media-heavy brands. Common failure modes:

Requirements

Functional

Non-functional

Architecture

Four layers: a control plane (router, scheduler, observability), a harness (Claude Code + MCP + skills/hooks), brand workflows (LangGraph graphs per domain), and data + media (Supabase, object storage, Runpod workers).

flowchart TB
  classDef control fill:#ede9fe,stroke:#7c3aed,color:#5b21b6
  classDef harness fill:#dbeafe,stroke:#2563eb,color:#1e3a8a
  classDef workflow fill:#f1f5f9,stroke:#64748b,color:#334155
  classDef data fill:#ccfbf1,stroke:#0d9488,color:#115e59
  classDef ext fill:#f8fafc,stroke:#475569,color:#334155

  RT["Model Router\norchestrator · coder · fallback"]:::control
  SCH["Scheduler\nTemporal / cron"]:::control
  OBS["Observability\nLangfuse traces"]:::control
  RT --> SCH
  SCH --> OBS

  CC["Claude Code harness\nskills · hooks · subagents"]:::harness
  MCP["MCP server fleet\nMLS · CRM · media · WP"]:::harness
  CC --> MCP
  RT --> CC

  RE["Real-estate graph\nplanner · ETL · evaluator"]:::workflow
  VI["Video graph\ngen · QA · render"]:::workflow
  CL["Clinic graph\ncompliance · docs"]:::workflow
  CC --> RE
  CC --> VI
  CC --> CL

  SB[("Supabase\nproperties · jobs · audit")]:::data
  S3["Object storage\nmedia · exports"]:::data
  RP["Runpod / VPS\nFFmpeg · GPU"]:::data
  RE --> SB
  VI --> S3
  VI --> RP
  CL --> SB

  MLS["MLS / HUD / public APIs"]:::ext
  FAL["fal.ai · Whisper"]:::ext
  WP["WordPress / Next.js"]:::ext
  RE --> MLS
  VI --> FAL
  CL --> WP

Platform architecture — shared control plane and harness, brand-specific LangGraph workflows, Supabase + media workers

sequenceDiagram
  autonumber
  participant SCH as Scheduler
  participant RT as Model Router
  participant PL as Planner Agent
  participant GEN as Generator Agent
  participant EV as Evaluator Agent
  participant MCP as MCP Tools
  participant SB as Supabase
  participant OBS as Langfuse
  SCH->>RT: enqueue overnight job
  RT->>PL: route orchestrator model
  PL->>SB: read backlog + context
  PL->>OBS: trace plan_id
  PL->>GEN: task graph + constraints
  RT->>GEN: route primary coder
  GEN->>MCP: tool calls (ETL, render, publish)
  MCP->>SB: upsert rows / artifacts
  GEN->>EV: candidate output
  RT->>EV: route evaluator model
  alt pass
    EV->>SB: mark complete + notify
  else fail
    EV->>PL: requeue with critique
  end
  EV->>OBS: score + cost rollup

Overnight sequence — router picks models per role; evaluator gates promotion to production
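
The overnight sequence above could be sketched as a plain retry loop. This is a minimal sketch, not the proposed LangGraph implementation: `run_overnight_job`, `Verdict`, `JobResult`, and the `max_attempts` limit are hypothetical names introduced for illustration, and the planner, generator, and evaluator are passed in as ordinary callables standing in for routed model calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Verdict:
    passed: bool
    critique: str = ""


@dataclass
class JobResult:
    status: str                      # "complete" or "dead_letter"
    attempts: int
    output: Optional[str] = None
    critiques: list[str] = field(default_factory=list)


def run_overnight_job(
    backlog_item: str,
    plan: Callable[[str, list[str]], str],
    generate: Callable[[str], str],
    evaluate: Callable[[str], Verdict],
    max_attempts: int = 3,           # assumed retry budget, not from the brief
) -> JobResult:
    """Plan, generate, evaluate; requeue with critique until pass or retries run out."""
    critiques: list[str] = []
    for attempt in range(1, max_attempts + 1):
        task = plan(backlog_item, critiques)   # planner sees earlier critiques
        candidate = generate(task)
        verdict = evaluate(candidate)
        if verdict.passed:
            return JobResult("complete", attempt, candidate, critiques)
        critiques.append(verdict.critique)     # "requeue with critique"
    # Retries exhausted: hand the job to the dead-letter queue for a human.
    return JobResult("dead_letter", max_attempts, None, critiques)


# Stub agents for a dry run; real ones would call the routed models.
def demo_plan(item: str, critiques: list[str]) -> str:
    if critiques:
        return f"{item} | fix: {'; '.join(critiques)}"
    return item


def demo_generate(task: str) -> str:
    return task.upper()


def demo_evaluate(candidate: str) -> Verdict:
    ok = "FIX:" in candidate
    return Verdict(ok, "" if ok else "missing summary")


result = run_overnight_job("normalize listings", demo_plan, demo_generate, demo_evaluate)
```

In a checkpointed graph the loop state would be persisted between steps so an interrupted run can resume; in this sketch it lives in memory only.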

Component map by platform layer (major services per tier)

End-to-end flow

From ownership brief to production — shared harness, brand workflow, human approval where required

Illustrative model routing mix by agent role (% of routed calls in a typical week)

Indicative standing-engineering capacity split across brand domains (% of weekly hours)

Recommendation: Claude Code (or Cursor-equivalent harness) as the daily driver; LangGraph for durable overnight graphs; Supabase for operational data and realtime operator UI; a YAML-driven model router with cost caps; Temporal or Cloudflare Workers cron for schedules; Langfuse for traces.

| Layer | Technology | Why |
| --- | --- | --- |
| Daily harness | Claude Code + skills/hooks/MCP | Subagents, repo-aware edits, repeatable slash commands; the “framework” layer |
| Model router | Custom router service + YAML rules | Explicit orchestrator / coder / fallback; routes Claude, Kimi K2, GLM, Minimax, Qwen by task type and budget |
| Overnight orchestration | LangGraph + Python 3.11 | Checkpointed planner/generator/evaluator graphs with human-in-the-loop nodes |
| Scheduling | Temporal Cloud or Supabase pg_cron | Reliable overnight runs, retries, visibility into stuck workflows |
| Data plane | Supabase (Postgres + Edge Functions) | MLS/HUD normalized schema, webhooks, Row Level Security per brand |
| MCP fleet | TypeScript + Python MCP servers | Uniform tool surface for harness and headless agents; versioned in monorepo |
| Media workers | Runpod / VPS + FFmpeg + fal.ai | GPU bursts for video; CPU workers for transcode and Whisper batch |
| Web | Next.js on Vercel + WordPress REST | Fast marketing surfaces; existing WP estates stay integrated via MCP |
| Observability | Langfuse (self-hosted or cloud) | Trace spans per agent, prompt/version tags, cost by brand |
| Secrets | Cloudflare Workers secrets / Doppler | Central rotation; no keys in agent prompts or repos |

Why not a single model everywhere? Orchestration benefits from a strong reasoning model; bulk codegen and ETL transforms can run on open models at lower cost; evaluators may use a different model to reduce self-confirmation bias. The router encodes these rules explicitly instead of “pick what feels best.”

Why not n8n-only? Multi-step agent QA, MCP tool auth, and checkpointed overnight graphs outgrow visual chains. Use n8n only for lightweight webhook fan-out (Slack, email digests).

Agent & component design

Model router — routing rules (example)

| Role | Default model | Fallback | Rule |
| --- | --- | --- | --- |
| Orchestrator brain | Claude Sonnet / Opus class | GPT-4.1 | Planning, decomposition, tool selection; always highest reasoning tier under daily cost cap |
| Primary coder | Claude Code default | Kimi K2 or Qwen Coder | Repo edits and MCP tool loops; switch to open model when task tag is bulk_etl or token estimate > 80k |
| Evaluator | Different family than generator | GLM or Minimax | Structured rubric JSON; reject if generator and evaluator share same model ID |
| Embeddings / classify | Small open model | Hosted embed API | Router pre-step; never burn frontier tokens on routing labels |
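
The routing rules in the table could be encoded roughly as follows. This is a sketch under stated assumptions: the model IDs, the `ROUTES` mapping, the `pick_model` function, and the 50 USD daily cap are placeholders rather than real identifiers or agreed budgets. Only the `bulk_etl` tag, the 80k token threshold, and the rule that generator and evaluator must not share a model ID come from the table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Route:
    default: str
    fallback: str


# Placeholder model IDs for illustration only.
ROUTES = {
    "orchestrator": Route("claude-orchestrator", "gpt-4.1"),
    "coder": Route("claude-coder", "kimi-k2"),
    "evaluator": Route("glm-evaluator", "minimax-evaluator"),
}

BULK_TOKEN_THRESHOLD = 80_000  # "token estimate > 80k" from the table


def pick_model(
    role: str,
    *,
    task_tag: str = "",
    token_estimate: int = 0,
    spent_usd: float = 0.0,
    daily_cap_usd: float = 50.0,          # assumed cap
    generator_model: Optional[str] = None,
) -> str:
    """Return the model ID for a role, applying cost-cap and task-type rules."""
    route = ROUTES[role]

    # Rule 1: once the daily cap is hit, every role drops to its fallback.
    use_fallback = spent_usd >= daily_cap_usd

    # Rule 2: bulk ETL or very large contexts go to the open-model fallback.
    if role == "coder" and (
        task_tag == "bulk_etl" or token_estimate > BULK_TOKEN_THRESHOLD
    ):
        use_fallback = True

    chosen = route.fallback if use_fallback else route.default

    # Rule 3: the evaluator must never share a model ID with the generator.
    if role == "evaluator" and chosen == generator_model:
        other = route.default if use_fallback else route.fallback
        if other == generator_model:
            raise ValueError("generator and evaluator would share a model ID")
        chosen = other

    return chosen


coder_for_etl = pick_model("coder", task_tag="bulk_etl")
evaluator_for_glm = pick_model("evaluator", generator_model="glm-evaluator")
```

Counting how often `use_fallback` is true per week would give the router fallback frequency signal without extra instrumentation.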

1 — Planner agent

2 — Generator agent (domain variants)

3 — Evaluator agent

4 — MCP server fleet (shared)

5 — Security envelope

Suggested phase timeline (weeks) for platform foundation through first overnight production graph

Implementation plan

Phase 1 — Harness & router foundation (week 1–2)

Monorepo layout: router/, mcp/, graphs/, skills/. Claude Code skills for deploy, test, and trace replay. YAML router with three roles and cost caps. Langfuse project per brand. Dev Supabase with RLS skeleton.

Risk: Model API variance — stub adapters early. Rollback: manual Claude Code sessions without overnight scheduler until router stable.

Phase 2 — MCP server fleet v1 (week 3–4)

Ship supabase-ops and wordpress-mcp; stub mls-bridge with sample feed. Document tool contracts in OpenAPI-style markdown. Integration tests that run headless against local Supabase.

Risk: MLS vendor access delays — use public HUD/sample RESO sandbox. Rollback: generators write to staging schema only.

Phase 3 — Real-estate ETL graph (week 5–6)

LangGraph overnight job: ingest → normalize → dedupe → webhook emit. Idempotent upserts on natural keys (APN, listing ID). Operator dashboard: last run, row counts, error samples.
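
Idempotent upserts on natural keys mean that rerunning the same feed leaves the table unchanged. A minimal sketch follows, using the standard-library `sqlite3` module as a local stand-in for Supabase Postgres (both accept `INSERT ... ON CONFLICT ... DO UPDATE`). The `properties` columns, the `upsert_batch` helper, and the choice of a composite key on APN plus listing ID are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE properties (
        apn        TEXT NOT NULL,
        listing_id TEXT NOT NULL,
        price      INTEGER,
        PRIMARY KEY (apn, listing_id)   -- assumed natural key
    )
    """
)

UPSERT_SQL = """
INSERT INTO properties (apn, listing_id, price)
VALUES (:apn, :listing_id, :price)
ON CONFLICT (apn, listing_id) DO UPDATE SET price = excluded.price
"""


def upsert_batch(db: sqlite3.Connection, rows: list[dict]) -> int:
    """Upsert rows in one transaction and return the resulting row count."""
    with db:  # commits on success, rolls back on error
        db.executemany(UPSERT_SQL, rows)
    return db.execute("SELECT COUNT(*) FROM properties").fetchone()[0]


batch = [
    {"apn": "123-456-789", "listing_id": "L-1", "price": 500_000},
    {"apn": "987-654-321", "listing_id": "L-2", "price": 750_000},
]

first_count = upsert_batch(conn, batch)
second_count = upsert_batch(conn, batch)   # replaying the feed adds nothing

# A changed value for an existing key updates in place.
batch[0]["price"] = 510_000
third_count = upsert_batch(conn, batch)
updated_price = conn.execute(
    "SELECT price FROM properties WHERE listing_id = 'L-1'"
).fetchone()[0]
```

Because a replay is harmless, the scheduler can retry a failed ingest step without a separate deduplication pass for rows that share a key.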

Risk: Silent hang on subprocess stdin — spawn MCP and ETL children with piped stdio and watchdog timeouts. Rollback: disable webhooks; keep tables updating.
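
The watchdog mitigation could look like the sketch below, which uses `subprocess.run` from the Python standard library. Passing `input=` pipes stdin and closes it after writing, so a child waiting on stdin sees end-of-file, and `timeout=` kills a child that still hangs. `run_with_watchdog` and its return shape are hypothetical; a long-lived MCP server would need a streaming variant rather than this one-shot call.

```python
import subprocess
import sys


def run_with_watchdog(cmd: list[str], payload: str = "", timeout_s: float = 30.0) -> dict:
    """Run a child with piped stdio; report a timeout instead of hanging forever."""
    try:
        done = subprocess.run(
            cmd,
            input=payload,          # stdin is a pipe that is closed after writing
            capture_output=True,    # stdout and stderr are pipes too
            text=True,
            timeout=timeout_s,      # on expiry the child is killed and reaped
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "returncode": None, "stdout": ""}
    status = "ok" if done.returncode == 0 else "failed"
    return {"status": status, "returncode": done.returncode, "stdout": done.stdout}


# A well-behaved child that reads stdin to end-of-file and answers.
echo = run_with_watchdog(
    [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"],
    payload="ping",
)

# A child that never finishes; the watchdog cuts it off.
hung = run_with_watchdog(
    [sys.executable, "-c", "import time; time.sleep(10)"],
    timeout_s=0.5,
)
```

A `timeout` status is a natural trigger for the retry path, and eventually for the dead-letter queue.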

Phase 4 — Video & media pipeline (week 7–8)

Whisper batch transcription, fal.ai render queue MCP, FFmpeg concat worker on Runpod. Evaluator checks duration, resolution, and brand template compliance. Object storage URLs written to Supabase.
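
The deterministic part of the media evaluator (duration, resolution, template) could be a plain rubric function that runs before any model-based review. The sketch below is illustrative: `RenderMeta`, `BrandSpec`, `evaluate_render`, and the example limits are hypothetical, and in practice the metadata would come from probing the rendered file.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderMeta:
    duration_s: float
    width: int
    height: int
    template_id: str


@dataclass(frozen=True)
class BrandSpec:
    min_duration_s: float
    max_duration_s: float
    width: int
    height: int
    allowed_templates: frozenset


def evaluate_render(meta: RenderMeta, spec: BrandSpec) -> dict:
    """Return a JSON-serializable rubric result listing every failed check."""
    failures = []
    if not (spec.min_duration_s <= meta.duration_s <= spec.max_duration_s):
        failures.append("duration")
    if (meta.width, meta.height) != (spec.width, spec.height):
        failures.append("resolution")
    if meta.template_id not in spec.allowed_templates:
        failures.append("template")
    return {"passed": not failures, "failures": failures}


# Example limits are assumptions: a vertical 1080x1920 clip of 15 to 60 seconds.
spec = BrandSpec(15.0, 60.0, 1080, 1920, frozenset({"brand-a-intro"}))

good = evaluate_render(RenderMeta(30.0, 1080, 1920, "brand-a-intro"), spec)
bad = evaluate_render(RenderMeta(75.0, 1920, 1080, "generic"), spec)
bad_json = json.dumps(bad)  # rubric output in the structured-JSON form
```

Collecting every failure rather than stopping at the first gives the planner a complete critique when the job is requeued.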

Phase 5 — Operator console & clinic workflows (week 9–10)

Next.js internal app: job inbox, approve/reject, trace deep-link to Langfuse. Clinic graph with human-in-the-loop on any external-facing output. Audit table: immutable, append-only.

Risk: Regulated content — default deny publish without operator click. Rollback: draft-only mode across all publish MCP tools.
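
A default-deny publish gate could be as small as the sketch below. All names here (`Draft`, `PublishDenied`, `publish`, the `draft_only_mode` flag, and the in-memory `audit` list standing in for the audit table) are hypothetical; the point is that the publish path refuses unless an operator approval is recorded, and that every decision is appended to the audit trail.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Draft:
    draft_id: str
    body: str
    approved_by: Optional[str] = None   # set only by an operator click


class PublishDenied(Exception):
    """Raised when a publish attempt is blocked by policy."""


def publish(
    draft: Draft,
    send: Callable[[Draft], None],
    audit: list,
    draft_only_mode: bool = False,
) -> str:
    """Publish only with recorded approval; log every decision, allow or deny."""
    if draft_only_mode:
        reason = "draft-only mode is active"        # rollback switch
    elif not draft.approved_by:
        reason = "no operator approval recorded"    # default deny
    else:
        reason = ""
    audit.append({"draft_id": draft.draft_id, "allowed": not reason, "reason": reason})
    if reason:
        raise PublishDenied(reason)
    send(draft)
    return f"published:{draft.draft_id}"


audit_log: list = []
outbox: list = []

approved = Draft("d-1", "Clinic hours update", approved_by="operator@example.com")
receipt = publish(approved, outbox.append, audit_log)
```

Putting the check inside the publish function, rather than in the agent prompt, means a misbehaving generator cannot talk its way past it.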

Phase 6 — Hardening & runbooks (week 11–12)

On-call runbook: stuck planner, runaway token spend, MCP OOM, ETL duplicate keys. Load test overnight queue. Playbook for adding a new brand tenant (RLS policy + Langfuse project + router budget line).

Reporting & ops

| Signal | Source | Cadence |
| --- | --- | --- |
| Agent traces, latency, token cost | Langfuse dashboards | Real-time; daily Slack rollup |
| Overnight job pass/fail rate | Supabase job_runs | Per run; weekly trend |
| ETL freshness (MLS/HUD) | Supabase ingest_watermarks | Alert if > SLA hours stale |
| Media queue depth | Runpod + media_jobs | Alert on queue > N or failure rate spike |
| Router fallback frequency | Router logs | Weekly; indicates primary model outages or cost cap hits |
| Evaluator rejection reasons | Langfuse scores + critiques JSON | Weekly engineering retro input |
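
The ETL freshness alert reduces to comparing each feed's last watermark against an SLA window. A minimal sketch, assuming the watermarks have already been read from `ingest_watermarks` into a dict of timezone-aware timestamps; `stale_feeds`, the feed names, and the 24-hour SLA are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def stale_feeds(
    watermarks: dict[str, datetime],
    sla_hours: float,
    now: Optional[datetime] = None,
) -> list[str]:
    """Return the feeds whose last successful ingest is older than the SLA."""
    now = now or datetime.now(timezone.utc)
    limit = timedelta(hours=sla_hours)
    return sorted(feed for feed, last in watermarks.items() if now - last > limit)


# Fixed clock so the example is reproducible.
check_time = datetime(2025, 1, 2, 6, 0, tzinfo=timezone.utc)
watermarks = {
    "mls": check_time - timedelta(hours=2),
    "hud": check_time - timedelta(hours=30),
}

to_alert = stale_feeds(watermarks, sla_hours=24, now=check_time)
```

Passing `now` explicitly keeps the check testable and avoids mixing naive and timezone-aware timestamps.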

Morning digest to ownership: completed overnight jobs, items awaiting approval, cost vs budget, and any dead-letter entries with one-click trace links. On-call rotation would use PagerDuty or Slack escalation only for SLA breaches (ETL stale, zero successful overnight runs, runaway spend).

Proposed deliverables

Following the phased plan, a build would ship these artifacts:

Effort estimate

Indicative effort for platform foundation through first production overnight graphs across two brand workflows (assumes MLS sandbox or sample feeds available, Supabase/Vercel/Runpod accounts provisioned):

| Scope | Hours (range) |
| --- | --- |
| Platform foundation (phases 1–6) | 280–360 hrs |
| Standing weekly engineering (post-foundation) | 30–40 hrs/week ongoing |
| Platform maintenance (router tuning, MCP upgrades) | 12–20 hrs/month |

The ongoing weekly hours reflect the operating model: recurring briefs across brands, not a one-off handoff. Initial platform build is a one-time investment; subsequent briefs reuse the harness.

Glossary

| Term | Meaning |
| --- | --- |
| Agent harness | Skills, hooks, MCP, subagents, and router config around the LLM; the durable product layer |
| Claude Code | Anthropic’s agentic coding environment with repo context and tool use |
| MCP | Model Context Protocol, a standard for exposing tools and data sources to agents |
| LangGraph | Library for checkpointed multi-step agent workflows with branches and retries |
| Model router | Service that picks orchestrator, coder, and evaluator models from explicit rules |
| Lethal trifecta | Risk pattern: untrusted input + privileged tools + external communications without guards |
| RESO / RETS | Real-estate data standards and legacy MLS transport protocols |
| Dead-letter queue | Storage for jobs that exhausted retries; requires human inspection |
| Langfuse | Open-source LLM observability: traces, scores, prompt versioning, cost attribution |