Most agent frameworks give you building blocks. Shannon gives you a production system.

Why Shannon Exists
Building AI agents is easy. Running them in production is hard.
After prototyping with LangGraph, CrewAI, or similar frameworks, teams hit the same walls:
- Reliability: How do you reproduce bugs when LLM calls are non-deterministic?
- Cost control: How do you prevent runaway token usage without killing performance?
- Task complexity: How do you orchestrate 10+ agents with dependencies without manual DAG wiring?
- Observability: Where did the tokens go? Which agent failed? What was the execution path?
- Enterprise integration: How do you integrate proprietary APIs without forking the framework?
Shannon was built to solve these production problems from day one.
Architecture: Temporal + Rust + Go + Python
Shannon's hybrid architecture gives you the best of three worlds:
Temporal Workflows (Go)
The orchestration layer runs on Temporal, giving you:
- Deterministic replay: Export any workflow execution and replay it locally to reproduce bugs
- Built-in retries: Automatic retry logic with exponential backoff
- Workflow versioning: Deploy new workflow versions without breaking running tasks
- Durable execution: Tasks survive service restarts, network failures, and crashes
Unlike LangGraph's in-memory state graphs or CrewAI's sequential execution, Temporal workflows are event-sourced and replay-safe. Every decision point is recorded. Every execution is reproducible.
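Shannon's workflows are written in Go, but the retry and replay mechanics are easiest to see in a few lines of Temporal code. A minimal sketch using Temporal's Python SDK, purely for illustration (the workflow and activity names are hypothetical):

from datetime import timedelta
from temporalio import workflow
from temporalio.common import RetryPolicy

@workflow.defn
class SimpleTaskWorkflowSketch:
    @workflow.run
    async def run(self, query: str) -> str:
        # Every activity attempt, result, and timer is written to the event
        # history, which is what makes export-and-replay deterministic.
        return await workflow.execute_activity(
            "execute_agent",  # hypothetical activity name
            query,
            start_to_close_timeout=timedelta(minutes=5),
            retry_policy=RetryPolicy(
                initial_interval=timedelta(seconds=1),
                backoff_coefficient=2.0,
                maximum_attempts=5,
            ),
        )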
Rust Agent Core
The enforcement layer handles:
- WASI sandbox: Run untrusted Python code with no network access and a read-only filesystem
- gRPC gateway: High-performance agent execution with sub-millisecond overhead
- Policy enforcement: OPA-based governance and approval workflows
Native-code performance where it matters most—security and networking.
Python LLM Service
The integration layer provides:
- Multi-provider support: OpenAI, Anthropic, X.AI, Google Gemini, custom providers
- MCP tools: Add tools via YAML config, no code changes required
- Flexible scripting: Rapid prototyping for new agent behaviors
Keep LLM integration simple and extensible without sacrificing system reliability.
Intelligent Workflow Orchestration
Shannon doesn't make you choose workflows manually. The orchestrator analyzes your task and routes it to the optimal execution pattern.
Automatic Workflow Selection
// Orchestrator routing logic (simplified)
switch {
case complexity < 0.3 && simpleByShape:
    return SimpleTaskWorkflow   // Single agent, fast path
case len(subtasks) > 5 || hasDependencies:
    return SupervisorWorkflow   // Coordinate multiple agents
case cognitiveStrategy == "react":
    return ReactWorkflow        // Reasoning loop with tools
case cognitiveStrategy == "research":
    return ResearchWorkflow     // Multi-step research pipeline
default:
    return DAGWorkflow          // Standard parallel execution
}
8+ Built-In Workflow Patterns
Core workflows:
- SimpleTaskWorkflow: Single-agent execution (complexity < 0.3)
- SupervisorWorkflow: Coordinates 5+ subtasks with dependencies
- StreamingWorkflow: Real-time token streaming (single/multi-agent)
- TemplateWorkflow: Pre-defined workflows for repeatable tasks
Strategy workflows:
- DAGWorkflow: Fan-out/fan-in parallel execution
- ReactWorkflow: Iterative reasoning + tool use (ReAct pattern)
- ResearchWorkflow: Multi-step research with parallel source gathering, citation filtering, gap detection
- ExploratoryWorkflow: Tree-of-Thoughts for complex decision-making
- ScientificWorkflow: Hypothesis testing, debate, multi-perspective validation
No manual workflow wiring. No DAG construction. Just submit a task and Shannon picks the right pattern.
See multi-agent-workflow-architecture.md for full routing logic.
Deep Research Agent: Production-Ready Information Gathering
Shannon's ResearchWorkflow is built for multi-step research tasks with quality controls:
Research Pipeline
- Query understanding: Analyze research goal and decompose into search queries
- Parallel sourcing: Execute multiple searches concurrently (web search, vector DB, APIs)
- Citation filtering: Apply credibility rules to filter low-quality sources
- Gap detection: Identify missing information and trigger follow-up searches
- Synthesis: Combine findings with proper attribution and citations
- Cost tracking: Record tokens and cost per research phase
Citation Credibility Filtering
Configure citation filters in config/citation_credibility.yaml:
citation_filter:
  enabled: true
  credible_domains:
    - openai.com
    - anthropic.com
    - arxiv.org
    - github.com
  suspicious_patterns:
    - "click here"
    - "buy now"
    - "limited time"
  min_confidence: 0.7
Only citations meeting credibility thresholds make it to the final report. Full pipeline documented in research-workflow.md.
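As a rough sketch of what that threshold check amounts to (hypothetical helper, not Shannon's actual implementation; the real pipeline is in research-workflow.md):

from urllib.parse import urlparse

def passes_credibility_filter(citation: dict, cfg: dict) -> bool:
    # Hypothetical check mirroring the YAML above.
    domain = urlparse(citation["url"]).netloc.removeprefix("www.")
    if cfg["credible_domains"] and domain not in cfg["credible_domains"]:
        return False
    snippet = citation.get("snippet", "").lower()
    if any(pattern in snippet for pattern in cfg["suspicious_patterns"]):
        return False
    return citation.get("confidence", 0.0) >= cfg["min_confidence"]

cfg = {
    "credible_domains": ["openai.com", "anthropic.com", "arxiv.org", "github.com"],
    "suspicious_patterns": ["click here", "buy now", "limited time"],
    "min_confidence": 0.7,
}
print(passes_credibility_filter(
    {"url": "https://arxiv.org/abs/2501.01234", "snippet": "Surface code decoders...", "confidence": 0.9},
    cfg,
))  # True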
Example: Research Task
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Survey the latest breakthroughs in quantum error correction, find 3 authoritative sources",
    "cognitive_strategy": "research",
    "max_citations": 5
  }'
Shannon will:
- Generate search queries
- Gather sources in parallel
- Filter by credibility
- Detect gaps (e.g., "need more recent papers from 2025")
- Synthesize findings with citations
- Return structured results with token costs
Task Decomposition & Agent Orchestration
Shannon's SupervisorWorkflow handles complex tasks with multiple dependencies:
Automatic Subtask Decomposition
Submit a complex task:
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Analyze our Q4 sales data, create a forecast model, and generate an executive summary with visualizations"
  }'
Shannon will:
- Analyze task complexity (triggers SupervisorWorkflow for 5+ subtasks)
- Decompose into subtasks:
  - Load and clean Q4 sales data
  - Perform statistical analysis
  - Train forecast model
  - Generate visualizations
  - Write executive summary
- Build dependency graph: Ensure data loading completes before analysis
- Execute in parallel: Run independent subtasks concurrently (see the sketch below)
- Coordinate results: Aggregate outputs and synthesize final report
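For the example above, the dependency graph and parallel execution boil down to something like this (field names and the scheduler are illustrative, not Shannon's internal schema):

import asyncio

# Hypothetical decomposition of the Q4 sales task; names are illustrative.
subtasks = {
    "load_data": [],
    "analysis":  ["load_data"],
    "forecast":  ["analysis"],
    "charts":    ["analysis"],
    "summary":   ["forecast", "charts"],
}

async def run_subtask(name: str) -> None:
    print(f"running {name}")
    await asyncio.sleep(0.1)  # stand-in for an agent execution

async def execute(graph: dict[str, list[str]]) -> None:
    done: set[str] = set()
    while len(done) < len(graph):
        # Everything whose dependencies are satisfied runs concurrently:
        # here "forecast" and "charts" land in the same wave.
        ready = [n for n, deps in graph.items()
                 if n not in done and all(d in done for d in deps)]
        await asyncio.gather(*(run_subtask(n) for n in ready))
        done.update(ready)

asyncio.run(execute(subtasks))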
Agent Collaboration
Agents communicate via Temporal signals and shared context:
- Signals: Agent A signals Agent B when results are ready
- Session memory: Agents read/write to shared Redis session storage
- Vector recall: Agents query Qdrant for relevant historical context
- Result passing: Agents receive structured outputs from dependencies
No manual message passing. No complex state management. Temporal handles coordination.
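To make "Agent A signals Agent B" concrete, this is what a Temporal signal handler looks like in general, shown with the Python SDK for illustration (Shannon's real signal names live in its Go workflows):

from temporalio import workflow

@workflow.defn
class DownstreamAgentSketch:
    def __init__(self) -> None:
        self._upstream_result: dict | None = None

    @workflow.signal
    def results_ready(self, payload: dict) -> None:
        # The upstream agent signals when its output is available.
        self._upstream_result = payload

    @workflow.run
    async def run(self) -> dict:
        # Durable wait: survives restarts because the signal is in the history.
        await workflow.wait_condition(lambda: self._upstream_result is not None)
        return self._upstream_result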
Memory Architecture: Redis + Qdrant
Shannon's memory system balances speed and recall:
Session Memory (Redis)
Fast, ephemeral storage for active conversations:
- Token usage tracking (prevent budget overruns mid-conversation)
- Recent message history (last N turns)
- Session metadata (user_id, created_at, title)
Vector Memory (Qdrant)
Long-term storage for semantic recall:
- Workflow recall: Retrieve similar past executions
- Diversity sampling: Avoid redundant similar memories
- Cross-session context: Find relevant information across conversations
Agents automatically query both stores. Memory integration documented in memory-system.md.
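From an agent's perspective, reading both stores is plain Redis and Qdrant usage. A hedged sketch (the key layout, collection name, and embed() helper are hypothetical; the actual schema is in memory-system.md):

import redis
from qdrant_client import QdrantClient

def embed(text: str) -> list[float]:
    # Placeholder: in practice this calls the configured embedding model.
    return [0.0] * 1536

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
qdrant = QdrantClient(url="http://localhost:6333")

session_id = "sess-123"
# Recent turns from session memory (hypothetical key layout).
recent_turns = r.lrange(f"session:{session_id}:messages", -10, -1)

# Semantic recall from vector memory (hypothetical collection name).
hits = qdrant.search(
    collection_name="task_embeddings",
    query_vector=embed("quantum error correction"),
    limit=5,
)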
Comprehensive Token Tracking & Cost Attribution
Every workflow populates usage metadata—no exceptions:
{
  "workflow_id": "task-xxx",
  "status": "COMPLETED",
  "result": "Research findings...",
  "usage": {
    "model_used": "gpt-5-nano-2025-08-07",
    "provider": "openai",
    "total_tokens": 8547,
    "input_tokens": 6201,
    "output_tokens": 2346,
    "cost_usd": 0.0127
  },
  "agent_usages": [
    {
      "agent_id": "research-coordinator",
      "model": "gpt-5-nano-2025-08-07",
      "tokens": 2103,
      "cost_usd": 0.0031
    },
    {
      "agent_id": "web-search",
      "model": "gpt-5-nano-2025-08-07",
      "tokens": 3421,
      "cost_usd": 0.0051
    },
    {
      "agent_id": "synthesis",
      "model": "claude-sonnet-4-5",
      "tokens": 3023,
      "cost_usd": 0.0045
    }
  ]
}
Data flow: LLM provider → Agent activity → Workflow aggregation → Database → API response. Every agent execution records usage exactly once (no duplicates). Full details in token-budget-tracking.md.
Query costs by agent, model, or time range:
SELECT agent_id, SUM(total_cost_usd) as total_cost
FROM task_executions
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY agent_id
ORDER BY total_cost DESC;
Config-Driven Everything (Single Source of Truth)
All LLM provider configs live in config/models.yaml:
# Model tiers with priority ranking
model_tiers:
  small:
    providers:
      - provider: openai
        model: gpt-5-nano-2025-08-07
        priority: 1
      - provider: anthropic
        model: claude-3-5-haiku-20241022
        priority: 2

# Model catalog (capabilities)
model_catalog:
  openai:
    gpt-5-nano-2025-08-07:
      tier: small
      context_window: 200000
      max_tokens: 16000
      supports_functions: true
      supports_streaming: true

# Pricing (per 1K tokens)
pricing:
  models:
    openai:
      gpt-5-nano-2025-08-07:
        input_per_1k: 0.0005
        output_per_1k: 0.0015
Hot-reload enabled—change configs without restarting services.
Adding a new provider:
- Update config/models.yaml (tiers, catalog, pricing)
- Implement a Python provider class in llm_provider/{provider}_provider.py
- Register it in llm_service/providers/__init__.py
Done. No Go/Rust code changes. See centralized-pricing.md.
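For step 2, the provider class is ordinary async Python around the vendor's HTTP API. A hypothetical sketch (the real base-class interface and registration hook live in python/llm-service, so follow the repo rather than this verbatim):

import httpx

class ExampleProvider:
    # Hypothetical provider; endpoint paths and response fields follow an
    # OpenAI-compatible API purely for illustration.
    def __init__(self, api_key: str, base_url: str = "https://api.example.com/v1"):
        self.client = httpx.AsyncClient(
            base_url=base_url,
            headers={"Authorization": f"Bearer {api_key}"},
        )

    async def complete(self, model: str, messages: list[dict], **kwargs) -> dict:
        resp = await self.client.post(
            "/chat/completions",
            json={"model": model, "messages": messages, **kwargs},
        )
        resp.raise_for_status()
        data = resp.json()
        # Return text plus token counts so the usage pipeline can attribute cost.
        return {
            "text": data["choices"][0]["message"]["content"],
            "input_tokens": data["usage"]["prompt_tokens"],
            "output_tokens": data["usage"]["completion_tokens"],
        }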
Enterprise Integration: Vendor Adapter Pattern
Shannon uses a vendor adapter pattern for domain-specific integrations without polluting core code:
Generic Shannon (open-source):
├── python/llm-service/.../tools/openapi_tool.py # Generic OpenAPI loader
├── config/shannon.yaml # Base config
Vendor Extensions (kept private):
├── config/overlays/shannon.vendor.yaml # Vendor tool configs
├── python/llm-service/.../tools/vendor_adapters/ # API transformations
└── python/llm-service/.../roles/vendor/ # Custom agent roles
Use conditional imports and config overlays to keep vendor-specific code separate:
# In presets.py (generic Shannon):
try:
    from .vendor.custom_agent import CUSTOM_AGENT_PRESET
    _PRESETS["custom_agent"] = CUSTOM_AGENT_PRESET
except ImportError:
    pass  # Shannon works without vendor module
Perfect for enterprise deployments with proprietary APIs. Full guide: vendor-adapters.md.
Framework Comparison
| Feature | Shannon | LangGraph | CrewAI | AgentKit |
|---|---|---|---|---|
| Orchestration | Temporal workflows (event-sourced, replay-safe) | State graphs (in-memory) | Sequential execution | Hosted platform |
| Task decomposition | Automatic (8+ patterns, complexity analysis) | Manual graph construction | Manual role assignment | Agent Builder (visual) |
| Workflow replay | Full deterministic replay (export/import) | Limited (LangSmith) | No | Platform traces |
| Cost tracking | Per-agent, per-model attribution | Total only (callbacks) | Total only | Platform analytics |
| Memory | Redis + Qdrant (session + vector) | Checkpoints | Short/long-term (in-memory) | Hosted vector DB |
| Code execution | WASI sandbox (no network) | Jupyter kernel | No built-in | Code Interpreter |
| Multi-provider | OpenAI, Anthropic, X.AI, Google, custom | OpenAI, Anthropic | OpenAI, Anthropic | OpenAI only |
| Research workflows | Built-in (citation filtering, gap detection) | Build yourself | Build yourself | Build yourself |
| Hosting | Self-hosted | Self-hosted | Self-hosted | Hosted platform |
| Enterprise patterns | Vendor adapters, config overlays | Custom code | Custom code | Platform integration |
When to use Shannon:
- You need deterministic replay for debugging production issues
- You're running multi-agent workflows with complex dependencies
- You need per-agent cost attribution and budget controls
- You want research workflows with quality controls built-in
- You need to integrate proprietary APIs without forking the framework
When to use alternatives:
- LangGraph: Rapid prototyping, Python-native workflows, LangSmith integration
- CrewAI: Simple sequential agent patterns, minimal setup
- AgentKit: Visual workflow builder, hosted platform, no infrastructure management
Production Features
Hard Budgets & Rate Limits
Set per-task budgets to prevent runaway costs:
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Generate marketing copy",
    "max_budget_usd": 0.50,
    "rate_limits": {
      "requests_per_minute": 10,
      "tokens_per_minute": 50000
    }
  }'
Shannon will halt execution if budget is exceeded. Rate-aware scheduling documented in rate-aware-budgeting.md.
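To get a feel for what a hard budget buys, here is the arithmetic at the gpt-5-nano pricing shown earlier (illustrative only; Shannon does this accounting automatically, per agent):

# Rough token capacity of a $0.50 budget at the pricing listed in models.yaml
# ($0.0005 / 1K input, $0.0015 / 1K output); assumes a 3:1 input/output mix.
input_per_1k, output_per_1k = 0.0005, 0.0015
budget_usd = 0.50
blended_per_1k = 0.75 * input_per_1k + 0.25 * output_per_1k  # $0.00075 per 1K tokens
print(f"~{budget_usd / blended_per_1k * 1000:,.0f} tokens before the hard stop")
# ≈ 666,667 tokens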
Sandboxed Code Execution
Shannon runs untrusted Python code in a WASI sandbox:
- No network access: Code cannot make external API calls
- Read-only filesystem: Code cannot write to disk
- Memory limits: Prevent resource exhaustion
- Execution timeout: Kill long-running code
# Agent can safely execute user-provided code
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Calculate fibonacci(50) using Python",
    "enable_code_execution": true
  }'
Full details in python-code-execution.md.
Governance & Approvals
Block high-risk actions until human approval:
# Agent requests approval for high-risk action
# Workflow pauses and emits approval event
curl -X POST http://localhost:8081/approvals/decision \
  -H "Content-Type: application/json" \
  -d '{
    "approval_id": "<id>",
    "workflow_id": "<wid>",
    "approved": true,
    "feedback": "Approved for production deployment"
  }'
Configure OPA policies for fine-grained control.
Deterministic Replay
Reproduce any bug by exporting and replaying workflow history:
# Export workflow history
make replay-export WORKFLOW_ID=task-xxx OUT=bug.json
# Replay locally to reproduce issue
make replay HISTORY=bug.json
# Fix bug, replay again to verify
make replay HISTORY=bug.json
Replay uses the exact same event history as the original execution. No more "works on my machine" for AI workflows.
Observability: Metrics, Traces, Events
Shannon provides multiple observability layers:
Real-Time SSE Events
Stream execution events in real-time:
curl -N "http://localhost:8081/stream/sse?workflow_id=task-xxx"
# Output:
event: agent_thinking
data: {"agent":"research-coordinator","message":"Analyzing query..."}
event: tool_invoked
data: {"tool":"web_search","params":{"query":"quantum error correction 2025"}}
event: task_completed
data: {"workflow_id":"task-xxx","status":"COMPLETED"}
Prometheus Metrics
- Task execution counts by workflow type
- Token usage by model and provider
- Latency percentiles (p50, p95, p99)
- Budget utilization rates
Query via Prometheus: http://localhost:9090
OpenTelemetry Traces
Distributed tracing across Rust, Go, and Python services. Every agent execution is a trace span.
View in Jaeger or export to your observability platform.
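Shannon's services emit spans on their own; if you extend the Python LLM service, adding a custom span is standard OpenTelemetry usage (the span name and attributes here are hypothetical):

from opentelemetry import trace

tracer = trace.get_tracer("llm-service")

# Hypothetical custom step inside the Python LLM service.
with tracer.start_as_current_span("citation_filter") as span:
    span.set_attribute("citations.candidates", 12)
    span.set_attribute("citations.kept", 5)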
Demo: 30-Second Setup
git clone https://github.com/Kocoro-lab/Shannon.git
cd Shannon
# Setup environment
make setup-env
echo "OPENAI_API_KEY=your-key" >> .env
# Install Python WASI interpreter
./scripts/setup_python_wasi.sh
# Start all services
make dev
# Run smoke tests
make smoke
Open the dashboard: http://localhost:2111
Submit Your First Task
Simple task:
export GATEWAY_SKIP_AUTH=1
curl -X POST http://localhost:8080/api/v1/tasks \
-H "Content-Type: application/json" \
-d '{"query":"Explain quantum entanglement in simple terms"}'
Research task:
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Find 3 authoritative sources on GPT-5 capabilities and summarize key findings",
    "cognitive_strategy": "research",
    "max_citations": 5
  }'
Complex decomposition:
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Analyze Tesla stock performance Q4 2024, create forecast model, generate report with charts"
  }'
Shannon will automatically route to the optimal workflow pattern.
Python SDK
Install: pip install shannon-sdk
from shannon import ShannonClient, EventType

with ShannonClient(gateway_endpoint="http://localhost:8080") as client:
    # Submit research task
    task = client.submit_task(
        query="Survey AI agent frameworks: LangGraph, CrewAI, Shannon",
        cognitive_strategy="research",
        max_citations=10,
    )

    # Stream events
    for event in client.stream(task.workflow_id):
        if event.type == EventType.AGENT_THINKING:
            print(f"🤔 {event.message}")
        elif event.type == EventType.TOOL_INVOKED:
            print(f"🔧 Tool: {event.data['tool']}")
        elif event.type == EventType.TASK_COMPLETED:
            print(f"✅ Result: {event.data['result']}")
            print(f"💰 Cost: ${event.data['usage']['cost_usd']:.4f}")
Full SDK docs: clients/python/README.md
Adding Custom Tools (No Code Changes)
Add an OpenAPI tool in config/shannon.yaml:
openapi_tools:
  weather_api:
    enabled: true
    spec_path: "./config/openapi_specs/weather.yaml"
    base_url: "https://api.weather.com/v1"
    operations:
      - get_forecast
      - get_current
Or add an MCP tool:
mcp_tools:
  github_search:
    enabled: true
    func_name: "search_repos"
    description: "Search GitHub repositories"
    category: "data"
    parameters:
      - { name: query, type: string, required: true }
      - { name: language, type: string, enum: [python, go, rust] }
Restart the LLM service—tools are immediately available to agents. No proto/Rust/Go changes.
Full guide: adding-custom-tools.md.
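The func_name above refers to a function exposed by an MCP server; how Shannon wires it in is covered in adding-custom-tools.md. One way to provide such a function is the reference MCP Python SDK (hypothetical sketch using GitHub's public search API):

from mcp.server.fastmcp import FastMCP
import httpx

mcp = FastMCP("github-tools")

@mcp.tool()
def search_repos(query: str, language: str | None = None) -> list[dict]:
    """Search GitHub repositories (unauthenticated public search, for illustration)."""
    q = f"{query} language:{language}" if language else query
    resp = httpx.get("https://api.github.com/search/repositories", params={"q": q})
    resp.raise_for_status()
    return [{"name": r["full_name"], "stars": r["stargazers_count"]}
            for r in resp.json()["items"][:5]]

if __name__ == "__main__":
    mcp.run()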
Why Teams Choose Shannon
Enterprises:
- Self-hosted, no vendor lock-in
- Vendor adapter pattern for proprietary APIs
- OPA policies and approval workflows
- Comprehensive audit logs and replay
Research teams:
- Built-in research workflows with citation filtering
- Vector memory for cross-experiment recall
- Deterministic replay for reproducible research
Cost-conscious teams:
- Per-agent cost attribution
- Hard budgets prevent overruns
- Multi-provider support (fallback to cheaper models)
- Token tracking at every layer
DevOps teams:
- Prometheus/OpenTelemetry integration
- Temporal for workflow reliability
- Replay for debugging production issues
- Docker Compose for local dev
Documentation & Community
- Repo: https://github.com/Kocoro-lab/Shannon
- Quickstart: README.md
- Python SDK: clients/python/README.md
- Architecture:
Get involved:
- Star the repo: https://github.com/Kocoro-lab/Shannon
- Open issues for bugs or feature requests
- Join discussions for architecture questions
- Contribute workflows, tools, or provider integrations
If you're building production AI agents and need reliability, observability, and cost control without vendor lock-in, Shannon is built for you. Try the demo and open an issue with your use case—we'd love feedback.