Most agent frameworks give you building blocks. Shannon gives you a production system.

Why Shannon Exists
Building AI agents is easy. Running them in production is hard.
After prototyping with LangGraph, CrewAI, or similar frameworks, teams hit the same walls:
- Reliability: How do you reproduce bugs when LLM calls are non-deterministic?
- Cost control: How do you prevent runaway token usage without killing performance?
- Task complexity: How do you orchestrate 10+ agents with dependencies without manual DAG wiring?
- Observability: Where did the tokens go? Which agent failed? What was the execution path?
- Enterprise integration: How do you integrate proprietary APIs without forking the framework?
Shannon was built to solve these production problems from day one.
Architecture: Temporal + Rust + Go + Python
Shannon's hybrid architecture gives you the best of three worlds:
Temporal Workflows (Go)
The orchestration layer runs on Temporal, giving you:
- Deterministic replay: Export any workflow execution and replay it locally to reproduce bugs
- Built-in retries: Automatic retry logic with exponential backoff
- Workflow versioning: Deploy new workflow versions without breaking running tasks
- Durable execution: Tasks survive service restarts, network failures, and crashes
Unlike LangGraph's in-memory state graphs or CrewAI's sequential execution, Temporal workflows are event-sourced and replay-safe. Every decision point is recorded. Every execution is reproducible.
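Shannon's workflows are written in Go, but the retry and replay mechanics are easiest to see in a few lines of Temporal code. A minimal sketch using Temporal's Python SDK, purely for illustration (the workflow and activity names are hypothetical):

from datetime import timedelta
from temporalio import workflow
from temporalio.common import RetryPolicy

@workflow.defn
class SimpleTaskWorkflowSketch:
    @workflow.run
    async def run(self, query: str) -> str:
        # Every activity attempt, result, and timer is written to the event
        # history, which is what makes export-and-replay deterministic.
        return await workflow.execute_activity(
            "execute_agent",  # hypothetical activity name
            query,
            start_to_close_timeout=timedelta(minutes=5),
            retry_policy=RetryPolicy(
                initial_interval=timedelta(seconds=1),
                backoff_coefficient=2.0,
                maximum_attempts=5,
            ),
        )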
Rust Agent Core
The enforcement layer handles:
- WASI sandbox: Run untrusted Python code with no network access and a read-only filesystem
- gRPC gateway: High-performance agent execution with sub-millisecond overhead
- Policy enforcement: OPA-based governance and approval workflows
Native-code performance where it matters most—security and networking.
Python LLM Service
The integration layer provides:
- Multi-provider support: OpenAI, Anthropic, X.AI, Google Gemini, custom providers
- MCP tools: Add tools via YAML config, no code changes required
- Flexible scripting: Rapid prototyping for new agent behaviors
Keep LLM integration simple and extensible without sacrificing system reliability.
Intelligent Workflow Orchestration
Shannon doesn't make you choose workflows manually. The orchestrator analyzes your task and routes it to the optimal execution pattern.
Automatic Workflow Selection
// Orchestrator routing logic (simplified)
switch {
case complexity < 0.3 && simpleByShape:
    return SimpleTaskWorkflow   // Single agent, fast path
case len(subtasks) > 5 || hasDependencies:
    return SupervisorWorkflow   // Coordinate multiple agents
case cognitiveStrategy == "react":
    return ReactWorkflow        // Reasoning loop with tools
case cognitiveStrategy == "research":
    return ResearchWorkflow     // Multi-step research pipeline
default:
    return DAGWorkflow          // Standard parallel execution
}
8+ Built-In Workflow Patterns
Core workflows:
- SimpleTaskWorkflow: Single-agent execution (complexity < 0.3)
- SupervisorWorkflow: Coordinates 5+ subtasks with dependencies
- StreamingWorkflow: Real-time token streaming (single/multi-agent)
- TemplateWorkflow: Pre-defined workflows for repeatable tasks
Strategy workflows:
- DAGWorkflow: Fan-out/fan-in parallel execution
- ReactWorkflow: Iterative reasoning + tool use (ReAct pattern)
- ResearchWorkflow: Multi-step research with parallel source gathering, citation filtering, gap detection
- ExploratoryWorkflow: Tree-of-Thoughts for complex decision-making
- ScientificWorkflow: Hypothesis testing, debate, multi-perspective validation
No manual workflow wiring. No DAG construction. Just submit a task and Shannon picks the right pattern.
See multi-agent-workflow-architecture.md for full routing logic.
Deep Research Agent: Production-Ready Information Gathering
Shannon's ResearchWorkflow is built for multi-step research tasks with quality controls:
Research Pipeline
- Query understanding: Analyze research goal and decompose into search queries
- Parallel sourcing: Execute multiple searches concurrently (web search, vector DB, APIs)
- Citation filtering: Apply credibility rules to filter low-quality sources
- Gap detection: Identify missing information and trigger follow-up searches
- Synthesis: Combine findings with proper attribution and citations
- Cost tracking: Record tokens and cost per research phase
Citation Credibility Filtering
Configure citation filters in config/citation_credibility.yaml:
citation_filter:
  enabled: true
  credible_domains:
    - openai.com
    - anthropic.com
    - arxiv.org
    - github.com
  suspicious_patterns:
    - "click here"
    - "buy now"
    - "limited time"
  min_confidence: 0.7
Only citations meeting credibility thresholds make it to the final report. Full pipeline documented in research-workflow.md.
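As a rough sketch of what that threshold check amounts to (hypothetical helper, not Shannon's actual implementation; the real pipeline is in research-workflow.md):

from urllib.parse import urlparse

def passes_credibility_filter(citation: dict, cfg: dict) -> bool:
    # Hypothetical check mirroring the YAML above.
    domain = urlparse(citation["url"]).netloc.removeprefix("www.")
    if cfg["credible_domains"] and domain not in cfg["credible_domains"]:
        return False
    snippet = citation.get("snippet", "").lower()
    if any(pattern in snippet for pattern in cfg["suspicious_patterns"]):
        return False
    return citation.get("confidence", 0.0) >= cfg["min_confidence"]

cfg = {
    "credible_domains": ["openai.com", "anthropic.com", "arxiv.org", "github.com"],
    "suspicious_patterns": ["click here", "buy now", "limited time"],
    "min_confidence": 0.7,
}
print(passes_credibility_filter(
    {"url": "https://arxiv.org/abs/2501.01234", "snippet": "Surface code decoders...", "confidence": 0.9},
    cfg,
))  # True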
Example: Research Task
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Survey the latest breakthroughs in quantum error correction, find 3 authoritative sources",
    "cognitive_strategy": "research",
    "max_citations": 5
  }'
Shannon will:
- Generate search queries
- Gather sources in parallel
- Filter by credibility
- Detect gaps (e.g., "need more recent papers from 2025")
- Synthesize findings with citations
- Return structured results with token costs
Task Decomposition & Agent Orchestration
Shannon's SupervisorWorkflow handles complex tasks with multiple dependencies:
Automatic Subtask Decomposition
Submit a complex task:
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Analyze our Q4 sales data, create a forecast model, and generate an executive summary with visualizations"
  }'
Shannon will:
- Analyze task complexity (triggers SupervisorWorkflow for 5+ subtasks)
- Decompose into subtasks:
  - Load and clean Q4 sales data
  - Perform statistical analysis
  - Train forecast model
  - Generate visualizations
  - Write executive summary
- Build dependency graph: Ensure data loading completes before analysis
- Execute in parallel: Run independent subtasks concurrently (see the sketch below)
- Coordinate results: Aggregate outputs and synthesize final report
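For the example above, the dependency graph and parallel execution boil down to something like this (field names and the scheduler are illustrative, not Shannon's internal schema):

import asyncio

# Hypothetical decomposition of the Q4 sales task; names are illustrative.
subtasks = {
    "load_data": [],
    "analysis":  ["load_data"],
    "forecast":  ["analysis"],
    "charts":    ["analysis"],
    "summary":   ["forecast", "charts"],
}

async def run_subtask(name: str) -> None:
    print(f"running {name}")
    await asyncio.sleep(0.1)  # stand-in for an agent execution

async def execute(graph: dict[str, list[str]]) -> None:
    done: set[str] = set()
    while len(done) < len(graph):
        # Everything whose dependencies are satisfied runs concurrently:
        # here "forecast" and "charts" land in the same wave.
        ready = [n for n, deps in graph.items()
                 if n not in done and all(d in done for d in deps)]
        await asyncio.gather(*(run_subtask(n) for n in ready))
        done.update(ready)

asyncio.run(execute(subtasks))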
Agent Collaboration
Agents communicate via Temporal signals and shared context:
- Signals: Agent A signals Agent B when results are ready
- Session memory: Agents read/write to shared Redis session storage
- Vector recall: Agents query Qdrant for relevant historical context
- Result passing: Agents receive structured outputs from dependencies
No manual message passing. No complex state management. Temporal handles coordination.
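To make "Agent A signals Agent B" concrete, this is what a Temporal signal handler looks like in general, shown with the Python SDK for illustration (Shannon's real signal names live in its Go workflows):

from temporalio import workflow

@workflow.defn
class DownstreamAgentSketch:
    def __init__(self) -> None:
        self._upstream_result: dict | None = None

    @workflow.signal
    def results_ready(self, payload: dict) -> None:
        # The upstream agent signals when its output is available.
        self._upstream_result = payload

    @workflow.run
    async def run(self) -> dict:
        # Durable wait: survives restarts because the signal is in the history.
        await workflow.wait_condition(lambda: self._upstream_result is not None)
        return self._upstream_result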
Memory Architecture: Redis + Qdrant
Shannon's memory system balances speed and recall:
Session Memory (Redis)
Fast, ephemeral storage for active conversations:
- Token usage tracking (prevent budget overruns mid-conversation)
- Recent message history (last N turns)
- Session metadata (user_id, created_at, title)
Vector Memory (Qdrant)
Long-term storage for semantic recall:
- Workflow recall: Retrieve similar past executions
- Diversity sampling: Avoid redundant similar memories
- Cross-session context: Find relevant information across conversations
Agents automatically query both stores. Memory integration documented in memory-system.md.
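From an agent's perspective, reading both stores is plain Redis and Qdrant usage. A hedged sketch (the key layout, collection name, and embed() helper are hypothetical; the actual schema is in memory-system.md):

import redis
from qdrant_client import QdrantClient

def embed(text: str) -> list[float]:
    # Placeholder: in practice this calls the configured embedding model.
    return [0.0] * 1536

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
qdrant = QdrantClient(url="http://localhost:6333")

session_id = "sess-123"
# Recent turns from session memory (hypothetical key layout).
recent_turns = r.lrange(f"session:{session_id}:messages", -10, -1)

# Semantic recall from vector memory (hypothetical collection name).
hits = qdrant.search(
    collection_name="task_embeddings",
    query_vector=embed("quantum error correction"),
    limit=5,
)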
Comprehensive Token Tracking & Cost Attribution
Every workflow populates usage metadata—no exceptions:
{
  "workflow_id": "task-xxx",
  "status": "COMPLETED",
  "result": "Research findings...",
  "usage": {
    "model_used": "gpt-5-nano-2025-08-07",
    "provider": "openai",
    "total_tokens": 8547,
    "input_tokens": 6201,
    "output_tokens": 2346,
    "cost_usd": 0.0127
  },
  "agent_usages": [
    {
      "agent_id": "research-coordinator",
      "model": "gpt-5-nano-2025-08-07",
      "tokens": 2103,
      "cost_usd": 0.0031
    },
    {
      "agent_id": "web-search",
      "model": "gpt-5-nano-2025-08-07",
      "tokens": 3421,
      "cost_usd": 0.0051
    },
    {
      "agent_id": "synthesis",
      "model": "claude-sonnet-4-5",
      "tokens": 3023,
      "cost_usd": 0.0045
    }
  ]
}
Data flow: LLM provider → Agent activity → Workflow aggregation → Database → API response. Every agent execution records usage exactly once (no duplicates). Full details in token-budget-tracking.md.
Query costs by agent, model, or time range:
SELECT agent_id, SUM(total_cost_usd) as total_cost
FROM task_executions
WHERE created_at > NOW() - INTERVAL '7 days'
GROUP BY agent_id
ORDER BY total_cost DESC;
Config-Driven Everything (Single Source of Truth)
All LLM provider configs live in config/models.yaml:
# Model tiers with priority ranking
model_tiers:
  small:
    providers:
      - provider: openai
        model: gpt-5-nano-2025-08-07
        priority: 1
      - provider: anthropic
        model: claude-3-5-haiku-20241022
        priority: 2

# Model catalog (capabilities)
model_catalog:
  openai:
    gpt-5-nano-2025-08-07:
      tier: small
      context_window: 200000
      max_tokens: 16000
      supports_functions: true
      supports_streaming: true

# Pricing (per 1K tokens)
pricing:
  models:
    openai:
      gpt-5-nano-2025-08-07:
        input_per_1k: 0.0005
        output_per_1k: 0.0015
Hot-reload enabled—change configs without restarting services.
Adding a new provider:
- Update config/models.yaml (tiers, catalog, pricing)
- Implement a Python provider class in llm_provider/{provider}_provider.py
- Register it in llm_service/providers/__init__.py
Done. No Go/Rust code changes. See centralized-pricing.md.
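For step 2, the provider class is ordinary async Python around the vendor's HTTP API. A hypothetical sketch (the real base-class interface and registration hook live in python/llm-service, so follow the repo rather than this verbatim):

import httpx

class ExampleProvider:
    # Hypothetical provider; endpoint paths and response fields follow an
    # OpenAI-compatible API purely for illustration.
    def __init__(self, api_key: str, base_url: str = "https://api.example.com/v1"):
        self.client = httpx.AsyncClient(
            base_url=base_url,
            headers={"Authorization": f"Bearer {api_key}"},
        )

    async def complete(self, model: str, messages: list[dict], **kwargs) -> dict:
        resp = await self.client.post(
            "/chat/completions",
            json={"model": model, "messages": messages, **kwargs},
        )
        resp.raise_for_status()
        data = resp.json()
        # Return text plus token counts so the usage pipeline can attribute cost.
        return {
            "text": data["choices"][0]["message"]["content"],
            "input_tokens": data["usage"]["prompt_tokens"],
            "output_tokens": data["usage"]["completion_tokens"],
        }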
Enterprise Integration: Vendor Adapter Pattern
Shannon uses a vendor adapter pattern for domain-specific integrations without polluting core code:
Generic Shannon (open-source):
├── python/llm-service/.../tools/openapi_tool.py # Generic OpenAPI loader
├── config/shannon.yaml # Base config
Vendor Extensions (kept private):
├── config/overlays/shannon.vendor.yaml # Vendor tool configs
├── python/llm-service/.../tools/vendor_adapters/ # API transformations
└── python/llm-service/.../roles/vendor/ # Custom agent roles
Use conditional imports and config overlays to keep vendor-specific code separate:
# In presets.py (generic Shannon):
try:
    from .vendor.custom_agent import CUSTOM_AGENT_PRESET
    _PRESETS["custom_agent"] = CUSTOM_AGENT_PRESET
except ImportError:
    pass  # Shannon works without vendor module
Perfect for enterprise deployments with proprietary APIs. Full guide: vendor-adapters.md.
Framework Comparison
| Feature | Shannon | LangGraph | CrewAI | AgentKit |
|---|---|---|---|---|
| Orchestration | Temporal workflows (event-sourced, replay-safe) | State graphs (in-memory) | Sequential execution | Hosted platform |
| Task decomposition | Automatic (8+ patterns, complexity analysis) | Manual graph construction | Manual role assignment | Agent Builder (visual) |
| Workflow replay | Full deterministic replay (export/import) | Limited (LangSmith) | No | Platform traces |
| Cost tracking | Per-agent, per-model attribution | Total only (callbacks) | Total only | Platform analytics |
| Memory | Redis + Qdrant (session + vector) | Checkpoints | Short/long-term (in-memory) | Hosted vector DB |
| Code execution | WASI sandbox (no network) | Jupyter kernel | No built-in | Code Interpreter |
| Multi-provider | OpenAI, Anthropic, X.AI, Google, custom | OpenAI, Anthropic | OpenAI, Anthropic | OpenAI only |
| Research workflows | Built-in (citation filtering, gap detection) | Build yourself | Build yourself | Build yourself |
| Hosting | Self-hosted | Self-hosted | Self-hosted | Hosted platform |
| Enterprise patterns | Vendor adapters, config overlays | Custom code | Custom code | Platform integration |
When to use Shannon:
- You need deterministic replay for debugging production issues
- You're running multi-agent workflows with complex dependencies
- You need per-agent cost attribution and budget controls
- You want research workflows with quality controls built-in
- You need to integrate proprietary APIs without forking the framework
When to use alternatives:
- LangGraph: Rapid prototyping, Python-native workflows, LangSmith integration
- CrewAI: Simple sequential agent patterns, minimal setup
- AgentKit: Visual workflow builder, hosted platform, no infrastructure management
Production Features
Hard Budgets & Rate Limits
Set per-task budgets to prevent runaway costs:
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Generate marketing copy",
    "max_budget_usd": 0.50,
    "rate_limits": {
      "requests_per_minute": 10,
      "tokens_per_minute": 50000
    }
  }'
Shannon will halt execution if budget is exceeded. Rate-aware scheduling documented in rate-aware-budgeting.md.
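To get a feel for what a hard budget buys, here is the arithmetic at the gpt-5-nano pricing shown earlier (illustrative only; Shannon does this accounting automatically, per agent):

# Rough token capacity of a $0.50 budget at the pricing listed in models.yaml
# ($0.0005 / 1K input, $0.0015 / 1K output); assumes a 3:1 input/output mix.
input_per_1k, output_per_1k = 0.0005, 0.0015
budget_usd = 0.50
blended_per_1k = 0.75 * input_per_1k + 0.25 * output_per_1k  # $0.00075 per 1K tokens
print(f"~{budget_usd / blended_per_1k * 1000:,.0f} tokens before the hard stop")
# ≈ 666,667 tokens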
Sandboxed Code Execution
Shannon runs untrusted Python code in a WASI sandbox:
- No network access: Code cannot make external API calls
- Read-only filesystem: Code cannot write to disk
- Memory limits: Prevent resource exhaustion
- Execution timeout: Kill long-running code
# Agent can safely execute user-provided code
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Calculate fibonacci(50) using Python",
    "enable_code_execution": true
  }'
Full details in python-code-execution.md.
Governance & Approvals
Block high-risk actions until human approval:
# Agent requests approval for high-risk action
# Workflow pauses and emits approval event
curl -X POST http://localhost:8081/approvals/decision \
  -H "Content-Type: application/json" \
  -d '{
    "approval_id": "<id>",
    "workflow_id": "<wid>",
    "approved": true,
    "feedback": "Approved for production deployment"
  }'
Configure OPA policies for fine-grained control.
Deterministic Replay
Reproduce any bug by exporting and replaying workflow history:
# Export workflow history
make replay-export WORKFLOW_ID=task-xxx OUT=bug.json
# Replay locally to reproduce issue
make replay HISTORY=bug.json
# Fix bug, replay again to verify
make replay HISTORY=bug.json
Replay uses the exact same event history as the original execution. No more "works on my machine" for AI workflows.
Observability: Metrics, Traces, Events
Shannon provides multiple observability layers:
Real-Time SSE Events
Stream execution events in real-time:
curl -N "http://localhost:8081/stream/sse?workflow_id=task-xxx"
# Output:
event: agent_thinking
data: {"agent":"research-coordinator","message":"Analyzing query..."}
event: tool_invoked
data: {"tool":"web_search","params":{"query":"quantum error correction 2025"}}
event: task_completed
data: {"workflow_id":"task-xxx","status":"COMPLETED"}
Prometheus Metrics
- Task execution counts by workflow type
- Token usage by model and provider
- Latency percentiles (p50, p95, p99)
- Budget utilization rates
Query via Prometheus: http://localhost:9090
OpenTelemetry Traces
Distributed tracing across Rust, Go, and Python services. Every agent execution is a trace span.
View in Jaeger or export to your observability platform.
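Shannon's services emit spans on their own; if you extend the Python LLM service, adding a custom span is standard OpenTelemetry usage (the span name and attributes here are hypothetical):

from opentelemetry import trace

tracer = trace.get_tracer("llm-service")

# Hypothetical custom step inside the Python LLM service.
with tracer.start_as_current_span("citation_filter") as span:
    span.set_attribute("citations.candidates", 12)
    span.set_attribute("citations.kept", 5)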
Demo: 30-Second Setup
git clone https://github.com/Kocoro-lab/Shannon.git
cd Shannon
# Setup environment
make setup-env
echo "OPENAI_API_KEY=your-key" >> .env
# Install Python WASI interpreter
./scripts/setup_python_wasi.sh
# Start all services
make dev
# Run smoke tests
make smoke
Open the dashboard: http://localhost:2111
Submit Your First Task
Simple task:
export GATEWAY_SKIP_AUTH=1
curl -X POST http://localhost:8080/api/v1/tasks \
-H "Content-Type: application/json" \
-d '{"query":"Explain quantum entanglement in simple terms"}'
Research task:
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Find 3 authoritative sources on GPT-5 capabilities and summarize key findings",
    "cognitive_strategy": "research",
    "max_citations": 5
  }'
Complex decomposition:
curl -X POST http://localhost:8080/api/v1/tasks \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Analyze Tesla stock performance Q4 2024, create forecast model, generate report with charts"
  }'
Shannon will automatically route to the optimal workflow pattern.
Python SDK
Install: pip install shannon-sdk
from shannon import ShannonClient, EventType

with ShannonClient(gateway_endpoint="http://localhost:8080") as client:
    # Submit research task
    task = client.submit_task(
        query="Survey AI agent frameworks: LangGraph, CrewAI, Shannon",
        cognitive_strategy="research",
        max_citations=10,
    )

    # Stream events
    for event in client.stream(task.workflow_id):
        if event.type == EventType.AGENT_THINKING:
            print(f"🤔 {event.message}")
        elif event.type == EventType.TOOL_INVOKED:
            print(f"🔧 Tool: {event.data['tool']}")
        elif event.type == EventType.TASK_COMPLETED:
            print(f"✅ Result: {event.data['result']}")
            print(f"💰 Cost: ${event.data['usage']['cost_usd']:.4f}")
Full SDK docs: clients/python/README.md
Adding Custom Tools (No Code Changes)
Add an OpenAPI tool in config/shannon.yaml:
openapi_tools:
  weather_api:
    enabled: true
    spec_path: "./config/openapi_specs/weather.yaml"
    base_url: "https://api.weather.com/v1"
    operations:
      - get_forecast
      - get_current
Or add an MCP tool:
mcp_tools:
  github_search:
    enabled: true
    func_name: "search_repos"
    description: "Search GitHub repositories"
    category: "data"
    parameters:
      - { name: query, type: string, required: true }
      - { name: language, type: string, enum: [python, go, rust] }
Restart the LLM service—tools are immediately available to agents. No proto/Rust/Go changes.
Full guide: adding-custom-tools.md.
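The func_name above refers to a function exposed by an MCP server; how Shannon wires it in is covered in adding-custom-tools.md. One way to provide such a function is the reference MCP Python SDK (hypothetical sketch using GitHub's public search API):

from mcp.server.fastmcp import FastMCP
import httpx

mcp = FastMCP("github-tools")

@mcp.tool()
def search_repos(query: str, language: str | None = None) -> list[dict]:
    """Search GitHub repositories (unauthenticated public search, for illustration)."""
    q = f"{query} language:{language}" if language else query
    resp = httpx.get("https://api.github.com/search/repositories", params={"q": q})
    resp.raise_for_status()
    return [{"name": r["full_name"], "stars": r["stargazers_count"]}
            for r in resp.json()["items"][:5]]

if __name__ == "__main__":
    mcp.run()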
Why Teams Choose Shannon
Enterprises:
- Self-hosted, no vendor lock-in
- Vendor adapter pattern for proprietary APIs
- OPA policies and approval workflows
- Comprehensive audit logs and replay
Research teams:
- Built-in research workflows with citation filtering
- Vector memory for cross-experiment recall
- Deterministic replay for reproducible research
Cost-conscious teams:
- Per-agent cost attribution
- Hard budgets prevent overruns
- Multi-provider support (fallback to cheaper models)
- Token tracking at every layer
DevOps teams:
- Prometheus/OpenTelemetry integration
- Temporal for workflow reliability
- Replay for debugging production issues
- Docker Compose for local dev
Documentation & Community
- Repo: https://github.com/Kocoro-lab/Shannon
- Quickstart: README.md
- Python SDK: clients/python/README.md
- Architecture:
Get involved:
- Star the repo: https://github.com/Kocoro-lab/Shannon
- Open issues for bugs or feature requests
- Join discussions for architecture questions
- Contribute workflows, tools, or provider integrations
If you're building production AI agents and need reliability, observability, and cost control without vendor lock-in, Shannon is built for you. Try the demo and open an issue with your use case—we'd love feedback.