Dnalyaw: Engineering an AI Quant Trading System from Scratch

In my previous post on AI quantitative trading, I explored the landscape — what quant trading really is, why AI has a genuine edge, and where the real barriers lie. The conclusion was clear: the mystery of quant funds isn't intellectual complexity, it's information asymmetry and engineering execution.

This post is the follow-up. Over the past several months we've been building Dnalyaw — a vertically integrated quantitative trading platform where research and execution share a single pipeline, end to end. The thesis is simple: in modern markets, the moat isn't any one model or signal — it's owning the full loop from idea to live capital, with hundreds of features, dozens of systematic strategies, and disciplined risk sitting between them.

This post walks through the architecture, the engineering decisions behind it, and why vertical integration — not model novelty — is where the edge compounds.

Why Build From Scratch?

The honest answer: existing frameworks don't respect the latency hierarchy.

Most open-source quant platforms treat everything the same — Python top to bottom. That works for research. It fails spectacularly when you need sub-microsecond risk checks on every order while simultaneously running ML inference and managing a portfolio of hundreds of positions across global markets.

The other issue is risk. When real money is on the line, you can't afford "best effort" risk management bolted on as middleware. The risk engine needs veto authority. It needs to be physically incapable of being bypassed by a rogue strategy or a misbehaving LLM agent. That requires architectural separation, not just a Python decorator.

The Polyglot Architecture: Rust + Go + Python

Dnalyaw uses three languages, each chosen for a specific latency tier. This isn't complexity for its own sake — it's matching the tool to the constraint.

Rust: The Hot Path (<1ms)

The risk engine and execution core run in Rust. Every single order passes through pre-trade risk checks that complete in single-digit nanoseconds for basic validation and remain sub-microsecond even with hundreds of open positions. These are Criterion.rs measurements on the actual production risk engine.

Why Rust? Memory safety without garbage collection pauses. When you're checking risk limits on every order, a 10ms GC pause in Java or Go means 10ms where orders queue up unchecked. Rust's ownership model eliminates this entirely. The risk engine has zero allocations in the hot path.

The risk engine also implements a kill switch — a circuit breaker that can flatten all positions and enter reduce-only mode within microseconds. It has a direct-wire bypass to the broker, meaning it can emergency-close positions even if the Go orchestrator is completely down.

Go: The Warm Path (<10ms)

The Order Management System, strategy orchestration, API layer, and real-time monitoring TUI all run in Go. Goroutines and channels are a natural fit for the workflow: receive market data, fan out to strategies, collect target portfolios, route through risk, manage order lifecycle.

The OMS achieves p99 order submission of 228 microseconds at 907 orders/second throughput — well within our target of <1ms. The order state machine handles the full lifecycle: new, submitted, acknowledged, partially filled, filled — with idempotency guarantees and automatic reconciliation against broker state.

Go sits in the sweet spot between Rust's raw performance and Python's ecosystem. It's fast enough that the warm path never becomes a bottleneck, while being productive enough to iterate on business logic quickly.

Python: The Cold Path (async)

Research, backtesting, ML/RL model training, and LLM integration live in Python. This is where developer velocity matters more than execution speed. The strategy library spans dozens of active systematic strategies and hundreds of underlying features — classical factor (quality, value, momentum, low-vol), statistical arbitrage (pairs, cointegration, cross-asset mean reversion), microstructure, event-driven, and alt-data signals — with a research pipeline designed to onboard and retire features continuously. The breadth is deliberate: no single feature is load-bearing, and no single strategy carries the portfolio.

Python strategies don't generate orders directly. They output target portfolio weights that flow through gRPC to the Go execution layer. This separation is critical: a bug in a Python strategy can produce bad targets, but it physically cannot bypass risk limits or send malformed orders to the broker.

The Python layer also hosts the backtest engine (wrapped in FastAPI for remote access), a feature store backed by TimescaleDB, and the bridge to our LLM orchestration platform for sentiment analysis.

Risk Management: Defense Before Offense

If I had to pick the single most important design decision in Dnalyaw, it's this: the risk engine has absolute veto authority, and it's written in a language where entire classes of bugs are impossible.

Four Risk Decisions

Every order receives one of four responses:

APPROVE: Proceed as requested
REJECT: Block entirely — limit breached, market closed, or concentration too high
REDUCE: Scale down the order (e.g., you requested 100 shares but exposure limits allow only 60)
FLATTEN: Emergency mode — close all positions, enter reduce-only

REDUCE is the interesting one. Most risk systems are binary — yes or no. But in practice, a strategy might be directionally correct while over-sizing. Reducing the order preserves the alpha signal while enforcing the risk boundary.

Hardcoded Limits (Not Configurable)

Some limits are deliberately immutable constants in Rust, not configuration:

Max 5% notional per instrument
Max 25% sector exposure
5% daily drawdown kill switch
1% single-trade risk (Van Tharp floor)

Why hardcode? Because at 2 AM when a position is bleeding and your brain is telling you "just override it this once," the system should say no. Configurable limits get reconfigured. Constants don't.

Position Sizing: Half-Kelly + Van Tharp

I use two complementary models simultaneously:

Half-Kelly (offense — sets the ceiling): The Kelly criterion gives mathematically optimal bet sizes for long-term compound growth. The full Kelly fraction is:

f^* = \frac{p(b+1) - 1}{b}

where $p$ is the win rate and $b$ is the reward/risk ratio. We use half-Kelly $f = f^*/2$ to account for estimation error and fat tails. This tells you the most you should bet.

Van Tharp R-Multiple (defense — sets the floor):

\text{position\_size} = \frac{\text{equity} \times \text{risk\%}}{\text{stop\_loss\_distance}}

This ensures no single loss exceeds 1% of capital. This tells you the least protection you need.

The final position size is always the most conservative of all constraints:

\text{final\_position} = \min(f_{\text{kelly}},\; f_{\text{van\_tharp}},\; f_{\text{max\_concentration}})

Offense defines opportunity; defense defines survival.

Multi-Market Execution

Dnalyaw runs a two-region execution footprint: US infrastructure deployed near NYSE and NASDAQ matching engines, and Hong Kong infrastructure near HKEX. HK execution is deliberately split across two independent venue paths, giving us continuous parallel fill-quality measurements in the same region — a cheap form of execution A/B testing that would otherwise require a dedicated research project. The architecture is ready to extend into futures and additional Asian markets.

Provider Abstraction

Every external integration — data feeds, brokers, exchanges — goes through a clean abstraction interface. Adding a new data provider means implementing Connect, Subscribe, and Stream. Adding a new broker means implementing SubmitOrder, CancelOrder, and ModifyOrder. The internal canonical data model normalizes everything: AAPL.NASDAQ, 0700.HKEX — same format, same pipeline.

This matters because the real world is messy. One provider sends binary-encoded market data. Another uses a TCP streaming protocol. A third uses REST. Yet another has its own protobuf-based protocol. All of this gets normalized into a single Quote struct before anything downstream sees it.

Quality Gates

Raw market data is unreliable. Spreads spike. Prices go stale. Data feeds drop. Before any market data reaches the strategy layer, it passes through quality gates that check for:

Abnormal bid-ask spreads (likely bad tick)
Stale timestamps (feed lag or disconnection)
Price spikes beyond statistical norms (fat-finger trades or data errors)

A bad quote that sneaks through can trigger a cascade of wrong orders. Quality gates are cheap insurance.

From Backtest to Live: The Calibration Pipeline

This is where most quant projects fail. The backtest shows a Sharpe of 2.0, you go live, and reality delivers 0.3. The gap is always the same: unrealistic fill assumptions.

Two-Layer Backtesting

Dnalyaw runs two distinct backtest modes:

Fast Python vectorized backtest: For rapid strategy iteration. Tests signal logic against historical data in seconds.
Go OMS-integrated backtest: Exercises the real order state machine with a simulated venue. Catches bugs that only appear in production — partial fills, order rejections, settlement timing.

Realistic Fill Modeling

The backtest fill model incorporates:

Bid-ask spread slippage: You always cross the spread
Volume-based market impact: Using the Almgren-Chriss square-root impact model $\Delta P = \sigma \cdot \sqrt{\frac{Q}{V}}$ — your own orders move prices against you
Fill probability for limit orders: Not every limit order fills
Execution latency: Log-normal distribution calibrated to actual measured latencies

Pre-Production Calibration

Before any strategy receives allocation, it runs in parallel against real-time market data, with its modeled P&L continuously benchmarked against what live execution would have produced. The system tracks drift across four dimensions — daily P&L, Sharpe ratio, fill rates, and average slippage — and flags any material divergence for review.

Staged Capital Deployment

Capital allocation follows a disciplined progression: validation → calibration → limited allocation → full allocation. Each stage has explicit quantitative graduation criteria. Strategies don't advance because a backtest looked good; they advance because real-time execution matches model expectations within tolerance.

The AI Layer: One Input Among Hundreds

Here's where it gets interesting — and where I disagree with most of the "AI trading" discourse.

LLMs Are a Feature Source, Not the Moat

One of the most important framing decisions in Dnalyaw: LLMs produce features, they do not produce alpha. Everyone feeding OHLCV data into GPT and asking for buy/sell signals is doing the same crowded thing — the alpha is near zero, and the approach is trivially copyable. Any edge a language model delivers vanishes the moment it lives on an API endpoint anyone else can call.

Where language models earn their keep is turning unstructured data into numeric features that join the same feature store as everything else. 10-K filings become lawsuit-risk scores. Earnings call transcripts become tone-shift vectors. Insider filings become sentiment numerics. Management turnover becomes a categorical feature. These take their place in the feature matrix alongside price-based, volume-based, microstructure, and alt-data features — no privileged access to the order path, no narrative judgment calls.

This is why the moat isn't the LLM. The moat is the pipeline that has hundreds of features from dozens of heterogeneous sources converging into a single research-to-execution loop. An LLM on its own is a commodity. An LLM as one of many feature producers inside a vertically integrated system is leverage.

In Dnalyaw, the LLM and agent layer is orchestrated through Shannon — our multi-agent platform — with strict architectural boundaries:

Kocoro-lab/Shannon

Production-Grade Multi-Agent Platform - Built with Rust, Go, and Python for deterministic execution, budget enforcement, and enterprise-grade observability.

View on GitHub

What LLMs and agents produce: Numericized features from unstructured sources — filings, calls, news, alt-data, transcripts

What LLMs and agents cannot do: Generate orders, size positions, modify risk parameters, override stop-losses, or touch the execution path in any way

This isn't policy — it's architecture. LLM outputs are just additional columns in a large feature matrix. The deterministic quantitative pipeline — not the language model — decides what the portfolio holds.

RL for Trading Actions

Signal generation — alpha factors, technical indicators, LLM sentiment scores — comes from the quant research layer. The RL agent's job is different: it consumes those signals as part of its state and decides how to act on them. The pipeline formulates this as a Markov Decision Process:

State: Market regime (VIX, trend indicators), portfolio state (leverage, drawdown, Sharpe), per-instrument features (price, volatility, RSI, sentiment scores from LLM), pairs features (spread z-score, cointegration)
Action: Target portfolio weights — what the portfolio should look like, not individual orders
Reward: A weighted combination that balances profit with risk discipline:

R = 0.6 \cdot \text{PnL} + 0.2 \cdot \text{Sharpe} - 0.1 \cdot \text{Constraint Penalty} - 0.1 \cdot \text{Transaction Costs}

The RL agent outputs TargetPortfolios, and the execution layer figures out how to get from current positions to those targets optimally — handling order routing, partial fills, and settlement. This separation means the RL agent focuses purely on the decision: given these signals and this portfolio state, what should I hold?

We're exploring both PPO (stable for non-stationary markets) and SAC (sample-efficient when data is expensive), with rolling retraining every 30 days to handle regime changes.

TensorLogic: The Reasoning Engine Behind RL

The RL pipeline above needs more than just pattern matching — it needs to reason about market structure. This is where TensorLogic comes in.

Kocoro-lab/tensorlogic

A Python implementation of Tensor Logic - a unified programming language for AI that combines neural and symbolic reasoning through tensor equations.

View on GitHub

TensorLogic is a framework I built based on Pedro Domingos' paper "Tensor Logic: The Language of AI". It unifies symbolic reasoning and neural learning into a single computational substrate — tensor equations. A logical rule like $\text{grandparent}(X,Z) \leftarrow \text{parent}(X,Y) \land \text{parent}(Y,Z)$ becomes a matrix multiplication: $\text{parent} \otimes \text{parent}$ . Everything — logic, attention, composition — reduces to the same tensor operations.

Why does this matter for quant trading? Traditional RL agents are pure neural — they learn patterns from data but can't enforce logical constraints or explain their reasoning. Pure rule-based systems are brittle and can't adapt. TensorLogic gives us both:

Boolean mode for hard trading constraints: Market regime rules, position limits, and risk boundaries are expressed as strict logical constraints — no hallucination, no approximation. When the system says "this violates dollar neutrality," it's provably correct.

Continuous mode for learning**: Relations between instruments, sectors, and macro factors become learnable embedding matrices. The scoring function $s(h, r, t) = \mathbf{h}^\top \mathbf{R} \cdot \mathbf{t}$ lets the model discover latent relationships — like which sector rotations predict which, or how a VIX regime shift propagates through correlated pairs — without being told these patterns exist.

The dual-mode architecture maps directly to Dnalyaw's latency hierarchy. During training, the RL agent operates in continuous mode — learning portfolio weight policies through gradient descent on the reward function. During live inference, critical constraints switch to Boolean mode — guaranteeing that no learned policy can violate risk limits, regardless of what the neural components suggest.

The most powerful capability is predicate invention via RESCAL tensor factorization. Feed TensorLogic a knowledge graph of instrument relationships, macro indicators, and historical regime labels, and it autonomously discovers latent factors — concepts like "momentum cluster" or "liquidity regime" that weren't explicitly defined. These invented predicates become first-class features for the RL state space, expanding what the agent can reason about beyond human-defined indicators.

This is fundamentally different from throwing OHLCV data at a neural network and hoping it learns something. TensorLogic provides structured, composable, explainable reasoning — and at 10-500KB model size with training in seconds, it's orders of magnitude more efficient than LLM-based approaches.

Agent-Assisted Factor Mining

One of the more experimental pipelines: an agent proposes factor hypotheses (e.g., "companies with increasing R&D spend relative to revenue outperform over 6 months"), a backtesting validator tests them against historical data, and a decay checker uses AST similarity analysis to detect whether the proposed factor is too close to known crowded factors. If the originality score falls below 0.7, the candidate is rejected before it ever enters the feature store.

The agent here is a hypothesis generator, not a signal — every proposed factor still has to pay its way through backtesting, calibration, and the staged-allocation pipeline like any other feature. This addresses alpha decay head-on: in efficient markets, any factor everyone discovers simultaneously loses its edge, so the system explicitly checks for novelty before accepting a new column into the feature matrix.

Observability: Built In, Not Bolted On

Every component in Dnalyaw includes observability at implementation time. Not after. Not in the next sprint. In the same commit.

Three pillars:

Structured logging with correlation IDs that trace an order from signal generation through risk check, OMS, venue submission, to fill receipt
Metrics (OpenTelemetry → Prometheus) with latency histograms, order counters by status/venue/symbol, exposure gauges, P&L tracking
Audit trail — an immutable PostgreSQL table capturing every state change with timestamp, actor, and full payload. Every risk decision includes a snapshot of all metrics at decision time.

For real-time operations, I built a terminal UI (TUI) using Go's bubbletea library — tabs for positions, live quotes, order book, risk dashboard, alerts, and watchlists. No web framework. Most prop trading desks use Grafana + custom terminal tools, and there's a good reason: they're fast, reliable, and don't crash when WebSocket connections drop.

What I've Learned So Far

Building Dnalyaw has reinforced several beliefs:

Build execution first, strategies last. You cannot test strategies without reliable execution. Most quant projects start with the model and bolt on execution later. This is backwards.

Backtests always lie. Not because they're broken, but because they can't model everything. Slippage, borrow costs, market impact at scale, latency — these compound. Shadow trading is the antidote.

Alpha expectations should be honest. In efficient US markets, stat-arb pairs trading has a realistic 5-10% annual return ceiling after costs. A Sharpe above 1.0 is good. A Sharpe above 2.0 is suspicious (check your assumptions). 15-20% annually is excellent. If your backtest shows 50%+, something is wrong.

The edge isn't the model — it's the execution. Renaissance's Medallion fund reportedly spends more engineering effort on execution cost optimization than on alpha generation. At any AUM, the difference between a good fill and a bad fill compounds dramatically over thousands of trades.

Risk engineering is more important than alpha. A mediocre strategy with bulletproof risk management survives to trade another day. A brilliant strategy with bad risk management eventually blows up. The asymmetry is absolute.

What's Next

Dnalyaw runs daily across US equities (NYSE/NASDAQ) and Hong Kong equities (HKEX), with dual-venue execution in Asia for independent fill-quality benchmarking. The current investment areas:

Feature store scaling — hundreds of features today, with a pipeline designed to ingest thousands, all with point-in-time correctness to prevent lookahead bias
RL training pipeline extensions — moving beyond the initial PPO/SAC baselines into calibrated environment simulators that reflect regime shifts
China A-shares expansion — 80% retail participation means dramatically higher alpha opportunity compared to institutional-dominated US markets, and the infrastructure is ready to extend via Stock Connect
Decision Transformer research track — return-conditioned policy generation where a target Sharpe is specified and the model generates actions to reach it
TCA as a first-class research activity — every fill feeds back into execution calibration, making transaction-cost measurement as important as signal research

The real test isn't the backtest. It's how the full loop — research, risk, execution, TCA — compounds across thousands of live trades, day after day, across global markets.

This is part of my series on AI quantitative trading. The previous post, AI Quantitative Trading: From Models to Quant Funds, covers the conceptual foundation. Follow me for updates as the system evolves.