Blog

May 5, 2026

DeepSeek V4 and Manifold Tearing

Why training-time loss spikes are not optimization failures but geometric tears, and how DeepSeek V4's three defenses turn the "manifold prior" from an abstract stance into engineering evidence.

April 25, 2026

Transformer Architecture Book — English Edition Now Online

The complete English edition of my Transformer book is now online — 32 chapters and 3 appendices that walk through every component of the Transformer, from tokenization through attention to a from-scratch implementation, and on to RLHF, Mixture-of-Experts, reasoning models, and post-Transformer architectures.

April 24, 2026

The Four Realms of Neural Networks

A cultivation-novel reading of deep learning — PDE solvers, manifold geometry, gauge fields, and quantum attention — and why the last realm tells us AGI must be personal.

April 20, 2026

The Backtest-to-Live Gap Is a Cost Model Problem

Your backtest says +2.1% per month. Live says +1.4%. The culprit is almost never signal decay — it is the cost model. A walk through the three layers of the backtest-to-live gap across a hybrid IB + Futu HK footprint, why each one silently eats returns, and what it takes to close them.

April 19, 2026

Mid-Turn Checkpointing in a Long-Running Agent Loop

An agent turn can span 20 tool calls and 10 minutes. If the daemon dies at minute 9, naive designs throw everything away. A walk through why "just retry the turn" is wrong, the phase state machine that replaced it in Kocoro, and the checkpoint discipline that survives a SIGKILL mid-turn.

April 16, 2026

Byte-Stability Tests for Prompt Caching

Prompt caching offers 90% cost savings when it works and 0% when it silently breaks. Why prompts are fragile in ways unit tests do not normally catch, and the testing discipline we built into Shannon to keep cache hit rates stable as the codebase evolves.

April 14, 2026

Flatten Verifiers: When Your 'Flatten All' Order Doesn't Actually Flatten

The single most safety-critical operation in any trading system is also one of the hardest correctness problems in execution engineering. A close-call incident, the race condition behind it, and the architectural pattern that replaced naive "close everything" logic with something correct by construction.

April 4, 2026

The AI Agent Harness: How Kocoro Evolved After Claude Code Went Public

I built Kocoro — a Go agent runtime with tool dispatch, permissions, context management, and loop detection — before Claude Code went open source. When their architecture became public, the convergences were striking. Here is what we independently arrived at, what I learned from their caching practices, and what I think defines a production harness.

March 1, 2026

What Claude Code Learned About Multi-Agent Tool Design (and What Shannon Already Did)

An Anthropic engineer described Claude Code's evolution from simple todo lists to dependency-aware task graphs — the same patterns Shannon independently arrived at. Five lessons mapped, including where Shannon needs to catch up.

February 17, 2026

Information Theory Is All You Need (to Understand LLMs)

Shannon laid the foundation in 1948. Seventy-eight years later, his framework still explains why Transformers work, what cross-entropy loss actually means, and why your model can never be smarter than its training data.

February 14, 2026

Dnalyaw: Engineering an AI Quant Trading System from Scratch

Why vertical integration — research and execution inside a single unified pipeline — is the real moat in quant trading, and how Dnalyaw builds it with hundreds of features, disciplined risk, and a polyglot Rust/Go/Python execution core across global markets.

January 11, 2026

My Book on Building Production AI Agents

A practical guide to building production-grade AI agent systems. Covers single-agent design, multi-agent orchestration, MCP, Computer Use, cost control, and enterprise deployment patterns.

January 11, 2026

My Book on Transformers and LLM Architecture

A deep dive into every component of the Transformer architecture, from tokenization to attention mechanisms and from forward propagation to code implementation. For developers who want to truly understand how GPT and ChatGPT work.

January 6, 2026

2025 Year-In-Review and 2026 Prediction

Reflecting on how 2025 normalized agent workflows and reasoning models, and why 2026 feels less like a prediction and more like a state you can already opt into.

October 21, 2025

Tensor Logic: A Brain‑Like Architecture

Bridging the gap between logic and learning—how tensor equations create AI systems that think both symbolically and intuitively.

October 7, 2025

Shannon: Designing a Production-Grade Multi-Agent Platform

An architectural deep-dive into Shannon, a self-hosted multi-agent platform that addresses the three hardest problems in production AI: runaway costs, non-deterministic failures, and security vulnerabilities—through deliberate technology choices in Rust, Go, and Python.

April 29, 2025

AI Quantitative Trading: From Models to Quant Funds

Demystifying quantitative trading for AI practitioners — what quant funds actually do, why reinforcement learning fits markets better than LLMs, and where the real barriers lie.

February 25, 2025

From RNNs to LLMs: A Decade of Simplicity and Transformation

Reflecting on Andrej Karpathy’s 2015 RNN post and the surprising evolution of LLMs with transformers and fine-tuning.

February 22, 2024

Transformer Architecture Explained: A Comprehensive Review

A step-by-step walkthrough of the Transformer architecture—from input embeddings and positional encoding to self-attention and the decoder-only GPT variant.