In any trading system, the single most safety-critical operation is the one that closes everything and returns you to cash. "Flatten all positions" is the emergency brake — the last line of defense when a signal misfires, a venue throws errors, or a risk limit is breached. It should be the most reliable button in the entire platform.
It sounds simple. In practice, it's one of the hardest correctness problems in execution engineering.
This post is about why — and about the architectural pattern we built into Dnalyaw after a close call on one of our brokerage integrations. Dnalyaw runs on a deliberately hybrid venue footprint — Interactive Brokers (the quant-native institutional standard used by most systematic funds) alongside Futu HK (Asia-focused, faster to iterate on, closer to retail-and-prop flow). Whatever pattern we land on has to work identically across both. The specific shape of the problem is trading-flavored. The underlying lesson — that verification layers need explicit state models, not timeouts — generalizes far beyond markets.
Why Flatten Is Harder Than It Looks
The intuition is disarmingly simple: close everything, return to cash, stop the bleeding. Three forces conspire against anything that simple:
- Fills are asynchronous. You submit a close. The venue acknowledges. A fill arrives. A commission arrives seconds later on a separate callback. "Did it fill completely?" has no synchronous answer.
- Your internal view of positions can lag reality. Most trading systems keep an in-memory snapshot of what you own, updated from venue callbacks. If a callback is delayed or dropped, the snapshot reports positions that are already closed — or misses positions that are already open.
- Verification is itself an action. If you check "did it flatten?" by re-reading positions, you're trusting the same data source that may have been the cause of the problem in the first place.
These compound. A naive flatten designed to be robust can cascade into exactly the failure it was meant to prevent:
The strategy sends flatten. The order manager sends the close. The venue acknowledges. The verifier reads positions. The ledger — which hasn't processed the venue callback yet — still reflects the pre-flatten state. The verifier concludes "didn't flatten" and fires a second close. Moments later the venue callback lands, the ledger updates, and the second close fills against a now-correct state. You've over-closed.
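The race reads almost innocuously in code. Here is a hedged reconstruction of the anti-pattern; all names (`StaleLedger`, `Venue`, `naive_flatten`) and the one-callback lag are illustrative assumptions, not Dnalyaw's actual API:

```python
# Hedged reconstruction of the anti-pattern: a verifier that re-reads the
# same ledger the flatten was built from, and retries on a stale read.

class StaleLedger:
    """In-memory position snapshot that lags the venue by one callback."""
    def __init__(self, positions):
        self.positions = dict(positions)  # symbol -> qty, pre-flatten view
        self.pending_callback = None      # venue update not yet applied

    def read(self):
        return dict(self.positions)       # possibly stale

    def apply_callback(self):
        if self.pending_callback is not None:
            self.positions.update(self.pending_callback)
            self.pending_callback = None

class Venue:
    def __init__(self, ledger):
        self.ledger = ledger
        self.closes_sent = []

    def close(self, symbol, qty):
        self.closes_sent.append((symbol, qty))
        # The fill happens at the venue immediately, but the ledger only
        # learns about it via a delayed callback.
        self.ledger.pending_callback = {symbol: 0}

def naive_flatten(ledger, venue):
    """Close everything, then 'verify' by re-reading the same ledger."""
    for symbol, qty in ledger.read().items():
        if qty:
            venue.close(symbol, qty)
    for symbol, qty in ledger.read().items():  # stale read: looks unflattened
        if qty:
            venue.close(symbol, qty)           # retry fires: over-close

ledger = StaleLedger({"AAPL": 100})
venue = Venue(ledger)
naive_flatten(ledger, venue)
ledger.apply_callback()  # the callback finally lands, too late
# venue.closes_sent now holds two closes for the same position
```

The second loop is the whole bug: it consults a source that, by construction, cannot yet reflect what the first loop just did.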
One of our venues — Futu HK — traced a close-call incident to exactly this race. The underlying hazard is venue-agnostic: every broker callback path has its own timing profile and its own ways for the internal ledger to lag reality. Interactive Brokers, Futu, any venue we add next — the abstraction has to hold or the rest of the stack sits on sand. Futu HK was simply where our specific combination of conditions exposed the problem first. The system functioned as designed. The design was wrong.
Layer 1: Stop Trying to Be Clever
The immediate fix was to remove the retry. If the post-flatten check came back inconsistent with expectations, the system stopped taking further action. It kept the check as a diagnostic log but returned a loud failure instead of another order.
This made the system safe against the specific race. It also made flatten less resilient: any transient condition — partial fill, late acknowledgement, momentary connectivity blip — now produced a loud failure, because the verifier had nothing else to do but wait and re-read a source it could not trust.
We had traded silent incorrectness for loud fragility. An acceptable trade for a week. Not a long-term design.
Layer 2: Verify the Intent, Not the Ledger
The key insight: stop reading the position ledger to verify what the order manager just did.
The order manager already knows what it submitted. The order manager already receives fill callbacks. The system of record for every submission and every fill — the authoritative log of trade intents and their outcomes — is the right source of truth. That log is the thing that produces the fill events; it cannot be stale relative to itself.
Layer 2 replaces the position-ledger diagnostic with an intent-lifecycle convergence check sourced from the authoritative log.
The verifier polls the order manager at sub-second intervals, within a few-second budget. Each poll returns a per-symbol lifecycle summary: what was submitted, what's filled, whether the order has reached a terminal state, and the reason if it ended without fully completing.
Convergence is defined precisely: every intent must reach a terminal state and be completely filled. An empty result is accepted — no orders means nothing to flatten. Any partial fill, any non-terminal intent, any terminal intent with an underfill, fails loud at the budget boundary.
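The check above can be sketched in a few lines. The names (`IntentSummary`, `converged`, `verify_flatten`) and the budget values are illustrative assumptions, not Dnalyaw's real schema or constants:

```python
# Minimal sketch of the intent-lifecycle convergence check.
from dataclasses import dataclass
import time

@dataclass(frozen=True)
class IntentSummary:
    symbol: str
    submitted_qty: int
    filled_qty: int
    state: str        # lifecycle state from the authoritative log
    reason: str = ""  # populated when an intent ends without completing

def converged(intents):
    """Pure acceptance predicate: every intent terminal AND fully filled.
    An empty set converges: no orders means nothing to flatten."""
    return all(i.state == "FILLED" and i.filled_qty == i.submitted_qty
               for i in intents)

def verify_flatten(poll, budget_s=3.0, interval_s=0.25,
                   clock=time.monotonic, sleep=time.sleep):
    """Poll the order manager's log until convergence or budget exhaustion.
    Observation only: there is no submit call on any path."""
    deadline = clock() + budget_s
    while True:
        if converged(poll()):
            return True               # accepted
        if clock() >= deadline:
            return False              # fail loud at the budget boundary
        sleep(interval_s)
```

Note that `converged` closes over nothing and touches no ledger: it is a pure function of what the authoritative log reports, which is what makes its verdict trustworthy.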
There is no second submit. Ever. That invariant is permanent, named in the regression-test suite, and re-asserted on every commit.
The Intent State Machine
The verifier's decision is a pure function of intent state, computed over a small, closed set of lifecycle states.
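That state space can be sketched as a closed enum plus a pure classifier. The state names and the `Decision` enum are illustrative assumptions, not the real schema:

```python
# Sketch of the intent state space and the verifier's classification over it.
from enum import Enum, auto

class IntentState(Enum):
    SUBMITTED = auto()         # acknowledged, no fills yet
    PARTIALLY_FILLED = auto()  # some fills, not done
    FILLED = auto()            # terminal: success
    CANCELLED = auto()         # terminal: failed
    REJECTED = auto()          # terminal: failed, carries a reason
    FAILED = auto()            # terminal: failed, venue or infra error

# The set of terminal states is declared once, in one place.
TERMINAL = {IntentState.FILLED, IntentState.CANCELLED,
            IntentState.REJECTED, IntentState.FAILED}

class Decision(Enum):
    ACCEPT = auto()        # fully filled
    KEEP_POLLING = auto()  # non-terminal: stay in the loop
    FAIL_LOUD = auto()     # terminal but wrong: surface the reason

def classify(state, filled_qty, submitted_qty):
    """The verifier's whole decision, as a pure function of intent state."""
    if state is IntentState.FILLED and filled_qty == submitted_qty:
        return Decision.ACCEPT
    if state in TERMINAL:
        return Decision.FAIL_LOUD  # cancelled, rejected, or underfilled
    return Decision.KEEP_POLLING
```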
Only fully filled accepts. Everything else either stays in the poll loop until it reaches a terminal state, or times out and fails loud. Cancelled and rejected intents are terminal and failed — they bubble up with a rejection reason so the operator can see immediately why flatten didn't complete.
Three different terminal failure modes, three different signatures, three different operator responses. The set of terminal states is declared in one place; no case is hiding in the code.
Hard Safeties
Two invariants carry through every revision of this layer:
- No resubmit. A stale read, a partial fill, a rejection — none of these trigger another close. The verifier's job is to observe, never to re-drive.
- No position-ledger read in the flatten path. The lying source is not consulted. If convergence can't be established from the authoritative log within budget, flatten fails loud and waits for a human.
Safety-critical systems need hardcoded invariants you can name. Configurable limits get reconfigured at 2 AM, when a position is bleeding and the operator is tired. Constants don't.
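Concretely, that means the budgets live as named module-level constants rather than configuration keys, so there is nothing for a tired operator to loosen. The values below are illustrative, not Dnalyaw's real numbers:

```python
# Named, hardcoded invariants: the point is the absence of a config lookup.
# Nothing here can be "temporarily" raised during an incident.
FLATTEN_VERIFY_BUDGET_S = 3.0    # total budget to establish convergence
FLATTEN_POLL_INTERVAL_S = 0.25   # spacing between authoritative-log polls
FLATTEN_MAX_RESUBMITS = 0        # the named invariant: the verifier never re-drives
```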
The Gap We Didn't Close
One failure mode this design explicitly does not fix: if the position ledger was already stale at the moment the flatten request was first assembled — reporting no open positions when the venue actually held some — the flatten plan is built from that stale input. The order manager submits nothing. The verifier sees an empty intent set and accepts. The real position stays open.
This gap is honest. Detecting it requires an independent venue-position source, cross-checked against the strategy's own view before flatten is even submitted — a separate workstream with its own design document and its own tests.
We wrote this gap into the design doc rather than hiding it. A verifier that silently papers over a different class of bug is exactly what we just ripped out.
The Test Surface
We don't pretend unit tests substitute for production experience, but they catch the regressions. Eight lifecycle scenarios, plus an integration test against a real database with a simulated venue, cover the shape of every failure mode the design acknowledges:
- Convergence on the first poll (happy path)
- Convergence on a later poll (partial → full transition)
- Persistent partial fill until timeout
- Rejected intent with a surfaced reason
- Failed intent covered by the same terminal predicate
- Empty intent set (idempotent no-op)
- Mid-flight cancellation
- Database error propagation
Plus one named invariant test: the verifier never resubmits, regardless of what verification input it sees. Every commit to this layer is gated on that invariant.
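The shape of that invariant test can be sketched with a spy order manager that records every submit call; the assertion is that the count stays zero no matter what the verifier observes. All names here are assumptions for illustration, and the toy verifier collapses the real lifecycle down to bare state strings:

```python
# Sketch of the named never-resubmit invariant test.

class SpyOrderManager:
    """Serves canned poll results and records any submit call."""
    def __init__(self, poll_results):
        self._results = list(poll_results)  # non-empty list of poll outputs
        self.submits = []

    def poll(self):
        # Return the next scripted result; repeat the last one when exhausted.
        if len(self._results) > 1:
            return self._results.pop(0)
        return self._results[0]

    def submit(self, *args, **kwargs):
        self.submits.append((args, kwargs))

def verify_flatten(om, max_polls=3):
    """Observation only: classify and return, never re-drive."""
    for _ in range(max_polls):
        if all(state == "FILLED" for state in om.poll()):
            return True
    return False  # fail loud; no resubmit on any path

# Adversarial inputs: stalled, rejected, empty, persistently partial.
for scripted in ([["WORKING"]], [["REJECTED"]], [[]], [["PARTIAL"], ["PARTIAL"]]):
    om = SpyOrderManager(scripted)
    verify_flatten(om)
    assert om.submits == []  # the invariant: zero submissions, regardless
```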
The cancellation test is the subtle one. A naive poll loop sleeps between polls; when an operator issues a cancel, the loop won't notice until the next sleep wakes up. Responsive cancellation requires the sleep itself to be interruptible — a requirement that becomes obvious the first time an operator at 2 AM has to sit and watch an unresponsive flatten they just tried to abort.
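One way to get an interruptible sleep in Python is `threading.Event.wait`, which returns as soon as the event is set instead of waiting out its timeout. A minimal sketch, with the `PollLoop` class and its names as illustrative assumptions:

```python
# Interruptible poll sleep via threading.Event: cancel() wakes the loop
# immediately rather than at the next sleep boundary.
import threading

class PollLoop:
    def __init__(self, interval_s=0.25):
        self.interval_s = interval_s
        self._cancelled = threading.Event()

    def cancel(self):
        self._cancelled.set()  # wakes any in-flight wait() immediately

    def run(self, poll, budget_polls=12):
        for _ in range(budget_polls):
            if self._cancelled.is_set():
                return "cancelled"
            if poll():
                return "converged"
            # Event.wait doubles as an interruptible sleep: it returns True
            # the moment cancel() sets the event, False on a normal timeout.
            if self._cancelled.wait(self.interval_s):
                return "cancelled"
        return "timeout"
```

A plain `time.sleep(interval_s)` in the same spot is the naive version: a `cancel()` from another thread would have to wait out the remainder of the current sleep before the loop noticed.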
The Lesson That Generalizes
The specific architecture — authoritative log as truth, lifecycle convergence as the acceptance predicate, explicit terminal states — is Dnalyaw-specific. The generalizable principle is not.
Verification layers need explicit state models, not timeouts.
A timeout tells you the system isn't in the expected state. It tells you nothing about why. Was the fill slow? Did the order get rejected? Did a callback get dropped? Each of those answers demands a different response — and in the absence of a state model the natural next move is to retry, which is precisely the move that creates cascading failures.
An explicit state model — every intent has a well-defined lifecycle, convergence is a pure function over that lifecycle, failure modes have names and rejection reasons — lets the verifier distinguish "still in progress" from "terminal but wrong" from "rejected by risk." Different failures, different responses. And zero retries, because observation is not driving.
The retry loop was the proximate cause of the incident. The missing state model is why the retry loop existed at all.
If you're auditing a verifier in any system — trading, distributed jobs, long-running workflows — the first invariant to check is whether it resubmits. Observation that can retry is not observation; it's hidden control. Find that line, and you've found the one piece of the system most likely to turn a transient anomaly into a correlated one.
Why This Matters Beyond Flatten
For readers evaluating quantitative trading platforms — as investors, quants, or operators — this kind of work is where the edge lives. The research layer is visible; everyone sees the strategy. The execution layer is where correctness gets earned, one invariant at a time, mostly out of sight.
That earning is amplified when the platform runs across a hybrid venue footprint. IB is the quant-native institutional standard; Futu HK is the Asia-focused bridge; the two have different API philosophies, different callback timings, and different failure modes. One verifier that works identically on both is a stronger claim than one that works on either alone — it means the abstraction is load-bearing, not accidental. Every new venue we bring online gets plugged into the same verification gate; no new code paths, no new ways for flatten to silently fail.
Dnalyaw was built around the thesis that vertical integration of research and execution is the real moat in modern quant. Flatten verifiers are a small window into that thesis. The same design discipline — hardcoded invariants, explicit state models, honest residual-risk documentation — runs through the risk engine, the venue adapters, the backtest-to-live calibration pipeline, and the multi-region execution footprint.
Flatten is where correctness matters most. Everything else compounds from that foundation.