design-note · 2026-04-17
paisamaker-dual-lane
Subject: execution infrastructure for automated 0DTE options trading. Started March 30, 2026 · IBKR paper account · active.
The problem
Automate 0DTE SPX options trading against two independent signal sources (a custom GEX evaluator and a separate advisor service) without blowing up when one part of the pipeline misbehaves. The system runs 24/7 on a Hetzner VM, polls a gamma-data vendor every 30 seconds, evaluates ~100 signals per session, and executes 3–5 trades per day through an IBKR paper account.
The decision I almost made
Single-threaded order submission with retries. One loop: read signals → pick strike → submit order → wait for fill → repeat. Simple. Testable. Obvious.
Why I killed it
In pre-production testing I observed a ~6-second reconnect window on the IBKR gateway. During that window, a synchronous retry loop blocks the entire pipeline, so the next signal doesn't even get evaluated, let alone acted on. On a 30-second cadence, a 6-second reconnect costs 20% of one evaluation window (6 of 30 seconds). On a 0DTE product, where the move happens in the last hour, that's unacceptable.
I also realized the two signal sources needed different safety semantics. The GEX evaluator fires on regime flips — high frequency, noisy, needs tight per-ticker cooldowns. The advisor service fires on CASCADE_WATCH / CHARM_SQUEEZE / STRUCTURE_BREAK alerts — lower frequency, higher conviction, needs watermark-based deduplication. One loop couldn't do both.
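The two safety semantics can be sketched side by side. This is a minimal illustration, not the project's code: the class names, the 120-second cooldown, and the idea that advisor alerts carry a monotonically increasing integer id are all assumptions made for the example.

```python
class CooldownGate:
    """Per-ticker cooldown: drop signals that arrive too soon after the last accepted one."""

    def __init__(self, cooldown_s: float) -> None:
        self.cooldown_s = cooldown_s  # hypothetical value; the note does not give one
        self._last_accepted: dict[str, float] = {}

    def allow(self, ticker: str, now: float) -> bool:
        last = self._last_accepted.get(ticker)
        if last is not None and now - last < self.cooldown_s:
            return False  # still cooling down; a rejected signal does not reset the clock
        self._last_accepted[ticker] = now
        return True


class WatermarkGate:
    """Watermark dedup: accept an alert only if its id is above the highest id already seen."""

    def __init__(self) -> None:
        # Keyed by (ticker, alert_type); assumes alert ids only ever increase.
        self._high_water: dict[tuple[str, str], int] = {}

    def allow(self, ticker: str, alert_type: str, alert_id: int) -> bool:
        key = (ticker, alert_type)
        if alert_id <= self._high_water.get(key, -1):
            return False  # replayed or already-processed alert
        self._high_water[key] = alert_id
        return True
```

The cooldown gate is time-based and forgets content, which suits a noisy source. The watermark gate is content-based and ignores time, which suits a source that may redeliver the same alert. That difference is why a single shared gate fits neither lane well.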
What I built instead
- Dual-lane architecture. Separate ingestion paths for gexwatch and advisor. Per-source, per-ticker gates. One lane failing doesn't block the other. Unified execution dispatcher downstream, but upstream they don't know about each other.
- IBKR mode state machine. Four states: DATA_ONLY → DEGRADED → WARMUP → FULLY_LIVE. Trades only fire in FULLY_LIVE. After a reconnect, 60 seconds of stability required before returning to FULLY_LIVE. No synchronous retries. The system trades less during instability — by design.
- Fail-closed by default. 20+ safety gates, each returning a specific reject code: kill switch (3 levels), spread gate (rejects bid-ask spreads > 30%), bid validation (rejects orders into empty markets), contract validation (0DTE expiry, trading class, right/direction match), paper-port fail-closed (rejects if port isn't 4002), EOD flatten (force close at 15:45 ET, with MKT escalation starting at 15:35), max hold time (30–90 min by ticker), 3-tier exit escalation (LMT@mid → LMT@bid → MKT).
- Position deduplication with an atomic UNIQUE index. Two fast signals on the same contract can't race to create duplicate positions. The DB rejects the second one before IBKR sees it.
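The mode state machine from the list above can be sketched as follows. The four state names and the 60-second stability requirement come from this note. The transition triggers (`on_connect`, `on_disconnect`, `tick`) and the choice to start in DATA_ONLY are assumptions for illustration only.

```python
from enum import Enum


class Mode(Enum):
    DATA_ONLY = "DATA_ONLY"
    DEGRADED = "DEGRADED"
    WARMUP = "WARMUP"
    FULLY_LIVE = "FULLY_LIVE"


STABILITY_S = 60.0  # from the note: 60 seconds of stability before FULLY_LIVE


class ModeMachine:
    """Hypothetical transitions: disconnect -> DEGRADED, connect -> WARMUP, 60 s stable -> FULLY_LIVE."""

    def __init__(self) -> None:
        self.mode = Mode.DATA_ONLY  # assumed initial state
        self._connected_since: float | None = None

    def on_disconnect(self) -> None:
        self.mode = Mode.DEGRADED
        self._connected_since = None

    def on_connect(self, now: float) -> None:
        # Every (re)connect restarts the stability clock.
        self.mode = Mode.WARMUP
        self._connected_since = now

    def tick(self, now: float) -> None:
        # Called periodically; promotes WARMUP to FULLY_LIVE once the link has been stable long enough.
        if (
            self.mode is Mode.WARMUP
            and self._connected_since is not None
            and now - self._connected_since >= STABILITY_S
        ):
            self.mode = Mode.FULLY_LIVE

    def can_trade(self) -> bool:
        # Trades only fire in FULLY_LIVE; every other state is data-only.
        return self.mode is Mode.FULLY_LIVE
```

Nothing in this sketch waits or retries. A reconnect only changes state, and the dispatcher checks `can_trade()` before each order, so signal evaluation keeps running while execution is paused.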
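The deduplication guarantee can be illustrated with SQLite directly. This is a sketch under assumptions: the table layout, the `contract_key` format, and the use of a partial index (unique only among OPEN rows) are hypothetical, chosen to show how the database can reject the second insert atomically.

```python
import sqlite3


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """
        CREATE TABLE positions (
            id INTEGER PRIMARY KEY,
            contract_key TEXT NOT NULL,  -- hypothetical: symbol/expiry/strike/right
            status TEXT NOT NULL         -- 'OPEN' or 'CLOSED'
        )
        """
    )
    # Partial unique index: at most one OPEN row per contract at any time.
    conn.execute(
        "CREATE UNIQUE INDEX one_open_per_contract "
        "ON positions (contract_key) WHERE status = 'OPEN'"
    )
    return conn


def try_open_position(conn: sqlite3.Connection, contract_key: str) -> bool:
    """Return True if this caller won the insert, False if an open position already exists."""
    try:
        with conn:  # commits on success, rolls back on exception
            conn.execute(
                "INSERT INTO positions (contract_key, status) VALUES (?, 'OPEN')",
                (contract_key,),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the index rejected the duplicate; no order should be sent
```

Inserting first and submitting the order only when the insert succeeds avoids the check-then-act race that a `SELECT` followed by an `INSERT` would leave open.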
What I'm watching for
- Fill rate at market open (currently ~33%, improving). The 30% spread gate is aggressive during the first 15 minutes, when bid-ask spreads are wide. Tuning it means balancing fill rate against slippage.
- Kill-switch trigger rate. L1 (halt entries) should fire occasionally on abnormal volatility. L2 (flatten all) should never fire in normal operation. If L2 ever fires, something upstream is broken.
- False rejections from the spread gate. I'm logging every rejection with full context. If a real tradable contract is being rejected as too wide, the gate threshold needs revisiting.
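To make the spread-gate trade-off concrete, here is a minimal sketch of a fail-closed quote check. The 30% threshold is from this note. Measuring the spread relative to the mid price, the function name, and the reject-code strings are assumptions for illustration.

```python
from typing import Optional

MAX_SPREAD_PCT = 0.30  # 30% threshold from the note; measured against mid here (an assumption)


def check_quote(bid: float, ask: float) -> Optional[str]:
    """Return a hypothetical reject code, or None if the quote passes.

    Comparisons are written as `not (x > y)` so that NaN quotes fail closed:
    any comparison with NaN is False, which lands in the reject branch.
    """
    if not (bid > 0):
        return "REJECT_NO_BID"  # empty market or missing bid
    if not (ask >= bid):
        return "REJECT_BAD_QUOTE"  # crossed or missing ask
    mid = (bid + ask) / 2
    spread_pct = (ask - bid) / mid
    if not (spread_pct <= MAX_SPREAD_PCT):
        return "REJECT_SPREAD_TOO_WIDE"
    return None
```

Under this definition a 1.00 x 1.20 quote is about 18% wide and passes, while a 0.10 x 0.20 quote is about 67% wide and is rejected. Logging `bid`, `ask`, and `spread_pct` with each reject code gives the context needed to judge whether the threshold is too tight.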
What Week 1 proved
All 20+ safety gates held. Zero manual interventions. The system survived one IBKR reconnect event cleanly — no phantom orders, no stuck positions. Best single trade was +$410 on an SPX 0DTE.
What this tests is the infrastructure. Whether the strategy generates positive expected value is a separate, longer-run question — at least 6–8 weeks of paper data before that conversation is honest.
Stack
Python 3.12 · ib_insync · asyncio · SQLite (WAL) · GEXBot REST API · systemd + Docker · GitHub Actions CI · Discord webhooks · YAML config (56 tunable params) · 1,327 tests · 18,387 lines · 20+ safety gates