
ADR 0027: State DB Concurrency Model and Tiered Loss Window

Status: Accepted for architectural direction.

Date: 2026-05-15.

Context

The version 1 backend uses a writable SQLite state database (ADR 0010).

Phase P10 targets a single Fly.io machine serving 50–200 concurrent users.

In its default rollback-journal mode, SQLite allows many connections to read concurrently, but a writer takes an exclusive lock that blocks every reader (and any open reader blocks the writer), so colliding concurrent requests would surface as cascading SQLITE_BUSY errors.

SQLite supports WAL mode, in which one writer and many readers can use the database simultaneously (sqlite.org/wal.html).

modernc.org/sqlite is a pure-Go SQLite driver that accepts URI pragmas via the connection string, so WAL and busy_timeout can be set once at sql.Open time instead of through separate PRAGMA statements, which would apply only to the single connection that issued them.

The state DB lives on a single Fly volume; Fly takes daily volume snapshots and retains them for an operator-configurable window.

Hourly VACUUM INTO produces a consistent point-in-time backup file without blocking the live writer (WAL is required for the non-blocking property).

A horizontally scaled deployment would require either a remote write target (e.g., a server-class DB) or distributed coordination on top of SQLite; both fall outside V1’s scope.

Decision

The state DB will open with the URI pragma chain ?_pragma=journal_mode(WAL)&_pragma=busy_timeout(5000)&_pragma=synchronous(NORMAL)&_pragma=foreign_keys(ON).

The state DB connection pool will be sized SetMaxOpenConns(50) / SetMaxIdleConns(10) / SetConnMaxIdleTime(5m).

All state-mutating endpoints will route through a single WriteCoordinator goroutine with a buffered request channel (depth 64); the coordinator serializes every mutation against the same *sql.DB so concurrent mutators cannot collide on the WAL writer slot.

Reads bypass the coordinator and run on the connection pool directly, since WAL permits unlimited concurrent readers alongside one writer.

The WriteCoordinator will return ErrWriteQueueFull when the channel is at capacity and a 5-second enqueue deadline elapses; the API layer maps this to HTTP 503 + Retry-After: 5.

Hourly VACUUM INTO /data/snapshots/state-<utc-timestamp>.db will run in a BackupRunner goroutine; retention sweeps keep the most recent 168 files (one week).

Fly daily volume snapshots will be retained for 14 days as a second backup tier.

Litestream and other streaming replicators are explicitly rejected for V1; the single-machine + tiered-snapshot model is the V1 boundary.

The migration trigger out of single-machine SQLite into a server-class DB will be state_writes_per_second p99 > 50 sustained for 1 hour.

Tiered Loss Window

Tier 1 — process crash, volume intact: zero loss. WAL recovery on the next sql.Open replays committed frames.

Tier 2 — state DB corruption, volume intact: ≤1 hour loss. Restore from the newest /data/snapshots/state-*.db by copying it over state.sqlite, first deleting any stale state.sqlite-wal / state.sqlite-shm files so leftover WAL frames are not replayed against the restored file.

Tier 3 — volume catastrophic failure: ≤24 hour loss. Restore from the most recent Fly daily volume snapshot via fly volumes create --snapshot-id ....

Tier 4 — beyond Fly retention: loss window = (now − last manual offsite copy). Offsite copy is an operator chore documented in docs/operations/p10-deploy/.

CONTEXT D7 originally stated “≤1 hour loss” as a universal claim; this ADR supersedes that with the four-tier model above.

Consequences

The URI pragma chain supersedes the single per-connection PRAGMA foreign_keys = ON issued by the previous startup path. Because the chain is applied to every connection the pool opens, this fixes a latent bug in which pooled connections other than the first lacked foreign-key enforcement.

The single-writer model precludes horizontal scaling without re-architecting the state DB layer. A second machine cannot safely share the same state SQLite file.

The WriteCoordinator adds one goroutine-hop latency per mutation; at the V1 write rate (RESEARCH Q1 envelope: ≤50 writes/sec at 200 users) the additional cost is far below the SQLite commit latency it replaces.

The hourly snapshot interval bounds the worst-case data loss to 1 hour within Tier 2. Operators who want a tighter window can lower UWSCRAPE_STATE_SNAPSHOT_INTERVAL, at the cost of more frequent VACUUM INTO I/O and a shorter total retention span unless the 168-file count is raised to match.

The 168-file retention default consumes roughly 168 × the size of one state DB file on the volume; for V1 traffic this stays well under the 1 GB volume budget.

/api/v1/health returns 503 + Retry-After: 5 until the catalog runtime caches have warmed; Fly’s health check grace_period=30s covers warmup so the rolling deploy waits before sending traffic.

References

  • ADR 0010 (preserved): Go runtime with read-only index and separate state store.
  • ADR 0020 (superseded for production adapter choice): SvelteKit frontend application framework — V1 production now uses adapter-static embedded in the Go binary via embed.FS. See REQ-043 / REQ-046 amendments.
  • P10 CONTEXT D2 (write coordinator), D3 (Fly.io single machine), D4 (rolling restart), D5 (rate limits), D6 (logging), D7 (snapshots), D10 (single-binary frontend serving).
  • P10 RESEARCH Q1 (write rate envelope), Q3 (tiered loss model), Q5 (proxy headers).