Backend State Storage Specification
Backend State Storage Specification
Status: Draft v0.1.
Project: UWScrape.
Directory: docs/specs.
Audience: backend implementers, data reviewers, security reviewers, and migration implementers.
Last reviewed: 2026-05-11.
Primary architecture document: docs/reference/architecture/backend-runtime-architecture/.
Related documents:
docs/specifications/student-state-schema-spec/docs/specifications/backend-api-spec/docs/specifications/backend-runtime-operations-spec/docs/decisions/0004-student-state-catalog-version-migration-policy/docs/decisions/0010-go-runtime-readonly-index-and-state-store/docs/decisions/0011-anonymous-state-token-policy/docs/decisions/0019-v1-state-deletion-and-export-semantics/
1. Purpose
This document specifies the writable backend state store.
The state store is a SQLite database separate from the published index SQLite database.
The state store persists anonymous student planning state.
The state store persists token verifiers.
The state store persists migration and deletion metadata.
The state store must never contain raw state tokens.
The state store must never contain published index tables copied wholesale from the runtime index.
2. Source Anchors
The backend uses Go database/sql concepts for database handles, transactions, rows, and statements unless a later implementation spec adds a narrower wrapper: Go database/sql.
State token generation should use cryptographically secure randomness from Go crypto/rand: Go crypto/rand.
Token verifiers should use a keyed digest such as HMAC; Go documents HMAC in crypto/hmac: Go crypto/hmac.
Verifier comparison should use constant-time comparison where applicable; Go documents helpers in crypto/subtle: Go crypto/subtle.
SQLite URI parameters and read-only behavior are documented by SQLite: SQLite URI filenames.
SQLite PRAGMA user_version can store an application-controlled schema version integer: SQLite PRAGMA user_version.
3. Storage Boundary
The backend has two SQLite files in version 1:
published-index/course-universe.sqliteruntime/state.sqlitecourse-universe.sqlite is read-only.
state.sqlite is writable.
The backend must not attach writable state tables into the published index.
The backend must not write into the published index directory.
The backend must not run index migrations at startup.
The backend may run state store migrations at startup.
Index schema version and state store schema version are separate.
4. State Store Goals
The state store must support:
- state creation;
- token verification;
- state loading;
- full state replacement;
- state export;
- state hard deletion;
- migration preview metadata;
- migration acceptance metadata;
- state version increments;
- last-used tracking when enabled.
The state store should be inspectable with standard SQLite tooling.
The state store should keep sensitive data minimal.
The state store should make accidental token leakage difficult.
5. Schema Versioning
The database must store schema version in two places:
PRAGMA user_version;schema_migrationstable.
PRAGMA user_version is a quick integer compatibility check.
schema_migrations records exact applied migrations.
Initial user_version:
1Initial logical state_schema_version:
1.0.0Migration records use monotone integer migration ids.
The backend must refuse to run with a state database whose version is newer than the backend supports.
The backend may migrate older state databases forward when migrations are available.
The backend should back up or transactionally apply migrations so partial migration does not leave the database unusable.
6. Table Overview
Version 1 tables:
schema_migrations;states;state_tokens;state_events;state_tombstoneswhen minimal deletion metadata is enabled;migration_previews.
The states table stores the current logical student state as canonical JSON plus indexed metadata.
The state_tokens table stores token verifiers.
The state_events table records non-secret audit events useful for debugging and migration.
The optional state_tombstones table records minimal deletion metadata without retaining state content.
The migration_previews table stores optional short-lived preview results.
The project may later normalize completed courses and planned courses into separate tables.
Version 1 may keep the canonical state JSON as the source of truth to avoid duplicating schema logic before query needs require more indexes.
7. schema_migrations
Columns:
| Column | Type | Required | Meaning |
|---|---|---|---|
migration_id | integer | yes | Monotone migration id. |
name | text | yes | Human-readable migration name. |
applied_at | text | yes | UTC timestamp. |
checksum | text | yes | Migration file or content checksum. |
Constraints:
- primary key on
migration_id; - unique
name.
8. states
Columns:
| Column | Type | Required | Meaning |
|---|---|---|---|
state_id | text | yes | Internal state id. |
state_schema_version | text | yes | Logical state schema version. |
catalog_version_id | text | yes | Catalog version anchoring the state. |
state_version | integer | yes | Monotone version incremented on mutation. |
state_json | text | yes | Canonical logical student_state JSON. |
created_at | text | yes | UTC timestamp. |
updated_at | text | yes | UTC timestamp. |
deleted_at | text | no | UTC timestamp when deleted. |
migration_origin_state_id | text | no | Origin state for accepted migration. |
migration_origin_catalog_version_id | text | no | Origin catalog for accepted migration. |
Constraints:
- primary key on
state_id; state_version >= 1;deleted_at is nullfor active states.
Recommended indexes:
idx_states_catalog_version_id;idx_states_updated_at;idx_states_deleted_at.
state_json must not include raw state tokens.
state_json must not include token verifiers.
The backend should canonicalize JSON field order only if it materially helps diffing or checksums.
JSON canonicalization is not required for academic correctness.
9. state_tokens
Columns:
| Column | Type | Required | Meaning |
|---|---|---|---|
token_id | text | yes | Internal token record id. |
state_id | text | yes | Associated state id. |
state_token_verifier | blob or text | yes | Keyed verifier derived from raw token. |
verifier_algorithm | text | yes | Example: hmac_sha256. |
verifier_key_version | integer | yes | Runtime key version. |
created_at | text | yes | UTC timestamp. |
last_used_at | text | no | UTC timestamp when successfully used. |
revoked_at | text | no | UTC timestamp when revoked. |
Constraints:
- primary key on
token_id; - foreign key from
state_idtostates.state_id; - unique
state_token_verifierwhen stored as deterministic keyed digest; - active token means
revoked_at is nulland associated state is not deleted.
Recommended indexes:
idx_state_tokens_verifier;idx_state_tokens_state_id;idx_state_tokens_revoked_at.
The table must not store raw state_token.
The table must not store reversible token encodings.
The table must not store user passwords.
10. Token Generation
The server generates a state token during POST /api/v1/state.
Requirements:
- at least 256 bits of entropy;
- recommended v1 shape is 32 random bytes;
- generated by cryptographically secure randomness;
- encoded as base64url without padding;
- contains no state id;
- contains no user data;
- shown to the client once.
The backend derives:
state_token_verifier = HMAC(runtime_secret_key, state_token)The exact algorithm can be hmac_sha256 in v1.
The verifier key version is stored with the verifier.
Verifier keys are runtime secret material.
Verifier keys are not stored in state.sqlite.
Verifier comparisons should use constant-time comparison where applicable.
11. Token Verification Flow
Flow:
- Parse
Authorization: Bearer <state_token>. - Reject missing or malformed tokens.
- Derive verifier using the current and any accepted previous verifier keys.
- Look up active token verifier.
- Confirm associated state is active.
- Optionally update
last_used_at. - Return
state_idto the request context.
The backend should not reveal which step failed.
Invalid token, revoked token, deleted state, and unknown token should all produce 401 unauthorized for ordinary clients.
Rate limiting should count failures.
12. Key Rotation
The backend should support multiple verifier key versions.
Version 1 does not need an online rotation UI.
The schema must allow:
- current key version;
- previous accepted key versions;
- per-token verifier key version.
A later maintenance command may re-derive verifiers after successful token authentication.
Because raw tokens are not stored, bulk offline verifier rotation is not possible unless old keys remain available or users reauthenticate.
This is acceptable for version 1.
13. state_events
Columns:
| Column | Type | Required | Meaning |
|---|---|---|---|
event_id | text | yes | Event id. |
state_id | text | yes | Associated state id. |
event_kind | text | yes | Event kind. |
created_at | text | yes | UTC timestamp. |
request_id | text | no | Request correlation id. |
details_json | text | no | Non-secret event details. |
Event kinds:
state_created;state_replaced;migration_preview_created;migration_accepted;state_deleted;token_revoked;token_verified.
details_json must not include raw tokens.
details_json must not include full state by default.
token_verified events may be disabled if they create excessive write load.
14. state_tombstones
Columns:
| Column | Type | Required | Meaning |
|---|---|---|---|
state_id | text | yes | Deleted state id. |
deleted_at | text | yes | UTC timestamp. |
deletion_mode | text | yes | hard_delete in v1. |
catalog_version_id | text | no | Catalog version for aggregate debugging. |
state_schema_version | text | no | State schema version at deletion. |
Version 1 uses hard deletion.
The backend deletes the states row or otherwise removes the state payload.
The tombstone must not contain course history, grades, planned courses, credential targets, assumptions, notes, or UI preferences.
The tombstone should not require a foreign key to an active states row.
Token records for deleted states must be removed or revoked before commit.
15. migration_previews
Columns:
| Column | Type | Required | Meaning |
|---|---|---|---|
migration_preview_id | text | yes | Preview id. |
state_id | text | yes | Source state id. |
from_catalog_version_id | text | yes | Source catalog version. |
to_catalog_version_id | text | yes | Target catalog version. |
preview_json | text | yes | Advisory preview result. |
created_at | text | yes | UTC timestamp. |
expires_at | text | yes | Expiry timestamp. |
accepted_at | text | no | Acceptance timestamp. |
V1 backend loads one active catalog.
If the source catalog is unavailable, the preview may contain unresolved or unavailable findings rather than exact requirement diffs.
Preview results are advisory.
Acceptance is explicit.
16. State Replacement Transaction
PUT /api/v1/state/current should run in one transaction.
Steps:
- authenticate token;
- load current state row for update when supported;
- validate request
student_state; - ensure catalog-version policy is followed;
- write replacement
state_json; - increment
state_version; - update
updated_at; - write
state_eventsrecord; - commit.
If any step fails, the state remains unchanged.
17. State Creation Transaction
POST /api/v1/state should run in one transaction.
Steps:
- validate optional initial state;
- choose active catalog version if initial state omits one;
- generate
state_id; - generate raw
state_token; - derive
state_token_verifier; - insert
states; - insert
state_tokens; - insert
state_events; - commit;
- return raw token once.
If commit fails, the raw token must not be returned.
18. Export Transaction
GET /api/v1/state/current/export should:
- authenticate token;
- read current state;
- construct export JSON;
- return export with
Cache-Control: no-store.
The export must not include token verifier data.
The export must not include raw token.
Export is read-only by API contract.
It must not update states, state_events, token rows, or other persistent audit tables.
Persistent export auditing requires a future explicit mutating endpoint.
19. Deletion Transaction
DELETE /api/v1/state/current should:
- authenticate token;
- validate confirmation;
- remove or revoke token records;
- insert minimal hard-delete tombstone or deletion event only if configured;
- delete associated migration preview rows when practical;
- delete the state row or erase the state payload;
- commit.
If a deletion event is retained, it must not require retaining the states row.
The schema must not keep state_json merely to satisfy an audit foreign key.
After deletion, the same token should not authorize state access.
The response should be safe to repeat from the client’s perspective.
20. Concurrency
SQLite write concurrency is acceptable for v1.
The backend should use transactions for all mutations.
The backend should avoid long write transactions.
The backend should avoid running expensive query evaluation inside a state write transaction.
The backend should detect lost updates through state_version.
PUT /api/v1/state/current should support optional optimistic concurrency:
{ "expected_state_version": 4, "student_state": {}}If the expected version does not match, return 409 conflict.
21. Retention
Version 1 should retain active state until the user deletes it.
Deleted state payload must be erased according to ADR 0019.
Only minimal non-content deletion metadata may remain.
Migration previews should expire.
Event records may be pruned.
No retention policy should depend on email identity because there are no accounts in v1.
22. Backup
The state database should be backed up independently from the published index.
Backups contain sensitive academic state.
Backups must not include raw tokens because raw tokens are never stored.
Backups do include token verifiers.
Backups should be protected like sensitive application data.
23. Local Development
Default local path:
data/runtime/state.sqliteThe backend may create parent directories in local development.
The backend should not create missing published index directories.
The backend should fail clearly when UWSCRAPE_STATE_DB_PATH is not writable.
24. Test Scenarios
State storage tests should cover:
- new database migration;
- unsupported future state database version;
- state creation stores verifier but not raw token;
- token verification succeeds with correct token;
- token verification fails with wrong token;
- token comparison path does not leak failure detail;
- full state replacement increments
state_version; - optimistic concurrency conflict;
- export excludes token fields;
- export performs no persistent state-store writes;
- delete revokes token;
- deleted token cannot load state;
- catalog version is persisted;
- unresolved references survive replacement;
- migration preview expires;
- state write does not touch published index file.
25. References
- Go
database/sql: https://pkg.go.dev/database/sql - Go
crypto/rand: https://pkg.go.dev/crypto/rand - Go
crypto/hmac: https://pkg.go.dev/crypto/hmac - Go
crypto/subtle: https://pkg.go.dev/crypto/subtle - SQLite URI filenames: https://www.sqlite.org/uri.html
- SQLite
PRAGMA user_version: https://www.sqlite.org/pragma.html#pragma_user_version