Skip to content

Backend State Storage Specification

Backend State Storage Specification

Status: Draft v0.1.

Project: UWScrape.

Directory: docs/specs.

Audience: backend implementers, data reviewers, security reviewers, and migration implementers.

Last reviewed: 2026-05-11.

Primary architecture document: docs/reference/architecture/backend-runtime-architecture/.

Related documents:

  • docs/specifications/student-state-schema-spec/
  • docs/specifications/backend-api-spec/
  • docs/specifications/backend-runtime-operations-spec/
  • docs/decisions/0004-student-state-catalog-version-migration-policy/
  • docs/decisions/0010-go-runtime-readonly-index-and-state-store/
  • docs/decisions/0011-anonymous-state-token-policy/
  • docs/decisions/0019-v1-state-deletion-and-export-semantics/

1. Purpose

This document specifies the writable backend state store.

The state store is a SQLite database separate from the published index SQLite database.

The state store persists anonymous student planning state.

The state store persists token verifiers.

The state store persists migration and deletion metadata.

The state store must never contain raw state tokens.

The state store must never contain published index tables copied wholesale from the runtime index.

2. Source Anchors

The backend uses Go database/sql concepts for database handles, transactions, rows, and statements unless a later implementation spec adds a narrower wrapper: Go database/sql.

State token generation should use cryptographically secure randomness from Go crypto/rand: Go crypto/rand.

Token verifiers should use a keyed digest such as HMAC; Go documents HMAC in crypto/hmac: Go crypto/hmac.

Verifier comparison should use constant-time comparison where applicable; Go documents helpers in crypto/subtle: Go crypto/subtle.

SQLite URI parameters and read-only behavior are documented by SQLite: SQLite URI filenames.

SQLite PRAGMA user_version can store an application-controlled schema version integer: SQLite PRAGMA user_version.

3. Storage Boundary

The backend has two SQLite files in version 1:

published-index/course-universe.sqlite
runtime/state.sqlite

course-universe.sqlite is read-only.

state.sqlite is writable.

The backend must not attach writable state tables into the published index.

The backend must not write into the published index directory.

The backend must not run index migrations at startup.

The backend may run state store migrations at startup.

Index schema version and state store schema version are separate.

4. State Store Goals

The state store must support:

  • state creation;
  • token verification;
  • state loading;
  • full state replacement;
  • state export;
  • state hard deletion;
  • migration preview metadata;
  • migration acceptance metadata;
  • state version increments;
  • last-used tracking when enabled.

The state store should be inspectable with standard SQLite tooling.

The state store should keep sensitive data minimal.

The state store should make accidental token leakage difficult.

5. Schema Versioning

The database must store schema version in two places:

  • PRAGMA user_version;
  • schema_migrations table.

PRAGMA user_version is a quick integer compatibility check.

schema_migrations records exact applied migrations.

Initial user_version:

1

Initial logical state_schema_version:

1.0.0

Migration records use monotone integer migration ids.

The backend must refuse to run with a state database whose version is newer than the backend supports.

The backend may migrate older state databases forward when migrations are available.

The backend should back up or transactionally apply migrations so partial migration does not leave the database unusable.

6. Table Overview

Version 1 tables:

  • schema_migrations;
  • states;
  • state_tokens;
  • state_events;
  • state_tombstones when minimal deletion metadata is enabled;
  • migration_previews.

The states table stores the current logical student state as canonical JSON plus indexed metadata.

The state_tokens table stores token verifiers.

The state_events table records non-secret audit events useful for debugging and migration.

The optional state_tombstones table records minimal deletion metadata without retaining state content.

The migration_previews table stores optional short-lived preview results.

The project may later normalize completed courses and planned courses into separate tables.

Version 1 may keep the canonical state JSON as the source of truth to avoid duplicating schema logic before query needs require more indexes.

7. schema_migrations

Columns:

ColumnTypeRequiredMeaning
migration_idintegeryesMonotone migration id.
nametextyesHuman-readable migration name.
applied_attextyesUTC timestamp.
checksumtextyesMigration file or content checksum.

Constraints:

  • primary key on migration_id;
  • unique name.

8. states

Columns:

ColumnTypeRequiredMeaning
state_idtextyesInternal state id.
state_schema_versiontextyesLogical state schema version.
catalog_version_idtextyesCatalog version anchoring the state.
state_versionintegeryesMonotone version incremented on mutation.
state_jsontextyesCanonical logical student_state JSON.
created_attextyesUTC timestamp.
updated_attextyesUTC timestamp.
deleted_attextnoUTC timestamp when deleted.
migration_origin_state_idtextnoOrigin state for accepted migration.
migration_origin_catalog_version_idtextnoOrigin catalog for accepted migration.

Constraints:

  • primary key on state_id;
  • state_version >= 1;
  • deleted_at is null for active states.

Recommended indexes:

  • idx_states_catalog_version_id;
  • idx_states_updated_at;
  • idx_states_deleted_at.

state_json must not include raw state tokens.

state_json must not include token verifiers.

The backend should canonicalize JSON field order only if it materially helps diffing or checksums.

JSON canonicalization is not required for academic correctness.

9. state_tokens

Columns:

ColumnTypeRequiredMeaning
token_idtextyesInternal token record id.
state_idtextyesAssociated state id.
state_token_verifierblob or textyesKeyed verifier derived from raw token.
verifier_algorithmtextyesExample: hmac_sha256.
verifier_key_versionintegeryesRuntime key version.
created_attextyesUTC timestamp.
last_used_attextnoUTC timestamp when successfully used.
revoked_attextnoUTC timestamp when revoked.

Constraints:

  • primary key on token_id;
  • foreign key from state_id to states.state_id;
  • unique state_token_verifier when stored as deterministic keyed digest;
  • active token means revoked_at is null and associated state is not deleted.

Recommended indexes:

  • idx_state_tokens_verifier;
  • idx_state_tokens_state_id;
  • idx_state_tokens_revoked_at.

The table must not store raw state_token.

The table must not store reversible token encodings.

The table must not store user passwords.

10. Token Generation

The server generates a state token during POST /api/v1/state.

Requirements:

  • at least 256 bits of entropy;
  • recommended v1 shape is 32 random bytes;
  • generated by cryptographically secure randomness;
  • encoded as base64url without padding;
  • contains no state id;
  • contains no user data;
  • shown to the client once.

The backend derives:

state_token_verifier = HMAC(runtime_secret_key, state_token)

The exact algorithm can be hmac_sha256 in v1.

The verifier key version is stored with the verifier.

Verifier keys are runtime secret material.

Verifier keys are not stored in state.sqlite.

Verifier comparisons should use constant-time comparison where applicable.

11. Token Verification Flow

Flow:

  1. Parse Authorization: Bearer <state_token>.
  2. Reject missing or malformed tokens.
  3. Derive verifier using the current and any accepted previous verifier keys.
  4. Look up active token verifier.
  5. Confirm associated state is active.
  6. Optionally update last_used_at.
  7. Return state_id to the request context.

The backend should not reveal which step failed.

Invalid token, revoked token, deleted state, and unknown token should all produce 401 unauthorized for ordinary clients.

Rate limiting should count failures.

12. Key Rotation

The backend should support multiple verifier key versions.

Version 1 does not need an online rotation UI.

The schema must allow:

  • current key version;
  • previous accepted key versions;
  • per-token verifier key version.

A later maintenance command may re-derive verifiers after successful token authentication.

Because raw tokens are not stored, bulk offline verifier rotation is not possible unless old keys remain available or users reauthenticate.

This is acceptable for version 1.

13. state_events

Columns:

ColumnTypeRequiredMeaning
event_idtextyesEvent id.
state_idtextyesAssociated state id.
event_kindtextyesEvent kind.
created_attextyesUTC timestamp.
request_idtextnoRequest correlation id.
details_jsontextnoNon-secret event details.

Event kinds:

  • state_created;
  • state_replaced;
  • migration_preview_created;
  • migration_accepted;
  • state_deleted;
  • token_revoked;
  • token_verified.

details_json must not include raw tokens.

details_json must not include full state by default.

token_verified events may be disabled if they create excessive write load.

14. state_tombstones

Columns:

ColumnTypeRequiredMeaning
state_idtextyesDeleted state id.
deleted_attextyesUTC timestamp.
deletion_modetextyeshard_delete in v1.
catalog_version_idtextnoCatalog version for aggregate debugging.
state_schema_versiontextnoState schema version at deletion.

Version 1 uses hard deletion.

The backend deletes the states row or otherwise removes the state payload.

The tombstone must not contain course history, grades, planned courses, credential targets, assumptions, notes, or UI preferences.

The tombstone should not require a foreign key to an active states row.

Token records for deleted states must be removed or revoked before commit.

15. migration_previews

Columns:

ColumnTypeRequiredMeaning
migration_preview_idtextyesPreview id.
state_idtextyesSource state id.
from_catalog_version_idtextyesSource catalog version.
to_catalog_version_idtextyesTarget catalog version.
preview_jsontextyesAdvisory preview result.
created_attextyesUTC timestamp.
expires_attextyesExpiry timestamp.
accepted_attextnoAcceptance timestamp.

V1 backend loads one active catalog.

If the source catalog is unavailable, the preview may contain unresolved or unavailable findings rather than exact requirement diffs.

Preview results are advisory.

Acceptance is explicit.

16. State Replacement Transaction

PUT /api/v1/state/current should run in one transaction.

Steps:

  1. authenticate token;
  2. load current state row for update when supported;
  3. validate request student_state;
  4. ensure catalog-version policy is followed;
  5. write replacement state_json;
  6. increment state_version;
  7. update updated_at;
  8. write state_events record;
  9. commit.

If any step fails, the state remains unchanged.

17. State Creation Transaction

POST /api/v1/state should run in one transaction.

Steps:

  1. validate optional initial state;
  2. choose active catalog version if initial state omits one;
  3. generate state_id;
  4. generate raw state_token;
  5. derive state_token_verifier;
  6. insert states;
  7. insert state_tokens;
  8. insert state_events;
  9. commit;
  10. return raw token once.

If commit fails, the raw token must not be returned.

18. Export Transaction

GET /api/v1/state/current/export should:

  1. authenticate token;
  2. read current state;
  3. construct export JSON;
  4. return export with Cache-Control: no-store.

The export must not include token verifier data.

The export must not include raw token.

Export is read-only by API contract.

It must not update states, state_events, token rows, or other persistent audit tables.

Persistent export auditing requires a future explicit mutating endpoint.

19. Deletion Transaction

DELETE /api/v1/state/current should:

  1. authenticate token;
  2. validate confirmation;
  3. remove or revoke token records;
  4. insert minimal hard-delete tombstone or deletion event only if configured;
  5. delete associated migration preview rows when practical;
  6. delete the state row or erase the state payload;
  7. commit.

If a deletion event is retained, it must not require retaining the states row.

The schema must not keep state_json merely to satisfy an audit foreign key.

After deletion, the same token should not authorize state access.

The response should be safe to repeat from the client’s perspective.

20. Concurrency

SQLite write concurrency is acceptable for v1.

The backend should use transactions for all mutations.

The backend should avoid long write transactions.

The backend should avoid running expensive query evaluation inside a state write transaction.

The backend should detect lost updates through state_version.

PUT /api/v1/state/current should support optional optimistic concurrency:

{
"expected_state_version": 4,
"student_state": {}
}

If the expected version does not match, return 409 conflict.

21. Retention

Version 1 should retain active state until the user deletes it.

Deleted state payload must be erased according to ADR 0019.

Only minimal non-content deletion metadata may remain.

Migration previews should expire.

Event records may be pruned.

No retention policy should depend on email identity because there are no accounts in v1.

22. Backup

The state database should be backed up independently from the published index.

Backups contain sensitive academic state.

Backups must not include raw tokens because raw tokens are never stored.

Backups do include token verifiers.

Backups should be protected like sensitive application data.

23. Local Development

Default local path:

data/runtime/state.sqlite

The backend may create parent directories in local development.

The backend should not create missing published index directories.

The backend should fail clearly when UWSCRAPE_STATE_DB_PATH is not writable.

24. Test Scenarios

State storage tests should cover:

  • new database migration;
  • unsupported future state database version;
  • state creation stores verifier but not raw token;
  • token verification succeeds with correct token;
  • token verification fails with wrong token;
  • token comparison path does not leak failure detail;
  • full state replacement increments state_version;
  • optimistic concurrency conflict;
  • export excludes token fields;
  • export performs no persistent state-store writes;
  • delete revokes token;
  • deleted token cannot load state;
  • catalog version is persisted;
  • unresolved references survive replacement;
  • migration preview expires;
  • state write does not touch published index file.

25. References