Backend Runtime Operations Specification

Status: Draft v0.1.

Project: UWScrape.

Directory: docs/specs.

Audience: backend implementers, operators, release reviewers, and local development maintainers.

Last reviewed: 2026-05-11.

Primary architecture document: docs/reference/architecture/backend-runtime-architecture/.

Related documents:

  • docs/specifications/backend-api-spec/
  • docs/specifications/backend-state-storage-spec/
  • docs/decisions/0006-backend-index-contract/
  • docs/decisions/0007-index-release-gate-policy/
  • docs/decisions/0010-go-runtime-readonly-index-and-state-store/
  • docs/decisions/0011-anonymous-state-token-policy/

1. Purpose

This document specifies version 1 backend runtime operations.

It covers startup validation, configuration, health responses, request limits, timeouts, logging, metrics, static asset serving, and deployment assumptions.

It does not specify endpoint payload semantics.

It does not specify scraper or index builder commands.

It does not choose a concrete SQLite driver.

2. Source Anchors

The Go implementation should use net/http server concepts such as Server, handlers, request contexts, timeouts, and shutdown: Go net/http.

The backend should use Go database/sql concepts for state and index database access unless a later implementation spec narrows this: Go database/sql.

SQLite URI filename parameters such as mode=ro are documented by SQLite: SQLite URI filenames.

HTTP caching behavior follows RFC 9111: RFC 9111.

HTTP method and status semantics follow RFC 9110: RFC 9110.

Token handling follows OWASP session guidance for unpredictable identifiers, secure handling, and avoiding URL transport and logging: OWASP Session Management Cheat Sheet.

3. Runtime Inputs

Version 1 backend inputs:

  • backend binary;
  • published index directory;
  • writable state database path;
  • token verifier key file or equivalent local secret source;
  • optional static frontend directory for fallback deployments;
  • environment configuration.

The backend does not fetch Waterloo or Kuali data at runtime.

The backend does not read raw scraper snapshots at runtime.

The backend does not apply parser patches at runtime.

The backend does not hot-swap indexes in version 1.

4. Configuration Keys

Required keys:

| Key | Meaning |
| --- | --- |
| UWSCRAPE_BIND_ADDR | HTTP bind address, such as 127.0.0.1:8080. |
| UWSCRAPE_INDEX_DIR | Published index directory. |
| UWSCRAPE_STATE_DB_PATH | Writable state SQLite database path. |
| UWSCRAPE_TOKEN_KEY_PATH | Secret key material for token verifiers. |

Optional keys:

| Key | Meaning |
| --- | --- |
| UWSCRAPE_STATIC_DIR | Optional fallback frontend static asset directory. Primary v1 frontend deployment uses the SvelteKit Node server. |
| UWSCRAPE_LOG_LEVEL | Log verbosity. |
| UWSCRAPE_ALLOWED_ORIGINS | Allowed frontend origins when split from API. |
| UWSCRAPE_MAX_STATE_BODY_BYTES | Override state request body limit. |
| UWSCRAPE_MAX_QUERY_BODY_BYTES | Override query request body limit. |
| UWSCRAPE_MAX_GRAPH_BODY_BYTES | Override graph request body limit. |

Production defaults:

  • no unapproved index override;
  • no unsupported schema override;
  • no request body logging;
  • no raw token logging;
  • bind to localhost unless configured otherwise.
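The required keys above could be loaded and validated roughly as follows. This is a minimal sketch, not the project's implementation: the `Config` struct, its field names, and `LoadConfig` are hypothetical, and the only validation shown is presence of required keys plus a host:port check on the bind address.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// Config mirrors the configuration keys in this section.
// The struct and field names are illustrative assumptions.
type Config struct {
	BindAddr     string
	IndexDir     string
	StateDBPath  string
	TokenKeyPath string
	StaticDir    string // optional
}

// LoadConfig reads keys through getenv so tests can inject values.
// In a real binary, pass os.Getenv.
func LoadConfig(getenv func(string) string) (Config, error) {
	cfg := Config{
		BindAddr:     getenv("UWSCRAPE_BIND_ADDR"),
		IndexDir:     getenv("UWSCRAPE_INDEX_DIR"),
		StateDBPath:  getenv("UWSCRAPE_STATE_DB_PATH"),
		TokenKeyPath: getenv("UWSCRAPE_TOKEN_KEY_PATH"),
		StaticDir:    getenv("UWSCRAPE_STATIC_DIR"),
	}
	required := [][2]string{
		{"UWSCRAPE_BIND_ADDR", cfg.BindAddr},
		{"UWSCRAPE_INDEX_DIR", cfg.IndexDir},
		{"UWSCRAPE_STATE_DB_PATH", cfg.StateDBPath},
		{"UWSCRAPE_TOKEN_KEY_PATH", cfg.TokenKeyPath},
	}
	var errs []error
	for _, kv := range required {
		if kv[1] == "" {
			errs = append(errs, fmt.Errorf("%s is required", kv[0]))
		}
	}
	if cfg.BindAddr != "" {
		// Reject values that are not host:port before trying to listen.
		if _, _, err := net.SplitHostPort(cfg.BindAddr); err != nil {
			errs = append(errs, fmt.Errorf("UWSCRAPE_BIND_ADDR: %w", err))
		}
	}
	// errors.Join returns nil when no errors were collected.
	return cfg, errors.Join(errs...)
}

func main() {
	env := map[string]string{
		"UWSCRAPE_BIND_ADDR":      "127.0.0.1:8080",
		"UWSCRAPE_INDEX_DIR":      "data/published/2026-2027-undergraduate",
		"UWSCRAPE_STATE_DB_PATH":  "data/runtime/state.sqlite",
		"UWSCRAPE_TOKEN_KEY_PATH": "data/runtime/token-key",
	}
	cfg, err := LoadConfig(func(k string) string { return env[k] })
	if err != nil {
		fmt.Println("config error:", err)
		return
	}
	fmt.Println("would bind to", cfg.BindAddr)
}
```

Collecting all problems before returning lets an operator fix every missing key in one pass rather than one restart per key.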

5. Startup Sequence

Startup must fail closed by default.

Sequence:

  1. load configuration;
  2. validate configuration values;
  3. resolve absolute paths;
  4. read build-metadata.json;
  5. read validation-summary.json;
  6. read release-decision.json;
  7. verify release decision status is approved or approved_with_warnings;
  8. verify index_schema_version is supported;
  9. open course-universe.sqlite read-only;
  10. compare SQLite metadata with build-metadata.json;
  11. validate required index tables exist;
  12. run cheap index consistency probes;
  13. open or create state database;
  14. run state database migrations;
  15. load token verifier key material;
  16. initialize route handlers;
  17. start HTTP server.

If any required step fails, the backend must not serve ordinary API traffic.

If an explicit diagnostic server mode is added later, development diagnostics must be visible in /api/v1/health and /api/v1/index.

Diagnostic mode must not serve ordinary query or state mutation traffic with an unsupported schema, missing release decision, or rejected release decision.

6. Published Index Checks

The backend must confirm the published index directory contains:

  • course-universe.sqlite;
  • build-metadata.json;
  • validation-summary.json;
  • release-decision.json;
  • build-report.md.

Optional:

  • graph-projection.json.

The backend must treat SQLite as canonical.

If optional graph projection data is present but malformed, has the wrong view type, omits index/catalog identity metrics, or declares identity metrics that disagree with the published index metadata, the backend must refuse startup. Missing graph projection data is not a startup error; the backend falls back to SQLite-backed graph construction.

Graph routes that use the projection cache must identify that in response metrics, for example with projection_source: "published_artifact".

The backend must parse release-decision.json.

Required release decision fields for startup:

  • release decision id;
  • release decision status;
  • reviewed timestamp;
  • index_id;
  • index_schema_version;
  • parser version;
  • source catalog metadata;
  • artifact hashes.

Allowed release decision statuses:

  • approved;
  • approved_with_warnings;
  • rejected.

Startup accepts approved and approved_with_warnings.

Startup rejects rejected, missing status, unknown status, missing release decision file, and release decision metadata that conflicts with build-metadata.json or SQLite metadata.
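The accept and reject rules above can be sketched as a small gate function. The JSON field names in `ReleaseDecision` are assumptions for illustration (the published index contract defines the real names), and only the status and `index_id` cross-check are shown.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ReleaseDecision holds a subset of release-decision.json.
// The JSON field names here are assumed, not taken from the contract.
type ReleaseDecision struct {
	ID      string `json:"release_decision_id"`
	Status  string `json:"status"`
	IndexID string `json:"index_id"`
}

// CheckReleaseDecision parses the file contents and fails closed on
// anything other than an approved decision that matches the build.
func CheckReleaseDecision(raw []byte, buildIndexID string) (ReleaseDecision, error) {
	var d ReleaseDecision
	if err := json.Unmarshal(raw, &d); err != nil {
		return d, fmt.Errorf("parse release decision: %w", err)
	}
	switch d.Status {
	case "approved", "approved_with_warnings":
		// Accepted statuses; warnings are surfaced elsewhere.
	case "rejected":
		return d, errors.New("release decision is rejected")
	case "":
		return d, errors.New("release decision status is missing")
	default:
		return d, fmt.Errorf("unknown release decision status %q", d.Status)
	}
	if d.IndexID == "" || d.IndexID != buildIndexID {
		return d, fmt.Errorf("release decision index_id %q conflicts with build metadata %q",
			d.IndexID, buildIndexID)
	}
	return d, nil
}

func main() {
	// Example payload with made-up identifiers.
	raw := []byte(`{"release_decision_id":"rd-example","status":"approved_with_warnings","index_id":"idx-example"}`)
	d, err := CheckReleaseDecision(raw, "idx-example")
	if err != nil {
		fmt.Println("startup refused:", err)
		return
	}
	fmt.Println("accepted release decision with status", d.Status)
}
```

A full check would also compare schema version, parser version, and artifact hashes against build-metadata.json and the SQLite metadata.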

7. Single Active Catalog Policy

Version 1 serves one active published index and one active catalog for normal runtime evaluation.

Multi-catalog runtime loading is not a version 1 requirement.

If saved state references a different catalog version, query responses must report catalog mismatch.

If the state catalog is unavailable, query responses must report catalog_unavailable or an academic unknown, depending on endpoint semantics.

The backend must not silently reinterpret old state under the active catalog.

8. Read-Only Index Opening

The backend should open the index database read-only.

When the SQLite driver supports URI filenames, use mode=ro.

Use immutable=1 only when deployment guarantees the file will not change during process lifetime.

Do not use immutable=1 if deployment overwrites the same SQLite file in place.

The backend must not write to the index database.

The backend should include a test or startup check that detects accidental write capability when practical.
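As one illustration, a read-only SQLite URI filename can be assembled with the standard library before being handed to whichever driver is chosen. This sketch assumes a Unix-style absolute path and a driver that accepts URI filenames; `IndexDSN` is a hypothetical helper name, and no driver is opened here because this spec does not choose one.

```go
package main

import (
	"fmt"
	"net/url"
	"path/filepath"
)

// IndexDSN builds a SQLite URI filename with mode=ro.
// immutable=1 is added only when the caller asserts the file will not
// change during the process lifetime.
func IndexDSN(absPath string, immutable bool) (string, error) {
	if !filepath.IsAbs(absPath) {
		return "", fmt.Errorf("index path %q must be absolute", absPath)
	}
	q := url.Values{}
	q.Set("mode", "ro")
	if immutable {
		q.Set("immutable", "1")
	}
	u := url.URL{
		Scheme:   "file",
		Path:     filepath.ToSlash(absPath),
		RawQuery: q.Encode(), // keys are emitted in sorted order
	}
	return u.String(), nil
}

func main() {
	dsn, err := IndexDSN("/srv/uwscrape/index/course-universe.sqlite", false)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// The DSN would then be passed to sql.Open with the chosen driver name.
	fmt.Println(dsn)
}
```

A startup probe for accidental write capability could attempt a harmless write statement on this connection and treat success as a configuration error.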

9. State Database Startup

The backend opens or creates the state database at UWSCRAPE_STATE_DB_PATH.

If the database does not exist, create it and apply migrations.

If the database exists, verify that its state store schema version is supported.

If the database schema is newer than the backend supports, fail startup.

If the database schema is older and migrations are available, run migrations.

State database migration failure is startup failure.

State database migration must not touch the published index.

10. Health Endpoint

GET /api/v1/health returns operational health.

Required fields:

  • status;
  • degraded;
  • checks;
  • loaded index summary when available;
  • release decision summary when an index is loaded.

status values:

  • ok;
  • degraded;
  • starting;
  • error.

Health response must not include:

  • filesystem paths;
  • raw tokens;
  • token verifier values;
  • verifier key material;
  • full state data.

11. Index Metadata Endpoint

GET /api/v1/index returns loaded index metadata.

It should include:

  • index_id;
  • index_schema_version;
  • catalog_version_id;
  • catalog_title;
  • upstream_catalog_id;
  • release_status;
  • release_decision_id;
  • release_decision_reviewed_at;
  • build_started_at;
  • build_completed_at;
  • parser_version;
  • validation_summary;
  • release decision summary.

It should not include local absolute paths by default.

It should not include raw source payloads.

12. Request Body Limits

Default limits:

| Request class | Default |
| --- | --- |
| state create or replace | 256 KiB |
| query request | 256 KiB |
| graph view request | 128 KiB |
| migration preview | 256 KiB |

The backend should reject oversized requests before fully decoding them.

Go implementations can use request body limiting through net/http facilities such as MaxBytesReader: Go net/http.

Oversized requests return 413 payload_too_large.

13. Timeouts

Default timeouts:

| Timeout | Default |
| --- | --- |
| read header timeout | 5 seconds |
| request body read timeout | 15 seconds |
| catalog request timeout | 5 seconds |
| state request timeout | 10 seconds |
| query request timeout | 15 seconds |
| graph view request timeout | 15 seconds |
| migration preview timeout | 30 seconds |

Request handlers should use request contexts for cancellation.

Client disconnect should cancel unnecessary downstream work.

Long-running future planning should use an explicit asynchronous workflow rather than ordinary synchronous query requests.

Timeouts that affect academic query completeness must be reflected as time_limit_reached unknowns when a partial response is still returned.

Operational timeouts may return HTTP errors when no useful response can be produced.

14. Rate Limiting

Version 1 should rate limit:

  • state creation;
  • token verification failures;
  • state mutation;
  • query endpoints;
  • graph view endpoints;
  • migration preview;
  • state deletion attempts.

Catalog GET endpoints can have looser limits.

Rate limiting can use:

  • client IP;
  • token verifier id after successful authentication;
  • route class.

Rate limit failures return 429 rate_limited.

Rate limit responses must not reveal whether a state token exists.

15. Logging

Logs should include:

  • request id;
  • method;
  • route pattern;
  • status code;
  • duration;
  • response size when available;
  • index id;
  • query type;
  • graph view type;
  • semantic status counts;
  • error code.

Logs must not include:

  • raw state tokens;
  • token verifiers;
  • verifier key material;
  • authorization headers;
  • full request bodies by default;
  • user notes;
  • full grade history by default;
  • raw source HTML by default.

Debug mode may log additional local details only when explicitly enabled.

Debug mode must still redact tokens.

16. Metrics

Recommended metrics:

  • request count by route;
  • request latency by route;
  • error count by code;
  • rate limit count by route;
  • query latency by query type;
  • query academic status counts;
  • unknown count by unknown_reason;
  • conflict count by conflict_reason;
  • graph view node count;
  • graph view edge count;
  • graph view truncation count;
  • state creation count;
  • state replacement count;
  • state deletion count;
  • token verification failure count;
  • loaded index id;
  • loaded index schema version;
  • state database schema version.

Metrics must not contain raw tokens.

Metrics should not contain user-specific course lists.

Unknown counts are important because they reveal parser and data quality gaps.

V1 diagnostics expose a redacted in-process request-metrics snapshot under /api/v1/diagnostics for local demo and development triage. Route keys must be templated, for example /api/v1/courses/{subject}/{catalog_number}, so diagnostics can identify hot graph/query routes without recording course-specific paths, state tokens, or student-state content. The snapshot should include request count, client/server error count, in-flight count, and average/max/last duration in milliseconds.

17. Optional Static Asset Serving

The backend may serve static frontend assets in version 1, but only as an optional fallback or local demo mode.

The primary version 1 frontend deployment is the SvelteKit Node server defined by ADR 0020 and the frontend runtime architecture.

API routes remain under /api/v1.

Unknown /api/ routes must return API errors, not index.html.

Unknown non-API routes may serve index.html for client-side routing when static serving is enabled.

Hashed static assets may use long cache lifetimes.

index.html should use cautious caching.

API caching is specified separately from static asset caching.

Static asset serving must not change backend academic semantics.

18. CORS

If frontend and backend are same-origin, CORS may be disabled.

If frontend and backend are split, allowed origins must be explicit.

The backend should not use wildcard CORS for state endpoints in production.

Authorization headers must be allowed only for trusted origins.

Version 1 does not use browser cookie sessions, so credentialed cookie requests are out of scope for its CORS configuration.

HttpOnly, Secure, and SameSite cookie settings belong to a later session extension.

19. Deployment

Version 1 deployment is a single backend process.

Inputs:

  • backend binary;
  • published index directory;
  • state database path;
  • token verifier key path;
  • optional static directory.

Index promotion should happen by changing configuration or symlink target and restarting the backend.

The backend should not hot-swap indexes in version 1.

The state database must be backed up independently from the published index.

Backups should be protected as sensitive data.

20. Local Development

Example local run:

UWSCRAPE_BIND_ADDR=127.0.0.1:8080 \
UWSCRAPE_INDEX_DIR=data/published/2026-2027-undergraduate \
UWSCRAPE_STATE_DB_PATH=data/runtime/state.sqlite \
UWSCRAPE_TOKEN_KEY_PATH=data/runtime/token-key \
go run ./cmd/uwscrape-server

The exact command may change after implementation.

Local development must use the same published index contract as production.

Do not introduce a separate development-only index format.

21. Check Config Mode

The backend should support a startup validation mode.

Example:

go run ./cmd/uwscrape-server --check-config

The mode should:

  1. load configuration;
  2. validate index metadata;
  3. open index read-only;
  4. run index probes;
  5. open or create state database if configured to do so;
  6. validate state schema;
  7. report success or failure;
  8. exit without serving HTTP.

This mode is useful for deployment and release validation.

22. Failure Modes

Startup failure examples:

  • missing index directory;
  • missing course-universe.sqlite;
  • unsupported index schema;
  • missing release decision;
  • rejected release decision;
  • unsupported release decision status;
  • unreadable metadata file;
  • SQLite metadata mismatch;
  • state database migration failure;
  • missing token key in production mode;
  • invalid bind address.

Runtime error examples:

  • invalid JSON;
  • missing authorization;
  • invalid state token;
  • request too large;
  • graph hard bound exceeded;
  • state version conflict;
  • catalog mismatch;
  • catalog unavailable.

Catalog mismatch and catalog unavailable are not always HTTP errors.

For query endpoints, they may be academic unknowns or warnings inside a successful response.

23. Security Checklist

Implementation must:

  • generate tokens with cryptographically secure randomness;
  • store only token verifiers;
  • compare verifiers safely;
  • reject tokens in URLs;
  • redact authorization headers;
  • use Cache-Control: no-store for state responses;
  • rate limit token failures;
  • avoid wildcard CORS for state endpoints;
  • avoid logging request bodies by default;
  • use HTTPS in non-local deployment;
  • keep published index read-only.

24. Test Scenarios

Runtime operation tests should cover:

  • startup succeeds with approved index;
  • startup succeeds with approved-with-warnings index and exposes warnings;
  • startup fails with missing index file;
  • startup fails with unsupported index schema;
  • startup fails with missing release decision;
  • startup fails with rejected release decision;
  • index opens read-only;
  • state database migrates from empty file;
  • future state database version fails startup;
  • missing token key fails production startup;
  • health endpoint redacts paths and secrets;
  • index endpoint reports loaded index;
  • state response has Cache-Control: no-store;
  • catalog response can be cacheable;
  • request body limit returns 413;
  • route timeout produces expected error or unknown;
  • logs redact authorization headers;
  • static serving does not swallow /api/ errors;
  • check-config exits without serving HTTP.

25. Acceptance Criteria

An implementation satisfies this spec when:

  • startup fail-closed behavior is implemented;
  • single active index policy is enforced;
  • state store and index store are separate files;
  • state database migrations are versioned;
  • token key material is required for state-token deployments;
  • request limits and timeouts are configured;
  • logs and metrics redact tokens;
  • health and index endpoints expose useful non-secret state;
  • static serving is optional and cannot override API routes.

26. References