Backend Runtime Architecture
Status: Draft for backend design.
Date: 2026-05-11.
Audience: backend implementers, frontend implementers, data reviewers, and future API spec authors.
Related documents:
- docs/reference/architecture/offline-index-runtime-architecture/
- docs/specifications/scraper-pipeline-spec/
- docs/decisions/0002-index-artifact-storage-and-publication-policy/
- docs/decisions/0003-index-schema-versioning-policy/
- docs/decisions/0004-student-state-catalog-version-migration-policy/
- docs/decisions/0005-unparsed-requirement-semantics/
- docs/decisions/0006-backend-index-contract/
- docs/decisions/0007-index-release-gate-policy/
- docs/decisions/0008-manual-parser-patch-governance/
1. Thesis
The backend is a small Go service over a published academic index.
The published index is generated offline.
The backend does not scrape Waterloo or Kuali during user requests.
The backend does not apply parser patches during user requests.
The backend loads one active validated index at startup.
The backend stores anonymous student planning state in a separate writable store.
The backend exposes ordinary resource and query endpoints over HTTP.
The backend also exposes typed graph view endpoints for frontend-heavy graph screens.
The graph view endpoints are not a separate canonical model.
They are bounded projections from the same indexed course and requirement data.
The first backend should optimize for correctness, explainability, and stable contracts.
It should not optimize for a maximally general graph query language in version 1.
2. Design Position
The scraper pipeline owns source acquisition.
The index builder owns the runtime read model.
The release gate owns publication eligibility.
The backend owns request handling, anonymous state, query evaluation, and graph view projection.
The frontend owns interaction, layout, visual exploration, and client-side rendering performance.
The backend must therefore be boring in the best sense.
It should be easy to start locally.
It should be easy to inspect.
It should be difficult to accidentally serve an unapproved or incompatible index.
It should report uncertainty instead of smoothing it away.
It should keep user state separate from index artifacts so a termly index refresh cannot corrupt a student’s plan.
3. External Design Anchors
Go’s standard net/http package provides the basic server and handler model, including Server, ServeMux, handlers, shutdown, request contexts, and response writers.
That is enough for the first backend surface without committing to a large web framework.
See the official Go package documentation for net/http: pkg.go.dev/net/http.
SQLite supports URI parameters such as mode=ro and immutable=1.
The backend can use those to reinforce the read-only index boundary when the runtime index is deployed as a file.
See SQLite’s URI filename documentation: sqlite.org/uri.html.
HTTP method semantics matter for frontend behavior and caching.
GET should remain read-only.
PUT should replace state in an idempotent way.
POST is appropriate for computation requests and creation requests.
See RFC 9110 for safe and idempotent method semantics: RFC 9110.
HTTP caching matters because catalog resources are stable for an index while state and query responses may be private or state-dependent.
Public catalog responses can use validators such as ETag.
State responses and token-bearing responses should use restrictive cache directives.
See RFC 9111 for HTTP caching and no-store: RFC 9111.
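The caching split described above can be sketched in Go. This is a minimal illustration, not the project's handler code: the names `responseClass`, `cacheControl`, `catalogETag`, `etagMatches`, and `catalogHandler` are hypothetical, and deriving the ETag from the index id assumes catalog payloads change only when the served index changes.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// responseClass separates index-stable catalog payloads from private state.
type responseClass int

const (
	classCatalog responseClass = iota // same bytes for every client of one index
	classState                        // state-dependent or token-bearing
)

// cacheControl picks the Cache-Control value for a response class.
func cacheControl(c responseClass) string {
	if c == classCatalog {
		// Cacheable, but revalidate with the ETag before reuse.
		return "public, no-cache"
	}
	return "no-store"
}

// catalogETag quotes the index id as an entity tag (assumed validator source).
func catalogETag(indexID string) string {
	return `"` + indexID + `"`
}

// etagMatches applies the weak comparison that If-None-Match uses.
func etagMatches(ifNoneMatch, etag string) bool {
	for _, candidate := range strings.Split(ifNoneMatch, ",") {
		c := strings.TrimPrefix(strings.TrimSpace(candidate), "W/")
		if c == "*" || c == etag {
			return true
		}
	}
	return false
}

// catalogHandler serves a fixed JSON body with validator-based caching.
func catalogHandler(indexID string, body []byte) http.HandlerFunc {
	etag := catalogETag(indexID)
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("ETag", etag)
		w.Header().Set("Cache-Control", cacheControl(classCatalog))
		if etagMatches(r.Header.Get("If-None-Match"), etag) {
			w.WriteHeader(http.StatusNotModified)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		w.Write(body)
	}
}

func main() {
	_ = catalogHandler("idx_example", []byte(`{"data":{}}`))
	fmt.Println(cacheControl(classCatalog), "|", cacheControl(classState))
}
```

A state handler would set `cacheControl(classState)` on every response and would not emit an ETag at all.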
GraphQL is a powerful fit for flexible client-driven entity queries.
GraphQL’s own materials emphasize precise data selection and retrieving related data in one request.
See the GraphQL project overview: graphql.org.
UWScrape should still defer a full GraphQL interface in version 1 because the hard problem is not fetching arbitrary fields.
The hard problem is preserving requirement semantics, source references, uncertainty, and bounded graph projections.
Typed graph view endpoints provide the useful part of graph-shaped fetching without making every frontend screen define its own semantics.
Anonymous state tokens should be treated as secrets.
OWASP’s session guidance emphasizes unpredictable identifiers, meaningless token contents, avoiding URL token transport, and avoiding sensitive token logging.
See the OWASP Session Management Cheat Sheet: cheatsheetseries.owasp.org.
4. Runtime Topology
The version 1 backend has five runtime inputs.
- A published index directory.
- A writable student state database path.
- Backend configuration.
- Optional frontend static assets for fallback deployments.
- Optional runtime secret material for token verifiers.
The primary version 1 frontend deployment is the SvelteKit Node server defined in ADR 0020.
The backend API runtime must work independently of that frontend server.
The backend has three runtime output classes.
- HTTP responses.
- Logs and metrics.
- Student state mutations.
It must not write into the published index directory.
It must not write into raw snapshot directories.
It must not write into parsed intermediate directories.
It must not mutate parser patch files.
It must not create a new runtime index.
```mermaid
flowchart LR
  subgraph Offline["Offline Pipeline"]
    Kuali["Waterloo/Kuali Public Calendar"]
    Fetch["fetch"]
    Parse["parse"]
    Validate["validate"]
    Build["build-index"]
    Gate["release gate"]
    Kuali --> Fetch --> Parse --> Validate --> Build --> Gate
  end

  Gate --> Published["published index directory"]

  subgraph Backend["Go Backend Runtime"]
    IndexRepo["read-only index repository"]
    StateStore["writable state store"]
    Evaluate["requirement evaluation"]
    GraphViews["graph view projection"]
    API["HTTP API"]
  end

  Published --> IndexRepo
  StateStore <--> Evaluate
  IndexRepo --> Evaluate
  IndexRepo --> GraphViews
  Evaluate --> GraphViews
  API <--> StateStore
  API <--> Evaluate
  API <--> GraphViews

  Frontend["interactive frontend"] <--> API
```

5. Non-Goals
The version 1 backend does not run the scraper.
The version 1 backend does not fetch Kuali data during normal user requests.
The version 1 backend does not provide administrative patch editing.
The version 1 backend does not provide email-based accounts.
The version 1 backend does not provide social login.
The version 1 backend does not provide a generic graph query language.
The version 1 backend does not provide a complete timetable or enrollment system.
The version 1 backend does not guarantee that a student can enroll in a course.
The version 1 backend does not replace academic advising.
The version 1 backend does not make layout geometry authoritative.
The version 1 backend does not silently resolve unparsed calendar text.
The version 1 backend does not silently migrate student state to a newer catalog.
6. Identifier Vocabulary
The backend must use identifier names that reflect operational meaning.
This is important because the project already has complex requirement semantics.
Ambiguous or metaphorical component names would make future implementation harder to review.
Use these names in API contracts, database schemas, logs, and code unless a later spec changes them.
| Name | Meaning |
|---|---|
| `index_id` | Stable id for one built runtime index artifact. |
| `index_schema_version` | Version of the SQLite/read-model schema. |
| `parser_version` | Version of scraper/parser code that created the index. |
| `catalog_version_id` | Internal UWScrape id for an academic calendar version. |
| `catalog_slug` | Human-readable calendar slug, if assigned by the project. |
| `upstream_catalog_id` | Waterloo/Kuali catalog id, such as a Kuali catalog UUID. |
| `catalog_title` | Source calendar title. |
| `source_pid` | Kuali public item pid when available. |
| `source_item_id` | Kuali public item id when available. |
| `course_listing_id` | Internal id for one listing such as MATH 135. |
| `course_credit_id` | Internal id for a credit identity shared by equivalent or cross-listed listings. |
| `credential_id` | Internal id for a plan, major, minor, specialization, option, or similar credential. |
| `requirement_source_id` | Internal id for a requirement-bearing source field. |
| `requirement_expression_id` | Internal id for a parsed logical requirement expression. |
| `requirement_condition_id` | Internal id for a leaf requirement condition. |
| `source_reference_id` | Internal id for source provenance. |
| `state_id` | Internal id for one anonymous student state record. |
| `state_token` | User-held secret used to recover or access a state record. |
| `state_token_verifier` | Server-stored verifier derived from the token. |
Do not use calendar_id when the value specifically means a Kuali id.
Use upstream_catalog_id for the Kuali value.
Do not use catalog_id by itself in API responses unless a nearby field explains whether it is internal or upstream.
Prefer catalog_version_id for the internal value.
Prefer index_id when the endpoint talks about the built artifact being served.
7. Runtime Artifact Contract
The backend consumes a published index directory.
The current contract comes from ADR 0006.
The minimum directory contains:
```text
published-index/
  course-universe.sqlite
  build-metadata.json
  validation-summary.json
  release-decision.json
  build-report.md
  release-decision.md        # recommended human-readable companion
  graph-projection.json      # optional precomputed visual projection
```

The backend must treat course-universe.sqlite as the canonical runtime read model.
The backend loads graph-projection.json as an optional precomputed projection
cache when the file is present. Missing projection cache is not a startup error;
the runtime falls back to SQLite-backed graph construction.
Large projection artifacts may later use graph-projection.json.zst or another
explicitly versioned compressed companion, but the initial artifact is
uncompressed JSON so it can be inspected and release-hashed without another
compression dependency.
The backend must not trust a graph projection more than the SQLite index.
If the graph projection is present but malformed, has the wrong view_type,
omits index_id / catalog_version_id identity metrics, or declares identity
metrics that disagree with the published index metadata, the backend refuses
startup. SQLite remains canonical even when the cache is accepted.
Runtime graph routes may serve the projection cache directly for
course-universe, applying request bounds by truncating the cached projection
with graph_view_truncated warnings.
8. Startup Sequence
Startup is part of the correctness model.
A backend that starts with the wrong index can give wrong answers even if each endpoint is implemented well.
The backend startup sequence should be:
- Load configuration.
- Resolve absolute paths for the index directory and state database.
- Read `build-metadata.json`.
- Read `validation-summary.json`.
- Read `release-decision.json`.
- Verify the release decision status is `approved` or `approved_with_warnings`.
- Verify the `index_schema_version` is supported.
- Open `course-universe.sqlite` read-only.
- Validate the SQLite `metadata` table against `build-metadata.json`.
- Validate required tables and critical indexes exist.
- Run a small internal consistency probe.
- Load optional graph projection metadata.
- Open or create the writable state database.
- Run state database migrations.
- Start the HTTP server.
The internal consistency probe should be cheap.
It should verify that the index can answer representative queries.
It should check at least:
- one catalog version row exists;
- course listing count is nonzero;
- credential count is nonzero when credentials are in scope;
- requirement source count is nonzero;
- source reference lookup works for at least one referenced item;
- validation status and release decision match the approved release gate policy.
Startup must fail closed when the index is unsupported.
Diagnostic development modes may report degraded startup checks, but they must not serve ordinary API traffic with an unsupported schema, missing release decision, or rejected release decision.
Degraded diagnostics must be visible through health and metadata endpoints when a diagnostic server mode is explicitly implemented.
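The release decision and schema version checks from the startup sequence could look roughly like the sketch below. It assumes JSON field names (`status`, `index_id`, `index_schema_version`) that this document does not define for the artifact files, and `checkIndexEligibility` is a hypothetical helper name. The point is only that every failure path returns an error, so the caller cannot proceed to serve traffic.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// releaseDecision holds the one field this sketch reads from
// release-decision.json. The field name is an assumption.
type releaseDecision struct {
	Status string `json:"status"`
}

// buildMetadata holds the fields this sketch reads from build-metadata.json.
// The field names are assumptions.
type buildMetadata struct {
	IndexID            string `json:"index_id"`
	IndexSchemaVersion string `json:"index_schema_version"`
}

var errIndexRejected = errors.New("index not eligible for serving")

// checkIndexEligibility fails closed: unreadable, unapproved, or unsupported
// inputs all produce an error wrapping errIndexRejected.
func checkIndexEligibility(decisionJSON, metadataJSON []byte, supported []string) (buildMetadata, error) {
	var d releaseDecision
	if err := json.Unmarshal(decisionJSON, &d); err != nil {
		return buildMetadata{}, fmt.Errorf("%w: unreadable release decision: %v", errIndexRejected, err)
	}
	if d.Status != "approved" && d.Status != "approved_with_warnings" {
		return buildMetadata{}, fmt.Errorf("%w: release decision status %q", errIndexRejected, d.Status)
	}
	var m buildMetadata
	if err := json.Unmarshal(metadataJSON, &m); err != nil {
		return buildMetadata{}, fmt.Errorf("%w: unreadable build metadata: %v", errIndexRejected, err)
	}
	for _, v := range supported {
		if v == m.IndexSchemaVersion {
			return m, nil
		}
	}
	return buildMetadata{}, fmt.Errorf("%w: unsupported index schema version %q", errIndexRejected, m.IndexSchemaVersion)
}

func main() {
	decision := []byte(`{"status":"approved_with_warnings"}`)
	metadata := []byte(`{"index_id":"idx_example","index_schema_version":"0.1.0"}`)
	m, err := checkIndexEligibility(decision, metadata, []string{"0.1.0"})
	fmt.Println(m.IndexID, err)
}
```

A missing `release-decision.json` reaches this function as empty input, which fails to parse and is therefore rejected like any other ineligible index.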
9. Configuration
Version 1 configuration should stay file and environment variable based.
A minimal local run should not require Kubernetes, service discovery, or managed secrets.
Suggested configuration keys:
| Key | Required | Meaning |
|---|---|---|
| `UWSCRAPE_BIND_ADDR` | Yes | HTTP listen address, such as `127.0.0.1:8080`. |
| `UWSCRAPE_INDEX_DIR` | Yes | Absolute or working-directory-relative path to the published index directory. |
| `UWSCRAPE_STATE_DB_PATH` | Yes | Path to the writable state SQLite database. |
| `UWSCRAPE_TOKEN_KEY_PATH` | Yes for shared deployments | Path to secret key material used for token verifiers. |
| `UWSCRAPE_ALLOWED_ORIGINS` | Later | Comma-separated allowed frontend origins if API and frontend are split. |
| `UWSCRAPE_STATIC_DIR` | Optional fallback | Directory of fallback frontend assets to serve from the same process. Primary v1 frontend deployment uses the SvelteKit Node server. |
| `UWSCRAPE_LOG_LEVEL` | Optional | Log verbosity. |
The backend should not provide a normal serving override for unsupported index schemas, missing release decisions, or rejected release decisions.
The production default should not bind publicly without explicit configuration.
The production default should not log request bodies.
The production default should not log state tokens.
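A configuration loader for the keys in the table might be sketched as follows. The `config` struct, the `loadConfig` name, and the `info` default log level are assumptions for illustration. Taking a lookup function with the same signature as `os.LookupEnv` keeps the loader testable without touching the process environment.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// config mirrors the suggested configuration keys.
type config struct {
	BindAddr     string
	IndexDir     string
	StateDBPath  string
	TokenKeyPath string // required only for shared deployments
	StaticDir    string
	LogLevel     string
}

// loadConfig reads the UWSCRAPE_* keys and refuses to continue when a
// required key is absent. There is deliberately no default bind address.
func loadConfig(lookup func(string) (string, bool)) (config, error) {
	get := func(key string) string {
		v, _ := lookup(key)
		return strings.TrimSpace(v)
	}
	cfg := config{
		BindAddr:     get("UWSCRAPE_BIND_ADDR"),
		IndexDir:     get("UWSCRAPE_INDEX_DIR"),
		StateDBPath:  get("UWSCRAPE_STATE_DB_PATH"),
		TokenKeyPath: get("UWSCRAPE_TOKEN_KEY_PATH"),
		StaticDir:    get("UWSCRAPE_STATIC_DIR"),
		LogLevel:     get("UWSCRAPE_LOG_LEVEL"),
	}
	required := []struct{ key, value string }{
		{"UWSCRAPE_BIND_ADDR", cfg.BindAddr},
		{"UWSCRAPE_INDEX_DIR", cfg.IndexDir},
		{"UWSCRAPE_STATE_DB_PATH", cfg.StateDBPath},
	}
	var missing []string
	for _, r := range required {
		if r.value == "" {
			missing = append(missing, r.key)
		}
	}
	if len(missing) > 0 {
		return config{}, fmt.Errorf("missing required configuration: %s", strings.Join(missing, ", "))
	}
	if cfg.LogLevel == "" {
		cfg.LogLevel = "info" // assumed default, not specified by this document
	}
	return cfg, nil
}

func main() {
	cfg, err := loadConfig(os.LookupEnv)
	if err != nil {
		fmt.Println("startup refused:", err)
		return
	}
	fmt.Println("listening on", cfg.BindAddr)
}
```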
10. Storage Boundary
The backend has two storage roles.
The index store is read-only.
The student state store is writable.
These stores must be physically separate files in version 1.
This separation makes backup, deployment, and disaster recovery easier.
It also prevents a state write from changing a published index artifact.
Recommended local layout:
```text
data/
  published/
    2026-2027-undergraduate/
      course-universe.sqlite
      build-metadata.json
      validation-summary.json
      build-report.md
  runtime/
    state.sqlite
```

The published directory may be replaced atomically during deployment.
The state database must not be replaced during index deployment.
The backend should open the index with a read-only SQLite connection string.
If the driver supports SQLite URI filenames, the backend should use mode=ro.
If the deployment guarantees the file is immutable while the process is running, the backend may also use immutable=1.
Using immutable=1 when the file can change underneath the process is unsafe because SQLite warns that incorrect query results or corruption errors can follow if the file changes after being marked immutable.
Therefore immutable=1 should be enabled only when deployment creates a new file path or content-addressed artifact for each index.
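Building the read-only connection string can be sketched with the standard library alone. `readOnlyIndexURI` is a hypothetical helper, and the sketch assumes the chosen SQLite driver passes `file:` URI filenames through to SQLite unchanged. It opens no database; it only constructs the string.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"path/filepath"
)

// readOnlyIndexURI builds a SQLite URI filename with mode=ro and, when the
// deployment guarantees a never-changing file, immutable=1.
func readOnlyIndexURI(absPath string, immutable bool) (string, error) {
	if !filepath.IsAbs(absPath) {
		return "", errors.New("index path must be absolute")
	}
	q := url.Values{}
	q.Set("mode", "ro")
	if immutable {
		q.Set("immutable", "1")
	}
	// url.URL escapes characters such as '?' and '#' that would otherwise
	// be read as URI delimiters.
	u := url.URL{Scheme: "file", Path: filepath.ToSlash(absPath), RawQuery: q.Encode()}
	return u.String(), nil
}

func main() {
	uri, err := readOnlyIndexURI("/data/published/2026-2027-undergraduate/course-universe.sqlite", false)
	fmt.Println(uri, err)
}
```

Keeping `immutable` as an explicit argument forces the caller to decide, per deployment, whether the content-addressed or new-path guarantee actually holds.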
11. State Database Role
The state database stores user planning records.
It does not store source calendar facts.
It does not store runtime index rows copied from the published index, except for stable ids needed to remember user choices.
It must preserve user-entered unresolved values.
It must keep academic progress separate from academic standing.
Academic progress examples include 1A, 1B, 2A, 2B, 3A, 3B, 4A, and 4B.
Academic standing examples include good standing, academic probation, required to withdraw, or source-specific standing text.
The backend should avoid assuming that academic progress implies academic standing.
The backend should avoid assuming that academic standing implies academic progress.
State records should store:
- `state_id`;
- `catalog_version_id`;
- created time;
- updated time;
- completed courses;
- grades when provided;
- planned courses;
- academic progress;
- academic standing;
- declared credentials;
- desired credentials;
- unresolved user-entered course references;
- unresolved user-entered credential references;
- optional user notes;
- optional pinned or dismissed explanation ids;
- optional frontend preferences.
State records should not store:
- email address;
- legal name;
- student number;
- Waterloo login;
- government id;
- payment information;
- exact location;
- class schedule location unless a later timetable feature explicitly requires it.
12. State Token Model
The project requires a lightweight access mechanism that does not ask for an email address.
The backend should implement this as a generated secret state token.
The token is a capability secret.
Whoever has the token can access the state record.
The token should be generated by the server using a cryptographically secure random source.
The token should contain at least 256 bits of randomness.
The token should be encoded as base64url without padding.
The token should be shown once when a state record is created.
The token should not encode user data.
The token should not encode the state id in reversible form.
The token should not be called a hash in user-facing text.
The user-facing text should call it an access key, recovery key, or state token.
The backend should store only a verifier derived from the token.
The recommended verifier is an HMAC or keyed hash over the token using runtime secret material.
The state database should store:
- token verifier;
- verifier key version;
- state id;
- created time;
- last used time when useful;
- revoked time when useful.
The backend should accept the token through an Authorization header.
The backend should not accept tokens in URLs.
The backend should not log tokens.
The backend should redact authorization headers from logs.
The backend may later add an HttpOnly cookie flow for same-origin browser use.
In that flow, the state token remains the long-term recovery secret.
The cookie should be a browser session handle, not a plaintext copy of the recovery token.
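Token generation and verification as described in this section might be sketched as below. HMAC-SHA256 is one reasonable choice for the keyed verifier, not a decision this document makes, and the function names are hypothetical. The key would come from the runtime secret material. It is passed in here as a plain byte slice.

```go
package main

import (
	"crypto/hmac"
	"crypto/rand"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
)

// newStateToken returns 256 bits of randomness as unpadded base64url.
func newStateToken() (string, error) {
	raw := make([]byte, 32)
	if _, err := rand.Read(raw); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(raw), nil
}

// tokenVerifier derives the value stored in the state database.
// The token itself is never stored.
func tokenVerifier(key []byte, token string) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(token))
	return mac.Sum(nil)
}

// tokenMatches compares in constant time to avoid leaking verifier bytes.
func tokenMatches(key []byte, token string, stored []byte) bool {
	return hmac.Equal(tokenVerifier(key, token), stored)
}

func main() {
	key := []byte("example-key-material") // assumption: loaded from UWSCRAPE_TOKEN_KEY_PATH
	token, err := newStateToken()
	if err != nil {
		fmt.Println("token generation failed:", err)
		return
	}
	stored := tokenVerifier(key, token)
	// Print only lengths; the token is a secret and must not be logged.
	fmt.Println(len(token), len(stored), tokenMatches(key, token, stored))
}
```

Storing a verifier key version next to each verifier, as listed above, lets the key be rotated without invalidating tokens issued under an older key.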
13. Package Boundaries
The first Go implementation should keep packages small and operationally named.
Suggested package layout:
```text
cmd/uwscrape-server/
  main.go

internal/server/
  server.go
  config.go
  startup.go

internal/api/
  routes.go
  catalog_handlers.go
  state_handlers.go
  query_handlers.go
  graph_handlers.go
  errors.go

internal/catalogstore/
  sqlite.go
  metadata.go
  courses.go
  credentials.go
  requirements.go
  sources.go

internal/statestore/
  sqlite.go
  migrations.go
  state.go
  tokens.go

internal/evaluate/
  unlock.go
  credential_progress.go
  what_if.go
  course_impact.go
  result.go

internal/graphview/
  views.go
  course_neighborhood.go
  credential_requirements.go
  overlays.go

internal/response/
  envelope.go
  warnings.go
  source_references.go

internal/model/
  identifiers.go
  catalog.go
  state.go
  requirements.go
```

catalogstore should expose read methods over the index.
statestore should expose read and write methods over student state.
evaluate should combine catalog and state data into query results.
graphview should produce bounded graph projections.
api should translate HTTP requests and responses.
response should define shared envelope and error structures.
model should define shared identifiers and typed values.
Package names should remain descriptive.
Avoid names that hide meaning behind metaphor.
14. Request Flow
A typical catalog lookup request follows this path:
- `api` parses and validates the request.
- `catalogstore` reads from the index.
- `response` attaches metadata and source references.
- `api` writes JSON with cache headers.
A typical state query request follows this path:
- `api` validates the request body.
- `api` authenticates the state token when the endpoint requires state.
- `statestore` loads the state record.
- `catalogstore` loads required catalog records.
- `evaluate` computes the query result.
- `response` attaches warnings, unknowns, source references, and index metadata.
- `api` writes JSON with private or no-store cache headers.
A typical graph view request follows this path:
- `api` validates the requested view type and bounds.
- `api` authenticates state if the view contains state-dependent overlays.
- `catalogstore` loads required nodes, requirement data, and source references.
- `evaluate` computes state-dependent statuses when needed.
- `graphview` builds a typed bounded projection.
- `response` attaches projection metadata and warnings.
- `api` writes JSON.
15. API Principles
The backend API should be stable enough for frontend iteration.
It should use versioned paths beginning with /api/v1.
It should use plural resource names for resource collections.
It should use GET for read-only resource retrieval.
It should use POST for query requests with complex request bodies.
It should use POST for state creation.
It should use PUT for full state replacement.
It should reserve PATCH for partial state updates after a patch semantics spec exists.
It should use DELETE for destructive state removal only with explicit confirmation semantics.
It should return JSON for API endpoints.
It should return a consistent response envelope.
It should include index metadata in every successful API response.
It should include source references where a response depends on calendar source text.
It should include unknowns where parsed data is incomplete.
It should not hide unknowns in human-readable text alone.
It should keep endpoint names plain and operational.
16. Response Envelope
Every successful JSON response should use the same top-level structure.
```json
{
  "data": {},
  "meta": {
    "api_version": "v1",
    "index_id": "idx_2026_2027_undergrad_20260511_001",
    "index_schema_version": "0.1.0",
    "catalog_version_id": "uw_undergrad_2026_2027",
    "catalog_title": "2026-2027 Undergraduate Studies Academic Calendar",
    "upstream_catalog_id": "67e557ed6ed2fe2bd3a38956"
  },
  "warnings": [],
  "unknowns": [],
  "source_references": []
}
```

data contains the endpoint-specific payload.
meta contains API, index, and request metadata.
warnings contains machine-readable non-fatal warnings.
unknowns contains semantic uncertainty that can affect correctness.
source_references contains source provenance used directly by the response.
The envelope prevents important caveats from being trapped inside prose.
The frontend can render warnings and unknowns consistently across views.
Error responses should use a compatible structure.
```json
{
  "error": {
    "code": "unsupported_index_schema",
    "message": "The loaded index schema version is not supported by this backend.",
    "details": {
      "loaded": "2.0.0",
      "supported": ["0.1.0"]
    }
  },
  "meta": {
    "api_version": "v1"
  },
  "warnings": [],
  "unknowns": [],
  "source_references": []
}
```

The message field is for humans.
The code field is for clients.
The details field must not expose secrets.
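The success envelope could be modeled in Go roughly as follows. The type and function names are hypothetical, and because this document does not yet fix the element shapes of `warnings`, `unknowns`, and `source_references`, the sketch keeps them as raw JSON. The constructor initializes the three arrays so they serialize as `[]` rather than `null`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Meta carries API and index identity. Index fields are omitted when empty
// so the same type can serve error responses that only know api_version.
type Meta struct {
	APIVersion         string `json:"api_version"`
	IndexID            string `json:"index_id,omitempty"`
	IndexSchemaVersion string `json:"index_schema_version,omitempty"`
	CatalogVersionID   string `json:"catalog_version_id,omitempty"`
	CatalogTitle       string `json:"catalog_title,omitempty"`
	UpstreamCatalogID  string `json:"upstream_catalog_id,omitempty"`
}

// Envelope is the shared top-level structure for successful responses.
type Envelope struct {
	Data             any               `json:"data"`
	Meta             Meta              `json:"meta"`
	Warnings         []json.RawMessage `json:"warnings"`
	Unknowns         []json.RawMessage `json:"unknowns"`
	SourceReferences []json.RawMessage `json:"source_references"`
}

// NewEnvelope returns an envelope whose array fields are empty, not nil.
func NewEnvelope(data any, meta Meta) Envelope {
	return Envelope{
		Data:             data,
		Meta:             meta,
		Warnings:         []json.RawMessage{},
		Unknowns:         []json.RawMessage{},
		SourceReferences: []json.RawMessage{},
	}
}

// encodeEnvelope serializes an envelope to a JSON string.
func encodeEnvelope(e Envelope) (string, error) {
	b, err := json.Marshal(e)
	return string(b), err
}

func main() {
	out, err := encodeEnvelope(NewEnvelope(struct{}{}, Meta{APIVersion: "v1", IndexID: "idx_example"}))
	fmt.Println(out, err)
}
```

Always emitting the three arrays lets the frontend iterate over them without null checks, which matches the goal of rendering warnings and unknowns consistently across views.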
17. Result Status Vocabulary
Requirement and query results should not collapse into booleans.
Use the following status vocabulary in version 1.
| Status | Meaning |
|---|---|
| `satisfied` | Known satisfied under the loaded index and provided state. |
| `not_satisfied` | Known not satisfied under the loaded index and provided state. |
| `partial` | Some subrequirements are satisfied, but the whole requirement is not. |
| `unknown` | Unparsed or unavailable data prevents a reliable answer. |
| `conflict` | The state or requirement set contains incompatible facts. |
| `not_applicable` | The requirement or query does not apply to the selected target. |
Do not use blocked as the only negative status.
blocked is useful as an explanation category, not as the whole result.
A course can be not_satisfied because a prerequisite is missing.
A course can be not_satisfied because a minimum grade is missing.
A course can be unknown because a requirement fragment is unparsed.
A course can be conflict because the state includes mutually exclusive credit assumptions.
The response should expose those distinctions.
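One way to keep results from collapsing into booleans is a typed status with a closed set of values. The sketch below uses the six statuses from the table. The Go identifiers and the `QueryResult` shape (including `ExplanationCategories`) are hypothetical and only illustrate that `blocked` would be carried as an explanation, not as a status.

```go
package main

import "fmt"

// ResultStatus is the version 1 status vocabulary.
type ResultStatus string

const (
	StatusSatisfied     ResultStatus = "satisfied"
	StatusNotSatisfied  ResultStatus = "not_satisfied"
	StatusPartial       ResultStatus = "partial"
	StatusUnknown       ResultStatus = "unknown"
	StatusConflict      ResultStatus = "conflict"
	StatusNotApplicable ResultStatus = "not_applicable"
)

// ParseResultStatus rejects any value outside the vocabulary, including
// explanation categories such as "blocked".
func ParseResultStatus(s string) (ResultStatus, error) {
	switch rs := ResultStatus(s); rs {
	case StatusSatisfied, StatusNotSatisfied, StatusPartial,
		StatusUnknown, StatusConflict, StatusNotApplicable:
		return rs, nil
	}
	return "", fmt.Errorf("unknown result status %q", s)
}

// QueryResult pairs a status with free-form explanation categories.
type QueryResult struct {
	Status                ResultStatus
	ExplanationCategories []string
}

func main() {
	r := QueryResult{
		Status:                StatusNotSatisfied,
		ExplanationCategories: []string{"blocked", "missing_prerequisite"},
	}
	fmt.Println(r.Status, r.ExplanationCategories)
	_, err := ParseResultStatus("blocked")
	fmt.Println(err)
}
```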
18. Endpoint Map
Version 1 should start with a small endpoint set.
The exact request and response schemas belong in later specs.
This architecture document fixes the endpoint families and intent.
18.1 System and Index Metadata
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/api/v1/health` | Process health and degraded-mode status. |
| `GET` | `/api/v1/index` | Loaded index metadata and validation status. |
| `GET` | `/api/v1/catalog-versions` | Catalog versions available in the loaded index. |
| `GET` | `/api/v1/catalog-versions/{catalog_version_id}` | One catalog version record. |
/api/v1/index should identify the artifact being served.
/api/v1/catalog-versions should identify academic calendar versions contained in the artifact.
Those are related but not identical.
18.2 Catalog Resource Endpoints
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/api/v1/courses` | Search or list course listings. |
| `GET` | `/api/v1/courses/{subject}/{catalog_number}` | Get one course listing by subject and catalog number. |
| `GET` | `/api/v1/courses/{subject}/{catalog_number}/requirements` | Get prerequisite, corequisite, antirequisite, and related parsed requirements. |
| `GET` | `/api/v1/courses/{subject}/{catalog_number}/source` | Get source fields and source references for one course. |
| `GET` | `/api/v1/course-credits/{course_credit_id}` | Get credit identity and linked listings. |
| `GET` | `/api/v1/credentials` | Search or list credentials. |
| `GET` | `/api/v1/credentials/{credential_id}` | Get one credential. |
| `GET` | `/api/v1/credentials/{credential_id}/requirements` | Get parsed credential requirements. |
| `GET` | `/api/v1/credentials/{credential_id}/source` | Get source fields and source references for one credential. |
Course codes in responses should use canonical uppercase display form.
The router should accept case-insensitive subject input.
Path inputs avoid embedded spaces by separating subject and catalog_number.
Ambiguous subject or catalog number input should return a structured error or multiple matches.
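Normalizing the two path segments into the canonical display code might look like the sketch below. `normalizeCourseCode` is a hypothetical helper, and its validation rules (letters-only subjects, catalog numbers that start with a digit and contain only digits and letters) are assumptions about Waterloo course codes, not rules taken from this document.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeCourseCode upper-cases and validates the path inputs, then joins
// them with a single space, for example "math" + "135" -> "MATH 135".
func normalizeCourseCode(subject, catalogNumber string) (string, error) {
	s := strings.ToUpper(strings.TrimSpace(subject))
	n := strings.ToUpper(strings.TrimSpace(catalogNumber))
	if s == "" || n == "" {
		return "", fmt.Errorf("subject and catalog number are required")
	}
	for _, r := range s {
		if r < 'A' || r > 'Z' {
			return "", fmt.Errorf("invalid subject %q", subject)
		}
	}
	if n[0] < '0' || n[0] > '9' {
		return "", fmt.Errorf("invalid catalog number %q", catalogNumber)
	}
	for _, r := range n {
		isDigit := r >= '0' && r <= '9'
		isLetter := r >= 'A' && r <= 'Z'
		if !isDigit && !isLetter {
			return "", fmt.Errorf("invalid catalog number %q", catalogNumber)
		}
	}
	return s + " " + n, nil
}

func main() {
	code, err := normalizeCourseCode("math", "135")
	fmt.Println(code, err)
}
```

A handler would map the returned error to a structured error response instead of passing malformed input to the index lookup.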
18.3 Source Reference Endpoints
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/api/v1/source-references/{source_reference_id}` | Retrieve one source reference. |
| `GET` | `/api/v1/source-references` | Retrieve a bounded set by ids or filters. |
Source references must be independently addressable.
They should not only appear nested under course and credential endpoints.
This matters for graph views and explanations.
A graph edge may cite a requirement source without loading the whole course record.
A query result may cite one unparsed fragment shared by multiple display elements.
18.4 Student State Endpoints
| Method | Path | Purpose |
|---|---|---|
| `POST` | `/api/v1/state` | Create anonymous state and return the token once. |
| `GET` | `/api/v1/state/current` | Load the state associated with the token. |
| `PUT` | `/api/v1/state/current` | Replace the state associated with the token. |
| `GET` | `/api/v1/state/current/export` | Export state in a portable format. |
| `POST` | `/api/v1/state/current/migration-preview` | Preview migration to the active catalog. |
| deferred | `/api/v1/state/current/migrate` | Future explicit migrated-state acceptance endpoint; not registered in v1. |
| `DELETE` | `/api/v1/state/current` | Delete state after confirmation. |
GET /api/v1/state/current/export is read-only.
It should not use POST.
DELETE /api/v1/state/current is destructive.
It should require confirmation semantics, such as repeating a confirmation phrase or supplying a one-time confirmation token.
The delete endpoint should not be triggered by ordinary navigation.
18.5 Query Endpoints
| Method | Path | Purpose |
|---|---|---|
| `POST` | `/api/v1/query/course-unlock` | Evaluate whether selected courses are unlocked. |
| `POST` | `/api/v1/query/credential-progress` | Evaluate progress toward credentials. |
| `POST` | `/api/v1/query/what-if` | Compare current state with proposed additions or removals. |
| `POST` | `/api/v1/query/course-impact` | Explain what selected courses unlock or affect. |
| `POST` | `/api/v1/query/credential-gap-summary` | Summarize remaining, blocked, conflicting, and unknown requirements for a target credential. |
Query endpoints use POST because request bodies can be complex.
Using POST here does not imply mutation.
The endpoints should still be side-effect free unless the spec explicitly adds saved query history later.
Query responses should state whether they were evaluated against persisted state, supplied state, or both.
18.6 Graph View Endpoints
| Method | Path | Purpose |
|---|---|---|
| `GET` | `/api/v1/graph/views` | List supported graph view types and limits. |
| `POST` | `/api/v1/graph/views/course-neighborhood` | Get bounded prerequisite and unlock neighborhood for courses. |
| `POST` | `/api/v1/graph/views/course-universe` | Get bounded universal course relation graph for the active index. |
| `POST` | `/api/v1/graph/views/course-pathways` | Get recursive upstream/downstream course pathways around selected courses. |
| `POST` | `/api/v1/graph/views/credential-requirements` | Get bounded requirement graph for a credential. |
| `POST` | `/api/v1/graph/views/unlock-overlay` | Get state-dependent unlock statuses for visible graph nodes. |
| `POST` | `/api/v1/graph/views/target-relevance` | Get relevance of visible courses to selected credentials. |
| `POST` | `/api/v1/graph/views/expand-node` | Expand one graph node within a named view. |
These endpoints are graph view endpoints.
They are not a general GraphQL endpoint.
They are not a general graph database API.
They are named, typed, bounded projections.
Each view should declare:
- `view_type`;
- `projection_version`;
- `index_id`;
- input bounds;
- output node count;
- output edge count;
- omitted count when limits are reached;
- source references;
- unknowns;
- warnings.
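Enforcing a node bound on a projection can be sketched as below. The `GraphView`, `Node`, and `Edge` shapes and the `boundView` name are hypothetical. Keeping the first `maxNodes` nodes in their existing order is an assumed truncation policy, and `graph_view_truncated` is the warning code named earlier in this document. Edges that would point at an omitted node are dropped so the returned view stays self-consistent.

```go
package main

import "fmt"

type Node struct {
	ID string `json:"id"`
}

type Edge struct {
	From string `json:"from"`
	To   string `json:"to"`
}

// GraphView is a typed, bounded projection with its own accounting fields.
type GraphView struct {
	ViewType          string   `json:"view_type"`
	ProjectionVersion string   `json:"projection_version"`
	IndexID           string   `json:"index_id"`
	Nodes             []Node   `json:"nodes"`
	Edges             []Edge   `json:"edges"`
	OmittedNodeCount  int      `json:"omitted_node_count"`
	OmittedEdgeCount  int      `json:"omitted_edge_count"`
	Warnings          []string `json:"warnings"`
}

// boundView returns a copy of v limited to maxNodes nodes. The input view
// is not modified.
func boundView(v GraphView, maxNodes int) GraphView {
	if maxNodes < 0 {
		maxNodes = 0
	}
	if len(v.Nodes) <= maxNodes {
		return v
	}
	kept := make(map[string]bool, maxNodes)
	for _, n := range v.Nodes[:maxNodes] {
		kept[n.ID] = true
	}
	out := v
	out.Nodes = append([]Node(nil), v.Nodes[:maxNodes]...)
	out.Edges = nil
	for _, e := range v.Edges {
		if kept[e.From] && kept[e.To] {
			out.Edges = append(out.Edges, e)
		}
	}
	out.OmittedNodeCount = len(v.Nodes) - maxNodes
	out.OmittedEdgeCount = len(v.Edges) - len(out.Edges)
	out.Warnings = append(append([]string(nil), v.Warnings...), "graph_view_truncated")
	return out
}

func main() {
	v := GraphView{
		ViewType: "course-universe",
		Nodes:    []Node{{ID: "A"}, {ID: "B"}, {ID: "C"}},
		Edges:    []Edge{{From: "A", To: "B"}, {From: "B", To: "C"}},
	}
	b := boundView(v, 2)
	fmt.Println(len(b.Nodes), len(b.Edges), b.OmittedNodeCount, b.Warnings)
}
```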
19. Catalog Endpoint Semantics
Catalog endpoints should preserve the distinction between listings and credit identities.
A course listing is a visible catalog listing such as CS 135 or MATH 135.
A course credit identity represents the credit-bearing equivalence class when listings are cross-listed or otherwise share credit.
The frontend may display listings as planets or islands.
The backend must still expose credit identity when it affects requisites, antirequisites, and credential contribution.
Course endpoint responses should include:
- course listing id;
- subject;
- catalog number;
- display code;
- title;
- unit value when known;
- description when available;
- course level;
- linked course credit id;
- cross-listed listings;
- antirequisite summaries;
- prerequisite summaries;
- source references;
- uncertainty summaries.
Credential endpoint responses should include:
- credential id;
- credential type;
- title;
- owning faculty or department when known;
- calendar version;
- source ids;
- requirement source ids;
- source references;
- uncertainty summaries.
Catalog endpoints should not evaluate a student’s personal progress unless explicitly part of a query or overlay endpoint.
20. Source Reference Semantics
Source references are not debug-only metadata.
They are part of the trust model.
Every requirement answer should be traceable to source text or source JSON where available.
Every unparsed requirement should have source references.
Every graph edge derived from a requirement should carry enough reference ids to explain itself.
The official University of Waterloo Academic Calendar remains the authoritative academic source.
UWScrape source references support traceability; they do not replace Waterloo’s calendar or advising channels.
Source reference responses should include:
- source reference id;
- source kind;
- upstream catalog id;
- source pid or item id when available;
- source field path;
- short source snippet when safe and permitted;
- raw HTML fragment hash when applicable;
- source URL when available;
- parser stage that produced the reference;
- build id;
- validation findings linked to the source.
The backend should avoid exposing large raw calendar text or raw HTML fragments by default.
It may provide truncated source snippets, stable source URLs, field paths, hashes, and source ids.
It should provide stable ids and hashes for full local debugging.
21. Query Evaluation Boundary
The backend owns semantic query answers.
The frontend can render, sort, filter, and animate results.
The frontend should not reimplement requirement evaluation as a separate authority.
The first query implementation can be direct Go evaluation over the indexed requirement tree.
It does not need to embed SAT, SMT, CP, or Datalog on day one.
However, API contracts should not prevent those implementations later.
For that reason, query responses should expose semantic results rather than solver internals.
Do expose:
- target ids;
- result status;
- satisfied requirement ids;
- unsatisfied requirement ids;
- unknown requirement ids;
- conflicting facts;
- source references;
- explanation tree;
- state facts used by the evaluation.
Do not expose as stable API:
- internal recursive function names;
- temporary solver variable ids;
- implementation-specific clause ids;
- internal cache keys;
- stack traces.
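As a rough illustration of direct evaluation over a requirement tree, the sketch below uses three-valued logic: an "all" group takes the weakest child status and an "any" group takes the strongest. The `Node` type, the operator names, and the `Result` shape are hypothetical; the real requirement model comes from the index. Only requirement ids and statuses leave the evaluator, which matches the rule about exposing semantic results rather than internals.

```go
package main

import "fmt"

// Status is ordered so that "all" is a minimum and "any" is a maximum
// (Kleene three-valued logic).
type Status int

const (
	Unsatisfied Status = iota
	Unknown
	Satisfied
)

func (s Status) String() string {
	return [...]string{"unsatisfied", "unknown", "satisfied"}[s]
}

// Node is a hypothetical in-memory requirement node.
type Node struct {
	ID         string
	Op         string // "all", "any", "course", or "unparsed"
	CourseCode string
	Children   []*Node
}

// Result collects requirement ids by status, children before parents.
type Result struct {
	Satisfied, Unsatisfied, Unknown []string
}

func evaluate(n *Node, completed map[string]bool, res *Result) Status {
	st := Unknown // unparsed or unrecognized nodes stay unknown
	switch n.Op {
	case "course":
		st = Unsatisfied
		if completed[n.CourseCode] {
			st = Satisfied
		}
	case "all":
		st = Satisfied
		for _, c := range n.Children {
			if cs := evaluate(c, completed, res); cs < st {
				st = cs
			}
		}
	case "any":
		st = Unsatisfied
		for _, c := range n.Children {
			if cs := evaluate(c, completed, res); cs > st {
				st = cs
			}
		}
	}
	switch st {
	case Satisfied:
		res.Satisfied = append(res.Satisfied, n.ID)
	case Unsatisfied:
		res.Unsatisfied = append(res.Unsatisfied, n.ID)
	default:
		res.Unknown = append(res.Unknown, n.ID)
	}
	return st
}

func main() {
	tree := &Node{ID: "req:1", Op: "all", Children: []*Node{
		{ID: "cond:1", Op: "course", CourseCode: "CS 135"},
		{ID: "req:2", Op: "any", Children: []*Node{
			{ID: "cond:2", Op: "course", CourseCode: "MATH 135"},
			{ID: "unparsed:1", Op: "unparsed"},
		}},
	}}
	res := &Result{}
	status := evaluate(tree, map[string]bool{"CS 135": true}, res)
	fmt.Println(status, res.Satisfied, res.Unsatisfied, res.Unknown)
}
```

Because the unparsed fragment sits inside an "any" group whose other branch is unsatisfied, the root stays unknown instead of being reported as unsatisfied.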
22. Query Input Modes
Query endpoints should support three input modes.
The first mode evaluates persisted state from the state token.
The second mode evaluates supplied state in the request body.
The third mode evaluates persisted state plus a request-local what-if patch.
The request should make the mode explicit.
Example shape:

    {
      "state_mode": "persisted",
      "targets": {
        "course_codes": ["CS 246"],
        "credential_ids": []
      }
    }

Example supplied-state shape:

    {
      "state_mode": "supplied",
      "student_state": {
        "catalog_version_id": "uw_undergrad_2026_2027",
        "academic_progress": "2A",
        "academic_standing": "good_standing",
        "completed_courses": [
          { "course_code": "CS 135", "grade_percent": 82 }
        ],
        "planned_courses": []
      },
      "targets": {
        "course_codes": ["CS 246"]
      }
    }

Example what-if shape:

    {
      "state_mode": "persisted_with_changes",
      "changes": {
        "add_completed_courses": [
          { "course_code": "MATH 136", "grade_percent": 75 }
        ]
      },
      "targets": {
        "course_codes": ["MATH 237"]
      }
    }

This lets the frontend run interactive exploration without saving every tentative edit.
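One way a handler might decode these shapes and check that the declared mode matches the supplied fields is sketched below. The `QueryRequest` type and the specific validation rules are assumptions; the API specification decides which field combinations are actually rejected.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// QueryRequest keeps nested payloads as raw JSON so that this sketch
// only has to reason about the mode. The type itself is hypothetical.
type QueryRequest struct {
	StateMode    string          `json:"state_mode"`
	StudentState json.RawMessage `json:"student_state,omitempty"`
	Changes      json.RawMessage `json:"changes,omitempty"`
	Targets      json.RawMessage `json:"targets"`
}

// parseQueryRequest decodes the body and checks that the fields required
// by the declared mode are present.
func parseQueryRequest(body []byte) (QueryRequest, error) {
	var req QueryRequest
	if err := json.Unmarshal(body, &req); err != nil {
		return req, fmt.Errorf("invalid JSON: %w", err)
	}
	switch req.StateMode {
	case "persisted":
		// State comes from the state token; nothing else is required.
	case "supplied":
		if len(req.StudentState) == 0 {
			return req, errors.New("state_mode supplied requires student_state")
		}
	case "persisted_with_changes":
		if len(req.Changes) == 0 {
			return req, errors.New("state_mode persisted_with_changes requires changes")
		}
	default:
		return req, fmt.Errorf("unknown state_mode %q", req.StateMode)
	}
	return req, nil
}

func main() {
	body := []byte(`{"state_mode":"persisted_with_changes","changes":{"add_completed_courses":[]},"targets":{"course_codes":["MATH 237"]}}`)
	req, err := parseQueryRequest(body)
	fmt.Println(req.StateMode, err)
}
```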
23. Course Unlock Query
POST /api/v1/query/course-unlock answers whether a course is unlocked.
It should evaluate:
- prerequisite requirements;
- corequisite requirements when modeled;
- minimum grade thresholds;
- academic progress constraints;
- program or credential restrictions when present in course requisites;
- enrollment constraints when present and parsed;
- antirequisite conflicts;
- unparsed requirement fragments.
The endpoint should distinguish:
- course is known unlocked;
- course is missing specific prerequisites;
- course is missing grade information;
- course is blocked by antirequisite conflict;
- course depends on an unparsed condition;
- course does not exist in the loaded catalog;
- state catalog does not match the active index;
- catalog needed for the state is unavailable.
The response should include a compact explanation tree.
The explanation tree should use requirement ids and condition ids from the index.
It should not invent frontend-only ids for semantic nodes.
24. Credential Progress Query
POST /api/v1/query/credential-progress evaluates progress toward one or more credentials.
It should answer:
- which requirements are satisfied;
- which requirements are partially satisfied;
- which requirements are not satisfied;
- which requirements are unknown;
- which completed courses contribute;
- which completed courses do not currently contribute;
- which planned courses may contribute;
- which courses are blocked by prerequisites;
- which requirements have multiple possible fulfillment paths.
The endpoint should avoid pretending there is one canonical path when many equivalent choices remain.
The user experience goal is degree CAD, not credential tunnel vision.
The backend should therefore expose optionality.
It should expose alternative groups and remaining choice sets where the index can support them.
25. What-If Query
POST /api/v1/query/what-if compares a baseline state against proposed changes.
The response should identify:
- courses newly unlocked;
- courses newly blocked;
- credential requirements newly satisfied;
- credential requirements newly made impossible or conflicting;
- unknowns introduced or removed;
- source references that explain each change.
What-if should be deterministic.
Given the same index, baseline state, and changes, it should return the same result.
It should not mutate persisted state unless a later endpoint explicitly saves a plan.
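The comparison itself can be small. The sketch below assumes both states have already been evaluated into a map from course code to unlocked status (a hypothetical intermediate form) and sorts the output, because Go map iteration order is randomized and the response should be identical for identical inputs.

```go
package main

import (
	"fmt"
	"sort"
)

// Diff lists the courses whose unlock status changed between two evaluations.
type Diff struct {
	NewlyUnlocked []string
	NewlyBlocked  []string
}

// diffUnlocked compares two evaluations without modifying either map.
// Sorting keeps the result independent of map iteration order.
func diffUnlocked(baseline, proposed map[string]bool) Diff {
	var d Diff
	for code, unlocked := range proposed {
		if unlocked && !baseline[code] {
			d.NewlyUnlocked = append(d.NewlyUnlocked, code)
		}
	}
	for code, unlocked := range baseline {
		if unlocked && !proposed[code] {
			d.NewlyBlocked = append(d.NewlyBlocked, code)
		}
	}
	sort.Strings(d.NewlyUnlocked)
	sort.Strings(d.NewlyBlocked)
	return d
}

func main() {
	baseline := map[string]bool{"MATH 136": true, "STAT 230": true}
	proposed := map[string]bool{"MATH 136": true, "MATH 237": true, "MATH 235": true}
	fmt.Printf("%+v\n", diffUnlocked(baseline, proposed))
}
```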
26. Course Impact Query
POST /api/v1/query/course-impact answers what a course influences.
It should include:
- direct unlocks;
- transitive unlocks within a requested depth;
- credentials that require or accept the course;
- credentials that require or accept the course credit identity;
- subjects affected by unlocks;
- prerequisite bottleneck contribution;
- antirequisite conflicts;
- uncertainty summary.
This endpoint supports planet or island sizing.
It should expose raw metrics separately from display recommendations.
Example metric fields:
- direct_unlock_count;
- transitive_unlock_count;
- unlock_subject_count;
- unlock_subject_entropy;
- credential_relevance_count;
- conflict_count;
- unknown_dependency_count.
The frontend decides how to map metrics into visual size.
The backend computes the metrics consistently.
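A hedged sketch of how some of these metrics could be computed. It assumes an in-memory adjacency map from a course to the courses it directly unlocks, and it assumes that unlock_subject_entropy means Shannon entropy (in bits) over the subjects of the unlocked courses; the source does not define the formula, so treat that as one plausible reading.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// reachableWithin runs a breadth-first search and returns every course
// reachable from start in at most maxDepth steps, with its depth.
func reachableWithin(unlocks map[string][]string, start string, maxDepth int) map[string]int {
	depth := map[string]int{start: 0}
	queue := []string{start}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if depth[cur] >= maxDepth {
			continue // do not expand past the requested depth
		}
		for _, next := range unlocks[cur] {
			if _, seen := depth[next]; !seen {
				depth[next] = depth[cur] + 1
				queue = append(queue, next)
			}
		}
	}
	delete(depth, start)
	return depth
}

// subjectEntropy returns the Shannon entropy, in bits, of a subject histogram.
func subjectEntropy(counts map[string]int) float64 {
	total := 0
	for _, c := range counts {
		total += c
	}
	if total == 0 {
		return 0
	}
	h := 0.0
	for _, c := range counts {
		if c > 0 {
			p := float64(c) / float64(total)
			h -= p * math.Log2(p)
		}
	}
	return h
}

func main() {
	// Illustrative edges only, not real calendar data.
	unlocks := map[string][]string{
		"CS 135": {"CS 136"},
		"CS 136": {"CS 246", "STAT 231"},
	}
	reached := reachableWithin(unlocks, "CS 135", 2)
	direct := 0
	subjects := map[string]int{}
	for code, d := range reached {
		if d == 1 {
			direct++
		}
		subject, _, _ := strings.Cut(code, " ")
		subjects[subject]++
	}
	fmt.Println("direct_unlock_count:", direct)
	fmt.Println("transitive_unlock_count:", len(reached))
	fmt.Println("unlock_subject_count:", len(subjects))
	fmt.Printf("unlock_subject_entropy: %.3f\n", subjectEntropy(subjects))
}
```

The functions return raw numbers only; any mapping from these values to a visual size stays on the frontend side.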
27. Credential Gap Summary Query
POST /api/v1/query/credential-gap-summary is the v1 conservative credential evaluation query.
It should summarize what is satisfied, remaining, blocked, conflicting, and unknown for a target credential under the current or supplied state.
It is not a guarantee of graduation.
It is not a complete future-course planning solver.
The first version can be simple.
It can report remaining requirements, likely conflicts, and unknown blockers.
Later versions may add schedule constraints, offering predictions, and solver-backed planning.
Gap summary responses should include:
- current academic status;
- remaining requirement groups;
- blocked requirement groups;
- conflict reasons;
- unknown reasons;
- reasons;
- conflicting courses or credits;
- missing grade data;
- missing academic progress data;
- unparsed requirements.
The response must not claim global feasibility or global infeasibility unless a future bounded planning endpoint supplies a complete encoding for declared bounds.
28. Graph View Design
The global view is a frontend visual experience.
The backend graph view API is the data boundary that feeds it.
The backend should provide graph data in stable semantic units.
It should not provide raw 3D coordinates as academic truth.
The frontend may use 3D geometry, force layout, semantic zoom, and visual grouping.
Those choices are projections.
The backend should preserve:
- node identity;
- node type;
- edge identity when possible;
- edge type;
- source requirement ids;
- source references;
- uncertainty flags;
- state-dependent statuses;
- grouping hints;
- metric values.
The backend may provide layout hints.
Layout hints should be explicitly marked as hints.
The frontend may ignore them.
29. Graph Node Types
Version 1 graph views should use a small typed node vocabulary.
Suggested node types:
| Node type | Meaning |
|---|---|
| course_listing | Visible course listing such as MATH 135. |
| course_credit | Credit identity shared by linked listings. |
| credential | Degree, major, minor, specialization, option, or plan target. |
| requirement_group | Parsed non-leaf requirement expression. |
| requirement_condition | Parsed leaf requirement condition. |
| unparsed_requirement | Requirement fragment preserved as opaque text. |
| subject_group | Display grouping by subject. |
| level_group | Display grouping by course level. |
Use requirement_group.
Use requirement_condition.
Do not introduce metaphorical names for these internal structures.
The frontend can render these as gates, threads, islands, or planets.
The API should keep the names semantic and plain.
30. Graph Edge Types
Version 1 graph views should use a small typed edge vocabulary.
Suggested edge types:
| Edge type | Meaning |
|---|---|
| requires | Target requirement needs source item. |
| unlocks | Source course or condition contributes to target availability. |
| excludes | Antirequisite or incompatibility relation. |
| equivalent_credit | Listings share or map to one credit identity. |
| satisfies_requirement | Course or state fact satisfies a requirement condition. |
| part_of_requirement | Requirement expression contains subexpression or condition. |
| belongs_to_credential | Requirement source belongs to credential. |
| belongs_to_subject | Course listing belongs to subject grouping. |
| belongs_to_level | Course listing belongs to level grouping. |
The edge direction must be documented per view.
For example, requires can point from dependent course to prerequisite.
unlocks can point from prerequisite to dependent course.
Both may exist in different views if that is useful.
The response must name the edge type so clients do not infer semantics from direction alone.
31. Graph View Response Shape
Graph view responses should still use the common envelope.
The data payload should look like this:

    {
      "view_type": "course_neighborhood",
      "projection_version": "1.0.0",
      "center": {
        "node_type": "course_listing",
        "id": "course_listing:CS:246"
      },
      "bounds": {
        "max_depth": 2,
        "max_nodes": 250,
        "max_edges": 600
      },
      "nodes": [],
      "edges": [],
      "groups": [],
      "metrics": {},
      "omitted": {
        "nodes": 0,
        "edges": 0,
        "reason": null
      }
    }

Nodes should contain typed identifiers and display labels.
Edges should contain typed source and target references.
Groups should be optional.
Metrics should be numeric and documented.
Omitted counts matter because graph views will be bounded for performance.
If limits are reached, the frontend should know the view is incomplete.
32. Graph View Bounds
Every graph view endpoint should enforce server-side bounds.
Bounds prevent accidental enormous responses.
Bounds also force the frontend to request intentional expansions.
Suggested initial bounds:
- max_nodes: default 250, hard maximum 2500;
- max_edges: default 600, hard maximum 7500;
- max_depth: default 1 or 2 depending on view, hard maximum 4;
- max_credentials: default 5, hard maximum 25 for target relevance;
- max_courses: default 100 visible nodes for overlay requests.
The exact values should be tuned after real index measurements.
The response should report when a bound truncated the view.
Truncation should not be treated as an error unless the request exceeds hard limits.
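The bound handling above might look like the following sketch. It assumes that a requested value of 0 means "use the default", which the source does not state; the `Limit` type and function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// Limit pairs a default with a hard maximum, using the suggested initial values.
type Limit struct {
	Default int
	HardMax int
}

var (
	nodeLimit = Limit{Default: 250, HardMax: 2500}
	edgeLimit = Limit{Default: 600, HardMax: 7500}
)

var errOverHardLimit = errors.New("requested bound exceeds hard maximum")

// resolveBound turns a client-requested bound into the effective one.
// Assumption: 0 means "not specified". Exceeding the hard maximum is an error.
func resolveBound(requested int, l Limit) (int, error) {
	switch {
	case requested < 0:
		return 0, errors.New("bound must not be negative")
	case requested == 0:
		return l.Default, nil
	case requested > l.HardMax:
		return 0, errOverHardLimit
	}
	return requested, nil
}

// truncate keeps the first limit ids and reports how many were omitted.
// A non-zero omitted count is reported in the response, not raised as an error.
func truncate(ids []string, limit int) (kept []string, omitted int) {
	if len(ids) <= limit {
		return ids, 0
	}
	return ids[:limit], len(ids) - limit
}

func main() {
	maxNodes, err := resolveBound(0, nodeLimit)
	fmt.Println(maxNodes, err)

	_, err = resolveBound(9000, edgeLimit)
	fmt.Println(err)

	kept, omitted := truncate([]string{"n1", "n2", "n3"}, 2)
	fmt.Println(kept, omitted)
}
```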
33. Graph Views and State
Some graph views are catalog-only.
Some graph views are state-dependent.
Catalog-only views can be cached by index id and request body hash.
State-dependent views must not be shared across users.
State-dependent views should include:
- state id hash or stable response correlation id only when useful;
- state version;
- state catalog version;
- active index id;
- mismatch status when applicable.
Do not include the raw state token in graph responses.
Do not include the token verifier.
Do not include hidden user notes unless a specific endpoint requests them.
34. Caching Policy
Caching policy should follow data sensitivity.
Catalog endpoints are public for a given index.
They can use Cache-Control: public.
They should use ETag values derived from index_id, route, normalized query, and representation version.
They should support conditional requests later.
State endpoints are private.
They should use Cache-Control: no-store.
Token creation responses should use Cache-Control: no-store.
State-dependent query responses should use Cache-Control: no-store by default.
Catalog-only graph views may be cacheable.
State-dependent graph views should use Cache-Control: no-store.
The backend should not rely on cache headers as the only privacy mechanism.
Cache headers reduce accidental storage.
They do not protect against malicious clients or compromised intermediaries.
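A possible way to derive the catalog ETag from the four inputs named above is sketched here. The choice of SHA-256, the 16-byte truncation, and the NUL separator are assumptions. Query normalization relies on `url.Values.Encode`, which sorts parameters by key.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
	"strings"
)

// catalogETag builds a strong ETag from the index id, route pattern,
// normalized query string, and representation version.
func catalogETag(indexID, route, rawQuery, representationVersion string) (string, error) {
	values, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", err
	}
	normalized := values.Encode() // keys are emitted in sorted order
	material := strings.Join([]string{indexID, route, normalized, representationVersion}, "\x00")
	sum := sha256.Sum256([]byte(material))
	// ETag values are quoted strings; half of the digest keeps the header short.
	return `"` + hex.EncodeToString(sum[:16]) + `"`, nil
}

func main() {
	// The index id and route below are placeholders.
	tag, err := catalogETag("index-example", "/api/v1/courses", "subject=CS&level=200", "1")
	if err != nil {
		panic(err)
	}
	fmt.Println("Cache-Control: public")
	fmt.Println("ETag:", tag)
}
```

Because the index id is part of the hashed material, promoting a new index changes every ETag without any explicit invalidation step.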
35. Error Model
Errors should be stable and actionable.
Every error should have:
- code;
- message;
- details;
- request_id in meta when available;
- source_references when a source fact caused the error;
- unknowns when uncertainty caused the failure.
Initial error codes:
| Code | HTTP status | Meaning |
|---|---|---|
| bad_request | 400 | Request JSON or query parameters are invalid. |
| unauthorized | 401 | Missing or invalid state token. |
| forbidden | 403 | Token is valid but not allowed for this operation. |
| not_found | 404 | Resource not found. |
| conflict | 409 | Request conflicts with current state or catalog version. |
| payload_too_large | 413 | Request body exceeds configured limit. |
| unsupported_index_schema | 500 or startup failure | Loaded index schema is unsupported. |
| catalog_unavailable | 409 or 503 | Required catalog version is not loaded. |
| query_unknown | 200 with unknown result | Query is valid but cannot be fully answered. |
query_unknown usually should not be an HTTP error.
It is a successful computation whose semantic result is unknown.
HTTP errors should describe transport or request failure.
Semantic uncertainty should stay inside the response envelope.
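The code table can be kept in one place in Go, as in this sketch. The `APIError` type and the enclosing `error` key are assumptions about the envelope. Where the table allows two statuses, the sketch picks one for illustration (503 for catalog_unavailable), which is not a decision made by this document.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

// APIError carries the stable fields every error should have.
type APIError struct {
	Code    string         `json:"code"`
	Message string         `json:"message"`
	Details map[string]any `json:"details,omitempty"`
}

// statusByCode mirrors the initial error code table.
var statusByCode = map[string]int{
	"bad_request":              http.StatusBadRequest,
	"unauthorized":             http.StatusUnauthorized,
	"forbidden":                http.StatusForbidden,
	"not_found":                http.StatusNotFound,
	"conflict":                 http.StatusConflict,
	"payload_too_large":        http.StatusRequestEntityTooLarge,
	"unsupported_index_schema": http.StatusInternalServerError,
	"catalog_unavailable":      http.StatusServiceUnavailable, // table allows 409 or 503
	"query_unknown":            http.StatusOK,                 // semantic unknown, not a transport failure
}

// httpStatus falls back to 500 for codes missing from the table.
func httpStatus(code string) int {
	if status, ok := statusByCode[code]; ok {
		return status
	}
	return http.StatusInternalServerError
}

func main() {
	apiErr := APIError{
		Code:    "payload_too_large",
		Message: "request body exceeds configured limit",
		Details: map[string]any{"limit_bytes": 262144},
	}
	body, err := json.Marshal(map[string]any{"error": apiErr})
	if err != nil {
		panic(err)
	}
	fmt.Println(httpStatus(apiErr.Code), string(body))
}
```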
36. Catalog Mismatch Handling
Student state is tied to a catalog_version_id.
The active backend index may differ.
When the state catalog matches the active index, queries proceed normally.
When the state catalog differs but both catalogs are available, the backend may evaluate against the recorded catalog or offer comparison.
When the state catalog differs and the recorded catalog is unavailable, the backend should still return the raw state.
Exact requirement evaluation should return a clear unavailable or mismatch result.
The backend should not silently reinterpret the state under the active catalog.
Migration preview should be advisory.
Migration acceptance should be explicit.
The original state should remain recoverable until migration is confirmed.
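The three cases above reduce to a small decision function. The sketch below uses hypothetical decision names and assumes the set of loaded catalogs is available as a map. It never maps a mismatched state onto the active catalog.

```go
package main

import "fmt"

// CatalogDecision names the evaluation path for a stored state.
type CatalogDecision string

const (
	EvaluateActive     CatalogDecision = "evaluate_active_catalog"
	EvaluateRecorded   CatalogDecision = "evaluate_recorded_catalog"
	CatalogUnavailable CatalogDecision = "catalog_unavailable"
)

// decideCatalog picks the evaluation path for a state recorded under
// stateCatalogID. A mismatch is either evaluated against the recorded
// catalog or reported as unavailable; it is never silently reinterpreted.
func decideCatalog(stateCatalogID, activeCatalogID string, loaded map[string]bool) CatalogDecision {
	switch {
	case stateCatalogID == activeCatalogID:
		return EvaluateActive
	case loaded[stateCatalogID]:
		return EvaluateRecorded
	default:
		return CatalogUnavailable
	}
}

func main() {
	// The older catalog id below is a placeholder.
	loaded := map[string]bool{"uw_undergrad_2026_2027": true}
	fmt.Println(decideCatalog("uw_undergrad_2026_2027", "uw_undergrad_2026_2027", loaded))
	fmt.Println(decideCatalog("uw_undergrad_2025_2026", "uw_undergrad_2026_2027", loaded))
}
```

In the unavailable case the raw state can still be returned to the client; only exact requirement evaluation is withheld.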
37. Unparsed Requirement Handling
Unparsed requirements are first-class data.
They are not parser failures to hide.
The backend must surface them in query results when they can affect an answer.
If an unparsed requirement is attached to a course prerequisite, unlock status may become unknown.
If an unparsed requirement is attached to a credential group, credential progress may become unknown or partial.
If an unparsed requirement is attached to a source field that is not relevant to the query, the backend may omit it from the result.
The response should distinguish:
- unknown because source text is unparsed;
- unknown because user state lacks grade data;
- unknown because user state lacks academic progress;
- unknown because catalog version is unavailable;
- unknown because a referenced course or credential could not be resolved.
38. Rate Limiting
The backend should rate limit endpoints that can be expensive or abused.
Rate limiting should apply to:
- state creation;
- token verification attempts;
- query endpoints;
- graph view endpoints;
- migration preview endpoints.
Catalog GET endpoints can have looser limits.
The rate limiter should not require account identities.
It can use IP address, token verifier id, and request type.
Token verification failures should be rate limited more aggressively than successful state reads.
Rate limit responses should not reveal whether a token exists.
39. Request Size Limits
The backend should limit request body size.
This is especially important for supplied-state and graph overlay requests.
Suggested first limits:
- state create or replace: 256 KiB;
- query request: 256 KiB;
- graph view request: 128 KiB;
- migration preview: 256 KiB.
The exact limits should be adjusted after state schema design.
The backend should reject oversized bodies before decoding them fully when possible.
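In Go, early rejection can combine a Content-Length check with `http.MaxBytesReader`, which stops reading once the limit is crossed. The sketch below assumes Go 1.19 or later for `*http.MaxBytesError`; the handler name, the plain-text error bodies, and the `probe` helper are illustrative only.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

const queryBodyLimit = 256 << 10 // 256 KiB, the suggested first limit for query requests

func queryHandler(w http.ResponseWriter, r *http.Request) {
	// Reject on the declared length before reading anything, when it is known.
	if r.ContentLength > queryBodyLimit {
		http.Error(w, "payload_too_large", http.StatusRequestEntityTooLarge)
		return
	}
	// Also cap the actual bytes read, in case the declared length is absent or wrong.
	r.Body = http.MaxBytesReader(w, r.Body, queryBodyLimit)
	var payload map[string]any
	if err := json.NewDecoder(r.Body).Decode(&payload); err != nil {
		var tooLarge *http.MaxBytesError
		if errors.As(err, &tooLarge) {
			http.Error(w, "payload_too_large", http.StatusRequestEntityTooLarge)
			return
		}
		http.Error(w, "bad_request", http.StatusBadRequest)
		return
	}
	w.WriteHeader(http.StatusOK)
}

// probe sends a JSON body with padSize bytes of padding and returns the status code.
func probe(padSize int) int {
	body := `{"pad":"` + strings.Repeat("x", padSize) + `"}`
	req := httptest.NewRequest(http.MethodPost, "/api/v1/query/course-unlock", strings.NewReader(body))
	rec := httptest.NewRecorder()
	queryHandler(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(probe(16))       // small body
	fmt.Println(probe(300 << 10)) // body larger than the limit
}
```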
40. Timeouts and Cancellation
The backend should treat request time as a bounded resource.
Go HTTP handlers receive request contexts.
Evaluation and graph view code should accept contexts or explicit cancellation signals.
A client disconnect should cancel unnecessary downstream work.
The HTTP server should configure read, header, and write timeouts.
The exact values should be tuned during implementation.
Suggested first defaults:
- read header timeout: 5 seconds;
- request body read timeout: 15 seconds;
- ordinary catalog request timeout: 5 seconds;
- query request timeout: 15 seconds;
- graph view request timeout: 15 seconds;
- migration preview timeout: 30 seconds.
Longer solver-backed planning should not run as an ordinary synchronous request without a separate job model.
If future degree-CAD planning needs minutes of computation, it should use an explicit asynchronous workflow.
That workflow should have job ids, progress status, cancellation, and result expiry.
Version 1 should keep all query endpoints short-running and bounded.
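A sketch of how the suggested defaults could be wired into `net/http`. `ReadTimeout` covers reading the whole request including the body, so it stands in for the body read timeout here. The write timeout value is an assumption, since the list above does not suggest one. Function names are hypothetical.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"time"
)

// newServer applies the suggested first defaults to an http.Server.
func newServer(addr string, h http.Handler) *http.Server {
	return &http.Server{
		Addr:              addr,
		Handler:           h,
		ReadHeaderTimeout: 5 * time.Second,
		ReadTimeout:       15 * time.Second,
		WriteTimeout:      35 * time.Second, // assumption: slightly above the longest handler budget
	}
}

// withTimeout gives each request a deadline that downstream code sees via r.Context().
func withTimeout(d time.Duration, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), d)
		defer cancel()
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// evaluateAll checks for cancellation between units of work and reports
// how many targets were processed before it stopped.
func evaluateAll(ctx context.Context, targets []string) (int, error) {
	for i := range targets {
		if err := ctx.Err(); err != nil {
			return i, err
		}
		// A real evaluator would walk the requirement tree for targets[i] here.
	}
	return len(targets), nil
}

func main() {
	srv := newServer("127.0.0.1:8080", withTimeout(15*time.Second, http.NotFoundHandler()))
	fmt.Println(srv.ReadHeaderTimeout, srv.ReadTimeout)

	// Simulate a client disconnect by cancelling before evaluation starts.
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	done, err := evaluateAll(ctx, []string{"CS 246", "MATH 237"})
	fmt.Println(done, err)
}
```

Passing the context down explicitly is what lets a disconnect or deadline stop evaluation and graph projection work part-way through.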
41. Observability
The backend should log enough to debug correctness without leaking private data.
Log fields should include:
- request id;
- method;
- route pattern;
- status code;
- duration;
- response size when available;
- index id;
- query type;
- graph view type;
- semantic result status counts;
- error code when present.
Logs should not include:
- raw state tokens;
- token verifiers;
- full request bodies by default;
- user notes;
- grades unless explicitly enabled in a local debug mode;
- raw source HTML by default.
Metrics should include:
- request count by route;
- error count by code;
- query latency by query type;
- graph view node and edge counts;
- unknown result count by cause;
- state creation count;
- token verification failure count;
- loaded index metadata.
Unknown counts are especially important.
They tell the project where parser coverage affects user trust.
42. Optional Static Frontend Serving
The backend may serve frontend assets in version 1 only as an optional fallback or local demo mode.
The primary version 1 frontend deployment is the SvelteKit Node server.
This optional fallback keeps local development and small demos possible.
If the backend serves static assets, API routes should remain under /api/v1.
Frontend assets should be served from a configured static directory.
Unknown non-API routes may serve index.html for client-side routing.
Static serving must not become a second frontend architecture with different academic semantics.
The frontend runtime architecture remains authoritative for the SvelteKit, atlas, Rust/WASM, WebGL, WebGPU, and fallback UI decisions.
API routes should never fall through to index.html.
Static asset caching should be separate from API caching.
Hashed static assets can be cached aggressively.
index.html should be cached cautiously.
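The routing rule can be illustrated with the standard `http.ServeMux`: everything under /api/ goes to the API handler, which answers unknown API paths itself, and only the remaining paths receive index.html. The sketch serves an in-memory string instead of a static directory, and the JSON error body and `no-cache` header are assumptions.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// apiNotFound stands in for the API router's own 404 handling.
func apiNotFound(w http.ResponseWriter, r *http.Request) {
	w.Header().Set("Content-Type", "application/json")
	w.WriteHeader(http.StatusNotFound)
	fmt.Fprint(w, `{"error":{"code":"not_found"}}`)
}

// newRouter keeps API routes and the client-side routing fallback separate.
func newRouter(api http.Handler, indexHTML string) http.Handler {
	mux := http.NewServeMux()
	mux.Handle("/api/", api) // the whole /api/ subtree never reaches the fallback
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "text/html; charset=utf-8")
		w.Header().Set("Cache-Control", "no-cache") // index.html is revalidated on each load
		fmt.Fprint(w, indexHTML)
	})
	return mux
}

// demoRouter builds a router whose API side knows no routes at all.
func demoRouter() http.Handler {
	return newRouter(http.HandlerFunc(apiNotFound), "<!doctype html><title>demo</title>")
}

// probe issues a GET and returns the status code and content type.
func probe(h http.Handler, path string) (int, string) {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code, rec.Header().Get("Content-Type")
}

func main() {
	router := demoRouter()
	fmt.Println(probe(router, "/api/v1/does-not-exist"))
	fmt.Println(probe(router, "/plans/some-client-route"))
}
```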
43. Deployment Model
The first deployment model can be a single process.
Inputs:
- one backend binary;
- one published index directory;
- one writable state database path;
- optional static frontend directory;
- token verifier key file.
The deployment should promote indexes by changing configuration or symlink target, then restarting the backend.
The backend should not hot-swap indexes in version 1.
Hot swapping can be added later if needed.
A restart boundary is clearer and safer while query semantics are still changing.
The state database should be backed up independently from the published index.
Logs must never contain tokens, so backups that include logs should not contain tokens either.
44. Local Development
Local development should be straightforward.
After the scraper creates a published index, a developer should be able to run:

    UWSCRAPE_INDEX_DIR=data/published/2026-2027-undergraduate \
    UWSCRAPE_STATE_DB_PATH=data/runtime/state.sqlite \
    UWSCRAPE_TOKEN_KEY_PATH=data/runtime/token-key \
    go run ./cmd/uwscrape-server

The exact command may change after implementation.
The important property is that local development should use the same index contract as production.
Do not add a separate “dev index format” unless a future ADR accepts it.
45. Testing Strategy
Backend tests should cover four layers.
The first layer is pure evaluation tests.
These tests construct small requirement trees and student states in memory.
They verify statuses and explanations.
The second layer is catalogstore tests.
These tests load small SQLite fixtures produced by the index builder.
They verify lookup, requirement traversal, source reference retrieval, and validation metadata.
The third layer is API handler tests.
These tests run handlers against fixture stores and verify HTTP status, JSON envelope, cache headers, and error codes.
The fourth layer is integration tests.
These tests start the backend with a fixture published index directory and a temporary state database.
They verify startup checks, state token flow, query endpoints, and graph view endpoints.
The backend should include regression tests for:
- unsupported index schema;
- missing or rejected release decision;
- missing source references;
- catalog mismatch;
- unparsed requirement propagation;
- state token redaction;
- graph view bounds;
- request cancellation;
- request timeout behavior;
- state export as GET;
- no runtime writes to index directory.
46. Verification Commands
The exact commands will be defined after code exists.
The intended verification set should look like:

    go test ./...
    go test ./internal/evaluate -run TestCourseUnlock
    go test ./internal/api -run TestResponseEnvelope
    go test ./internal/graphview -run TestBounds
    go run ./cmd/uwscrape-server --check-config

The backend should also have a startup-only validation mode.
That mode should load the configured index and state database path, perform startup checks, and exit.
This will be useful in deployment and release validation.
47. Performance Model
The backend should assume the index is modest but not tiny.
The scraper research observed thousands of course records in the undergraduate catalog.
Credential requirements, source references, and graph projections add more rows.
The backend should optimize common lookups with SQLite indexes.
The backend should avoid scanning all requirement rows per request.
Catalog resources should be pageable.
Search should be bounded.
Graph view responses should be bounded.
Query endpoints should prefetch requirement subtrees in batches where practical.
The first implementation can be simple.
It should still avoid algorithms that are obviously quadratic over the whole catalog for a single course lookup.
48. Versioning
The backend API path begins at /api/v1.
The response envelope includes api_version.
The loaded index has index_schema_version.
Graph view responses include projection_version.
Query explanations should include an explanation_version if the shape becomes complex.
State records should include state_schema_version.
Version names should identify the contract they version.
Do not use one global version number for every part of the system.
Different contracts will evolve at different rates.
49. Compatibility Rules
The backend should maintain compatibility within /api/v1 by adding optional fields.
It should not rename fields within /api/v1.
It should not change status vocabulary within /api/v1 without a compatibility plan.
It should not remove source references from responses that previously had them.
It should not turn an unknown into satisfied unless parser coverage or state input actually supports that change.
It should not silently change the meaning of graph edge types.
Breaking changes should use a new API version or a documented compatibility flag.
50. Security Notes
The state token is sensitive.
Treat it like a password-equivalent access key.
Use HTTPS in any non-local deployment.
Do not transmit tokens through URL parameters.
Do not include tokens in logs.
Do not include tokens in source references.
Do not include tokens in browser-visible error messages.
Do not store raw tokens.
Rate limit token verification failures.
Use constant-time comparison for token verifier checks where applicable.
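One way to avoid storing raw tokens while still comparing in constant time is to store a keyed digest of the token. The sketch below assumes the verifier is HMAC-SHA256 over the token using the configured key; the source does not specify the verifier construction, so this is only one reasonable option.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"fmt"
)

// tokenVerifier derives the value that is stored instead of the raw token.
func tokenVerifier(key []byte, token string) []byte {
	mac := hmac.New(sha256.New, key)
	mac.Write([]byte(token))
	return mac.Sum(nil)
}

// tokenMatches recomputes the verifier for the presented token and compares
// it with the stored one. hmac.Equal runs in constant time for equal lengths.
func tokenMatches(key []byte, presented string, stored []byte) bool {
	return hmac.Equal(tokenVerifier(key, presented), stored)
}

func main() {
	// Placeholder key and token; a real deployment reads the key from its key file.
	key := []byte("example-key-not-for-production")
	stored := tokenVerifier(key, "example-state-token")

	fmt.Println(tokenMatches(key, "example-state-token", stored))
	fmt.Println(tokenMatches(key, "wrong-token", stored))
}
```

Only the verifier would be persisted; the raw token exists on the server just long enough to compute it.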
Use restrictive CORS defaults.
Use SameSite, Secure, and HttpOnly attributes if cookies are introduced.
Avoid storing unnecessary personal data.
Make export and delete flows clear.
51. Privacy Notes
The no-email design reduces personal data collection.
It does not make state anonymous in every practical sense.
Course choices, grades, credentials, and notes can still be sensitive.
The backend should minimize retention.
The backend should allow export.
The backend should allow deletion.
The backend should avoid analytics that reconstruct individual academic histories unless explicitly approved later.
The backend should document what state is stored.
52. Backend and Frontend Contract
The frontend should rely on the backend for:
- requirement evaluation;
- source references;
- unknown propagation;
- graph view semantic nodes and edges;
- importance metrics;
- state persistence;
- catalog mismatch handling.
The frontend should own:
- 3D layout;
- interaction mode;
- visual symbolization;
- animation;
- keyboard shortcuts;
- local unsaved edits;
- semantic zoom;
- accessible presentation.
The boundary should let the frontend feel like Google Earth for degree exploration.
The boundary should not make geometry the source of academic meaning.
53. Remaining and Resolved Decisions
Remaining implementation decisions:
- Which SQLite driver should the Go backend use?
- What importance metrics should be precomputed in the index versus computed by the backend?
Resolved by the backend specs and ADRs:
- Version 1 uses one active catalog at runtime.
- Version 1 anonymous state access uses bearer state_token headers only; HttpOnly cookie sessions are a later extension.
- The exact logical state schema is defined in docs/specifications/student-state-schema-spec/.
- The exact graph view schema is defined in docs/specifications/graph-view-response-spec/.
- Static frontend serving is optional; the API runtime works independently.
- Request body limits are defined in docs/specifications/backend-runtime-operations-spec/.
- State deletion confirmation and hard-delete behavior are defined in docs/decisions/0019-v1-state-deletion-and-export-semantics/.
Version 1 search decision:
Search should begin with ordinary indexed exact, prefix, and bounded text matching over course codes, titles, subjects, and credential titles. SQLite FTS should require a later ADR if it becomes necessary.
54. Implementable Specs
The implementable backend specs are:
- Backend API specification.
- Student state schema specification.
- Graph view response specification.
- Query evaluation semantics specification.
- Backend storage schema specification for state.
- Backend runtime operations specification.
These specs build on this document and the backend ADRs.
They should not reopen the offline scraper/runtime separation unless new evidence requires it.
55. References
- Go net/http package documentation: https://pkg.go.dev/net/http
- SQLite URI filename documentation: https://www.sqlite.org/uri.html
- RFC 9110, HTTP Semantics: https://www.rfc-editor.org/rfc/rfc9110
- RFC 9111, HTTP Caching: https://www.rfc-editor.org/rfc/rfc9111
- GraphQL project overview: https://graphql.org/
- OWASP Session Management Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Session_Management_Cheat_Sheet.html