Backend API Specification
Status: Draft v0.1.
Project: UWScrape.
Directory: docs/specs.
Audience: backend implementers, frontend implementers, solver implementers, and API reviewers.
Last reviewed: 2026-05-11.
Primary architecture context: the Go backend runtime layer (locked by ADR 0010).
Related documents:
- docs/specifications/student-state-schema-spec/
- docs/specifications/query-evaluation-semantics-spec/
- docs/specifications/graph-view-response-spec/
- docs/specifications/frontend-backend-contract-spec/
- docs/specifications/frontend-application-shell-spec/
- docs/specifications/backend-runtime-operations-spec/
- docs/decisions/0004-student-state-catalog-version-migration-policy/
- docs/decisions/0006-backend-index-contract/
- docs/decisions/0009-rest-and-typed-graph-view-endpoints/
- docs/decisions/0011-anonymous-state-token-policy/
- docs/decisions/0012-response-envelope-and-uncertainty-semantics/
- docs/decisions/0015-solver-result-and-explanation-contract/
- docs/decisions/0019-v1-state-deletion-and-export-semantics/
1. Purpose
This document specifies the version 1 HTTP API contract for the UWScrape backend.
Frontend consumption rules for this API are specified in docs/specifications/frontend-backend-contract-spec/.
The backend API serves one active published index.
The backend API stores and retrieves anonymous student planning state.
The backend API evaluates academic queries through backend-owned semantics.
The backend API provides bounded typed graph views for frontend-heavy screens.
The API root is /api/v1.
The API is Markdown-specified first.
This wave does not add an OpenAPI artifact.
2. Source Anchors
HTTP method semantics follow RFC 9110.
HTTP caching semantics follow RFC 9111.
The Go implementation should use net/http concepts such as handlers, request contexts, response writers, server timeouts, and graceful shutdown.
JSON encoding and decoding should follow Go encoding/json behavior unless a later implementation spec chooses another JSON library.
State token handling follows the OWASP Session Management Cheat Sheet guidance that session identifiers should be unpredictable, meaningless to the client, protected in transit, and not logged.
3. API Principles
The backend owns academic semantics.
The frontend owns presentation, layout, and interaction.
The API exposes semantic results, not solver internals.
The API exposes source references whenever a response depends on source calendar facts.
The API exposes unknowns whenever missing, unparsed, unsupported, unavailable, or incomplete data can affect correctness.
The API uses operational field names.
The API does not use metaphorical names for durable types.
The API does not make graph geometry authoritative.
The API does not run the scraper.
The API does not mutate the published index.
The API does not silently reinterpret student state under a different catalog version.
4. Versioning
All version 1 routes live under /api/v1.
Every response envelope includes meta.api_version.
Every response envelope includes loaded index metadata when the backend has completed startup.
Graph view responses include projection_version.
Query explanation responses include explanation_version.
Student state records include state_schema_version.
Breaking response-shape changes require a new API version or an explicit compatibility mode.
Adding optional fields is allowed within /api/v1.
Renaming fields is not allowed within /api/v1.
Changing the meaning of a status is not allowed within /api/v1.
5. Common Envelope
Every successful response uses:
{ "data": {}, "meta": {}, "warnings": [], "unknowns": [], "source_references": [] }
Every error response uses:
{ "error": {}, "meta": {}, "warnings": [], "unknowns": [], "source_references": [] }
data is endpoint-specific.
error is present only for transport, request, authorization, conflict, or server failures.
meta is required.
warnings is required and may be empty.
unknowns is required and may be empty.
source_references is required and may be empty.
Semantic uncertainty should usually be represented as 200 OK with data.status = "unknown" or data.status = "partial".
Semantic uncertainty should not become an HTTP error unless the request itself cannot be processed.
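The envelope rules above can be sketched in Go. This is an illustration, not part of the contract: the type and function names (Envelope, NewEnvelope, EncodeEnvelope) are assumptions, and only the JSON field names come from this section. The point it shows is that required arrays should be initialized so they encode as [] instead of null.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Envelope mirrors the common success envelope. The Go names are
// hypothetical; only the JSON field names come from the specification.
type Envelope struct {
	Data             any               `json:"data"`
	Meta             map[string]any    `json:"meta"`
	Warnings         []json.RawMessage `json:"warnings"`
	Unknowns         []json.RawMessage `json:"unknowns"`
	SourceReferences []json.RawMessage `json:"source_references"`
}

// NewEnvelope initializes the required arrays as empty, non-nil slices,
// because encoding/json writes a nil slice as null.
func NewEnvelope(data any, meta map[string]any) Envelope {
	return Envelope{
		Data:             data,
		Meta:             meta,
		Warnings:         []json.RawMessage{},
		Unknowns:         []json.RawMessage{},
		SourceReferences: []json.RawMessage{},
	}
}

// EncodeEnvelope returns the JSON text, or an empty string on failure.
func EncodeEnvelope(e Envelope) string {
	b, err := json.Marshal(e)
	if err != nil {
		return ""
	}
	return string(b)
}

func main() {
	env := NewEnvelope(
		map[string]string{"status": "unknown"},
		map[string]any{"api_version": "v1"},
	)
	fmt.Println(EncodeEnvelope(env))
}
```

An error envelope would follow the same pattern with an error field in place of data.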
6. Metadata Object
meta contains response-wide metadata.
Required fields when the backend has loaded an index:
| Field | Type | Required | Meaning |
|---|---|---|---|
| api_version | string | yes | API version, initially v1. |
| request_id | string | yes | Opaque request correlation id. |
| index_id | string | yes | Loaded runtime index artifact id. |
| index_schema_version | string | yes | Runtime index schema version. |
| catalog_version_id | string | yes | Active internal catalog version id. |
| catalog_title | string | yes | Human-readable catalog title. |
| upstream_catalog_id | string | when known | Waterloo/Kuali catalog id. |
| state_schema_version | string | state responses | Student state schema version. |
| state_version | integer | state responses | Monotone state version. |
| projection_version | string | graph responses | Graph projection contract version. |
| explanation_version | string | query responses | Explanation tree contract version. |
Startup errors may have only api_version and request_id.
The backend must not include raw state tokens in meta.
The backend must not include token verifiers in meta.
7. Warning Object
warnings contains non-fatal machine-readable notices.
Shape:
{ "code": "catalog_mismatch", "message": "The state catalog differs from the active backend catalog.", "severity": "warning", "details": {}, "source_reference_ids": [] }
Required fields:
| Field | Type | Meaning |
|---|---|---|
| code | string | Stable warning code. |
| message | string | Human-readable summary. |
| severity | string | info, warning, or critical. |
| details | object | Machine-readable details, empty when not needed. |
| source_reference_ids | array | Source references involved in the warning. |
Initial warning codes:
- catalog_mismatch;
- catalog_unavailable;
- state_migration_available;
- graph_view_truncated;
- unparsed_requirement_present;
- missing_grade;
- missing_academic_progress;
- deprecated_field_ignored;
- request_limit_near.
8. Unknown Object
unknowns contains semantic uncertainty.
Shape:
{ "unknown_reason": "unparsed_requirement", "message": "A requirement fragment could not be parsed into structured conditions.", "requirement_id": "requirement_expression:req_123", "source_reference_ids": ["source_reference:src_456"], "details": {} }
Required fields:
| Field | Type | Meaning |
|---|---|---|
| unknown_reason | string | Machine-readable reason. |
| message | string | Human-readable summary. |
| source_reference_ids | array | Source references that explain the unknown, when available. |
| details | object | Machine-readable details. |
Optional fields:
- requirement_id;
- requirement_condition_id;
- course_listing_id;
- course_credit_id;
- credential_id;
- engine_route;
- state_field.
Initial unknown_reason values:
- unparsed_requirement;
- missing_grade;
- missing_academic_progress;
- missing_academic_standing;
- missing_program_state;
- catalog_mismatch;
- catalog_unavailable;
- unresolved_course_reference;
- unresolved_credential_reference;
- unsupported_requirement_condition;
- engine_incomplete;
- time_limit_reached;
- source_conflict_unresolved.
Academic unknown is distinct from engine-native UNKNOWN.
Engine-native UNKNOWN must be mapped through unknown_reason = "engine_incomplete" or a more specific reason.
Timeout must be mapped through time_limit_reached when it affects the result.
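The two mapping rules above can be sketched as one small Go function. The EngineOutcome type and its fields are hypothetical, and the precedence (timeout first, then a more specific reason, then engine_incomplete) is an assumption of this sketch; the specification only names the target unknown_reason values.

```go
package main

import "fmt"

// EngineOutcome is a hypothetical summary of what an evaluator reported.
type EngineOutcome struct {
	NativeUnknown  bool   // the engine returned its own UNKNOWN
	TimedOut       bool   // a time limit affected the result
	SpecificReason string // optional, more specific unknown_reason value
}

// UnknownReason maps an engine outcome to an API unknown_reason.
// The second return value is false when no unknown entry is needed.
func UnknownReason(o EngineOutcome) (string, bool) {
	switch {
	case o.TimedOut:
		// Assumption: a timeout that affects the result takes precedence.
		return "time_limit_reached", true
	case o.NativeUnknown && o.SpecificReason != "":
		return o.SpecificReason, true
	case o.NativeUnknown:
		return "engine_incomplete", true
	}
	return "", false
}

func main() {
	reason, ok := UnknownReason(EngineOutcome{NativeUnknown: true})
	fmt.Println(reason, ok)
}
```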
9. Source Reference Summary
source_references contains compact references used by the response.
Shape:
{ "source_reference_id": "source_reference:src_456", "source_kind": "course_field", "catalog_version_id": "uw_undergrad_2026_2027", "upstream_catalog_id": "67e557ed6ed2fe2bd3a38956", "source_pid": "abc123", "source_field_path": "requisites.prerequisites", "source_url": "https://uwaterloo.ca/academic-calendar/undergraduate-studies/catalog#/courses/abc123", "snippet": "Prereq: ...", "confidence": "parsed" }
The backend may omit long raw HTML.
The backend should include stable ids and enough context to retrieve the full source reference through the source reference endpoint.
The official University of Waterloo Academic Calendar remains the authoritative academic source.
Source references should prefer stable calendar links, source ids, field paths, hashes, and short snippets over broad republication of raw calendar text.
10. Error Object
error describes request, authorization, conflict, or server failure.
Shape:
{ "code": "bad_request", "message": "The request body is invalid JSON.", "details": { "field": "targets.course_codes" } }
Initial error codes:
| Code | HTTP status | Meaning |
|---|---|---|
| bad_request | 400 | Invalid JSON, invalid parameters, or invalid field values. |
| unauthorized | 401 | Missing or invalid state token. |
| forbidden | 403 | Valid token cannot perform the operation. |
| not_found | 404 | Resource not found. |
| method_not_allowed | 405 | Route exists but method is wrong. |
| conflict | 409 | Request conflicts with state or catalog version. |
| payload_too_large | 413 | Request exceeds configured body limit. |
| unsupported_media_type | 415 | Request content type is not supported. |
| rate_limited | 429 | Rate limit exceeded. |
| unsupported_index_schema | 500 | Backend started with unsupported index schema; normally startup should fail first. |
| catalog_unavailable | 409 or 503 | Required catalog version is unavailable. |
| internal_error | 500 | Unhandled server failure. |
query_unknown is not an error code.
Valid query requests that cannot be fully answered return 200 OK with academic unknown or partial.
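A handler layer might keep the table above as a lookup. The sketch below is one possible shape, with the hypothetical names errorStatus and StatusFor. It deliberately leaves out catalog_unavailable, because the table allows either 409 or 503 and this section does not say how to choose, and it leaves out query_unknown because that is not an error code.

```go
package main

import (
	"fmt"
	"net/http"
)

// errorStatus maps error codes with a single fixed HTTP status.
var errorStatus = map[string]int{
	"bad_request":              http.StatusBadRequest,
	"unauthorized":             http.StatusUnauthorized,
	"forbidden":                http.StatusForbidden,
	"not_found":                http.StatusNotFound,
	"method_not_allowed":       http.StatusMethodNotAllowed,
	"conflict":                 http.StatusConflict,
	"payload_too_large":        http.StatusRequestEntityTooLarge,
	"unsupported_media_type":   http.StatusUnsupportedMediaType,
	"rate_limited":             http.StatusTooManyRequests,
	"unsupported_index_schema": http.StatusInternalServerError,
	"internal_error":           http.StatusInternalServerError,
}

// StatusFor returns the fixed status for a code. It reports false for
// codes that have no single status in the table.
func StatusFor(code string) (int, bool) {
	status, ok := errorStatus[code]
	return status, ok
}

func main() {
	status, ok := StatusFor("rate_limited")
	fmt.Println(status, ok)
}
```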
11. Authentication
Version 1 anonymous state access uses only state_token through the Authorization header.
Header format:
Authorization: Bearer <state_token>
The token is a password-equivalent access key.
The v1 recommended token generation shape is 32 random bytes from a cryptographically secure random source, encoded as base64url without padding.
This provides 256 bits of randomness, exceeding OWASP’s minimum session identifier entropy guidance.
The backend must not accept state tokens in URL query parameters.
The backend must not include state tokens in response bodies except the one-time creation response.
The backend must not log authorization headers.
The backend must not store raw state tokens.
Browser cookie sessions are not part of version 1.
HttpOnly, Secure, and SameSite cookie policy belongs to a later optional browser-session extension.
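The token rules above can be illustrated with the Go standard library. The function names below are hypothetical. Using a SHA-256 digest as the stored verifier is an assumption of this sketch; the specification only requires that a verifier, not the raw token, be stored. The header parser is also simplified and matches the Bearer scheme case-sensitively.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// NewStateToken returns 32 random bytes encoded as unpadded base64url.
func NewStateToken() (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return base64.RawURLEncoding.EncodeToString(buf), nil
}

// TokenVerifier derives the value to store instead of the raw token.
// Assumption: a hex-encoded SHA-256 digest of the token.
func TokenVerifier(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

// VerifierMatches compares a presented token against a stored verifier
// in constant time.
func VerifierMatches(token, stored string) bool {
	return subtle.ConstantTimeCompare([]byte(TokenVerifier(token)), []byte(stored)) == 1
}

// BearerToken extracts the token from an Authorization header value.
func BearerToken(header string) (string, error) {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return "", errors.New("missing bearer token")
	}
	token := strings.TrimSpace(header[len(prefix):])
	if token == "" {
		return "", errors.New("empty bearer token")
	}
	return token, nil
}

func main() {
	token, err := NewStateToken()
	if err != nil {
		panic(err)
	}
	stored := TokenVerifier(token)
	presented, _ := BearerToken("Bearer " + token)
	// Print only the length and the match result, never the token itself.
	fmt.Println(len(token), VerifierMatches(presented, stored))
}
```

A 32-byte value encodes to 43 base64url characters without padding.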
12. Caching
Catalog-only responses may be cacheable.
State-dependent responses must use Cache-Control: no-store.
Token creation responses must use Cache-Control: no-store.
State export responses must use Cache-Control: no-store.
State-dependent graph views must use Cache-Control: no-store.
Catalog-only graph views may use public caching when the request contains no state token and no supplied state.
ETags for catalog-only responses should be derived from:
- index_id;
- route pattern;
- normalized query parameters or request body;
- representation version.
ETags must not include raw state tokens.
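One way to derive such an ETag is to hash the four inputs listed above. The sketch below is an assumption-laden illustration: the function name CatalogETag, the use of SHA-256, the 16-byte truncation, and the zero-byte separator are all choices of this example, not requirements. Query parameters are normalized with url.Values.Encode, which sorts by key.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net/url"
)

// CatalogETag derives a strong ETag for a catalog-only response from the
// index id, route pattern, normalized query parameters, and
// representation version. No token or state input is involved.
func CatalogETag(indexID, routePattern, rawQuery, representationVersion string) (string, error) {
	query, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", err
	}
	h := sha256.New()
	// query.Encode sorts parameters by key, so parameter order in the
	// request does not change the result.
	for _, part := range []string{indexID, routePattern, query.Encode(), representationVersion} {
		h.Write([]byte(part))
		h.Write([]byte{0}) // separator so adjacent parts cannot run together
	}
	return `"` + hex.EncodeToString(h.Sum(nil)[:16]) + `"`, nil
}

func main() {
	a, _ := CatalogETag("idx_example", "GET /api/v1/courses", "subject=CS&level=200", "v1")
	b, _ := CatalogETag("idx_example", "GET /api/v1/courses", "level=200&subject=CS", "v1")
	fmt.Println(a == b)
}
```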
13. Content Type
Requests with bodies must use:
Content-Type: application/json
Responses use:
Content-Type: application/json; charset=utf-8
The backend may reject request bodies with unknown or missing content type for POST and PUT.
The backend should not accept form-encoded state tokens.
14. Course Identifier Normalization
Course resource paths use separate {subject} and {catalog_number} path components.
Course code values in query bodies and response payloads may use display form such as CS 246.
Path inputs avoid embedded spaces.
The backend normalizes:
- subject to uppercase;
- catalog number to the canonical stored form.
Response payloads use canonical course_code.
If input is ambiguous, the backend returns 409 conflict or 400 bad_request with candidate matches.
If input does not resolve, the backend returns 404 not_found for resource endpoints or academic unknown for query endpoints when the unresolved reference comes from user state.
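A minimal sketch of path normalization follows. The function name is hypothetical, and treating the canonical stored form of a catalog number as its trimmed uppercase form is an assumption of this example; the real canonical form is whatever the index stores. Ambiguity and lookup failures are outside the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// NormalizeCoursePath turns {subject} and {catalog_number} path
// components into a display-form course code such as "CS 246".
func NormalizeCoursePath(subject, catalogNumber string) (string, error) {
	s := strings.ToUpper(strings.TrimSpace(subject))
	// Assumption: the canonical catalog number is the uppercase form.
	n := strings.ToUpper(strings.TrimSpace(catalogNumber))
	if s == "" || n == "" {
		return "", errors.New("subject and catalog number are required")
	}
	if strings.ContainsAny(s+n, " /") {
		return "", errors.New("path components must not contain spaces or slashes")
	}
	return s + " " + n, nil
}

func main() {
	code, err := NormalizeCoursePath("cs", "246")
	fmt.Println(code, err)
}
```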
15. State Modes for Query Endpoints
Query endpoints accept one of three state modes:
- persisted;
- supplied;
- persisted_with_changes.
persisted requires a valid Authorization header.
supplied does not require a token unless the endpoint also accesses persisted state.
persisted_with_changes requires a valid token and applies request-local changes without saving them.
Shape:
{ "state_mode": "persisted", "targets": {} }
Shape:
{ "state_mode": "supplied", "student_state": {}, "targets": {} }
Shape:
{ "state_mode": "persisted_with_changes", "changes": {}, "targets": {} }
The backend must return which mode it used in the response.
The backend must not persist changes from query endpoints.
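The state mode rules can be checked before any evaluation runs. The following Go sketch uses hypothetical names (QueryRequest, ValidateStateMode) and returns an error code from the error table, or an empty string when the request is acceptable. Requiring student_state for supplied and changes for persisted_with_changes is inferred from the shapes above and is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// QueryRequest holds the state-related fields shared by query bodies.
type QueryRequest struct {
	StateMode    string          `json:"state_mode"`
	StudentState json.RawMessage `json:"student_state,omitempty"`
	Changes      json.RawMessage `json:"changes,omitempty"`
	Targets      json.RawMessage `json:"targets"`
}

// ValidateStateMode returns "" when the mode rules are satisfied,
// otherwise an error code such as "unauthorized" or "bad_request".
func ValidateStateMode(req QueryRequest, hasValidToken bool) string {
	switch req.StateMode {
	case "persisted":
		if !hasValidToken {
			return "unauthorized"
		}
	case "supplied":
		if len(req.StudentState) == 0 {
			return "bad_request"
		}
	case "persisted_with_changes":
		if !hasValidToken {
			return "unauthorized"
		}
		if len(req.Changes) == 0 {
			return "bad_request"
		}
	default:
		return "bad_request"
	}
	return ""
}

func main() {
	body := []byte(`{"state_mode":"persisted_with_changes","changes":{},"targets":{}}`)
	var req QueryRequest
	if err := json.Unmarshal(body, &req); err != nil {
		panic(err)
	}
	// No valid token was presented, so the sketch reports "unauthorized".
	fmt.Printf("%q\n", ValidateStateMode(req, false))
}
```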
16. System and Index Endpoints
GET /api/v1/health
Returns process health and startup mode.
Data shape:
{ "status": "ok", "degraded": false, "checks": { "index_loaded": true, "state_store_available": true, "release_decision_status": "approved" } }
This endpoint should not require authentication.
It should not expose filesystem paths or secrets.
GET /api/v1/index
Returns loaded index metadata.
Data shape:
{ "index_id": "idx_2026_2027_undergrad_20260511_001", "index_schema_version": "0.1.0", "catalog_version_id": "uw_undergrad_2026_2027", "catalog_title": "2026-2027 Undergraduate Studies Academic Calendar", "upstream_catalog_id": "67e557ed6ed2fe2bd3a38956", "release_status": "approved", "release_decision_id": "release_decision:2026_2027_001", "release_decision_reviewed_at": "2026-05-11T00:12:00Z", "build_started_at": "2026-05-11T00:00:00Z", "build_completed_at": "2026-05-11T00:10:00Z", "parser_version": "0.1.0", "validation_summary": { "status": "approved_with_warnings", "finding_count": 12, "warning_count": 12, "error_count": 0 }, "course_count": 1925, "credential_count": 137, "requirement_condition_count": 9182, "source_reference_count": 4798, "graph_projection_course_count": 1925, "graph_projection_edge_count": 5136, "graph_projection_relation_count": 4960, "graph_projection_equivalent_count": 176, "graph_projection_source": "published_artifact", "profile_summary": { "profiles": ["engineering", "math"], "course_subject_count": 37, "course_subject_detail_count": 1873, "course_group_counts": [ { "name": "Faculty of Mathematics", "search_matched": 352, "detail_fetched": 352, "status": "complete" } ], "program_group_counts": [ { "name": "Faculty of Mathematics", "search_matched": 69, "detail_fetched": 69, "status": "complete" } ], "referenced_course_gap_subjects": [ { "subject": "BUS", "missing_course_count": 23, "reference_count": 46, "subject_coverage_status": "complete", "subject_detail_fetched": 28, "example_course_codes": ["BUS 223W", "BUS 227W"] } ] } }
V1 serves one active index.
profile_summary is derived from the published validation report. It is
diagnostic release evidence for the loaded artifact, not an academic rule
source. UIs may use it to label index scope and residual referenced-course gaps,
but query correctness still comes from catalog records, requirement conditions,
source references, and evaluator results.
Referenced-course gap rows may include subject coverage status.
subject_coverage_status: "complete" means the subject was fetched for the
active snapshot profile, but the listed referenced course codes were still not
present in parsed course listings.
It does not mean the backend should invent those courses.
Multi-catalog runtime loading is not a v1 requirement.
GET /api/v1/catalog-versions
Returns catalog versions contained in the active index.
V1 normally returns one catalog version.
The array form is still used so the API can tolerate future indexes that contain more than one catalog version.
GET /api/v1/catalog-versions/{catalog_version_id}
Returns one catalog version record from the active index.
If the requested catalog is not present in the active index, return 404 not_found.
17. Catalog Resource Endpoints
GET /api/v1/courses
Searches or lists course listings.
Query parameters:
| Parameter | Type | Meaning |
|---|---|---|
| q | string | Optional text query over code and title. |
| subject | string | Optional subject filter. |
| level | string | Optional level filter such as 100, 200, 300, or 400. |
| limit | integer | Page size, default 50, hard maximum from runtime config. |
| cursor | string | Opaque pagination cursor. |
Data shape:
{ "items": [ { "course_listing_id": "course_listing:CS:246", "course_credit_id": "course_credit:cs_246", "course_code": "CS 246", "subject": "CS", "catalog_number": "246", "title": "Object-Oriented Software Development", "units_x100": 50, "units_display": "0.50", "level": "200", "uncertainty_summary": { "has_unparsed_requirements": false } } ], "next_cursor": null, "total_count": 1 }
total_count is the number of indexed course listings matching the same
search and filter parameters before pagination. It is not the number of rows
visible on the current page.
GET /api/v1/courses/{subject}/{catalog_number}
Returns one course listing.
Data includes:
- listing identity;
- credit identity;
- canonical code;
- title;
- description;
- units as exact machine value and display value;
- level;
- cross-listed listings;
- antirequisite summaries;
- prerequisite summary;
- corequisite summary;
- source references;
- uncertainty summary.
GET /api/v1/courses/{subject}/{catalog_number}/requirements
Returns requirement expressions attached to a course.
Data shape:
{ "course_listing_id": "course_listing:CS:246", "course_code": "CS 246", "requirements": [ { "requirement_source_id": "requirement_source:course:CS:246:prerequisites", "requirement_kind": "prerequisite", "requirement_expression_id": "requirement_expression:req_123", "expression": {} } ] }
expression uses the indexed requirement expression representation.
Opaque source fragments remain explicit as unparsed_requirement nodes.
GET /api/v1/courses/{subject}/{catalog_number}/source
Returns source fields and source references for one course.
The backend may provide snippets and stable source ids.
The backend should avoid returning large raw HTML by default.
GET /api/v1/course-credits/{course_credit_id}
Returns a credit identity and linked listings.
The response must distinguish credit identity from course listings.
GET /api/v1/credentials
Searches or lists credentials.
Query parameters:
- q;
- credential_type;
- faculty;
- department;
- limit;
- cursor.
The response uses the same pagination shape as course search:
{ "items": [], "next_cursor": null, "total_count": 0 }
total_count is the matching credential count before pagination.
GET /api/v1/credentials/{credential_id}
Returns one credential record.
Data includes:
- credential id;
- credential type;
- title;
- owning faculty or department when known;
- catalog version;
- source ids;
- requirement source ids;
- source references;
- uncertainty summary.
GET /api/v1/credentials/{credential_id}/requirements
Returns parsed and opaque credential requirements.
The response must preserve requirement groups and cardinality.
GET /api/v1/credentials/{credential_id}/source
Returns source fields and source references for one credential.
18. Source Reference Endpoints
GET /api/v1/source-references/{source_reference_id}
Returns one source reference.
GET /api/v1/source-references
Returns a bounded set by ids or filters.
Supported query parameters:
- ids;
- source_kind;
- course_listing_id;
- credential_id;
- requirement_id;
- limit;
- cursor.
The endpoint must enforce maximum result sizes.
19. Student State Endpoints
POST /api/v1/state
Creates an anonymous state record.
Authentication is not required.
The request body may include an initial student_state.
If omitted, the backend creates an empty state for the active catalog.
Response data includes state_token exactly once.
Shape:
{ "state_id": "state:7f3a", "state_version": 1, "state_token": "base64url-secret-token", "student_state": {} }
The response must use Cache-Control: no-store.
The backend must store only a token verifier.
GET /api/v1/state/current
Loads the state associated with the bearer token.
Requires Authorization.
The response never includes state_token.
PUT /api/v1/state/current
Replaces the state associated with the bearer token.
Requires Authorization.
The request body contains a full student_state.
The request body must include a positive expected_state_version.
The backend validates schema and increments state_version.
The backend preserves unresolved user-entered references.
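The version check for PUT can be sketched as a compare-then-increment step. The function name is hypothetical. Returning conflict when expected_state_version does not match the stored version is an assumption of this sketch, based on the conflict row in the error table; this section does not state the mismatch behavior explicitly.

```go
package main

import "fmt"

// NextStateVersion decides the outcome of a replace request against the
// stored state_version. It returns the version to store and an error
// code, which is empty on success.
func NextStateVersion(current, expected int64) (next int64, errorCode string) {
	if expected <= 0 {
		// expected_state_version must be positive.
		return current, "bad_request"
	}
	if expected != current {
		// Assumption: a stale or future version is reported as a conflict.
		return current, "conflict"
	}
	return current + 1, ""
}

func main() {
	next, code := NextStateVersion(3, 3)
	fmt.Println(next, code == "")
	_, code = NextStateVersion(3, 2)
	fmt.Println(code)
}
```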
GET /api/v1/state/current/export
Exports the state in a portable JSON shape.
Requires Authorization.
This endpoint is read-only.
It uses GET.
The response uses Cache-Control: no-store.
POST /api/v1/state/current/migration-preview
Returns advisory migration information from the state catalog to the active catalog.
Requires Authorization.
V1 is single-active-index.
If the old catalog is not available, return catalog_unavailable in warnings or unknowns and provide only reference-resolution preview that can be computed safely.
Deferred: POST /api/v1/state/current/migrate
Version 1 does not register a migration-acceptance endpoint.
Migration acceptance is a future explicit operation.
The backend must not silently overwrite the original state catalog version.
DELETE /api/v1/state/current
Hard-deletes the state associated with the bearer token.
Requires Authorization.
Requires confirmation.
Recommended request shape:
{ "confirm": "delete my state" }
The endpoint should be idempotent after deletion from the client’s perspective.
It must not reveal whether a later token guess matched a deleted state beyond normal unauthorized behavior.
20. Query Endpoints
Query endpoints are side-effect free.
They use POST because request bodies can be complex.
They accept state_mode.
They return academic_result payloads specified in docs/specifications/query-evaluation-semantics-spec/.
Routes:
| Method | Path | Purpose |
|---|---|---|
| POST | /api/v1/query/course-unlock | Evaluate whether selected courses are unlocked. |
| POST | /api/v1/query/credential-progress | Evaluate progress toward credentials. |
| POST | /api/v1/query/what-if | Compare baseline state with proposed changes. |
| POST | /api/v1/query/advisory | Return missing requirements, conservative alternatives, and course signals for selected courses or credentials. |
| POST | /api/v1/query/course-impact | Explain what selected courses unlock or affect. |
| POST | /api/v1/query/credential-gap-summary | Summarize remaining, blocked, conflicting, and unknown requirements for target credentials. |
Every query response must include:
- state_mode;
- target;
- status;
- academic_result;
- source_references;
- unknowns;
- warnings.
Catalog mismatch must be explicit.
Catalog unavailable must be explicit.
Advisory Query
POST /api/v1/query/advisory is the course-first product helper endpoint.
It accepts course and credential targets and returns:
- missing requirement rows;
- source-backed candidate courses for missing requirements;
- conservative alternative path hints;
- future-course signals from indexed requirement relations;
- source-reference ids for every surfaced academic claim;
- non-sensitive engine trace summaries.
Advisory output is not a complete planner.
It must not claim global feasibility or global infeasibility.
It may use direct evaluator output, bounded indexed relation closure, and resolver-style alternative grouping.
It must preserve unknowns and conflicts from the underlying query results.
persisted_with_changes is not a v1 advisory state mode.
Use what-if for request-local state changes.
Course Impact Query
POST /api/v1/query/course-impact is a bounded relationship query for the
course-first Canvas and Advisory surfaces.
It accepts course targets and returns:
- direct unlocks within the active index;
- transitive unlock paths within explicit request bounds;
- exclusions and equivalent-credit relations;
- credentials that mention the course listing or shared credit identity;
- subject and relation counts;
- source references for surfaced indexed relations.
Course impact is not a satisfaction query.
The response uses status: "not_applicable" and
academic_result.status: "not_applicable" because the payload describes
relationship evidence, not whether a requirement is fulfilled.
The backend computes the indexed relation evidence.
The frontend may use that evidence to size, highlight, or rank visible nodes, but it must not reinterpret impact evidence as course unlock, credential progress, or feasibility.
persisted_with_changes is not a v1 course-impact state mode.
Use what-if when request-local state changes are part of the question.
21. Graph View Endpoints
Graph view endpoints return bounded typed projections.
They are not a general graph query language.
They are not GraphQL in version 1.
Routes:
| Method | Path | Purpose |
|---|---|---|
| GET | /api/v1/graph/views | List supported graph view types and limits. |
| POST | /api/v1/graph/views/course-neighborhood | Get prerequisite and unlock neighborhood. |
| POST | /api/v1/graph/views/course-universe | Get bounded universal course relation graph for the active index. |
| POST | /api/v1/graph/views/course-pathways | Get recursive upstream/downstream course pathways around selected courses. |
| POST | /api/v1/graph/views/credential-requirements | Get requirement graph for a credential. |
| POST | /api/v1/graph/views/unlock-overlay | Get state-dependent unlock statuses for visible nodes. |
| POST | /api/v1/graph/views/target-relevance | Get relevance of visible courses to selected credentials. |
| POST | /api/v1/graph/views/expand-node | Expand one graph node within a named view. |
The response contract is specified in docs/specifications/graph-view-response-spec/.
State-dependent graph views require Authorization when state_mode = "persisted" or state_mode = "persisted_with_changes".
Catalog-only graph views do not require authentication.
22. Required Header Behavior
All responses should include:
- Content-Type;
- Cache-Control;
- X-Request-ID or equivalent request id header.
Catalog-only cacheable responses may include:
- ETag;
- Last-Modified when meaningful.
State-dependent responses must include:
- Cache-Control: no-store.
The backend must not reflect Authorization header values.
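The header rules for state-dependent responses could be applied by a small net/http wrapper. The sketch below is illustrative only: stateDependent, newRequestID, and headersFor are hypothetical names, and the 8-byte random hex request id is one possible format for an opaque id.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// newRequestID returns an opaque correlation id (assumed format).
func newRequestID() string {
	b := make([]byte, 8)
	if _, err := rand.Read(b); err != nil {
		return "unavailable"
	}
	return hex.EncodeToString(b)
}

// stateDependent sets the required headers before calling the wrapped
// handler. It never copies request header values into the response.
func stateDependent(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		h := w.Header()
		h.Set("Content-Type", "application/json; charset=utf-8")
		h.Set("Cache-Control", "no-store")
		h.Set("X-Request-ID", newRequestID())
		next.ServeHTTP(w, r)
	})
}

// headersFor runs the wrapper against an in-memory request and returns
// the response headers, which is convenient for handler tests.
func headersFor(authorization string) http.Header {
	inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte(`{}`))
	})
	r := httptest.NewRequest(http.MethodGet, "/api/v1/state/current", nil)
	r.Header.Set("Authorization", authorization)
	w := httptest.NewRecorder()
	stateDependent(inner).ServeHTTP(w, r)
	return w.Result().Header
}

func main() {
	h := headersFor("Bearer example")
	fmt.Println(h.Get("Cache-Control"))
}
```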
23. Request Validation
The backend validates:
- HTTP method;
- content type;
- body size;
- JSON syntax;
- required fields;
- enum values;
- identifier shapes;
- state mode rules;
- pagination limits;
- graph view bounds.
Validation errors return 400 bad_request unless a more specific status applies.
The backend should reject oversized bodies before decoding them fully.
Go implementations should use request body limiting with net/http facilities such as MaxBytesReader where applicable.
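As a sketch of the body-limit rule, the Go example below wraps the request body with http.MaxBytesReader before decoding and maps the outcome to a status. The 1 KiB limit and the helper names are assumptions for illustration; a real limit would come from runtime configuration. The *http.MaxBytesError type requires Go 1.19 or later.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// maxBodyBytes is a hypothetical limit used only in this sketch.
const maxBodyBytes = 1 << 10

// decodeJSONBody limits the body, decodes it into dst, and returns the
// HTTP status the handler should use.
func decodeJSONBody(w http.ResponseWriter, r *http.Request, dst any) int {
	r.Body = http.MaxBytesReader(w, r.Body, maxBodyBytes)
	if err := json.NewDecoder(r.Body).Decode(dst); err != nil {
		var tooLarge *http.MaxBytesError
		if errors.As(err, &tooLarge) {
			return http.StatusRequestEntityTooLarge // payload_too_large
		}
		return http.StatusBadRequest // bad_request
	}
	return http.StatusOK
}

// statusForBody exercises decodeJSONBody with an in-memory request.
func statusForBody(body string) int {
	r := httptest.NewRequest(http.MethodPost, "/api/v1/query/course-unlock", strings.NewReader(body))
	w := httptest.NewRecorder()
	var dst map[string]any
	return decodeJSONBody(w, r, &dst)
}

// oversizedBody builds a JSON object larger than maxBodyBytes.
func oversizedBody() string {
	return `{"pad":"` + strings.Repeat("x", 2*maxBodyBytes) + `"}`
}

func main() {
	fmt.Println(statusForBody(`{"state_mode":"persisted"}`))
	fmt.Println(statusForBody(`{`))
	fmt.Println(statusForBody(oversizedBody()))
}
```

The decoder stops reading once the limit is exceeded, so an oversized body is rejected without being buffered in full.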
24. Compatibility Checklist
Within /api/v1, implementation must not:
- remove required envelope fields;
- rename status values;
- treat semantic unknown as an HTTP error;
- accept state tokens in URLs;
- expose token verifiers;
- expose raw solver internals;
- silently reinterpret state under the active catalog;
- silently drop source references from query responses.
25. Test Scenarios
API handler tests should cover:
- every route returns the common envelope;
- errors use the error envelope;
- catalog-only responses use cacheable headers when eligible;
- state-dependent responses use no-store;
- token creation returns token once;
- state load never returns token;
- health and index metadata expose release decision status without filesystem paths;
- course resource routes use separate subject and catalog number path components;
- missing token returns 401;
- token in query parameters is rejected;
- query unknown returns 200 with academic unknown;
- catalog mismatch returns explicit warning or unknown;
- graph view bounds are enforced;
- unsupported method returns 405;
- invalid JSON returns 400;
- request too large returns 413;
- source references appear in course, credential, query, and graph responses when source-backed data is used.
26. References
- Go net/http: https://pkg.go.dev/net/http
- Go encoding/json: https://pkg.go.dev/encoding/json
- RFC 9110, HTTP Semantics: https://www.rfc-editor.org/rfc/rfc9110
- RFC 9111, HTTP Caching: https://www.rfc-editor.org/rfc/rfc9111
- OWASP Session Management Cheat Sheet: https://cheatsheetseries.owasp.org/cheatsheets/Session_Management_Cheat_Sheet.html