Backend API Specification

Status: Draft v0.1.

Project: UWScrape.

Directory: docs/specs.

Audience: backend implementers, frontend implementers, solver implementers, and API reviewers.

Last reviewed: 2026-05-11.

Primary architecture context: the Go backend runtime layer (locked by ADR 0010).

Related documents:

  • docs/specifications/student-state-schema-spec/
  • docs/specifications/query-evaluation-semantics-spec/
  • docs/specifications/graph-view-response-spec/
  • docs/specifications/frontend-backend-contract-spec/
  • docs/specifications/frontend-application-shell-spec/
  • docs/specifications/backend-runtime-operations-spec/
  • docs/decisions/0004-student-state-catalog-version-migration-policy/
  • docs/decisions/0006-backend-index-contract/
  • docs/decisions/0009-rest-and-typed-graph-view-endpoints/
  • docs/decisions/0011-anonymous-state-token-policy/
  • docs/decisions/0012-response-envelope-and-uncertainty-semantics/
  • docs/decisions/0015-solver-result-and-explanation-contract/
  • docs/decisions/0019-v1-state-deletion-and-export-semantics/

1. Purpose

This document specifies the version 1 HTTP API contract for the UWScrape backend.

Frontend consumption rules for this API are specified in docs/specifications/frontend-backend-contract-spec/.

The backend API serves one active published index.

The backend API stores and retrieves anonymous student planning state.

The backend API evaluates academic queries through backend-owned semantics.

The backend API provides bounded typed graph views for frontend-heavy screens.

The API root is /api/v1.

The API contract is specified in Markdown first.

This wave does not add an OpenAPI artifact.

2. Source Anchors

HTTP method semantics follow RFC 9110.

HTTP caching semantics follow RFC 9111.

The Go implementation should use net/http concepts such as handlers, request contexts, response writers, server timeouts, and graceful shutdown, as documented for the Go net/http package.

JSON encoding and decoding should follow Go encoding/json behavior unless a later implementation spec chooses another JSON library.

State token handling follows the OWASP Session Management Cheat Sheet guidance that session identifiers should be unpredictable, meaningless to the client, protected in transit, and not logged.

3. API Principles

The backend owns academic semantics.

The frontend owns presentation, layout, and interaction.

The API exposes semantic results, not solver internals.

The API exposes source references whenever a response depends on source calendar facts.

The API exposes unknowns whenever missing, unparsed, unsupported, unavailable, or incomplete data can affect correctness.

The API uses operational field names.

The API does not use metaphorical names for durable types.

The API does not make graph geometry authoritative.

The API does not run the scraper.

The API does not mutate the published index.

The API does not silently reinterpret student state under a different catalog version.

4. Versioning

All version 1 routes live under /api/v1.

Every response envelope includes meta.api_version.

Every response envelope includes loaded index metadata when the backend has completed startup.

Graph view responses include projection_version.

Query explanation responses include explanation_version.

Student state records include state_schema_version.

Breaking response-shape changes require a new API version or an explicit compatibility mode.

Adding optional fields is allowed within /api/v1.

Renaming fields is not allowed within /api/v1.

Changing the meaning of a status is not allowed within /api/v1.

5. Common Envelope

Every successful response uses:

{
  "data": {},
  "meta": {},
  "warnings": [],
  "unknowns": [],
  "source_references": []
}

Every error response uses:

{
  "error": {},
  "meta": {},
  "warnings": [],
  "unknowns": [],
  "source_references": []
}

data is endpoint-specific.

error is present only for transport, request, authorization, conflict, or server failures.

meta is required.

warnings is required and may be empty.

unknowns is required and may be empty.

source_references is required and may be empty.

Semantic uncertainty should usually be represented as 200 OK with data.status = "unknown" or data.status = "partial".

Semantic uncertainty should not become an HTTP error unless the request itself cannot be processed.

6. Metadata Object

meta contains response-wide metadata.

Required fields when the backend has loaded an index:

| Field | Type | Required | Meaning |
| --- | --- | --- | --- |
| api_version | string | yes | API version, initially v1. |
| request_id | string | yes | Opaque request correlation id. |
| index_id | string | yes | Loaded runtime index artifact id. |
| index_schema_version | string | yes | Runtime index schema version. |
| catalog_version_id | string | yes | Active internal catalog version id. |
| catalog_title | string | yes | Human-readable catalog title. |
| upstream_catalog_id | string | when known | Waterloo/Kuali catalog id. |
| state_schema_version | string | state responses | Student state schema version. |
| state_version | integer | state responses | Monotone state version. |
| projection_version | string | graph responses | Graph projection contract version. |
| explanation_version | string | query responses | Explanation tree contract version. |

Startup errors may have only api_version and request_id.

The backend must not include raw state tokens in meta.

The backend must not include token verifiers in meta.

7. Warning Object

warnings contains non-fatal machine-readable notices.

Shape:

{
  "code": "catalog_mismatch",
  "message": "The state catalog differs from the active backend catalog.",
  "severity": "warning",
  "details": {},
  "source_reference_ids": []
}

Required fields:

| Field | Type | Meaning |
| --- | --- | --- |
| code | string | Stable warning code. |
| message | string | Human-readable summary. |
| severity | string | info, warning, or critical. |
| details | object | Machine-readable details, empty when not needed. |
| source_reference_ids | array | Source references involved in the warning. |

Initial warning codes:

  • catalog_mismatch;
  • catalog_unavailable;
  • state_migration_available;
  • graph_view_truncated;
  • unparsed_requirement_present;
  • missing_grade;
  • missing_academic_progress;
  • deprecated_field_ignored;
  • request_limit_near.

8. Unknown Object

unknowns contains semantic uncertainty.

Shape:

{
  "unknown_reason": "unparsed_requirement",
  "message": "A requirement fragment could not be parsed into structured conditions.",
  "requirement_id": "requirement_expression:req_123",
  "source_reference_ids": ["source_reference:src_456"],
  "details": {}
}

Required fields:

| Field | Type | Meaning |
| --- | --- | --- |
| unknown_reason | string | Machine-readable reason. |
| message | string | Human-readable summary. |
| source_reference_ids | array | Source references that explain the unknown, when available. |
| details | object | Machine-readable details. |

Optional fields:

  • requirement_id;
  • requirement_condition_id;
  • course_listing_id;
  • course_credit_id;
  • credential_id;
  • engine_route;
  • state_field.

Initial unknown_reason values:

  • unparsed_requirement;
  • missing_grade;
  • missing_academic_progress;
  • missing_academic_standing;
  • missing_program_state;
  • catalog_mismatch;
  • catalog_unavailable;
  • unresolved_course_reference;
  • unresolved_credential_reference;
  • unsupported_requirement_condition;
  • engine_incomplete;
  • time_limit_reached;
  • source_conflict_unresolved.

Academic unknown is distinct from engine-native UNKNOWN.

Engine-native UNKNOWN must be mapped through unknown_reason = "engine_incomplete" or a more specific reason.

Timeout must be mapped through time_limit_reached when it affects the result.
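The two mapping rules above could be expressed as a small pure function. This is a sketch under assumptions: the UnknownReason function and its boolean inputs are hypothetical, and the real engine interface is not described in this document. Only the unknown_reason strings are taken from the list above.

```go
package main

import "fmt"

// UnknownReason maps an engine outcome to an API unknown_reason.
// engineUnknown: the engine returned its native UNKNOWN (hypothetical input).
// timeoutAffectedResult: a time limit changed the result (hypothetical input).
// specific: a more specific reason if one is known, otherwise "".
// The second return value is false when no unknown entry is needed.
func UnknownReason(engineUnknown, timeoutAffectedResult bool, specific string) (string, bool) {
	switch {
	case timeoutAffectedResult:
		return "time_limit_reached", true
	case !engineUnknown:
		return "", false
	case specific != "":
		return specific, true
	default:
		return "engine_incomplete", true
	}
}

func main() {
	reason, ok := UnknownReason(true, false, "")
	fmt.Println(reason, ok)
}
```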

9. Source Reference Summary

source_references contains compact references used by the response.

Shape:

{
  "source_reference_id": "source_reference:src_456",
  "source_kind": "course_field",
  "catalog_version_id": "uw_undergrad_2026_2027",
  "upstream_catalog_id": "67e557ed6ed2fe2bd3a38956",
  "source_pid": "abc123",
  "source_field_path": "requisites.prerequisites",
  "source_url": "https://uwaterloo.ca/academic-calendar/undergraduate-studies/catalog#/courses/abc123",
  "snippet": "Prereq: ...",
  "confidence": "parsed"
}

The backend may omit long raw HTML.

The backend should include stable ids and enough context to retrieve the full source reference through the source reference endpoint.

The official University of Waterloo Academic Calendar remains the authoritative academic source.

Source references should prefer stable calendar links, source ids, field paths, hashes, and short snippets over broad republication of raw calendar text.

10. Error Object

error describes request, authorization, conflict, or server failure.

Shape:

{
  "code": "bad_request",
  "message": "The request body is invalid JSON.",
  "details": {
    "field": "targets.course_codes"
  }
}

Initial error codes:

| Code | HTTP status | Meaning |
| --- | --- | --- |
| bad_request | 400 | Invalid JSON, invalid parameters, or invalid field values. |
| unauthorized | 401 | Missing or invalid state token. |
| forbidden | 403 | Valid token cannot perform the operation. |
| not_found | 404 | Resource not found. |
| method_not_allowed | 405 | Route exists but method is wrong. |
| conflict | 409 | Request conflicts with state or catalog version. |
| payload_too_large | 413 | Request exceeds configured body limit. |
| unsupported_media_type | 415 | Request content type is not supported. |
| rate_limited | 429 | Rate limit exceeded. |
| unsupported_index_schema | 500 | Backend started with unsupported index schema; normally startup should fail first. |
| catalog_unavailable | 409 or 503 | Required catalog version is unavailable. |
| internal_error | 500 | Unhandled server failure. |

query_unknown is not an error code.

Valid query requests that cannot be fully answered return 200 OK with academic unknown or partial.
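One way to keep handlers consistent with the error code table is a single lookup from code to HTTP status. The sketch below is illustrative: statusByCode and StatusFor are hypothetical names. Because catalog_unavailable may be 409 or 503, this sketch arbitrarily picks 503 as its default, which is an assumption and not a rule from this spec.

```go
package main

import (
	"fmt"
	"net/http"
)

// statusByCode mirrors the initial error code table.
// Assumption: catalog_unavailable defaults to 503 here; callers that need
// 409 would have to override it.
var statusByCode = map[string]int{
	"bad_request":              http.StatusBadRequest,
	"unauthorized":             http.StatusUnauthorized,
	"forbidden":                http.StatusForbidden,
	"not_found":                http.StatusNotFound,
	"method_not_allowed":       http.StatusMethodNotAllowed,
	"conflict":                 http.StatusConflict,
	"payload_too_large":        http.StatusRequestEntityTooLarge,
	"unsupported_media_type":   http.StatusUnsupportedMediaType,
	"rate_limited":             http.StatusTooManyRequests,
	"unsupported_index_schema": http.StatusInternalServerError,
	"catalog_unavailable":      http.StatusServiceUnavailable,
	"internal_error":           http.StatusInternalServerError,
}

// StatusFor returns the HTTP status for a known error code and falls back
// to 500 for anything that is not in the table.
func StatusFor(code string) int {
	if status, ok := statusByCode[code]; ok {
		return status
	}
	return http.StatusInternalServerError
}

func main() {
	fmt.Println(StatusFor("payload_too_large"), StatusFor("rate_limited"))
}
```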

11. Authentication

Version 1 anonymous state access uses only state_token through the Authorization header.

Header format:

Authorization: Bearer <state_token>

The token is a password-equivalent access key.

The v1 recommended token generation shape is 32 random bytes from a cryptographically secure random source, encoded as base64url without padding.

This provides 256 bits of randomness, exceeding OWASP’s minimum session identifier entropy guidance.

The backend must not accept state tokens in URL query parameters.

The backend must not include state tokens in response bodies except the one-time creation response.

The backend must not log authorization headers.

The backend must not store raw state tokens.

Browser cookie sessions are not part of version 1.

HttpOnly, Secure, and SameSite cookie policy belongs to a later optional browser-session extension.

12. Caching

Catalog-only responses may be cacheable.

State-dependent responses must use Cache-Control: no-store.

Token creation responses must use Cache-Control: no-store.

State export responses must use Cache-Control: no-store.

State-dependent graph views must use Cache-Control: no-store.

Catalog-only graph views may use public caching when the request contains no state token and no supplied state.

ETags for catalog-only responses should be derived from:

  • index_id;
  • route pattern;
  • normalized query parameters or request body;
  • representation version.

ETags must not include raw state tokens.

13. Content Type

Requests with bodies must use:

Content-Type: application/json

Responses use:

Content-Type: application/json; charset=utf-8

The backend may reject request bodies with unknown or missing content type for POST and PUT.

The backend should not accept form-encoded state tokens.

14. Course Identifier Normalization

Course resource paths use separate {subject} and {catalog_number} path components.

Course code values in query bodies and response payloads may use display form such as CS 246.

Path inputs avoid embedded spaces.

The backend normalizes:

  • subject to uppercase;
  • catalog number to the canonical stored form.

Response payloads use canonical course_code.

If input is ambiguous, the backend returns 409 conflict or 400 bad_request with candidate matches.

If input does not resolve, the backend returns 404 not_found for resource endpoints or academic unknown for query endpoints when the unresolved reference comes from user state.
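A minimal sketch of the normalization step, assuming that the canonical stored catalog number is the trimmed, uppercased input (for example 223w becomes 223W). That assumption and the NormalizeCourse name are not taken from the spec; the real canonical form is defined by the index.

```go
package main

import (
	"fmt"
	"strings"
)

// NormalizeCourse uppercases the subject and normalizes the catalog number.
// Assumption: the canonical catalog number is trimmed and uppercased.
// It returns the subject, the catalog number, and the display course_code.
func NormalizeCourse(subject, catalogNumber string) (string, string, string) {
	s := strings.ToUpper(strings.TrimSpace(subject))
	n := strings.ToUpper(strings.TrimSpace(catalogNumber))
	return s, n, s + " " + n
}

func main() {
	_, _, code := NormalizeCourse("cs", "246")
	fmt.Println(code)
}
```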

15. State Modes for Query Endpoints

Query endpoints accept one of three state modes:

  • persisted;
  • supplied;
  • persisted_with_changes.

persisted requires a valid Authorization header.

supplied does not require a token unless the endpoint also accesses persisted state.

persisted_with_changes requires a valid token and applies request-local changes without saving them.

Shape for persisted:

{
  "state_mode": "persisted",
  "targets": {}
}

Shape for supplied:

{
  "state_mode": "supplied",
  "student_state": {},
  "targets": {}
}

Shape for persisted_with_changes:

{
  "state_mode": "persisted_with_changes",
  "changes": {},
  "targets": {}
}

The backend must return which mode it used in the response.

The backend must not persist changes from query endpoints.

16. System and Index Endpoints

GET /api/v1/health

Returns process health and startup mode.

Data shape:

{
  "status": "ok",
  "degraded": false,
  "checks": {
    "index_loaded": true,
    "state_store_available": true,
    "release_decision_status": "approved"
  }
}

This endpoint should not require authentication.

It should not expose filesystem paths or secrets.

GET /api/v1/index

Returns loaded index metadata.

Data shape:

{
  "index_id": "idx_2026_2027_undergrad_20260511_001",
  "index_schema_version": "0.1.0",
  "catalog_version_id": "uw_undergrad_2026_2027",
  "catalog_title": "2026-2027 Undergraduate Studies Academic Calendar",
  "upstream_catalog_id": "67e557ed6ed2fe2bd3a38956",
  "release_status": "approved",
  "release_decision_id": "release_decision:2026_2027_001",
  "release_decision_reviewed_at": "2026-05-11T00:12:00Z",
  "build_started_at": "2026-05-11T00:00:00Z",
  "build_completed_at": "2026-05-11T00:10:00Z",
  "parser_version": "0.1.0",
  "validation_summary": {
    "status": "approved_with_warnings",
    "finding_count": 12,
    "warning_count": 12,
    "error_count": 0
  },
  "course_count": 1925,
  "credential_count": 137,
  "requirement_condition_count": 9182,
  "source_reference_count": 4798,
  "graph_projection_course_count": 1925,
  "graph_projection_edge_count": 5136,
  "graph_projection_relation_count": 4960,
  "graph_projection_equivalent_count": 176,
  "graph_projection_source": "published_artifact",
  "profile_summary": {
    "profiles": ["engineering", "math"],
    "course_subject_count": 37,
    "course_subject_detail_count": 1873,
    "course_group_counts": [
      {
        "name": "Faculty of Mathematics",
        "search_matched": 352,
        "detail_fetched": 352,
        "status": "complete"
      }
    ],
    "program_group_counts": [
      {
        "name": "Faculty of Mathematics",
        "search_matched": 69,
        "detail_fetched": 69,
        "status": "complete"
      }
    ],
    "referenced_course_gap_subjects": [
      {
        "subject": "BUS",
        "missing_course_count": 23,
        "reference_count": 46,
        "subject_coverage_status": "complete",
        "subject_detail_fetched": 28,
        "example_course_codes": ["BUS 223W", "BUS 227W"]
      }
    ]
  }
}

V1 serves one active index.

profile_summary is derived from the published validation report. It is diagnostic release evidence for the loaded artifact, not an academic rule source. UIs may use it to label index scope and residual referenced-course gaps, but query correctness still comes from catalog records, requirement conditions, source references, and evaluator results.

Referenced-course gap rows may include subject coverage status.

subject_coverage_status: "complete" means the subject was fetched for the active snapshot profile, but the listed referenced course codes were still not present in parsed course listings.

It does not mean the backend should invent those courses.

Multi-catalog runtime loading is not a v1 requirement.

GET /api/v1/catalog-versions

Returns catalog versions contained in the active index.

V1 normally returns one catalog version.

The array form is still used so the API can tolerate future indexes that contain more than one catalog version.

GET /api/v1/catalog-versions/{catalog_version_id}

Returns one catalog version record from the active index.

If the requested catalog is not present in the active index, return 404 not_found.

17. Catalog Resource Endpoints

GET /api/v1/courses

Searches or lists course listings.

Query parameters:

| Parameter | Type | Meaning |
| --- | --- | --- |
| q | string | Optional text query over code and title. |
| subject | string | Optional subject filter. |
| level | string | Optional level filter such as 100, 200, 300, or 400. |
| limit | integer | Page size, default 50, hard maximum from runtime config. |
| cursor | string | Opaque pagination cursor. |

Data shape:

{
  "items": [
    {
      "course_listing_id": "course_listing:CS:246",
      "course_credit_id": "course_credit:cs_246",
      "course_code": "CS 246",
      "subject": "CS",
      "catalog_number": "246",
      "title": "Object-Oriented Software Development",
      "units_x100": 50,
      "units_display": "0.50",
      "level": "200",
      "uncertainty_summary": {
        "has_unparsed_requirements": false
      }
    }
  ],
  "next_cursor": null,
  "total_count": 1
}

total_count is the number of indexed course listings matching the same search and filter parameters before pagination. It is not the number of rows visible on the current page.
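The limit parameter handling might look like the sketch below. ParseLimit is a hypothetical helper. The default of 50 comes from the parameter table; clamping an over-large limit to the configured hard maximum, rather than rejecting it with 400, is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"strconv"
)

const defaultLimit = 50 // default page size from the parameter table

// ParseLimit parses the raw limit query parameter.
// An empty value selects the default. Non-integer or non-positive values are
// errors. Values above hardMax are clamped (an assumption of this sketch).
func ParseLimit(raw string, hardMax int) (int, error) {
	limit := defaultLimit
	if raw != "" {
		n, err := strconv.Atoi(raw)
		if err != nil || n < 1 {
			return 0, fmt.Errorf("limit must be a positive integer, got %q", raw)
		}
		limit = n
	}
	if limit > hardMax {
		limit = hardMax
	}
	return limit, nil
}

func main() {
	limit, err := ParseLimit("500", 200)
	fmt.Println(limit, err)
}
```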

GET /api/v1/courses/{subject}/{catalog_number}

Returns one course listing.

Data includes:

  • listing identity;
  • credit identity;
  • canonical code;
  • title;
  • description;
  • units as exact machine value and display value;
  • level;
  • cross-listed listings;
  • antirequisite summaries;
  • prerequisite summary;
  • corequisite summary;
  • source references;
  • uncertainty summary.

GET /api/v1/courses/{subject}/{catalog_number}/requirements

Returns requirement expressions attached to a course.

Data shape:

{
  "course_listing_id": "course_listing:CS:246",
  "course_code": "CS 246",
  "requirements": [
    {
      "requirement_source_id": "requirement_source:course:CS:246:prerequisites",
      "requirement_kind": "prerequisite",
      "requirement_expression_id": "requirement_expression:req_123",
      "expression": {}
    }
  ]
}

expression uses the indexed requirement expression representation.

Opaque source fragments remain explicit as unparsed_requirement nodes.

GET /api/v1/courses/{subject}/{catalog_number}/source

Returns source fields and source references for one course.

The backend may provide snippets and stable source ids.

The backend should avoid returning large raw HTML by default.

GET /api/v1/course-credits/{course_credit_id}

Returns a credit identity and linked listings.

The response must distinguish credit identity from course listings.

GET /api/v1/credentials

Searches or lists credentials.

Query parameters:

  • q;
  • credential_type;
  • faculty;
  • department;
  • limit;
  • cursor.

The response uses the same pagination shape as course search:

{
  "items": [],
  "next_cursor": null,
  "total_count": 0
}

total_count is the matching credential count before pagination.

GET /api/v1/credentials/{credential_id}

Returns one credential record.

Data includes:

  • credential id;
  • credential type;
  • title;
  • owning faculty or department when known;
  • catalog version;
  • source ids;
  • requirement source ids;
  • source references;
  • uncertainty summary.

GET /api/v1/credentials/{credential_id}/requirements

Returns parsed and opaque credential requirements.

The response must preserve requirement groups and cardinality.

GET /api/v1/credentials/{credential_id}/source

Returns source fields and source references for one credential.

18. Source Reference Endpoints

GET /api/v1/source-references/{source_reference_id}

Returns one source reference.

GET /api/v1/source-references

Returns a bounded set by ids or filters.

Supported query parameters:

  • ids;
  • source_kind;
  • course_listing_id;
  • credential_id;
  • requirement_id;
  • limit;
  • cursor.

The endpoint must enforce maximum result sizes.

19. Student State Endpoints

POST /api/v1/state

Creates an anonymous state record.

Authentication is not required.

The request body may include an initial student_state.

If omitted, the backend creates an empty state for the active catalog.

Response data includes state_token exactly once.

Shape:

{
  "state_id": "state:7f3a",
  "state_version": 1,
  "state_token": "base64url-secret-token",
  "student_state": {}
}

The response must use Cache-Control: no-store.

The backend must store only a token verifier.

GET /api/v1/state/current

Loads the state associated with the bearer token.

Requires Authorization.

The response never includes state_token.

PUT /api/v1/state/current

Replaces the state associated with the bearer token.

Requires Authorization.

The request body contains a full student_state.

The request body must include positive expected_state_version.

The backend validates schema and increments state_version.

The backend preserves unresolved user-entered references.

GET /api/v1/state/current/export

Exports the state in a portable JSON shape.

Requires Authorization.

This endpoint is read-only.

It uses GET.

The response uses Cache-Control: no-store.

POST /api/v1/state/current/migration-preview

Returns advisory migration information from the state catalog to the active catalog.

Requires Authorization.

V1 is single-active-index.

If the old catalog is not available, return catalog_unavailable in warnings or unknowns and provide only reference-resolution preview that can be computed safely.

Deferred: POST /api/v1/state/current/migrate

Version 1 does not register a migration-acceptance endpoint.

Migration acceptance is a future explicit operation.

The backend must not silently overwrite the original state catalog version.

DELETE /api/v1/state/current

Hard-deletes the state associated with the bearer token.

Requires Authorization.

Requires confirmation.

Recommended request shape:

{
  "confirm": "delete my state"
}

The endpoint should be idempotent after deletion from the client’s perspective.

It must not reveal whether a later token guess matched a deleted state beyond normal unauthorized behavior.

20. Query Endpoints

Query endpoints are side-effect free.

They use POST because request bodies can be complex.

They accept state_mode.

They return academic_result payloads specified in docs/specifications/query-evaluation-semantics-spec/.

Routes:

| Method | Path | Purpose |
| --- | --- | --- |
| POST | /api/v1/query/course-unlock | Evaluate whether selected courses are unlocked. |
| POST | /api/v1/query/credential-progress | Evaluate progress toward credentials. |
| POST | /api/v1/query/what-if | Compare baseline state with proposed changes. |
| POST | /api/v1/query/advisory | Return missing requirements, conservative alternatives, and course signals for selected courses or credentials. |
| POST | /api/v1/query/course-impact | Explain what selected courses unlock or affect. |
| POST | /api/v1/query/credential-gap-summary | Summarize remaining, blocked, conflicting, and unknown requirements for target credentials. |

Every query response must include:

  • state_mode;
  • target;
  • status;
  • academic_result;
  • source_references;
  • unknowns;
  • warnings.

Catalog mismatch must be explicit.

Catalog unavailable must be explicit.

Advisory Query

POST /api/v1/query/advisory is the course-first product helper endpoint.

It accepts course and credential targets and returns:

  • missing requirement rows;
  • source-backed candidate courses for missing requirements;
  • conservative alternative path hints;
  • future-course signals from indexed requirement relations;
  • source-reference ids for every surfaced academic claim;
  • non-sensitive engine trace summaries.

Advisory output is not a complete planner.

It must not claim global feasibility or global infeasibility.

It may use direct evaluator output, bounded indexed relation closure, and resolver-style alternative grouping.

It must preserve unknowns and conflicts from the underlying query results.

persisted_with_changes is not a v1 advisory state mode.

Use what-if for request-local state changes.

Course Impact Query

POST /api/v1/query/course-impact is a bounded relationship query for the course-first Canvas and Advisory surfaces.

It accepts course targets and returns:

  • direct unlocks within the active index;
  • transitive unlock paths within explicit request bounds;
  • exclusions and equivalent-credit relations;
  • credentials that mention the course listing or shared credit identity;
  • subject and relation counts;
  • source references for surfaced indexed relations.

Course impact is not a satisfaction query.

The response uses status: "not_applicable" and academic_result.status: "not_applicable" because the payload describes relationship evidence, not whether a requirement is fulfilled.

The backend computes the indexed relation evidence.

The frontend may use that evidence to size, highlight, or rank visible nodes, but it must not reinterpret impact evidence as course unlock, credential progress, or feasibility.

persisted_with_changes is not a v1 course-impact state mode.

Use what-if when request-local state changes are part of the question.

21. Graph View Endpoints

Graph view endpoints return bounded typed projections.

They are not a general graph query language.

They are not GraphQL in version 1.

Routes:

| Method | Path | Purpose |
| --- | --- | --- |
| GET | /api/v1/graph/views | List supported graph view types and limits. |
| POST | /api/v1/graph/views/course-neighborhood | Get prerequisite and unlock neighborhood. |
| POST | /api/v1/graph/views/course-universe | Get bounded universal course relation graph for the active index. |
| POST | /api/v1/graph/views/course-pathways | Get recursive upstream/downstream course pathways around selected courses. |
| POST | /api/v1/graph/views/credential-requirements | Get requirement graph for a credential. |
| POST | /api/v1/graph/views/unlock-overlay | Get state-dependent unlock statuses for visible nodes. |
| POST | /api/v1/graph/views/target-relevance | Get relevance of visible courses to selected credentials. |
| POST | /api/v1/graph/views/expand-node | Expand one graph node within a named view. |

The response contract is specified in docs/specifications/graph-view-response-spec/.

State-dependent graph views require Authorization when state_mode = "persisted" or state_mode = "persisted_with_changes".

Catalog-only graph views do not require authentication.

22. Required Header Behavior

All responses should include:

  • Content-Type;
  • Cache-Control;
  • X-Request-ID or equivalent request id header.

Catalog-only cacheable responses may include:

  • ETag;
  • Last-Modified when meaningful.

State-dependent responses must include:

  • Cache-Control: no-store.

The backend must not reflect Authorization header values.

23. Request Validation

The backend validates:

  • HTTP method;
  • content type;
  • body size;
  • JSON syntax;
  • required fields;
  • enum values;
  • identifier shapes;
  • state mode rules;
  • pagination limits;
  • graph view bounds.

Validation errors return 400 bad_request unless a more specific status applies.

The backend should reject oversized bodies before decoding them fully.

Go implementations should use request body limiting with net/http facilities such as http.MaxBytesReader where applicable.

24. Compatibility Checklist

Within /api/v1, an implementation must not:

  • remove required envelope fields;
  • rename status values;
  • treat semantic unknown as an HTTP error;
  • accept state tokens in URLs;
  • expose token verifiers;
  • expose raw solver internals;
  • silently reinterpret state under the active catalog;
  • silently drop source references from query responses.

25. Test Scenarios

API handler tests should cover:

  • every route returns the common envelope;
  • errors use the error envelope;
  • catalog-only responses use cacheable headers when eligible;
  • state-dependent responses use no-store;
  • token creation returns token once;
  • state load never returns token;
  • health and index metadata expose release decision status without filesystem paths;
  • course resource routes use separate subject and catalog number path components;
  • missing token returns 401;
  • token in query parameters is rejected;
  • query unknown returns 200 with academic unknown;
  • catalog mismatch returns explicit warning or unknown;
  • graph view bounds are enforced;
  • unsupported method returns 405;
  • invalid JSON returns 400;
  • request too large returns 413;
  • source references appear in course, credential, query, and graph responses when source-backed data is used.

26. References