A Typed Constraint Model for Academic Course Universes

Abstract

This document defines the mathematical foundation for UWScrape.

The project is not merely a catalogue of University of Waterloo courses.

It is a model of academic possibility.

The intended product is an exploratory and planning system in which courses act as visible atoms.

Those atoms compose into credentials, minors, options, specializations, joint honours plans, and degree paths.

The problem looks like a graph problem from far away.

At close range it is a typed finite constraint satisfaction problem.

The visual graph is a projection, not the source of truth.

The central thesis is:

An academic course universe should be represented canonically as a typed constraint satisfaction system over versioned academic objects, then projected into typed factor graphs and directed typed hypergraphs for visualization, querying, and explanation.

This thesis is motivated by Waterloo’s Kuali Curriculum Management rule system.

Waterloo documents course requisites as Kuali rules split into prerequisites, corequisites, and antirequisites.

The documented rule inventory includes grade thresholds, subject-code predicates, unit thresholds, course choices, academic level requirements, program enrollment requirements, milestones, and antirequisite exclusions.

See Waterloo’s list of available requisite rules:

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/kuali-cm/building-rules/how-build-course-requisites/list-available-requisite-rules

Waterloo also documents that plan requirements use the same underlying rule concept as course requisites, though with different fields, categories, and available wording.

See Waterloo’s plan requirement guide:

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/kuali-cm/building-rules/how-build-plan-requirement-lists

Waterloo further documents that cross-listed courses share Quest identification and material content, while subject code, course number, requisites, and ownership may differ.

See Waterloo’s cross-listed course guidance:

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/curriculum-management/all-about-cross-listed-courses

These facts make a plain directed prerequisite graph inadequate.

A plain graph edge cannot faithfully encode “complete any two of this set,” “earn at least 70% in one of these courses,” “be in level 2A or higher,” or “take one 300-level or 400-level course from AMATH or STAT.”

The model below therefore starts from typed academic objects and typed predicates.

It then defines requirements as expressions over those predicates.

It then defines student plans as valuations over the expression variables.

Finally, it defines graph and spatial projections as derived views.

The intended implementation consequence is strict:

No visualization may invent academic truth.

Every planet, island, gate, thread, or timeline marker must be traceable to a source rule, an inference rule, or an explicitly marked opaque fragment.

1. Problem Statement

Students experience academic calendars as prose, tables, nested lists, and course pages.

Administrators experience them as governed records, proposals, effective dates, and rule fields.

Planning software often collapses this structure into a shallow graph.

That collapse is tempting because courses are visually natural nodes.

It is also mathematically wrong.

Consider a target course T.

Suppose T requires one of A or B, and also one of C or D.

A plain graph usually draws A -> T, B -> T, C -> T, and D -> T.

That drawing suggests that A alone is a prerequisite route to T.

But A alone is not sufficient.

A must be paired with either C or D.

The missing object is a logical gate.

Now consider a requirement:

Complete 1.0 units from 300-level or 400-level AMATH or STAT courses.

This is not a finite list at the moment of modeling.

It is a predicate over a changing course set and calendar version.

Flattening it into a list destroys the reason why the courses matched.

The reason matters for diffs, explanation, and future calendar updates.

Now consider cross-listed courses.

Waterloo’s cross-listing guidance says cross-listed courses share a Quest identification number and material content.

But it also says subject code, course number, requisites, and ownership may differ.

Therefore a user-visible course listing and a credit-bearing identity are related but not identical.

This distinction is essential.

The app must let students click on listings such as CS 246 or MATH 235.

The solver must reason about whether two listings represent the same credit identity, equivalent content, an antirequisite conflict, or merely a scheduling association.

The problem is therefore:

Define a canonical model that preserves academic semantics.

Define projections that are useful for visualization.

Define solver encodings that support reachability and degree feasibility.

Define proof obligations that prevent the graph from lying.

2. External Theory Stack

The core formalism begins with finite constraint satisfaction.

Berkeley CS188 describes CSPs using variables, domains, and constraints:

https://inst.eecs.berkeley.edu/~cs188/textbook/csp/csps.html

Waterloo CS486/686 notes similarly define a CSP by variables, domains, and constraints, with assignments that are complete and consistent:

https://cs.uwaterloo.ca/~b246chen/Notes/CS486_686/index.html

The model also borrows the factor graph view.

Kschischang, Frey, and Loeliger define factor graphs as bipartite graphs visualizing factorizations of global functions into local functions.

Their tutorial paper is the canonical reference:

https://doi.org/10.1109/18.910572

For a compact teaching formulation of factor graphs as variable-based models, see Stanford CS221 notes:

https://web.stanford.edu/~shervine/teaching/cs-221/cheatsheet-variables-models/

The visual and interchange representation also uses directed hypergraphs.

Directed hypergraphs generalize directed graphs by allowing a hyperedge to connect a tail set of vertices to a head set of vertices.

Gallo, Longo, Pallottino, and Nguyen study directed hypergraphs and applications in operations research, propositional logic, relational databases, and transportation:

https://doi.org/10.1016/0166-218X(93)90045-P

A concise open definition of directed hypergraphs is also available in XGI documentation:

https://xgi.readthedocs.io/en/latest/api/tutorials/focus_7.html

The solver layer relates to SAT, SMT, CP, Datalog, and package dependency resolution.

SMT-LIB provides standard languages, theories, and benchmark conventions for satisfiability modulo theories:

https://smt-lib.org/

Its Core theory defines Booleans and basic Boolean operators:

https://smt-lib.org/theories-Core.shtml

Souffle documents Datalog as a declarative language built from relation declarations, facts, rules, and directives:

https://souffle-lang.github.io/program

Microsoft’s Z3 fixedpoint guide describes a bottom-up Datalog-style engine over finite relations:

https://microsoft.github.io/z3guide/docs/fixedpoints/basicdatalog/

Package dependency resolution is a useful analogy.

Dart Pub’s PubGrub solver describes version solving as selecting package versions that satisfy constrained dependencies, with NP-hard worst-case behavior:

https://dart.googlesource.com/pub.git/+/f27dcfdb/doc/solver.md

Academic credentials also resemble feature models.

Feature-model literature studies mandatory, optional, alternative, or, requires, excludes, and cardinality constraints.

Benavides, Segura, and Ruiz-Cortes survey automated analysis of feature models:

https://doi.org/10.1016/j.is.2010.01.001

The Universal Variability Language paper summarizes group cardinality constraints such as selecting between n and m child features:

https://doi.org/10.1016/j.jss.2024.112326

Finally, course-prerequisite network research gives a baseline graph-theoretic view.

Stavrinides and Zuev model course-prerequisite networks as directed networks of courses and prerequisite links, then study centrality and topological stratification:

https://doi.org/10.1007/s41109-023-00543-w

This project generalizes that CPN idea.

The CPN is a useful projection.

It is not the complete academic truth model.

3. Design Principles

Principle 1: preserve the source distinction between courses, credentials, rules, and prose.

Principle 2: treat course listings as user-facing objects.

Principle 3: treat credit identities as solver-facing objects.

Principle 4: represent requirements as expressions, not strings.

Principle 5: preserve opaque text when parsing fails.

Principle 6: version every academic object.

Principle 7: make every derived edge explainable.

Principle 8: never let a visual thread imply sufficiency unless the constraint semantics actually imply sufficiency.

Principle 9: keep academic progress separate from academic standing.

Principle 10: allow multiple solver projections, because no single engine is optimal for every query.

These principles constrain the rest of the paper.

4. Core Academic Objects

Let Cal be a finite set of calendar versions.

A calendar version may be a named academic year, a Kuali effective period, or a scraped snapshot with a timestamp.

Let Term be a finite set of academic terms in scope.

Terms may include Fall, Winter, and Spring instances.

Let Progress be the ordered set:

1A < 1B < 2A < 2B < 3A < 3B < 4A < 4B.

This is academic progress.

It is not academic standing.

Let Standing be a separate finite set.

Typical values might include good standing, probation, required to withdraw, or other institution-specific statuses.

The exact names must come from source data or project policy.

Let Subject be a finite set of subject codes in the active index.

Examples include CS, MATH, AMATH, STAT, PMATH, CO, ACTSC, ENGL, BIOL, and so on.

Let CourseListing be the set of visible course entries.

A listing has at least:

listing_id.

subject.

catalog_number.

title.

unit_weight.

calendar_version.

source_record_id.

source_url.

A listing is what a student sees and clicks.

Let CreditIdentity be the set of academic credit identities.

A credit identity is the solver object representing “the same credit-bearing thing” when that identity can be established.

Cross-listed offerings should normally map to one shared credit identity.

But that mapping is a claim with provenance, not a guess.

Let Credential be the set of academic targets.

Credentials include majors, honours plans, minors, options, specializations, diplomas, certificates, and joint honours constructions.

Let Requirement be the set of requirement expressions attached to courses, credentials, or regulations.

Let Milestone be a finite set of non-course achievements.

Waterloo’s Kuali rule list includes milestone rules such as communication or safety milestones.

Milestones must therefore be first-class objects, not comments.

Let ProgramPlan be the set of enrollment plans and declarations.

This is separate from Credential.

A credential is something to satisfy.

A declared plan is a state of the student relative to the institution.

Some requirements are gated by enrollment in a program or plan.

5. Listings, Identities, and Equivalence

Definition 5.1: A course listing is a tuple:

L = (code, title, units, subject, number, version, source).

Definition 5.2: A credit identity is an equivalence-class candidate:

I = (identity_id, listings, version, evidence).

The word “candidate” matters.

The system should distinguish verified identity from inferred identity.

Waterloo’s cross-listing page gives strong evidence when listings share a Quest identification number.

But other equivalence phenomena are weaker.

Examples include renumbering, retired aliases, transfer courses, or manually typed free-form text.

Define relation lists_as(L, I).

This relation states that listing L realizes identity I.

Define relation cross_listed(L1, L2).

This relation states that two listings are documented cross-listings.

Define relation antireq(L1, L2) or antireq(I1, I2).

This relation states a conflict or credit exclusion.

Do not conflate cross-listing with antirequisite.

Cross-listing normally indicates shared identity.

Antirequisite normally indicates conflict or duplication.

The UI may draw both as related.

The solver must treat them differently.

Definition 5.3: A listing projection is a map:

pi_listing : CreditIdentity -> P(CourseListing).

Definition 5.4: A credit projection is a partial map:

pi_credit : CourseListing -> CreditIdentity.

It is partial because a listing may not yet be resolved.

Unresolved listings remain valid nodes.

They simply have weaker solver behavior.

6. Student State

A student state is not a transcript only.

It is a valuation over academic facts.

Definition 6.1: A student state S contains:

completed(L or I, grade, term).

planned(L or I, term).

in_progress(L or I, term).

declared_plan(P, term).

desired_credential(C).

academic_progress(p).

academic_standing(s).

milestone_obtained(m).

calendar_version(c).

The state may also contain user preferences.

Preferences are not academic constraints unless explicitly modeled as soft constraints.

Examples include avoiding morning classes, preferring proofs, or preferring applied courses.

Definition 6.2: A completed course fact has a grade domain:

Grade = {0, 1, ..., 100} union {CR, NCR, P, F, unknown}.

The exact grade domain should be configured institutionally.

Numeric and non-numeric grades must be treated carefully.

A grade threshold cannot be evaluated against CR without a policy.

Definition 6.3: A planned course fact has a term but no final grade.

A planned course may satisfy future sequencing hypotheses.

It may not satisfy a completed-course predicate unless the query explicitly asks for a hypothetical future plan.

Definition 6.4: The state has modes.

actual_mode uses completed and current facts.

planned_mode uses completed, current, and planned facts.

what_if_mode allows hypothetical additions.

The same requirement expression may evaluate differently by mode.

The UI must show which mode produced a result.

7. Requirements as Typed Expressions

The primitive requirement object is an expression over predicates.

Definition 7.1: A predicate is an atomic Boolean-valued condition over a student state and an academic universe.

Examples:

completed(I).

completed_listing(L).

completed_or_concurrent(I).

grade_at_least(I, q).

units_from(S, u).

progress_at_least(p).

progress_exactly(p).

standing_is(s).

enrolled_in(P).

milestone_obtained(m).

not_completed_or_concurrent(I).

course_matches(predicate).

Definition 7.2: Requirement expressions are generated by:

atom(a).

allOf(E1, ..., En).

anyOf(E1, ..., En).

choose(k, [E1, ..., En]).

chooseRange(k_min, k_max, [E1, ..., En]).

unitsAtLeast(u, PoolPredicate).

unitsBetween(u_min, u_max, PoolPredicate).

averageAtLeast(scope, q).

not(E).

opaque(text, source, confidence).

The not constructor should be used sparingly.

Many academic “negative” rules are better represented by specific predicates such as antirequisite or not-open-to-plan.

Definition 7.3: The arity of an expression is the number of distinct atomic predicates that occur in it.

Definition 7.4: A gate is a non-atomic expression node with a known operator.

Gates are visualizable.

They are not courses.

The gate distinction prevents false course-to-course sufficiency claims.

8. Evaluation Semantics

Let U be an academic universe.

Let S be a student state.

Let E be a requirement expression.

Define evaluation:

eval_U(E, S) in {true, false, unknown}.

The third value is needed for opaque rules, missing grades, unresolved identities, or unmodeled institutional discretion.

Definition 8.1: eval(atom(a), S) is the truth value of predicate a under state S.

Definition 8.2: eval(allOf(E1, ..., En), S) is true iff every child is true.

It is false iff at least one child is false.

It is unknown otherwise.

Definition 8.3: eval(anyOf(E1, ..., En), S) is true iff at least one child is true.

It is false iff every child is false.

It is unknown otherwise.

Definition 8.4: eval(choose(k, Es), S) is true iff at least k children are true.

It is false iff even counting unknown children as true cannot reach k.

It is unknown otherwise.

Definition 8.5: eval(chooseRange(a, b, Es), S) uses the interval of possible true counts.

Let t be the number of children known true.

Let u be the number of children whose value is unknown.

The expression is verified true iff a <= t and t + u <= b.

It is verified false iff t > b or t + u < a.

It is unknown otherwise.

For open-world planning, the UI should expose the interval [t, t + u] rather than only the three-valued result.

Definition 8.6: eval(unitsAtLeast(u, P), S) sums credit units of completed identities matching pool predicate P, using credit identity de-duplication.

It returns true iff the sum is at least u.

It returns unknown if any matching or unit value is unresolved and the threshold boundary depends on it.

Definition 8.7: eval(opaque(text), S) is unknown unless a human or later parser attaches a validated semantics.

Opaque is not false.

Opaque is not ignored.

Opaque is a preserved obligation.

9. Academic Progress and Academic Standing

Waterloo Kuali rules include “Student must be in level {level} or higher” and “Student must be in level {level}” wording in the requisite rule list.

The relevant Waterloo source is:

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/kuali-cm/building-rules/how-build-course-requisites/list-available-requisite-rules

This document calls that quantity academic progress.

Academic progress is a position in the program sequence.

Academic standing is a status about academic condition.

The model therefore defines:

progress_at_least(2A).

progress_exactly(2A).

standing_is(good_standing).

standing_not(probation).

These are separate predicates.

This separation avoids a common modeling error.

A student may be in 3A and also not in good academic standing.

The first fact may satisfy a course level prerequisite.

The second fact may affect separate institutional rules.

They are not the same variable.

10. Course Requisites

A course requisite is a requirement expression attached to a course listing or credit identity.

Waterloo separates course requisite fields into prerequisites, corequisites, and antirequisites.

The Kuali course requisite guide states this structure directly:

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/kuali-cm/building-rules/how-build-course-requisites

Definition 10.1: A prerequisite expression pre(T) constrains whether target course T may be taken after prior state.

Definition 10.2: A corequisite expression co(T) constrains whether target course T may be taken concurrently with other courses or program states.

Definition 10.3: An antirequisite expression anti(T) constrains courses, units, milestones, levels, or plans that block or duplicate credit for T.

Do not model antirequisites as negative prerequisites only.

They affect enrollment, credit counting, conflict display, and degree audit differently.

Example 10.4:

pre(T) = allOf(anyOf(completed(A), completed(B)), progress_at_least(2A)).

This means A or B plus level 2A or higher.

The graph projection may show A and B as contributors.

It may not mark A alone as sufficient.

Example 10.5:

pre(T) = anyOf(grade_at_least(Easy, 90), grade_at_least(Medium, 70), grade_at_least(Hard, 60), completed(VeryHard)).

This directly captures grade-differentiated alternatives.

It is not a set of identical prerequisite edges.

The grade threshold is part of the edge semantics.

Example 10.6:

anti(T) = not_completed_or_concurrent(X).

This is a conflict predicate.

Its UI thread should be visually distinct from prerequisite and unlock threads.

11. Plan and Credential Requirements

A credential requirement is an expression attached to a credential.

Waterloo documents plan requirement fields such as Course Requirements with units, Course Requirements without units, Required Courses term-by-term, and Course Lists.

See:

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/kuali-cm/building-rules/how-build-plan-requirement-lists

Definition 11.1: A credential requirement expression req(C) is evaluated against a student state to determine satisfaction of credential C.

Definition 11.2: A course contribution is a proof fragment showing that a completed or planned course participates in a satisfied or partially satisfied requirement expression.

The phrase “counts toward” must be backed by a contribution proof.

Counting toward a credential is not the same as unlocking a course.

Definition 11.3: A requirement slot is a named subexpression used for explanation.

Examples:

core_courses.

advanced_electives.

communication_requirement.

technical_electives_list_2.

Slots are useful for UI and proof reporting.

They must not replace expression semantics.

Example 11.4:

req(CS_minor) = allOf(completed(CS_246), unitsAtLeast(2.0, subject_is(CS) and level_at_least(200)), averageAtLeast(scope=CS_courses, q=60)).

This example is schematic.

It is not a claim about Waterloo’s actual CS minor.

It demonstrates the difference between fixed course predicates, pool predicates, and numeric average constraints.

12. Opaque Text as a First-Class Constraint

Academic calendars contain text that cannot be parsed safely.

Waterloo’s Kuali guides also document free-form text rules for cases that do not fit the structured options.

For course requisites, Waterloo lists standard free-form wording for retired courses, WLU courses, graduate courses, average constraints, and program enrollment text:

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/kuali-cm/building-rules/how-build-course-requisites/list-available-requisite-rules

For plan requirements, Waterloo recommends predetermined rules before free-form text and documents standard free-form phrases:

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/kuali-cm/building-rules/how-build-plan-requirement-lists

Definition 12.1: An opaque constraint is a tuple:

O = (text, source_url, source_record, parser_version, confidence, reason).

Definition 12.2: Opaque constraints evaluate to unknown.

Definition 12.3: Opaque constraints may still participate in provenance, UI display, search, and manual review queues.

Proposition 12.4: Treating unparsed source text as opaque is sound with respect to false-positive avoidance.

Proof sketch:

If the parser cannot establish a formal predicate, replacing the text with true would permit invalid plans.

Replacing it with false would reject potentially valid plans.

Replacing it with unknown preserves the obligation and prevents the system from claiming verified satisfaction.

Therefore unknown is the conservative value.

This is a three-valued soundness claim, not a completeness claim.

13. The Typed Finite CSP

The academic universe induces a CSP-like object.

Classically, a finite CSP is a triple of variables, domains, and constraints.

The Berkeley CS188 and Waterloo CS486 notes both present this basic form:

https://inst.eecs.berkeley.edu/~cs188/textbook/csp/csps.html

https://cs.uwaterloo.ca/~b246chen/Notes/CS486_686/index.html

Definition 13.1: A typed academic CSP is:

A = (X, D, C, tau, P, V).

Here:

X is a finite set of variables.

D maps each variable to a finite domain.

C is a finite set of constraints.

tau maps variables and constraints to types.

P is a provenance map.

V is a calendar/version context.

Variables include:

take_I_t for taking credit identity I in term t.

completed_I_t for completion by term t.

grade_I for grade in identity I.

progress_t for academic progress in term t.

standing_t for academic standing in term t.

declared_P_t for declared program or plan in term t.

satisfies_R for requirement expression node R.

credential_C for credential satisfaction.

Domains include:

Booleans for selection and satisfaction.

Finite grade categories or integer percentage grades.

Finite progress values.

Finite standing values.

Finite term values.

Constraints include:

Prerequisite constraints.

Corequisite constraints.

Antirequisite constraints.

Credit identity de-duplication.

Requirement expression semantics.

Credential satisfaction semantics.

Temporal ordering.

Calendar version compatibility.

Provenance constraints.

The CSP is typed because not every variable may appear in every constraint.

For example, grade_I is not a program plan.

progress_t is not an academic standing.

Type discipline prevents nonsensical expressions.

14. Factor Graph Semantics

A factor graph represents a global function as a product of local functions.

Kschischang, Frey, and Loeliger use factor graphs to visualize how complicated global functions factor into local functions:

https://doi.org/10.1109/18.910572

Definition 14.1: The academic factor graph is a bipartite graph:

F = (X union Phi, E_F).

X is the set of variables.

Phi is the set of factor nodes.

An edge (x, phi) exists iff variable x is in the scope of factor phi.

Definition 14.2: Each hard factor phi is a function from total assignments over its scope to {0, 1}.

Partial evaluation may lift that function to {0, 1, unknown}.

For hard feasibility:

phi(a) = 1 means local constraint satisfied.

phi(a) = 0 means local constraint violated.

phi(a) = unknown means incomplete information.

For optimization:

phi(a) >= 0 may encode preference, risk, or soft cost.

The hard academic truth should remain separable from soft preference.

Definition 14.3: The global hard satisfaction value is the conjunction of all hard factors.

For total two-valued assignments, this conjunction can be written in Boolean arithmetic as:

Sat(a) = product_phi phi(a).

For partial assignments, the lifted value uses three-valued conjunction rather than ordinary multiplication.

This mirrors the standard factor graph formulation used in variable-based AI notes:

https://web.stanford.edu/~shervine/teaching/cs-221/cheatsheet-variables-models/

Academic interpretation:

A student plan is feasible iff every hard academic factor is satisfied.

Unknown factors prevent verified feasibility.

They may still allow hypothetical or advisory feasibility.

15. Directed Typed Hypergraph Semantics

The factor graph is good for solver structure.

It is not always good for visual exploration.

The visual layer often wants directed relationships from prerequisites to gates to courses or credentials.

Directed hypergraphs provide that form.

Definition 15.1: A directed hypergraph is H = (N, E).

Each hyperedge e in E has a tail set T(e) subset N and a head set H(e) subset N.

This follows the standard tail/head-set idea used in directed hypergraph literature:

https://doi.org/10.1016/0166-218X(93)90045-P

Definition 15.2: A typed directed hypergraph is:

H_tau = (N, E, tau_N, tau_E).

tau_N maps nodes to types.

tau_E maps hyperedges to edge types.

Node types include:

CourseListing.

CreditIdentity.

Credential.

RequirementGate.

Predicate.

StudentStateFact.

Milestone.

SubjectRegion.

Edge types include:

requires.

contributes_to.

satisfies.

conflicts_with.

equivalent_to.

belongs_to.

unlocks_when_satisfied.

derived_unlock.

Definition 15.3: Gate nodes preserve logical structure.

For allOf(A, B) -> T, the hypergraph includes an AND gate.

For anyOf(A, B) -> T, the hypergraph includes an OR gate.

For choose(k, Es) -> T, the hypergraph includes a threshold gate.

The gate node has a type and parameter.

It is not an ordinary course.

This makes the visual graph honest.

16. From Requirements to Graphs

Define transformation G(E) from requirement expression E to a typed gate graph.

For an atom a, create a predicate node n_a.

For allOf(E1, ..., En), create child graphs for each Ei, then create a gate node g with operator AND.

Connect each child root to g.

For anyOf(E1, ..., En), create a gate node with operator OR.

For choose(k, Es), create a gate node with operator THRESHOLD and parameter k.

For unitsAtLeast(u, P), create a predicate-pool gate with parameter u and pool predicate P.

For opaque, create an opaque gate node.

The root of G(E) is the final gate or atom node.

Attach that root to the target course or credential via a typed edge.

The transformation is structural recursion.

It terminates because requirement expressions are finite trees or finite DAGs.

If shared subexpressions are interned, the result is a DAG.

If not, the result is a tree.

Both are valid projections.

17. Normal Form

The expression language supports many academic constructs.

For some solvers, it is useful to normalize.

Definition 17.1: Boolean normal form expands allOf, anyOf, and choose into Boolean constraints over auxiliary variables.

For each expression node E, introduce a Boolean variable sat_E.

For atom a, constrain sat_E <-> a.

For allOf(E1, ..., En), constrain:

sat_E <-> (sat_E1 and ... and sat_En).

For anyOf(E1, ..., En), constrain:

sat_E <-> (sat_E1 or ... or sat_En).

For choose(k, Es), constrain:

sat_E <-> (sum_i sat_Ei >= k).

For chooseRange(a, b, Es), constrain:

sat_E <-> (a <= sum_i sat_Ei <= b).

This is a pseudo-Boolean representation.

It can be encoded into SAT with cardinality encodings.

It can be encoded directly into SMT or CP with integer arithmetic.

Definition 17.2: Numeric normal form preserves integer or rational constraints such as units, grades, and averages.

Grades and units are not best flattened into pure Boolean constraints until necessary.

SMT or CP can handle them more naturally.

SMT-LIB documents integer and real theories for such formulas:

https://smt-lib.org/theories.shtml

18. Proof Sketch: Normalization Soundness

Theorem 18.1: For any finite requirement expression E without opaque nodes, under a total two-valued valuation of all atomic and numeric predicates, the auxiliary-variable normal form is satisfiability-equivalent to the direct expression semantics.

Proof sketch:

Proceed by structural induction on E.

Base case:

If E is an atom a, the normal form asserts sat_E <-> a.

Thus sat_E has exactly the truth value of a.

Inductive case for allOf:

Assume each child variable sat_Ei matches child expression Ei.

The normal form asserts sat_E <-> conjunction_i sat_Ei.

By the semantics of allOf, this is exactly the truth condition of E.

Inductive case for anyOf is analogous using disjunction.

Inductive case for choose(k, Es):

Assume child variables match child expressions.

The normal form asserts that at least k child satisfaction variables are true.

This is exactly the semantics of choose(k, Es).

Numeric cases use corresponding arithmetic constraints.

Therefore every satisfying direct evaluation extends to a satisfying auxiliary assignment.

Conversely, every satisfying auxiliary assignment agrees with direct evaluation at the root.

Opaque nodes and partial-state unknowns are excluded because they intentionally do not have complete two-valued semantics.

This is a proof sketch because a full proof must specify the exact grade, unit, and three-valued logic encodings.

19. Proof Sketch: Hypergraph Projection Soundness

Theorem 19.1: The typed gate hypergraph projection preserves requirement dependency structure up to expression isomorphism when every gate node stores its operator and parameters.

Proof sketch:

Construct the projection by structural recursion on expression nodes.

Every atom becomes a typed atom or predicate node.

Every composite expression becomes a typed gate node.

Edges connect exactly the roots of child projections to the parent gate.

The parent gate stores operator and parameters.

Therefore one can reconstruct the expression recursively from the root gate.

The reconstruction is unique up to ordering of children in commutative gates and up to sharing of interned subexpressions.

Since academic allOf, anyOf, and choose are permutation-invariant over child order, this ambiguity is semantic rather than substantive.

Thus the projection is sound for dependency explanation.

It is not necessarily layout-preserving.

It intentionally discards prose order, screen order, and visual geometry unless those are stored separately as presentation metadata.

20. Counterexample: Ordinary Graph Lossiness

Proposition 20.1: A plain directed course graph cannot faithfully represent arbitrary Waterloo-style nested course requirements without additional non-course nodes or edge annotations.

Counterexample:

Let target course T require:

allOf(anyOf(A, B), anyOf(C, D)).

A plain graph with only course vertices and prerequisite edges can include edges:

A -> T.

B -> T.

C -> T.

D -> T.

This graph is identical to the graph produced by:

anyOf(A, B, C, D).

It is also identical to the graph produced by:

allOf(A, B, C, D).

The three expressions have different satisfaction sets.

For example, {A} satisfies the second but not the first or third.

{A, C} satisfies the first but not the third.

{A, B, C, D} satisfies all three.

Since the same graph corresponds to three non-equivalent requirements, the graph is not faithful.

Therefore course-only directed graphs are lossy.

Gate nodes or typed hyperedges are necessary.

21. Solver Encodings

Different query types deserve different encodings.

SAT is strong for pure Boolean feasibility.

SMT is strong when grade thresholds, integer units, and arithmetic averages matter.

CP is strong for scheduling and finite-domain constraints.

Datalog is strong for recursive reachability, explanation, and derived relations.

Package-resolver-style algorithms are strong for conflict explanation and incremental decisions.

The system should not choose exactly one forever.

It should define a canonical model and derive solver-specific encodings.

SAT

SAT variables:

take_I_t.

completed_I_by_t.

sat_R.

SAT clauses encode Boolean structure and conflicts.

Cardinality constraints require encodings such as sequential counters, sorting networks, or native pseudo-Boolean support if the solver provides it.

SAT can answer:

Is there any plan satisfying credential C?

Can course X be included without conflict?

Are target credentials jointly feasible under simplified Boolean constraints?

SAT is less natural for precise average calculations unless translated carefully.

SMT

SMT variables include integers or reals for grades, units, and averages.

SMT-LIB supplies standardized theories such as Core, Ints, Reals, Arrays, and Strings:

https://smt-lib.org/theories.shtml

SMT can answer:

Can a plan satisfy grade threshold alternatives?

Can a student reach the average requirement under known grades and unknown future grades?

Which numeric thresholds block feasibility?

SMT is appropriate when arithmetic is not incidental.

CP

Constraint programming can model term scheduling directly.

Variables can be course-to-term assignments.

Domains can be finite terms.

Constraints can express prerequisites as temporal inequalities.

CP can answer:

Can this course set fit into eight terms?

What is the earliest possible term for target course T?

Which term assignments satisfy all prerequisites and unit caps?

CP is a natural backend for degree-CAD schedule construction.

Datalog

Datalog represents facts and rules over finite relations.

Souffle documents facts, rules, relations, and typed declarations:

https://souffle-lang.github.io/program

Datalog can derive:

direct_unlock(A, T).

contributes_to(course, requirement).

reachable_course(course).

blocked_by(course, reason).

credential_uses_course(credential, course).

Datalog is excellent for explanation graphs and recursive closure.

It is not enough alone for optimization or rich numeric arithmetic.

Package Resolver Analogy

Package managers solve constrained dependency problems.

PubGrub frames version solving as selecting versions that satisfy dependencies, choose one version per package, and avoid selecting unreachable extras:

https://dart.googlesource.com/pub.git/+/f27dcfdb/doc/solver.md

Academic planning has analogous structure.

Choose courses and terms.

Satisfy dependencies.

Avoid conflicts.

Explain why no solution exists.

But academic planning differs in important ways.

Students may take extra courses.

Grades are not known in advance.

Requirements may be choice sets rather than package dependencies.

Calendar versions and transfer credits complicate identity.

The analogy is useful for conflict explanation, not a complete replacement model.

22. Query Semantics

The UI needs queries that match student reasoning.

Query 22.1: What does course X unlock?

Formal answer:

Find target courses T for which X appears in the prerequisite expression dependency set of T.

Classify X’s role as sufficient, contributing, alternative, grade-gated, blocked, or opaque.

Query 22.2: What does course X contribute to?

Formal answer:

Find credentials C and requirement slots R such that X appears in or matches a predicate in req(C).

Return contribution proofs rather than flat strings.

Query 22.3: What changes if I add course X?

Formal answer:

Evaluate selected query family under state S.

Evaluate the same family under state S + completed_or_planned(X).

Return the delta in satisfied gates, reachable courses, blocked courses, credential progress, and conflicts.

Query 22.4: What paths remain possible?

Formal answer:

Given targets and state, solve for feasible completions under term, prerequisite, antirequisite, and credential constraints.

Return one or more witness plans if satisfiable.

Return a minimal or useful unsatisfiable core if not.

Query 22.5: What courses are academically central?

Formal answer:

Compute centrality and importance metrics over derived projections.

Report which projection and metric were used.

Do not call a course “important” without specifying the metric.

23. Unlock Status

Unlock is not a Boolean edge.

It is a status relative to a state and a target.

Definition 23.1: Course X is a sufficient unlock for target T in state S if adding X alone makes pre(T) true.

Definition 23.2: Course X is a contributing unlock if X appears in an unsatisfied prerequisite expression for T and adding X moves at least one gate closer to satisfaction.

Definition 23.3: Course X is an alternative unlock if X satisfies one child of an OR or choose gate where other alternatives exist.

Definition 23.4: Course X is grade-gated if the contribution depends on a grade threshold.

Definition 23.5: Course X is blocked if another hard constraint prevents T even after X is added.

Definition 23.6: Course X is opaque-related if X appears only in opaque text or uncertain parse output.

These statuses become thread styles in the global view.

They also become explanation labels in the quick inspection panel.

24. Credential Satisfaction and Degree CAD

Degree CAD treats academic planning as design under constraints.

The student selects components.

The solver validates assemblies.

The UI explains fit, conflict, and remaining work.

Definition 24.1: A degree design is:

D = (S, Targets, Preferences, Mode).

S is the student state.

Targets is a set of desired credentials or open exploration goals.

Preferences is a set of soft constraints.

Mode is actual, planned, or what-if.

Definition 24.2: A design is academically valid if every hard constraint induced by selected courses and target credentials evaluates true.

Definition 24.3: A design is verified if no required expression evaluates unknown.

Definition 24.4: A design is advisory if all known hard constraints pass but at least one required expression is unknown.

Definition 24.5: A design is invalid if any hard constraint evaluates false.

The UI should use these distinctions.

Valid is not always verified.

Advisory is not a failure.

It is a truthful uncertainty label.

25. Importance Metrics

Planet or island size should not encode a vague importance.

It should encode a named metric.

Let P be a chosen projection.

Let c be a course listing or credit identity.

Metric 25.1: direct unlock count.

DU(c) = |{T : c directly participates in pre(T)}|.

This matches simple out-degree only when requirements are flattened.

Metric 25.2: downstream unlock count.

Reach(c) = |{T : T is reachable downstream from c in projection P}|.

This is close to reachability metrics in CPN research.

Stavrinides and Zuev discuss course importance using CPN degree, PageRank, betweenness, and structural hierarchy:

https://doi.org/10.1007/s41109-023-00543-w

Metric 25.3: unlock diversity.

Let Subj(c) be the multiset of subject codes among downstream targets.

Define subject entropy:

H(c) = - sum_s p_s log(p_s).

Here p_s is the fraction of downstream targets in subject s.

A course that unlocks 20 CS courses has high count but lower diversity.

A course that unlocks CS, STAT, CO, AMATH, and PMATH may have higher bridge value.

Metric 25.4: credential relevance.

CR(c) = |{Credential C : c contributes to req(C)}|.

A direct contribution and a pool contribution should be tagged separately.

Metric 25.5: bottleneck score.

For target set Tg, define:

B(c) = fraction of feasible witness plans for targets in Tg that include c.

Exact computation may be expensive.

Approximation by sampling or minimal path families is acceptable if labeled.

Metric 25.6: bridge score.

Use betweenness centrality on a suitable projection, with subject-aware weighting.

The projection must be declared.

Metric 25.7: optionality score.

A course has low optionality if it appears in many allOf structures or in choose gates with few alternatives.

It has high optionality if it belongs mostly to large pools.

Metric 25.8: conflict risk.

Conflict(c) = number or weighted severity of antirequisite, duplicate-credit, not-open, or unit-cap conflicts induced by c.

Conflict risk should not imply that a course is bad.

It means the course has more planning interactions.

26. Monotonicity and Its Exceptions

Many users intuitively expect that taking more courses can only help.

That is false.

Proposition 26.1: In a model with only positive completion predicates and allOf/anyOf/choose-at-least gates, adding a completed course is monotone.

Proof sketch:

Positive completion predicates can only move from false to true.

allOf, anyOf, and choose-at-least are monotone Boolean operators.

Thus adding a course cannot make a previously true expression false.

Proposition 26.2: Antirequisites break monotonicity.

Proof sketch:

Let course T require not_completed_or_concurrent(A).

In state S without A, the predicate is true.

In state S plus completed A, the predicate is false.

Thus adding A can invalidate T.

Proposition 26.3: Upper-bound unit constraints break monotonicity.

Proof sketch:

If a rule says no more than 1.0 units from a pool, adding another course from that pool can violate the upper bound.

Proposition 26.4: Average constraints are non-monotone with respect to adding courses when the added grade can lower the average.

Proof sketch:

An average above 70 can fall below 70 after adding a low grade in the relevant scope.

Therefore degree-CAD explanations must show both positive and negative deltas.

The UI should not say “more unlocked” without also checking conflicts and averages.

27. Finite Decidability

The finite model is decidable because the active universe is finite.

The number of course listings in a calendar snapshot is finite.

The number of credentials in a snapshot is finite.

The number of terms in a planning horizon is finite.

Grades are represented by a finite or bounded numeric domain.

Therefore the induced CSP has finite domains after horizon selection.

Theorem 27.1: Given a finite calendar snapshot, finite planning horizon, and finite grade domain, academic feasibility is decidable.

Proof sketch:

The model induces a finite set of variables.

Each variable has a finite domain.

Each constraint is computable over a finite product of domains.

Exhaustive enumeration decides satisfiability.

This is not efficient in general.

It is enough for decidability.

Efficient solving requires SAT, SMT, CP, Datalog, heuristics, caching, or decompositions.

This mirrors standard CSP complexity observations.

The Berkeley CS188 material notes that CSP structure enables specialized algorithms beyond black-box search:

https://inst.eecs.berkeley.edu/~cs188/textbook/csp/csps.html

28. Versioning and Provenance

Every academic fact must carry provenance.

Definition 28.1: Provenance record:

Prov = (source_url, source_record_id, source_field, calendar_version, captured_at, parser_version, raw_text, confidence).

Definition 28.2: A derived fact has derivation provenance.

It includes source facts, transformation name, transformation version, and timestamp.

Definition 28.3: A confidence label is one of:

official_structured.

official_free_text_parsed.

official_free_text_opaque.

inferred_cross_listing.

manual_reviewed.

third_party_unverified.

user_entered.

This project should prefer official structured sources.

If third-party data such as UWFlow is ever used, it must be labeled as such and not silently merged with official source truth.

Definition 28.4: Calendar version compatibility.

A student plan must be evaluated under a selected calendar policy.

Some students follow the calendar of entry.

Some rules may use current offerings.

This document does not resolve Waterloo policy.

It requires the architecture to make the policy explicit.

29. Cross-Listed Courses

Waterloo states that cross-listed courses share a Quest identification number and material content, while some fields may differ:

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/curriculum-management/all-about-cross-listed-courses

The model response is:

Represent each visible offering as a CourseListing.

Represent the shared credit-bearing object as a CreditIdentity.

Attach listings to identity by sourced relation.

Let requirements choose whether they reference listings or identities.

If a plan explicitly lists all cross-listed offerings, preserve that list.

If a plan intentionally excludes a cross-listed offering, preserve that exclusion.

Waterloo’s plan requirement guide notes that all cross-listed courses must be listed unless there is intent not to automatically count the other course toward the requirement or average calculation:

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/kuali-cm/building-rules/how-build-plan-requirement-lists

Therefore the model must not automatically infer that every cross-listing counts for every requirement.

It may infer shared identity for conflict and de-duplication.

It must use the requirement’s actual listed objects for counting unless a validated policy says otherwise.

This is subtle and important.

Identity equivalence and requirement satisfaction are different relations.

30. Predicate Pools

Many academic rules refer to sets defined by predicates.

Examples:

Any 300-level or 400-level AMATH course.

At least 1.0 units from selected subject codes.

Complete additional courses at the 400-level or above.

Waterloo’s plan requirement guide documents free-form phrases of this kind:

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/kuali-cm/building-rules/how-build-plan-requirement-lists

Definition 30.1: A pool predicate is a Boolean predicate over course listings or credit identities.

Examples:

subject_in({AMATH, STAT}).

level_between(300, 499).

unit_weight_equals(0.5).

active_in_calendar(c).

owned_by_faculty(Math).

Definition 30.2: A pool is evaluated relative to a calendar version.

The same predicate can match different course sets in different calendars.

Definition 30.3: A pool contribution proof stores both the matched course and the predicate that matched it.

This is essential for explanation.

The UI should be able to say:

“This course counts because it is a 300-level STAT course.”

Not merely:

“This course counts.”

31. Graph Projections

The canonical model can produce multiple projections.

Projection 31.1: Course prerequisite network.

Nodes are courses.

Edges are simplified prerequisite participation.

This projection is useful for broad centrality and curriculum overview.

It is lossy.

Projection 31.2: Gate graph.

Nodes are courses, predicates, gates, and targets.

Edges preserve logical structure.

This projection is faithful for requirement explanation.

Projection 31.3: Factor graph.

Nodes are variables and factors.

Edges connect variables to factors by scope.

This projection is solver-oriented.

Projection 31.4: Credential contribution graph.

Nodes are courses, requirement slots, and credentials.

Edges show contribution proofs.

This projection supports degree-CAD.

Projection 31.5: Conflict graph.

Nodes are courses or identities.

Edges are antirequisites, duplicate-credit exclusions, not-open constraints, or unit-cap interactions.

This projection supports risk visualization.

Projection 31.6: Student-state activation graph.

Nodes and threads are colored by actual, planned, possible, blocked, or unknown status.

This projection powers the interactive atlas.

Every projection must publish its loss policy.

If a projection flattens gates, it must label itself as simplified.

32. Spatial Layout as Projection

The planned UI uses planets, islands, gates, threads, and timelines.

Mathematically, this is a layout function.

Definition 32.1: A spatial layout is:

layout : N -> R^3.

It maps visual nodes to coordinates.

Definition 32.2: A thread rendering is:

thread : E -> VisualCurve.

It maps typed edges or hyperedges to curves, bands, or grouped strands.

Definition 32.3: A semantic zoom policy maps zoom level to projection detail.

At far zoom, subjects and credentials may aggregate.

At middle zoom, course islands and major gates appear.

At close zoom, labels, badges, grade thresholds, and exact gate text appear.

The layout is not truth.

Truth is in the constraint model.

A beautiful layout with false implications is a failure.

A cluttered layout with faithful semantics may still be improved.

But truth must precede geometry.

33. Planet and Island Sizing

A course planet may be sized by different metrics.

The UI must expose the metric.

Default metric recommendation:

Use a weighted combination of downstream unlock count, unlock diversity, and credential relevance.

But allow toggling:

Size by unlock impact.

Size by credential relevance.

Size by bridge score.

Size by bottleneck score.

Size by conflict risk.

Size by user-specific impact.

Definition 33.1: User-specific impact of course c under state S and target set T is:

Impact(c, S, T) = Delta(queries, S, S + c).

The visible planet size can use a scalar summary of this delta.

The inspection window should show the vector components.

This prevents a single number from hiding tradeoffs.

34. Threads

Thread types:

Prerequisite thread.

Unlock thread.

Credential contribution thread.

Choice-group thread.

Conflict thread.

Cross-listing thread.

Equivalent-credit thread.

User-selected path thread.

Planned-course thread.

Completed-course activation thread.

Unknown or opaque thread.

Each thread has:

source.

target.

edge_type.

truth_status.

role.

provenance.

explanation.

Threads may bundle.

Bundling is visual aggregation.

Bundling must not erase gate semantics.

Clicking a bundle should reveal the underlying constraints.

35. Gates as Visual Notation

Visual gates are the key to avoiding false graph semantics.

AND gate:

All incoming child expressions are required.

OR gate:

At least one incoming child expression is required.

K-of-N gate:

At least K incoming child expressions are required.

Range gate:

Between K and M incoming child expressions are required.

Grade gate:

A child course or identity requires a numeric grade threshold.

Unit gate:

A pool of courses must contribute enough units.

Average gate:

A numeric average must meet a threshold.

Progress gate:

Academic progress must equal or exceed a level.

Enrollment gate:

Declared or enrolled plan must match.

Opaque gate:

Source text is preserved but not fully formalized.

Conflict gate:

A condition blocks or duplicates credit.

Gates should be inspectable.

They should never be hidden when their absence would change meaning.

36. Explanation Proofs

Every important UI answer should be backed by a proof object.

Definition 36.1: A proof object is a tree or DAG:

Proof = (claim, status, children, provenance, mode).

Claim examples:

Course X unlocks course T.

Course X contributes to credential C.

Credential C is satisfied.

Course Y is blocked.

Status examples:

verified_true.

verified_false.

unknown.

hypothetical_true.

partial.

Proof children correspond to subrequirements.

For allOf, all children are required.

For anyOf, at least one child is enough.

For choose, the proof records selected witnesses and remaining alternatives.

For numeric gates, the proof records the computed sum or average.

For opaque gates, the proof records source text and review status.

Explanation is not a UI afterthought.

It is the mechanism by which the app earns trust.

37. Soundness of Query Deltas

The “what changed if I add this course?” query is central.

Definition 37.1: Let Q be a finite family of query functions.

Let S be a state.

Let add(c, S) be the state after adding course c as planned or completed depending on mode.

Define:

Delta_Q(c, S) = { (q, q(S), q(add(c, S))) : q in Q and q(S) != q(add(c, S)) }.

Theorem 37.2: If every query in Q is a pure function of the canonical model and state, then Delta_Q is reproducible.

Proof sketch:

Pure functions return the same output for the same input.

The canonical model and state are fixed.

The add operation is deterministic in a chosen mode.

Therefore each before and after query result is deterministic.

Thus the delta is reproducible.

This proof depends on versioning.

If the calendar source changes without changing model version, reproducibility fails.

Therefore source snapshot and parser version are part of the input.

38. Incomplete Information

Students may not know exact grades.

Sources may omit exact offerings.

Calendar rules may contain free text.

The model must support incomplete information.

Definition 38.1: A partial state omits some facts or uses unknown values.

Definition 38.2: A possible completion of a partial state assigns values to unknowns within their domains.

Definition 38.3: A claim is necessarily true if it holds for every possible completion.

Definition 38.4: A claim is possibly true if it holds for at least one possible completion.

Definition 38.5: A claim is necessarily false if it fails for every possible completion.

This gives a modal interpretation of uncertainty.

The UI can say:

“Definitely satisfied.”

“Possibly satisfied depending on grade.”

“Not satisfied under any known completion.”

This is more useful than a single unknown label.

SMT and CP encodings can support possible/necessary questions over bounded domains.

39. Calendar Scraping Implications

Although this document is not a scraper architecture, the math imposes scraper requirements.

The scraper must preserve source records.

It must preserve rule hierarchy.

It must preserve labels such as Complete ALL and Complete 1.

It must preserve groups and subgroups.

It must preserve field identity:

Prerequisite.

Corequisite.

Antirequisite.

Course Requirements.

Course Lists.

Additional Requirements.

It must preserve source order as presentation metadata.

It must preserve cross-listed courses.

It must preserve free text.

It must preserve effective dates and calendar versions.

It must not flatten everything to text.

It must not flatten everything to course edges.

The canonical model is only as good as the captured structure.

40. UWFlow and Third-Party Indexes

Third-party indexes may be useful.

They may provide prior art, comparison data, or coverage checks.

They should not silently become canonical truth.

If a third-party index says course A unlocks course B, the system must store:

source = third_party.

license status.

capture time.

comparison to official source.

confidence.

The model should support differential analysis:

Official source says X.

Third-party source says Y.

Parser inferred Z.

These disagreements are valuable.

They are not reasons to merge facts without provenance.

41. Feature Models and Credentials

Feature models provide a useful analogy.

A software product line has features.

A valid product is a legal feature selection.

Credentials similarly define legal selections of courses, units, milestones, and averages.

Feature-model analysis studies mandatory, optional, alternative, or, requires, excludes, and cardinality constraints.

See Benavides et al.:

https://doi.org/10.1016/j.is.2010.01.001

And UVL’s discussion of cardinality groups:

https://doi.org/10.1016/j.jss.2024.112326

The analogy is strong for:

Choice groups.

Requires constraints.

Excludes constraints.

Cardinality constraints.

Automated consistency analysis.

The analogy is weaker for:

Temporal sequencing.

Grades.

Academic progress.

Calendar-version policy.

Transfer credits.

Student-specific transcript history.

Therefore feature models inform the credential layer but do not replace the academic CSP.

42. Course-Prerequisite Networks as Baseline Projection

Course-prerequisite networks are a valid and useful projection.

Stavrinides and Zuev define CPNs with courses as nodes and prerequisite relationships as directed links.

They study centrality, topological stratification, and interdependence:

https://doi.org/10.1007/s41109-023-00543-w

This project generalizes CPNs in three ways.

First, prerequisites are expressions, not just links.

Second, credentials are constraints, not just labels.

Third, student state activates or deactivates parts of the universe.

The CPN remains useful for:

High-level exploration.

Introductory vs advanced course structure.

Centrality metrics.

Subject interdependence.

Visual summaries.

But CPNs alone are insufficient for:

Grade-gated alternatives.

K-of-N requirements.

Credential degree audits.

Antirequisite conflict reasoning.

Cross-listed identity reasoning.

Opaque-text confidence.

Thus CPN is a view, not the database.

43. Formal Object Summary

The academic universe is:

U = (Obj, Rel, Expr, Prov, Ver).

Obj contains typed academic objects.

Rel contains typed relations.

Expr contains requirement expressions.

Prov contains provenance records.

Ver contains version contexts.

The student state is:

S = (Facts, Mode, CalendarPolicy).

The canonical evaluator is:

eval_U(E, S) -> Truth.

The solver compiler is:

compile_backend(U, S, Query, Backend) -> SolverInstance.

The projection compiler is:

project(U, S, ProjectionSpec) -> VisualGraph.

The explanation engine is:

explain(U, S, Claim) -> Proof.

These five functions define the architecture boundary.

If an implementation preserves them, UI experiments can vary without corrupting truth.

44. Implementation Invariants

Invariant 44.1: Every visible claim has provenance.

Invariant 44.2: Every course node resolves to a listing, not merely a string.

Invariant 44.3: Every listing may optionally resolve to a credit identity.

Invariant 44.4: Every requirement has either structured semantics or opaque semantics.

Invariant 44.5: Every graph edge has a type.

Invariant 44.6: Every simplified edge knows which gate or expression produced it.

Invariant 44.7: Every solver result records model version and source snapshot.

Invariant 44.8: Every uncertainty is visible at inspection depth.

Invariant 44.9: Academic progress and academic standing remain separate variables.

Invariant 44.10: Hard academic constraints and soft user preferences remain separate.

These invariants should become tests once implementation begins.

45. Example: Nested Prerequisite

Suppose:

Course T requires:

One of A or B.

One of C or D.

Academic progress 2A or higher.

Formal expression:

pre(T) = allOf(anyOf(completed(A), completed(B)), anyOf(completed(C), completed(D)), progress_at_least(2A)).

If a student completed A and C and is in 2A, T is unlocked.

If a student completed A only, T is partially unlocked.

If a student completed B and D but is in 1B, T is blocked by progress.

If a student completed A and D but has an antirequisite conflict with T, T is blocked by conflict.

A plain graph cannot express these distinctions.

The typed gate graph can.

The solver can.

The UI can show them as threads and gates.

46. Example: Grade Alternatives

Suppose:

Course T requires:

90% or higher in E.

Or 70% or higher in M.

Or 60% or higher in H.

Or completion of V.

Formal expression:

pre(T) = anyOf(grade_at_least(E, 90), grade_at_least(M, 70), grade_at_least(H, 60), completed(V)).

If the student has 85 in E, that branch is false.

If the student has 72 in M, the whole expression is true.

If the student has V complete, grade is irrelevant for that branch.

The edge from E to T is grade-gated.

The edge from V to T is completion-gated.

They are not semantically identical.

The UI should display that difference.

47. Example: Predicate Pool

Suppose:

Credential C requires 1.0 units from 300-level or 400-level AMATH or STAT courses.

Formal expression:

req(C) = unitsAtLeast(1.0, subject_in({AMATH, STAT}) and level_between(300, 499)).

If each course is 0.5 units, two matching courses satisfy the gate.

If a course is cross-listed AMATH/PMATH, credit identity and listed subject must be handled carefully.

If a course is STAT 231, it fails level predicate.

If a course is AMATH 350, it matches.

The proof should say why it matches.

If calendar version changes and AMATH 350 is retired, the predicate pool changes.

The source version matters.

48. Example: Antirequisite Conflict

Suppose course X is an antirequisite of Y.

If the student completed X, Y may not be open for credit.

If the student plans Y after X, the degree-CAD validator should mark conflict.

If the student completed both historically, the audit may need institutional policy.

The model records the conflict.

It does not invent remediation.

A conflict thread should be visually distinct from a missing prerequisite thread.

The former means “this combination has a negative rule.”

The latter means “more preparation is needed.”

These are different planning experiences.

49. Example: Cross-Listed Identity

Suppose L1 and L2 are cross-listed.

They share a credit identity I.

A requirement lists L1 only.

Does L2 count?

The model answer is not automatic.

If the requirement source lists both, both count directly.

If the requirement source lists only L1, the system checks policy and provenance.

Waterloo’s plan guide explicitly says cross-listed courses must be listed unless the intent is not to count the other offering automatically:

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/kuali-cm/building-rules/how-build-plan-requirement-lists

Therefore a robust model stores:

L1 and L2 as listings.

I as shared identity.

The requirement’s actual listing references.

Any policy-derived equivalence separately.

This avoids both undercounting and overcounting.

50. Data Contracts Implied by the Math

Course record contract:

id.

code.

title.

units.

subject.

number.

level.

calendar_version.

cross_listings.

prerequisite_expr.

corequisite_expr.

antirequisite_expr.

provenance.

Credential record contract:

id.

name.

credential_type.

owning_unit.

calendar_version.

requirement_expr.

requirement_slots.

provenance.

Expression contract:

expr_id.

op.

params.

children.

source_text.

source_field.

confidence.

provenance.

Student state contract:

completed.

planned.

in_progress.

grades.

declared_plans.

desired_credentials.

academic_progress.

academic_standing.

milestones.

mode.

These contracts are conceptual.

Specific serialization can be JSON, relational tables, graph storage, or typed code structures.

The math requires the distinctions, not a particular database.

51. Proof Review Checklist

Before implementation, every formal transformation should answer:

What is the input type?

What is the output type?

What information is preserved?

What information is discarded?

Is the transformation sound?

Is it complete?

What source uncertainty can produce unknown?

What calendar version is assumed?

Does cross-listing change identity, satisfaction, or both?

Does antirequisite affect enrollment, credit counting, or both?

Does the query operate in actual, planned, or what-if mode?

This checklist should be embedded into engineering review.

It is how the project avoids visually persuasive wrong answers.

52. Risks and Open Mathematical Questions

Risk 52.1: Calendar prose may exceed the expression language.

Mitigation:

Preserve opaque constraints and review them.

Risk 52.2: Cross-listing policy may differ between credit identity and requirement counting.

Mitigation:

Keep identity, listing, and requirement references separate.

Risk 52.3: Average calculations may have institutional details not captured initially.

Mitigation:

Treat averages as typed constraints with explicit scope and policy.

Risk 52.4: The UI may overstate simplified graph projections.

Mitigation:

Require every projection to expose its loss policy.

Risk 52.5: Solver results may be slow for broad degree planning.

Mitigation:

Use multiple backends and incremental query design.

Risk 52.6: Students may treat advisory output as official advice.

Mitigation:

A single calm information page and source/provenance links are enough.

Do not bury the UI in disclaimers.

Build trust through accuracy and source tracing.

Open question 52.7:

What is the best minimal unsat-core explanation for academic planning?

Open question 52.8:

How should historical calendar rights be modeled for students who entered in different years?

Open question 52.9:

How should transfer credits and substitutions become typed facts?

Open question 52.10:

How much of the global view should use exact solver state versus approximate visual centrality?

These are future architecture questions.

They do not invalidate the core model.

53. Conclusion

The right foundation for UWScrape is not a course table.

It is not a prerequisite graph.

It is not a scraper output dump.

It is a typed academic constraint system.

Courses remain the visible atoms.

Credit identities provide solver-level de-duplication and equivalence.

Requirement expressions preserve the actual logic of academic rules.

Student states activate those expressions.

Factor graphs and hypergraphs provide faithful projections.

SAT, SMT, CP, Datalog, and package-resolver techniques provide complementary solver strategies.

The global planet/island UI should be a spatial projection over this model.

Its beauty should come from the structure of the academic universe.

Its trust should come from provenance and explainable proof objects.

The central rule is simple:

If the source says it, preserve it.

If the parser understands it, formalize it.

If the solver derives it, explain it.

If the model does not know, say unknown.

That discipline is what makes a degree customizer possible.

References

University of Waterloo, “How to build course requisites.”

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/kuali-cm/building-rules/how-build-course-requisites

University of Waterloo, “List of available requisite rules.”

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/kuali-cm/building-rules/how-build-course-requisites/list-available-requisite-rules

University of Waterloo, “How to build plan requirement lists.”

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/kuali-cm/building-rules/how-build-plan-requirement-lists

University of Waterloo, “All about cross-listed courses.”

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/curriculum-management/all-about-cross-listed-courses

University of Waterloo, “Courses form field details.”

https://uwaterloo.ca/academic-calendar-curriculum-management-resources/kuali-cm/courses-form-field-details

UC Berkeley CS188, “Constraint Satisfaction Problems.”

https://inst.eecs.berkeley.edu/~cs188/textbook/csp/csps.html

Waterloo CS486/686 Notes, “Constraint Satisfaction.”

https://cs.uwaterloo.ca/~b246chen/Notes/CS486_686/index.html

Stanford CS221, “Variables-based Models.”

https://web.stanford.edu/~shervine/teaching/cs-221/cheatsheet-variables-models/

Kschischang, Frey, and Loeliger, “Factor graphs and the sum-product algorithm.”

https://doi.org/10.1109/18.910572

Gallo, Longo, Pallottino, and Nguyen, “Directed hypergraphs and applications.”

https://doi.org/10.1016/0166-218X(93)90045-P

XGI documentation, “Directed Hypergraphs.”

https://xgi.readthedocs.io/en/latest/api/tutorials/focus_7.html

SMT-LIB, “The Satisfiability Modulo Theories Library.”

https://smt-lib.org/

SMT-LIB, “Core theory.”

https://smt-lib.org/theories-Core.shtml

SMT-LIB, “Theories.”

https://smt-lib.org/theories.shtml

Souffle, “Program.”

https://souffle-lang.github.io/program

Souffle, “Facts.”

https://souffle-lang.github.io/facts

Souffle, “Relations.”

https://souffle-lang.github.io/relations

Z3 Guide, “Basic Datalog.”

https://microsoft.github.io/z3guide/docs/fixedpoints/basicdatalog/

Dart Pub, “PubGrub solver overview.”

https://dart.googlesource.com/pub.git/+/f27dcfdb/doc/solver.md

Benavides, Segura, and Ruiz-Cortes, “Automated analysis of feature models 20 years later.”

https://doi.org/10.1016/j.is.2010.01.001

UVL, “Feature modelling with the Universal Variability Language.”

https://doi.org/10.1016/j.jss.2024.112326

Stavrinides and Zuev, “Course-prerequisite networks for analyzing and understanding academic curricula.”

https://doi.org/10.1007/s41109-023-00543-w