ADR 0008: Manual Parser Patch Governance

Status: Accepted for architectural direction.

Date: 2026-05-11.

Context

The scraper parser will not perfectly understand every Waterloo requirement on day one.

Some source fragments will remain unparsed.

Some source fragments will be parsed with low confidence.

Some source fragments will be important enough to correct manually before parser automation catches up.

Manual patches were accepted as a first-class design feature in the scraper specification.

Manual patches can improve data quality.

Manual patches can also introduce hidden assumptions if unmanaged.

Manual patches must be auditable.

Manual patches must be deterministic.

Manual patches must preserve original source evidence.

Manual patches must not become invisible product logic.

Decision

Manual parser patches will be governed as reviewed source artifacts.

Patch files should be stored in source control.

Patch files should live under a dedicated patch directory.

The initial patch directory should be data/patches unless implementation chooses a clearer path.

Each patch must have a stable id.

Each patch must have an action.

Each patch must have a source target.

Each patch must have a reason.

Each patch must have reviewer notes.

Each patch must reference source evidence.

Each patch must be reported when applied.

Each patch must be reported when not applied.

Patch application output must be part of validation and release review.

Patches must never mutate raw snapshots.

Patches must never delete source references.

Patches must never hide the original source text.

Patches may replace parsed interpretation.

Patches may add a missing interpretation.

Patches may mark a fragment intentionally ignored.

Patches may resolve a source link.

Patches may set a credit identity group.

Patches may mark a validation finding as reviewed.

Patch misses for required patches block release.

Patch misses for optional patches warn.

Patch Required Fields

A patch must include id.

A patch must include action.

A patch must include reason.

A patch must include item_type.

A patch must include either item_pid or item_id.

A patch must include field.

A patch should include rule_id when available.

A patch should include dom_path when rule_id is unavailable.

A patch should include source_text_hash when practical.

A patch should include replacement when the action changes semantics.

A patch should include reviewer_notes.

A patch should include added_at.

A patch should include last_reviewed_at.

A patch should include required.

A patch should include expires_after_catalog when known.

A patch should include source_reference_hint.
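
The field list above could be checked with a small validator. The following is a minimal sketch, not the implementation spec: the field names come from this section, while the function name `check_patch_fields`, the grouping into two tuples, and the (errors, warnings) return shape are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any, Mapping

# Field names come from the list above. Everything else here is hypothetical.
REQUIRED_FIELDS = ("id", "action", "reason", "item_type", "field")
RECOMMENDED_FIELDS = (
    "source_text_hash",
    "reviewer_notes",
    "added_at",
    "last_reviewed_at",
    "required",
    "source_reference_hint",
)


def check_patch_fields(patch: Mapping[str, Any]) -> tuple[list[str], list[str]]:
    """Return (errors, warnings) for a single patch record."""
    errors = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if not patch.get(name)
    ]
    # The ADR asks for either item_pid or item_id.
    # Assumption: this sketch accepts one or both.
    if not (patch.get("item_pid") or patch.get("item_id")):
        errors.append("missing required field: item_pid or item_id")

    warnings = [
        f"missing recommended field: {name}"
        for name in RECOMMENDED_FIELDS
        if name not in patch
    ]
    # rule_id is preferred; dom_path is the fallback locator.
    if not (patch.get("rule_id") or patch.get("dom_path")):
        warnings.append("missing locator: rule_id or dom_path")
    return errors, warnings
```

In this sketch, errors would fail patch validation and warnings would only be reported; replacement and expires_after_catalog are left out of the warning list because they are conditional.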

Patch Actions

replace_requirement replaces one parsed or unparsed requirement interpretation.

add_requirement adds an interpretation for a source fragment.

ignore_fragment marks a fragment as intentionally non-semantic for evaluation.

link_course resolves a course reference.

link_credential resolves a credential reference.

set_course_credit_group sets listing-to-credit grouping.

set_confidence changes parser confidence with justification.

mark_finding_reviewed marks a validation finding as reviewed and accepted.

split_requirement splits a source fragment into multiple requirement expressions.

annotate_requirement adds review notes without changing semantics.

Each action must have a documented schema.

Unknown actions must block parsing.

Action schemas must be validated before any patch is applied.
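
The rule that unknown actions block parsing, and that validation happens before any patch is applied, might look like the sketch below. The action names are the ones listed above. The `NEEDS_REPLACEMENT` set, the `PatchSchemaError` class, and the function name are hypothetical, and the per-action schemas themselves are still to be defined in the implementation specs.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping

KNOWN_ACTIONS = frozenset({
    "replace_requirement",
    "add_requirement",
    "ignore_fragment",
    "link_course",
    "link_credential",
    "set_course_credit_group",
    "set_confidence",
    "mark_finding_reviewed",
    "split_requirement",
    "annotate_requirement",
})

# Assumption: these are the actions treated as "changing semantics" and so
# expected to carry a `replacement`. The ADR only says "should"; this sketch
# is stricter than that.
NEEDS_REPLACEMENT = frozenset({
    "replace_requirement",
    "add_requirement",
    "split_requirement",
})


class PatchSchemaError(ValueError):
    """Raised when the patch set fails validation; nothing should be applied."""


def validate_patch_actions(patches: Iterable[Mapping[str, Any]]) -> None:
    """Check every patch first, then fail once with all problems listed."""
    problems = []
    for patch in patches:
        patch_id = patch.get("id", "<no id>")
        action = patch.get("action")
        if action not in KNOWN_ACTIONS:
            problems.append(f"{patch_id}: unknown action {action!r}")
        elif action in NEEDS_REPLACEMENT and "replacement" not in patch:
            problems.append(f"{patch_id}: {action} requires a replacement")
    if problems:
        raise PatchSchemaError("; ".join(problems))
```

Collecting all problems before raising keeps the validation report complete while still guaranteeing that no patch is applied from an invalid set.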

Patch Matching

Patch matching should be deterministic.

Patch matching should use item type.

Patch matching should use item pid or item id.

Patch matching should use source field name.

Patch matching should use rule id when available.

Patch matching should use DOM path when rule id is unavailable.

Patch matching should use source text hash when available.

Patch matching should avoid matching only by display text.

Patch matching should detect multiple matches.

Multiple matches should block required patches.

Multiple matches should warn for optional patches.

No match should block required patches.

No match should warn for optional patches.

Patch matching should report exact matched source references.

Patch matching should detect stale patches, that is, patches whose target no longer matches the current source.
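
The matching rules above can be sketched as a pure function over parsed fragments. This is an illustration under assumptions: fragments are modelled as plain dictionaries carrying the same key names as the patch, `MatchOutcome` and both function names are hypothetical, and a patch without a `required` field is treated as optional.

```python
from __future__ import annotations

from enum import Enum
from typing import Any, Mapping, Sequence


class MatchOutcome(Enum):
    MATCHED = "matched"
    WARN = "warn"
    BLOCK = "block"


def find_matches(
    patch: Mapping[str, Any], fragments: Sequence[Mapping[str, Any]]
) -> list[Mapping[str, Any]]:
    """Return every fragment whose identifying keys equal the patch's keys."""
    keys = [
        k
        for k in ("item_type", "item_pid", "item_id", "field", "source_text_hash")
        if patch.get(k) is not None
    ]
    # rule_id is preferred; dom_path is used only when rule_id is unavailable.
    if patch.get("rule_id") is not None:
        keys.append("rule_id")
    elif patch.get("dom_path") is not None:
        keys.append("dom_path")
    # Display text is never part of the match key.
    return [f for f in fragments if all(f.get(k) == patch[k] for k in keys)]


def classify(
    patch: Mapping[str, Any], matches: Sequence[Mapping[str, Any]]
) -> MatchOutcome:
    """Exactly one match applies; zero or several block or warn."""
    if len(matches) == 1:
        return MatchOutcome.MATCHED
    # Assumption: a patch with no `required` field is treated as optional.
    return MatchOutcome.BLOCK if patch.get("required") else MatchOutcome.WARN
```

Because the function compares only fields the patch actually sets, an under-specified patch tends to match several fragments and is then blocked or warned about rather than applied silently. A changed source_text_hash yields zero matches, which is how a stale patch would surface in this sketch.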

Review Policy

Every semantic patch should be reviewed before publication.

Review should confirm source evidence.

Review should confirm target specificity.

Review should confirm replacement semantics.

Review should confirm action schema.

Review should confirm no unrelated source fragments are affected.

Review should confirm whether the patch should be required.

Review should confirm whether the patch is catalog-specific.

Review should confirm whether the parser should eventually learn the pattern.

Review notes should explain why automation was insufficient.

Review notes should be concise but specific enough for a later reviewer to re-check the patch against its source evidence.

A patch may be added quickly during early exploration.

A patch must still appear in release review before publication.

Patch Lifecycle

A patch starts as proposed.

A patch becomes active when accepted into the patch file.

An active patch is applied during parse.

An active patch is reported during parse.

An active patch is reviewed during validation.

An active patch is considered during release gate review.

A patch becomes stale when its target no longer matches.

A stale required patch blocks release.

A stale optional patch warns.

A patch may be retired when the parser supports the pattern.

Retired patches should remain discoverable in git history.

The current patch file should not keep retired patches unless needed for historical builds.

Historical catalog patches may be separated by catalog slug.
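
The lifecycle above can be read as a small state machine. The sketch below encodes only the transitions this section names; the ADR does not say what follows a stale patch, so none is modelled. The enum, the transition table, and the function names are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class PatchState(Enum):
    PROPOSED = "proposed"
    ACTIVE = "active"
    STALE = "stale"
    RETIRED = "retired"


# Only transitions stated in this section. What happens after STALE is left
# open by the ADR, so this sketch allows nothing from it.
ALLOWED = {
    PatchState.PROPOSED: {PatchState.ACTIVE},
    PatchState.ACTIVE: {PatchState.STALE, PatchState.RETIRED},
    PatchState.STALE: set(),
    PatchState.RETIRED: set(),
}


def advance(current: PatchState, target: PatchState) -> PatchState:
    """Move a patch to a new state, rejecting transitions not listed above."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move patch from {current.value} to {target.value}")
    return target


def release_effect(state: PatchState, required: bool) -> str:
    """Return 'block', 'warn', or 'ok' for the release gate."""
    if state is PatchState.STALE:
        return "block" if required else "warn"
    return "ok"
```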

Patch File Organization

Patch files may be organized by catalog slug.

Patch files may be organized by item type.

Patch files may be organized by faculty once volume grows.

The first implementation may use one patch file per catalog.

Example path: data/patches/2026-2027-undergraduate.yaml.

If patch volume grows, split paths should remain deterministic.

Patch file paths should be recorded in parse manifests.

Patch file hashes should be recorded in parse manifests.

Patch file hashes should be recorded in release decisions.
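
Recording patch file paths and hashes in a parse manifest could be as simple as the sketch below. SHA-256, the `path` and `sha256` key names, and the function name are assumptions; the ADR does not fix a hash algorithm or a manifest layout.

```python
from __future__ import annotations

import hashlib
from pathlib import Path
from typing import Iterable


def patch_manifest_entries(paths: Iterable[Path]) -> list[dict[str, str]]:
    """Return one manifest entry per patch file, in a deterministic order."""
    entries = []
    # Sorting by path keeps the manifest stable regardless of discovery order.
    for path in sorted(paths, key=lambda p: p.as_posix()):
        digest = hashlib.sha256(path.read_bytes()).hexdigest()
        entries.append({"path": path.as_posix(), "sha256": digest})
    return entries
```

Hashing the raw bytes means any edit to a patch file, including whitespace, changes the recorded hash, so a release decision can be tied to the exact patch content that was applied.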

Source Preservation

Patches must preserve raw source evidence.

Patches must preserve source references.

Patches must preserve original display text.

Patches must preserve original HTML when applicable.

Patches may add normalized interpretations.

Patches may add explanation text.

Patches may mark parser output as superseded.

Superseded parser output should remain traceable.

The index should make patched requirements identifiable.

The UI may show that a requirement interpretation was manually reviewed.
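
One way to honour these preservation rules is to apply a patch to a copy of the parsed record and keep the parser's output alongside the replacement. The sketch below assumes a hypothetical record shape (`source_text`, `source_ref`, `interpretation`) and hypothetical output keys (`superseded_interpretation`, `patched_by`); none of these names come from the scraper specification.

```python
from __future__ import annotations

import copy
from typing import Any, Mapping


def apply_replacement(
    record: Mapping[str, Any], patch: Mapping[str, Any]
) -> dict[str, Any]:
    """Return a patched copy of `record`; the input is never mutated."""
    patched = copy.deepcopy(dict(record))
    # Keep the parser's own output so the superseded result stays traceable.
    patched["superseded_interpretation"] = patched.get("interpretation")
    patched["interpretation"] = copy.deepcopy(patch["replacement"])
    # Record which patch produced the interpretation so the index and UI can
    # identify manually reviewed requirements.
    patched["patched_by"] = patch["id"]
    return patched
```

Source text and source references are copied through untouched, and only the interpretation changes.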

Alternatives Considered

Alternative 1: No Manual Patches

The project could require parser automation for every correction.

This keeps data flow simple.

However, it slows progress on rare but important source patterns.

It also forces parser code to absorb one-off exceptions too early.

This alternative is rejected.

Alternative 2: Ad Hoc Code Exceptions

The parser could contain special cases in code for individual records.

This is fast for one-off fixes.

However, it hides data decisions in implementation logic.

It makes review harder.

It makes stale exceptions harder to detect.

This alternative is rejected.

Alternative 3: Unreviewed Patch Files

The parser could allow patch files without governance.

This is flexible.

However, it risks turning patches into untracked authority.

It also weakens user trust.

This alternative is rejected.

Consequences

Manual patches become auditable.

Parser gaps can be fixed without overfitting parser code.

Release gates can reason about patch misses.

Patch files become part of the trusted data build.

The team must maintain patch schema validation.

The team must review patch drift across catalog versions.

The project can improve parser automation based on repeated patch patterns.

Follow-Up

Define the exact patch schema in implementation specs.

Define patch application output JSONL schema.

Define release gate handling for required and optional patch misses.

Add parser reports that list patches by action and target.