Features — Document Classification

Overview

Classification is the third stage of the bulk-intake pipeline and runs per sub-document after splitting and OCR. It answers two questions:

What kind of document is this? — InDocumentType (e.g., MedicalBill, PoliceReport, TreatmentPlan).
What sub-category? — InDocumentCategory (e.g., Inpatient vs Outpatient, Incident vs Supplemental). (v1: reviewer-assigned only — the AI classifier does not yet populate this.)

Both values persist on the InDocument row even when downstream routing fails, so reviewers still see the classifier's best guess for the type, and metrics capture the confidence distribution across the whole tenant.

Gated by feature flag. Classification runs only when App.InDocumentOCR is enabled for the tenant. When the flag is disabled, every document enters review with RequiresReviewReason = AiDisabled.

Decision Flow

Every document passes through the same three checks, in order: feature flag, classifier success, confidence threshold. Each failure path writes a distinct RequiresReviewReason, so metrics and the reviewer UX can distinguish "AI said nothing" from "AI said something with low confidence." Even on low confidence, the AI guess is preserved for the reviewer.
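
The gate order can be sketched in a few lines of Python. This is a minimal illustration, not the orchestrator's actual code: the ClassifierResult and GateOutcome types and the run_gates function are hypothetical names invented here, while the three review reasons are the ones this page uses.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class RequiresReviewReason(Enum):
    AI_DISABLED = "AiDisabled"
    AI_FAILED = "AiFailed"
    LOW_CLASSIFY_CONFIDENCE = "LowClassifyConfidence"


@dataclass
class ClassifierResult:
    # Hypothetical shape; the page names the fields but not this type.
    document_type: str
    confidence: Optional[int]  # 0-100, or None when no score was produced


@dataclass
class GateOutcome:
    review_reason: Optional[RequiresReviewReason]  # None: proceed to extract + route
    ai_guess: Optional[str]  # kept for the reviewer whenever a guess exists


def run_gates(flag_enabled: bool,
              result: Optional[ClassifierResult],
              threshold: int) -> GateOutcome:
    # Gate 1: the tenant feature flag.
    if not flag_enabled:
        return GateOutcome(RequiresReviewReason.AI_DISABLED, None)
    # Gate 2: the classifier must have returned a result with a score.
    if result is None or result.confidence is None:
        return GateOutcome(RequiresReviewReason.AI_FAILED, None)
    # Gate 3: the score must reach the threshold; the guess survives either way.
    if result.confidence < threshold:
        return GateOutcome(RequiresReviewReason.LOW_CLASSIFY_CONFIDENCE,
                           result.document_type)
    return GateOutcome(None, result.document_type)
```

Returning the guess alongside the review reason mirrors the point above: a low-confidence document still carries its AI guess into review, whereas a disabled or failed classification carries none.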

Type & Category

Both are enum-backed so admins can filter routing rules, lists, and metrics by them. The two levels divide the work: Type handles the coarse categorization (which workflow the document belongs to), and Category narrows within that type.

Field	Type	Who Writes It	Why It Matters
`InDocumentType`	`InDocumentType` enum	AI classifier (confident cases) or reviewer.	Selects which routing rule applies; filters admin lists.
`InDocumentCategory`	`InDocumentCategoryEnum`	Reviewer only (v1 — AI classifier does not populate this).	Narrows to the most specific rule when multiple rules match the type.
`ClassificationConfidence`	`int` 0–100	AI classifier only.	Below `ClassifyConfidenceThreshold` → review-required.

Confidence Gating

Each routing rule declares a ClassifyConfidenceThreshold. After classification:

Confidence ≥ Threshold → the pipeline proceeds to extract + route.
Confidence < Threshold → stage marked Skipped; document lands in review with RequiresReviewReason = LowClassifyConfidence. The classifier's guess is still shown to the reviewer as a starting point.
Confidence = null or classifier failure → RequiresReviewReason = AiFailed. Reviewer classifies manually.
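
Because the threshold is declared per rule, the same score can pass under one rule and fail under another. The sketch below illustrates this under stated assumptions: the THRESHOLDS mapping, its values, and the idea of looking a rule up by document type are hypothetical, chosen only to make the comparison concrete.

```python
from typing import Optional

# Hypothetical per-rule thresholds keyed by document type; the values are made up.
THRESHOLDS = {"MedicalBill": 85, "PoliceReport": 70}


def review_reason(document_type: str, confidence: Optional[int]) -> Optional[str]:
    """Return the RequiresReviewReason name, or None when the pipeline may proceed."""
    if confidence is None:
        return "AiFailed"
    if confidence < THRESHOLDS[document_type]:
        return "LowClassifyConfidence"
    # A score equal to the threshold passes (Confidence >= Threshold).
    return None


# The same score of 75 goes to review under one rule and proceeds under the other.
medical = review_reason("MedicalBill", 75)
police = review_reason("PoliceReport", 75)
```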

Reviewer Override

Because InDocumentType and InDocumentCategory are real settable columns, reviewers can correct the classifier's output without losing the original AI guess — the prior result is preserved in RoutingResultJson's classification history. Admin filters on document-type columns always reflect the current value (reviewer-corrected wins over AI if both exist).
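
One way to picture the override is a setter that archives the outgoing value before writing the new one. This is a rough in-memory sketch only: the InDocument class below, its type_source field, and the list standing in for the classification history inside RoutingResultJson are all hypothetical, and the real storage format is not described on this page.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class InDocument:
    # Hypothetical in-memory stand-in for the InDocument row.
    document_type: Optional[str] = None
    type_source: Optional[str] = None  # "ai" or "reviewer"; illustrative bookkeeping
    # Stand-in for the classification history kept in RoutingResultJson.
    classification_history: List[dict] = field(default_factory=list)

    def set_type(self, new_type: str, source: str) -> None:
        # Archive the value being replaced so the original AI guess is not lost.
        if self.document_type is not None:
            self.classification_history.append(
                {"type": self.document_type, "source": self.type_source}
            )
        # The column always holds the current value, so filters see the correction.
        self.document_type = new_type
        self.type_source = source


doc = InDocument()
doc.set_type("MedicalBill", "ai")          # classifier writes its guess
doc.set_type("TreatmentPlan", "reviewer")  # reviewer corrects it
```

After the second call, the column holds the reviewer's value while the AI guess remains retrievable from the history.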

Manual Mode

When AI is disabled or classification fails, the review form surfaces every field as an empty editable input — not a confidence chip. The set header shows a "Manual mode" badge so reviewers know nothing is pre-filled.

Confidence Metrics

The bulk-intake metrics dashboard shows a classification-confidence distribution widget — a histogram of confidence scores across the tenant. Use this to:

Tune per-rule thresholds (raise the threshold if documents that were auto-linked incorrectly start to spike).
Detect model drift (a formerly confident type starts landing below its threshold).
Benchmark AI accuracy before enabling the AutoCreateDraftWorkItem no-match behavior.
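
For readers who want to reproduce the widget's numbers offline, the following sketch buckets 0–100 scores into a histogram and reports the share of documents that a candidate threshold would send to review. Both function names and the 10-point bucket width are assumptions made for illustration; the dashboard's actual bucketing is not specified here.

```python
from collections import Counter
from typing import List, Sequence


def confidence_histogram(scores: Sequence[int], bucket_width: int = 10) -> List[int]:
    """Count 0-100 scores per bucket; assumes bucket_width divides 100 evenly."""
    bucket_count = 100 // bucket_width
    # A score of exactly 100 is folded into the top bucket.
    counts = Counter(min(score // bucket_width, bucket_count - 1) for score in scores)
    return [counts.get(index, 0) for index in range(bucket_count)]


def share_below(scores: Sequence[int], threshold: int) -> float:
    """Fraction of documents that would land in review at this threshold."""
    if not scores:
        return 0.0
    return sum(1 for score in scores if score < threshold) / len(scores)


sample = [5, 15, 17, 92, 100]  # made-up scores
histogram = confidence_histogram(sample)
```

Comparing share_below at the current threshold and at a proposed one gives a quick estimate of how much extra review load a threshold change would create.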

API Surface

The classifier is available to other callers (not just the orchestrator) via IDocumentClassifierService.ClassifyAsync(byte[], string mimeType, CancellationToken). The default implementation, AzureDiClassifier, wraps IDocumentIntelligenceService and normalizes its 0.0–1.0 confidence to an integer 0–100.
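
The normalization step and the shape of the service can be sketched in Python. Everything here is illustrative: the DocumentClassifier protocol, the Classification type, and FakeClassifier are hypothetical analogues of the service described above, and rounding to the nearest integer (with clamping) is an assumption, since the page does not say whether the real implementation rounds or truncates.

```python
import asyncio
from dataclasses import dataclass
from typing import Protocol


def normalize_confidence(raw: float) -> int:
    """Map a 0.0-1.0 score to an integer 0-100 (assumed: clamp, then round)."""
    clamped = min(max(raw, 0.0), 1.0)
    return round(clamped * 100)


@dataclass
class Classification:
    document_type: str
    confidence: int  # already normalized to 0-100


class DocumentClassifier(Protocol):
    # Hypothetical Python analogue of the classifier service interface.
    async def classify(self, content: bytes, mime_type: str) -> Classification: ...


class FakeClassifier:
    """Test double that returns a fixed type and raw score."""

    def __init__(self, document_type: str, raw_confidence: float) -> None:
        self._document_type = document_type
        self._raw_confidence = raw_confidence

    async def classify(self, content: bytes, mime_type: str) -> Classification:
        return Classification(self._document_type,
                              normalize_confidence(self._raw_confidence))


async def classify_with(classifier: DocumentClassifier, content: bytes) -> Classification:
    # Any caller, not just the orchestrator, can depend on the protocol alone.
    return await classifier.classify(content, "application/pdf")


result = asyncio.run(classify_with(FakeClassifier("PoliceReport", 0.42), b"%PDF-"))
```

Keeping the normalization in one small function means every caller sees the same 0–100 scale that the thresholds are expressed in.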

Related

Bulk Document Upload: the pipeline that invokes classification.
Admin Guide: where per-tenant AI feature flags are managed.
