Overview

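Classification is the third stage of the bulk-intake pipeline and runs per sub-document after splitting and OCR. It answers two questions: what type of document this is (InDocumentType), and how confident the classifier is in that answer (ClassificationConfidence).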

Both values persist on the InDocument row even when downstream routing fails, so reviewers see the AI's best guess and metrics capture the confidence distribution across the whole tenant.

Gated by feature flag. Classification runs only when App.InDocumentOCR is enabled for the tenant. When disabled, every doc enters review with RequiresReviewReason = AiDisabled.

Decision Flow

Every document passes through the same gauntlet of three checks — feature flag, classifier success, confidence threshold. Each check has a dedicated review-reason so metrics and reviewer UX can distinguish "AI said nothing" from "AI said something unconfidently."

Input: document bytes and mimeType for each sub-document, after Split & OCR.
1. Is the App.InDocumentOCR feature enabled? No → Review Required with RequiresReviewReason = AiDisabled (manual-mode form, empty fields). Yes → call AzureDiClassifier.ClassifyAsync; Azure DI returns type, category, and a 0–1 confidence.
2. Did classification succeed? No → Review Required with RequiresReviewReason = AiFailed. Yes → continue.
3. Is confidence ≥ ClassifyConfidenceThreshold? No → Review Required with RequiresReviewReason = LowClassifyConfidence; the AI guess is kept as a reviewer hint. Yes → persist and proceed: InDocumentType, Category, and Confidence are written to the row, then the Extract → Route stages run.
Three gates: feature flag → classifier success → confidence threshold. Each failure path writes a distinct RequiresReviewReason so metrics can tell them apart. Even when confidence is low, the AI guess is preserved for the reviewer.
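The three gates can be sketched as a single function. This is a minimal sketch, not the orchestrator's code: the Python names (run_classification_gates, GateOutcome, ClassificationResult, the enum members) are hypothetical stand-ins for the types named above, and the async ClassifyAsync call is simplified to a synchronous callable that returns None or raises on failure.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class RequiresReviewReason(Enum):
    # Reason values come from the flow above; this enum is a hypothetical stand-in.
    AI_DISABLED = "AiDisabled"
    AI_FAILED = "AiFailed"
    LOW_CLASSIFY_CONFIDENCE = "LowClassifyConfidence"


@dataclass
class ClassificationResult:
    doc_type: str    # hypothetical: plain string standing in for InDocumentType
    confidence: int  # already normalized to 0-100


@dataclass
class GateOutcome:
    proceed: bool                           # True -> continue to Extract -> Route
    reason: Optional[RequiresReviewReason]  # set only when review is required
    guess: Optional[ClassificationResult]   # AI guess kept as a reviewer hint


def run_classification_gates(
    ocr_enabled: bool,
    classify: Callable[[], Optional[ClassificationResult]],
    threshold: int,
) -> GateOutcome:
    # Gate 1: feature flag (App.InDocumentOCR). The classifier is never called.
    if not ocr_enabled:
        return GateOutcome(False, RequiresReviewReason.AI_DISABLED, None)
    # Gate 2: classifier success. A None result or any exception counts as failure.
    try:
        result = classify()
    except Exception:
        result = None
    if result is None:
        return GateOutcome(False, RequiresReviewReason.AI_FAILED, None)
    # Gate 3: confidence threshold (inclusive). The guess is kept either way.
    if result.confidence < threshold:
        return GateOutcome(False, RequiresReviewReason.LOW_CLASSIFY_CONFIDENCE, result)
    return GateOutcome(True, None, result)
```

Passing the classifier as a callable means it is never invoked when the flag is off, which mirrors the flow: AiDisabled documents never reach Azure DI.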

Type & Category

Both are enum-backed, so admins can filter routing rules, lists, and metrics by them. The two levels split the work: Type handles the coarse categorization (which workflow the document belongs to), and Category narrows within that type.

Field | Type | Who Writes It | Why It Matters
InDocumentType | InDocumentType enum | AI classifier (confident cases) or reviewer | Selects which routing rule applies; filters admin lists.
InDocumentCategory | InDocumentCategoryEnum | Reviewer only (v1: the AI classifier does not populate this) | Narrows to the most specific rule when multiple rules match the type.
ClassificationConfidence | int, 0–100 | AI classifier only | Below ClassifyConfidenceThreshold → review required.
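One way to read the Category column: among the rules that match the type, prefer the one pinned to the document's category, otherwise fall back to a type-wide rule. The sketch assumes that a rule with no category applies to every category of its type and that the first match wins; RoutingRule and select_rule are hypothetical names, and plain strings stand in for the InDocumentType and InDocumentCategoryEnum values.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RoutingRule:
    name: str
    doc_type: str            # stands in for InDocumentType
    category: Optional[str]  # None = assumed to apply to every category of the type


def select_rule(
    rules: Sequence[RoutingRule], doc_type: str, category: Optional[str]
) -> Optional[RoutingRule]:
    # Only rules whose type matches are candidates.
    candidates = [r for r in rules if r.doc_type == doc_type]
    if category is not None:
        # Prefer a rule pinned to the exact category (the most specific match).
        for rule in candidates:
            if rule.category == category:
                return rule
    # Otherwise fall back to a type-wide rule with no category.
    for rule in candidates:
        if rule.category is None:
            return rule
    return None
```

Since the classifier leaves Category empty in v1, an AI-only document would fall through to a type-wide rule until a reviewer sets a category.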

Confidence Gating

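Each routing rule declares a ClassifyConfidenceThreshold. After classification, a ClassificationConfidence at or above the threshold lets the document proceed to the Extract and Route stages; a score below it sends the document to review with RequiresReviewReason = LowClassifyConfidence, and the AI's guess is kept as a reviewer hint.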
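A minimal sketch of the per-rule check, assuming an inclusive boundary (a confidence exactly equal to the threshold passes, matching the ≥ in the decision flow) and that the applicable rule has already been chosen. RuleThreshold and needs_review_for_confidence are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleThreshold:
    rule_name: str
    classify_confidence_threshold: int  # same 0-100 scale as ClassificationConfidence


def needs_review_for_confidence(confidence: int, rule: RuleThreshold) -> bool:
    """True when the document should go to review as LowClassifyConfidence."""
    if not 0 <= confidence <= 100:
        raise ValueError(f"confidence must be within 0-100, got {confidence}")
    # Inclusive boundary: a score equal to the threshold passes the gate.
    return confidence < rule.classify_confidence_threshold


if __name__ == "__main__":
    strict = RuleThreshold("contracts", 90)  # hypothetical rules
    lenient = RuleThreshold("receipts", 60)
    for rule in (strict, lenient):
        print(rule.rule_name, needs_review_for_confidence(75, rule))
```

Because each rule carries its own threshold, the same score can pass a lenient rule and fail a strict one.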

Reviewer Override

Because InDocumentType and InDocumentCategory are real, settable columns, reviewers can correct the classifier's output without losing the original AI guess: the prior result is preserved in RoutingResultJson's classification history. Admin filters on document-type columns always reflect the current value (a reviewer correction takes precedence over the AI value when both exist).
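A sketch of an override that archives the prior values before overwriting the columns. It assumes the row can be treated as a dictionary and that the history lives under a classificationHistory key in RoutingResultJson; both the key and the entry shape are hypothetical, since the JSON schema is not documented here. ClassificationConfidence is left untouched because only the AI classifier writes it.

```python
import json
from typing import Any, Dict, Optional


def apply_reviewer_override(
    row: Dict[str, Any], new_type: str, new_category: Optional[str], reviewer: str
) -> None:
    """Archive the current classification, then overwrite the settable columns."""
    routing = json.loads(row.get("RoutingResultJson") or "{}")
    history = routing.setdefault("classificationHistory", [])  # hypothetical key
    history.append({
        "type": row.get("InDocumentType"),
        "category": row.get("InDocumentCategory"),
        "confidence": row.get("ClassificationConfidence"),
        "replacedBy": reviewer,  # hypothetical audit field
    })
    # Reviewer values now drive admin filters and routing.
    row["InDocumentType"] = new_type
    row["InDocumentCategory"] = new_category
    # ClassificationConfidence stays as the AI wrote it.
    row["RoutingResultJson"] = json.dumps(routing)
```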

Manual Mode

When AI is disabled or classification fails, the review form surfaces every field as an empty editable input — not a confidence chip. The set header shows a "Manual mode" badge so reviewers know nothing is pre-filled.
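The manual-mode rule can be sketched as a form builder that drops every pre-fill when no AI values exist. FormField, build_review_form, and the field names are hypothetical; the sketch also assumes that, outside manual mode, a field the AI did not return renders empty with no chip.

```python
from dataclasses import dataclass
from typing import List, Mapping, Optional, Sequence, Tuple


@dataclass
class FormField:
    name: str
    value: Optional[str]       # None renders as an empty editable input
    confidence: Optional[int]  # None means no confidence chip


def build_review_form(
    field_names: Sequence[str],
    ai_values: Optional[Mapping[str, Tuple[str, int]]],
) -> Tuple[bool, List[FormField]]:
    """Return (manual_mode, fields); ai_values is None when AI is disabled or failed."""
    manual_mode = ai_values is None
    fields: List[FormField] = []
    for name in field_names:
        if manual_mode or name not in ai_values:
            # Manual mode, or a field the AI did not return: empty input, no chip.
            fields.append(FormField(name, None, None))
        else:
            value, confidence = ai_values[name]
            fields.append(FormField(name, value, confidence))
    return manual_mode, fields
```

The returned manual_mode flag is what would drive the "Manual mode" badge in the set header.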

Confidence Metrics

The bulk-intake metrics dashboard shows a classification-confidence distribution widget — a histogram of confidence scores across the tenant. Use this to:

API Surface

The classifier is available to callers other than the orchestrator via IDocumentClassifierService.ClassifyAsync(byte[], string mimeType, CancellationToken). The default implementation, AzureDiClassifier, wraps IDocumentIntelligenceService and normalizes its 0.0–1.0 confidence to an integer from 0 to 100.
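A Python sketch of the wrapper's normalization step. The document states only that 0.0–1.0 becomes an integer 0–100; the round-half-up and clamping policy here is an assumption, and AzureDiClassifierSketch, normalize_confidence, and the injected analyze callable (standing in for IDocumentIntelligenceService) are hypothetical. Python has no CancellationToken; cancelling the awaiting asyncio task plays that role.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable, Tuple

# Hypothetical wrapped-service shape: (content, mime_type) -> (type, raw 0.0-1.0 score).
RawAnalyzer = Callable[[bytes, str], Awaitable[Tuple[str, float]]]


@dataclass(frozen=True)
class Classification:
    doc_type: str    # stands in for InDocumentType
    confidence: int  # normalized to 0-100


def normalize_confidence(raw: float) -> int:
    """Map a 0.0-1.0 score to an int in 0-100 (round half up, then clamp)."""
    scaled = int(raw * 100 + 0.5)  # int() truncates toward zero
    return max(0, min(100, scaled))


class AzureDiClassifierSketch:
    """Hypothetical Python analogue of AzureDiClassifier."""

    def __init__(self, analyze: RawAnalyzer) -> None:
        self._analyze = analyze  # stands in for IDocumentIntelligenceService

    async def classify_async(self, content: bytes, mime_type: str) -> Classification:
        # Cancelling the task awaiting this coroutine replaces the CancellationToken.
        doc_type, raw = await self._analyze(content, mime_type)
        return Classification(doc_type, normalize_confidence(raw))


if __name__ == "__main__":
    async def fake_analyze(content: bytes, mime_type: str) -> Tuple[str, float]:
        return ("Invoice", 0.914)  # hypothetical raw result

    classifier = AzureDiClassifierSketch(fake_analyze)
    print(asyncio.run(classifier.classify_async(b"%PDF-1.7", "application/pdf")))
```

Clamping guards the 0–100 contract that ClassifyConfidenceThreshold comparisons rely on, even if the upstream score ever drifts outside 0.0–1.0.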