Document Classification
Azure Document Intelligence classifies each sub-document into an InDocumentType with a confidence score. (In v1, the AI does not yet return InDocumentCategory; reviewers set the subcategory when needed.) Classification drives routing, extraction strategy, and the review-required decision.
Overview
Classification is the third stage of the bulk-intake pipeline and runs per sub-document after splitting and OCR. It answers two questions:
- What kind of document is this? InDocumentType (e.g., MedicalBill, PoliceReport, TreatmentPlan).
- What sub-category? InDocumentCategory (e.g., Inpatient vs Outpatient, Incident vs Supplemental). (v1: reviewer-assigned only; the AI classifier does not yet populate this.)
Both values persist on the InDocument row even when downstream routing fails —
so reviewers see the AI's best guess, and metrics capture confidence distribution across
the whole tenant.
Classification runs only when App.InDocumentOCR is enabled for the tenant.
When it is disabled, every document enters review with RequiresReviewReason = AiDisabled.
Decision Flow
Every document passes through the same gauntlet of three checks — feature flag, classifier success, confidence threshold. Each check has a dedicated review-reason so metrics and reviewer UX can distinguish "AI said nothing" from "AI said something unconfidently."
Each failed check sets its own RequiresReviewReason so metrics can distinguish them. Even on low confidence, the AI guess is preserved for the reviewer.
Type & Category
Both are enum-backed so admins can filter routing rules, lists, and metrics by them.
Two-level granularity gives a good balance: Type handles the gross categorization
(what workflow it belongs to), and Category narrows within that.
| Field | Type | Who Writes It | Why It Matters |
|---|---|---|---|
| InDocumentType | InDocumentType enum | AI classifier (confident cases) or reviewer. | Selects which routing rule applies; filters admin lists. |
| InDocumentCategory | InDocumentCategoryEnum | Reviewer only (v1: the AI classifier does not populate this). | Narrows to the most specific rule when multiple rules match the type. |
| ClassificationConfidence | int 0–100 | AI classifier only. | Below ClassifyConfidenceThreshold → review-required. |
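The way Type selects a rule and Category narrows it can be sketched as follows. This is a minimal illustration in Python, not the product's implementation: the RoutingRule shape, its field names, the select_rule helper, and the sample rule names are all hypothetical, and the fallback to a type-wide rule is an assumption about what "most specific" means.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class RoutingRule:
    # Hypothetical shape; field names are illustrative, not the real schema.
    name: str
    doc_type: str
    category: Optional[str] = None  # None means "any category of this type"


def select_rule(rules: Sequence[RoutingRule], doc_type: str,
                category: Optional[str]) -> Optional[RoutingRule]:
    """Pick a rule for a document: Type filters, Category narrows."""
    matches = [r for r in rules if r.doc_type == doc_type]
    # Prefer a rule that names the document's category exactly.
    if category is not None:
        for rule in matches:
            if rule.category == category:
                return rule
    # Otherwise fall back to a type-wide rule, if one exists (assumed behavior).
    for rule in matches:
        if rule.category is None:
            return rule
    return None


# Hypothetical rules for illustration only.
rules = [
    RoutingRule("bills-default", "MedicalBill"),
    RoutingRule("bills-inpatient", "MedicalBill", "Inpatient"),
    RoutingRule("police", "PoliceReport"),
]
```

With these sample rules, a MedicalBill tagged Inpatient resolves to the category-specific rule, while a MedicalBill with no category (the usual v1 case before review) resolves to the type-wide one.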
Confidence Gating
Each routing rule declares a ClassifyConfidenceThreshold. After classification:
- Confidence ≥ Threshold → the pipeline proceeds to extract + route.
- Confidence < Threshold → stage marked Skipped; document lands in review with RequiresReviewReason = LowClassifyConfidence. The classifier's guess is still shown to the reviewer as a starting point.
- Confidence = null or classifier failure → RequiresReviewReason = AiFailed. Reviewer classifies manually.
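The three checks (feature flag, classifier success, confidence threshold) can be written as one small decision function. The sketch below is in Python for brevity and is only an illustration: GateResult and gate are hypothetical names, and the review reasons are represented as plain strings rather than the real enum.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateResult:
    proceed: bool                 # True -> continue to extract + route
    review_reason: Optional[str]  # RequiresReviewReason value, if review is needed
    ai_guess: Optional[str]       # classifier's type, kept for the reviewer


def gate(ai_enabled: bool, doc_type: Optional[str],
         confidence: Optional[int], threshold: int) -> GateResult:
    """Apply the checks in order: flag, classifier success, threshold."""
    if not ai_enabled:
        return GateResult(False, "AiDisabled", None)
    if doc_type is None or confidence is None:
        # Classifier failed or returned no confidence: manual classification.
        return GateResult(False, "AiFailed", None)
    if confidence < threshold:
        # Low confidence still preserves the guess as a starting point.
        return GateResult(False, "LowClassifyConfidence", doc_type)
    return GateResult(True, None, doc_type)
```

Note that a confidence exactly equal to the threshold passes, matching the ≥ comparison in the list above.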
Reviewer Override
Because InDocumentType and InDocumentCategory are real settable
columns, reviewers can correct the classifier's output without losing the original AI guess
— the prior result is preserved in RoutingResultJson's classification history.
Admin filters on document-type columns always reflect the current value (reviewer-corrected
wins over AI if both exist).
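A minimal sketch of this override behavior, in Python and under stated assumptions: ClassifiedDoc, apply_ai_result, and apply_reviewer_override are hypothetical names, and the history list is only a stand-in for the classification history kept in RoutingResultJson, whose actual layout is not described here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ClassifiedDoc:
    doc_type: Optional[str] = None   # current value; what admin filters read
    category: Optional[str] = None
    # Stand-in for the classification history in RoutingResultJson (assumed shape).
    history: List[dict] = field(default_factory=list)


def apply_ai_result(doc: ClassifiedDoc, doc_type: str, confidence: int) -> None:
    """Record the classifier's output as the current value and in history."""
    doc.doc_type = doc_type
    doc.history.append({"source": "ai", "type": doc_type, "confidence": confidence})


def apply_reviewer_override(doc: ClassifiedDoc, doc_type: str,
                            category: Optional[str] = None) -> None:
    """Overwrite the current columns; earlier history entries stay untouched."""
    doc.doc_type = doc_type
    doc.category = category
    doc.history.append({"source": "reviewer", "type": doc_type, "category": category})
```

After an AI result followed by an override, the current columns hold the reviewer's values while the first history entry still carries the original AI guess and its confidence.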
Manual Mode
When AI is disabled or classification fails, the review form surfaces every field as an empty editable input — not a confidence chip. The set header shows a "Manual mode" badge so reviewers know nothing is pre-filled.
Confidence Metrics
The bulk-intake metrics dashboard shows a classification-confidence distribution widget — a histogram of confidence scores across the tenant. Use this to:
- Tune per-rule thresholds (raise the threshold if auto-linked-but-wrong cases spike).
- Detect model drift (if a formerly confident type starts landing below threshold).
- Benchmark AI accuracy before expanding to AutoCreateDraftWorkItem no-match behavior.
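To make the histogram concrete, here is a small Python sketch of how 0–100 confidence scores could be bucketed. The ten-point bucket width, the bucket labels, and the function name are assumptions for illustration; the dashboard's actual binning is not specified here.

```python
from typing import Dict, Iterable


def _label(lo: int) -> str:
    # The top bucket is closed at 100 so a perfect score has a home.
    hi = lo + 9 if lo < 90 else 100
    return f"{lo}-{hi}"


def confidence_histogram(scores: Iterable[int]) -> Dict[str, int]:
    """Count scores into ten buckets: 0-9, 10-19, ..., 90-100."""
    counts = {_label(lo): 0 for lo in range(0, 100, 10)}
    for score in scores:
        if not 0 <= score <= 100:
            raise ValueError(f"confidence out of range: {score}")
        lo = min(score // 10 * 10, 90)  # fold 100 into the 90-100 bucket
        counts[_label(lo)] += 1
    return counts
```

A cluster of documents sitting just under a rule's threshold shows up as a tall bar in the bucket below it, which is the kind of signal the tuning and drift checks above rely on.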
API Surface
The classifier is available to other callers (not just the orchestrator) via
IDocumentClassifierService.ClassifyAsync(byte[], string mimeType, CancellationToken).
The default implementation (AzureDiClassifier) wraps IDocumentIntelligenceService
and normalizes the 0.0–1.0 confidence to an integer 0–100.
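The normalization step can be illustrated with a few lines of Python. This is a sketch, not the AzureDiClassifier code: the function name is hypothetical, and both the clamping of out-of-range values and the round-half-up behavior are assumptions, since the text above only states that 0.0–1.0 is mapped to an integer 0–100.

```python
import math
from typing import Optional


def normalize_confidence(raw: Optional[float]) -> Optional[int]:
    """Map a 0.0-1.0 confidence to an integer 0-100; None stays None."""
    if raw is None:
        return None  # no score from the classifier: leave it for the AiFailed path
    clamped = max(0.0, min(1.0, raw))       # assumption: clamp out-of-range input
    return math.floor(clamped * 100 + 0.5)  # assumption: round half up
```

Keeping None distinct from 0 matters because a missing score and a very low score lead to different review reasons.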
Related
- Bulk Document Upload — the pipeline that invokes classification.
- Admin Guide — where per-tenant AI feature flags are managed.