39 Document Types, Zero Code
Every document-intensive business eventually hits the same problem: adding a new document type means another engineering sprint.
You need to process a new kind of identity document, or a business registration certificate from a different state, or an internal form that no vendor has ever seen. Someone has to write the extraction logic, test it, deploy it, and maintain it. The document library is code, and code has a maintenance cost.
For small document libraries this is manageable. For enterprises processing dozens of document types across multiple business lines — or for platforms that need to support the varied document universes of many different customers — it becomes a serious constraint on how quickly you can move.
9thSense takes a different approach. Every document type, all 39 of them, is a definition managed through the platform. Adding a new type — even a custom one your engineers define — requires no code deployment. The extraction engine applies the same intelligence to a type you created five minutes ago as it does to the built-in library.
The 39 Built-In Types
The platform ships with a library that covers the documents enterprises in financial services, insurance, lending, travel, and merchant onboarding actually process.
Identity (7 types)
Aadhaar Card, PAN Card, Passport, Driving License, Voter ID, Ration Card, and Vehicle Registration Certificate. These are the identity documents at the core of KYC workflows in India. Each extracts the relevant fields with built-in format validation. Sensitive numbers — Aadhaar, PAN — are validated for format and masked in stored records per regulatory requirements.
Financial (10 types)
Bank Statement, Cheque, Demat/DP Statement, Form 16, Invoice, ITR Acknowledgement, ITR Analysis, Salary Slip, Utility Bill, and more. Bank statements extract transaction summaries, closing balances, and account details. Salary slips extract gross pay, deductions, and net pay. Each type extracts the specific fields that downstream processes need — not a dump of all visible text.
Business (8 types)
GST Certificate, CIN, FSSAI License, UDYAM Registration, Drug License, Shop and Establishment Certificate, Board Resolution, and Labour Certificate. These are the business registration documents that lending platforms, payment aggregators, and enterprise onboarding flows routinely require. The extraction fields reflect what compliance teams actually verify.
Travel (3 types)
Flight Ticket, Hotel Booking, and Visa. Travel documents are a particular challenge for general OCR systems because their formats vary significantly across airlines, hotel chains, and issuing countries. The platform handles the layout variation and extracts consistent structured data: departure and arrival details from flight tickets, check-in and checkout from hotel bookings, entry conditions and validity from visas.
Scene (6 types)
Merchant Shopfront, Merchant Interior, Merchant Neighbourhood, Shop Inventory, Site Safety Audit, and Vehicle Inspection. These are not document extraction types in the traditional sense — they analyze real-world photographs using visual intelligence rather than OCR. More on Scene Sense in a separate post.
Classification (3 types)
Deepfake Detection, Synthetic Document Check, and Face Liveness. These types run classifier models rather than field extraction — they produce a verdict and a confidence level rather than structured data fields. They are the fraud detection layer that runs alongside document verification.
Other (3 types)
Marksheet, Election Affidavit, and University Offer Letter. Education and civic documents that appear in specific onboarding and lending contexts.
Auto-Classification: Any Document, Any Order
Automatic document classification is one of the platform's most practical capabilities. Upload any document and the platform identifies its type; you do not need to tell it.
This matters most in agent-based workflows. When an intelligent agent is collecting documents from an applicant, the applicant typically does not think in terms of document type names. They upload what they have. The platform classifies each submission, routes it to the appropriate extraction pipeline, and tells the agent what arrived and what was extracted.
Classification also handles the case where applicants misidentify what they are submitting. An applicant who uploads their Form 16 when asked for a salary slip is not being deceptive — they may simply not know the difference. Auto-classification catches this and routes correctly.
Custom document types you create participate in auto-classification as well. You describe the distinguishing characteristics of your document type when you define it, and the classifier learns to identify it alongside the built-in library.
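The routing behaviour described above can be sketched in a few lines. This is a minimal illustration only: the `Classification` shape, the type identifiers, and the 0.8 confidence threshold are assumptions made for the example, not the platform's actual API or defaults.

```python
from dataclasses import dataclass


@dataclass
class Classification:
    doc_type: str      # hypothetical type identifier, e.g. "form_16"
    confidence: float  # assumed to be in the range 0.0 to 1.0


def route_submission(requested: str, result: Classification,
                     threshold: float = 0.8) -> dict:
    """Decide what to tell the agent about one uploaded document."""
    if result.confidence < threshold:
        # Too uncertain to extract; the agent can ask for a clearer upload.
        return {"action": "ask_again", "reason": "low_confidence"}
    if result.doc_type != requested:
        # Route to the pipeline for what actually arrived and report the mismatch.
        return {
            "action": "extract",
            "pipeline": result.doc_type,
            "note": f"expected {requested}, received {result.doc_type}",
        }
    return {"action": "extract", "pipeline": result.doc_type, "note": None}


# An applicant uploads a Form 16 when asked for a salary slip.
decision = route_submission("salary_slip", Classification("form_16", 0.93))
```

Here the submission is still extracted, but through the Form 16 pipeline, and the agent receives a note that something other than the requested document arrived.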
Extraction with Built-In Validation
For every document type, extraction produces typed, structured output with validation applied before the result reaches your application.
This is different from raw OCR. Raw OCR gives you text. Document extraction gives you fields: a name field with the extracted name, a date field in a normalized format, a numeric field with the extracted amount. Required fields are checked — if the extraction could not find a field that must be present, you get a structured error rather than a null that breaks your downstream logic.
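To make the idea of a structured error concrete, here is a small sketch of a required-field check. The result shape and the error code `missing_required_fields` are hypothetical, chosen for illustration rather than taken from the platform's response format.

```python
def check_required(fields: dict, required: list) -> dict:
    """Return the fields if all required ones are present, else a structured error."""
    missing = [name for name in required if fields.get(name) in (None, "")]
    if missing:
        return {
            "ok": False,
            "error": {"code": "missing_required_fields", "fields": missing},
        }
    return {"ok": True, "fields": fields}


# A salary slip where net pay could not be found.
result = check_required(
    {"employee_name": "Priya Sharma", "gross_pay": "85000.00", "net_pay": None},
    required=["employee_name", "gross_pay", "net_pay"],
)
```

The caller can branch on `ok` instead of discovering a null deep inside downstream logic.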
Format validation runs on extracted values. An identity number that does not match the expected format is flagged before it reaches your database. Dates are normalized to a consistent format regardless of how they appear on the source document. Regional date formats, abbreviated months, and varying separator styles are all handled.
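As a rough sketch of what format validation and date normalization involve, the example below checks a PAN against its published structure (five letters, four digits, one letter) and normalizes a handful of common date layouts to ISO 8601. The function names, the list of accepted formats, and the day-first reading of ambiguous dates are assumptions for this example; month-name parsing also assumes an English locale.

```python
import re
from datetime import datetime
from typing import Optional

# PAN structure: five letters, four digits, one letter.
PAN_PATTERN = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")

# Day-first layouts are tried before ISO; this ordering is an assumption.
DATE_FORMATS = ("%d/%m/%Y", "%d-%m-%Y", "%d.%m.%Y", "%d %b %Y", "%d %B %Y", "%Y-%m-%d")


def is_valid_pan(value: str) -> bool:
    """Check only the format of a PAN, not whether it was actually issued."""
    return PAN_PATTERN.fullmatch(value.strip().upper()) is not None


def normalize_date(raw: str) -> Optional[str]:
    """Return the date as YYYY-MM-DD, or None if no known layout matches."""
    text = raw.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None
```

A value that fails either check would be flagged rather than passed through, which is the behaviour the paragraph above describes.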
Sensitive fields such as identity numbers and account numbers are validated first and then masked in stored records. Your application gets the validated value in the API response, but what is stored at rest is protected.
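Masking before storage can be illustrated with a short helper. Keeping the last four characters visible is an assumption made for this sketch; the actual masking rules applied by the platform are not specified here.

```python
def mask_identifier(value: str, visible: int = 4, mask_char: str = "X") -> str:
    """Mask all but the last `visible` alphanumeric characters of an identifier."""
    chars = [c for c in value if c.isalnum()]  # drop spaces and separators
    if len(chars) <= visible:
        return mask_char * len(chars)
    return mask_char * (len(chars) - visible) + "".join(chars[-visible:])


# Validate the full value first, then store only the masked form.
stored_value = mask_identifier("1234 5678 9012")
```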
Custom Document Types: Minutes, Not Sprints
The custom type capability is what makes 9thSense practical for enterprises with non-standard document requirements.
Every enterprise processing documents at scale has at least some document types that are specific to their business: internal application forms, state-specific certificates, industry licenses that vary by regulator, partner documents in proprietary formats. No vendor library covers all of these. In a code-driven system, each one requires an engineering engagement.
With 9thSense, you describe your document type through the API. Provide a description of what the document contains, list the fields you need to extract, and optionally provide sample documents. The platform generates a type definition from your description. You review it, adjust if needed, and activate it.
From that point, the type is live. Documents submitted against it go through the same extraction and validation pipeline as any built-in type. Your engineering team did not write any code.
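The three inputs named above (a description, a list of fields, and optional samples) might be assembled along these lines. Every key name, field type label, and the `municipal_trade_license` identifier is hypothetical; this sketch builds a plain dictionary and does not call the 9thSense API, whose real request format may differ.

```python
import json


def build_type_request(name: str, description: str, fields: list,
                       samples: list = None) -> dict:
    """Assemble a hypothetical request body describing a custom document type."""
    if not fields:
        raise ValueError("at least one field to extract is required")
    names = [f["name"] for f in fields]
    if len(names) != len(set(names)):
        raise ValueError("field names must be unique")
    return {
        "name": name,
        "description": description,
        "fields": fields,
        "samples": samples or [],  # sample documents are optional
    }


request_body = build_type_request(
    name="municipal_trade_license",
    description=(
        "Trade license issued by a municipal corporation; shows the "
        "licensee name, license number, and validity period."
    ),
    fields=[
        {"name": "licensee_name", "type": "string", "required": True},
        {"name": "license_number", "type": "string", "required": True},
        {"name": "valid_until", "type": "date", "required": False},
    ],
)
print(json.dumps(request_body, indent=2))
```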
A lending platform that operates across Indian states and needs to process state-specific trade licenses — formats that differ by municipal corporation — can define each variant as a custom type in an afternoon. An insurance company adding a new claim form to their library does not need to wait for their next sprint cycle.
Custom Types in Agent Workflows
Custom document types integrate directly into intelligent agent workflows. An agent configured to handle a specific onboarding flow can require your custom document types alongside built-in ones. The agent collects them, extracts from them, and applies your business rules against the extracted data — all within the same platform.
This is the practical implication of a unified data model. A business rule that says "the name on the custom internal form must match the name on the Aadhaar card" works because both extractions produce structured data in the same format. Cross-document validation does not require any special integration between different systems — it is all within 9thSense.
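A name-match rule of the kind quoted above could look like the following sketch. The extraction dictionaries, field names, and normalization steps are assumptions for illustration; a production rule would likely need to handle initials, reordered names, and non-Latin scripts, which this simple comparison does not.

```python
import re


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation and digits, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z\s]", " ", name.lower())
    return " ".join(cleaned.split())


def names_match(a: str, b: str) -> bool:
    return normalize_name(a) == normalize_name(b)


# Two extractions in the same structured shape (field names are hypothetical).
aadhaar = {"doc_type": "aadhaar_card", "fields": {"name": "PRIYA  SHARMA"}}
internal_form = {"doc_type": "internal_application", "fields": {"name": "Priya Sharma"}}

rule_passed = names_match(aadhaar["fields"]["name"], internal_form["fields"]["name"])
```

Because both extractions expose the name under the same structure, the rule needs no adapter between document types.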
What Is in the Pipeline
The built-in library expands continuously based on customer requirements. Priority items currently in development include US and European identity documents, ISO-standard commercial invoices, insurance policy documents, and salary certificates common in GCC countries.
Validation capabilities are also expanding. The next release includes cross-field validation rules — checks that span multiple fields within a single document, like verifying that a bank statement's closing balance is arithmetically consistent with its transactions — and date range validation for ensuring documents are within required recency windows.
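The two checks mentioned above can be sketched as follows. This assumes the statement also provides an opening balance and signed transaction amounts, and it uses a one-paisa tolerance and a caller-supplied recency window; none of these details come from the platform's release notes.

```python
from datetime import date, timedelta
from decimal import Decimal


def balance_is_consistent(opening: Decimal, transactions: list, closing: Decimal,
                          tolerance: Decimal = Decimal("0.01")) -> bool:
    """Check that opening balance plus signed transactions equals the closing balance."""
    expected = opening + sum(transactions, Decimal("0"))
    return abs(expected - closing) <= tolerance


def is_recent(document_date: date, max_age_days: int, today: date) -> bool:
    """Check that the document is dated within the last `max_age_days` days."""
    age = today - document_date
    return timedelta(0) <= age <= timedelta(days=max_age_days)


consistent = balance_is_consistent(
    opening=Decimal("1000.00"),
    transactions=[Decimal("-250.50"), Decimal("500.00")],  # debits are negative
    closing=Decimal("1249.50"),
)
recent = is_recent(date(2024, 1, 15), max_age_days=90, today=date(2024, 3, 1))
```

`Decimal` is used instead of floats so that currency arithmetic does not pick up rounding error.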
If there is a document type you process regularly that should be in the built-in library, we want to hear about it. The library grows based on what enterprises actually need.
Try it yourself →
pip install 9thsense