Introducing 9thSense: 9 Senses for Visual Intelligence
When I was building financial infrastructure at a fintech, we had a recurring problem that nobody wanted to name out loud: we spent more engineering time maintaining vendor integrations than building product.
Six vendors. One KYC workflow. A different SDK, a different auth model, a different error format, a different support escalation path for each one. When any vendor had an outage, the on-call engineer spent the first twenty minutes just figuring out which part of the pipeline had broken. The integration layer had become load-bearing.
This is not a fintech problem. It is a problem for any company that works with visual data — documents, faces, scenes. Insurance companies doing property assessments. Lending platforms processing income documents. Travel companies verifying visas. Merchants getting onboarded. Banks running KYC.
All of them are running some version of the same patchwork stack: an OCR vendor here, an identity vendor there, a custom prompt chain held together with engineering effort, a manual review queue for everything that falls through. None of these pieces share a data model. None compose naturally. All require constant effort to keep synchronized.
Today we are launching 9thSense.
The Idea: Enterprises Should Have Senses
The human perceptual system is elegant because it is unified. You do not swap out your visual cortex for document reading and use a different one for face recognition. There is one integrated system, and it routes inputs to the right processing pathway automatically.
Enterprises processing visual data have nothing like that. They have a collection of point solutions, each with its own data model and integration surface, stitched together by glue code that someone has to maintain.
Our thesis is straightforward: AI has reached the point where we can give enterprises a unified perceptual platform — a set of senses — that covers the full spectrum of visual intelligence. Not just document extraction. Not just identity verification. The whole thing, with a common data model, composable capabilities, and an intelligent agent layer on top that orchestrates them all.
We call them senses because that is what they are: perceptual capabilities your systems previously did not have, and now do.
The 9 Senses
We have organized the platform around 9 core capabilities. Five are shipping today. Four are on the roadmap.
The 5 Shipped Senses
Document Sense — structured extraction from any document type. The platform ships with 39 pre-built document types across 7 categories: identity documents like Aadhaar, PAN, Passport, and Driving License; financial documents like bank statements, ITR filings, salary slips, and Form 16; business documents like GST certificates, CIN, FSSAI, and UDYAM registrations; travel documents including flight tickets, hotel bookings, and visas; scene types for physical premises and vehicles; fraud classification types; and a category for education and other documents.
For each document type, you get typed, validated, structured output — not raw OCR text. The platform also handles documents you define yourself: create a custom document type via API and the same extraction pipeline applies immediately, without any code deployment.
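Creating a custom type might look something like the sketch below. The payload schema, field names, and endpoint are illustrative assumptions, not the documented 9thSense API:

```python
# Hypothetical sketch: registering a custom document type via the API.
# The schema below (name/category/fields) is an assumed shape for
# illustration; consult the actual API reference for the real format.
import json


def build_custom_doc_type(name: str, category: str, fields: list) -> dict:
    """Assemble a document-type definition payload as plain data."""
    return {
        "name": name,
        "category": category,
        "fields": fields,  # each field: {"key", "type", "required"}
    }


payload = build_custom_doc_type(
    name="warehouse_lease_agreement",
    category="business",
    fields=[
        {"key": "lessor_name", "type": "string", "required": True},
        {"key": "lease_start_date", "type": "date", "required": True},
        {"key": "monthly_rent", "type": "number", "required": False},
    ],
)

# POST the definition; per the text, extraction for this type
# applies immediately, with no code deployment.
# requests.post("https://<your-tenant>/v1/document-types", json=payload)
print(json.dumps(payload, indent=2))
```

The point is that the definition is data, not code: the same extraction pipeline picks it up the moment it is registered.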
Identity Sense — facial biometrics at production scale. Three capabilities in one: 1:1 face match (compare a selfie against an ID photo), 1:N face search (find a face across millions of enrolled records), and cross-database search across 6 configured biometric databases simultaneously. This is what confirms that the person submitting a KYC selfie is the same person pictured on the identity document — and what checks that person against watchlist and sanctions databases before onboarding proceeds.
Shield Sense — deepfake and synthetic fraud detection. Three classifiers, each targeting a different fraud vector: deepfake image detection for AI-generated or manipulated face images, synthetic document detection for documents that were never issued by a real institution, and face liveness verification to confirm a live person is present rather than a photograph, video, or mask. These classifiers run on self-hosted models — your customer's biometric data never leaves your infrastructure.
Scene Sense — scene understanding for physical world verification. Six scene types covering the situations where enterprises need to verify what actually exists in the physical world: merchant shopfront, merchant interior, merchant neighbourhood, retail inventory, construction site safety, and vehicle inspection. Upload a photo, get structured intelligence back. The same platform that reads documents can understand real-world scenes.
Judge Sense — the intelligent agent layer. Judge is what makes the other four senses composable. You configure a verification agent with a set of document requirements and business rules. The agent collects documents from applicants, runs them through the relevant senses, cross-checks consistency across documents, and drives toward a decision. Agents can handle multi-turn conversations, explain issues in plain language, request specific remediation documents, and guide applicants through complex multi-step workflows. The entire agent — its persona, its rules, its retry logic — is configuration, not code.
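To make "configuration, not code" concrete, here is a minimal sketch of what an agent definition could look like. Every key in this dict (persona, required_documents, rules, retry) is an assumed schema for illustration, not the platform's actual format:

```python
# Illustrative sketch: a Judge Sense verification agent expressed
# entirely as configuration. All keys and values are assumptions.
kyc_agent = {
    "persona": {
        "name": "onboarding-assistant",
        "tone": "plain-language, concise",
    },
    "required_documents": ["aadhaar", "pan", "bank_statement"],
    "rules": [
        # Cross-check consistency across documents before deciding.
        {"check": "name_match", "across": ["aadhaar", "pan"],
         "on_fail": "request_clarification"},
        {"check": "face_match", "threshold": 0.90, "on_fail": "reject"},
    ],
    "retry": {"max_attempts": 3, "remediation_prompts": True},
}


def missing_documents(config: dict, submitted: set) -> list:
    """Return required document types the applicant has not yet provided."""
    return [d for d in config["required_documents"] if d not in submitted]


# A visa agent or a merchant-onboarding agent would be the same
# machinery with a different dict; no new code deployed.
print(missing_documents(kyc_agent, {"aadhaar"}))  # → ['pan', 'bank_statement']
```

Because the rules live in data, changing a threshold or adding a required document is a configuration change, not an engineering sprint.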
The 4 Roadmap Senses
Conversation Sense — voice and chat agents that handle inbound customer queries, walk applicants through onboarding, and escalate to human agents with a full context handoff. Think of it as Judge Sense extended to cover the full customer communication lifecycle, not just the document submission flow.
Monitoring Sense — real-time stream analysis. CCTV feeds, video uploads, continuous fraud pattern detection. Document Sense and Shield Sense running at stream speed, surfacing signals continuously rather than at submission time.
Review Sense — structured post-event review. Audit trails, compliance reports, decision explanations. The paper trail that regulated industries require when a decision gets challenged, built into the platform rather than assembled after the fact.
Investigation Sense — case investigation copilot. When a fraud case lands in your operations team's queue, Investigation Sense aggregates all the signals — document history, face search results, risk indicators, cross-match flags — into a unified case view with a conversational interface for analysts.
The Numbers
39 document types across 7 categories, covering the documents that enterprises in financial services, insurance, travel, and retail actually process.
12 AI tools in the platform: document extraction, classification, OCR, table extraction, signature detection, cross-match, entity resolution, address verification, redaction, risk scoring, field validation, and classifier inference.
6 biometric databases for face search: configurable per tenant, supporting both internal watchlists and external compliance databases including PEP lists and sanctions lists.
3 fraud detection classifiers: deepfake detection, synthetic document detection, and face liveness verification.
These are not aspirational numbers. They reflect what is running in production today.
The Platform Approach: Configuration, Not Code
This is the design decision that took the longest to get right, and the one that matters most for enterprise adoption.
The natural way to build a document intelligence system is to hardcode things. Write an extraction function for each document type, deploy it, maintain it, repeat when requirements change. This works for small document libraries. It becomes unmaintainable at scale, and it breaks entirely when an enterprise customer needs a document type you have never seen before.
9thSense is configuration-driven from the ground up. Every document type is a definition you can inspect and extend. Adding a new document type — even a custom one specific to your industry — means describing it through the API, not deploying new code. The extraction engine applies the same intelligence to a document type created five minutes ago as it does to the 39 built-in types.
The same principle applies to agents. A visa processing agent and a merchant onboarding agent and a KYC agent are all the same underlying machinery, configured differently. The business rules, the required document types, the fraud checks, the conversational persona — all configuration. All changeable without an engineering sprint.
This matters for enterprises because their requirements are not standard. They have document types that no vendor has catalogued. They have compliance rules that are specific to their regulator or their geography. They have edge cases that emerge from their actual customer base, not from a generic benchmark.
Configuration-driven means those things are first-class citizens, not afterthoughts bolted on after the platform was designed for someone else's use case.
Enterprise-Ready: What That Actually Means
We built 9thSense to operate inside enterprise environments, not just alongside them.
SOC 2 compliance through Digio, our parent company. Digio processes millions of KYC verifications for Indian enterprises and has the compliance infrastructure that regulated industries require. 9thSense inherits that posture.
Multi-region deployment. Data residency requirements are real, particularly in financial services and healthcare. We support regional isolation so your documents and biometric data do not leave the jurisdiction your compliance team requires.
Self-hosted AI. Every classifier and extraction model in 9thSense can run on your own infrastructure. For enterprises that cannot send document images and biometric data to third-party AI APIs, this is not a feature request — it is a baseline requirement. We designed for it from day one. The intelligence layer operates over a well-defined internal API, and you can bring your own models into that framework.
API-first. A complete REST API covering all platform capabilities, with a Python SDK for teams that prefer library-style integration. The SDK handles authentication, retries, and the plumbing you do not want to build yourself.
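As a rough illustration of the plumbing an SDK absorbs, here is a generic retry-with-backoff helper. Both `with_retries` and `flaky_extract` are made-up names for this sketch, not functions from the real SDK:

```python
# Minimal sketch of retry plumbing an SDK typically handles for you.
# Names here are illustrative, not part of any shipped library.
import time


def with_retries(call, attempts=3, base_delay=0.0):
    """Retry a flaky call with exponential backoff; re-raise on final failure."""
    for attempt in range(attempts):
        try:
            return call()
        except ConnectionError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))


# Simulate a transient failure: the call fails twice, then succeeds.
state = {"calls": 0}


def flaky_extract():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient network error")
    return {"status": "extracted"}


result = with_retries(flaky_extract)
print(result)  # → {'status': 'extracted'}
```

Writing this once is easy; writing it correctly for every endpoint, with auth refresh and rate-limit handling, is the work a library-style integration is meant to save you.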
Agent intelligence at Level 2. Our agent framework supports multi-turn conversations, document retry flows, and LLM-driven decision logic with configurable business rules. A Level 2 agent does not just process documents — it guides the person through the workflow, handles exceptions in natural language, and adapts to the state of the case in real time.
What We Are Honest About
The 4 roadmap senses are exactly that: roadmap. Conversation, Monitoring, Review, and Investigation are in design and early development. We have built the foundation they need, but we have not shipped them, and we will not describe them in the product documentation as though we have.
The 39 document types cover the documents our design partners process. Your industry probably has document types that are not in that list. The custom type API exists precisely for this — describe your document type through the API, and extraction works immediately. But we are not claiming coverage we do not have.
The scene types are currently calibrated to India. Merchant shopfront analysis understands what an Indian small business looks like. Site safety checks are calibrated to Indian construction contexts. If your use case is a different geography, we would like to understand the specifics before you assume Scene Sense covers it out of the box.
Backed by Digio
9thSense is built by the team at Digio — India's leading digital identity infrastructure company. Digio processes millions of Aadhaar-based KYC verifications, e-sign transactions, and onboarding workflows for banks, NBFCs, insurance companies, and fintechs across India.
Building 9thSense inside Digio means we have access to real enterprise document pipelines from the beginning. The 39 document types in the library are not theoretical — they are the documents Digio's customers actually process. The edge cases we have handled are not hypothetical — they come from production traffic at scale.
It also means 9thSense ships with the compliance posture that enterprise buyers require, not as a feature to be added later but as a design baseline.
What Is Next
We are shipping the four roadmap senses in priority order. Monitoring Sense is next — real-time stream analysis for continuous fraud detection and compliance monitoring. After that, Conversation Sense, for voice and chat-native onboarding agents that handle the full applicant communication lifecycle.
If you are building in financial services, insurance, travel, or any industry that processes documents or verifies identity at scale, we would like to talk. The specific use cases matter: what document types you process, what your exception rate looks like, where your current pipeline breaks down.
The best way to start is the API. A working integration with your first document type takes about 10 minutes.
Try it yourself →
pip install 9thsense