From Shopfront to Onboarded: How Scene Sense Automates Physical Verification
There is a step in merchant onboarding that nobody talks about because it is embarrassingly manual: the field visit.
Before a payment aggregator onboards a merchant, before a bank extends a business loan, before an insurance company issues a commercial policy — someone has to go look at the shop. Is the business real? Does the signage match what the application says? Is it actually the kind of establishment the documents describe? Is the neighbourhood consistent with the claimed customer volumes?
In India alone, payment aggregators process millions of merchant onboardings per year. Each one, ideally, warrants a physical inspection. In practice, field verification is expensive, slow, and geographically constrained. Agents cannot be everywhere. The coverage gap gets filled by assumptions, and assumptions get exploited by fraudulent applicants who know the inspection is unlikely to happen.
Scene Sense is 9thSense's answer to this problem. Upload a photograph, get structured intelligence back. The same platform that reads documents can understand real-world scenes.
What Scene Sense Actually Does
Document extraction and scene analysis are different problems. Document Sense processes structured documents — passports, bank statements, certificates — where the information is encoded as text in a predictable layout. The challenge is reliable extraction from that structure.
Scene Sense analyzes the physical world. It processes a photograph of a shopfront, an industrial site, a vehicle, or a retail space and answers questions about what that photograph contains: what the signage says, what condition the facility is in, what payment methods are advertised, what safety equipment is or is not present.
These are not OCR questions. Understanding a scene requires visual reasoning — recognizing what a GSTIN display looks like even when it is partially obscured by a delivery bike, inferring business scale from spatial context without a measuring tape, identifying safety violations from the way workers are positioned relative to machinery. This is a different capability from reading printed text, and Scene Sense is built for it.
Merchant Onboarding: Three Photographs, Complete Picture
The merchant verification use case illustrates Scene Sense's value most clearly. Standard document verification for merchant onboarding is straightforward: GST certificate, PAN card, cancelled cheque, incorporation documents. These can all be forged, and even genuine documents prove less than they appear to. A fraudulent merchant can hold a valid GST registration for a business that does not exist at the declared address, or that exists but is a completely different kind of business.
Scene Sense adds three photo-based verification steps that documents cannot replicate.
Shopfront analysis extracts what is actually displayed at the business premises: the business name as shown on physical signage, visible GSTIN if displayed (common for GST-compliant businesses in India), payment method logos and stickers, contact information, operating hours, signage quality, and an estimate of frontage dimensions. The cross-validation opportunity is direct: the business name extracted from the shopfront photograph should match the trade name on the GST certificate. A mismatch is a meaningful signal worth investigating.
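The shopfront name check can be sketched in a few lines. This is a minimal illustration, not the platform's implementation: the function names, the ignored-token list, and the 0.8 threshold are all assumptions, and the fuzzy comparison uses Python's standard difflib rather than anything specific to Scene Sense.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of tokens to ignore when comparing names; tune for real data.
IGNORED_TOKENS = {"pvt", "private", "ltd", "limited", "llp", "co", "company", "and"}


def normalise_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop ignored tokens."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in IGNORED_TOKENS)


def name_similarity(signage_name: str, trade_name: str) -> float:
    """Return a similarity ratio between 0 and 1 for the normalised names."""
    a = normalise_name(signage_name)
    b = normalise_name(trade_name)
    return SequenceMatcher(None, a, b).ratio()


def names_match(signage_name: str, trade_name: str, threshold: float = 0.8) -> bool:
    """True when the signage name is close enough to the GST trade name.

    The threshold is an assumed starting point, not a recommended value.
    """
    return name_similarity(signage_name, trade_name) >= threshold
```

A result below the threshold is a reason to route the case to a reviewer, not proof of fraud: signage is often abbreviated or written in a local script.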
Payment method display is a soft signal about business maturity. A merchant claiming high transaction volumes who shows no payment branding at their premises is worth a closer look. Signage quality — professional branded signage versus hand-painted versus no signage — adds context about the scale and investment level of the business.
Interior analysis tests the application story against physical reality. It identifies what type of store the interior shows, the scale of the operation, whether a digital point-of-sale system is visible, what product categories are stocked, and general condition. A merchant who claims to be a mid-size electronics retailer but whose interior photograph shows a micro-scale operation with minimal inventory is a mismatch that warrants escalation.
Whether a digital POS terminal is visible is directly relevant to payment processing applications. A high-volume payment gateway applicant without visible payment infrastructure is unusual. Brand names visible on products help verify claimed authorized dealer relationships — if the products on the shelves do not reflect the brand the applicant claims to represent, the relationship may be misrepresented.
Neighbourhood analysis addresses location viability. A high-volume retailer in an area with low commercial density and low foot traffic is implausible. The neighbourhood photograph extracts the commercial context: whether the location is on a main road or a back lane, how dense the surrounding commercial activity is, what businesses are adjacent, whether there are any risk signals like physical damage, isolation, or vacancy. These are the observations a field agent would make and note in a visit report.
Together, these three photographs give an underwriter or onboarding analyst what a physical visit would have produced — not as a human narrative, but as structured data that can be compared against application details and used in automated decision logic.
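As a rough sketch of what automated decision logic over those three outputs might look like, the snippet below compares an application against scene findings and returns review flags. Every field name, scale label, and flag name here is hypothetical; the actual Scene Sense output schema is not described in this post.

```python
from dataclasses import dataclass
from typing import List

# Assumed ordering of scale labels, smallest to largest.
SCALE_ORDER = ["micro", "small", "medium", "large"]


@dataclass
class Application:
    claimed_scale: str        # what the merchant declared
    claims_high_volume: bool  # whether high transaction volumes are claimed


@dataclass
class SceneFindings:
    observed_scale: str          # from the interior photograph
    pos_visible: bool            # from the interior photograph
    payment_logos_visible: bool  # from the shopfront photograph
    commercial_density: str      # "low", "medium" or "high", from the neighbourhood photograph


def review_flags(app: Application, scene: SceneFindings) -> List[str]:
    """Return the mismatches between the application and the photographs."""
    flags = []
    gap = SCALE_ORDER.index(app.claimed_scale) - SCALE_ORDER.index(scene.observed_scale)
    if gap >= 2:  # e.g. claimed "medium", observed "micro"
        flags.append("scale_mismatch")
    if app.claims_high_volume and not (scene.pos_visible or scene.payment_logos_visible):
        flags.append("no_payment_infrastructure")
    if app.claims_high_volume and scene.commercial_density == "low":
        flags.append("implausible_location")
    return flags
```

An empty list means the photographs are consistent with the application under these assumed rules; any flag is a prompt for escalation rather than an automatic rejection.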
Vehicle Inspection
For motor insurance, vehicle inspection has historically required a surveyor to physically examine the vehicle before a policy is issued. Scene Sense processes a set of vehicle photographs and returns a structured damage assessment.
The analysis covers vehicle identification (type, make and model if visible, licence plate, colour), overall condition, and a detailed damage inventory. Each observed damage item is documented with the affected area, the type of damage, and the severity. An insurer can use this as the pre-policy inspection record, enabling a direct comparison at claim time to determine whether damage is new or pre-existing.
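The claim-time comparison could work along these lines. This is a sketch under assumptions: the dictionary keys (area, type, severity) mirror the three attributes named above, but the exact field names and severity labels are invented for illustration.

```python
# Assumed severity labels, ranked from least to most serious.
SEVERITY_RANK = {"minor": 1, "moderate": 2, "severe": 3}


def classify_damage(baseline, at_claim):
    """Split claim-time damage items into new, worsened, and pre-existing.

    Both arguments are lists of dicts with "area", "type" and "severity" keys.
    `baseline` is the pre-policy inspection record.
    """
    baseline_severity = {(d["area"], d["type"]): d["severity"] for d in baseline}
    result = {"new": [], "worsened": [], "pre_existing": []}
    for item in at_claim:
        before = baseline_severity.get((item["area"], item["type"]))
        if before is None:
            result["new"].append(item)
        elif SEVERITY_RANK[item["severity"]] > SEVERITY_RANK[before]:
            result["worsened"].append(item)
        else:
            result["pre_existing"].append(item)
    return result
```

Matching on area and damage type is deliberately coarse. A production comparison would likely need finer localisation, since two separate scratches on the same panel would collapse into one entry here.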
Licence plate extraction allows cross-validation against the vehicle registration certificate submitted with the application. A mismatch between the plate in the photograph and the plate on the RTO-issued certificate is an immediate flag.
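Plates are written with varying spaces and hyphens, so a comparison usually needs a normalisation step first. The helper names below are hypothetical, and the sketch only handles formatting differences, not misread characters.

```python
import re


def normalise_plate(text: str) -> str:
    """Uppercase the plate and strip everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", text.upper())


def plates_match(photo_plate: str, certificate_plate: str) -> bool:
    """Compare the plate read from the photograph with the one on the certificate."""
    return normalise_plate(photo_plate) == normalise_plate(certificate_plate)
```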
For used vehicle lending, the analysis can extract odometer readings from instrument cluster photographs, enabling lenders to verify declared mileage without a physical visit.
Construction Site Safety
Construction site safety audits are mandated by regulation and required for insurance compliance. Manual audits are periodic — they happen at scheduled intervals, not continuously. Safety conditions on active construction sites change rapidly between visits. A site that passes an audit on Monday may have significant violations by Wednesday.
Scene Sense enables photo-based safety assessments at any frequency. A site supervisor photographs the work area; the analysis returns a safety score, a PPE compliance percentage broken down by equipment type (helmets, safety vests, harnesses, goggles, and so on), a list of specific violations observed, and hazards detected. A typical finding reads: "Worker visible without helmet near excavation zone. Safety barricading absent from northern face."
The breakdown by equipment type is particularly actionable: the difference between "some workers lack safety equipment" and "harness compliance is at 60%, with the gaps concentrated in the eastern scaffolding zone" is the difference between a vague concern and a specific corrective action.
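To make the arithmetic concrete, here is one way a per-equipment compliance percentage and its zone breakdown could be computed from per-worker observations. The observation format (zone, required, worn) and the sample data are assumptions for illustration; the post does not specify how Scene Sense represents individual workers.

```python
from collections import Counter, defaultdict


def ppe_compliance(workers):
    """Percentage of workers wearing each required item, keyed by equipment type."""
    required = defaultdict(int)
    worn = defaultdict(int)
    for w in workers:
        for item in w["required"]:
            required[item] += 1
            if item in w["worn"]:
                worn[item] += 1
    return {item: round(100 * worn[item] / required[item], 1) for item in required}


def gap_zones(workers, item):
    """Count, per zone, the workers who need `item` but are not wearing it."""
    return Counter(
        w["zone"] for w in workers
        if item in w["required"] and item not in w["worn"]
    )


# Hypothetical observations from one site photograph set.
workers = [
    {"zone": "east", "required": {"helmet", "harness"}, "worn": {"helmet"}},
    {"zone": "east", "required": {"helmet", "harness"}, "worn": {"helmet"}},
    {"zone": "west", "required": {"helmet", "harness"}, "worn": {"helmet", "harness"}},
    {"zone": "west", "required": {"helmet", "harness"}, "worn": {"helmet", "harness"}},
    {"zone": "north", "required": {"helmet", "harness"}, "worn": {"helmet", "harness"}},
]
```

With this sample, three of the five workers who need a harness are wearing one, and both gaps are in the east zone, which is the kind of specific finding a supervisor can act on.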
For construction finance, lenders and insurers can require regular photo-based safety attestation as a condition of loan drawdown or continued coverage — creating an ongoing compliance record without the cost of periodic physical visits.
Retail Inventory
Inventory financing and supply chain lending use the physical inventory at a shop as collateral. Traditional inventory audits require a physical visit to count and value stock. Shop inventory analysis allows lenders to request periodic photographic attestation of stock levels.
The analysis covers product categories visible, brand names stocked, overall stock level, estimated shelf vacancy, organization quality, and specific concerns like visible expired products. A borrower claiming inventory worth several lakhs but submitting photographs showing heavily depleted shelves is a risk signal. Visible expired or near-expiry products affect collateral value in food and pharmaceutical lending.
Regular photo-based inventory attestation — monthly, quarterly, or on any schedule the lender requires — creates an ongoing record of collateral condition at a fraction of the cost of physical visits.
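A lender's monitoring logic over a series of attestations might look something like the sketch below. The field names, flag names, and both thresholds are assumptions chosen for illustration, not values taken from the platform.

```python
from typing import Dict, List


def inventory_flags(
    attestations: List[Dict],
    max_vacancy_pct: float = 40.0,  # assumed ceiling for acceptable shelf vacancy
    max_rise_pct: float = 20.0,     # assumed ceiling for rise between attestations
) -> List[str]:
    """Flag collateral concerns from attestations ordered oldest to newest.

    Each attestation is a dict with "shelf_vacancy_pct" and "expired_visible".
    """
    if not attestations:
        return ["no_attestation"]
    flags = []
    latest = attestations[-1]
    if latest["shelf_vacancy_pct"] > max_vacancy_pct:
        flags.append("depleted_shelves")
    if latest["expired_visible"]:
        flags.append("expired_stock_visible")
    if len(attestations) >= 2:
        rise = latest["shelf_vacancy_pct"] - attestations[-2]["shelf_vacancy_pct"]
        if rise > max_rise_pct:
            flags.append("rapid_depletion")
    return flags
```

Comparing consecutive attestations is what the ongoing record makes possible: a single photograph shows a level, while a series shows a trend.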
The Integration with Verification Workflows
Scene Sense's practical value multiplies when it is integrated into agent-driven verification workflows rather than called in isolation.
A merchant onboarding agent can be configured to require shopfront, interior, and neighbourhood photographs alongside the standard document set. The agent collects them from the merchant, processes them through Scene Sense, and applies cross-validation rules against both the scene outputs and the document extractions.
The rule that checks whether the business name on the shopfront matches the trade name on the GST certificate is a single configuration item in the agent — the same rule format used for any other cross-document check. The agent handles the document collection, the extraction, the cross-validation, and the communication to the merchant, all within the same workflow.
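To illustrate the idea of a rule as a single configuration item, here is a toy rule format and evaluator. The format is invented for this example and is not the 9thSense rule syntax; the dotted paths, operator names, and sample data are all hypothetical.

```python
# A hypothetical declarative rule: compare one scene field to one document field.
RULES = [
    {
        "id": "shopfront_name_matches_gst",
        "left": "scene.shopfront.business_name",
        "right": "documents.gst_certificate.trade_name",
        "op": "equals_ignore_case",
    },
]

# Comparison operators available to rules.
OPS = {
    "equals": lambda a, b: a == b,
    "equals_ignore_case": lambda a, b: a.strip().casefold() == b.strip().casefold(),
}


def resolve(data, path):
    """Follow a dotted path such as "scene.shopfront.business_name" into nested dicts."""
    for key in path.split("."):
        data = data[key]
    return data


def evaluate(rules, data):
    """Return a mapping of rule id to pass (True) or fail (False)."""
    return {
        r["id"]: OPS[r["op"]](resolve(data, r["left"]), resolve(data, r["right"]))
        for r in rules
    }


# Sample case data: scene outputs and document extractions in one structure.
case_data = {
    "scene": {"shopfront": {"business_name": "Sri Lakshmi Electronics"}},
    "documents": {"gst_certificate": {"trade_name": "SRI LAKSHMI ELECTRONICS"}},
}
```

Because scene outputs and document fields sit in the same structure, the evaluator does not need to know which side of a rule came from a photograph.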
This is the integration advantage of a unified platform. Scene analysis results and document extraction results share the same data model and the same rule evaluation framework. Cross-validating a scene output against a document field requires no special integration work.
What Scene Sense Does Not Do
It is worth being direct about the boundaries.
Scene Sense is not a substitute for all human judgment in high-stakes verification. For complex cases — a merchant whose signage matches but whose interior raises questions — the structured output is evidence for a human decision, not a replacement for one. The goal is to extend human judgment to every case at scale, not to remove it from high-stakes exceptions.
Scene types are currently calibrated to India. The merchant shopfront analysis understands what Indian small business formats look like. Site safety checks reflect Indian construction standards. Vehicle type classification covers Indian road vehicles. If your use case is in a different geography, we would want to understand the specifics before you assume full coverage.
Scene Sense works from photographs. The quality of the analysis depends on the quality of the photograph — lighting, angle, and distance all affect what can be extracted. The platform includes guidance on photograph capture for each scene type, but poor-quality images will produce less complete results.
What Changes
A field visit that used to take three to five business days and cost several hundred rupees per merchant becomes a self-service photograph submission that the merchant completes from their phone in under five minutes.
That does not eliminate field visits entirely. For high-value accounts, for escalated cases, for situations where the photo analysis raises flags that require human confirmation — field visits still have a role. But the cases that require a physical visit are a fraction of the total. The rest can be handled through Scene Sense, with the field visit capacity reserved for the cases where it genuinely matters.
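The triage described here reduces to a small routing decision. The sketch below assumes a list of review flags from the photo analysis and an account value in rupees; the function name, return labels, and the high-value threshold are placeholders, not platform defaults.

```python
from typing import List


def route_case(
    flags: List[str],
    account_value_inr: int,
    high_value_threshold_inr: int = 1_000_000,  # assumed cut-off for "high-value"
) -> str:
    """Send flagged or high-value cases to a field visit; handle the rest by photo."""
    if flags or account_value_inr >= high_value_threshold_inr:
        return "field_visit"
    return "photo_only"
```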
For motor insurance, vehicle lending, and construction finance, the reduction in physical inspection requirements has the same effect. Routine cases, where the photograph analysis shows what the application claims, can proceed without dispatching a surveyor. Exceptions get the physical inspection they need.
The field still exists. It just fits in a smartphone camera now.
Try it yourself →
pip install 9thsense