Doge Does Research #2 - LLMs and Comprehensiveness

About a year ago, we ran a simple experiment - we asked an LLM to find all unicorn companies in India, and measured how complete the answer was. The results were humbling enough to write about - even with web search switched on and the right sources available, the model returned a materially incomplete list. The conclusion then was straightforward - if you need a comprehensive list, give the model the data; don't let it go looking.

Since then, model capabilities have marched on, and the leading labs have all shipped newer, supposedly more capable models. So we wanted to know: how far have we come? This year, we're back with a variation of last year's question.

We gave models from three frontier AI labs (Google Gemini, OpenAI GPT and Anthropic Sonnet), plus Manus (arguably the best browser-use agent), the same task, and compared their outputs against a verified master list of 97 active unicorns compiled from multiple sources. The prompt was deliberately unglamorous:

"Get me a list of all unicorn companies in India as of date, excluding companies that have gone public, got acquired or became defunct. Return your results in one column with company name."

No documents provided. No RAG. No hand-holding on sources. Models were allowed to use the web. That made this a test of how well each model could search for data across multiple sources on the internet and stitch it together.

Unicorn Identification by Model

We asked four different models to identify unicorn companies in India, and compared their results against a master list compiled from multiple sources.

Model Performance Comparison

Lab	Model	Unicorn Count
Google Gemini	gemini3.5-flash	86
OpenAI	GPT-5.5-Instant	64
Anthropic	Sonnet4.6	79
Manus	Manus 1.6 Lite	99
Master List	-	97

LLM Output Comparison Table

Lab: Google Gemini	Lab: OpenAI	Lab: Anthropic	Lab: Manus	Master List
Model: gemini3.5-flash	Model: GPT-5.5-Instant	Model: Sonnet4.6	Model: Manus 1.6 Lite	Model: n.a.
Unicorn Count: 86	Unicorn Count: 64	Unicorn Count: 79	Unicorn Count: 99	Unicorn Count: 97
Company Name	Company Name	Company Name	Company Name	Company Name
Acko	Acko	5ire	Acko	5ire
Apna	Ather Energy	Acko	Amagi	Acko
BharatPe	BlackBuck	BharatPe	Apna	Apna
BillDesk	BlueStone	Biocon Biologics	BharatPe	BharatPe
BoAt	Boat	BlackBuck	BillDesk	BillDesk
BrowserStack	BrowserStack	BrowserStack	BlackBuck	BoAt
CarDekho	Chargebee	BYJU'S	BoAt	BrowserStack
Cars24	CRED	CARS24	BrowserStack	Cardekho
Chargebee	Dailyhunt	CoinDCX	CRED	CARS24
CoinDCX	Darwinbox	CRED	Cardekho	ChargeBee
CoinSwitch	DealShare	Darwinbox	Cars24	CitiusTech
CRED	Dream11	DealShare	ChargeBee	CoinDCX
Cult.fit	ElasticRun	Raise Financial Services	CitiusTech	CoinSwitch
Dailyhunt	Eruditus	Dream11	CoinDCX	CRED
Darwinbox	FirstCry	Drools	CoinSwitch	Cult.fit
DealShare	Five Star Business Finance	Eruditus	Cult.fit	Dailyhunt
Dream11	Fractal Analytics	EV Co (Mahindra)	Dailyhunt	Darwinbox
Drools	FreshToHome	FarEye	DarwinBox	DealShare
Droom	Games24x7	Fireflies AI	DealShare	Dream11
ElasticRun	GlobalBees	Games24x7	Dream11	Drools
Eruditus	Groww	Glance	Drools	Droom
Fractal	Infra.Market	GlobalBees	Droom	Druva
Games24x7	InMobi	Groww	Druva	ElasticRun
Glance	Innovaccer	InCred Finance	ElasticRun	Eruditus
GlobalBees	Juspay	Infra.Market	Eruditus	FarEye
Good Glamm Group (MyGlamm)	KreditBee	Juspay	Fireflies AI	Fireflies AI
Gupshup	LEAD School	Khatabook	Fractal	Games24x7
Hasura	Licious	KreditBee	Games24x7	Glance
Icertis	Livspace	LeadSquared	Glance	GlobalBees
InCred Finance	Meesho	Lenskart	GlobalBees	Gupshup
Infra.market	Mensa Brands (BRND.ME)	Livspace	Groww	Hasura
InMobi	MoEngage	Mahindra Electric Automobile	Gupshup	Icertis
Innovaccer	Molbio Diagnostics	Meesho	Hasura	InCred Finance
JSW One Platforms	Money View	Mensa Brands (BRND.ME)	Icertis	Infra.Market
Jumbotail	Navi	Mobile Premier League	InCred Finance	InMobi
Juspay	Neysa	Moglix	InMobi	Innovaccer
KreditBee	NoBroker	Money View	Infra.Market	Jumbotail
Ola Krutrim	NSE (National Stock Exchange)	Netradyne	Innovaccer	Juspay
LEAD School	OfBusiness	Neysa	JSW One Platforms	Khatabook
LeadSquared	Ola Cabs	OfBusiness	Jumbotail	KreditBee
Licious	Ola Electric	Ola Cabs	Juspay	LEAD School
Livspace	Open Financial Technologies	Ola Krutrim	KreditBee	LeadSquared
Mensa Brands (BRND.ME)	OYO Rooms	OneCard	Krutrim SI Designs	Licious
MindTickle	PhysicsWallah	Open Financial Technologies	LEAD School	Livspace
Mobile Premier League	Pine Labs	Oxyzo	LeadSquared	Mensa Brands (BRND.ME)
Moglix	Pocket FM	OYO Rooms	Lenskart	MindTickle
Molbio Diagnostics	Postman	Perfios	Licious	Mobile Premier League
Money View	Purplle	PhonePe	LivSpace	MoEngage
Mu Sigma	Razorpay	Physics Wallah	Meesho	Moglix
Neysa	ShareChat	Pine Labs	Mensa Brands (BRND.ME)	Molbio Diagnostics
NoBroker	Skyroot Aerospace	Pocket FM	MindTickle	Money View
OfBusiness	Spinny	Polygon	Mobile Premier League	Mu Sigma
Ola Cabs	Stashfin	Porter	Moglix	Navi
OneCard	Swiggy Instamart	Pristyn Care	Molbio Diagnostics	Netradyne
Open Financial Technologies	Unacademy	Purplle	Money View	Neysa
Oxyzo	Upstox	Raise Financial Services	Mu Sigma	NoBroker
OYO Rooms	Urban Company	Rapido	MyGlamm	OfBusiness
Perfios	Vedantu	Razorpay	Netradyne	Ola Cabs
PhonePe	Whatfix	Rebel Foods	Neysa	Ola Electric
PhysicsWallah	XpressBees	Reliance Jio	NoBroker	Ola Krutrim
Pocket FM	Yubi (CredAvenue)	Reliance Retail	OfBusiness	OneCard
Porter	Zepto	Rivigo	Ola	Open Financial Technologies
Postman	Zetwerk	ShareChat	Ola Krutrim	Oxyzo
Pristyn Care	Zoho	Shiprocket	OneCard	OYO Rooms
Purplle		Skyroot Aerospace	Open	Perfios
Raise Financial Services		Slice	OYO	PharmEasy
Rapido		Snapdeal	Oxyzo	PhonePe
Razorpay		Staq (Innovaccer)	Perfios	Pocket FM
Rebel Foods		Jumbotail	PharmEasy	Polygon
ShareChat		Udaan	Pine Labs	Porter
Shiprocket		Unacademy	Polygon	Postman
Skyroot Aerospace		UpGrad	Porter	Pristyn Care
slice		Upstox	Postman	Purplle
Spinny		Vedantu	Pristyn Care	Raise Financial Services
Turtlemint		Dailyhunt	Purplle	Rapido
Udaan		Yubi (CredAvenue)	Raise Financial	Razorpay
Unacademy		Zepto	Rapido	Rebel Foods
Uniphore		Zetwerk	RazorPay	Sarvam AI
upGrad			Rebel Foods	ShareChat
Upstox			Rivigo	Shiprocket
Vedantu			ShareChat	Skyroot Aerospace
Xpressbees			Shiprocket	Slice
Yubi (CredAvenue)			Skyroot Aerospace	Spinny
Zenoti			Slice	Staq (Innovaccer)
Zeta			Spinny	Stashfin
Zetwerk			Udaan	Udaan
			Unacademy	Uniphore
			Uniphore	UpGrad
			upGrad	Upstox
			Upstox	Vedantu
			Urban Company	Whatfix
			Vedantu	Xpressbees
			Xpressbees	Yubi (CredAvenue)
			Yubi (CredAvenue)	Zenoti
			Zenoti	Zepto
			Zepto	Zeta
			Zerodha	Zetwerk
			Zeta
			Zetwerk
			Zoho

Variations in the unicorn names returned by the models have been normalized against the master list to allow cross-list comparison. For example, "Krutrim" and "Krutrim SI" have been renamed to "Ola Krutrim". You can see the raw and normalized data in the spreadsheet here.
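For readers who want to reproduce the comparison, a minimal normalization step might look like the Python sketch below. It is our illustration, not the exact procedure behind the spreadsheet: the matching rule (lowercase, drop spaces and punctuation) and the `ALIASES` table are assumptions, and only the Krutrim variants come from the article.

```python
import re
from typing import Optional

# Hypothetical alias table: cleaned variant -> master-list spelling.
# The Krutrim variants come from the article; anything else would be added by hand.
ALIASES = {
    "krutrim": "Ola Krutrim",
    "krutrimsi": "Ola Krutrim",
    "krutrimsidesigns": "Ola Krutrim",
}


def clean(name: str) -> str:
    """Lowercase a name and keep only letters and digits."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def normalize(name: str, master: list[str]) -> Optional[str]:
    """Return the master-list spelling for a model-returned name, or None."""
    key = clean(name)
    if key in ALIASES:
        return ALIASES[key]
    # Index the master list by cleaned form for case/punctuation-insensitive lookup.
    index = {clean(m): m for m in master}
    return index.get(key)


if __name__ == "__main__":
    master = ["Darwinbox", "Ola Krutrim", "Razorpay"]
    for raw in ["DarwinBox", "RazorPay", "Krutrim SI Designs", "Physics Wallah"]:
        print(raw, "->", normalize(raw, master))
```

Simple cleaning resolves casing variants such as DarwinBox or RazorPay, but genuinely different spellings (Raise Financial versus Raise Financial Services, for instance) still need a manual alias. A name that comes back as None is either a false positive or an alias nobody has added yet, so it deserves a human look before being scored.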

Analysis of Results

The unicorn count alone is a misleading headline. A model can return a high number by padding its list with false positives, or a low number by being conservative. What matters is how many of the 97 verified unicorns each model actually identified - and how many entries it included that had no business being there. The table below adds that lens.

Lab	Model	Unicorn Count	Accuracy vs Master List
Google Gemini	gemini3.5-flash	86	82.5% (80/97) - returned 86 names, missed 17 real unicorns, 6 were false positives
OpenAI	GPT-5.5-Instant	64	49.5% (48/97) - returned 64 names, missed 49 real unicorns, 16 were false positives
Anthropic	Sonnet4.6	79	64.9% (63/97) - returned 79 names, missed 34 real unicorns, 16 were false positives
Manus	Manus 1.6 Lite	99	83.5% (81/97) - returned 99 names, missed 16 real unicorns, 19 were false positives
Master List	-	97	100%
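The table's figures reduce to two counts per model. Assuming "accuracy" here means recall (verified hits divided by the 97 master-list unicorns), the sketch below reproduces that column from the counts in the table and adds precision (verified hits divided by names returned), which the table does not report. `Result` and its fields are our own naming.

```python
from dataclasses import dataclass

MASTER_SIZE = 97  # verified active unicorns on the master list


@dataclass
class Result:
    model: str
    returned: int  # names the model returned (the table's "Unicorn Count")
    hits: int      # returned names that matched the master list

    @property
    def recall(self) -> float:
        """Percent of the master list the model found (the table's "accuracy")."""
        return round(100 * self.hits / MASTER_SIZE, 1)

    @property
    def precision(self) -> float:
        """Percent of the model's own list that was on the master list."""
        return round(100 * self.hits / self.returned, 1)


# Counts as reported in the accuracy table.
RESULTS = [
    Result("gemini3.5-flash", returned=86, hits=80),
    Result("GPT-5.5-Instant", returned=64, hits=48),
    Result("Sonnet4.6", returned=79, hits=63),
    Result("Manus 1.6 Lite", returned=99, hits=81),
]

if __name__ == "__main__":
    for r in RESULTS:
        print(f"{r.model:<16} recall {r.recall:5.1f}%  precision {r.precision:5.1f}%")
```

On these counts Gemini's precision comes out at roughly 93%, against about 82% for Manus and 75% for GPT-5.5-Instant, which puts a number on why a high unicorn count alone says little.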

Missing Companies

When measured against the Master List (97 unicorns), identification gaps are more severe than the headline unicorn counts suggest. The "Accuracy" figure represents how many of the 97 verified unicorns each model actually identified - but the misses tell a sharper story:

OpenAI (GPT-5.5-Instant) missed 49 companies - just over half the master list. The omissions aren't just obscure names; they include well-established players like BharatPe, CoinDCX, PhonePe, Rebel Foods, Shiprocket, and Udaan. At 49.5% accuracy, this is by far the weakest recall of the four models tested. The gap between its stated count (64) and its verified hits (48) is stark - a full quarter of what it returned (16 of 64) had no business being on the list.
Anthropic (Sonnet 4.6) missed 34 companies despite appearing mid-table by count. It dropped well-known names like BoAt, InMobi, Postman, and Zeta, while padding its list with ineligible entries. At 64.9% accuracy, the gap between its unicorn count (79) and verified hits (63) means 16 entries, about one in five, were false positives - in effect, it substituted ineligible companies for real unicorns.
Google Gemini (gemini3.5-flash) missed 17 companies, achieving 82.5% accuracy - the second-best recall of the four. Its blind spots skewed toward newer or less-covered names: 5ire, CitiusTech, Druva, Khatabook, MoEngage, Navi, and Sarvam AI were all absent. The omissions feel like genuine knowledge gaps rather than structural failures.
Manus (Manus 1.6 Lite) achieved the best recall at 83.5%, missing only 16 companies - but its misses still included some notable ones: FarEye, MoEngage, Navi, PhonePe, Pocket FM, Sarvam AI, and Stashfin all slipped through despite Manus's active web-searching approach. No model got a clean sheet.
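Once names are normalized, each model's misses and extras are plain set differences against the master list. A minimal sketch, shown on a handful of names from GPT-5.5-Instant's column (BharatPe, CoinDCX and PhonePe are among its misses; Groww and Zoho are among its extras); the function name `compare` is ours.

```python
def compare(master: set[str], returned: list[str]) -> dict[str, list[str]]:
    """Split a model's normalized list into hits, misses and extras (each sorted)."""
    got = set(returned)  # a name listed twice is only counted once
    return {
        "hits": sorted(got & master),    # on both lists
        "missed": sorted(master - got),  # verified unicorns the model left out
        "extras": sorted(got - master),  # false positives, to be classified
    }


if __name__ == "__main__":
    master = {"BharatPe", "CoinDCX", "PhonePe", "Razorpay", "Zepto"}
    gpt_excerpt = ["Razorpay", "Zepto", "Groww", "Zoho"]
    for bucket, names in compare(master, gpt_excerpt).items():
        print(bucket, names)
```

On this five-name slice, Razorpay and Zepto are hits, BharatPe, CoinDCX and PhonePe are misses, and Groww and Zoho are extras. Run over the full normalized lists, the same three buckets give the counts discussed in this section.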

The "Extra" Problem (False Positives)

Comprehensiveness isn't just about recall - precision matters equally. Several models padded their lists with companies that a careful reading of the prompt should have excluded. Across all four models, the false positives fall into four distinct categories, each revealing a different kind of model failure.

Type 1 - Already Public - The prompt explicitly excluded companies that have gone public. Yet every model included at least some IPO graduates. Groww, Meesho, Ather Energy, BlueStone, FirstCry, PhysicsWallah, Urban Company, Fractal, etc. have all listed on public markets. This is arguably the most straightforward error - these exits are heavily covered events - and yet it was the most common category of mistake across all four labs.

Type 2 - Defunct, Acquired or Written Down - BYJU'S, Rivigo, and MyGlamm are companies whose unicorn status has effectively lapsed, through insolvency proceedings, operational shutdown, or deep valuation write-downs that have been extensively reported. Unacademy, meanwhile, was recently acquired. Yet their continued appearance on model-generated lists suggests that models learn a company's "unicorn moment" but struggle to unlearn it. Anthropic was the most exposed here, listing BYJU'S and Rivigo (plus Snapdeal); Manus included MyGlamm and Rivigo.

Type 3 - Corporate Subsidiaries, Not Startup Unicorns - JSW One Platforms, Reliance Jio, Reliance Retail, Biocon Biologics, Tata Passenger Electric Mobility and similar entries are arms of large, established conglomerates, not venture-backed startups that independently crossed the $1B valuation mark. The unicorn designation is specifically a startup construct, and including subsidiaries of the Reliance, Tata, JSW, or Mahindra groups reflects a category confusion that a well-calibrated model should avoid. Anthropic was most prone to this error, accounting for the majority of conglomerate entries across the test.

Type 4 - Bootstrapped and Proud of It - Zoho and Zerodha are special cases. Both are valued well above $1B, but neither has ever raised external venture capital. Whether they "qualify" as unicorns is a definitional debate, but their inclusion in the Manus and OpenAI lists suggests the models are pattern-matching on valuation alone, ignoring the funding-lineage dimension of the term.
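Sorting extras into these four types takes judgment (has the company listed? was it ever venture-backed?), so we would expect it to rest on a hand-curated lookup rather than anything automatic. A sketch under that assumption: the `Extra` enum and `LABELS` table are hypothetical, seeded only with companies named in the four types, and any unlabeled name lands in a catch-all bucket for manual review.

```python
from collections import Counter
from enum import Enum


class Extra(Enum):
    PUBLIC = "Already public"
    DEFUNCT = "Defunct, acquired or written down"
    SUBSIDIARY = "Corporate subsidiary"
    BOOTSTRAPPED = "Bootstrapped"
    OTHER = "Unlabeled (review manually)"


# Hypothetical hand-curated labels, seeded with companies named in the four types.
LABELS = {
    "Groww": Extra.PUBLIC,
    "Meesho": Extra.PUBLIC,
    "BYJU'S": Extra.DEFUNCT,
    "Rivigo": Extra.DEFUNCT,
    "Reliance Jio": Extra.SUBSIDIARY,
    "JSW One Platforms": Extra.SUBSIDIARY,
    "Zoho": Extra.BOOTSTRAPPED,
    "Zerodha": Extra.BOOTSTRAPPED,
}


def tally(extras: list[str]) -> Counter:
    """Count a model's false positives by type; unknown names go to OTHER."""
    return Counter(LABELS.get(name, Extra.OTHER) for name in extras)


if __name__ == "__main__":
    sample = ["Groww", "Meesho", "Zoho", "JSW One Platforms", "Turtlemint"]
    for kind, n in tally(sample).items():
        print(f"{kind.value}: {n}")
```

Routing unknown names to a catch-all rather than guessing keeps edge cases such as a soonicorn like Turtlemint visible for a human decision instead of silently inflating one category.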

Breaking down the extras by lab:

Manus had the most extras (19), with the largest share being public companies (Amagi, Groww, Lenskart, Meesho, Urban Company) alongside two defunct entries (MyGlamm, Rivigo), two bootstrapped (Zerodha, Zoho), and one conglomerate arm (JSW One Platforms). More searches, it turns out, also surface more noise.
OpenAI added 16 extras - the broadest categorical spread of any model - including 11 public companies (Ather Energy, BlueStone, FirstCry, Fractal Analytics, Groww, Meesho, PhysicsWallah, Urban Company, Five Star Business Finance, Pine Labs, Swiggy Instamart), one bootstrapped company (Zoho), one conglomerate (NSE), and one possible soonicorn (FreshToHome).
Anthropic Sonnet contributed 16 extras, concentrated in two categories: public companies (Groww, Lenskart, Meesho, PhysicsWallah) and conglomerate subsidiaries (Reliance Jio, Reliance Retail, Biocon Biologics, EV Co (Mahindra), Mahindra Electric Automobile, Tata Passenger Electric Mobility), the latter being the most distinctive failure pattern of any model in this test. Also notable: two defunct companies, Snapdeal and BYJU'S, appear on its list.
Gemini had the fewest extras at 6, and its errors were the most forgivable: two public companies (Fractal, PhysicsWallah), one defunct (MyGlamm), and one conglomerate arm (JSW One Platforms). Turtlemint, a well-funded insurtech that has been on the cusp of unicorn status for some time, is the one genuine soonicorn in the mix. Gemini's low false positive rate, combined with its strong recall, makes it the most precise model in this test, and it essentially ties with Manus on overall accuracy (82.5% vs 83.5%) despite not being a dedicated browser-use agent.

Patterns in LLM Failure

Valuation Staleness: The presence of BYJU'S, Rivigo, and MyGlamm across multiple models is the clearest evidence of temporal lag. These are not obscure edge cases - their collapses were front-page news. Models appear to index the peak of a company's story far more reliably than its decline.
Prompt Filtering Failures: Every model failed on at least one explicit exclusion criterion in the prompt - "gone public," "acquired," or "defunct." The models understood the task conceptually but couldn't apply the filters consistently against their own internal knowledge.
The "Specialised SaaS" Blind Spot: All four models showed higher consistency on consumer-facing names (Dream11, OYO, Razorpay) than on B2B or deep-tech unicorns. CitiusTech, FarEye, Icertis, Netradyne, and Sarvam AI were among the most frequently missed - suggesting these companies simply don't appear often enough in the training corpus to register reliably when the model is constructing a list from memory.
The Agentic Tradeoff: Manus's web-search-augmented approach achieved the best raw recall but also generated the most false positives and naming artefacts. The more striking finding is that Gemini - using live search - matched Manus almost exactly on accuracy (82.5% vs 83.5%), while producing far fewer false positives.
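A "most frequently missed" ranking like the one behind the blind-spot observation can be produced by counting, for each master-list company, how many models left it out. A minimal sketch under that assumption, run on a five-company slice whose per-model sets are read off the comparison table for these names only; `miss_counts` is our own helper name.

```python
def miss_counts(master: set[str], found: dict[str, set[str]]) -> list[tuple[str, int]]:
    """Count, for each master-list company, how many models failed to return it.

    Results are ordered most-missed first, ties broken alphabetically.
    """
    misses = {name: 0 for name in master}
    for names in found.values():
        for name in master - names:
            misses[name] += 1
    return sorted(misses.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    # Five-company slice; each set lists which of these names the model returned.
    master = {"CitiusTech", "Dream11", "FarEye", "Razorpay", "Sarvam AI"}
    found = {
        "gemini3.5-flash": {"Dream11", "Razorpay"},
        "GPT-5.5-Instant": {"Dream11", "Razorpay"},
        "Sonnet4.6": {"Dream11", "FarEye", "Razorpay"},
        "Manus 1.6 Lite": {"CitiusTech", "Dream11", "Razorpay"},
    }
    for company, n in miss_counts(master, found):
        print(f"{company}: missed by {n} of {len(found)} models")
```

On this slice Sarvam AI is missed by all four models and CitiusTech and FarEye by three each, while Dream11 and Razorpay are returned by every model, the same consumer-versus-B2B split described above.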
