Doge Does Research #2 - LLMs and Comprehensiveness - a year later
About a year ago, we ran a simple experiment - we asked an LLM to find all unicorn companies in India, and measured how complete the answer was. The results were humbling enough to write about - even with web search switched on and the right sources available, the model returned a materially incomplete list. The conclusion then was straightforward - if you need a comprehensive list, give the model the data; don't let it go looking.
Since then, model capabilities have marched on: the leading labs have all shipped newer, supposedly more capable models. So we wanted to know how far we have come, and this year we're back with a variation of last year's question.
We gave models from three frontier AI labs (Google's Gemini, OpenAI's GPT and Anthropic's Sonnet), plus Manus (arguably the best browser-use agent available), the same task, and compared their outputs against a verified master list of 97 active unicorns compiled from multiple sources. The prompt was deliberately unglamorous:
"Get me a list of all unicorn companies in India as of date, excluding companies that have gone public, got acquired or became defunct. Return your results in one column with company name."
No documents provided. No RAG. No hand-holding on sources. The models were allowed to use the web, which made this a test of how well each one can search for data across multiple sources on the internet and stitch it together.
Unicorn Identification by Model
We asked four different models to identify unicorn companies in India, and compared their results against a master list compiled from multiple sources.
Model Performance Comparison
| Lab | Model | Unicorn Count |
|---|---|---|
| Google Gemini | gemini3.5-flash | 86 |
| OpenAI | GPT-5.5-Instant | 64 |
| Anthropic | Sonnet4.6 | 79 |
| Manus | Manus 1.6 Lite | 99 |
| Master List | - | 97 |
LLM Output Comparison Table
| Lab: Google Gemini | Lab: OpenAI | Lab: Anthropic | Lab: Manus | Master List |
|---|---|---|---|---|
| Model: gemini3.5-flash | Model: GPT-5.5-Instant | Model: Sonnet4.6 | Model: Manus 1.6 Lite | Model: n.a. |
| Unicorn Count: 86 | Unicorn Count: 64 | Unicorn Count: 79 | Unicorn Count: 99 | Unicorn Count: 97 |
| Company Name | Company Name | Company Name | Company Name | Company Name |
| Acko | Acko | 5ire | Acko | 5ire |
| Apna | Ather Energy | Acko | Amagi | Acko |
| BharatPe | BlackBuck | BharatPe | Apna | Apna |
| BillDesk | BlueStone | Biocon Biologics | BharatPe | BharatPe |
| BoAt | Boat | BlackBuck | BillDesk | BillDesk |
| BrowserStack | BrowserStack | BrowserStack | BlackBuck | BoAt |
| CarDekho | Chargebee | BYJU'S | BoAt | BrowserStack |
| Cars24 | CRED | CARS24 | BrowserStack | Cardekho |
| Chargebee | Dailyhunt | CoinDCX | CRED | CARS24 |
| CoinDCX | Darwinbox | CRED | Cardekho | ChargeBee |
| CoinSwitch | DealShare | Darwinbox | Cars24 | CitiusTech |
| CRED | Dream11 | DealShare | ChargeBee | CoinDCX |
| Cult.fit | ElasticRun | Raise Financial Services | CitiusTech | CoinSwitch |
| Dailyhunt | Eruditus | Dream11 | CoinDCX | CRED |
| Darwinbox | FirstCry | Drools | CoinSwitch | Cult.fit |
| DealShare | Five Star Business Finance | Eruditus | Cult.fit | Dailyhunt |
| Dream11 | Fractal Analytics | EV Co (Mahindra) | Dailyhunt | Darwinbox |
| Drools | FreshToHome | FarEye | DarwinBox | DealShare |
| Droom | Games24x7 | Fireflies AI | DealShare | Dream11 |
| ElasticRun | GlobalBees | Games24x7 | Dream11 | Drools |
| Eruditus | Groww | Glance | Drools | Droom |
| Fractal | Infra.Market | GlobalBees | Droom | Druva |
| Games24x7 | InMobi | Groww | Druva | ElasticRun |
| Glance | Innovaccer | InCred Finance | ElasticRun | Eruditus |
| GlobalBees | Juspay | Infra.Market | Eruditus | FarEye |
| Good Glamm Group (MyGlamm) | KreditBee | Juspay | Fireflies AI | Fireflies AI |
| Gupshup | LEAD School | Khatabook | Fractal | Games24x7 |
| Hasura | Licious | KreditBee | Games24x7 | Glance |
| Icertis | Livspace | LeadSquared | Glance | GlobalBees |
| InCred Finance | Meesho | Lenskart | GlobalBees | Gupshup |
| Infra.market | Mensa Brands (BRND.ME) | Livspace | Groww | Hasura |
| InMobi | MoEngage | Mahindra Electric Automobile | Gupshup | Icertis |
| Innovaccer | Molbio Diagnostics | Meesho | Hasura | InCred Finance |
| JSW One Platforms | Money View | Mensa Brands (BRND.ME) | Icertis | Infra.Market |
| Jumbotail | Navi | Mobile Premier League | InCred Finance | InMobi |
| Juspay | Neysa | Moglix | InMobi | Innovaccer |
| KreditBee | NoBroker | Money View | Infra.Market | Jumbotail |
| Ola Krutrim | NSE (National Stock Exchange) | Netradyne | Innovaccer | Juspay |
| LEAD School | OfBusiness | Neysa | JSW One Platforms | Khatabook |
| LeadSquared | Ola Cabs | OfBusiness | Jumbotail | KreditBee |
| Licious | Ola Electric | Ola Cabs | Juspay | LEAD School |
| Livspace | Open Financial Technologies | Ola Krutrim | KreditBee | LeadSquared |
| Mensa Brands (BRND.ME) | OYO Rooms | OneCard | Krutrim SI Designs | Licious |
| MindTickle | PhysicsWallah | Open Financial Technologies | LEAD School | Livspace |
| Mobile Premier League | Pine Labs | Oxyzo | LeadSquared | Mensa Brands (BRND.ME) |
| Moglix | Pocket FM | OYO Rooms | Lenskart | MindTickle |
| Molbio Diagnostics | Postman | Perfios | Licious | Mobile Premier League |
| Money View | Purplle | PhonePe | LivSpace | MoEngage |
| Mu Sigma | Razorpay | Physics Wallah | Meesho | Moglix |
| Neysa | ShareChat | Pine Labs | Mensa Brands (BRND.ME) | Molbio Diagnostics |
| NoBroker | Skyroot Aerospace | Pocket FM | MindTickle | Money View |
| OfBusiness | Spinny | Polygon | Mobile Premier League | Mu Sigma |
| Ola Cabs | Stashfin | Porter | Moglix | Navi |
| OneCard | Swiggy Instamart | Pristyn Care | Molbio Diagnostics | Netradyne |
| Open Financial Technologies | Unacademy | Purplle | Money View | Neysa |
| Oxyzo | Upstox | Raise Financial Services | Mu Sigma | NoBroker |
| OYO Rooms | Urban Company | Rapido | MyGlamm | OfBusiness |
| Perfios | Vedantu | Razorpay | Netradyne | Ola Cabs |
| PhonePe | Whatfix | Rebel Foods | Neysa | Ola Electric |
| PhysicsWallah | XpressBees | Reliance Jio | NoBroker | Ola Krutrim |
| Pocket FM | Yubi (CredAvenue) | Reliance Retail | OfBusiness | OneCard |
| Porter | Zepto | Rivigo | Ola | Open Financial Technologies |
| Postman | Zetwerk | ShareChat | Ola Krutrim | Oxyzo |
| Pristyn Care | Zoho | Shiprocket | OneCard | OYO Rooms |
| Purplle |  | Skyroot Aerospace | Open | Perfios |
| Raise Financial Services |  | Slice | OYO | PharmEasy |
| Rapido |  | Snapdeal | Oxyzo | PhonePe |
| Razorpay |  | Staq (Innovaccer) | Perfios | Pocket FM |
| Rebel Foods |  | Jumbotail | PharmEasy | Polygon |
| ShareChat |  | Udaan | Pine Labs | Porter |
| Shiprocket |  | Unacademy | Polygon | Postman |
| Skyroot Aerospace |  | UpGrad | Porter | Pristyn Care |
| slice |  | Upstox | Postman | Purplle |
| Spinny |  | Vedantu | Pristyn Care | Raise Financial Services |
| Turtlemint |  | Dailyhunt | Purplle | Rapido |
| Udaan |  | Yubi (CredAvenue) | Raise Financial | Razorpay |
| Unacademy |  | Zepto | Rapido | Rebel Foods |
| Uniphore |  | Zetwerk | RazorPay | Sarvam AI |
| upGrad |  |  | Rebel Foods | ShareChat |
| Upstox |  |  | Rivigo | Shiprocket |
| Vedantu |  |  | ShareChat | Skyroot Aerospace |
| Xpressbees |  |  | Shiprocket | Slice |
| Yubi (CredAvenue) |  |  | Skyroot Aerospace | Spinny |
| Zenoti |  |  | Slice | Staq (Innovaccer) |
| Zeta |  |  | Spinny | Stashfin |
| Zetwerk |  |  | Udaan | Udaan |
|  |  |  | Unacademy | Uniphore |
|  |  |  | Uniphore | UpGrad |
|  |  |  | upGrad | Upstox |
|  |  |  | Upstox | Vedantu |
|  |  |  | Urban Company | Whatfix |
|  |  |  | Vedantu | Xpressbees |
|  |  |  | Xpressbees | Yubi (CredAvenue) |
|  |  |  | Yubi (CredAvenue) | Zenoti |
|  |  |  | Zenoti | Zepto |
|  |  |  | Zepto | Zeta |
|  |  |  | Zerodha | Zetwerk |
|  |  |  | Zeta |  |
|  |  |  | Zetwerk |  |
|  |  |  | Zoho |  |
Note: Variations in the unicorn names returned by the models have been normalized against the master list to allow cross-list comparison. For example, "Krutrim" and "Krutrim SI" were both renamed to "Ola Krutrim". You can see the raw and normalized data spreadsheet here.
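The normalization step can be sketched in a few lines. This is a minimal illustration, not the process actually used for the spreadsheet: the comparison key (drop parentheticals and punctuation, case-fold) and the hypothetical `ALIASES` table are assumptions, and only the Krutrim alias comes from this article.

```python
import re

# Hypothetical alias table mapping normalized spelling variants to master-list names.
# Only the Krutrim case comes from this article; other entries would be added by hand.
ALIASES = {
    "krutrim": "Ola Krutrim",
    "krutrim si": "Ola Krutrim",
    "krutrim si designs": "Ola Krutrim",
}


def key(name: str) -> str:
    """Comparison key: drop parentheticals and punctuation, case-fold, collapse spaces."""
    name = re.sub(r"\(.*?\)", "", name)              # 'Mensa Brands (BRND.ME)' -> 'Mensa Brands '
    name = re.sub(r"[^\w\s]", "", name.casefold())   # 'Cult.fit' -> 'cultfit'
    return " ".join(name.split())


def normalize(raw_names: list[str], master: list[str]) -> tuple[list[str], list[str]]:
    """Map raw model output onto master-list names.

    Returns (matched master names without duplicates, raw names with no match).
    Unmatched names are candidate false positives to review by hand.
    """
    by_key = {key(m): m for m in master}
    matched: list[str] = []
    unmatched: list[str] = []
    for raw in raw_names:
        k = key(raw)
        canonical = ALIASES.get(k) or by_key.get(k)
        if canonical is None:
            unmatched.append(raw)
        elif canonical not in matched:   # two variants of one company count once
            matched.append(canonical)
    return matched, unmatched


if __name__ == "__main__":
    matched, unmatched = normalize(
        ["Krutrim SI Designs", "RazorPay", "Razorpay", "Mensa Brands", "Groww"],
        ["Ola Krutrim", "Razorpay", "Mensa Brands (BRND.ME)"],
    )
    print(matched)    # ['Ola Krutrim', 'Razorpay', 'Mensa Brands (BRND.ME)']
    print(unmatched)  # ['Groww']
```

Keeping unmatched names separate, rather than silently dropping them, matters here: those are exactly the entries that end up classified as false positives in the analysis.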
Analysis of Results
The unicorn count alone is a misleading headline. A model can return a high number by padding its list with false positives, or a low number by being conservative. What matters is how many of the 97 verified unicorns each model actually identified - and how many entries it included that had no business being there. The table below adds that lens.
| Lab | Model | Unicorn Count | Verified Hits | Accuracy vs Master List | Missed | False Positives |
|---|---|---|---|---|---|---|
| Google Gemini | gemini3.5-flash | 86 | 80 | 82.5% | 17 | 6 |
| OpenAI | GPT-5.5-Instant | 64 | 48 | 49.5% | 49 | 16 |
| Anthropic | Sonnet4.6 | 79 | 63 | 64.9% | 35 | 16 |
| Manus | Manus 1.6 Lite | 99 | 81 | 83.5% | 16 | 19 |
| Master List | - | 97 | 97 | 100% | - | - |
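Once names are normalized, every column in this table reduces to set arithmetic. The sketch below assumes both lists have already been normalized. The `Score` type and the split into recall (the "Accuracy vs Master List" column) and precision are illustrative choices, not the authors' tooling.

```python
from dataclasses import dataclass


@dataclass
class Score:
    returned: int         # unique (normalized) names the model returned
    hits: int             # returned names that are on the master list
    missed: int           # master-list names the model left out
    false_positives: int  # returned names that are not on the master list
    recall: float         # hits / len(master): the "Accuracy vs Master List" column
    precision: float      # hits / returned


def score(model_names: set[str], master: set[str]) -> Score:
    """Score one model's normalized list against the verified master list."""
    hits = model_names & master
    return Score(
        returned=len(model_names),
        hits=len(hits),
        missed=len(master - model_names),
        false_positives=len(model_names - master),
        recall=len(hits) / len(master),
        precision=len(hits) / len(model_names) if model_names else 0.0,
    )


if __name__ == "__main__":
    toy_master = {"Acko", "Apna", "BharatPe", "CRED"}
    toy_model = {"Acko", "CRED", "Groww"}
    print(score(toy_model, toy_master))
    # Score(returned=3, hits=2, missed=2, false_positives=1, recall=0.5, precision=0.6666666666666666)
```

Because the inputs are sets, a company returned under two spellings counts only once, so `returned` can come out lower than a raw row count. Applying the same precision formula to the table's figures gives about 93.0% for Gemini (80 hits from 86 names) and about 81.8% for Manus (81 hits from 99 names).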
Missing Companies
When measured against the Master List (97 unicorns), identification gaps are more severe than the headline unicorn counts suggest. The "Accuracy" figure represents how many of the 97 verified unicorns each model actually identified - but the misses tell a sharper story:
- OpenAI (GPT-5.5-Instant) missed 49 companies, roughly half the master list. The omissions aren't just obscure names; they include well-established players like BharatPe, CoinDCX, PhonePe, Rebel Foods, Shiprocket, and Udaan. At 49.5% accuracy, this is by far the weakest recall of the four models tested. The gap between its stated count (64) and its verified hits (48) is stark: a full quarter of what it returned (16 of 64) had no business being on the list.
- Anthropic (Sonnet 4.6) missed 35 companies despite appearing mid-table by count. It dropped well-known names like BoAt, InMobi, Postman, and Zeta while padding its list with false positives. At 64.9% accuracy, the 16-name gap between its unicorn count (79) and its verified hits (63) shows it substituting ineligible entries for real ones.
- Google Gemini (gemini3.5-flash) missed 17 companies, achieving 82.5% accuracy - the second-best recall of the four. Its blind spots skewed toward newer or less-covered names: 5ire, CitiusTech, Druva, Khatabook, MoEngage, Navi, and Sarvam AI were all absent. The omissions feel like genuine knowledge gaps rather than structural failures.
- Manus (Manus 1.6 Lite) achieved the best recall at 83.5%, missing only 16 companies - but its misses still included some notable ones: FarEye, MoEngage, Navi, PhonePe, Pocket FM, Sarvam AI, and Stashfin all slipped through despite Manus's active web-searching approach. No model got a clean sheet.
The "Extra" Problem (False Positives)
Comprehensiveness isn't just about recall - precision matters equally. Several models padded their lists with companies that a careful reading of the prompt should have excluded. Across all four models, the false positives fall into four distinct categories, each revealing a different kind of model failure.
Type 1 - Already Public - The prompt explicitly excluded companies that have gone public, yet every model included at least some IPO graduates. Groww, Meesho, Ather Energy, BlueStone, FirstCry, PhysicsWallah, Urban Company and Fractal, among others, have all listed on public markets. This is arguably the most straightforward error, since these exits are heavily covered events, and yet it was the most common category of mistake across all four labs.
Type 2 - Defunct, Acquired or Written Down - BYJU'S, Rivigo, and MyGlamm are companies whose unicorn status has effectively ceased to exist, through insolvency proceedings, operational shutdown, or deep valuation write-downs that have been extensively reported. Unacademy, meanwhile, has just been acquired. Yet their continued appearance on model-generated lists suggests that models learn the "unicorn moment" of a company but struggle to unlearn it. Anthropic was the most exposed here, listing all three; Manus included two.
Type 3 - Corporate Subsidiaries, Not Startup Unicorns - JSW One Platforms, Reliance Jio, Reliance Retail, Biocon Biologics, Tata Passenger Electric Mobility, etc. - these are arms of large, established conglomerates - not exactly venture-backed startups that independently crossed the $1B valuation mark. The unicorn designation is specifically a startup construct, and including subsidiaries of the Reliance, Tata, JSW, or Mahindra groups reflects a category confusion that a well-calibrated model should avoid. Anthropic was most prone to this error, accounting for the majority of conglomerate entries across the test.
Type 4 - Bootstrapped and Proud of It - Zoho and Zerodha are special cases: both are valued well above $1B, but neither has ever raised external venture capital. Whether they "qualify" as unicorns is a definitional debate, but their inclusion in the lists of Manus and OpenAI suggests models are pattern-matching on valuation alone, ignoring the funding-lineage dimension of the term.
Breaking down the extras by lab:
- Manus had the most extras (19), with the largest share being public companies (Amagi, Groww, Lenskart, Meesho, Urban Company) alongside two defunct entries (MyGlamm, Rivigo), two bootstrapped (Zerodha, Zoho), and one conglomerate arm (JSW One Platforms). More searches, it turns out, also surface more noise.
- OpenAI added 16 extras, the broadest categorical spread of any model, including 11 public companies (Ather Energy, BlueStone, FirstCry, Fractal Analytics, Groww, Meesho, PhysicsWallah, Urban Company, Five Star Business Finance, Pine Labs, Swiggy Instamart), one bootstrapped company (Zoho), one established non-startup institution (NSE), and one possible soonicorn (FreshToHome).
- Anthropic Sonnet contributed 16 extras, concentrated in two categories: public companies (Groww, Lenskart, Meesho, PhysicsWallah) and conglomerate subsidiaries (Reliance Jio, Reliance Retail, Biocon Biologics, EV Co (Mahindra), Mahindra Electric Automobile, Tata Passenger Electric Mobility), the latter being the most distinctive failure pattern of any model in this test. Also notable: two defunct names, Snapdeal and BYJU'S, appear on its list.
- Gemini had the fewest extras at 6, and its errors were the most forgivable: 2 public companies (Fractal, PhysicsWallah), one defunct (MyGlamm), one recently acquired (Unacademy), and one conglomerate arm (JSW One Platforms). Turtlemint, a well-funded insurtech that has been on the cusp of unicorn status for some time, is the one genuine soonicorn in the mix. Gemini's low false-positive rate, combined with its strong recall, makes it the most precise model in this test, and it essentially ties with Manus on overall accuracy (82.5% vs 83.5%) despite relying on standard web search rather than an agentic browsing workflow.
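The per-lab breakdown above amounts to tagging each false positive with one of the four failure types and counting. Here is a minimal sketch of that tally. The `STATUS` labels are hypothetical, drawn only from examples named in this article, and any name without a label falls into "unclassified". A real run would need a maintained, dated status table.

```python
from collections import Counter

# Illustrative status labels for a few companies named in this article, using its
# four failure types. These are assumptions for the sketch, not a maintained dataset.
STATUS = {
    "Groww": "type1_public",
    "Meesho": "type1_public",
    "PhysicsWallah": "type1_public",
    "BYJU'S": "type2_defunct_or_acquired",
    "Rivigo": "type2_defunct_or_acquired",
    "Unacademy": "type2_defunct_or_acquired",
    "Reliance Jio": "type3_subsidiary",
    "JSW One Platforms": "type3_subsidiary",
    "Zoho": "type4_bootstrapped",
    "Zerodha": "type4_bootstrapped",
}


def breakdown(extras_by_lab: dict[str, list[str]]) -> dict[str, Counter]:
    """Count each lab's false positives per failure type; unknown names go to 'unclassified'."""
    return {
        lab: Counter(STATUS.get(name, "unclassified") for name in extras)
        for lab, extras in extras_by_lab.items()
    }


if __name__ == "__main__":
    # Partial, illustrative inputs only, not the full lists from the test.
    demo = {
        "Anthropic": ["Groww", "Meesho", "Reliance Jio", "BYJU'S"],
        "OpenAI": ["Groww", "Zoho", "FreshToHome"],
    }
    for lab, counts in breakdown(demo).items():
        print(lab, dict(counts))
```

Keeping an explicit "unclassified" bucket makes borderline cases such as soonicorns (FreshToHome, Turtlemint) visible instead of forcing them into one of the four types.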
Patterns in LLM Failure
- Valuation Staleness: The presence of BYJU'S, Rivigo, and MyGlamm across multiple models is the clearest evidence of temporal lag. These are not obscure edge cases - their collapses were front-page news. Models appear to index the peak of a company's story far more reliably than its decline.
- Prompt Filtering Failures: Every model failed on at least one explicit exclusion criterion in the prompt - "gone public," "acquired," or "defunct." The models understood the task conceptually but couldn't apply the filters consistently against their own internal knowledge.
- The "Specialised SaaS" Blindspot: All four models showed higher consistency on consumer-facing names (Dream11, OYO, Razorpay) than on B2B or deep-tech unicorns. CitiusTech, FarEye, Icertis, Netradyne, and Sarvam AI were among the most frequently missed - suggesting these companies simply don't appear often enough in the training corpus to register reliably when the model is constructing a list from memory.
- The Agentic Tradeoff: Manus's web-search-augmented approach achieved the best raw recall but also generated the most false positives and naming artefacts. The more striking finding is that Gemini - using live search - matched Manus almost exactly on accuracy (82.5% vs 83.5%), while producing far fewer false positives.