Case Study
How A Top-5 VC Firm Built Their Deal Sourcing Engine On Crustdata
Company
Top 5 VC Firm by AUM
use case
Proprietary Founder Sourcing
company overview
This firm is one of the most prominent venture capital firms in the world. With an internal data science and product team dedicated to building proprietary sourcing infrastructure, they've moved beyond off-the-shelf VC platforms to gain a competitive edge in identifying founders before anyone else does.
scale
~200,000 people updated weekly
The firm was seeing the same founders as everyone else, at the same time. Their out-of-the-box sourcing tools couldn't surface founders who fit their thesis before competitors did, resulting in lower allocation in rounds, missing high ROI investments and losing the ability to build relationships before competitive processes began.
Their data and tech teams set out to build a proprietary founder monitoring system, but existing data infrastructure couldn't support the vision.

Impossible to Codify a Unique Investment Thesis
Standard VC platforms decide which founders surface based on generic criteria and every firm subscribed to the same platform runs the same playbook.
The firm’s thesis involves signals and combinations that no off-the-shelf platform exposes as filters
They wanted to weight founder signals differently by sector - a founder with previous entrepreneurial experience matters more in B2B SaaS while a deep technical background or PhD matters more in deep-tech
The sourcing edge they wanted couldn't exist inside a shared platform. If competitors can subscribe to the same alerts, they lose their edge
They wanted to be alerted when companies started hiring ex-founders in leadership positions, a pattern they have seen over the years, precedes rapid growth
Existing platforms optimized for later-stage companies, not early signals
Data Quality & Deduplication Challenges
Working with multiple data providers meant constant resolution of duplicate records and inconsistent identifiers.
No reliable unique identifier across providers
Engineering resources drained by data cleaning instead of product building
Freshness & Latency Requirements
The firm’s internal tools needed to provide results quickly. But most data providers relied on human-in-the-loop processes that introduced unpredictable latency.
Other "real-time" providers actually used manual data collection behind the scenes
Latency issues eroded trust in the data and slowed workflows
No infrastructure for true just-in-time enrichment at scale
Crustdata provided the data infrastructure layer for their proprietary founder monitoring system - a live knowledge graph that updates in real-time without human intervention.
Raw Data Infrastructure to Build On
Off-the-shelf VC platforms mean shared edge - competitors using the same platform can replicate the same filters, signals, and alert logic
Proprietary data sources like internal documents and partner notes can't be uploaded to a third-party platform but can be integrated into an in-house system built on raw data APIs
Internal tooling compounds over time - every new signal, scoring model, and feedback loop from partners makes the system harder to replicate. A SaaS platform's improvements benefit all subscribers equally; internal tooling only benefits the firm

Real-Time Automated Enrichment
Unlike other providers they had evaluated, Crustdata's architecture is fully automated.
API request triggers live crawlers to fetch information from the web instantly
Dedicated crawler resources available for high-priority requests
Reliable Unique Identifiers That Solve Deduplication
Crustdata's knowledge graph anchors every entity to a stable unique identifier and maps all other data sources to this anchor.
Datapoints from multiple external sources mapped to core identifiers
Identifiers stay consistent, and no engineering time is wasted on deduplication pipelines
Infrastructure Built for Scale
The firm processes hundreds of millions of records and updates approximately 200,000 people weekly. Crustdata's infrastructure supports this volume without breaking.
Real-Time API: Person enrichment, company jobs, posts, and reactors on demand
Profile Watcher: Automated monitoring of founder job changes, social posts, and early signals
Low Latency: Sub-2-second response times for enrichment
Thesis-Driven Sourcing Without Platform Constraints
Instead of relying on the same filters and alerts every other firm uses, they can codify their unique investment thesis into custom monitoring systems built on raw, unbiased data.
No external platform deciding which founders surface
Full control over signal weighting and criteria
Proprietary sourcing edge that compounds over time
First-Mover Advantage on Founder Discovery
Identify founders at the earliest signals: job changes, stealth company formation, first key hires
Reach out while the founder is still "under the radar"
Build relationships before competitive processes start

Engineering Focus Shifted from Data Cleaning to Product Building
With deduplication and identifier consistency solved at the data layer, the firm’s team spends time building sourcing tools instead of fixing broken pipelines.
Eliminated extensive post-processing workflows
Stable identifiers across data refreshes
Faster iteration on internal sourcing products

