Company Search API Implementation Guide for Developers
Learn the architecture split between cached APIs (milliseconds, stale) and real-time crawling (seconds, fresh). Implementation guide with code, filters, cost control.
Published: Feb 2, 2026
Written by: Abilash C.
Reviewed by: Manmohit G.
Read time: 7 minutes


A company search API is a programmatic interface for finding businesses by specific criteria. Common filters include industry, geography, employee count, funding stage, and tech stack. You send a request and receive structured JSON. This data can feed your product, data pipeline, or internal tool.
Developers use company search APIs for four main purposes: building targeted prospect lists for sales workflows, enriching CRM records with missing firmographics, running repeatable market scans, and giving AI agents accurate company context at decision time.
The tricky part is that “company data” isn’t one thing: different providers trade off speed, freshness, coverage, and cost in ways that affect how you architect your integration.
Join us as we go through the basics – how search differs from enrichment, how filters and pagination work, and how to handle limits and errors in production – using Crustdata’s APIs for concrete examples.
Understanding company data APIs
“Company data API” is an umbrella term. Some APIs are built to help you discover companies that match a set of filters. Others are built to enrich a company you already know about.
Before you implement anything, it’s worth separating those patterns and understanding where the underlying data comes from, because that determines what you can trust, what you have to verify, and how you should design your workflow.
What is a company search API?
A company search API is a query interface that returns companies matching the criteria you provide. Most implementations use REST architecture. These APIs answer discovery questions like "Which companies match these characteristics?" They differ from lookup APIs that retrieve details about a known company.
In a real system, search is typically the first stage. You use it to produce a set of candidate companies, then pass those results into the next step in your pipeline.
For example:
Prospecting workflows use search to build target account lists, then sync them into a CRM or outreach tool.
Market and competitive research uses search to create cohorts for analysis, like “all fintechs headquartered in New York with more than 50 employees.”
AI agents and automation use search to pull the current context right before taking an action, like drafting a message or triggering an alert.
Where do these APIs get company information? Most providers aggregate from multiple sources and normalize everything into a consistent schema, which is why the same company can have different categories or event history across vendors.
Data sources are wide-ranging:
Government registries provide legal entity names, incorporation dates, registered addresses, and status where available.
Professional data and job postings inform employee counts, job titles, and hiring signals derived from public profiles and recruiting pages.
Funding and corporate activity databases track rounds, investors, acquisitions, and related timelines, often with strong coverage of startups.
Web crawling and news sources capture website changes, press releases, careers pages, and recent mentions that signal momentum.
Review and marketplace platforms can add product categories and some stack clues via sites like G2 or Glassdoor.
Different providers focus on specific areas and use different sources. OpenCorporates is oriented around legal entity and registry data, which fits verification and compliance use cases. Crunchbase is oriented around startups, investors, funding rounds, and M&A. Crustdata’s positioning is more about fresh signals and monitoring across venture capital, sales, and recruitment, which makes a difference when you care about what changed recently, not just what’s in a periodic snapshot.
Some company APIs include technology fields, but meaningful technographics often come from dedicated services. This data is typically inferred rather than declared by the company. Common methods include scanning websites for scripts and tags, checking HTTP headers, and extracting mentions of tools from job postings.
If your workflow depends heavily on “uses Salesforce” or “runs on AWS,” you may end up combining a company search API with a technographics provider.
A useful comparison point across vendors is update behavior. Some refresh on a schedule, which works for periodic list building and batch enrichment. Others support on-demand refresh or continuous monitoring, which is a better fit for event-triggered workflows.
Search vs enrichment: two different patterns
Search and enrichment solve different problems, and they’re often priced and rate-limited differently.
A search API takes criteria and returns a list of matching companies. Use it when discovering accounts by filters. This pattern works when you don't know the company set in advance.
An enrichment API takes an identifier like a domain or company ID and returns a detailed profile for that one company. Use it when you already have the account, and you’re filling in missing fields or verifying details.
In Crustdata, the company search endpoints handle discovery, while the company enrichment endpoint (/screener/company) handles lookups by domain or ID.
Cost usually follows the pattern too: search is commonly priced based on how many results you pull, and enrichment is commonly priced per company enriched, with real-time refresh typically costing more than cached enrichment.
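To make the split concrete, here is a minimal sketch of how the two request shapes differ. It only builds request descriptions and sends nothing. The host and paths come from this guide; the GET method and the company_domain query parameter on the enrichment call are assumptions for illustration, so check the provider's reference before relying on them.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.crustdata.com"  # host used by the search example in this guide


def build_search_request(filters, page=1):
    """Discovery: a list of filters goes in, many companies come back."""
    return {
        "method": "POST",
        "url": f"{BASE_URL}/screener/company/search",
        "json": {"filters": list(filters), "page": page},
    }


def build_enrichment_request(domain):
    """Lookup: one known identifier goes in, one detailed profile comes back."""
    # ASSUMPTION: the HTTP method and the `company_domain` parameter name
    # are illustrative, not confirmed by this guide.
    query = urlencode({"company_domain": domain})
    return {"method": "GET", "url": f"{BASE_URL}/screener/company?{query}"}
```

Keeping the two builders separate mirrors how the endpoints are priced and rate-limited separately, so each can get its own budget and retry policy.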
Working with filters and search criteria
Once you understand what a company search API returns, the next step is learning how to ask the right questions.
Most of the power lives in the filter layer. Filters decide what the API can express, how much work you push onto the provider versus your own code, and how stable your results are when the vendor’s taxonomy changes.
Anatomy of a search request
A search request is a structured API call with three components: authentication, filter payload, and pagination controls. Crustdata's company search uses a POST request with a JSON body. Query parameters are not used.
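Authentication is handled by passing your API token in the request header using Authorization: Token <your_token>, the same scheme the Python example in this guide uses. Header-based authentication is typical for company data APIs, and if the token is read from configuration you can rotate keys without changing application code.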
The body of the request contains a list of filters. Each filter is an object with three required parts:
filter_type identifies the field you want to filter on, such as industry, region, or company headcount.
type defines the operator, like inclusion, exclusion, or a range comparison.
value supplies the criteria the operator evaluates against.
When you combine filters in a single request, they are evaluated together.
For example, finding technology companies in the United States with 51 to 200 employees means sending three filters in the same request: one for industry, one for region, and one for headcount. A company must satisfy all of them to appear in the results.
A useful mental model is that search generates a candidate set. Anything you can’t express cleanly in filters will either require additional queries or a post-processing step on your side.
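The post-processing step can be as small as a list of local predicates applied to the candidate set. The sketch below assumes each company is a dict; the field names website and founded_year are hypothetical and stand in for whatever your provider returns.

```python
def post_filter(companies, predicates):
    """Keep only the companies that satisfy every local predicate."""
    return [c for c in companies if all(check(c) for check in predicates)]


def has_website(company):
    # HYPOTHETICAL field name: `website`.
    return bool(company.get("website"))


def founded_since(year):
    """Build a predicate; companies with no founding year are dropped."""
    def check(company):
        founded = company.get("founded_year")  # HYPOTHETICAL field name
        return founded is not None and founded >= year
    return check


# Made-up candidate set standing in for a search response.
candidates = [
    {"name": "Acme Analytics", "website": "acme.example", "founded_year": 2019},
    {"name": "Old Guard Systems", "website": "oldguard.example", "founded_year": 2004},
    {"name": "Stealth Co", "website": "", "founded_year": 2021},
]
shortlist = post_filter(candidates, [has_website, founded_since(2015)])
print([c["name"] for c in shortlist])  # ['Acme Analytics']
```

Anything handled this way still costs you the results you pulled, so it is worth pushing as much as possible into the filters first.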
Using an autocomplete API to find valid filter values
Filters fail in boring ways. The most common one is “your value doesn’t match what the provider expects.” Industry labels and other categorical fields often need to match a canonical vocabulary, including capitalization.
Many providers offer an autocomplete-like API or published value lists to help you discover the exact values you should use. This is especially useful when you’re building a UI where users type free text, and you need to map that input onto the provider’s taxonomy.
Autocomplete is available for multiple categories, including:
Region, to return valid geography values you can paste directly into filters.
Industry, to return canonical industry labels rather than whatever a user typed.
Title, to help when you’re building people-adjacent workflows that depend on role data.
School, to support education-related fields in products that blend company and talent signals.
A practical workflow involves resolving user input against the provider’s suggested values during setup or at query time, storing the returned value, and then using that exact string in your filter.
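One way to sketch that resolution step: take the suggestions the provider returned and map the user's text onto exactly one of them. The function below is an illustration, not the provider's matching logic; it assumes you have already fetched the suggestions as a list of strings, and the sample values are made up.

```python
def resolve_value(user_input, suggestions):
    """Map free text onto one canonical value, or return None if unclear."""
    needle = user_input.strip().casefold()
    if not needle:
        return None
    # 1. Prefer an exact match that ignores case.
    for candidate in suggestions:
        if candidate.casefold() == needle:
            return candidate
    # 2. Otherwise accept a prefix match only when it is unambiguous.
    prefixed = [s for s in suggestions if s.casefold().startswith(needle)]
    if len(prefixed) == 1:
        return prefixed[0]
    return None  # zero or several candidates: ask the user to choose


# Made-up suggestion list standing in for an autocomplete response.
industries = ["Software", "Software Development", "Financial Services"]
print(resolve_value("software", industries))  # Software
print(resolve_value("fin", industries))       # Financial Services
print(resolve_value("soft", industries))      # None
```

Returning None on ambiguity is deliberate: storing a guessed value would move the failure from setup time to query time.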
Filter types and operators
Different fields expect different operator types, and using the wrong operator for a field type causes errors.
Text-like filters typically support inclusion and exclusion. You pass a list of accepted values for in, or a list of blocked values for not in. This is common for categories like industry and region.
Numeric filters use ranges. Crustdata supports between with minimum and maximum values, and some fields also support a unit or currency qualifier through an extra sub-field. Revenue is the typical example.
Some filters behave like flags. In Crustdata, IN_THE_NEWS is one of these: you’re not comparing against a value, you’re turning a condition on.
By default, multiple filters combine with AND logic, meaning every condition must be true. If you need OR logic, the usual approach is to run separate searches and union the results on your side.
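The run-separately-and-union approach might look like the sketch below. The run_search callable is a stand-in you would supply (for example, a function that posts one filter list and returns the companies array). Deduplicating on name is an assumption made only because name is the one field this guide shows; a stable company ID or domain is a safer key if your responses include one.

```python
def search_any(run_search, filter_sets, key="name"):
    """Emulate OR by running one AND-search per filter set and merging."""
    merged = {}
    for filters in filter_sets:
        for company in run_search(filters):
            identity = company.get(key)
            if identity is None:
                continue  # cannot deduplicate a record without the key
            # Keep the first copy seen; later duplicates are ignored.
            merged.setdefault(identity, company)
    return list(merged.values())
```

For "software companies in the United States OR Canada", filter_sets would hold two filter lists that differ only in the region filter. Each inner search consumes its own requests and credits, so the number of OR branches is worth keeping small.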
Common search scenarios
Most real-world queries are variations on a few repeatable patterns:
ICP list building combines industry, region, and headcount to generate a starting account set you can enrich or score.
Budget-aware narrowing adds revenue ranges or employee buckets to reduce the result size before exporting to downstream tools.
Activity-driven discovery uses hiring-related filters like job opportunities to find companies that are actively staffing roles.
Exclusions for focus apply not in to filter out regions or industries you explicitly don’t want in your output.
The key is to treat search as iterative. Run a first pass, inspect what comes back, then refine filters until the query produces a stable, useful result set for your workflow.
Implementation guide
With the concepts and filters in place, you can wire everything together in a way that’s predictable and cost-aware.
Let’s go over a complete implementation flow.
Choosing the right endpoint: realtime vs in-database search
Crustdata exposes two company search endpoints that solve different problems:
Realtime search is optimized for fresh data. It checks the internet in real time. Pricing is typically based on the number of companies returned. Pagination uses fixed-size pages with a maximum page cap per query.
In-database search queries a pre-indexed database. It’s optimized for bulk discovery and rich, nested filtering. It’s generally more cost-efficient for large result sets and supports larger page sizes. Pagination is often cursor-based rather than page-based, which makes it easier to stream through large result sets and pinpoint specific companies.
Use realtime search when the question depends on what’s true right now, like whether a company is actively hiring or just announced funding.
Use an in-database search when you’re building large lists, running scheduled jobs, or doing exploratory analysis where being slightly out of date is acceptable.
There’s also a structural difference in how filters are expressed. Realtime search uses an array of filter objects with filter_type, type, and value. In-database search uses a nested filters object with an explicit logical operator and a list of conditions.
If you switch endpoints, you can’t reuse the same payload verbatim.
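The structural difference can be sketched as a small wrapper. This shows the shape only: the key names op and conditions are hypothetical placeholders for "an explicit logical operator and a list of conditions", and the field names and operator vocabulary inside each condition may also differ between endpoints, so treat the output as a starting point to adapt rather than a valid payload.

```python
def to_nested_filters(realtime_filters, op="and"):
    """Wrap a flat list of filter objects in a nested, operator-first shape."""
    # HYPOTHETICAL key names: "op" and "conditions".
    return {
        "filters": {
            "op": op,
            # Copy each filter so the caller's list is not mutated later.
            "conditions": [dict(f) for f in realtime_filters],
        }
    }


flat = [
    {"filter_type": "INDUSTRY", "type": "in", "value": ["Software"]},
    {"filter_type": "REGION", "type": "in", "value": ["United States"]},
]
nested = to_nested_filters(flat)
print(nested["filters"]["op"], len(nested["filters"]["conditions"]))  # and 2
```

Keeping your own neutral filter representation and translating it per endpoint, as above, makes it cheaper to move a query between the two search modes.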
Setting up your environment
Before writing any code, you’ll need to set up credentials and basic request handling.
Store your Crustdata API token in an environment variable, such as CRUSTDATA_API_TOKEN, and read it at runtime. Hardcoding tokens makes rotation painful and increases the risk of accidental leaks.
Crustdata’s API is JSON over HTTP and doesn’t require an SDK. You can use any language with an HTTP client. Python is a common choice for data pipelines and automation jobs, so the example below uses plain requests without any wrappers.
A minimal Python example
Here’s a Python script that uses Crustdata to programmatically discover and list specific business leads.
It authenticates via an API token to filter for U.S.-based software companies with a headcount between 51 and 200 employees. Once the request is processed, it iterates through the results to extract and print the names of every matching organization.
import os
import sys

import requests

# Read the token from the environment instead of hardcoding it.
API_TOKEN = os.environ.get("CRUSTDATA_API_TOKEN")
if not API_TOKEN:
    sys.exit("Set the CRUSTDATA_API_TOKEN environment variable first.")

URL = "https://api.crustdata.com/screener/company/search"

headers = {
    "Authorization": f"Token {API_TOKEN}",
    "Content-Type": "application/json",
}

# A company must match all three filters to be returned (AND logic).
payload = {
    "filters": [
        {"filter_type": "INDUSTRY", "type": "in", "value": ["Software"]},
        {"filter_type": "REGION", "type": "in", "value": ["United States"]},
        {"filter_type": "COMPANY_HEADCOUNT", "type": "in", "value": ["51-200"]},
    ],
    "page": 1,
}

try:
    response = requests.post(URL, json=payload, headers=headers, timeout=30)
    if response.status_code == 429:
        # Rate limited: back off before retrying rather than looping.
        print("Rate limit hit. Use exponential backoff.")
    else:
        # Raise on any other 4xx/5xx status so failures are not silent.
        response.raise_for_status()
        data = response.json()
        for company in data.get("companies", []):
            print(company.get("name", "N/A"))
except requests.exceptions.Timeout:
    print("Request timed out.")
except requests.exceptions.RequestException as e:
    print(f"Request failed: {e}")
This pattern applies across endpoints. You can build the filter payload and send a POST request, then iterate over the returned companies array.
Handling pagination and parsing results
Search results almost never fit in a single response, so pagination handling needs to be part of your implementation.
Realtime search returns a fixed number of companies per page. You increment the page parameter starting at 1 and stop when the companies array comes back empty, or you hit the maximum page limit. Because pages are capped, you should treat realtime search as a targeted query tool rather than a bulk export mechanism.
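A minimal sketch of that loop, written as a generator so callers can stop early. The fetch_page callable is a stand-in you would supply: it takes a page number and returns that page's list of companies. The max_pages value is whatever cap your plan documents; none is assumed here.

```python
def iter_pages(fetch_page, max_pages):
    """Yield companies page by page until a page is empty or the cap is hit."""
    for page in range(1, max_pages + 1):  # page numbers start at 1
        companies = fetch_page(page)
        if not companies:
            break  # an empty page means there are no more results
        yield from companies
```

Because it is a generator, a caller that only needs the first 50 matches can stop iterating and never pay for later pages.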
In-database search uses cursor-based pagination. You omit the cursor on the first request, then pass the cursor value from the response into the next request. This continues until no cursor is returned. You can also control batch size with a limit parameter, up to the documented maximum.
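The cursor loop can be sketched the same way. Here fetch_batch is a stand-in you would supply: it takes the current cursor (None on the first call) and a limit, and returns the parsed response. The response keys companies and cursor are assumptions based on the description above, and the default limit of 100 is arbitrary.

```python
def iter_cursor(fetch_batch, limit=100):
    """Yield companies batch by batch until the response has no cursor."""
    cursor = None  # omitted on the first request
    while True:
        response = fetch_batch(cursor, limit)
        yield from response.get("companies", [])
        next_cursor = response.get("cursor")  # ASSUMED response key
        # Stop when there is no cursor, or it failed to advance.
        if not next_cursor or next_cursor == cursor:
            break
        cursor = next_cursor
```

The check for a cursor that does not advance is a defensive guard against looping forever on an unexpected response, not documented behavior.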
Both endpoints return a list of companies plus a count field that tells you how many matches exist. Use that count early. If a query produces far more results than you expect, it’s usually cheaper to tighten filters than to page through everything.
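A quick pre-flight check on the count can make that decision explicit before any paging starts. The sketch below is plain arithmetic; the page size, page cap, and budget in the example are made-up numbers, not provider limits.

```python
import math


def plan_pull(total_count, page_size, max_pages, budget):
    """Estimate what a full pull would involve, given the reported count."""
    pages_needed = math.ceil(total_count / page_size)
    reachable = min(total_count, page_size * max_pages)
    return {
        "pages_needed": pages_needed,
        "reachable": reachable,                 # results you can actually get
        "truncated": pages_needed > max_pages,  # the page cap would cut you off
        "within_budget": reachable <= budget,   # budget = max results to pay for
    }


plan = plan_pull(total_count=1200, page_size=25, max_pages=40, budget=500)
print(plan)
# {'pages_needed': 48, 'reachable': 1000, 'truncated': True, 'within_budget': False}
```

If truncated or within_budget comes back unfavorable, that is the signal to tighten filters or split the query before pulling page two.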
When you hit pagination limits, the usual workaround is to split the query into smaller chunks. For example, run the same search once per industry category or per region, then merge the results on your side.
Limitations and best practices
Company search APIs work best when you design around their limits instead of discovering them in production. Factors like rate limits and pagination caps shape how aggressive your queries can be and where you should add guardrails.
Rate limits, pagination caps, and credit usage
Each endpoint enforces limits to protect the service and keep costs within bounds.
For realtime search, Crustdata applies rate limits to protect the service. Requests beyond the allowed threshold return rate-limit errors.
If you need higher throughput, it’s usually possible to discuss custom rate limits with the provider, which is common when teams move from prototyping to production workloads.
Pagination behavior differs by endpoint and affects how much data you can realistically pull in one pass.
Realtime search paginates by page number and features a maximum page limit. This design makes it a targeted query tool for high-intent leads rather than a bulk export mechanism.
In-database search uses cursor-based pagination and supports much larger result sets, but you still get better performance and lower cost when filters are specific rather than broad.
Credits are typically consumed based on results returned, not just requests sent. Realtime-style search tends to be priced per company returned, while database-style search is usually more cost-efficient for large batches.
It’s common for test queries that return zero companies to have little or no cost impact, which makes it safe to iterate on filters during development.
These mechanics mean it’s smarter to use realtime search sparingly for questions where freshness matters, and in-database search for bulk discovery and scheduled jobs.
Error handling and optimization
Most errors fall into a small set of predictable cases.
A 400-level error usually means the filter payload is malformed. This often comes from using invalid values for categorical fields. Validating inputs with an autocomplete API before sending a search request avoids most of these failures.
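Alongside autocomplete, a local structural check can catch malformed filters before they leave your code. This is a sketch under assumptions: the operator names in, not in, and between come from this guide, but representing a range as a dict with min and max keys is a guess at the shape, and known_values is a set of canonical values you have already collected for that field.

```python
LIST_OPERATORS = {"in", "not in"}
FLAG_FILTERS = {"IN_THE_NEWS"}  # flag-style filters carry no value to compare


def validate_filter(f, known_values=None):
    """Return a list of problems with one filter; empty means it looks fine."""
    filter_type = f.get("filter_type")
    if not filter_type:
        return ["missing filter_type"]
    if filter_type in FLAG_FILTERS:
        return []
    operator, value = f.get("type"), f.get("value")
    problems = []
    if operator in LIST_OPERATORS:
        if not isinstance(value, list) or not value:
            problems.append(f"{filter_type}: '{operator}' needs a non-empty list")
        elif known_values is not None:
            unknown = [v for v in value if v not in known_values]
            if unknown:
                problems.append(f"{filter_type}: unknown values {unknown}")
    elif operator == "between":
        # ASSUMPTION: ranges are expressed as {"min": ..., "max": ...}.
        if not isinstance(value, dict) or "min" not in value or "max" not in value:
            problems.append(f"{filter_type}: 'between' needs min and max")
        elif value["min"] > value["max"]:
            problems.append(f"{filter_type}: min is greater than max")
    else:
        problems.append(f"{filter_type}: unsupported operator {operator!r}")
    return problems
```

Matching against known_values is case-sensitive on purpose, since categorical values often need to match the provider's capitalization exactly.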
A 429 response means you’ve hit a rate limit. The safest response is to slow down, not retry immediately. Space requests evenly across the minute and apply exponential backoff so short bursts don’t cascade into repeated failures.
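A minimal sketch of exponential backoff with jitter is shown below. The send callable is a stand-in that performs one request and returns an object with a status_code attribute (a requests.Response fits). The attempt count and delays are illustrative defaults, not provider recommendations.

```python
import random
import time


def send_with_backoff(send, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep, rng=random.random):
    """Retry on HTTP 429, doubling the wait each time up to max_delay."""
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt == max_attempts - 1:
            break  # out of attempts: hand the last 429 back to the caller
        delay = min(max_delay, base_delay * 2 ** attempt)
        # Jitter: wait between 50% and 100% of the delay so that parallel
        # workers do not all retry at the same instant.
        sleep(delay * (0.5 + rng() / 2))
    return response
```

The sleep and rng parameters exist so the waiting can be replaced in tests; in production you would leave them at their defaults.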
Cost control comes from tightening queries early. Narrow filters reduce result size, which reduces both pagination overhead and credit usage. For large list building or exploratory analysis, in-database search is almost always cheaper than realtime search.
Data freshness also benefits from selective refresh. Firmographic fields like industry or headquarters change infrequently and can be cached locally. Time-sensitive signals like job postings or recent activity are better fetched on demand.
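Selective refresh can be sketched as a small time-to-live cache, with one instance per class of data. The class below is an illustration; the TTL values in the usage lines are arbitrary examples, and fetch stands in for whatever call retrieves the data for a key.

```python
import time


class TTLCache:
    """Cache values per key and refetch them once they are older than the TTL."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (stored_at, value)

    def get(self, key, fetch):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]  # still fresh: no API call
        value = fetch(key)   # missing or stale: refetch and store
        self._entries[key] = (now, value)
        return value


# Slow-moving firmographics can live for weeks; hiring signals for minutes.
firmographics = TTLCache(ttl_seconds=30 * 24 * 3600)
hiring_signals = TTLCache(ttl_seconds=15 * 60)
```

An in-memory dict is enough for a single process; a shared store would be the natural substitute when several workers need the same cache.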
For very high-volume use cases, some providers also offer bulk datasets delivered via cloud storage. A common pattern is to use bulk data as a baseline, then layer API calls on top only for real-time checks or monitoring.
Start building with company data
Once you understand the basics of company search APIs, you’re in a position to build something that holds up in production.
The difference between a quick proof of concept and a reliable integration usually comes down to making deliberate choices early, like validating inputs before requests go out and selecting the search mode that matches how fresh the data needs to be.
Company data becomes most valuable when it’s wired directly into real workflows, whether that’s an AI sales agent making decisions in real time or a prospecting tool that refreshes as the market changes.
If you’re ready to move from exploration to implementation, book a demo with Crustdata to walk through your use case and get API access.
95 Third Street, 2nd Floor, San Francisco,
California 94103, United States of America
© 2026 Crustdata Inc.

