# How to integrate Exa search for web search into LLM agents
A no-fluff guide to integrating Exa (`api.exa.ai`) as a web-search backend for LLM agents and tools. Written after shipping four production tools on top of it (entity reviews, local discovery, events, general Q&A). If you're an agent reading this to one-shot an Exa integration, you should be able to copy the patterns below and have it working in under an hour.
---
## What Exa is, in one paragraph
Exa is a neural + keyword web search API that returns ranked URLs plus extracted page content. Two things make it useful for LLM agents over plain Google: (1) you can ask for **semantically similar** results (`type=neural` / `type=auto`), so "rooftop bars with skyline views" returns rooftop bars instead of pages that match those keywords; (2) you can ask Exa to **summarize each result against a question** server-side (`contents.summary.query`), so you get clean structured text per source instead of raw HTML to parse yourself.
That second feature is the killer one for agents. You're not sending the LLM 6 random web pages — you're sending it 6 short summaries already focused on what the user asked.
---
## Endpoint, auth, payload shape
```
POST https://api.exa.ai/search
Headers:
  x-api-key: <your key>
  Content-Type: application/json
```
Minimum payload:
```json
{
  "query": "rooftop bars with skyline views in Bangkok",
  "type": "auto",
  "numResults": 6,
  "contents": {
    "summary": { "query": "Name, neighborhood, what makes it special, price range." },
    "maxAgeHours": 168
  }
}
```
Response:
```json
{
  "results": [
    {
      "title": "...",
      "url": "...",
      "summary": "...",
      "text": "...",
      "publishedDate": "...",
      "score": 0.91
    }
  ]
}
```
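For a quick smoke test before wiring in a full client, the request above can be assembled with nothing but the standard library. This is a minimal sketch: `build_exa_request` is a hypothetical helper name, and nothing is sent unless you uncomment the `urlopen` call.

```python
import json
import urllib.request

EXA_SEARCH_URL = "https://api.exa.ai/search"


def build_exa_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Assemble (but do not send) a POST request for the search endpoint."""
    return urllib.request.Request(
        EXA_SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "query": "rooftop bars with skyline views in Bangkok",
    "type": "auto",
    "numResults": 6,
    "contents": {
        "summary": {"query": "Name, neighborhood, what makes it special, price range."},
        "maxAgeHours": 168,
    },
}
request = build_exa_request("YOUR_API_KEY", payload)

# To actually send it (network call, needs a real key):
# with urllib.request.urlopen(request, timeout=25) as resp:
#     results = json.load(resp).get("results", [])
```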
---
## Things your training data is wrong about (2026 reality)
LLMs frequently emit outdated Exa code. Verify against the current docs, but here's what has changed:
1. **`type: "neural"` is deprecated for new code.** Use `type: "auto"` — Exa decides per query whether neural or keyword retrieval works better.
2. **`livecrawl: "preferred"` is deprecated.** The modern freshness control is `contents.maxAgeHours` — a hard upper bound on content age. Set it low (1–24) for time-sensitive queries, high (720+) for stable content.
3. **`includeDomains` is still the right way to constrain sources** when you have strong reason to. Over-filtering kills recall for less-famous entities.
4. **Summaries are not free.** `contents.summary` runs an LLM call per result on Exa's side. Don't request it for huge `numResults` you won't use.
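If you have older payloads lying around, points 1 and 2 can be applied mechanically. The sketch below is an illustration, not an official migration path: `modernize_payload` is a hypothetical name, the 168-hour fallback is an arbitrary choice, and it assumes legacy code may have put `livecrawl` either at the top level or under `contents`.

```python
import copy


def modernize_payload(payload: dict, default_max_age_hours: int = 168) -> dict:
    """Return a copy of `payload` with the legacy fields swapped out."""
    updated = copy.deepcopy(payload)

    # Point 1: let Exa choose the retrieval mode.
    if updated.get("type") == "neural":
        updated["type"] = "auto"

    # Point 2: drop livecrawl wherever it was placed (assumption: top level
    # or under "contents") and set maxAgeHours if it is not already there.
    contents = dict(updated.get("contents") or {})
    had_livecrawl = "livecrawl" in updated or "livecrawl" in contents
    updated.pop("livecrawl", None)
    contents.pop("livecrawl", None)
    if had_livecrawl:
        contents.setdefault("maxAgeHours", default_max_age_hours)
    if contents:
        updated["contents"] = contents
    return updated
```

Pick `default_max_age_hours` per tool rather than trusting the fallback; the right value depends on how time-sensitive the query is.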
---
## The four query patterns I keep reusing
### A. Reviews / opinions of a named entity (a hotel, restaurant, product)
```python
{
    "query": f'Guest reviews and honest opinions of "{name}" in {location}',
    "type": "auto",
    "numResults": 6,
    "contents": {
        "summary": {
            "query": f"What do real users say about {name}? Pros, cons, service, value."
        },
        "maxAgeHours": 24 * 30
    }
}
```
Gotchas:
- **Quote the entity name.** Without quotes, Exa's neural retrieval drifts to similar entities (a search for a famous-but-not-unique hotel name returns three other hotels of the same chain worldwide).
- **Don't over-filter `includeDomains`** for obscure entities. Better to let Exa pull what it can and trust the summary to flag mismatches — the LLM-generated summary will often say "the content is about X, not Y", which is honest and recoverable.
- 30-day `maxAgeHours` for reviews — sentiment doesn't change overnight, and you want a meaningful sample.
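One way to make the quoting rule hard to forget is to bake it into a small builder. This is a sketch with a hypothetical function name; stripping embedded double quotes from the name is my own assumption about how to keep the quoted phrase intact, not something Exa requires.

```python
def build_reviews_payload(name: str, location: str, num_results: int = 6) -> dict:
    """Build a pattern-A payload with the entity name safely quoted."""
    # Remove stray double quotes so the name stays one quoted phrase.
    clean_name = name.replace('"', "").strip()
    return {
        "query": f'Guest reviews and honest opinions of "{clean_name}" in {location}',
        "type": "auto",
        "numResults": num_results,
        "contents": {
            "summary": {
                "query": f"What do real users say about {clean_name}? "
                         "Pros, cons, service, value."
            },
            "maxAgeHours": 24 * 30,  # 30-day window for reviews
        },
        # Deliberately no includeDomains: obscure entities need the recall.
    }


reviews_payload = build_reviews_payload('The "Grand" Hotel', "Lisbon")
```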
### B. Local discovery (restaurants, cafes, bars, viewpoints near a city/landmark)
```python
{
    "query": f"{intent} in {city} near {landmark}",  # e.g. "rooftop bars in Bangkok near Sukhumvit"
    "type": "auto",
    "numResults": 6,
    "contents": {
        "summary": { "query": "Name, neighborhood, what makes it special, price range." },
        "maxAgeHours": 168
    }
}
```
Gotchas:
- **Be specific with intent in the query.** "Best vegetarian restaurants" beats "restaurants" because Exa's neural retrieval keys on intent.
- **Don't promise live hours/availability.** Exa returns what's on web pages; opening hours can be stale.
- 7-day window catches most active blogs and city guides.
### C. Events in a date window
```python
{
    "query": f"{interests} concerts festivals events in {city} between {start_date} and {end_date}",
    "type": "auto",
    "numResults": 8,
    "includeDomains": ["eventbrite.com", "timeout.com", "songkick.com", "bandsintown.com", "residentadvisor.net"],
    "contents": {
        "summary": { "query": "Event name, date, venue, what is it about." },
        "maxAgeHours": 24
    }
}
```
Gotchas:
- **Include date strings in the query, don't rely on `maxAgeHours` to filter.** A 24h window means the *page* was crawled recently — not that the *event* is in your date range. Aggregator pages updated yesterday can describe years-old festivals.
- **Post-filter by year/month** in the returned summary text. Or tell the LLM explicitly to verify dates before quoting.
- Domain allowlist works well here because event aggregators are well-known.
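The post-filter step can be as simple as scanning each summary for month-and-year mentions. The sketch below uses hypothetical helper names and only understands English phrases such as "March 2026" or "March 14, 2026"; ISO dates, date ranges, and other languages would need extra handling.

```python
import re
from datetime import date

_MONTHS = ["january", "february", "march", "april", "may", "june", "july",
           "august", "september", "october", "november", "december"]
_MONTH_YEAR = re.compile(
    r"\b(" + "|".join(_MONTHS) + r")\s+(?:\d{1,2}(?:st|nd|rd|th)?,?\s+)?(\d{4})\b",
    re.IGNORECASE,
)


def mentioned_months(text: str) -> set[tuple[int, int]]:
    """Extract (year, month) pairs from phrases like 'March 14, 2026'."""
    return {
        (int(year), _MONTHS.index(month.lower()) + 1)
        for month, year in _MONTH_YEAR.findall(text)
    }


def filter_by_window(sources: list[dict], start: date, end: date,
                     keep_undated: bool = False) -> list[dict]:
    """Keep sources whose summary mentions a month inside [start, end]."""
    lo, hi = (start.year, start.month), (end.year, end.month)
    kept = []
    for source in sources:
        months = mentioned_months(source.get("summary", ""))
        if not months:
            # No recognizable date: keep only if the caller opts in.
            if keep_undated:
                kept.append(source)
        elif any(lo <= m <= hi for m in months):
            kept.append(source)
    return kept
```

Month-level granularity is a deliberate simplification: it is enough to drop years-old festivals, and the LLM can still be told to verify exact days before quoting them.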
### D. General factual Q&A (safety, advisories, "best time to visit", transit options, pricing norms)
```python
{
    "query": user_question,  # pass through verbatim
    "type": "auto",
    "numResults": 5,
    "contents": {
        "summary": { "query": "Direct answer with current facts and any caveats." },
        "maxAgeHours": 1
    }
}
```
Gotchas:
- **Force fresh content** (`maxAgeHours: 1` or similar) for safety/visa/advisory queries. Old caches are dangerous here.
- Always have the LLM include a caveat ("rules change, double check") in its synthesized reply. Exa pulls what's on the web, which may already be wrong.
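The caveat is easiest to enforce in the synthesis prompt itself. A minimal sketch, assuming sources shaped as `{title, url, summary}` dicts; the function name and the prompt wording are illustrative, so adapt them to your own prompt style.

```python
def build_synthesis_prompt(question: str, sources: list[dict]) -> str:
    """Format Exa summaries into a prompt that demands a caveat."""
    lines = [
        "Answer the user's question using only the sources below.",
        "Mention the source URL for each claim you make.",
        "End with a short caveat that rules and conditions change and that "
        "the user should double-check with an official source.",
        "",
        f"Question: {question}",
        "",
        "Sources:",
    ]
    for index, source in enumerate(sources, start=1):
        lines.append(f"[{index}] {source.get('title', '')} ({source.get('url', '')})")
        lines.append(f"    {source.get('summary', '')}")
    return "\n".join(lines)


prompt = build_synthesis_prompt(
    "Do I need a visa to transit through Doha?",
    [{"title": "Transit rules", "url": "https://example.com/transit",
      "summary": "Placeholder summary text."}],
)
```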
---
## A reference Python async client
This is the shape I ship in prod. ~80 lines of code, no external deps beyond `httpx`. Designed to be a singleton with connection pooling.
```python
"""Exa Search API client: async, pooled, with typed errors."""
import logging
from typing import Any

import httpx

logger = logging.getLogger(__name__)

EXA_SEARCH_URL = "https://api.exa.ai/search"


class ExaError(Exception):
    """Raised on transport / HTTP failures. NOT raised when the API key is
    missing; that degrades to an empty-result fallback so unconfigured
    envs don't surface scary errors to users."""


class ExaClient:
    def __init__(self, api_key: str, timeout: float = 25.0):
        self._api_key = api_key
        self._timeout = timeout
        self._http: httpx.AsyncClient | None = None
        self._warned_missing_key = False

    def _get_http(self) -> httpx.AsyncClient:
        if self._http is None or self._http.is_closed:
            self._http = httpx.AsyncClient(timeout=self._timeout)
        return self._http

    async def aclose(self) -> None:
        if self._http is not None and not self._http.is_closed:
            await self._http.aclose()
        self._http = None

    async def search(
        self,
        query: str,
        *,
        search_type: str = "auto",
        num_results: int = 6,
        include_domains: list[str] | None = None,
        summary_query: str | None = None,
        text_max_chars: int | None = None,
        max_age_hours: int | None = 168,
    ) -> list[dict[str, Any]]:
        """Returns shaped results. [] on success-with-no-hits OR missing key.
        Raises ExaError on transport/HTTP failures."""
        if not self._api_key:
            if not self._warned_missing_key:
                logger.warning("EXA_API_KEY missing; exa disabled")
                self._warned_missing_key = True
            return []

        contents: dict[str, Any] = {}
        if max_age_hours is not None:
            contents["maxAgeHours"] = max_age_hours
        if summary_query:
            contents["summary"] = {"query": summary_query}
        if text_max_chars:
            contents["text"] = {"maxCharacters": text_max_chars}

        payload: dict[str, Any] = {
            "query": query,
            "type": search_type,
            "numResults": num_results,
            "contents": contents,
        }
        if include_domains:
            payload["includeDomains"] = include_domains

        headers = {"x-api-key": self._api_key, "Content-Type": "application/json"}
        try:
            resp = await self._get_http().post(EXA_SEARCH_URL, json=payload, headers=headers)
            resp.raise_for_status()
            data = resp.json()
        except httpx.HTTPError as e:
            logger.error({"msg": "exa search failed", "err": str(e)})
            raise ExaError(f"Exa request failed: {e}") from e
        except Exception as e:
            logger.error({"msg": "exa unexpected error", "err": str(e)})
            raise ExaError(f"Exa unexpected error: {e}") from e

        return [
            {
                "title": r.get("title", ""),
                "url": r.get("url", ""),
                "summary": r.get("summary", "") or "",
                "text": r.get("text", "") or "",
                "published_date": r.get("publishedDate", ""),
                "score": r.get("score"),
            }
            for r in data.get("results", [])
        ]
```
Design decisions explained:
- **Singleton with persistent `httpx.AsyncClient`.** Recreating an `httpx.AsyncClient` per call defeats connection pooling and TLS session reuse. Reuse one client across the process lifetime. Add `aclose()` to your shutdown hook.
- **`ExaError` exists to distinguish transient failures from genuine empty results.** If you collapse both into "returns []", callers will report "no results found" when the actual story is "Exa is down" — terrible UX. The tool above the client should turn `ExaError` into a "service unavailable, please retry" message and `[]` into "we looked, found nothing."
- **Missing API key returns `[]`, not raises.** A missing key is a deploy-time misconfiguration that should not produce user-visible "service unavailable" errors in every request. Log once at warning level and let the feature silently degrade — much better dev/staging ergonomics. (You should still fail loudly at config validation in your bootstrapping, not here.)
- **`max_age_hours: int | None = 168`** is a sensible 7-day default. Override per-tool: 1h for advisories, 24h for events, 720h for reviews.
---
## A shared "tool helper" so your N web tools don't drift
If you wrap Exa with multiple tools (reviews, discovery, events, Q&A), the result-shaping and error-handling code will duplicate fast. Centralize it:
```python
"""Shared helpers for Exa-backed tools."""
import logging

# ExaError is imported here so tools can catch it from one place.
from .exa_client import ExaError, get_exa_client

# ToolResult is your own framework type; import it from wherever it lives in your stack.

logger = logging.getLogger(__name__)


async def fetch_exa_sources(
    *,
    query: str,
    summary_query: str,
    num_results: int,
    include_domains: list[str] | None = None,
    max_age_hours: int | None = 168,
) -> list[dict[str, str]]:
    """Run an Exa search, return shaped {title, url, summary} sources.
    Raises ExaError on transport failures."""
    results = await get_exa_client().search(
        query=query,
        num_results=num_results,
        include_domains=include_domains,
        summary_query=summary_query,
        max_age_hours=max_age_hours,
    )
    # Drop results that carry neither a summary nor a title.
    return [
        {"title": r.get("title", ""), "url": r.get("url", ""), "summary": r.get("summary", "")}
        for r in results
        if r.get("summary") or r.get("title")
    ]


def transient_lookup_failure(tool_name: str, err: Exception) -> ToolResult:
    """Uniform 'lookup unavailable' response for any Exa-backed tool."""
    logger.warning({"msg": f"{tool_name} lookup failed", "err": str(err)})
    return ToolResult(
        success=False,
        error="Couldn't reach the web lookup service right now. Please retry in a moment.",
    )
```
Each tool then collapses to ~10 lines:
```python
# Inside your tool's async handler:
try:
    sources = await fetch_exa_sources(
        query=...,
        summary_query=...,
        num_results=6,
        max_age_hours=24 * 30,
    )
except ExaError as e:
    return transient_lookup_failure("get_entity_reviews", e)
return ToolResult(success=True, data={"sources": sources})
```
```
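`ToolResult` is left to your framework above. If you don't have one yet, a minimal stand-in could look like the sketch below. The field names match how the helper uses them; `to_model_message` and its wording are hypothetical, and show how the "service unavailable" versus "looked, found nothing" distinction can reach the model.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class ToolResult:
    """Minimal stand-in for a framework tool-result type."""
    success: bool
    data: Optional[dict[str, Any]] = None
    error: Optional[str] = None


def to_model_message(result: ToolResult) -> str:
    """Render a ToolResult as the text handed back to the LLM."""
    if not result.success:
        # Transport failure: surface the retry message, not "no results".
        return result.error or "Lookup failed."
    sources = (result.data or {}).get("sources", [])
    if not sources:
        return "Searched the web but found nothing relevant."
    return "\n".join(
        f"- {s['title']}: {s['summary']} ({s['url']})" for s in sources
    )
```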
---
## Tool-description writing tips (for LLM tool-calling)
Exa is only useful if your model actually reaches for it. Three things make a big difference:
1. **Tell the model the failure mode of OTHER tools.** Example: in your `get_entity_reviews` tool description, write *"Use this when the user asks 'how is this place?' — your underlying booking-API search does not return reviews, only ratings."* That nudges the model toward the right tool when it sees both available.
2. **Be explicit about what NOT to use a tool for.** In a general `web_search` tool, list things to defer (other specific tools). This prevents one fat tool from swallowing all queries.
3. **Show example phrases the user would say.** *"Use this when the user asks 'where should I eat in Lisbon', 'rooftop bars in Bangkok with skyline view'..."* Models pattern-match on these.
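Put together, the three tips might produce a tool definition like the one below. The `name` / `description` / `parameters` layout is a generic JSON-Schema-style shape, assumed here for illustration; each tool-calling provider has its own exact envelope, and the other tool names mentioned in the description are hypothetical.

```python
GET_ENTITY_REVIEWS_TOOL = {
    "name": "get_entity_reviews",
    "description": (
        # Tip 3: example phrases the user would actually say.
        "Fetch recent web reviews and opinions about a named hotel, "
        "restaurant, or product. Use this when the user asks 'how is this "
        "place?', 'is it worth it?', or 'what do people say about X?'. "
        # Tip 1: the failure mode of the other tool.
        "The booking search tool returns ratings only, not review text. "
        # Tip 2: what NOT to use this tool for.
        "Do NOT use this to find new places or events; use the discovery "
        "or events tools for those."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "Exact entity name."},
            "location": {"type": "string", "description": "City or area."},
        },
        "required": ["name", "location"],
    },
}
```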
---
## What NOT to do
- **Don't recreate `httpx.AsyncClient` per request.** Connection pooling matters.
- **Don't return `[]` on every failure.** Distinguish empty-success from transport-failure with a typed exception.
- **Don't default to `type: "neural"` in new code.** Use `"auto"`.
- **Don't pass `livecrawl`.** It is deprecated; use `contents.maxAgeHours` instead.
- **Don't promise the LLM that Exa results are authoritative.** Always have the synthesized answer carry a caveat (especially for safety/visa/advisory queries).
- **Don't request `contents.summary` for huge `numResults`** you won't show — every summary is an LLM call on Exa's side.
- **Don't blindly trust dates in the summary text.** Event pages get re-crawled; old events leak in. Post-filter on dates if precision matters.
- **Don't filter `includeDomains` too aggressively** for obscure entities — recall drops to zero. Tight allowlists work for well-known sources (event sites, review aggregators); they hurt for tail content.
---
## Cost / latency reality (rough numbers)
- A search with 6 results + `contents.summary` returns in **3–8 seconds** typically; can spike to 20s+ for cold queries needing live crawls.
- The summary feature dominates cost. If you don't need it, omit it — you get raw URLs and `text` for free-ish.
- For latency-sensitive UX (chat / messaging), warn the user before calling Exa: *"Looking that up, hang on..."* It feels much better than 8 seconds of silence.
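The "warn first, then look up" flow fits in a small wrapper. This is a sketch under assumptions: `send` stands for whatever coroutine pushes a message to your user, `lookup` is any zero-argument coroutine function that performs the search, and the timeout message wording is made up.

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def lookup_with_notice(
    send: Callable[[str], Awaitable[None]],
    lookup: Callable[[], Awaitable[list]],
    timeout: float = 25.0,
) -> Optional[list]:
    """Tell the user a lookup is starting, then run it with a hard timeout."""
    await send("Looking that up, hang on...")
    try:
        return await asyncio.wait_for(lookup(), timeout=timeout)
    except asyncio.TimeoutError:
        # Covers the 20s+ spikes: give up cleanly instead of hanging the chat.
        await send("That lookup is taking too long. Please try again in a moment.")
        return None
```

Returning `None` on timeout keeps it distinguishable from `[]`, which still means "searched, found nothing".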
---
## Where this guide ends
If you've copied the Python client, the shared helper, and one of the four query patterns into your codebase — you have a working Exa integration. The rest is product judgment: which queries to wrap as dedicated tools vs. fold into a general `web_search`, how aggressively to cache, and how to surface caveats to the end user.
The deprecation pace is what bites most LLM-written integrations. If you're reading this in 2027+, check the Exa changelog before trusting any field name above.