# How to integrate Exa search for web search into LLM agents

A no-fluff guide to integrating Exa (`api.exa.ai`) as a web-search backend for LLM agents and tools. Written after shipping four production tools on top of it (entity reviews, local discovery, events, general Q&A). If you're an agent reading this to one-shot an Exa integration, you should be able to copy the patterns below and have it working in under an hour.

---

## What Exa is, in one paragraph

Exa is a neural + keyword web search API that returns ranked URLs plus extracted page content. Two things make it useful for LLM agents over plain Google: (1) its neural retrieval, which `type: "auto"` picks when it suits the query, returns **semantically similar** results, so "rooftop bars with skyline views" returns rooftop bars instead of pages that merely match those keywords; (2) you can ask Exa to **summarize each result against a question** server-side (`contents.summary.query`), so you get clean, focused text per source instead of raw HTML to parse yourself.

That second feature is the killer one for agents. You're not sending the LLM 6 random web pages — you're sending it 6 short summaries already focused on what the user asked.

---

## Endpoint, auth, payload shape

```
POST https://api.exa.ai/search
Headers:
  x-api-key: <your key>
  Content-Type: application/json
```

Minimum payload:

```json
{
  "query": "rooftop bars with skyline views in Bangkok",
  "type": "auto",
  "numResults": 6,
  "contents": {
    "summary": { "query": "Name, neighborhood, what makes it special, price range." },
    "maxAgeHours": 168
  }
}
```

Response:

```json
{
  "results": [
    {
      "title": "...",
      "url": "...",
      "summary": "...",
      "text": "...",
      "publishedDate": "...",
      "score": 0.91
    }
  ]
}
```
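For a first smoke test before wiring up an async client, the request above can be sent with only the standard library. This is a minimal sketch, not a production client: it uses `urllib.request` instead of `httpx`, has no retries or typed errors, and `build_search_payload` / `exa_search` are hypothetical names.

```python
"""Dependency-free Exa /search call (sketch)."""
from __future__ import annotations

import json
import urllib.request
from typing import Any

EXA_SEARCH_URL = "https://api.exa.ai/search"


def build_search_payload(
    query: str,
    *,
    num_results: int = 6,
    summary_query: str | None = None,
    max_age_hours: int | None = 168,
) -> dict[str, Any]:
    """Assemble the JSON body shown above; unset optional fields are omitted."""
    contents: dict[str, Any] = {}
    if summary_query:
        contents["summary"] = {"query": summary_query}
    if max_age_hours is not None:
        contents["maxAgeHours"] = max_age_hours
    return {"query": query, "type": "auto", "numResults": num_results, "contents": contents}


def exa_search(payload: dict[str, Any], api_key: str, timeout: float = 25.0) -> list[dict[str, Any]]:
    """POST the payload and return the raw `results` list (performs a network call).

    Non-2xx responses raise urllib.error.HTTPError.
    """
    req = urllib.request.Request(
        EXA_SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp).get("results", [])
```

Calling `exa_search(build_search_payload("rooftop bars with skyline views in Bangkok", summary_query="Name, neighborhood, what makes it special, price range."), api_key)` should return the raw `results` list in the response shape above.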

---

## Things your training data is wrong about (2026 reality)

LLMs frequently emit outdated Exa code. Verify against current docs, but here's what has changed:

1. **`type: "neural"` is deprecated for new code.** Use `type: "auto"` — Exa decides per query whether neural or keyword retrieval works better.
2. **`livecrawl: "preferred"` is deprecated.** The modern freshness control is `contents.maxAgeHours` — a hard upper bound on content age. Set it low (1–24) for time-sensitive queries, high (720+) for stable content.
3. **`includeDomains` is still the right way to constrain sources** when you have strong reason to. Over-filtering kills recall for less-famous entities.
4. **Summaries are not free.** `contents.summary` runs an LLM call per result on Exa's side. Don't request it for huge `numResults` you won't use.
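Because stale field names are the usual failure, it can pay to lint payloads before they leave your process. A minimal sketch follows: the checks mirror the list above, `lint_exa_payload` is a hypothetical name, and the `summary_budget` threshold of 10 is an arbitrary assumption, not an Exa limit.

```python
from __future__ import annotations

from typing import Any


def lint_exa_payload(payload: dict[str, Any], summary_budget: int = 10) -> list[str]:
    """Return human-readable warnings for outdated or wasteful Exa payload fields."""
    warnings: list[str] = []
    contents = payload.get("contents") or {}

    if payload.get("type") == "neural":
        warnings.append('type "neural" is outdated for new code; use "auto"')

    # Check both the top level and contents, to catch either legacy placement.
    if "livecrawl" in payload or "livecrawl" in contents:
        warnings.append('"livecrawl" is deprecated; use contents.maxAgeHours')

    # Each summary is a server-side LLM call, so flag large summary batches.
    if "summary" in contents and payload.get("numResults", 0) > summary_budget:
        warnings.append(
            f"summaries requested for {payload['numResults']} results; "
            f"consider <= {summary_budget}"
        )
    return warnings
```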

---

## The four query patterns I keep reusing

### A. Reviews / opinions of a named entity (a hotel, restaurant, product)

```python
def reviews_payload(name: str, location: str) -> dict:
    return {
        "query": f'Guest reviews and honest opinions of "{name}" in {location}',
        "type": "auto",
        "numResults": 6,
        "contents": {
            "summary": {
                "query": f"What do real users say about {name}? Pros, cons, service, value."
            },
            "maxAgeHours": 24 * 30,  # 30 days
        },
    }
```

Gotchas:
- **Quote the entity name.** Without quotes, Exa's neural retrieval drifts to similar entities (a search for a famous-but-not-unique hotel name returns three other hotels of the same chain worldwide).
- **Don't over-filter `includeDomains`** for obscure entities. Better to let Exa pull what it can and trust the summary to flag mismatches — the LLM-generated summary will often say "the content is about X, not Y", which is honest and recoverable.
- 30-day `maxAgeHours` for reviews — sentiment doesn't change overnight, and you want a meaningful sample.
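Letting the summary flag mismatches can be made a little more mechanical. The sketch below keeps every source but marks the ones whose title and summary never mention the entity name, so the tool can pass that signal on to the LLM. The substring test is a crude heuristic (it misses transliterations and abbreviations), not something Exa provides, and `flag_entity_mismatches` / `possible_mismatch` are hypothetical names.

```python
from __future__ import annotations


def flag_entity_mismatches(sources: list[dict[str, str]], name: str) -> list[dict[str, object]]:
    """Return copies of the sources with a heuristic `possible_mismatch` flag."""
    needle = name.casefold()
    flagged: list[dict[str, object]] = []
    for src in sources:
        haystack = f"{src.get('title') or ''} {src.get('summary') or ''}".casefold()
        # Copy rather than mutate, so callers keep their original list intact.
        flagged.append({**src, "possible_mismatch": needle not in haystack})
    return flagged
```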

### B. Local discovery (restaurants, cafes, bars, viewpoints near a city/landmark)

```python
def discovery_payload(intent: str, city: str, landmark: str) -> dict:
    return {
        "query": f"{intent} in {city} near {landmark}",  # e.g. "rooftop bars in Bangkok near Sukhumvit"
        "type": "auto",
        "numResults": 6,
        "contents": {
            "summary": {"query": "Name, neighborhood, what makes it special, price range."},
            "maxAgeHours": 168,  # 7 days
        },
    }
```

Gotchas:
- **Be specific with intent in the query.** "Best vegetarian restaurants" beats "restaurants" because Exa's neural retrieval keys on intent.
- **Don't promise live hours/availability.** Exa returns what's on web pages; opening hours can be stale.
- 7-day window catches most active blogs and city guides.
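One way to encode the first two gotchas: build the query from a specific intent with an optional landmark, and attach an explicit staleness note to the tool output so the model does not present hours as live. A minimal sketch; the function names, the note wording, and the `note` field are all hypothetical.

```python
from __future__ import annotations

from typing import Any

STALE_HOURS_NOTE = (
    "Opening hours, prices and availability come from web pages and may be out "
    "of date. Do not present them as live; suggest the user confirm directly."
)


def build_discovery_query(intent: str, city: str, landmark: str | None = None) -> str:
    """("rooftop bars", "Bangkok", "Sukhumvit") -> "rooftop bars in Bangkok near Sukhumvit"."""
    query = f"{intent.strip()} in {city.strip()}"
    if landmark and landmark.strip():
        query += f" near {landmark.strip()}"
    return query


def discovery_tool_output(sources: list[dict[str, Any]]) -> dict[str, Any]:
    """Package sources together with a caveat the LLM will read alongside them."""
    return {"sources": sources, "note": STALE_HOURS_NOTE}
```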

### C. Events in a date window

```python
def events_payload(interests: str, city: str, start_date: str, end_date: str) -> dict:
    return {
        "query": f"{interests} concerts festivals events in {city} between {start_date} and {end_date}",
        "type": "auto",
        "numResults": 8,
        "includeDomains": [
            "eventbrite.com", "timeout.com", "songkick.com",
            "bandsintown.com", "residentadvisor.net",
        ],
        "contents": {
            "summary": {"query": "Event name, date, venue, what it is about."},
            "maxAgeHours": 24,
        },
    }
```

Gotchas:
- **Include date strings in the query, don't rely on `maxAgeHours` to filter.** A 24h window means the *page* was crawled recently — not that the *event* is in your date range. Aggregator pages updated yesterday can describe years-old festivals.
- **Post-filter by year/month** in the returned summary text, or tell the LLM explicitly to verify dates before quoting them.
- Domain allowlist works well here because event aggregators are well-known.
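Post-filtering on dates can be as simple as pulling four-digit years out of each summary and discarding sources that mention only years outside the window. This sketch works at year granularity only (month parsing is left out), `filter_events_by_year` and `date_unverified` are hypothetical names, and keeping sources that mention no year at all is a policy choice you may want to flip.

```python
from __future__ import annotations

import re
from datetime import date
from typing import Any

# Non-capturing group so findall returns the whole four-digit year.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def filter_events_by_year(
    sources: list[dict[str, Any]], start: date, end: date
) -> list[dict[str, Any]]:
    """Drop sources whose summaries mention only years outside [start.year, end.year]."""
    kept: list[dict[str, Any]] = []
    for src in sources:
        years = {int(y) for y in YEAR_RE.findall(src.get("summary") or "")}
        if not years:
            # No year in the text: can't verify, so keep it but say so.
            kept.append({**src, "date_unverified": True})
        elif any(start.year <= y <= end.year for y in years):
            kept.append({**src, "date_unverified": False})
        # Otherwise every mentioned year is outside the window: drop it.
    return kept
```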

### D. General factual Q&A (safety, advisories, "best time to visit", transit options, pricing norms)

```python
def qa_payload(user_question: str) -> dict:
    return {
        "query": user_question,  # pass through verbatim
        "type": "auto",
        "numResults": 5,
        "contents": {
            "summary": {"query": "Direct answer with current facts and any caveats."},
            "maxAgeHours": 1,
        },
    }
```

Gotchas:
- **Force fresh content** (`maxAgeHours: 1` or similar) for safety/visa/advisory queries. Old caches are dangerous here.
- Always have the LLM caveat ("rules change, double check") in its synthesized reply — Exa pulls what's on the web, which may already be wrong.
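A sketch of how the per-source summaries and the caveat requirement might be packed into the synthesis prompt. The instruction wording is illustrative rather than a tested prompt, and `build_synthesis_prompt` is a hypothetical name.

```python
from __future__ import annotations


def build_synthesis_prompt(question: str, sources: list[dict[str, str]]) -> str:
    """Number each Exa summary so the model can cite it, and require a caveat."""
    lines = [f"Question: {question}", "", "Sources:"]
    for i, src in enumerate(sources, start=1):
        lines.append(f"[{i}] {src.get('title') or ''} ({src.get('url') or ''})")
        lines.append(f"    {(src.get('summary') or '').strip()}")
    lines += [
        "",
        "Answer using only the sources above and cite them as [n].",
        "These are web pages, not official guidance: rules change, so end with a",
        "short caveat telling the user to double-check with the official source.",
    ]
    return "\n".join(lines)
```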

---

## A reference Python async client

This is the shape I ship in prod. About 100 lines, no external deps beyond `httpx`. Designed to be used as a process-wide singleton with connection pooling.

```python
"""Exa Search API client: async, pooled, with typed errors."""

import logging
import os
from typing import Any

import httpx

logger = logging.getLogger(__name__)

EXA_SEARCH_URL = "https://api.exa.ai/search"


class ExaError(Exception):
    """Raised on transport / HTTP failures. NOT raised when the API key is
    missing: that degrades to an empty-result fallback so unconfigured
    envs don't surface scary errors to users."""


class ExaClient:
    def __init__(self, api_key: str, timeout: float = 25.0):
        self._api_key = api_key
        self._timeout = timeout
        self._http: httpx.AsyncClient | None = None
        self._warned_missing_key = False

    def _get_http(self) -> httpx.AsyncClient:
        # Lazily create one pooled client and reuse it across calls.
        if self._http is None or self._http.is_closed:
            self._http = httpx.AsyncClient(timeout=self._timeout)
        return self._http

    async def aclose(self) -> None:
        """Call from your app's shutdown hook to release pooled connections."""
        if self._http is not None and not self._http.is_closed:
            await self._http.aclose()
        self._http = None

    async def search(
        self,
        query: str,
        *,
        search_type: str = "auto",
        num_results: int = 6,
        include_domains: list[str] | None = None,
        summary_query: str | None = None,
        text_max_chars: int | None = None,
        max_age_hours: int | None = 168,
    ) -> list[dict[str, Any]]:
        """Returns shaped results. [] on success-with-no-hits OR missing key.
        Raises ExaError on transport/HTTP failures."""
        if not self._api_key:
            if not self._warned_missing_key:
                logger.warning("EXA_API_KEY missing, exa disabled")
                self._warned_missing_key = True
            return []

        # Only send the content options the caller asked for.
        contents: dict[str, Any] = {}
        if max_age_hours is not None:
            contents["maxAgeHours"] = max_age_hours
        if summary_query:
            contents["summary"] = {"query": summary_query}
        if text_max_chars:
            contents["text"] = {"maxCharacters": text_max_chars}

        payload: dict[str, Any] = {
            "query": query,
            "type": search_type,
            "numResults": num_results,
            "contents": contents,
        }
        if include_domains:
            payload["includeDomains"] = include_domains

        headers = {"x-api-key": self._api_key, "Content-Type": "application/json"}

        try:
            resp = await self._get_http().post(EXA_SEARCH_URL, json=payload, headers=headers)
            resp.raise_for_status()  # non-2xx -> httpx.HTTPStatusError
            data = resp.json()
        except httpx.HTTPError as e:
            # Covers timeouts, connection errors and HTTP status errors.
            logger.error({"msg": "exa search failed", "err": str(e)})
            raise ExaError(f"Exa request failed: {e}") from e
        except Exception as e:
            # e.g. a response body that isn't valid JSON.
            logger.error({"msg": "exa unexpected error", "err": str(e)})
            raise ExaError(f"Exa unexpected error: {e}") from e

        # Normalize nulls to "" so callers can treat these fields as strings.
        return [
            {
                "title": r.get("title") or "",
                "url": r.get("url") or "",
                "summary": r.get("summary") or "",
                "text": r.get("text") or "",
                "published_date": r.get("publishedDate") or "",
                "score": r.get("score"),
            }
            for r in data.get("results", [])
        ]


_client: ExaClient | None = None


def get_exa_client() -> ExaClient:
    """Process-wide singleton, so every tool shares one connection pool."""
    global _client
    if _client is None:
        _client = ExaClient(api_key=os.environ.get("EXA_API_KEY", ""))
    return _client
```

Design decisions explained:

- **Singleton with persistent `httpx.AsyncClient`.** Recreating an `httpx.AsyncClient` per call defeats connection pooling and TLS session reuse. Reuse one client across the process lifetime. Add `aclose()` to your shutdown hook.
- **`ExaError` exists to distinguish transient failures from genuine empty results.** If you collapse both into "returns []", callers will report "no results found" when the actual story is "Exa is down" — terrible UX. The tool above the client should turn `ExaError` into a "service unavailable, please retry" message and `[]` into "we looked, found nothing."
- **Missing API key returns `[]`, not raises.** A missing key is a deploy-time misconfiguration that should not produce user-visible "service unavailable" errors in every request. Log once at warning level and let the feature silently degrade — much better dev/staging ergonomics. (You should still fail loudly at config validation in your bootstrapping, not here.)
- **`max_age_hours: int | None = 168`** is a sensible 7-day default. Override per-tool: 1h for advisories, 24h for events, 720h for reviews.
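The note about failing loudly at config validation during bootstrapping, rather than per request, could look like this. A minimal sketch: `check_exa_config` is a hypothetical name, and the `strict` flag (e.g. true in production, false in dev/staging) is an assumed convention; only the `EXA_API_KEY` name comes from the client above.

```python
from __future__ import annotations

import logging
from collections.abc import Mapping

logger = logging.getLogger(__name__)


def check_exa_config(env: Mapping[str, str], *, strict: bool) -> bool:
    """Validate EXA_API_KEY once at startup.

    strict=True (e.g. production): raise so the deploy fails fast.
    strict=False (dev/staging): warn once and let the client degrade to [].
    """
    key = env.get("EXA_API_KEY", "").strip()
    if key:
        return True
    if strict:
        raise RuntimeError("EXA_API_KEY is not set; refusing to start")
    logger.warning("EXA_API_KEY not set; Exa-backed tools will return no results")
    return False
```

At startup you would call something like `check_exa_config(os.environ, strict=is_production)`, which keeps the client's silent `[]` fallback for environments where it is genuinely acceptable.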

---

## A shared "tool helper" so your N web tools don't drift

If you wrap Exa with multiple tools (reviews, discovery, events, Q&A), the result-shaping and error-handling code will duplicate fast. Centralize it:

```python
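"""Shared helpers for Exa-backed tools."""

import logging
from dataclasses import dataclass
from typing import Any

from .exa_client import ExaError, get_exa_client

logger = logging.getLogger(__name__)


@dataclass
class ToolResult:
    """Placeholder for your own framework's tool-result type; adapt to your stack."""

    success: bool
    data: dict[str, Any] | None = None
    error: str | None = None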


async def fetch_exa_sources(
    *,
    query: str,
    summary_query: str,
    num_results: int,
    include_domains: list[str] | None = None,
    max_age_hours: int | None = 168,
) -> list[dict[str, str]]:
    """Run an Exa search, return shaped {title, url, summary} sources.
    Raises ExaError on transport failures."""
    results = await get_exa_client().search(
        query=query,
        num_results=num_results,
        include_domains=include_domains,
        summary_query=summary_query,
        max_age_hours=max_age_hours,
    )
    return [
        {"title": r.get("title", ""), "url": r.get("url", ""), "summary": r.get("summary", "")}
        for r in results
        if r.get("summary") or r.get("title")
    ]


def transient_lookup_failure(tool_name: str, err: Exception) -> ToolResult:
    """Uniform 'lookup unavailable' response for any Exa-backed tool."""
    logger.warning({"msg": f"{tool_name} lookup failed", "err": str(err)})
    return ToolResult(
        success=False,
        error="Couldn't reach the web lookup service right now. Please retry in a moment.",
    )
```

Each tool then collapses to ~10 lines:

```python
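# Assumes ExaError, ToolResult, fetch_exa_sources and transient_lookup_failure
# are imported from the helper module above.
async def get_entity_reviews(name: str, location: str) -> ToolResult:
    try:
        sources = await fetch_exa_sources(
            query=f'Guest reviews and honest opinions of "{name}" in {location}',
            summary_query=f"What do real users say about {name}? Pros, cons, service, value.",
            num_results=6,
            max_age_hours=24 * 30,
        )
    except ExaError as e:
        return transient_lookup_failure("get_entity_reviews", e)

    return ToolResult(success=True, data={"sources": sources})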
```

---

## Tool-description writing tips (for LLM tool-calling)

Exa is only useful if your model actually reaches for it. Three things make a big difference:

1. **Tell the model the failure mode of OTHER tools.** Example: in your `get_entity_reviews` tool description, write *"Use this when the user asks 'how is this place?' — your underlying booking-API search does not return reviews, only ratings."* That nudges the model toward the right tool when it sees both available.
2. **Be explicit about what NOT to use a tool for.** In a general `web_search` tool, list the kinds of queries it should defer to other, more specific tools. This prevents one fat tool from swallowing all queries.
3. **Show example phrases the user would say.** *"Use this when the user asks 'where should I eat in Lisbon', 'rooftop bars in Bangkok with skyline view'..."* Models pattern-match on these.
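Putting the three tips together, a tool definition might look like the following. The `name` / `description` / `parameters` layout is the common JSON-Schema style used by several function-calling APIs, so adapt it to your provider's exact format; the description text and parameter names are illustrative.

```python
GET_ENTITY_REVIEWS_TOOL = {
    "name": "get_entity_reviews",
    "description": (
        # Tip 3: example phrases the user would say.
        "Look up what real guests and users say about a specific named place or "
        "product (hotel, restaurant, venue). Use this when the user asks 'how is "
        "this place?', 'is X any good?', or 'what do people say about X?'. "
        # Tip 1: the failure mode of the other tool.
        "The booking search tool does not return reviews, only ratings. "
        # Tip 2: what NOT to use this tool for.
        "Do NOT use this for discovering new places or for events; use the dedicated tools."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "name": {"type": "string", "description": "Exact entity name, e.g. a hotel's full name."},
            "location": {"type": "string", "description": "City or area, to disambiguate chains."},
        },
        "required": ["name", "location"],
    },
}
```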

---

## What NOT to do

- **Don't recreate `httpx.AsyncClient` per request.** Connection pooling matters.
- **Don't return `[]` on every failure.** Distinguish empty-success from transport-failure with a typed exception.
- **Don't default to `type: "neural"` in new code.** Use `"auto"`.
- **Don't pass `livecrawl`**: it has been superseded by `contents.maxAgeHours`.
- **Don't promise the LLM that Exa results are authoritative.** Always have the synthesized answer caveat (especially for safety/visa/advisory queries).
- **Don't request `contents.summary` for huge `numResults`** you won't show — every summary is an LLM call on Exa's side.
- **Don't blindly trust dates in the summary text.** Event pages get re-crawled; old events leak in. Post-filter on dates if precision matters.
- **Don't filter `includeDomains` too aggressively** for obscure entities — recall drops to zero. Tight allowlists work for well-known sources (event sites, review aggregators); they hurt for tail content.

---

## Cost / latency reality (rough numbers)

- A search with 6 results + `contents.summary` returns in **3–8 seconds** typically; can spike to 20s+ for cold queries needing live crawls.
- The summary feature dominates cost. If you don't need it, omit it: you still get URLs, plus `text` if you request it, far more cheaply.
- For latency-sensitive UX (chat / messaging), warn the user before calling Exa: *"Looking that up, hang on..."* It feels much better than 8 seconds of silence.
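In an async chat handler, the pattern is simply to send the holding message before awaiting the search. A minimal sketch; `send_message` and `run_search` stand in for your messaging layer and your Exa call, and `search_with_notice` is a hypothetical name.

```python
from __future__ import annotations

from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def search_with_notice(
    send_message: Callable[[str], Awaitable[None]],
    run_search: Callable[[], Awaitable[T]],
    notice: str = "Looking that up, hang on...",
) -> T:
    """Tell the user a slow lookup is underway, then await the search itself."""
    await send_message(notice)  # delivered before the 3-8 s (or longer) wait starts
    return await run_search()
```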

---

## Where this guide ends

If you've copied the Python client, the shared helper, and one of the four query patterns into your codebase — you have a working Exa integration. The rest is product judgment: which queries to wrap as dedicated tools vs. fold into a general `web_search`, how aggressively to cache, and how to surface caveats to the end user.

The deprecation pace is what bites most LLM-written integrations. If you're reading this in 2027+, check the Exa changelog before trusting any field name above.