# How to integrate Exa search for web search into LLM agents

A no-fluff guide to integrating Exa (`api.exa.ai`) as a web-search backend for LLM agents and tools. Written after shipping four production tools on top of it (entity reviews, local discovery, events, general Q&A). If you're an agent reading this to one-shot an Exa integration, you should be able to copy the patterns below and have it working in under an hour.

---

## What Exa is, in one paragraph

Exa is a neural + keyword web search API that returns ranked URLs plus extracted page content. Two things make it useful for LLM agents over plain Google: (1) you can ask for **semantically similar** results (`type=neural` / `type=auto`), so "rooftop bars with skyline views" returns rooftop bars instead of pages that match those keywords; (2) you can ask Exa to **summarize each result against a question** server-side (`contents.summary.query`), so you get clean structured text per source instead of raw HTML to parse yourself.

That second feature is the killer one for agents. You're not sending the LLM 6 random web pages — you're sending it 6 short summaries already focused on what the user asked.

---

## Endpoint, auth, payload shape

```
POST https://api.exa.ai/search
Headers:
  x-api-key: <your key>
  Content-Type: application/json
```

Minimum payload:

```json
{
  "query": "rooftop bars with skyline views in Bangkok",
  "type": "auto",
  "numResults": 6,
  "contents": {
    "summary": { "query": "Name, neighborhood, what makes it special, price range." },
    "maxAgeHours": 168
  }
}
```

Response:

```json
{
  "results": [
    {
      "title": "...",
      "url": "...",
      "summary": "...",
      "text": "...",
      "publishedDate": "...",
      "score": 0.91
    }
  ]
}
```
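
To make the request shape above concrete, here is a minimal sketch using only the standard library. The function names `build_search_request` and `run_search` are mine, not part of any Exa SDK, and the defaults simply mirror the minimum payload shown earlier. Treat it as an illustration of the wire format, not a production client.

```python
import json
import urllib.request

EXA_SEARCH_URL = "https://api.exa.ai/search"


def build_search_request(
    api_key: str,
    query: str,
    summary_query: str,
    num_results: int = 6,
    max_age_hours: int = 168,
) -> urllib.request.Request:
    """Build (but do not send) a POST request with the payload shape shown above."""
    payload = {
        "query": query,
        "type": "auto",
        "numResults": num_results,
        "contents": {
            "summary": {"query": summary_query},
            "maxAgeHours": max_age_hours,
        },
    }
    return urllib.request.Request(
        EXA_SEARCH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def run_search(request: urllib.request.Request, timeout: float = 25.0) -> list[dict]:
    """Send the request and return the `results` array (empty list if absent)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.load(resp).get("results", [])
```

Splitting "build" from "send" keeps the payload easy to unit-test without touching the network.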

---

## Things your training data is wrong about (2026 reality)

LLMs frequently emit outdated Exa code. Verify against the current docs, but here's what has changed:

1. **`type: "neural"` is deprecated for new code.** Use `type: "auto"` — Exa decides per query whether neural or keyword retrieval works better.
2. **`livecrawl: "preferred"` is deprecated.** The modern freshness control is `contents.maxAgeHours` — a hard upper bound on content age. Set it low (1–24) for time-sensitive queries, high (720+) for stable content.
3. **`includeDomains` is still the right way to constrain sources** when you have strong reason to. Over-filtering kills recall for less-famous entities.
4. **Summaries are not free.** `contents.summary` runs an LLM call per result on Exa's side. Don't request it for huge `numResults` you won't use.
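
If you have older payloads lying around, the first two points can be applied mechanically. The sketch below is a hypothetical helper (`modernize_payload` is my name, not an Exa utility). It assumes a legacy `livecrawl` field may sit either at the top level or under `contents`, and because there is no exact mapping from a `livecrawl` mode to an age in hours, it falls back to a caller-chosen default.

```python
from copy import deepcopy
from typing import Any


def modernize_payload(
    payload: dict[str, Any], default_max_age_hours: int = 168
) -> dict[str, Any]:
    """Return a copy of `payload` with the legacy fields listed above rewritten."""
    out = deepcopy(payload)

    # Point 1: let Exa choose the retrieval mode instead of forcing "neural".
    if out.get("type") == "neural":
        out["type"] = "auto"

    contents = dict(out.get("contents") or {})

    # Point 2: drop livecrawl (assumed locations: top level or under contents).
    had_livecrawl = "livecrawl" in out or "livecrawl" in contents
    out.pop("livecrawl", None)
    contents.pop("livecrawl", None)

    # Assumption: no exact livecrawl-to-hours mapping exists, so use the default
    # unless the payload already carries an explicit maxAgeHours.
    if had_livecrawl and "maxAgeHours" not in contents:
        contents["maxAgeHours"] = default_max_age_hours

    if contents:
        out["contents"] = contents
    return out
```

Pick `default_max_age_hours` per tool rather than globally; a value that suits reviews is far too stale for advisories.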

---

## The four query patterns I keep reusing

### A. Reviews / opinions of a named entity (a hotel, restaurant, product)

```python
{
  "query": f'Guest reviews and honest opinions of "{name}" in {location}',
  "type": "auto",
  "numResults": 6,
  "contents": {
    "summary": {
      "query": f"What do real users say about {name}? Pros, cons, service, value."
    },
    "maxAgeHours": 24 * 30
  }
}
```

Gotchas:
- **Quote the entity name.** Without quotes, Exa neural drifts to similar entities (a search for a famous-but-not-unique hotel name returns three other hotels of the same chain worldwide).
- **Don't over-filter `includeDomains`** for obscure entities. Better to let Exa pull what it can and trust the summary to flag mismatches — the LLM-generated summary will often say "the content is about X, not Y", which is honest and recoverable.
- 30-day `maxAgeHours` for reviews — sentiment doesn't change overnight, and you want a meaningful sample.

### B. Local discovery (restaurants, cafes, bars, viewpoints near a city/landmark)

```python
{
  "query": f"{intent} in {city} near {landmark}",  # e.g. "rooftop bars in Bangkok near Sukhumvit"
  "type": "auto",
  "numResults": 6,
  "contents": {
    "summary": { "query": "Name, neighborhood, what makes it special, price range." },
    "maxAgeHours": 168
  }
}
```

Gotchas:
- **Be specific with intent in the query.** "Best vegetarian restaurants" beats "restaurants" because Exa's neural retrieval keys on intent.
- **Don't promise live hours/availability.** Exa returns what's on web pages; opening hours can be stale.
- 7-day window catches most active blogs and city guides.

### C. Events in a date window

```python
{
  "query": f"{interests} concerts festivals events in {city} between {start_date} and {end_date}",
  "type": "auto",
  "numResults": 8,
  "includeDomains": ["eventbrite.com","timeout.com","songkick.com","bandsintown.com","residentadvisor.net"],
  "contents": {
    "summary": { "query": "Event name, date, venue, what is it about." },
    "maxAgeHours": 24
  }
}
```

Gotchas:
- **Include date strings in the query, don't rely on `maxAgeHours` to filter.** A 24h window means the *page* was crawled recently — not that the *event* is in your date range. Aggregator pages updated yesterday can describe years-old festivals.
- **Post-filter by year/month** in the returned summary text. Or tell the LLM explicitly to verify dates before quoting.
- Domain allowlist works well here because event aggregators are well-known.
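
The post-filtering gotcha can be sketched as a coarse year check on the summary text. This is an assumption-heavy heuristic of my own (`filter_by_year` is not an Exa feature): it only looks at four-digit years, keeps sources that mention no year at all by default, and does nothing about months or days. Use it as a first pass before asking the LLM to verify dates.

```python
import re
from datetime import date

# Matches standalone four-digit years from 1900 to 2099.
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def years_mentioned(text: str) -> set[int]:
    """Return every four-digit year that appears in `text`."""
    return {int(m.group(0)) for m in YEAR_RE.finditer(text)}


def filter_by_year(
    sources: list[dict[str, str]],
    start: date,
    end: date,
    keep_undated: bool = True,
) -> list[dict[str, str]]:
    """Keep sources whose summary mentions a year inside [start.year, end.year].

    Sources with no year in the summary are kept only if `keep_undated` is True.
    """
    wanted = set(range(start.year, end.year + 1))
    kept = []
    for source in sources:
        years = years_mentioned(source.get("summary", ""))
        if not years:
            if keep_undated:
                kept.append(source)
        elif years & wanted:
            kept.append(source)
    return kept
```

A summary that mentions both a past edition and the upcoming one passes the filter, which is usually what you want for recurring festivals.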

### D. General factual Q&A (safety, advisories, "best time to visit", transit options, pricing norms)

```python
{
  "query": user_question,  # pass through verbatim
  "type": "auto",
  "numResults": 5,
  "contents": {
    "summary": { "query": "Direct answer with current facts and any caveats." },
    "maxAgeHours": 1
  }
}
```

Gotchas:
- **Force fresh content** (`maxAgeHours: 1` or similar) for safety/visa/advisory queries. Old caches are dangerous here.
- Always have the LLM include a caveat ("rules change, double-check with the official source") in its synthesized reply. Exa returns what's on the web, and that may already be wrong.

---

## A reference Python async client

This is the shape I ship in prod. Just under 100 lines, no external deps beyond `httpx`. Designed to be a singleton with connection pooling.

```python
"""Exa Search API client — async, pooled, with typed errors."""

import logging
from typing import Any

import httpx

logger = logging.getLogger(__name__)

EXA_SEARCH_URL = "https://api.exa.ai/search"


class ExaError(Exception):
    """Raised on transport / HTTP failures. NOT raised when the API key is
    missing — that degrades to an empty-result fallback so unconfigured
    envs don't surface scary errors to users."""


class ExaClient:
    def __init__(self, api_key: str, timeout: float = 25.0):
        self._api_key = api_key
        self._timeout = timeout
        self._http: httpx.AsyncClient | None = None
        self._warned_missing_key = False

    def _get_http(self) -> httpx.AsyncClient:
        if self._http is None or self._http.is_closed:
            self._http = httpx.AsyncClient(timeout=self._timeout)
        return self._http

    async def aclose(self) -> None:
        if self._http is not None and not self._http.is_closed:
            await self._http.aclose()
        self._http = None

    async def search(
        self,
        query: str,
        *,
        search_type: str = "auto",
        num_results: int = 6,
        include_domains: list[str] | None = None,
        summary_query: str | None = None,
        text_max_chars: int | None = None,
        max_age_hours: int | None = 168,
    ) -> list[dict[str, Any]]:
        """Returns shaped results. [] on success-with-no-hits OR missing key.
        Raises ExaError on transport/HTTP failures."""
        if not self._api_key:
            if not self._warned_missing_key:
                logger.warning("EXA_API_KEY missing — exa disabled")
                self._warned_missing_key = True
            return []

        contents: dict[str, Any] = {}
        if max_age_hours is not None:
            contents["maxAgeHours"] = max_age_hours
        if summary_query:
            contents["summary"] = {"query": summary_query}
        if text_max_chars:
            contents["text"] = {"maxCharacters": text_max_chars}

        payload: dict[str, Any] = {
            "query": query,
            "type": search_type,
            "numResults": num_results,
            "contents": contents,
        }
        if include_domains:
            payload["includeDomains"] = include_domains

        headers = {"x-api-key": self._api_key, "Content-Type": "application/json"}

        try:
            resp = await self._get_http().post(EXA_SEARCH_URL, json=payload, headers=headers)
            resp.raise_for_status()
            data = resp.json()
        except httpx.HTTPError as e:
            logger.error({"msg": "exa search failed", "err": str(e)})
            raise ExaError(f"Exa request failed: {e}") from e
        except Exception as e:
            logger.error({"msg": "exa unexpected error", "err": str(e)})
            raise ExaError(f"Exa unexpected error: {e}") from e

        return [
            {
                "title": r.get("title", ""),
                "url": r.get("url", ""),
                "summary": r.get("summary", "") or "",
                "text": r.get("text", "") or "",
                "published_date": r.get("publishedDate", ""),
                "score": r.get("score"),
            }
            for r in data.get("results", [])
        ]
```

Design decisions explained:

- **Singleton with persistent `httpx.AsyncClient`.** Recreating an `httpx.AsyncClient` per call defeats connection pooling and TLS session reuse. Reuse one client across the process lifetime. Add `aclose()` to your shutdown hook.
- **`ExaError` exists to distinguish transient failures from genuine empty results.** If you collapse both into "returns []", callers will report "no results found" when the actual story is "Exa is down" — terrible UX. The tool above the client should turn `ExaError` into a "service unavailable, please retry" message and `[]` into "we looked, found nothing."
- **Missing API key returns `[]` instead of raising.** A missing key is a deploy-time misconfiguration that should not produce user-visible "service unavailable" errors on every request. Log once at warning level and let the feature silently degrade, which gives much better dev/staging ergonomics. (You should still fail loudly at config validation in your bootstrapping, not here.)
- **`max_age_hours: int | None = 168`** is a sensible 7-day default. Override per-tool: 1h for advisories, 24h for events, 720h for reviews.
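
One way to keep the per-tool overrides from scattering across call sites is a small lookup table. The tool-kind keys below are hypothetical labels of mine; the hour values are the ones quoted in this guide.

```python
# Freshness windows in hours, taken from the values quoted in this guide.
# The keys are illustrative labels, not Exa concepts.
MAX_AGE_HOURS: dict[str, int] = {
    "advisories": 1,      # safety / visa / advisory Q&A
    "events": 24,         # event listings
    "discovery": 168,     # local discovery, 7 days
    "reviews": 24 * 30,   # entity reviews, 30 days
}

DEFAULT_MAX_AGE_HOURS = 168


def max_age_for(tool_kind: str) -> int:
    """Return the freshness window for a tool kind, or the 7-day default."""
    return MAX_AGE_HOURS.get(tool_kind, DEFAULT_MAX_AGE_HOURS)
```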

---

## A shared "tool helper" so your N web tools don't drift

If you wrap Exa with multiple tools (reviews, discovery, events, Q&A), the result-shaping and error-handling code will duplicate fast. Centralize it:

```python
"""Shared helpers for Exa-backed tools."""

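import logging

# get_exa_client() is assumed to be your own accessor that returns the shared
# ExaClient singleton; it is not defined in the reference client above.
from .exa_client import ExaError, get_exa_client
# ToolResult is your own framework type; import it from your stack.

logger = logging.getLogger(__name__)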


async def fetch_exa_sources(
    *,
    query: str,
    summary_query: str,
    num_results: int,
    include_domains: list[str] | None = None,
    max_age_hours: int | None = 168,
) -> list[dict[str, str]]:
    """Run an Exa search, return shaped {title, url, summary} sources.
    Raises ExaError on transport failures."""
    results = await get_exa_client().search(
        query=query,
        num_results=num_results,
        include_domains=include_domains,
        summary_query=summary_query,
        max_age_hours=max_age_hours,
    )
    return [
        {"title": r.get("title", ""), "url": r.get("url", ""), "summary": r.get("summary", "")}
        for r in results
        if r.get("summary") or r.get("title")
    ]


def transient_lookup_failure(tool_name: str, err: Exception) -> ToolResult:
    """Uniform 'lookup unavailable' response for any Exa-backed tool."""
    logger.warning({"msg": f"{tool_name} lookup failed", "err": str(err)})
    return ToolResult(
        success=False,
        error="Couldn't reach the web lookup service right now. Please retry in a moment.",
    )
```

Each tool then collapses to ~10 lines:

```python
try:
    sources = await fetch_exa_sources(
        query=...,
        summary_query=...,
        num_results=6,
        max_age_hours=24 * 30,
    )
except ExaError as e:
    return transient_lookup_failure("get_entity_reviews", e)

return ToolResult(success=True, data={"sources": sources})
```

---

## Tool-description writing tips (for LLM tool-calling)

Exa is only useful if your model actually reaches for it. Three things make a big difference:

1. **Tell the model the failure mode of OTHER tools.** Example: in your `get_entity_reviews` tool description, write *"Use this when the user asks 'how is this place?' — your underlying booking-API search does not return reviews, only ratings."* That nudges the model toward the right tool when it sees both available.
2. **Be explicit about what NOT to use a tool for.** In a general `web_search` tool, list things to defer (other specific tools). This prevents one fat tool from swallowing all queries.
3. **Show example phrases the user would say.** *"Use this when the user asks 'where should I eat in Lisbon', 'rooftop bars in Bangkok with skyline view'..."* Models pattern-match on these.
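
Here is how the three tips might look together in one tool definition. The `name` / `description` / `parameters` layout follows the common JSON-Schema-style function-calling format; the exact wrapper keys vary by provider, and the other tool names mentioned in the description (`find_events`, `web_search`) are hypothetical.

```python
# A sketch of a tool definition applying the three tips above.
GET_ENTITY_REVIEWS_TOOL = {
    "name": "get_entity_reviews",
    "description": (
        # Tip 3: example phrases the user would say.
        "Fetch recent web reviews and opinions about a named hotel, restaurant, "
        "or product. Use this when the user asks 'how is this place?', "
        "'is this hotel any good?', or 'what do people say about X?'. "
        # Tip 1: the failure mode of the other tool.
        "The booking-API search does not return reviews, only ratings. "
        # Tip 2: what NOT to use this tool for.
        "Do NOT use this for events (use find_events) or for general factual "
        "questions (use web_search)."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "name": {
                "type": "string",
                "description": "Exact name of the place or product.",
            },
            "location": {
                "type": "string",
                "description": "City or area, e.g. 'Bangkok'.",
            },
        },
        "required": ["name", "location"],
    },
}
```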

---

## What NOT to do

- **Don't recreate `httpx.AsyncClient` per request.** Connection pooling matters.
- **Don't return `[]` on every failure.** Distinguish empty-success from transport-failure with a typed exception.
- **Don't default to `type: "neural"` in new code.** Use `"auto"`.
- **Don't pass `livecrawl`.** Freshness is now controlled by `contents.maxAgeHours`.
- **Don't promise the LLM that Exa results are authoritative.** Always have the synthesized answer caveat (especially for safety/visa/advisory queries).
- **Don't request `contents.summary` for huge `numResults`** you won't show — every summary is an LLM call on Exa's side.
- **Don't blindly trust dates in the summary text.** Event pages get re-crawled; old events leak in. Post-filter on dates if precision matters.
- **Don't filter `includeDomains` too aggressively** for obscure entities — recall drops to zero. Tight allowlists work for well-known sources (event sites, review aggregators); they hurt for tail content.

---

## Cost / latency reality (rough numbers)

- A search with 6 results + `contents.summary` returns in **3–8 seconds** typically; can spike to 20s+ for cold queries needing live crawls.
- The summary feature dominates cost. If you don't need it, omit it — you get raw URLs and `text` for free-ish.
- For latency-sensitive UX (chat / messaging), warn the user before calling Exa: *"Looking that up, hang on..."* It feels much better than 8 seconds of silence.
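
The "hang on" message can be wired in with a small wrapper. This is a sketch under assumptions: `notify` stands in for whatever coroutine your chat layer uses to send an interim message, and the `delay` threshold is my addition so that fast lookups don't trigger a notice at all. Set `delay` close to zero if you always want the notice sent up front.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_wait_notice(
    lookup: Awaitable[T],
    notify: Callable[[str], Awaitable[None]],
    message: str = "Looking that up, hang on...",
    delay: float = 0.5,
) -> T:
    """Await `lookup`; if it is still running after `delay` seconds, send `message`."""
    task = asyncio.ensure_future(lookup)
    # Wait up to `delay` seconds for the lookup to finish on its own.
    done, _pending = await asyncio.wait({task}, timeout=delay)
    if not done:
        await notify(message)
    # Returns the result, or re-raises whatever the lookup raised.
    return await task
```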

---

## Where this guide ends

If you've copied the Python client, the shared helper, and one of the four query patterns into your codebase — you have a working Exa integration. The rest is product judgment: which queries to wrap as dedicated tools vs. fold into a general `web_search`, how aggressively to cache, and how to surface caveats to the end user.

The deprecation pace is what bites most LLM-written integrations. If you're reading this in 2027+, check the Exa changelog before trusting any field name above.