How to Add Web Search to Your LLM Application with Keiro

Learn how to integrate real-time web search into any LLM application using the Keiro API. Step-by-step guide with Python and JavaScript examples.

9 min read · Keiro Team

Introduction

Large language models are powerful but limited by their training data cutoff. When users ask about current events, recent product launches, or anything that happened after training, the model either hallucinates or admits ignorance. The solution is straightforward: give your LLM access to the web.

In this guide, we walk through how to integrate Keiro's search API into any LLM application, whether you are building a chatbot, a coding assistant, or a research tool.

Why Web Search Matters for LLMs

  • Freshness: LLMs have a training cutoff. Web search provides current information.
  • Accuracy: Grounding responses in real sources reduces hallucination.
  • Citations: Users trust answers more when they can verify sources.
  • Specificity: Web data provides details that may not be in the model's training set.

The Basic Pattern

The integration pattern is simple:

  1. Detect when a query needs web data (or always search).
  2. Call Keiro's search API.
  3. Include the search results in the LLM's context.
  4. Generate a response with citations.

Python Implementation

Basic Search Integration

import requests
from openai import OpenAI

KEIRO_API_KEY = "your-keiro-api-key"
openai_client = OpenAI(api_key="your-openai-api-key")

def search_and_respond(user_message: str) -> str:
    # Search the web with Keiro
    search_resp = requests.post(
        "https://kierolabs.space/api/search",
        json={"apiKey": KEIRO_API_KEY, "query": user_message},
        timeout=10,
    )
    search_resp.raise_for_status()
    results = search_resp.json().get("results", [])

    # Format context
    context = ""
    for i, r in enumerate(results[:5], 1):
        context += f"[{i}] {r.get('title', '')}: {r.get('content', r.get('snippet', ''))}\n"
        context += f"    URL: {r.get('url', '')}\n\n"

    # Generate response with context
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": (
                "Answer the user's question using the provided search results. "
                "Cite sources using [1], [2], etc. Be concise and accurate."
            )},
            {"role": "user", "content": f"Search Results:\n{context}\nQuestion: {user_message}"}
        ]
    )
    return response.choices[0].message.content

# Usage
answer = search_and_respond("What is the current state of nuclear fusion research?")
print(answer)

Smart Search: Only Search When Needed

Not every user message needs a web search. Use a simple classifier to decide:

def needs_web_search(message: str) -> bool:
    """Use the LLM to determine if a query needs fresh web data."""
    response = openai_client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": (
                "Determine if this user message requires current web data to answer accurately. "
                "Respond with only 'yes' or 'no'."
            )},
            {"role": "user", "content": message}
        ],
        max_tokens=3
    )
    return response.choices[0].message.content.strip().lower() == "yes"

def smart_respond(user_message: str) -> str:
    if needs_web_search(user_message):
        return search_and_respond(user_message)
    else:
        response = openai_client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": user_message}]
        )
        return response.choices[0].message.content
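The classifier call above adds latency and cost to every message. A cheap keyword heuristic can short-circuit the obvious cases before falling back to `needs_web_search`. This is a sketch, not part of the Keiro API; the pattern list here is illustrative and should be tuned to your traffic:

```python
import re

# Illustrative pre-filter: messages matching these patterns almost always need fresh data.
TIME_SENSITIVE_PATTERNS = re.compile(
    r"\b(today|latest|current|recent|news|price|score|weather|this (week|month|year)|20\d\d)\b",
    re.IGNORECASE,
)

def looks_time_sensitive(message: str) -> bool:
    """Return True when a message obviously needs fresh web data,
    letting you skip the LLM classifier call entirely."""
    return bool(TIME_SENSITIVE_PATTERNS.search(message))
```

In `smart_respond`, check `looks_time_sensitive(user_message) or needs_web_search(user_message)`: the regex handles the clear-cut cases for free, and the LLM classifier only runs on ambiguous ones.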

JavaScript/TypeScript Implementation

const KEIRO_API_KEY = "your-keiro-api-key";

async function searchAndRespond(userMessage: string): Promise<string> {
  // Step 1: Search with Keiro
  const searchResp = await fetch("https://kierolabs.space/api/search", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      apiKey: KEIRO_API_KEY,
      query: userMessage
    })
  });
  if (!searchResp.ok) {
    throw new Error(`Keiro search failed: ${searchResp.status}`);
  }
  const searchData = await searchResp.json();
  const results = searchData.results || [];

  // Step 2: Format context
  const context = results.slice(0, 5).map((r: any, i: number) =>
    `[${i + 1}] ${r.title ?? ""}: ${r.content ?? r.snippet ?? ""}\n    URL: ${r.url ?? ""}`
  ).join("\n\n");

  // Step 3: Generate with your LLM of choice
  const llmResp = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer your-openai-api-key"
    },
    body: JSON.stringify({
      model: "gpt-4o",
      messages: [
        { role: "system", content: "Answer using the search results. Cite sources with [1], [2], etc." },
        { role: "user", content: `Search Results:\n${context}\n\nQuestion: ${userMessage}` }
      ]
    })
  });
  const llmData = await llmResp.json();
  return llmData.choices[0].message.content;
}

Using Keiro /answer for a One-Call Solution

If you want the simplest possible integration, Keiro's /answer endpoint combines search and generation in a single call:

async function quickAnswer(question: string): Promise<any> {
  const resp = await fetch("https://kierolabs.space/api/answer", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      apiKey: KEIRO_API_KEY,
      query: question
    })
  });
  return resp.json();
}

// Returns { response: "...", sources: [...] }
const result = await quickAnswer("What are the new features in React 20?");
console.log(result.response);
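For Python services, the same one-call pattern is a direct translation of the JavaScript above, assuming the `/answer` endpoint accepts the identical JSON body:

```python
import requests

KEIRO_API_KEY = "your-keiro-api-key"

def quick_answer(question: str) -> dict:
    """Single-call search + generation via Keiro's /answer endpoint."""
    resp = requests.post(
        "https://kierolabs.space/api/answer",
        json={"apiKey": KEIRO_API_KEY, "query": question},
        timeout=30,
    )
    resp.raise_for_status()
    # Expected shape: {"response": "...", "sources": [...]}
    return resp.json()
```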

Streaming Integration

For chat applications, you want to show search results first, then stream the LLM response. Here is the pattern:

async def stream_search_and_respond(user_message: str):
    # Step 1: Search (fast, 400-600ms)
    search_resp = requests.post(
        "https://kierolabs.space/api/search",
        json={"apiKey": KEIRO_API_KEY, "query": user_message},
        timeout=10,
    )
    results = search_resp.json().get("results", [])[:5]

    # Step 2: Yield search results immediately so the UI can render sources
    yield {"type": "sources", "data": [{"title": r.get("title", ""), "url": r.get("url", "")} for r in results]}

    # Step 3: Build context and stream the LLM response
    context = "\n\n".join(
        f"[{i}] {r.get('title', '')}: {r.get('content', r.get('snippet', ''))}\n    URL: {r.get('url', '')}"
        for i, r in enumerate(results, 1)
    )
    stream = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using the search results. Cite sources."},
            {"role": "user", "content": f"Search Results:\n{context}\n\nQuestion: {user_message}"}
        ],
        stream=True
    )

    for chunk in stream:
        if chunk.choices[0].delta.content:
            yield {"type": "text", "data": chunk.choices[0].delta.content}
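On the consuming side, a small dispatcher can route the two event types to your UI handlers. This sketch works with any async generator yielding the same `{"type": ..., "data": ...}` shape; the function and handler names are illustrative:

```python
import asyncio

async def consume_events(event_stream, on_sources, on_text):
    """Route events from a stream_search_and_respond-style async generator."""
    async for event in event_stream:
        if event["type"] == "sources":
            on_sources(event["data"])
        elif event["type"] == "text":
            on_text(event["data"])

# Demo with a stub producer standing in for the real generator
async def demo():
    async def fake_stream():
        yield {"type": "sources", "data": [{"title": "Example", "url": "https://example.com"}]}
        yield {"type": "text", "data": "Hello"}
        yield {"type": "text", "data": " world"}

    chunks = []
    await consume_events(
        fake_stream(),
        on_sources=lambda s: chunks.append(f"[sources:{len(s)}]"),
        on_text=chunks.append,
    )
    return "".join(chunks)

# asyncio.run(demo()) -> "[sources:1]Hello world"
```

In a web app, `on_sources` and `on_text` would write server-sent events or WebSocket frames instead of appending to a list.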

Best Practices

  • Limit context size: Send at most 5 search results with truncated content to stay within token limits.
  • Use /search-pro for important queries: When accuracy matters more than speed, use the pro endpoint.
  • Cache results: Keiro automatically gives a 50% discount on cached queries, making repeated searches very cheap.
  • Handle errors gracefully: If search fails, fall back to the LLM's knowledge and indicate that results may not be current.
  • Always cite sources: Include URLs in your system prompt instructions so the LLM can reference them.
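The error-handling advice above can be captured in a generic wrapper. The function names here are placeholders for your own search, grounded-generation, and plain-generation calls:

```python
def respond_with_fallback(user_message, search_fn, grounded_fn, plain_fn):
    """Prefer a search-grounded answer; fall back to the model's own
    knowledge when search fails or returns no results."""
    try:
        results = search_fn(user_message)
    except Exception:
        results = None  # network error, timeout, bad response, etc.
    if not results:
        answer = plain_fn(user_message)
        return answer + "\n\n(Answered without web search; details may not be current.)"
    return grounded_fn(user_message, results)
```

Wiring in the earlier examples, `search_fn` would wrap the Keiro `/search` call, `grounded_fn` the context-building plus LLM call, and `plain_fn` a plain chat completion.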

Conclusion

Adding web search to your LLM application is one of the highest-impact improvements you can make. It grounds your AI in reality, reduces hallucinations, and gives users verifiable answers. With Keiro's simple API and affordable pricing, there is no reason not to integrate it today.

Get started at kierolabs.space — plans start at $5.99/month for 10,000 requests.

Ready to build something?

Join developers using Keiro — 10× cheaper with superior performance.
