Build a RAG Pipeline with Keiro API in Under 10 Minutes

A step-by-step tutorial to build a production-ready Retrieval-Augmented Generation pipeline using Keiro's search API and OpenAI, complete with Python code.

10 min read · Keiro Team

Introduction

Retrieval-Augmented Generation (RAG) is one of the most popular patterns for grounding LLM responses in real-world data. Instead of relying solely on the model's training data, a RAG pipeline retrieves relevant information from the web (or a knowledge base) and feeds it to the LLM as context.

In this tutorial, you will build a complete RAG pipeline using Keiro's search API for retrieval and OpenAI's GPT-4o for generation. The entire process takes under 10 minutes.

Architecture Overview

Our RAG pipeline has three stages:

  • Query – The user asks a question
  • Retrieve – Keiro searches the web and returns relevant results
  • Generate – An LLM synthesizes an answer using the retrieved context

This is sometimes called the "naive RAG" pattern, and it is surprisingly effective for most use cases.

Prerequisites

  • Python 3.9+
  • A Keiro API key (sign up at kierolabs.space)
  • An OpenAI API key

Install Dependencies

pip install requests openai

Step 1: Set Up the Retriever

The retriever uses Keiro's /search endpoint to find relevant web content for any query.

import requests

KEIRO_API_KEY = "your-keiro-api-key"
KEIRO_BASE_URL = "https://kierolabs.space/api"

def retrieve(query: str, num_results: int = 5) -> list[dict]:
    """Search the web using Keiro and return relevant results."""
    response = requests.post(f"{KEIRO_BASE_URL}/search", json={
        "apiKey": KEIRO_API_KEY,
        "query": query
    })
    response.raise_for_status()
    results = response.json().get("results", [])
    return results[:num_results]

Each result typically includes a title, url, and snippet or content field that we can use as context.
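For reference, a single result might look like the following. The exact values here are made up for illustration, and the field names are an assumption based on the description above; check Keiro's API reference for the authoritative schema:

```python
# Hypothetical example of one Keiro search result, using the fields
# described above (title, url, and snippet or content).
sample_result = {
    "title": "Quantum Computing Breakthroughs",
    "url": "https://example.com/quantum",
    "snippet": "Researchers announced a new error-correction milestone...",
}

# The pipeline relies only on these keys, preferring "content" and
# falling back to "snippet" when full content is unavailable.
context_text = sample_result.get("content", sample_result.get("snippet", ""))
print(context_text)
```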

Step 2: Format the Context

We need to format the retrieved results into a string that the LLM can use as context.

def format_context(results: list[dict]) -> str:
    """Format search results into a context string for the LLM."""
    context_parts = []
    for i, result in enumerate(results, 1):
        title = result.get("title", "Untitled")
        url = result.get("url", "")
        content = result.get("content", result.get("snippet", ""))
        context_parts.append(
            f"Source {i}: {title}\n"
            f"URL: {url}\n"
            f"Content: {content}\n"
        )
    return "\n---\n".join(context_parts)

Step 3: Set Up the Generator

The generator takes the user's question and the retrieved context, then produces a grounded answer.

from openai import OpenAI

client = OpenAI(api_key="your-openai-api-key")

SYSTEM_PROMPT = """You are a helpful assistant that answers questions based on the provided web search results.
Always cite your sources by referencing the source number and URL.
If the search results don't contain enough information to answer the question, say so.
Do not make up information that isn't supported by the sources."""

def generate(query: str, context: str) -> str:
    """Generate an answer using the LLM with retrieved context."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": (
                f"Question: {query}\n\n"
                f"Search Results:\n{context}\n\n"
                f"Please answer the question based on the search results above."
            )}
        ],
        temperature=0.3
    )
    return response.choices[0].message.content

Step 4: Put It All Together

def rag_pipeline(query: str) -> dict:
    """Complete RAG pipeline: retrieve then generate."""
    # Step 1: Retrieve relevant documents
    results = retrieve(query)

    if not results:
        return {
            "answer": "I could not find relevant information to answer your question.",
            "sources": []
        }

    # Step 2: Format context
    context = format_context(results)

    # Step 3: Generate answer
    answer = generate(query, context)

    # Step 4: Return answer with sources
    sources = [{"title": r.get("title", ""), "url": r.get("url", "")} for r in results]

    return {
        "answer": answer,
        "sources": sources
    }

# Try it out
if __name__ == "__main__":
    result = rag_pipeline("What are the latest developments in quantum computing in 2026?")
    print("Answer:", result["answer"])
    print("\nSources:")
    for source in result["sources"]:
        print(f"  - {source['title']}: {source['url']}")

Step 5: Enhance with Keiro /search-pro

For higher-quality retrieval, swap /search for /search-pro. This endpoint applies additional re-ranking and filtering to return more relevant results:

def retrieve_pro(query: str, num_results: int = 5) -> list[dict]:
    """Use Keiro's pro search for higher quality retrieval."""
    response = requests.post(f"{KEIRO_BASE_URL}/search-pro", json={
        "apiKey": KEIRO_API_KEY,
        "query": query
    })
    response.raise_for_status()
    results = response.json().get("results", [])
    return results[:num_results]

Step 6: Add Web Page Extraction for Deeper Context

Sometimes search snippets are not enough. Use Keiro's /web-crawler endpoint to get the full content of top results:

def retrieve_with_full_content(query: str, num_pages: int = 3) -> list[dict]:
    """Search and then extract full page content for top results."""
    # First, search
    results = retrieve(query, num_results=num_pages)

    # Then, extract full content for each result
    enriched = []
    for result in results:
        try:
            crawl_resp = requests.post(f"{KEIRO_BASE_URL}/web-crawler", json={
                "apiKey": KEIRO_API_KEY,
                "url": result["url"]
            })
            crawl_resp.raise_for_status()
            full_content = crawl_resp.json().get("content", "")
            result["content"] = full_content[:3000]  # Limit to 3000 chars per source
            enriched.append(result)
        except Exception:
            enriched.append(result)  # Fall back to snippet

    return enriched
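Because retrieve, retrieve_pro, and retrieve_with_full_content all share the same shape (query in, list of result dicts out), you can make the pipeline accept its retriever as a parameter instead of hard-coding one. A minimal sketch of that design; the stub retriever and the placeholder answer below are purely illustrative:

```python
from typing import Callable

def rag_pipeline_with(query: str, retriever: Callable[[str], list[dict]]) -> dict:
    """Same pipeline shape as rag_pipeline, but with the retrieval strategy injected."""
    results = retriever(query)
    if not results:
        return {"answer": "No relevant information found.", "sources": []}
    sources = [{"title": r.get("title", ""), "url": r.get("url", "")} for r in results]
    # ...format the context and call generate() here, exactly as in rag_pipeline...
    return {"answer": "<generated answer>", "sources": sources}

# Illustrative stub standing in for any of the three real retrievers:
def fake_retriever(query: str) -> list[dict]:
    return [{"title": "Example", "url": "https://example.com", "content": "..."}]

result = rag_pipeline_with("test query", fake_retriever)
```

Swapping retrieval strategies then becomes a one-argument change rather than an edit to the pipeline itself.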

The Shortcut: Keiro /answer

If you want to skip the entire pipeline and get a sourced answer in one call, Keiro's /answer endpoint does exactly that:

def quick_answer(query: str) -> dict:
    """Get a sourced answer in a single API call."""
    response = requests.post(f"{KEIRO_BASE_URL}/answer", json={
        "apiKey": KEIRO_API_KEY,
        "query": query
    })
    response.raise_for_status()
    data = response.json()
    return {
        "answer": data.get("response", ""),
        "sources": data.get("sources", [])
    }

This is perfect for prototyping or use cases where you do not need to customize the generation step.

Production Tips

  • Cache repeated queries: Keiro gives you a 50% discount on cached results automatically, so repeated queries are cheap.
  • Use batch processing: If you need to pre-populate answers for a set of FAQs, use Keiro's free /batch-search endpoint.
  • Set a timeout: Always set a reasonable timeout (5-10 seconds) on your HTTP requests to handle edge cases gracefully.
  • Limit context length: Do not send more than 10,000 tokens of context to your LLM. Trim results if needed.
  • Log sources: Always store which sources were used for each answer. This is essential for debugging and compliance.
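Two of the tips above, timeouts and context limits, can be sketched in a few lines. The four-characters-per-token ratio is a rough heuristic (use a real tokenizer for precise budgets), and the retriever below is the Step 1 retriever with a timeout added:

```python
import requests

KEIRO_API_KEY = "your-keiro-api-key"
KEIRO_BASE_URL = "https://kierolabs.space/api"

MAX_CONTEXT_TOKENS = 10_000
CHARS_PER_TOKEN = 4  # rough heuristic, not an exact token count

def trim_context(context: str, max_tokens: int = MAX_CONTEXT_TOKENS) -> str:
    """Truncate the context string to an approximate token budget."""
    return context[: max_tokens * CHARS_PER_TOKEN]

def retrieve_with_timeout(query: str, num_results: int = 5) -> list[dict]:
    """The Step 1 retriever, but failing fast instead of hanging on a slow upstream."""
    response = requests.post(
        f"{KEIRO_BASE_URL}/search",
        json={"apiKey": KEIRO_API_KEY, "query": query},
        timeout=10,  # seconds; raises requests.Timeout if exceeded
    )
    response.raise_for_status()
    return response.json().get("results", [])[:num_results]
```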

Cost Analysis

Here is what this RAG pipeline costs per query:

Component                                      | Cost
Keiro /search (Pro plan)                       | ~$0.000125
OpenAI GPT-4o (500 input + 300 output tokens)  | ~$0.004
Total per query                                | ~$0.004

At this cost, you can process 250,000 RAG queries per month for about $1,000 in total (mostly OpenAI costs). The search component is negligible.
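The arithmetic behind that claim, spelled out with the per-query figures from the table above:

```python
search_cost = 0.000125   # Keiro /search, Pro plan, per query
llm_cost = 0.004         # GPT-4o: ~500 input + ~300 output tokens
queries_per_month = 250_000

monthly_total = queries_per_month * (search_cost + llm_cost)
search_share = queries_per_month * search_cost

print(f"Monthly total: ${monthly_total:,.2f}")  # about $1,031, nearly all LLM spend
print(f"Search share:  ${search_share:,.2f}")   # about $31, negligible by comparison
```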

Conclusion

You now have a working RAG pipeline that can answer any question using real-time web data. The combination of Keiro's affordable search API and an LLM gives you a powerful foundation for building AI applications that stay current and cite their sources.

From here, you can extend the pipeline with conversation memory, user personalization, and more sophisticated retrieval strategies. But even this basic pipeline is production-ready for many use cases.

Get your Keiro API key at kierolabs.space and start building your RAG pipeline today.

Ready to build something?

Join developers using Keiro — 10× cheaper with superior performance.

Get started