Introduction
Retrieval-Augmented Generation (RAG) is one of the most popular patterns for grounding LLM responses in real-world data. Instead of relying solely on the model's training data, a RAG pipeline retrieves relevant information from the web (or a knowledge base) and feeds it to the LLM as context.
In this tutorial, you will build a complete RAG pipeline using Keiro's search API for retrieval and OpenAI's GPT for generation. The entire process takes under 10 minutes.
Architecture Overview
Our RAG pipeline has three stages:
- Query – The user asks a question
- Retrieve – Keiro searches the web and returns relevant results
- Generate – An LLM synthesizes an answer using the retrieved context
This is sometimes called the "naive RAG" pattern, and it is surprisingly effective for most use cases.
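In code, the three stages reduce to three small functions composed in sequence. As a preview of what we build below:

```python
# Preview of the pipeline built in this tutorial:
#   results = retrieve(query)           # Retrieve: web search via Keiro
#   context = format_context(results)   # bridge: results -> prompt string
#   answer  = generate(query, context)  # Generate: LLM answer from context
```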
Prerequisites
- Python 3.9+
- A Keiro API key (sign up at kierolabs.space)
- An OpenAI API key
Install Dependencies
```bash
pip install requests openai
```
Step 1: Set Up the Retriever
The retriever uses Keiro's /search endpoint to find relevant web content for any query.
```python
import requests

KEIRO_API_KEY = "your-keiro-api-key"
KEIRO_BASE_URL = "https://kierolabs.space/api"

def retrieve(query: str, num_results: int = 5) -> list[dict]:
    """Search the web using Keiro and return relevant results."""
    response = requests.post(f"{KEIRO_BASE_URL}/search", json={
        "apiKey": KEIRO_API_KEY,
        "query": query
    })
    response.raise_for_status()
    results = response.json().get("results", [])
    return results[:num_results]
```
Each result typically includes a title, url, and snippet or content field that we can use as context.
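For concreteness, here is the rough shape of a single result. This is an assumption based on the fields this tutorial uses, not Keiro's official schema; check the API docs for the authoritative version:

```python
# Assumed shape of one search result (illustrative values only):
example_result = {
    "title": "An Introduction to Quantum Error Correction",
    "url": "https://example.com/qec-intro",
    "snippet": "Quantum error correction protects qubits from noise by...",
}
```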
Step 2: Format the Context
We need to format the retrieved results into a string that the LLM can use as context.
```python
def format_context(results: list[dict]) -> str:
    """Format search results into a context string for the LLM."""
    context_parts = []
    for i, result in enumerate(results, 1):
        title = result.get("title", "Untitled")
        url = result.get("url", "")
        content = result.get("content", result.get("snippet", ""))
        context_parts.append(
            f"Source {i}: {title}\n"
            f"URL: {url}\n"
            f"Content: {content}\n"
        )
    return "\n---\n".join(context_parts)
```
Step 3: Set Up the Generator
The generator takes the user's question and the retrieved context, then produces a grounded answer.
```python
from openai import OpenAI

client = OpenAI(api_key="your-openai-api-key")

SYSTEM_PROMPT = """You are a helpful assistant that answers questions based on the provided web search results.
Always cite your sources by referencing the source number and URL.
If the search results don't contain enough information to answer the question, say so.
Do not make up information that isn't supported by the sources."""

def generate(query: str, context: str) -> str:
    """Generate an answer using the LLM with retrieved context."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": (
                f"Question: {query}\n\n"
                f"Search Results:\n{context}\n\n"
                f"Please answer the question based on the search results above."
            )}
        ],
        temperature=0.3
    )
    return response.choices[0].message.content
```
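Two small design choices worth noting: the system prompt pushes the model to cite sources and admit gaps, and the low temperature (0.3) keeps the answer close to the retrieved context rather than letting the model improvise.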
Step 4: Put It All Together
```python
def rag_pipeline(query: str) -> dict:
    """Complete RAG pipeline: retrieve then generate."""
    # Step 1: Retrieve relevant documents
    results = retrieve(query)
    if not results:
        return {
            "answer": "I could not find relevant information to answer your question.",
            "sources": []
        }

    # Step 2: Format context
    context = format_context(results)

    # Step 3: Generate answer
    answer = generate(query, context)

    # Step 4: Return answer with sources
    sources = [{"title": r.get("title", ""), "url": r.get("url", "")} for r in results]
    return {
        "answer": answer,
        "sources": sources
    }

# Try it out
if __name__ == "__main__":
    result = rag_pipeline("What are the latest developments in quantum computing in 2026?")
    print("Answer:", result["answer"])
    print("\nSources:")
    for source in result["sources"]:
        print(f"  - {source['title']}: {source['url']}")
```
Step 5: Enhance with Keiro /search-pro
For higher-quality retrieval, swap /search for /search-pro. This endpoint applies additional re-ranking and filtering to return more relevant results:
```python
def retrieve_pro(query: str, num_results: int = 5) -> list[dict]:
    """Use Keiro's pro search for higher quality retrieval."""
    response = requests.post(f"{KEIRO_BASE_URL}/search-pro", json={
        "apiKey": KEIRO_API_KEY,
        "query": query
    })
    response.raise_for_status()
    results = response.json().get("results", [])
    return results[:num_results]
```
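One convenient way to use the pro endpoint without duplicating the pipeline is to make the retriever pluggable. This is a refactor sketch, not part of the original pipeline:

```python
from typing import Callable

def rag_pipeline_with(query: str,
                      retriever: Callable[[str], list[dict]] = retrieve) -> dict:
    """rag_pipeline with a pluggable retrieval function."""
    results = retriever(query)
    if not results:
        return {
            "answer": "I could not find relevant information to answer your question.",
            "sources": []
        }
    answer = generate(query, format_context(results))
    sources = [{"title": r.get("title", ""), "url": r.get("url", "")} for r in results]
    return {"answer": answer, "sources": sources}

# Swap in the pro retriever without touching the rest of the pipeline:
result = rag_pipeline_with("latest developments in quantum computing",
                           retriever=retrieve_pro)
```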
Step 6: Add Web Page Extraction for Deeper Context
Sometimes search snippets are not enough. Use Keiro's /web-crawler endpoint to get the full content of top results:
```python
def retrieve_with_full_content(query: str, num_pages: int = 3) -> list[dict]:
    """Search and then extract full page content for top results."""
    # First, search
    results = retrieve(query, num_results=num_pages)

    # Then, extract full content for each result
    enriched = []
    for result in results:
        try:
            crawl_resp = requests.post(f"{KEIRO_BASE_URL}/web-crawler", json={
                "apiKey": KEIRO_API_KEY,
                "url": result["url"]
            })
            crawl_resp.raise_for_status()
            full_content = crawl_resp.json().get("content", "")
            if full_content:  # keep the snippet if extraction came back empty
                result["content"] = full_content[:3000]  # Limit to 3000 chars per source
            enriched.append(result)
        except Exception:
            enriched.append(result)  # Fall back to snippet
    return enriched
```
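If you adopted the pluggable-pipeline sketch from the previous step, the enriched retriever drops straight in:

```python
result = rag_pipeline_with(
    "What are the tradeoffs between superconducting and trapped-ion qubits?",
    retriever=retrieve_with_full_content,
)
```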
The Shortcut: Keiro /answer
If you want to skip the entire pipeline and get a sourced answer in one call, Keiro's /answer endpoint does exactly that:
```python
def quick_answer(query: str) -> dict:
    """Get a sourced answer in a single API call."""
    response = requests.post(f"{KEIRO_BASE_URL}/answer", json={
        "apiKey": KEIRO_API_KEY,
        "query": query
    })
    response.raise_for_status()
    data = response.json()
    return {
        "answer": data.get("response", ""),
        "sources": data.get("sources", [])
    }
```
This is perfect for prototyping or use cases where you do not need to customize the generation step.
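Usage is a single call. The exact shape of each entry in sources depends on the API response, so the example below just prints entries as-is:

```python
result = quick_answer("What are the latest developments in quantum computing?")
print(result["answer"])
for source in result["sources"]:
    print(" -", source)
```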
Production Tips
- Cache repeated queries: Keiro gives you a 50% discount on cached results automatically, so repeated queries are cheap.
- Use batch processing: If you need to pre-populate answers for a set of FAQs, use Keiro's free /batch-search endpoint.
- Set a timeout: Always set a reasonable timeout (5-10 seconds) on your HTTP requests to handle edge cases gracefully. See the sketch after this list.
- Limit context length: Do not send more than 10,000 tokens of context to your LLM; trim results if needed (also covered in the sketch below).
- Log sources: Always store which sources were used for each answer. This is essential for debugging and compliance.
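Here is a minimal sketch of the timeout and context-length tips. The 4-characters-per-token ratio is a rough rule of thumb, not an exact conversion; use a tokenizer if you need precise counts:

```python
def retrieve_safe(query: str, num_results: int = 5, timeout_s: float = 10.0) -> list[dict]:
    """retrieve() with an explicit timeout so a slow search can't stall the pipeline."""
    try:
        response = requests.post(
            f"{KEIRO_BASE_URL}/search",
            json={"apiKey": KEIRO_API_KEY, "query": query},
            timeout=timeout_s,  # raises requests.exceptions.Timeout when exceeded
        )
        response.raise_for_status()
        return response.json().get("results", [])[:num_results]
    except requests.exceptions.Timeout:
        return []  # rag_pipeline already handles the empty-results case

# ~10,000 tokens at a rough 4 characters per token.
MAX_CONTEXT_CHARS = 40_000

def trim_context(context: str, limit: int = MAX_CONTEXT_CHARS) -> str:
    """Hard cap on context length before it reaches the LLM."""
    return context[:limit]
```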
Cost Analysis
Here is what this RAG pipeline costs per query:
| Component | Cost |
|---|---|
| Keiro /search (Pro plan) | ~$0.000125 |
| OpenAI GPT-4o (500 input + 300 output tokens) | ~$0.004 |
| Total per query | ~$0.004 |
At this cost, you can process 250,000 RAG queries per month for about $1,000 in total (mostly OpenAI costs). The search component is negligible.
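As a quick check on the arithmetic:

```python
cost_per_query = 0.000125 + 0.004   # Keiro search + GPT-4o
print(f"${250_000 * cost_per_query:,.2f} per month")  # $1,031.25 per month
```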
Conclusion
You now have a working RAG pipeline that answers questions using real-time web data. The combination of Keiro's affordable search API and an LLM gives you a powerful foundation for building AI applications that stay current and cite their sources.
From here, you can extend the pipeline with conversation memory, user personalization, and more sophisticated retrieval strategies. But even this basic pipeline is production-ready for many use cases.
Get your Keiro API key at kierolabs.space and start building your RAG pipeline today.