Build a RAG Pipeline with Keiro in Under 10 Minutes

A step-by-step guide to building a production-ready RAG pipeline using Keiro's search and content extraction API. Includes Python code you can copy-paste.

10 min read · Keiro Team

Retrieval Augmented Generation (RAG) is the most practical way to give LLMs access to current, factual information. Instead of relying solely on training data, you retrieve relevant documents at query time and include them in the LLM's context.

This tutorial shows you how to build a complete RAG pipeline using Keiro's search and content extraction API. We'll go from zero to a working system in under 10 minutes.

Architecture Overview

Our RAG pipeline has four steps:

  1. Query — User asks a question
  2. Search — Keiro searches the web and returns relevant content chunks
  3. Augment — We inject the retrieved content into the LLM prompt
  4. Generate — The LLM generates an answer grounded in real sources

Step 1: Install Dependencies

pip install requests openai

Step 2: Search + Extract Content with Keiro

Keiro's /search/content endpoint in medium mode does something unique: it searches the web AND returns pre-chunked content from the top results. This skips the entire "fetch pages → parse HTML → chunk text" pipeline that most RAG systems require.

import requests

KEIRO_KEY = "your_keiro_api_key"

def search_and_chunk(query: str, max_results: int = 3) -> list[dict]:
    """Search the web and get RAG-ready chunks in one API call."""
    response = requests.post(
        "https://kierolabs.space/api/v2/search/content",
        headers={"Authorization": f"Bearer {KEIRO_KEY}"},
        json={"query": query, "maxResults": max_results, "mode": "medium"},
        timeout=30,  # requests has no default timeout; don't hang on slow responses
    )
    response.raise_for_status()
    return response.json()["results"]
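In production you'll also want to handle transient network failures. A minimal retry sketch (`with_retries` is a hypothetical helper of ours, not part of Keiro's SDK) wraps the search call with exponential backoff:

```python
import time

def with_retries(fn, attempts=3, backoff=1.0, retry_on=(Exception,)):
    """Call fn(), retrying on the given exceptions with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(backoff * (2 ** attempt))

# Hypothetical usage with the search step:
# sources = with_retries(lambda: search_and_chunk("rag best practices"),
#                        retry_on=(requests.RequestException,))
```

Scoping `retry_on` to `requests.RequestException` avoids retrying on bugs in your own code.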

Step 3: Build the RAG Prompt

def build_rag_prompt(query: str, sources: list[dict]) -> str:
    """Build a prompt with retrieved context."""
    context_parts = []
    for i, source in enumerate(sources, 1):
        context_parts.append(
            f"Source {i}: {source['title']}\n"
            f"URL: {source['url']}\n"
            f"Content: {source['content'][:2000]}\n"
        )

    context = "\n---\n".join(context_parts)

    return f"""Answer the following question using ONLY the provided sources.
Cite sources by number [1], [2], etc. If the sources don't contain
enough information, say so.

Sources:
{context}

Question: {query}

Answer:"""
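The `[:2000]` slice above caps each source individually. If you'd rather enforce a total context budget, a small helper can trim the results before prompt building (`fit_to_budget` is a hypothetical helper, not part of Keiro's API; character counts are a rough proxy for tokens):

```python
def fit_to_budget(sources: list[dict], budget_chars: int = 6000) -> list[dict]:
    """Trim each source's content so the combined context stays under a
    character budget. Splits the budget evenly across sources."""
    per_source = budget_chars // max(len(sources), 1)
    return [{**s, "content": s["content"][:per_source]} for s in sources]
```

Pass the trimmed list to `build_rag_prompt` in place of the raw results; token-based budgeting (e.g. with a tokenizer) would be more precise if you're close to the model's context limit.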

Step 4: Generate with OpenAI

from openai import OpenAI

client = OpenAI(api_key="your_openai_key")

def rag_answer(query: str) -> str:
    """Full RAG pipeline: search → augment → generate."""
    # 1. Search + extract chunks
    sources = search_and_chunk(query)

    # 2. Build augmented prompt
    prompt = build_rag_prompt(query, sources)

    # 3. Generate answer
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1
    )

    return response.choices[0].message.content

# Try it
answer = rag_answer("What are the latest best practices for RAG pipelines in 2026?")
print(answer)

Step 5: Add Source Attribution

For production, you'll want to return sources alongside the answer:

def rag_with_sources(query: str) -> dict:
    sources = search_and_chunk(query)
    prompt = build_rag_prompt(query, sources)

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1
    )

    return {
        "answer": response.choices[0].message.content,
        "sources": [{"title": s["title"], "url": s["url"]} for s in sources],
        "credits_used": 1.5  # search/content costs 1.5 credits
    }
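Since the prompt instructs the model to cite sources as [1], [2], you can also filter the returned source list down to the ones the answer actually cites. A small regex sketch (`cited_sources` is a hypothetical helper):

```python
import re

def cited_sources(answer: str, sources: list[dict]) -> list[dict]:
    """Keep only the sources whose [n] citation marker appears in the answer."""
    cited = {int(n) for n in re.findall(r"\[(\d+)\]", answer)}
    return [s for i, s in enumerate(sources, 1) if i in cited]
```

Swapping this into `rag_with_sources` gives users a source list that matches the citations in the text instead of everything that was retrieved.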

Why Keiro for RAG?

Most RAG systems require a multi-step pipeline: search → fetch pages → parse HTML → chunk → embed. Keiro's medium mode collapses the first four steps into a single API call:

Step         Traditional RAG              Keiro RAG
Web search   Tavily/Exa ($3–4/1K)         One call to /search/content ($1.50/1K)
Fetch pages  requests + rate limiting     not needed
Parse HTML   BeautifulSoup/Trafilatura    not needed
Chunk text   LangChain splitters          not needed
Embed        OpenAI embeddings            OpenAI embeddings

The result: fewer dependencies, less code, lower latency, and 60–80% cost savings compared to Tavily- or Exa-based RAG pipelines.

Next Steps

  • Use /search/content deep mode for full markdown extraction
  • Add Keiro's /search/batch endpoint for offline dataset generation
  • Use the /search/flash endpoint for real-time agent loops where latency matters
  • Implement caching to save credits on repeated queries
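The caching bullet can be as simple as an in-memory TTL cache around the search step, so repeated queries within a window don't spend credits. A generic decorator sketch (`ttl_cache` is our own helper, not a Keiro feature):

```python
import time

def ttl_cache(ttl_seconds: float = 3600):
    """Cache results keyed by positional arguments, expiring after ttl_seconds."""
    def decorator(fn):
        store = {}
        def wrapper(*args):
            now = time.time()
            hit = store.get(args)
            if hit is not None and now - hit[0] < ttl_seconds:
                return hit[1]  # fresh cache hit: skip the real call
            result = fn(*args)
            store[args] = (now, result)
            return result
        return wrapper
    return decorator

# Hypothetical usage: cache search results for 15 minutes
# search_and_chunk = ttl_cache(ttl_seconds=900)(search_and_chunk)
```

For multi-process deployments you'd swap the in-memory dict for Redis or similar, but the credit-saving idea is the same.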

Keiro plans start at $15/month for 5,000 credits. The /search/content endpoint costs 1.5 credits per call, so the Essential plan gives you ~3,300 RAG queries per month. Start with 300 free credits to test your pipeline — no credit card needed.

Ready to build something?

Join developers using Keiro — 10× cheaper with superior performance.

Get started