How to Build a UX Research Agent

Build an AI agent that synthesizes user interviews, finds pain points, and generates insight reports using Keiro and qualitative analysis.

8 min read · Pierre Dubois


UX research is one of the most impactful applications of AI-powered web search in 2026. By connecting an LLM to real-time web data through a search API, you can build an agent that gathers research material, extracts key information, cites sources, and generates structured reports, all in seconds instead of hours. This guide walks you through building a production-grade UX research agent from scratch, with complete Python and JavaScript implementations using Keiro for web search, OpenAI for generation, and a source attribution system that ensures every claim is verifiable. We cover everything from the basic agent loop to advanced techniques like multi-source verification, caching, monitoring, scaling, and cost optimization.

Why should you build a UX Research agent?

Manual UX research is slow, expensive, and inconsistent. A human researcher spends 30-60 minutes per query: searching multiple sources, reading documents, extracting key facts, cross-referencing claims, and writing up findings. At scale, this is unsustainable: a team handling 100 UX research queries per day needs 3-5 full-time researchers just to keep up. An AI agent does the same work in 10-30 seconds per query, at a cost of $0.01-0.05 per query. That is 100-300x faster and 50-200x cheaper than manual research. The ROI is clear even for small teams: a single developer can build and maintain an AI agent that replaces an entire research department.

AI agents are also consistent: they follow the same process for every query, eliminating the variability that human researchers introduce. They are comprehensive: they can search 5-10 sources per query, while humans typically stop after 2-3. And they are verifiable: with source attribution, every claim in the agent output links to the original document, allowing anyone to audit the research. Consistency is particularly important for UX research applications that serve multiple users who need the same quality of answer regardless of when they ask.

The key limitation is that UX research agents depend on the quality of their search data. If the search API returns poor or outdated results, the agent output will be poor or outdated. This is why choosing the right search API, one with reliable, current, content-rich results, is the most important decision. Keiro at $0.50 per 1,000 queries provides the best combination of cost, content quality, and reliability. Its Medium mode returns RAG-ready chunks directly, eliminating the need for a separate content extraction pipeline. Compared to alternatives, Keiro is 8x cheaper than Tavily and 14x cheaper than Exa, which means you can run more searches per query (improving comprehensiveness) without blowing your budget.

What architecture should a UX Research agent use?

The UX research agent follows a four-phase architecture. Phase 1 is search: the agent takes the user query and searches for relevant web data using Keiro's content search API. Phase 2 is extraction: the agent processes the search results, extracting key facts and organizing them by topic. Phase 3 is analysis: the LLM synthesizes the extracted facts into a coherent answer, comparing sources and identifying conflicts. Phase 4 is citation: the agent attaches source references to every factual claim, producing an answer with [Source N] citations that link back to the original documents.

This four-phase architecture has several advantages over simpler approaches. By separating search from generation, you can evaluate retrieval quality independently from answer quality. By requiring citations, you create an audit trail that builds trust. By processing multiple sources, you reduce the risk of relying on a single, potentially biased source. And by using Keiro's content extraction, you eliminate the need for a separate scraping step, which is the most common failure point in UX research pipelines. The architecture also enables targeted optimization: if you find that your agent's answers lack depth, you can tune the search phase without changing the generation phase, and vice versa. This separation of concerns is what makes the four-phase architecture maintainable in production — each phase can be updated, tested, and deployed independently.

Python: building the UX Research agent

import requests
from openai import OpenAI

KEIRO_KEY = "YOUR_KEIRO_KEY"
openai_client = OpenAI()

def research_agent(question, top_k=5):
    """UX research agent: search, extract, analyze, cite."""

    # Phase 1: Search with Keiro
    search_resp = requests.post(
        "https://kierolabs.space/api/v2/search/content",
        headers={"Authorization": "Bearer " + KEIRO_KEY},
        json={"query": question, "mode": "medium", "maxResults": top_k},
        timeout=15
    )
    search_resp.raise_for_status()
    results = search_resp.json().get("results", [])

    if not results:
        return {"answer": "No results found.", "sources": []}

    # Phase 2: Extract and organize content
    sources = []
    context_parts = []
    for i, r in enumerate(results):
        content = r.get("content", "")[:800]
        context_parts.append(f"[Source {i+1}] {r.get('title', '')}\n{content}")
        sources.append({"number": i+1, "title": r.get("title", ""), "url": r["url"]})

    context = "\n\n".join(context_parts)

    # Phase 3 and 4: Analyze and cite with LLM
    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": f"You are a ux research research agent. Answer based on the sources below. "
                           f"Cite every factual claim with [Source N]. If sources conflict, note it. "
                           f"If sources are insufficient, say so.\n\n{context}"
            },
            {"role": "user", "content": question}
        ]
    )

    return {
        "answer": response.choices[0].message.content,
        "sources": sources
    }

# Usage
result = research_agent("ux research latest trends and data 2026")
print(result["answer"])
for s in result["sources"]:
    print(f"[{s['number']}] {s['title']}: {s['url']}")

JavaScript: UX Research agent implementation

async function researchAgent(question, topK = 5) {
  const searchResp = await fetch("https://kierolabs.space/api/v2/search/content", {
    method: "POST",
    headers: {
      "Authorization": "Bearer " + process.env.KEIRO_KEY,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({ query: question, mode: "medium", maxResults: topK })
  });
  const searchData = await searchResp.json();
  const results = searchData.results || [];

  if (results.length === 0) {
    return { answer: "No results found.", sources: [] };
  }

  const context = results
    .map((r, i) => `[Source ${i + 1}] ${r.title}\n${(r.content || "").substring(0, 800)}`)
    .join("\n\n");

  const sources = results.map((r, i) => ({
    number: i + 1, title: r.title || "", url: r.url
  }));

  const llmResp = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Authorization": "Bearer " + process.env.OPENAI_KEY,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      model: "gpt-4o",
      messages: [
        { role: "system", content: `You are a UX research agent. Answer based on sources. Cite with [Source N].\n\n${context}` },
        { role: "user", content: question }
      ]
    })
  });
  const llmData = await llmResp.json();

  return { answer: llmData.choices[0].message.content, sources };
}

cURL: testing the search component

# Keiro content search for ux research
curl -X POST https://kierolabs.space/api/v2/search/content \
  -H "Authorization: Bearer YOUR_KEIRO_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query":"ux research latest research 2026","mode":"medium","maxResults":5}'

# Keiro Flash for fast overview
curl -X POST https://kierolabs.space/api/v2/search/flash \
  -H "Authorization: Bearer YOUR_KEIRO_KEY" \
  -H "Content-Type: application/json" \
  -d '{"query":"ux research latest research 2026","maxResults":5}'

How do you handle errors and rate limiting?

Production UX research agents need robust error handling. The search API may return empty results, time out, or respond with rate-limit errors (HTTP 429). The LLM may generate an answer that ignores the context or fails to cite sources. You need to handle each case gracefully, because every unhandled error is an error your users see directly in their output.

For search API errors, implement exponential backoff with a maximum of 3 retries. If all retries fail, return a message explaining that the search is temporarily unavailable. For rate limits, Keiro's rate limits are generous (5 req/s on Explorer, 20 req/s on Startup, 100 req/s on Enterprise). If you exceed these limits, implement a token bucket rate limiter to stay within bounds. The backoff schedule should start at 1 second and double with each retry: 1s, 2s, 4s. This prevents thundering herd problems when the API recovers from a temporary outage.
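
Here is a minimal sketch of that retry loop, building on the requests setup from earlier (the helper name search_with_retry and the payload shape are our own, not part of any Keiro SDK):

import time
import requests

def search_with_retry(payload, max_retries=3):
    # Exponential backoff: 1s, 2s, 4s between attempts
    delay = 1
    for attempt in range(max_retries + 1):
        try:
            resp = requests.post(
                "https://kierolabs.space/api/v2/search/content",
                headers={"Authorization": "Bearer " + KEIRO_KEY},
                json=payload,
                timeout=15
            )
            if resp.status_code == 429 and attempt < max_retries:
                time.sleep(delay)  # rate limited: back off before retrying
                delay *= 2
                continue
            resp.raise_for_status()
            return resp.json().get("results", [])
        except requests.RequestException:
            if attempt == max_retries:
                raise  # retries exhausted; let the caller degrade gracefully
            time.sleep(delay)
            delay *= 2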

For LLM errors, validate the response before returning it. Check that the answer contains at least one [Source N] citation. If it does not, regenerate with a stronger citation instruction. If the answer contains unsupported claims, flag them for human review. Also handle edge cases like the LLM refusing to answer (content filter), returning an empty response, or exceeding the token limit. Each of these cases should produce a clear, user-friendly message rather than a raw error dump.
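
A minimal validation check, using a regular expression over the generated answer (validate_citations is an illustrative helper name, and the regeneration prompt is one suggestion, not a required incantation):

import re

def validate_citations(answer):
    # An answer with no [Source N] reference cannot be audited
    return bool(re.search(r"\[Source \d+\]", answer))

# On failure, regenerate with a stronger instruction, e.g.:
# "Every factual sentence MUST end with a [Source N] citation."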

How do you implement multi-source verification?

Basic UX research agents search once and generate an answer from whatever they find. This works for simple queries, but for high-stakes UX research where accuracy is critical, you need multi-source verification. The idea is to search multiple times with different query formulations, then compare results across searches to identify claims that are supported by multiple independent sources. Claims supported by 3+ sources are marked as "verified", claims supported by 2 sources are marked as "likely", and claims from a single source are marked as "unverified" with a note encouraging the user to check the original source.

def verified_research_agent(question, num_searches=3, top_k=5):
    """Advanced: multi-source verification for UX research."""
    all_results = []

    # Generate multiple query formulations
    queries = [question]
    queries.append(f"{question} overview comparison")
    queries.append(f"{question} latest data statistics")

    for i, query in enumerate(queries[:num_searches]):
        resp = requests.post(
            "https://kierolabs.space/api/v2/search/content",
            headers={"Authorization": "Bearer " + KEIRO_KEY},
            json={"query": query, "mode": "medium", "maxResults": top_k},
            timeout=15
        )
        results = resp.json().get("results", [])
        for r in results:
            r["search_index"] = i + 1
        all_results.extend(results)

    # Deduplicate by URL, tracking which searches returned each result
    by_url = {}
    for r in all_results:
        if r["url"] not in by_url:
            r["search_hits"] = {r["search_index"]}
            by_url[r["url"]] = r
        else:
            by_url[r["url"]]["search_hits"].add(r["search_index"])
    unique_results = list(by_url.values())

    # Build context, tagging each source by how many independent
    # searches surfaced it
    context_parts = []
    sources = []
    for i, r in enumerate(unique_results):
        hits = len(r["search_hits"])
        tag = "[Verified]" if hits >= 3 else "[Likely]" if hits == 2 else "[Unverified]"
        content = r.get("content", "")[:600]
        context_parts.append(f"[Source {i+1}] {tag} {r.get('title', '')}\n{content}")
        sources.append({"number": i+1, "title": r.get("title", ""), "url": r["url"], "status": tag})

    context = "\n\n".join(context_parts)

    response = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": f"You are a ux research research agent. Verify claims across sources. "
                                        f"Prefer verified claims. Note conflicts. Cite with [Source N].\n\n{context}"},
            {"role": "user", "content": question}
        ]
    )

    return {"answer": response.choices[0].message.content, "sources": sources, "total_sources": len(unique_results)}

Multi-source verification increases search cost by 2-3x (because you make multiple searches), but dramatically improves answer reliability. With Keiro at $0.50/1K, the additional cost is negligible: 3 searches of 5 results each costs $0.0075 per query. With Tavily at $4.00/1K, the same 3 searches cost $0.06 per query: still affordable, but 8x more expensive. The verification approach is most valuable for UX research use cases where accuracy is critical: financial research, medical information, legal analysis, or any domain where a wrong answer has real consequences.

How do you add caching to the agent?

Caching is the single most effective way to reduce cost and latency for UX research agents. Many queries are repetitive: different users ask the same question, or the same user asks a question they asked last week. Without caching, every query hits the search API and LLM, costing $0.01-0.05 per query. With caching, repeat queries cost effectively nothing. A well-designed cache can reduce your effective query cost by 30-70%, depending on how repetitive your query distribution is.

import hashlib
import json
import time

# Simple in-memory cache (use Redis for production)
cache = {}
CACHE_TTL = 3600  # 1 hour

def cached_research_agent(question, top_k=5):
    # Check cache first
    cache_key = hashlib.md5(question.lower().encode()).hexdigest()
    if cache_key in cache:
        entry = cache[cache_key]
        if time.time() - entry["timestamp"] < CACHE_TTL:
            result = dict(entry["result"])  # copy so the flag never leaks into the cache
            result["cache_hit"] = True
            return result

    # Cache miss: run the full agent
    result = research_agent(question, top_k)

    # Store a clean copy before flagging the live response
    cache[cache_key] = {"result": dict(result), "timestamp": time.time()}
    result["cache_hit"] = False
    return result

# Usage with automatic caching
result = cached_research_agent("ux research latest trends 2026")
if result["cache_hit"]:
    print("Returned from cache (0 search cost)")
else:
    print("Fresh result from Keiro search")

For production deployments, use Redis instead of an in-memory cache. Redis supports TTL-based expiration, so cached entries automatically expire after the configured period. Set the TTL based on how quickly your UX research data changes: 1 hour for fast-moving data, 24 hours for stable data. Keiro also offers a built-in cache discount: when you query the same search term within 24 hours, Keiro returns cached results at 50% off. This means even cache misses on the search API side benefit from Keiro's caching, reducing your search cost for repeated queries. A good strategy is to combine your application-level cache (for the full agent response) with Keiro's search cache (for the search component) to maximize savings.
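
A minimal Redis-backed version, assuming a local Redis instance and the redis-py client (the key prefix and 4-hour TTL are illustrative choices):

import hashlib
import json
import redis

redis_client = redis.Redis(host="localhost", port=6379, decode_responses=True)
REDIS_TTL = 4 * 3600  # 4 hours; tune to how fast your data changes

def redis_cached_agent(question, top_k=5):
    key = "uxagent:" + hashlib.md5(question.lower().encode()).hexdigest()
    cached = redis_client.get(key)
    if cached:
        return json.loads(cached)
    result = research_agent(question, top_k)
    # setex stores the value and its TTL atomically
    redis_client.setex(key, REDIS_TTL, json.dumps(result))
    return result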

What should you monitor in production?

Running a UX research agent in production requires monitoring four key metrics: search latency, LLM latency, citation rate, and cache hit rate. Search latency measures how long the Keiro API takes to return results. LLM latency measures how long OpenAI takes to generate the answer. Citation rate measures what percentage of factual claims in the answer include a [Source N] citation. Cache hit rate measures how often queries are served from cache instead of hitting the API.

Set alert thresholds for each metric. Search latency should stay under 2 seconds on average. If it exceeds 5 seconds for more than 5 consecutive queries, alert the on-call engineer. LLM latency should stay under 5 seconds. If it exceeds 10 seconds, the user experience degrades significantly. Citation rate should stay above 70%. If it drops below 50%, the agent is not properly citing sources and the output cannot be audited. Cache hit rate should be tracked but not alerted on — it is a cost metric, not a quality metric.

Implement monitoring with a simple logging system that records every query, its metrics, and its outcome. Push these logs to a time-series database like Prometheus or InfluxDB. Create dashboards that show trends over time, not just current values. A gradual increase in search latency over days or weeks indicates a problem that threshold-based alerts will miss. Track per-query cost as well: log whether each query was a cache hit, a Keiro cache hit (50% discount), or a full-price query. This data helps you optimize your cache TTL and estimate your monthly bill accurately.
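
Here is a sketch of that per-query log record; the field names are illustrative and should be aligned with your dashboard schema:

import json
import logging
import time

logger = logging.getLogger("ux_research_agent")

def log_query_metrics(question, search_ms, llm_ms, citation_count, cache_status):
    # One structured JSON line per query, ready for time-series ingestion
    logger.info(json.dumps({
        "ts": time.time(),
        "query": question,
        "search_latency_ms": search_ms,
        "llm_latency_ms": llm_ms,
        "citation_count": citation_count,
        "cache_status": cache_status  # "app_hit", "keiro_discount", or "full_price"
    }))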

How do you scale the agent?

The UX research agent is embarrassingly parallel: each query is independent. This makes horizontal scaling straightforward: run multiple agent instances behind a load balancer and distribute queries across them. The bottleneck is the search API rate limit, not compute. Keiro's Enterprise plan supports 100 req/s, which handles approximately 8.6 million queries per day. If you need more than that, contact Keiro for custom rate limits.

For most UX research applications, scaling follows a predictable pattern. Start with a single agent instance. When query volume exceeds what one instance can handle (typically 10-50 queries per minute depending on LLM latency), add a second instance behind a round-robin load balancer. Continue adding instances until you reach the Keiro rate limit. At that point, you have two options: upgrade to a higher Keiro plan, or implement request queuing with priority levels. Request queuing is useful when some queries are more time-sensitive than others: prioritize interactive user queries over background batch jobs, as in the sketch below.
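
A minimal sketch of two-level request queuing with Python's standard library (the worker count and priority levels are illustrative):

import queue
import threading

work_queue = queue.PriorityQueue()

def enqueue(question, interactive=True):
    # Lower number = higher priority: interactive queries jump the line
    work_queue.put((0 if interactive else 1, question))

def worker():
    while True:
        _, question = work_queue.get()
        try:
            research_agent(question)
        finally:
            work_queue.task_done()

# One worker per agent instance; scale workers up to your Keiro rate limit
threading.Thread(target=worker, daemon=True).start()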

Containerize the agent with Docker for easy deployment. Each container runs a single agent instance with its own API keys and cache connection. Use Kubernetes or similar for orchestration, with horizontal pod autoscaling based on CPU usage or request queue length. Set the minimum replicas to 2 for high availability, and the maximum to whatever your Keiro rate limit supports. Monitor the auto-scaler to ensure it is not thrashing — if replicas are constantly being created and destroyed, your minimum replica count is too low.

Cost analysis for the UX Research agent

The total cost per query is the sum of search API cost and LLM cost. With Keiro at $0.50/1K, a 5-result search costs $0.0025, and with GPT-4o at approximately $0.01-0.05 per query, the total comes to $0.0125-0.0525 per query. At 1,000 queries per day, this costs $375-1,575 per month. With Tavily at $4.00/1K, the same workload costs $1,200-2,700 per month, roughly 2-3x more expensive. The cost analysis should also factor in cache savings: with a 40% cache hit rate, your effective search cost drops by 40%, and with Keiro's search cache discount, even non-cached searches may cost less.

Component                      Cost/Query        1K Queries/Day
Keiro search (5 results)       $0.0025           $75/month
GPT-4o generation              $0.01-0.05        $300-1,500/month
Total with Keiro               $0.0125-0.0525    $375-1,575/month
Total with Tavily              $0.04-0.09        $1,200-2,700/month
With 40% cache hit (Keiro)     $0.0075-0.0315    $225-945/month
With 40% cache hit (Tavily)    $0.024-0.054      $720-1,620/month
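
To sanity-check these figures against your own traffic, here is a back-of-envelope estimator (the default costs mirror the table above; adjust them to your actual rates):

def monthly_cost(queries_per_day, cache_hit_rate=0.0,
                 search_cost=0.0025, llm_cost=0.03):
    # Effective cost per query, discounted by the share served from cache
    per_query = (search_cost + llm_cost) * (1 - cache_hit_rate)
    return queries_per_day * 30 * per_query

print(monthly_cost(1000))                      # 975.0: no cache, mid-range LLM cost
print(monthly_cost(1000, cache_hit_rate=0.4))  # 585.0: 40% cache hit rate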

How can you optimize costs?

Reducing the cost of your UX research agent is a combination of strategic and tactical decisions. Here are the most effective techniques we have seen teams use in production.

  • Use Flash for pre-filtering: Instead of running 5 content searches per query, run 10 Flash searches first, rank the results by metadata, and only extract content from the top 3-5. Flash costs the same per query but returns in ~300ms, and you avoid paying for content extraction on low-relevance results. This can reduce your content search volume by 40-60%.
  • Implement aggressive caching: Set your cache TTL as high as your data freshness requirements allow. For most UX research use cases, a 4-hour cache TTL is sufficient; data does not change that fast. A 4-hour TTL with 40% cache hit rate saves more than a 1-hour TTL with 15% cache hit rate.
  • Use a cheaper LLM for simple queries: Not every query needs GPT-4o. Classify queries by complexity and route simple queries to GPT-4o-mini or similar models that cost 10-20x less. A query like "what is ux research?" does not need a $0.05 generation step. A $0.002 generation step with a smaller model is sufficient.
  • Batch similar queries: If your application receives multiple queries about the same topic within a short window, batch them into a single search and share the context across multiple LLM calls. This reduces search API calls by up to 80% for bursty traffic patterns.
  • Leverage Keiro's search cache: Keiro offers 50% off on repeated search queries within 24 hours. Structure your queries to maximize cache hits: use canonical query formulations instead of user-provided raw text. Normalize queries by lowercasing, removing filler words, and applying consistent formatting, as in the sketch after this list.
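
A minimal normalizer, assuming a hand-tuned filler-word list (the list below is illustrative; extend it based on your traffic):

import re

FILLER_WORDS = {"the", "a", "an", "of", "for", "in", "on", "about", "please"}

def normalize_query(raw):
    # Lowercase, strip punctuation, drop filler words so that variants
    # of the same question map to one canonical Keiro search
    words = re.findall(r"[a-z0-9]+", raw.lower())
    return " ".join(w for w in words if w not in FILLER_WORDS)

print(normalize_query("What ARE the latest UX research trends?!"))
# -> "what are latest ux research trends"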

When should you NOT build a UX Research agent?

  • Simple, deterministic queries: If queries can be answered by a database lookup or API call, you do not need an AI agent. A direct call is faster and more reliable.
  • Regulatory requirements for human review: If your domain requires human verification of every answer, the agent should assist humans rather than replace them.
  • Real-time data that changes by the second: Web search has a latency of minutes to hours. For truly real-time data, use a dedicated real-time API.
  • Paywalled content: Keiro cannot extract content behind paywalls. If the best sources are paywalled, you need a subscription-based data provider.
  • Very low query volume (under 100/month): If you only need occasional UX research, manual research may be more cost-effective than building and maintaining an AI agent.

How do you deploy to production?

Deploying the UX research agent to production requires caching, monitoring, and scaling. Implement a result cache that stores the answer and sources for each query. If the same question comes up repeatedly, return the cached result. Keiro's built-in cache (50% discount on cached queries) helps with the search component. Add Redis or similar for the full agent result cache. The production deployment checklist includes: Docker containerization, health check endpoints, structured logging, rate limit handling, graceful degradation, and automated alerting.
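
For the health check endpoints on that checklist, here is a minimal sketch using FastAPI (our choice of framework, not a requirement; the Redis check assumes the cache setup from earlier):

import redis
from fastapi import FastAPI, Response

app = FastAPI()

@app.get("/healthz")
def healthz():
    # Liveness: the process is up and serving HTTP
    return {"status": "ok"}

@app.get("/readyz")
def readyz(response: Response):
    # Readiness: only take traffic when the cache is reachable
    try:
        redis.Redis(host="localhost", port=6379).ping()
    except redis.RedisError:
        response.status_code = 503
        return {"ready": False}
    return {"ready": True}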

Monitor performance with: search latency (under 2 seconds), LLM latency (under 5 seconds), citation rate (70%+ of sentences with citations), and user satisfaction metrics. Set alerts for when any metric drops below threshold. Use structured logging (JSON format) so you can query and analyze logs in your monitoring system. Log every query with its latency, cost, cache status, and citation count — this data is essential for debugging and optimization.

For scaling, follow the pattern described above: the UX research agent is embarrassingly parallel, so run multiple instances behind a load balancer, watch the search API rate limit (Keiro's Enterprise plan supports 100 req/s, roughly 8.6 million queries per day), and use Kubernetes with horizontal pod autoscaling based on request queue length.

Testing the UX Research agent

Before deploying your UX research agent to production, you need a testing strategy that covers both unit tests and integration tests. Unit tests verify individual components: the search function, the LLM call, the citation parser. Integration tests verify the full pipeline: query in, answer out, with correct sources and citations. Both are essential because individual components can work correctly in isolation but fail when combined.

Write unit tests for the search function using mocked API responses. Create a set of 5-10 mock search responses that cover the common cases: normal results, empty results, single result, and error responses. Test that your search function correctly handles each case, including retry logic and error messages. For the LLM call, mock the OpenAI API and test that your citation validation logic correctly identifies answers with and without citations. For the citation parser, test that it correctly extracts [Source N] references from various answer formats.
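
A sketch of one such unit test with unittest.mock, assuming research_agent lives in a module named agent (adjust the patch target to your layout):

import agent  # the module containing research_agent (name assumed)
from unittest.mock import MagicMock, patch

def test_handles_empty_results():
    # Mock the Keiro call so the test runs offline and costs nothing
    mock_resp = MagicMock()
    mock_resp.json.return_value = {"results": []}
    mock_resp.raise_for_status.return_value = None
    with patch("agent.requests.post", return_value=mock_resp):
        result = agent.research_agent("any query")
    assert result["answer"] == "No results found."
    assert result["sources"] == []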

Integration tests require actual API calls, so they are slower and cost money. However, they are the only way to verify that the full pipeline works end-to-end. Create a test suite of 10-20 representative UX research queries and run it after every pipeline change. Each test verifies that the answer is non-empty, contains at least one citation, and includes at least 3 sources. Run integration tests in a staging environment, not against your production API keys, to avoid polluting your production data and hitting rate limits during test runs. Automate the integration test suite to run on every pull request. A failing integration test should block deployment. At Keiro's $0.50/1K pricing, running 20 integration test queries per PR costs approximately $0.05, cheap enough to run on every change without thinking about it.
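
A minimal version of that suite (the sample queries are placeholders; run it with a staging API key):

import re

REPRESENTATIVE_QUERIES = [
    "ux research latest trends 2026",
    "remote usability testing best practices",
]

def run_integration_suite():
    for q in REPRESENTATIVE_QUERIES:
        result = research_agent(q)
        assert result["answer"], f"empty answer for {q!r}"
        assert re.search(r"\[Source \d+\]", result["answer"]), f"no citation for {q!r}"
        assert len(result["sources"]) >= 3, f"fewer than 3 sources for {q!r}"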

Start building your UX Research agent

Keiro's built-in content extraction, RAG-ready chunking, and $0.50/1K pricing make it the ideal search API for building UX research agents. The Explorer plan gives you 500 free credits to build and test your agent. Most teams have a working prototype in under an hour and a production deployment in under a week. Create your free Keiro account and start building your UX research agent today.


Frequently asked questions

How do you build a UX research agent for interview analysis?

Transcribe interviews with Whisper or similar. Use NLP to extract themes and pain points. Use Keiro to research competitor UX. Generate a structured insight report with quotes and recommendations.

Can an AI agent replace human UX researchers?

No. AI synthesizes interviews but cannot conduct them. The agent helps analyze data faster, letting researchers spend more time with users.

How do you extract themes from interview transcripts?

Use embeddings to cluster similar quotes. Use LLMs to label clusters with theme names. Filter by frequency and emotional intensity for priority themes.

What is sentiment analysis used for in UX research?

Classifying user quotes as positive, negative, or neutral. Negative quotes with high frequency indicate critical pain points. Positive quotes reveal strengths to amplify.

How much does an AI-assisted interview study cost?

Transcription: $0.006/minute (Whisper API). NLP: free with open-source models. Keiro: $0.005/query. A 10-interview study costs under $10 in tools vs 2-3 days of manual work.

Pierre Dubois
Staff Engineer

Staff engineer specializing in RAG pipelines and retrieval architecture. Built distributed search with Elasticsearch and vector databases. Based in Lyon.
