We'll explain how we structured three different knowledge bases for one agent, implemented RAG-optimized documentation with metadata, and eliminated content duplication using... other agents.

The Challenge: One Agent, Three Sources

When we launched the support agent for Agentech, we faced an interesting challenge. The agent needed to answer completely different types of questions:

"How do I set up Telegram integration?" → needs technical documentation
"Show me an example for an online store" → needs practical case studies from the blog
"What are your pricing plans and terms of service?" → needs legal documents

One agent, but three completely different information sources, each with its own characteristics and structural requirements.

Our Solution: Three Bases, Many Agents

Instead of trying to cram everything into one large database, we created three specialized ones. The main advantage of this approach is that the same base can be reused across different agents.

For example, the "Legal Documents" base is connected not only to the main support agent but also to the partner agent and the CRM agent for managers. Everyone gets the same up-to-date information about pricing and terms.

The "Blog" base is used both in the technical support agent (for solution examples) and the sales agent (for demonstrating capabilities).
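The sharing described above can be pictured as a mapping from agents to bases. This is a minimal sketch, not Agentech's actual configuration: the identifiers (`support`, `partner`, `crm`, `sales`, and the base names) are hypothetical labels for the agents and bases named in this article. It assumes the "main support agent" and the "technical support agent" are the same agent, and it lists only the connections mentioned here.

```python
from collections import defaultdict

# Hypothetical identifiers for the agents and bases named in the article.
AGENT_BASES = {
    "support": ["documentation", "blog", "legal"],
    "partner": ["legal"],
    "crm": ["legal"],
    "sales": ["blog"],
}


def agents_by_base(agent_bases):
    """Invert the mapping: for each base, list the agents that read from it."""
    result = defaultdict(list)
    for agent, bases in agent_bases.items():
        for base in bases:
            result[base].append(agent)
    return {base: sorted(agents) for base, agents in result.items()}


SHARED = agents_by_base(AGENT_BASES)
```

Inverting the mapping makes it easy to see which agents are affected when a base changes, for example when pricing is updated in the legal base.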

Three Specialized Knowledge Bases

Blog Base

Purpose: Practical cases and usage examples
Content: Articles like "How to Create an Agent for an Online Store," real scenarios, ready-made prompts
Optimization: Rich keywords for diverse use cases and semantic search

Documentation Base

Purpose: User instructions and guides
Content: How to create an agent, set up integrations, work with the knowledge base
Optimization: Comprehensive metadata with technical synonyms and translations

Legal Documents Base

Purpose: Official information
Content: Terms of service, privacy policy, pricing, company details
Optimization: Precise legal terminology in keywords for accuracy

RAG Optimization with Metadata

Rather than relying on traditional chunking approaches alone, we built a metadata-driven system to improve RAG retrieval:

Metadata for Every Section:

Each logical section has keywords including synonyms, translations, and domain variants
Summaries provide context for reranking and understanding
HTML chunk separators mark natural content boundaries


This approach gives us precise control over what users find when they search.
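To make the idea concrete, here is a minimal sketch of how per-section metadata and chunk separators could be read by an indexing script. The exact markup is an assumption: the article names `RAG_META` and HTML chunk separators but does not show their syntax, so the `<!-- RAG_META ... -->` and `<!-- CHUNK -->` comments, the `keywords:` and `summary:` fields, and the sample text below are all invented for illustration.

```python
import re
from dataclasses import dataclass

# Assumed syntax (not confirmed by the article): chunks are split by an HTML
# comment separator, and each chunk starts with a RAG_META comment.
CHUNK_SEPARATOR = "<!-- CHUNK -->"
META_PATTERN = re.compile(r"<!--\s*RAG_META\s*(.*?)-->", re.DOTALL)


@dataclass
class Chunk:
    keywords: list
    summary: str
    text: str


def parse_chunks(document):
    """Split a document on separators and read each chunk's metadata."""
    chunks = []
    for raw in document.split(CHUNK_SEPARATOR):
        raw = raw.strip()
        if not raw:
            continue
        keywords, summary = [], ""
        match = META_PATTERN.search(raw)
        if match:
            for line in match.group(1).splitlines():
                key, _, value = line.partition(":")
                key, value = key.strip().lower(), value.strip()
                if key == "keywords":
                    keywords = [k.strip() for k in value.split(",") if k.strip()]
                elif key == "summary":
                    summary = value
        # The metadata comment itself is not part of the indexable text.
        text = META_PATTERN.sub("", raw).strip()
        chunks.append(Chunk(keywords, summary, text))
    return chunks


SAMPLE = """<!-- RAG_META
keywords: online store, e-commerce, shop agent
summary: Steps for building a support agent for an online store.
-->
To create an agent for an online store, start with the product catalog.
<!-- CHUNK -->
<!-- RAG_META
keywords: pricing, cost, plans
summary: Overview of pricing plans.
-->
Pricing depends on the selected plan.
"""

chunks = parse_chunks(SAMPLE)
```

Keeping metadata in HTML comments means the source files still render normally for human readers while the indexer sees the extra fields.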

How Search Works Across Three Bases

When a user asks a question, here's what happens:

Query analysis: The system extracts key concepts and intent
Parallel search: It searches all three bases simultaneously, using both semantic and keyword matching
Metadata boost: Results with matching keywords get a ranking boost
Smart combination: The agent synthesizes information from the different sources

Example in action:

Question: "Can I use your platform for an online store and how much does it cost?"

Search process:

Keywords "online store" match Blog metadata → finds e-commerce case study
Keywords "cost", "pricing" match Legal Documents → finds pricing page
The agent combines both into a comprehensive answer
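The search flow in this section can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the production pipeline: word-set overlap (Jaccard) stands in for real embedding similarity, the `KEYWORD_BOOST` value is arbitrary because the article does not give one, and the section titles, keywords, and texts are invented sample data.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy data: titles, keywords, and texts are invented for illustration.
BASES = {
    "blog": [
        {"title": "Agent for an online store",
         "keywords": {"online store", "e-commerce"},
         "text": "case study agent for an online store"},
    ],
    "documentation": [
        {"title": "Telegram integration",
         "keywords": {"telegram", "bot"},
         "text": "how to connect a telegram bot to an agent"},
    ],
    "legal": [
        {"title": "Pricing",
         "keywords": {"cost", "pricing", "plans"},
         "text": "pricing plans and terms of service"},
    ],
}

KEYWORD_BOOST = 0.5  # assumed weight; the article does not give a value


def similarity(query, text):
    """Stand-in for embedding similarity: Jaccard overlap of word sets."""
    q, t = set(query.lower().split()), set(text.lower().split())
    return len(q & t) / len(q | t) if q | t else 0.0


def search_base(name, sections, query):
    """Score every section of one base, boosting metadata keyword matches."""
    lowered = query.lower()
    results = []
    for section in sections:
        score = similarity(query, section["text"])
        if any(keyword in lowered for keyword in section["keywords"]):
            score += KEYWORD_BOOST
        results.append((score, name, section["title"]))
    return results


def search_all(query, top_k=2):
    """Search all bases in parallel and merge the results by score."""
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(search_base, name, sections, query)
                   for name, sections in BASES.items()]
        merged = [hit for future in futures for hit in future.result()]
    return sorted(merged, reverse=True)[:top_k]


hits = search_all(
    "Can I use your platform for an online store and how much does it cost?"
)
```

With this toy data, the two highest-scoring hits come from the blog and legal bases, which mirrors the example above: the case study and the pricing page are retrieved together and handed to the agent for synthesis.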

Automation: Agents Watching Agents

The main problem with any knowledge base is that it becomes outdated. We solved this problem radically: we put agents in charge of maintaining order.

Documentation Agent: Monitors Code and Updates Metadata

How it works:

Analyzes code changes during deployment
Identifies affected documentation sections
Updates content AND metadata keywords
Adjusts summaries to reflect new functionality
Creates drafts for team approval

Example: We added OAuth support. The agent:

Finds all authentication-related sections
Adds "OAuth", "OAuth2", and "авторизация OAuth" (Russian for "OAuth authorization") to keywords
Updates summaries to mention OAuth capability
Rewrites affected instructions
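The keyword step of the OAuth example can be sketched as follows. This covers only the metadata update, not the code analysis or the rewriting of instructions, and it is an assumption about how such a step might look rather than the agent's real implementation. The section records, the field names, and the `draft_keyword_updates` helper are hypothetical.

```python
import copy

# Hypothetical sections; in practice these would come from the documentation base.
SECTIONS = [
    {"id": "auth-login", "keywords": ["authentication", "login"],
     "summary": "How users sign in."},
    {"id": "telegram", "keywords": ["telegram", "bot"],
     "summary": "Connecting a Telegram bot."},
]


def draft_keyword_updates(sections, topic_keywords, new_keywords, summary_note):
    """Return draft copies of the sections that touch the topic.

    New keywords are appended and the summary is extended. The originals are
    left untouched so that the team can approve the drafts first.
    """
    drafts = []
    topic = {k.lower() for k in topic_keywords}
    for section in sections:
        existing = {k.lower() for k in section["keywords"]}
        if not existing & topic:
            continue  # section is unrelated to the changed feature
        draft = copy.deepcopy(section)
        for keyword in new_keywords:
            if keyword.lower() not in existing:
                draft["keywords"].append(keyword)
        draft["summary"] = f'{section["summary"]} {summary_note}'
        drafts.append(draft)
    return drafts


drafts = draft_keyword_updates(
    SECTIONS,
    topic_keywords=["authentication"],
    new_keywords=["OAuth", "OAuth2", "авторизация OAuth"],
    summary_note="Supports OAuth.",
)
```

Returning drafts instead of editing sections in place matches the approval step described above: nothing changes in the live base until a person signs off.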

Blog Agent: Prevents Duplication

How it works:

Analyzes new article drafts
Compares keywords and summaries with existing content
Identifies overlap and gaps
Suggests unique angles to explore

Example: Writing about "HR agent setup." The agent reports:

Existing article covers basic HR automation
Keywords overlap: "HR", "recruitment", "onboarding"
Missing coverage: "performance reviews", "time tracking integration"
Suggestion: Focus on advanced HR workflows not yet documented
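The overlap report in the HR example boils down to set operations on keywords. Below is a minimal sketch under the assumption that each article carries a flat keyword list. The `compare_keywords` helper and the use of Jaccard similarity as the overlap score are illustrative choices, not something the article specifies.

```python
def compare_keywords(draft_keywords, existing_keywords):
    """Compare a draft's keywords with an existing article's keywords."""
    draft = {k.lower() for k in draft_keywords}
    existing = {k.lower() for k in existing_keywords}
    union = draft | existing
    return {
        "overlap": sorted(draft & existing),
        "new_in_draft": sorted(draft - existing),
        # Jaccard similarity: shared keywords divided by all distinct keywords.
        "similarity": len(draft & existing) / len(union) if union else 0.0,
    }


report = compare_keywords(
    draft_keywords=["HR", "recruitment", "onboarding",
                    "performance reviews", "time tracking integration"],
    existing_keywords=["HR", "recruitment", "onboarding"],
)
```

A high similarity score signals likely duplication, while the `new_in_draft` list points to the angles that are not yet covered.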

Content Structure for AI Understanding

Unlike documentation written only for human readers, content for AI agents needs explicit semantic structure:

Semantic Headers with Context

```markdown
# How to Connect Telegram Bot to Agent

**Goal:** Enable agent-user communication via Telegram
**Prerequisites:** Active agent, Telegram bot token
**Result:** Fully functional Telegram bot connected to your agent
```

NO_INDEX for Non-Essential Content

```html
Welcome to our integration guide! In this article, we'll walk through...
[Table of Contents]
[Navigation links]
```
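A preprocessing step that drops such regions before indexing might look like the sketch below. The marker syntax is an assumption: the article names `NO_INDEX` but the exact tags are not shown, so the `<!-- NO_INDEX -->` and `<!-- /NO_INDEX -->` comment pair is hypothetical.

```python
import re

# Assumed marker syntax; the article names NO_INDEX but does not show the tags.
NO_INDEX_PATTERN = re.compile(
    r"<!--\s*NO_INDEX\s*-->.*?<!--\s*/NO_INDEX\s*-->", re.DOTALL
)


def strip_no_index(document):
    """Remove NO_INDEX regions so they are never chunked or embedded."""
    cleaned = NO_INDEX_PATTERN.sub("", document)
    # Collapse the blank lines left behind by removed regions.
    return re.sub(r"\n{3,}", "\n\n", cleaned).strip()


PAGE = """<!-- NO_INDEX -->
Welcome to our integration guide! In this article, we'll walk through...
[Table of Contents]
[Navigation links]
<!-- /NO_INDEX -->

# How to Connect Telegram Bot to Agent

**Goal:** Enable agent-user communication via Telegram
"""

indexable = strip_no_index(PAGE)
```

Only the text that remains in `indexable` would be passed on to chunking, so greetings and navigation never compete with real instructions in search results.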

Metrics We Track

Search Quality Metrics

Keyword hit rate: How often keyword matches improve results
Semantic match accuracy: Pure embedding search vs. hybrid search
Result relevance: User feedback on answer quality
Zero-result queries: What users search but don't find
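Two of these metrics are simple to compute from a query log. The sketch below assumes a hypothetical log format (the `query`, `results`, and `keyword_match` fields are invented), and it uses one possible definition of keyword hit rate: the share of answered queries where a metadata keyword matched.

```python
from collections import Counter

# Hypothetical search log entries; the field names are assumptions.
SEARCH_LOG = [
    {"query": "telegram setup", "results": 3, "keyword_match": True},
    {"query": "pricing", "results": 2, "keyword_match": True},
    {"query": "calendar integration", "results": 0, "keyword_match": False},
    {"query": "refund policy", "results": 1, "keyword_match": False},
    {"query": "calendar integration", "results": 0, "keyword_match": False},
]


def search_quality(log):
    """Compute simple search-quality figures from a query log."""
    total = len(log)
    zero = [entry["query"] for entry in log if entry["results"] == 0]
    answered = [entry for entry in log if entry["results"] > 0]
    hits = sum(1 for entry in answered if entry["keyword_match"])
    return {
        "zero_result_rate": len(zero) / total if total else 0.0,
        "keyword_hit_rate": hits / len(answered) if answered else 0.0,
        "top_zero_result_queries": Counter(zero).most_common(3),
    }


metrics = search_quality(SEARCH_LOG)
```

The list of most frequent zero-result queries is the most actionable output: each entry is either a missing keyword or a missing article.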

Content Health Metrics

Metadata coverage: Sections with/without proper metadata
Keyword freshness: When keywords were last reviewed
Summary accuracy: How well summaries match actual content
Cross-reference validity: Broken links between related content

Usage Analytics

Most searched keywords: Helps optimize metadata
Common query patterns: Reveals how users phrase questions
Bounce rate by section: Indicates content quality issues

Implementation Best Practices

Start with Content Audit

Identify content types → assign to appropriate base
Mark non-indexable content → add NO_INDEX tags
Define logical boundaries → place chunk separators
Extract key concepts → build initial keyword lists

Metadata Creation Process

Read the chunk → understand core information
List search terms → how would users look for this?
Add translations → English + native language
Include synonyms → technical and colloquial terms
Write summary → specific, not generic

Quality Control Checklist

✅ Every chunk has RAG_META
✅ Keywords match actual content depth
✅ Summaries are specific and descriptive
✅ NO_INDEX tags on navigation/fluff
✅ Chunk separators at logical breaks
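Parts of this checklist can be automated with a small lint script. The sketch below checks only the metadata items, and its rules are assumptions: the chunk dictionary shape, the `GENERIC_SUMMARIES` list, and the five-word threshold for a "specific" summary are illustrative heuristics, not rules taken from the article.

```python
GENERIC_SUMMARIES = {"overview", "introduction", "general information"}
MIN_SUMMARY_WORDS = 5  # assumed threshold for a "specific" summary


def check_chunk(chunk):
    """Return a list of checklist problems for one chunk (empty if it passes)."""
    problems = []
    if not chunk.get("keywords"):
        problems.append("missing keywords")
    summary = chunk.get("summary", "").strip()
    if not summary:
        problems.append("missing summary")
    elif (summary.lower().rstrip(".") in GENERIC_SUMMARIES
          or len(summary.split()) < MIN_SUMMARY_WORDS):
        problems.append("summary too generic")
    return problems


def metadata_coverage(chunks):
    """Share of chunks that pass every check."""
    if not chunks:
        return 0.0
    passing = sum(1 for chunk in chunks if not check_chunk(chunk))
    return passing / len(chunks)


CHUNKS = [
    {"keywords": ["telegram", "bot token"],
     "summary": "Step-by-step setup of a Telegram bot for an agent."},
    {"keywords": [], "summary": "Overview."},
    {"keywords": ["pricing"], "summary": ""},
]
coverage = metadata_coverage(CHUNKS)
```

The same coverage figure can feed the metadata coverage metric, so the checklist and the dashboard stay in agreement.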

Results and Future Plans

Achieved Results

70% improvement in first-query success rate
90% reduction in content duplication
Real-time updates with every code deployment
Hybrid search outperforms pure semantic search by 40%

Upcoming Automation

Query pattern analyzer: Auto-suggests missing keywords from user searches
Semantic drift detector: Identifies when summaries no longer match content
Cross-base optimizer: Finds opportunities to link related content
A/B testing framework: Tests different keyword strategies automatically

Want to implement RAG-optimized knowledge bases? Start with proper metadata structure and focus on keywords that match how your users actually search. The automation can come later once you understand your content patterns.