We'll explain how we structured three different knowledge bases for one agent, implemented RAG-optimized documentation with metadata, and eliminated content duplication using... other agents.
The Challenge: One Agent, Three Sources
When we launched the support agent for Agentech, we faced an interesting challenge. The agent needed to answer completely different types of questions:
- "How do I set up Telegram integration?" → needs technical documentation
- "Show me an example for an online store" → needs practical case studies from the blog
- "What are your pricing plans and terms of service?" → needs legal documents
One agent, but three completely different information sources. Each with its own features and structure requirements.
Our Solution: Three Bases, Many Agents
Instead of trying to cram everything into one large database, we created three specialized ones. The main advantage of this approach is that each base can be reused across different agents.
For example, the "Legal Documents" base is connected not only to the main support agent but also to the partner agent and the CRM agent for managers. Everyone gets the same up-to-date information about pricing and terms.
The "Blog" base is used both in the technical support agent (for solution examples) and the sales agent (for demonstrating capabilities).
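The sharing described above can be pictured as a simple mapping from agents to bases. This is a minimal sketch, not a real configuration: the agent and base identifiers are hypothetical names, and only the relationships mentioned in this section are encoded (an agent may well use other bases too).

```python
# Hypothetical identifiers; only the agent-to-base links come from the text above.
AGENT_BASES: dict[str, list[str]] = {
    "support": ["documentation", "blog", "legal"],
    "partner": ["legal"],
    "crm": ["legal"],
    "sales": ["blog"],
}


def agents_using(base: str) -> list[str]:
    """Return the agents that read from a given base, sorted by name."""
    return sorted(agent for agent, bases in AGENT_BASES.items() if base in bases)


print(agents_using("legal"))  # ['crm', 'partner', 'support']
print(agents_using("blog"))   # ['sales', 'support']
```

Keeping the mapping in one place makes it easy to see which agents are affected when a base changes.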
Three Specialized Knowledge Bases
Blog Base
Purpose: Practical cases and usage examples
Content: Articles like "How to Create an Agent for an Online Store," real scenarios, ready-made prompts
Optimization: Rich keywords for diverse use cases and semantic search
Documentation Base
Purpose: User instructions and guides
Content: How to create an agent, set up integrations, work with the knowledge base
Optimization: Comprehensive metadata with technical synonyms and translations
Legal Documents Base
Purpose: Official information
Content: Terms of service, privacy policy, pricing, company details
Optimization: Precise legal terminology in keywords for accuracy
RAG Optimization with Metadata
Rather than relying on traditional chunking alone, we built a metadata-driven system to improve retrieval:
Metadata for Every Section:
- Each logical section has keywords including synonyms, translations, and domain variants
- Summaries provide context for reranking and understanding
- HTML chunk separators mark natural content boundaries
Example from our Blog base:
[HTML example not preserved]
This approach gives us precise control over what users find when they search.
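To make the idea concrete, here is a minimal sketch of reading section metadata before indexing. The comment syntax (a `RAG_META` HTML comment with `keywords` and `summary` attributes) is an assumed format for illustration only, as are `parse_meta` and `SectionMeta`.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Assumed format for illustration: <!-- RAG_META keywords="a, b" summary="..." -->
META_RE = re.compile(
    r'<!--\s*RAG_META\s+keywords="(?P<keywords>[^"]*)"\s+summary="(?P<summary>[^"]*)"\s*-->'
)


@dataclass
class SectionMeta:
    keywords: list[str]
    summary: str


def parse_meta(html: str) -> Optional[SectionMeta]:
    """Extract keywords and summary from the first RAG_META comment, if any."""
    match = META_RE.search(html)
    if match is None:
        return None
    keywords = [k.strip().lower() for k in match.group("keywords").split(",") if k.strip()]
    return SectionMeta(keywords=keywords, summary=match.group("summary"))


sample = (
    '<!-- RAG_META keywords="online store, e-commerce, shop" '
    'summary="Case study: building an agent for an online store" -->\n'
    "<h2>How to Create an Agent for an Online Store</h2>"
)
meta = parse_meta(sample)
print(meta.keywords)  # ['online store', 'e-commerce', 'shop']
```

Normalizing keywords to lowercase at parse time keeps later matching case-insensitive.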
How Search Works Across Three Bases
When a user asks a question, here's what happens:
- Query analysis: The system extracts key concepts and intent
- Parallel search: All three bases are searched simultaneously using both semantic and keyword matching
- Metadata boost: Results with matching keywords get a ranking boost
- Smart combination: The agent synthesizes information from the different sources
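The parallel-search step can be sketched with a thread pool. This is a toy illustration under stated assumptions: the bases are in-memory lists, `search_base` does plain word matching instead of real semantic search, and all names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy in-memory stand-ins for the three bases (contents are illustrative).
BASES = {
    "blog": ["How to create an agent for an online store"],
    "documentation": ["How to set up Telegram integration"],
    "legal": ["Pricing plans and terms of service"],
}


def search_base(name: str, query: str) -> list[tuple[str, str]]:
    """Return (base, document) pairs whose text shares a word with the query."""
    terms = set(query.lower().split())
    return [(name, doc) for doc in BASES[name] if terms & set(doc.lower().split())]


def search_all(query: str) -> list[tuple[str, str]]:
    """Query every base concurrently and merge the hits in base order."""
    with ThreadPoolExecutor(max_workers=len(BASES)) as pool:
        per_base = pool.map(lambda name: search_base(name, query), BASES)
    return [hit for hits in per_base for hit in hits]


print(search_all("telegram pricing"))
```

With real backends, each `search_base` call would be a network request, which is where running them concurrently pays off.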
Example in action:
Question: "Can I use your platform for an online store and how much does it cost?"
Search process:
- Keywords "online store" match Blog metadata → finds e-commerce case study
- Keywords "cost", "pricing" match Legal Documents → finds pricing page
- The agent combines both into a comprehensive answer
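The metadata boost in this example can be sketched as a small scoring function. The boost weight, the `Section` type, and the semantic scores are all hypothetical placeholders; a real system would take the semantic score from an embedding search.

```python
from dataclasses import dataclass


@dataclass
class Section:
    base: str
    title: str
    keywords: set[str]
    semantic_score: float  # placeholder for an embedding similarity in [0, 1]


KEYWORD_BOOST = 0.2  # hypothetical weight per matching keyword


def boosted_score(section: Section, query: str) -> float:
    """Add a fixed boost for every metadata keyword that appears in the query."""
    text = query.lower()
    matches = sum(1 for kw in section.keywords if kw in text)
    return section.semantic_score + KEYWORD_BOOST * matches


query = "Can I use your platform for an online store and how much does it cost?"
sections = [
    Section("blog", "E-commerce case study", {"online store", "e-commerce"}, 0.55),
    Section("legal", "Pricing", {"cost", "pricing"}, 0.50),
    Section("documentation", "Telegram integration", {"telegram", "bot"}, 0.52),
]
ranked = sorted(sections, key=lambda s: boosted_score(s, query), reverse=True)
print([s.base for s in ranked])  # ['blog', 'legal', 'documentation']
```

Without the boost, the documentation section would outrank the pricing page on semantic score alone; the keyword match on "cost" is what lifts the legal result.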
Automation: Agents Watching Agents
The main problem with any knowledge base is that it becomes outdated. We solved this problem radically: we put agents in charge of maintaining order.
Documentation Agent: Monitors Code and Updates Metadata
How it works:
- Analyzes code changes during deployment
- Identifies affected documentation sections
- Updates content AND metadata keywords
- Adjusts summaries to reflect new functionality
- Creates drafts for team approval
Example: We added OAuth support. The agent:
- Finds all authentication-related sections
- Adds "OAuth", "OAuth2", "авторизация OAuth" to keywords
- Updates summaries to mention OAuth capability
- Rewrites affected instructions
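The keyword-update step of this example can be sketched as follows. This covers only the metadata part, under simplifying assumptions: sections are matched by a literal topic keyword rather than by code analysis, and `DocSection` and `draft_keyword_update` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class DocSection:
    title: str
    keywords: list[str]
    summary: str
    status: str = "published"


def draft_keyword_update(
    sections: list[DocSection], topic: str, new_keywords: list[str]
) -> list[DocSection]:
    """Return draft copies of sections tagged with `topic`, with new keywords appended."""
    drafts = []
    for section in sections:
        if topic not in section.keywords:
            continue
        merged = section.keywords + [k for k in new_keywords if k not in section.keywords]
        drafts.append(DocSection(section.title, merged, section.summary, status="draft"))
    return drafts


sections = [
    DocSection("Login and authentication", ["authentication", "login"], "How users sign in"),
    DocSection("Telegram integration", ["telegram", "bot"], "Connecting a Telegram bot"),
]
drafts = draft_keyword_update(sections, "authentication", ["OAuth", "OAuth2", "авторизация OAuth"])
print(drafts[0].keywords)
```

Returning drafts instead of editing sections in place mirrors the approval step: nothing is published until the team reviews it.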
Blog Agent: Prevents Duplication
How it works:
- Analyzes new article drafts
- Compares keywords and summaries with existing content
- Identifies overlap and gaps
- Suggests unique angles to explore
Example: For a draft about "HR agent setup," the agent reports:
- Existing article covers basic HR automation
- Keywords overlap: "HR", "recruitment", "onboarding"
- Missing coverage: "performance reviews", "time tracking integration"
- Suggestion: Focus on advanced HR workflows not yet documented
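The overlap and gap report above can be approximated with plain set operations on keywords. This is a minimal sketch: `compare_keywords` is a hypothetical helper, and the Jaccard score is one possible overlap measure, not necessarily the one the agent uses.

```python
def compare_keywords(draft: set[str], existing: set[str]) -> dict[str, object]:
    """Report shared keywords, keywords only the draft covers, and Jaccard overlap."""
    overlap = draft & existing
    gaps = draft - existing
    union = draft | existing
    jaccard = len(overlap) / len(union) if union else 0.0
    return {"overlap": sorted(overlap), "gaps": sorted(gaps), "jaccard": jaccard}


existing = {"hr", "recruitment", "onboarding"}
draft = {"hr", "recruitment", "onboarding", "performance reviews", "time tracking integration"}
report = compare_keywords(draft, existing)
print(report["overlap"])  # ['hr', 'onboarding', 'recruitment']
print(report["gaps"])     # ['performance reviews', 'time tracking integration']
```

A high overlap score with few gaps suggests the draft duplicates an existing article; the gaps point to the angle worth writing about.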
Content Structure for AI Understanding
Documentation written for AI agents needs more explicit semantic structure than documentation written only for human readers:
Semantic Headers with Context
```markdown
# How to Connect Telegram Bot to Agent

**Goal:** Enable agent-user communication via Telegram
**Prerequisites:** Active agent, Telegram bot token
**Result:** Fully functional Telegram bot connected to your agent
```
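A header written this way is easy to check mechanically. The sketch below pulls the bold `Goal`, `Prerequisites`, and `Result` fields out of a section so a script can confirm they are present; the regex and the `parse_header_fields` helper are assumptions for illustration.

```python
import re

# Matches lines of the form: **Name:** value
FIELD_RE = re.compile(r"^\*\*(?P<name>[^:*]+):\*\*\s*(?P<value>.+)$", re.MULTILINE)


def parse_header_fields(markdown: str) -> dict[str, str]:
    """Map each bold field name (lowercased) to its value."""
    return {
        m.group("name").strip().lower(): m.group("value").strip()
        for m in FIELD_RE.finditer(markdown)
    }


doc = """# How to Connect Telegram Bot to Agent
**Goal:** Enable agent-user communication via Telegram
**Prerequisites:** Active agent, Telegram bot token
**Result:** Fully functional Telegram bot connected to your agent
"""
fields = parse_header_fields(doc)
print(sorted(fields))  # ['goal', 'prerequisites', 'result']
```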
NO_INDEX for Non-Essential Content
Content like the following is wrapped in NO_INDEX tags and excluded from the index:

```html
Welcome to our integration guide! In this article, we'll walk through...
[Table of Contents]
[Navigation links]
```
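A preprocessing step along these lines could strip excluded regions before indexing. The marker syntax here (`<!-- NO_INDEX_START -->`, `<!-- NO_INDEX_END -->`, `<!-- CHUNK -->`) is assumed for illustration and may differ from the actual tags; `indexable_chunks` is a hypothetical helper.

```python
import re

# Assumed marker syntax; the real tags may differ.
NO_INDEX_RE = re.compile(
    r"<!--\s*NO_INDEX_START\s*-->.*?<!--\s*NO_INDEX_END\s*-->", re.DOTALL
)
CHUNK_SEPARATOR = "<!-- CHUNK -->"


def indexable_chunks(html: str) -> list[str]:
    """Drop NO_INDEX regions, then split the rest at chunk separators."""
    cleaned = NO_INDEX_RE.sub("", html)
    return [part.strip() for part in cleaned.split(CHUNK_SEPARATOR) if part.strip()]


page = """<!-- NO_INDEX_START -->
Welcome to our integration guide! [Table of Contents]
<!-- NO_INDEX_END -->
<h2>Create a bot token</h2>
<!-- CHUNK -->
<h2>Connect the bot to your agent</h2>
"""
chunks = indexable_chunks(page)
print(chunks)
```

Removing the excluded regions first means a separator inside navigation or filler never produces an empty or meaningless chunk.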
Metrics We Track
Search Quality Metrics
- Keyword hit rate: How often keyword matches improve results
- Semantic match accuracy: Pure embedding search vs. hybrid search
- Result relevance: User feedback on answer quality
- Zero-result queries: What users search but don't find
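Two of these metrics can be computed directly from a query log. The sketch below assumes a hypothetical log record (`QueryLog`) and simplifies the keyword hit rate to the share of queries with at least one metadata keyword match, which is cruder than measuring whether the match actually improved the results.

```python
from dataclasses import dataclass


@dataclass
class QueryLog:
    query: str
    result_count: int
    keyword_matched: bool


def search_quality(logs: list[QueryLog]) -> dict[str, object]:
    """Summarize zero-result queries and the share of queries with a keyword match."""
    total = len(logs)
    zero = [log.query for log in logs if log.result_count == 0]
    hits = sum(1 for log in logs if log.keyword_matched)
    return {
        "zero_result_queries": zero,
        "zero_result_rate": len(zero) / total if total else 0.0,
        "keyword_hit_rate": hits / total if total else 0.0,
    }


logs = [
    QueryLog("telegram setup", 3, True),
    QueryLog("pricing", 2, True),
    QueryLog("whatsapp voice calls", 0, False),
    QueryLog("how does it work", 4, False),
]
stats = search_quality(logs)
print(stats["zero_result_queries"])  # ['whatsapp voice calls']
```

The list of zero-result queries is the most actionable output: each entry is either a missing keyword or missing content.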
Content Health Metrics
- Metadata coverage: Sections with/without proper metadata
- Keyword freshness: When keywords were last reviewed
- Summary accuracy: How well summaries match actual content
- Cross-reference validity: Broken links between related content
Usage Analytics
- Most searched keywords: Helps optimize metadata
- Common query patterns: Reveals how users phrase questions
- Bounce rate by section: Indicates content quality issues
Implementation Best Practices
Start with Content Audit
- Identify content types → assign to appropriate base
- Mark non-indexable content → add NO_INDEX tags
- Define logical boundaries → place chunk separators
- Extract key concepts → build initial keyword lists
Metadata Creation Process
- Read the chunk → understand core information
- List search terms → how would users look for this?
- Add translations → English + native language
- Include synonyms → technical and colloquial terms
- Write summary → specific, not generic
Quality Control Checklist
- ✅ Every chunk has RAG_META
- ✅ Keywords match actual content depth
- ✅ Summaries are specific and descriptive
- ✅ NO_INDEX tags on navigation/fluff
- ✅ Chunk separators at logical breaks
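Parts of this checklist can be automated. The sketch below covers only the first and third items, under loud assumptions: chunks are plain dictionaries, and a word-count threshold stands in for "specific and descriptive", which a real check would judge more carefully.

```python
MIN_SUMMARY_WORDS = 6  # hypothetical threshold; a rough proxy for "specific"


def checklist_issues(chunk: dict) -> list[str]:
    """Return human-readable problems for one chunk; an empty list means it passes."""
    issues = []
    keywords = chunk.get("keywords", [])
    summary = chunk.get("summary", "")
    if not keywords or not summary:
        issues.append("missing RAG_META (keywords or summary)")
    if summary and len(summary.split()) < MIN_SUMMARY_WORDS:
        issues.append("summary may be too generic")
    return issues


good = {
    "keywords": ["telegram", "bot"],
    "summary": "Step-by-step guide to connecting a Telegram bot to an agent",
}
bad = {"keywords": [], "summary": "Overview"}
print(checklist_issues(good))  # []
print(checklist_issues(bad))
```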
Results and Future Plans
Achieved Results
- 70% improvement in first-query success rate
- 90% reduction in content duplication
- Real-time updates with every code deployment
- Hybrid search outperforms pure semantic search by 40%
Upcoming Automation
- Query pattern analyzer: Auto-suggests missing keywords from user searches
- Semantic drift detector: Identifies when summaries no longer match content
- Cross-base optimizer: Finds opportunities to link related content
- A/B testing framework: Tests different keyword strategies automatically
Want to implement RAG-optimized knowledge bases? Start with proper metadata structure and focus on keywords that match how your users actually search. The automation can come later once you understand your content patterns.