Day 28: RAG Patterns for AI Agents - Retrieval-Augmented Generation for Agent Context
After yesterday's security deep-dive, let's explore a critical pattern for scaling agent knowledge: Retrieval-Augmented Generation (RAG).
Today: Technical deep-dive into RAG architectures that enable agents to access, retrieve, and reason over both internal and external knowledge sources dynamically.
Why Agents Need RAG
AI agents operating autonomously face a fundamental challenge: limited context windows combined with dynamic knowledge requirements.
The Problem Space
| Without RAG | With RAG |
|---|---|
| Static context (prompt + history only) | Dynamic knowledge retrieval |
| Fixed knowledge in system prompt | Query-based knowledge access |
| Manual context updates needed | Automatic retrieval on-demand |
| Limited by token budget | Knowledge bounded by the index, not the prompt |
| Cannot access external sources | Real-time data source integration |
Impact: RAG keeps the prompt small while letting the agent pull in current, source-backed knowledge only when a query calls for it.
Core RAG Architecture for Agents
Pattern 1: Hybrid Query Engine
Combines vector search (semantic similarity) with keyword search (precision matching):
```typescript
class HybridAgentRetriever {
  private vectorIndex: VectorIndex;
  private keywordIndex: KeywordIndex;

  async retrieve(query: string, agentContext: Context): Promise<RetrievedChunks> {
    // Separate query into semantic and keyword components
    const semanticQuery = this.extractSemanticIntent(query);
    const keywordTerms = this.extractKeywords(query);
// Parallel retrieval from both indexes
const [vectorResults, keywordResults] = await Promise.all([
this.vectorIndex.search(semanticQuery, {
topK: 10,
filter: { agentId: agentContext.agentId }
}),
this.keywordIndex.search(keywordTerms, {
topK: 5,
boost: keywordTerms.map(k => ({ term: k, weight: 1.5 }))
})
]);
// Merge and deduplicate the two result sets, then rerank with a cross-encoder
const combined = this.mergeResults([vectorResults, keywordResults]);
const reranked = await this.rerankWithCrossEncoder(combined, query);
// Apply retrieval confidence threshold (filter once, reuse for the count)
const chunks = reranked.filter(r => r.score > 0.65);
return {
chunks,
metadata: {
totalCandidates: vectorResults.length + keywordResults.length,
finalCount: chunks.length,
retrievalTimeMs: Date.now() - agentContext.startTime
}
};
}
  async rerankWithCrossEncoder(
    candidates: RankedChunk[],
    query: string
  ): Promise<RankedChunk[]> {
    // Use a small cross-encoder for precise re-ranking
    const crossEncoderResults = await this.crossEncoder.rank(query, candidates);

    // Blend with original scores (weighted average)
    return candidates.map(chunk => {
      // ?? keeps a legitimate encoder score of 0; only a missing score falls back to 0.5
      const encoderScore = crossEncoderResults[chunk.id] ?? 0.5;
      const blended = (chunk.score * 0.7) + (encoderScore * 0.3);
      return {
        ...chunk,
        score: blended,
        reranked: true
      };
    }).sort((a, b) => b.score - a.score);
  }
}

interface HybridQuery {
  semanticIntent: string;
  keywordTerms: string[];
  requiredFields?: string[];
  excludedFields?: string[];
}
```
Key benefits: Vector search improves recall on paraphrased or conceptual queries, while keyword search improves precision on exact terms such as identifiers and error codes.
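The retriever above calls `mergeResults` without showing it. One common way to combine a vector ranking with a keyword ranking is Reciprocal Rank Fusion (RRF), which uses only each item's position in its list and so sidesteps the fact that cosine similarities and keyword scores live on different scales. The sketch below is an assumption about how such a merge could work, not the implementation behind this post; `RankedItem`, `reciprocalRankFusion`, and the constant `k = 60` are illustrative names and values.

```typescript
// Hypothetical minimal chunk shape; the post's RankedChunk type is not shown in full.
interface RankedItem {
  id: string;
  score: number;
}

/**
 * Reciprocal Rank Fusion: every list contributes 1 / (k + rank) for each item,
 * where rank is the item's 1-based position in that list. Items that appear in
 * several lists accumulate contributions, which also deduplicates them.
 */
function reciprocalRankFusion(lists: RankedItem[][], k = 60): RankedItem[] {
  const fused = new Map<string, number>();
  for (const list of lists) {
    list.forEach((item, index) => {
      const contribution = 1 / (k + index + 1);
      fused.set(item.id, (fused.get(item.id) ?? 0) + contribution);
    });
  }
  const merged: RankedItem[] = [];
  fused.forEach((score, id) => merged.push({ id, score }));
  return merged.sort((a, b) => b.score - a.score);
}

// Example: 'b' appears in both lists, so it rises to the top.
const vectorHits: RankedItem[] = [
  { id: 'a', score: 0.91 },
  { id: 'b', score: 0.84 },
  { id: 'c', score: 0.8 }
];
const keywordHits: RankedItem[] = [
  { id: 'b', score: 12.3 },
  { id: 'd', score: 9.1 }
];
const fusedRanking = reciprocalRankFusion([vectorHits, keywordHits]);
console.log(fusedRanking.map(item => item.id)); // [ 'b', 'a', 'd', 'c' ]
```

Note that RRF scores are small fractions (around 0.016 to 0.033 here), so a fixed cutoff like the 0.65 threshold used in the retriever would have to be applied to the reranker's score rather than to the fused score.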
Knowledge Source Integration
Pattern 2: Multi-Source Knowledge Retrieval
Agents should query multiple knowledge sources seamlessly:
```typescript
interface KnowledgeSource {
  id: string;
  type: 'internal-docs' | 'knowledge-base' | 'external-api' | 'vector-db' | 'file-system';
  accessibility: 'public' | 'authenticated' | 'restricted';
  updateFrequency: 'realtime' | 'hourly' | 'daily' | 'weekly';

  query: (
    queryText: string,
    filters?: QueryFilters
  ) => Promise<KnowledgeResult[]>;

  validateAuth: (credentials: Credentials) => Promise<boolean>;
}

class AgentKnowledgeOrchestrator {
  private sources: Map<string, KnowledgeSource> = new Map();

  async initializeSources(): Promise<void> {
    // Initialize each knowledge source with proper configuration
    this.sources.set('internal-docs', new DocumentationSource({
      basePath: '/docs',
      format: 'markdown',
      updateInterval: 'daily'
    }));
this.sources.set('knowledge-base', new VectorKnowledgeSource({
indexId: 'agent-knowledge',
embeddingModel: 'text-embedding-3-small',
updateInterval: 'hourly'
}));
this.sources.set('external-api', new ExternalAPIConnector({
apiEndpoint: 'https://api.example.com/v1',
authMethod: 'bearer',
rateLimit: 100
}));
}
  async retrieveAcrossSources(
    query: string,
    requiredSources?: string[],
    excludedSources?: string[]
  ): Promise<UnifiedKnowledgeResult> {
// Filter sources by accessibility and query constraints
const availableSources = Array.from(this.sources.values())
.filter(source =>
!excludedSources?.includes(source.id) &&
(source.accessibility !== 'restricted' || this.hasAccess(source.id))
)
.filter(source => !requiredSources || requiredSources.includes(source.id));
// Execute in parallel, with a 10-second timeout per source
const resultPromises = availableSources.map(async source => {
let timer: ReturnType<typeof setTimeout> | undefined;
try {
const start = Date.now();
const results = await Promise.race([
source.query(query, this.buildPerSourceFilters(source)),
new Promise<never>((_, reject) => {
timer = setTimeout(() => reject(new Error('Timeout')), 10000);
})
]);
return {
sourceId: source.id,
sources: results,
retrievalTime: Date.now() - start,
status: 'success'
};
} catch (error: any) {
console.warn(`Retrieval from ${source.id} failed: ${error.message}`);
return {
sourceId: source.id,
sources: [],
retrievalTime: 0,
status: 'failed',
error: error.message
};
} finally {
// Clear the timer so a fast source does not leave a pending timeout behind
clearTimeout(timer);
}
});
const sourceResults = await Promise.all(resultPromises);
const successfulResults = sourceResults.filter(r => r.status === 'success');
return {
timestamp: new Date().toISOString(),
totalSources: availableSources.length,
successfulRetrievals: successfulResults.length,
failedRetrievals: sourceResults.length - successfulResults.length,
results: sourceResults,
// Guard against dividing by zero when every source failed
averageRetrievalTime: successfulResults.length > 0
? successfulResults.reduce((sum, r) => sum + r.retrievalTime, 0) / successfulResults.length
: 0,
recommendations: this.buildQueryRecommendations(successfulResults)
};
}
  private buildPerSourceFilters(source: KnowledgeSource): QueryFilters {
    // Apply source-specific query filters
    switch (source.type) {
      case 'vector-db':
        return { minSimilarity: 0.7, maxResults: 10, hybridBoost: true };
      case 'knowledge-base':
        return { categories: ['technical', 'api-reference'], includeVersion: true };
      default:
        return {};
    }
  }

  private buildQueryRecommendations(results: SourceResult[]): QueryRecommendation[] {
    return results.flatMap(result => {
      if (result.sources.length < 3) {
        return [{
          type: 'increase_coverage',
          sourceId: result.sourceId,
          message: 'Low result count from this source. Consider query refinement or source expansion.',
          confidence: 'medium'
        }];
      }
      return [];
    });
  }
}
```
Key insight: Different sources serve different retrieval needs. Agents benefit from understanding source strength and context-appropriateness.
Context Window Optimization
Pattern 3: Hierarchical Context Management
Optimize context usage with tiered information based on relevance:
```typescript
class HierarchicalContextManager {
  private contextLayers: Map<string, ContextLayer> = new Map();

  buildOptimizedContext(
    userQuery: string,
    existingContext: ConversationContext,
    retrievedKnowledge: RetrievedChunk[]
  ): OptimizedContext {
    // Assess knowledge relevance and criticality
    const knowledgeScores = retrievedKnowledge.map(chunk => ({
      ...chunk,
      // Pass the whole chunk: calculateRelevance looks up its embedding by chunk.id
      relevance: this.calculateRelevance(chunk, userQuery),
      criticality: this.assessCriticality(chunk)
    }));
// Sort by combined score (relevance × criticality)
const sortedKnowledge = knowledgeScores.sort((a, b) => {
const scoreA = a.relevance * a.criticality;
const scoreB = b.relevance * b.criticality;
return scoreB - scoreA;
});
// Partition into context layers
const contextLayers = {
immediate: sortedKnowledge.slice(0, 3), // Most critical, immediate relevance
supporting: sortedKnowledge.slice(3, 8), // Contextual support
background: sortedKnowledge.slice(8, 15), // Additional context, less urgent
optional: sortedKnowledge.slice(15) // Extra information if space permits
};
// Build compact context with compression where possible
const immediate = this.compressChunks(contextLayers.immediate);
const supporting = this.compressChunks(contextLayers.supporting);
const compressed = [...immediate, ...supporting];
const context: OptimizedContext = {
messages: existingContext.messages.slice(-5), // Keep recent conversation
retrieved: {
immediate,
supporting,
detailedReferences: contextLayers.background.map(k => ({
id: k.id,
excerpt: k.excerpt,
fullReference: true
}))
},
usage: {
maxTokens: existingContext.maxTokens,
estimatedTokens: 0, // filled in below, once the object exists
compressionRatio: this.calculateCompressionRatio(compressed)
},
lastOptimization: new Date().toISOString()
};
// An object literal cannot reference itself, so estimate tokens after construction
context.usage.estimatedTokens = this.estimateTokenCount(context);
return context;
}
}
  private calculateRelevance(chunk: RetrievedChunk, query: string): number {
    // Vector cosine similarity for semantic matching
    const queryEmbedding = this.embed(query);
    const chunkEmbedding = this.chunkEmbeddings.get(chunk.id);
if (!chunkEmbedding) return 0.3;
return this.cosineSimilarity(queryEmbedding, chunkEmbedding);
}
  private assessCriticality(chunk: RetrievedChunk): number {
    // Score based on chunk metadata and content characteristics.
    // Every factor is a multiplier >= 1.0, so a missing signal never zeroes the score.
    const criticalityFactors = {
      hasCodeSnippets: chunk.content.includes('```') ? 1.5 : 1.0,
      hasKeywords: 1 + this.countCriticalKeywords(chunk.content) * 0.1,
      isTechnical: this.isTechnicalContent(chunk.content) ? 1.3 : 1.0,
      freshness: chunk.lastUpdated > new Date(Date.now() - 86400000) ? 1.2 : 1.0 // updated in last 24h
    };
const score = Object.values(criticalityFactors).reduce((a, b) => a * b, 1);
return Math.min(score, 2.0); // Cap at 2.0
}
  private compressChunks(chunks: RetrievedChunk[]): CompressedChunk[] {
    return chunks.map(chunk => {
      // Summarize each chunk (150 is the length limit passed to the summarizer)
      const summary = this.generateConciseSummary(chunk.content, 150);
      return {
        id: chunk.id,
        type: chunk.type,
        summary,
        hasFullContent: false,
        tokenSavings: this.countTokens(chunk.content) - this.countTokens(summary)
      };
    });
  }
}
```
Why this matters: Maximizes context window utility while maintaining comprehensive knowledge access.
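The manager above partitions chunks by position (top 3, next 5, and so on), which bounds the number of chunks but not the number of tokens. As a complement, here is a minimal sketch of token-aware packing: take chunks in score order and keep each one only if it still fits the budget. `ScoredChunk` and `packWithinBudget` are hypothetical names, and the token counts are assumed to be precomputed by whatever tokenizer the agent uses.

```typescript
// Hypothetical shape: a chunk with a precomputed token count and combined score.
interface ScoredChunk {
  id: string;
  tokens: number;
  score: number; // e.g. relevance × criticality
}

interface PackedContext {
  selected: ScoredChunk[];
  usedTokens: number;
}

/**
 * Greedy packing: walk chunks from highest to lowest score and keep each one
 * that still fits. A chunk that is too large is skipped rather than ending
 * the loop, so smaller lower-ranked chunks can still use the remaining space.
 */
function packWithinBudget(chunks: ScoredChunk[], maxTokens: number): PackedContext {
  const byScore = [...chunks].sort((a, b) => b.score - a.score);
  const selected: ScoredChunk[] = [];
  let usedTokens = 0;
  for (const chunk of byScore) {
    if (usedTokens + chunk.tokens > maxTokens) continue;
    selected.push(chunk);
    usedTokens += chunk.tokens;
  }
  return { selected, usedTokens };
}

const candidateChunks: ScoredChunk[] = [
  { id: 'a', tokens: 1200, score: 1.8 },
  { id: 'b', tokens: 2500, score: 1.5 },
  { id: 'c', tokens: 900, score: 1.1 },
  { id: 'd', tokens: 600, score: 0.4 }
];
const packed = packWithinBudget(candidateChunks, 3000);
console.log(packed.selected.map(c => c.id), packed.usedTokens); // [ 'a', 'c', 'd' ] 2700
```

Greedy packing is not optimal in the knapsack sense, but it is predictable and cheap, which tends to matter more on the hot path of an agent loop.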
Query Optimization Strategies
Pattern 4: Query Decomposition and Routing
Complex queries can be broken into component parts, each routed to optimal sources:
```typescript
class QueryRouter {
  async decomposeAndRoute(query: string): Promise<RoutingPlan> {
    const queryComponents = await this.analyzeQuerySemantics(query);
const routingPlan: RoutingPlan = {
originalQuery: query,
components: [],
expectedSources: new Set<string>(),
estimatedTokens: 0
};
for (const component of queryComponents) {
const optimalSource = this.findOptimalSource(component);
routingPlan.components.push({
subQuery: component.text,
optimalSource: optimalSource.id,
required: component.isRequired,
confidence: optimalSource.confidence,
fallbackSources: optimalSource.fallbackSources
});
routingPlan.expectedSources.add(optimalSource.id);
routingPlan.estimatedTokens += component.estimatedTokens;
}
// Check if we can optimize further by combining related components
const optimized = this.mergeRelatedComponents(routingPlan.components);
routingPlan.components = optimized;
return routingPlan;
}
  private findOptimalSource(component: QueryComponent): SourceMatch {
    const candidates = Array.from(this.knowledgeSources.values()).filter(source =>
      source.canHandle(component.type)
    );
if (candidates.length === 0) {
return {
id: 'fallback',
confidence: 0.3,
fallbackSources: []
};
}
// Score candidates by capability match
const scored = candidates.map(source => ({
source,
capabilityMatch: this.matchCapabilities(source, component),
freshness: this.assessSourceFreshness(source),
accessCost: this.assessAccessCost(source)
}));
const best = scored
.sort((a, b) => b.capabilityMatch + b.freshness - a.capabilityMatch - a.freshness)
.shift();
return {
id: best?.source.id || 'fallback',
confidence: best?.capabilityMatch || 0.3,
fallbackSources: candidates.filter(c => c.id !== best?.source.id).map(s => s.id)
};
}
  async executeRoutingPlan(plan: RoutingPlan): Promise<SubQueryResult[]> {
    // Execute component queries either sequentially or in parallel.
    // Both branches return the same shape: one SubQueryResult per component.
    const executionStrategy = this.determineExecutionStrategy(plan);

    if (executionStrategy === 'parallel') {
      const parallelResults = await Promise.allSettled(
        plan.components.map(component =>
          this.executeComponent(component, component.optimalSource)
        )
      );
      return parallelResults.map((result, idx) => ({
        component: plan.components[idx],
        success: result.status === 'fulfilled',
        data: result.status === 'fulfilled' ? result.value : null,
        error: result.status === 'rejected' ? result.reason : undefined
      }));
    }

    // Sequential execution for dependent queries
    const results: SubQueryResult[] = [];
    for (const component of plan.components) {
      try {
        const data = await this.executeComponent(component, component.optimalSource);
        results.push({ component, success: true, data });
        // Check if previous result affects current query
        if (data.requiresAdjustment) {
          this.adjustCurrentQuery(data, component);
        }
      } catch (error) {
        // Record the failure and continue, mirroring the allSettled branch
        results.push({ component, success: false, data: null, error });
      }
    }
    return results;
  }
}

interface QueryComponent {
  id: string;
  text: string;
  type: 'factual' | 'procedural' | 'comparative' | 'creative';
  isRequired: boolean;
  estimatedTokens: number;
  sourceDependencies?: string[];
}

interface RoutingPlan {
  originalQuery: string;
  components: Array<{
    subQuery: string;
    optimalSource: string;
    required: boolean;
    confidence: number;
    fallbackSources: string[];
  }>;
  expectedSources: Set<string>;
  estimatedTokens: number;
}
```
Benefit: Complex multi-faceted queries handled efficiently with proper source routing.
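The router calls `mergeRelatedComponents` without defining what "related" means. One plausible reading, sketched here purely as an assumption, is to batch sub-queries that target the same source so each source is contacted once per plan. `PlannedComponent`, `SourceBatch`, and `batchBySource` are illustrative names that do not appear in the router above.

```typescript
// Hypothetical, trimmed-down version of a RoutingPlan component.
interface PlannedComponent {
  subQuery: string;
  optimalSource: string;
  required: boolean;
}

interface SourceBatch {
  sourceId: string;
  subQueries: string[];
  required: boolean; // true if any sub-query in the batch is required
}

/** Groups components by their target source, preserving first-seen source order. */
function batchBySource(components: PlannedComponent[]): SourceBatch[] {
  const batches = new Map<string, SourceBatch>();
  for (const component of components) {
    const batch: SourceBatch = batches.get(component.optimalSource) ?? {
      sourceId: component.optimalSource,
      subQueries: [],
      required: false
    };
    batch.subQueries.push(component.subQuery);
    batch.required = batch.required || component.required;
    batches.set(component.optimalSource, batch);
  }
  const ordered: SourceBatch[] = [];
  batches.forEach(batch => ordered.push(batch));
  return ordered;
}

const plannedComponents: PlannedComponent[] = [
  { subQuery: 'What is the rate limit?', optimalSource: 'internal-docs', required: true },
  { subQuery: 'Current API status', optimalSource: 'external-api', required: false },
  { subQuery: 'How do I request a higher limit?', optimalSource: 'internal-docs', required: false }
];
console.log(batchBySource(plannedComponents));
// Two batches: 'internal-docs' with two sub-queries (required), 'external-api' with one
```

Batching by source trades a little routing precision for fewer round trips, which matters most for rate-limited external APIs.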
Evaluation and Monitoring
Pattern 5: RAG System Evaluation Framework
Continuously monitor RAG performance:
```typescript
class RAGEvaluationSystem {
  private metrics: Map<string, MetricCollection> = new Map();

  evaluateRetrievalResult(
    query: string,
    retrieved: RetrievedChunk[],
    // Not marked optional with `?`: TypeScript rejects a required parameter after an optional one
    expectedContext: string[] | undefined,
    retrievalTime: number
  ): EvaluationResult {
    return {
      timestamp: new Date().toISOString(),
      query,
      metrics: {
        precision: this.calculatePrecision(retrieved, expectedContext),
        recall: this.calculateRecall(retrieved, expectedContext),
        mrr: this.calculateMeanReciprocalRank(retrieved, expectedContext),
        nDCG: this.calculateNDCG(retrieved, expectedContext),
        retrievalTime: retrievalTime,
        contextQuality: this.assessContextQuality(retrieved)
      },
      chunks: retrieved.map(chunk => ({
        id: chunk.id,
        relevanceScore: chunk.relevance,
        tokenCount: this.countTokens(chunk.content),
        isNew: this.isNewSource(chunk)
      })),
      recommendations: this.generateImprovementRecommendations(retrieved, expectedContext)
    };
  }
  private calculatePrecision(
    retrieved: RetrievedChunk[],
    expectedContext?: string[]
  ): number {
    if (!expectedContext || retrieved.length === 0) return 0;
const retrievedIds = new Set(retrieved.map(c => c.id));
const expectedIds = new Set(expectedContext);
const intersection = new Set(
Array.from(retrievedIds).filter(id => expectedIds.has(id))
);
// Precision = relevant retrieved / total retrieved (arrays have no .size, so use the ID set)
return intersection.size / retrievedIds.size;
}
  private calculateRecall(
    retrieved: RetrievedChunk[],
    expectedContext?: string[]
  ): number {
    // An empty expected set would otherwise divide by zero below
    if (!expectedContext || expectedContext.length === 0 || retrieved.length === 0) return 0;
const retrievedIds = new Set(retrieved.map(c => c.id));
const expectedIds = new Set(expectedContext);
const intersection = new Set(
Array.from(retrievedIds).filter(id => expectedIds.has(id))
);
// Recall = relevant retrieved / total relevant
return intersection.size / expectedIds.size;
}
  async runEvaluationBatch(
    testQueries: Array<{
      query: string;
      expectedContext: string[];
      goldStandard: string;
    }>,
    numQueries: number = 100
  ): Promise<EvaluationReport> {
    const results: IndividualEvaluation[] = [];
for (let i = 0; i < Math.min(numQueries, testQueries.length); i++) {
const { query, expectedContext, goldStandard } = testQueries[i];
const startTime = Date.now();
const retrieved = await this.retriever.retrieve(query);
const retrievalTime = Date.now() - startTime;
const evaluation = this.evaluateRetrievalResult(
query,
retrieved,
expectedContext,
retrievalTime
);
results.push(evaluation);
}
return {
timestamp: new Date().toISOString(),
queryCount: results.length,
aggregateMetrics: this.aggregateMetrics(results),
individualResults: results,
recommendations: this.generateSystemImprovements(results),
performanceBaseline: {
averageRetrievalTime: results.reduce((sum, r) => sum + r.metrics.retrievalTime, 0) / results.length,
averagePrecision: results.reduce((sum, r) => sum + r.metrics.precision, 0) / results.length,
averageRecall: results.reduce((sum, r) => sum + r.metrics.recall, 0) / results.length
}
};
}
}
```
Continuous improvement: Regular evaluation cycles enable systematic RAG optimization.
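The evaluation class references `calculateMeanReciprocalRank` and `calculateNDCG` without showing them. The following is a minimal sketch of both metrics under a simplifying assumption: relevance is binary, meaning a chunk ID either is in the expected set or is not. The function names and the flat `string[]` inputs are illustrative, not the class's actual signatures.

```typescript
/** Reciprocal rank for one query: 1 / position of the first relevant result, or 0 if none. */
function reciprocalRank(retrievedIds: string[], relevantIds: Set<string>): number {
  const index = retrievedIds.findIndex(id => relevantIds.has(id));
  return index === -1 ? 0 : 1 / (index + 1);
}

/** MRR is the mean of the per-query reciprocal ranks. */
function meanReciprocalRank(
  runs: Array<{ retrievedIds: string[]; relevantIds: Set<string> }>
): number {
  if (runs.length === 0) return 0;
  const total = runs.reduce(
    (sum, run) => sum + reciprocalRank(run.retrievedIds, run.relevantIds),
    0
  );
  return total / runs.length;
}

/**
 * nDCG with binary gains: DCG sums 1 / log2(rank + 1) over relevant results,
 * and is normalized by the DCG of an ideal ordering (all relevant items first).
 */
function ndcg(retrievedIds: string[], relevantIds: Set<string>): number {
  const dcg = retrievedIds.reduce(
    (sum, id, i) => sum + (relevantIds.has(id) ? 1 / Math.log2(i + 2) : 0),
    0
  );
  const idealHits = Math.min(relevantIds.size, retrievedIds.length);
  let idealDcg = 0;
  for (let i = 0; i < idealHits; i++) {
    idealDcg += 1 / Math.log2(i + 2);
  }
  return idealDcg === 0 ? 0 : dcg / idealDcg;
}

const retrievedOrder = ['x', 'a', 'y', 'b'];
const relevantSet = new Set(['a', 'b']);
console.log(reciprocalRank(retrievedOrder, relevantSet)); // 0.5 (first hit at rank 2)
console.log(ndcg(retrievedOrder, relevantSet).toFixed(3)); // "0.651"
```

Precision and recall ignore ordering; MRR and nDCG reward putting relevant chunks near the top, which is what matters when only the first few chunks make it into the agent's context.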
Production Best Practices
Key Implementation Principles
- Latency Awareness: Cache common queries, implement result compression
- Fault Tolerance: Graceful degradation when sources unavailable
- Access Control: Ensure agents only retrieve authorized information
- Freshness Validation: Monitor and prefer up-to-date knowledge sources
- Cost Optimization: Balance query frequency with result quality
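To make the "cache common queries" principle concrete, here is a minimal sketch of an in-memory cache with a time-to-live. It is one possible shape under stated assumptions, not a recommendation of a specific library: `TtlCache`, `normalizeQuery`, and the injectable `now` clock are hypothetical names, and a production setup would likely also need size limits and invalidation when a source is re-indexed.

```typescript
/** Normalizes a query so trivially different phrasings share a cache entry. */
function normalizeQuery(query: string): string {
  return query.trim().toLowerCase().replace(/\s+/g, ' ');
}

/** Minimal TTL cache; `now` is injectable so expiry can be tested without waiting. */
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();

  constructor(
    private ttlMs: number,
    private now: () => number = () => Date.now()
  ) {}

  get(query: string): V | undefined {
    const key = normalizeQuery(query);
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (entry.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: evict lazily on read
      return undefined;
    }
    return entry.value;
  }

  set(query: string, value: V): void {
    this.entries.set(normalizeQuery(query), {
      value,
      expiresAt: this.now() + this.ttlMs
    });
  }
}

// Usage with a fake clock: a 15-minute TTL expressed in milliseconds.
let fakeNow = 0;
const retrievalCache = new TtlCache<string[]>(15 * 60 * 1000, () => fakeNow);
retrievalCache.set('  How do I rotate API keys? ', ['chunk-12', 'chunk-40']);
console.log(retrievalCache.get('how do i rotate api keys?')); // [ 'chunk-12', 'chunk-40' ]
fakeNow = 15 * 60 * 1000;
console.log(retrievalCache.get('how do i rotate api keys?')); // undefined (expired)
```

Caching trades freshness for latency, so the TTL should be no longer than the staleness the agent can tolerate for a given source.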
Configuration Guidelines
```typescript
interface RAGConfig {
  // Retrieval configuration
  retrieval: {
    topK: number;            // How many chunks to retrieve
    minSimilarity: number;   // Minimum relevance threshold
    hybridEnable: boolean;   // Enable hybrid search
    rerankEnable: boolean;   // Enable cross-encoder reranking
    maxTokenBudget: number;  // Max tokens for retrieved context
  };

  // Source configuration
  sources: {
    maxConcurrent: number;      // Max parallel source queries
    timeoutMs: number;          // Query timeout per source
    cacheTtlMinutes: number;    // Result cache duration
    freshnessThreshold: number; // Max age for preferred sources (ms)
  };

  // Context management
  context: {
    compressionEnabled: boolean;
    criticalityThreshold: number;
    maxImmediateSources: number;
    maxSupportingSources: number;
  };
}

const DEFAULT_CONFIG: RAGConfig = {
  retrieval: {
    topK: 15,
    minSimilarity: 0.65,
    hybridEnable: true,
    rerankEnable: true,
    maxTokenBudget: 4000
  },
  sources: {
    maxConcurrent: 5,
    timeoutMs: 10000,
    cacheTtlMinutes: 15,
    freshnessThreshold: 86400000 // 24 hours
  },
  context: {
    compressionEnabled: true,
    criticalityThreshold: 0.5,
    maxImmediateSources: 3,
    maxSupportingSources: 5
  }
};
```
Summary
RAG patterns enable AI agents to maintain efficient context management while accessing dynamic, comprehensive knowledge. Key takeaways:
- Hybrid search combines semantic and keyword matching for optimal retrieval
- Multi-source orchestration provides comprehensive knowledge access
- Hierarchical context maximizes limited context window utility
- Query routing optimizes component queries across appropriate sources
- Continuous evaluation drives systematic improvement
Next steps: Implement RAG evaluation metrics in production, optimize based on actual agent query patterns, continuously refine retrieval parameters.
That wraps up our technical deep-dive for Day 28! The consumer-facing post will follow later today with practical applications for anyone looking to use AI agents without diving into the technical details.
Key insight: RAG isn't just about retrieval—it's about context optimization for agents operating under resource constraints while maintaining comprehensive knowledge access.