Knowledge Retrieval: RAG for Enterprise Agents with Azure AI Search

Session Date: May 20, 2025
Duration: 1 hour
Venue: Build 2025 Conference - BRK141
Speaker: Pablo Castro (CVP & Distinguished Engineer, AI Platform, Microsoft)
Link: [Microsoft Build 2025 Session BRK141]

Executive Summary

This technical deep-dive session reveals the evolution from traditional RAG to purpose-built enterprise AI systems powered by Azure AI Search. Pablo Castro demonstrates how knowledge retrieval has transformed from simple vector search to sophisticated agentic retrieval systems capable of multi-hop reasoning, complex query decomposition, and enterprise-grade security. The session showcases three major innovation areas: agentic retrieval with 40% improvement in answer relevance, multi-modal document processing with AI-powered extraction, and native Entra ID-based access control for secure enterprise deployment.

Key Topics Covered

1. The Evolution Beyond Traditional RAG

The Historical Context: Two Years of RAG Progress

Pablo’s Opening Insight: “About two years ago, we started to talk about RAG… it was built out of the tools available in the room. There was this clever observation of in-context learning… we managed to put these things together.”

Traditional RAG Architecture:

In-context learning - Language models with concatenated retrieved content
Available tools approach - Using existing search stacks not designed for AI
Vector search adoption - Addressing vocabulary gaps and semantic understanding
Industry success - Two years of progress with first-wave applications

The Transition to Purpose-Built Systems

From Adoption to Innovation:

Historical approach - “Whatever we adopted to do the job”
Current evolution - “Things that we purposely built and constructed”
Strategic goal - Make tasks easier, produce better results, and shorten the path to a working solution
Enterprise focus - Real-world agents solving real-world problems

2. Agentic Retrieval: The Next Generation Query Pipeline

The Limitation of Single-Shot Search

Traditional Search Constraints:

Linear results - Top-K results from single query execution
Simple fact-seeking - Works well for queries such as “What are the security updates for KB article #123456?”
Complex queries fail - Multi-part questions with typos, references, and context

The “Doesn’t Work” Scenario Example:

Complex Query: "What does KB [typo] article fix security issue mentioned earlier 
regarding the vulnerability we discussed?"

Challenges:

- Multiple moving parts in single question
- Typos requiring correction
- Cross-references requiring resolution
- Conversational context dependency

Agentic Retrieval Architecture

Revolutionary Approach:

Traditional: User Query → Single Search → Top-K Results → LLM
Agentic: Chat History + Context → Query Planning → Parallel Execution → Merge Results → LLM

Intelligent Query Processing:

Query planning - LLM-powered understanding of information needs
Query decomposition - Breaking complex queries into retrievable components
Typo correction - Context-aware error resolution
Parallel execution - Multiple search operations simultaneously
Result merging - Intelligent combination of diverse information sources
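
The plan, fan-out, and merge steps listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the Azure AI Search implementation: the corpus is a hypothetical in-memory dictionary, plan_queries is a hard-coded stand-in for the LLM planner, search is a toy keyword matcher, and reciprocal rank fusion is used for merging only because it is a common way to combine ranked lists (the session summary does not say which merge strategy the service uses).

```python
from concurrent.futures import ThreadPoolExecutor

# Toy in-memory corpus standing in for a search index (hypothetical data).
CORPUS = {
    "doc-trailmaster": "TrailMaster tent maximum capacity is 4 people",
    "doc-skyview": "SkyView tent maximum capacity is 6 people",
    "doc-stove": "CampCook stove boils water in 3 minutes",
}


def plan_queries(chat_history: list[dict]) -> list[str]:
    """Stand-in for the LLM query planner; hard-coded for one follow-up question."""
    last = chat_history[-1]["content"]
    if "fits more people" in last.lower():
        return ["TrailMaster tent maximum capacity", "SkyView tent maximum capacity"]
    return [last]


def search(query: str, top_k: int = 2) -> list[str]:
    """Rank documents by how many query terms they contain (keyword overlap)."""
    terms = set(query.lower().split())
    scored = []
    for doc_id, text in CORPUS.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scored.append((overlap, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:top_k]]


def merge_rrf(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists with reciprocal rank fusion: score = sum of 1 / (k + rank)."""
    scores: dict[str, float] = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


def agentic_retrieve(chat_history: list[dict]) -> list[str]:
    """Plan subqueries, run them in parallel, and merge the ranked results."""
    subqueries = plan_queries(chat_history)
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(search, subqueries))
    return merge_rrf(result_lists)
```

Called with the tent conversation from the demo, agentic_retrieve returns both tent documents and leaves out the unrelated stove document, because each subquery contributes its own ranked list before merging.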

Live Demonstration: Outdoor Gear Analysis

Complex Query Scenario:

User: "What are examples of popular tents?"
System: "TrailMaster and SkyView are two popular choices."
User: "Which one fits more people?"

Agentic Processing:

Context analysis - Understanding previous conversation about tent models
Query branching - Two separate searches:
- “TrailMaster tent maximum capacity”
- “SkyView tent maximum capacity”
Parallel execution - Both queries processed simultaneously
Result synthesis - Combined capacity information in comparative format

Performance Metrics and Validation

Quantified Improvements:

40% increase in answer relevance for complex queries
30% increase in result rate for difficult question scenarios
Multi-dataset validation - Support, MIML (Multi-Industry Multi-Language)
Groundedness preservation - No regression in hallucination prevention

Evaluation Framework:

Content relevance - Retrieved information matches query intent
Answer relevance - LLM response addresses actual question
Groundedness - Response based on retrieved data, not hallucinated
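
To make the groundedness metric concrete, here is a deliberately crude lexical proxy. It is an assumption for illustration only and is not the evaluation method used in the session: it counts an answer sentence as supported when at least 80% of its tokens appear somewhere in the retrieved text. The function name and the 0.8 threshold are hypothetical.

```python
import re


def _tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def groundedness_proxy(answer: str, retrieved: list[str], threshold: float = 0.8) -> float:
    """Fraction of answer sentences whose tokens mostly appear in the retrieved chunks."""
    source_tokens = set().union(*(_tokens(chunk) for chunk in retrieved))
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", answer.strip()) if s]
    if not sentences:
        return 0.0
    supported = 0
    for sentence in sentences:
        tokens = _tokens(sentence)
        if tokens and len(tokens & source_tokens) / len(tokens) >= threshold:
            supported += 1
    return supported / len(sentences)
```

A proxy like this only catches sentences with no lexical support at all; paraphrased but correct answers would score poorly, which is why real evaluations typically rely on stronger judges.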

4. Enterprise-Grade Security and Access Control

The Security Challenge of Intelligent Retrieval

The Problem: “An interesting effect of all this super-smart retrieval systems and the copilots we build on top of them is that they’ll find everything.”

Enterprise Requirements:

Access control propagation - Document permissions must flow through AI systems
Identity integration - Enterprise identity systems must control AI access
Group membership - Dynamic group changes affecting document visibility
Audit trails - Compliance and security monitoring requirements

Native Entra ID Integration

Announcement: Document-Level Access Control

Automatic group expansion - Dynamic membership resolution
User-scoped indexing - Search results filtered by user permissions
RBAC integration - Azure role-based access control support
Zero manual configuration - Automatic security policy enforcement

Implementation Architecture:

Document ACLs → Azure AI Search → User Token → Filtered Results
├── User IDs field
├── Groups field
└── RBAC roles field

Live Security Demo

Access Control in Action:

Document creation - Three documents with different permission sets
User authentication - Application identity with user token delegation
Filtered results - Only accessible documents returned
Dynamic enforcement - Real-time permission checking

Advanced Security: Sensitivity Labels

Microsoft Purview Integration (Private Preview):

Document encryption - Automatic handling of encrypted documents
Policy enforcement - Sensitivity label policy compliance
Organizational protection - Document classification and handling rules
End-to-end security - From document creation to AI response

5. Data Integration and Pipeline Innovation

Logic Apps Integration for External Data

Challenge: Enterprise data exists beyond the Azure ecosystem
Solution: Azure AI Search + Logic Apps partnership

OneDrive for Business Example:

Azure Portal wizard - Simplified integration setup
Logic Apps workflow - Automated data ingestion and processing
Change tracking - Continuous synchronization with source systems
Vectorization pipeline - Automatic embedding generation and indexing
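
The change-tracking step can be pictured as a diff between the source system and what was indexed last time. The sketch below is an assumption about one simple way to do this (content hashing), not a description of how the Logic Apps connector tracks changes; plan_sync and content_hash are hypothetical helper names.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_sync(source: dict[str, str], indexed_hashes: dict[str, str]) -> dict[str, list[str]]:
    """Compare current source documents with the hashes stored at last indexing time.

    Returns the ids to (re)embed and upsert, and the ids to delete from the index.
    """
    upsert = sorted(
        doc_id
        for doc_id, text in source.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )
    delete = sorted(doc_id for doc_id in indexed_hashes if doc_id not in source)
    return {"upsert": upsert, "delete": delete}
```

Only the documents in the upsert list need to pass through the vectorization pipeline again, which keeps embedding costs proportional to what actually changed.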

Comprehensive Data Source Support

Native Azure Integration:

Blob Storage - Document and file processing
OneLake - Microsoft Fabric data lake integration
Azure SQL Database - Structured data indexing
Cosmos DB - NoSQL document processing

External Source Integration:

OneDrive/SharePoint - Microsoft 365 document systems
Third-party systems - Through Logic Apps connectors
Custom APIs - Extensible integration patterns

Azure Data Lake Storage Gen2 ACL Propagation

Automatic Permission Flow:

Source ACL detection - Native ADLS Gen2 access control lists
Permission propagation - Automatic transfer to search index
User/group mapping - Entra ID integration throughout pipeline
Zero-touch security - No manual permission configuration required
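
As a rough picture of what permission propagation involves, the sketch below turns an ADLS Gen2 style ACL string into the user and group lists an index document would carry. It is a simplified assumption, not the indexer's actual logic: it ignores the mask entry, the owning user and group, and default ACLs, and acl_to_index_fields is a hypothetical helper name.

```python
def acl_to_index_fields(acl: str) -> dict[str, list[str]]:
    """Extract named users and groups with read access from a POSIX-style ACL string.

    Entries look like 'user:<object-id>:r-x' or 'group:<object-id>:r--'.
    """
    users: list[str] = []
    groups: list[str] = []
    for entry in acl.split(","):
        parts = entry.strip().split(":")
        if len(parts) != 3:
            continue  # skip 'default:' scoped entries and malformed ones
        kind, principal_id, perms = parts
        if not principal_id or "r" not in perms:
            continue  # unnamed entries (owner, mask, other) or entries without read access
        if kind == "user":
            users.append(principal_id)
        elif kind == "group":
            groups.append(principal_id)
    return {"users": users, "groups": groups}
```

The resulting lists correspond to the user and group fields that query-time filtering compares against the caller's identity.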

6. Model Context Protocol and Developer Experience

MCP Integration for Agent Development

Azure MCP Server Capabilities:

Resource management - Azure resource groups and services
Service integration - SQL, Azure Monitor, Cosmos DB, Azure AI Search
Developer-focused - Tools for development workflows
Self-describing APIs - Enhanced agent interaction capabilities
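
"Self-describing" here means that each tool an MCP server exposes carries a name, a description, and a JSON Schema for its inputs, which an agent can read before calling it. The descriptor below follows that general shape, but the tool name and its parameters are hypothetical and are not taken from the Azure MCP Server; validate_arguments is a minimal illustrative check, not a full JSON Schema validator.

```python
# Hypothetical tool descriptor in the shape MCP servers advertise when listing tools.
SEARCH_TOOL = {
    "name": "search_index_query",  # hypothetical name, not an actual Azure MCP Server tool
    "description": "Run a query against a named search index and return matching documents.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "index_name": {"type": "string"},
            "query": {"type": "string"},
            "top": {"type": "integer"},
        },
        "required": ["index_name", "query"],
    },
}

# Only the JSON Schema types used above are mapped.
_JSON_TYPES = {"string": str, "integer": int}


def validate_arguments(tool: dict, arguments: dict) -> list[str]:
    """Return a list of problems found when checking arguments against the tool schema."""
    schema = tool["inputSchema"]
    errors = [
        f"missing required argument: {name}"
        for name in schema["required"]
        if name not in arguments
    ]
    for name, value in arguments.items():
        spec = schema["properties"].get(name)
        if spec is None:
            errors.append(f"unknown argument: {name}")
        elif not isinstance(value, _JSON_TYPES[spec["type"]]):
            errors.append(f"wrong type for {name}: expected {spec['type']}")
    return errors
```

Because the schema travels with the tool, an agent can construct a valid call without any hand-written integration code.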

Agent-Driven Development Demo

Intelligent Application Generation:
Query: “Build me a Next.js app for outdoor gear using my Azure Search index”

Automated Results:

Schema analysis - Automatic index structure discovery
Data sampling - Understanding actual content patterns
UI generation - Complete application with faceted navigation
Branding decisions - Logos and visual design choices
Search integration - Functional search interface with filtering

Developer Productivity Gains:

Zero manual UI development - Complete application from description
Facet exploitation - Automatic use of searchable/filterable fields
Metadata utilization - Index descriptions driving application behavior
API integration - Functional search capabilities without manual coding
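
The facet and metadata points above come down to reading field attributes from the index definition. The sketch below assumes a hypothetical index schema whose attribute names (searchable, filterable, facetable, sortable) mirror Azure AI Search field attributes; derive_ui_config is an illustrative helper, not something the agent in the demo is known to run.

```python
# Hypothetical index schema; field names and values are invented for illustration.
INDEX_SCHEMA = {
    "name": "outdoor-gear",
    "fields": [
        {"name": "id", "type": "Edm.String", "key": True},
        {"name": "title", "type": "Edm.String", "searchable": True},
        {"name": "description", "type": "Edm.String", "searchable": True},
        {"name": "category", "type": "Edm.String", "filterable": True, "facetable": True},
        {"name": "brand", "type": "Edm.String", "filterable": True, "facetable": True},
        {"name": "price", "type": "Edm.Double", "filterable": True, "sortable": True},
    ],
}


def derive_ui_config(schema: dict) -> dict[str, list[str]]:
    """Choose search box fields, facet panels, and sort options from field attributes."""
    fields = schema["fields"]
    return {
        "search_fields": [f["name"] for f in fields if f.get("searchable")],
        "facets": [f["name"] for f in fields if f.get("facetable")],
        "sort_options": [f["name"] for f in fields if f.get("sortable")],
    }
```

With this schema, category and brand become facet panels and price becomes a sort option, which is the kind of decision the generated application made from the index metadata.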

Technical Architecture Deep Dive

Knowledge Agent Abstraction

Up-Leveled API Design:

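# Traditional Search API (azure-search-documents Python SDK)
from azure.search.documents.models import VectorizedQuery

search_client.search(
    search_text="tents",
    search_fields=["title", "description"],
    # "content_vector" is an assumed name for the index's vector field
    vector_queries=[
        VectorizedQuery(vector=embedding, k_nearest_neighbors=10, fields="content_vector")
    ],
    top=10,
)

# Knowledge Agent API (illustrative shape only, not the exact SDK signature)
knowledge_agent.retrieve(
    chat_history=[...],
    context="outdoor gear shopping",
    model="gpt-4o"
)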

Architectural Benefits:

Higher abstraction - Chat history instead of field specifications
Automatic optimization - AI-driven query planning and execution
Policy integration - Built-in access control and security
Extensible foundation - Room for continuous innovation

Multi-Modal Processing Pipeline

Document Intelligence Integration:

PDF Input → Layout Extraction → Image Identification →
Verbalization (GPT-4o) → Text Indexing → Vector Generation →
Component Storage → Addressable References

Processing Options:

Simple extraction - Basic text and image separation
Full AI Document Intelligence - Complete layout and structure analysis
Image verbalization - LLM-powered image description
Hybrid approaches - Combined embedding and description strategies
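
The pipeline stages can be sketched as a small transformation from page elements to index records. This is a simplified assumption about the data flow, not the Azure AI Search skillset: verbalize_image is a placeholder where an LLM call would go, and the reference format (document name, page, element position) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageElement:
    kind: str      # "text" or "image"
    content: str   # the text itself, or an image identifier


@dataclass
class IndexRecord:
    ref: str       # addressable reference back to the source component
    kind: str      # original element kind, kept for citation rendering
    text: str      # text that gets indexed and embedded


def verbalize_image(image_id: str) -> str:
    """Placeholder for an LLM call that describes an image in words."""
    return f"[description of {image_id}]"


def build_records(doc_name: str, pages: list[list[PageElement]]) -> list[IndexRecord]:
    """Turn extracted page elements into records with stable, addressable references."""
    records = []
    for page_no, elements in enumerate(pages, start=1):
        for idx, element in enumerate(elements):
            if element.kind == "text":
                text = element.content
            else:
                text = verbalize_image(element.content)
            records.append(IndexRecord(f"{doc_name}#p{page_no}-e{idx}", element.kind, text))
    return records
```

Keeping a reference on every record is what lets an application point a citation back at the exact paragraph or figure that supported an answer.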

Security Architecture

Document-Level Access Control:

{
  "id": "doc1",
  "content": "Sensitive financial data...",
  "users": ["user1@contoso.com"],
  "groups": ["finance-team", "executives"],
  "roles": ["Storage Blob Data Reader"]
}

Query-Time Enforcement:

Token delegation - Application identity with user context
Dynamic filtering - Real-time permission checking
Group expansion - Automatic membership resolution
RBAC integration - Azure role-based access control
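
Conceptually, query-time enforcement reduces to a membership test per document. The sketch below is a minimal in-memory illustration of that test, assuming the users and groups fields shown above; the documents and the visible_documents helper are hypothetical, and the real service resolves group membership from the user's token rather than taking it as an argument.

```python
# Hypothetical documents mirroring the three-document demo.
DOCUMENTS = [
    {"id": "doc1", "users": ["user1@contoso.com"], "groups": []},
    {"id": "doc2", "users": [], "groups": ["executives"]},
    {"id": "doc3", "users": [], "groups": ["finance-team"]},
]


def visible_documents(documents: list[dict], user_id: str, user_groups: list[str]) -> list[str]:
    """Return ids of documents the user may see, directly or through a group."""
    groups = set(user_groups)
    return [
        doc["id"]
        for doc in documents
        if user_id in doc["users"] or groups & set(doc["groups"])
    ]
```

For a user who is listed on doc1 and belongs to finance-team, only doc1 and doc3 pass the check, matching the behavior shown in the live security demo.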

Performance Metrics and Validation

Agentic Retrieval Effectiveness

Complex Query Performance:

40% improvement in answer relevance for difficult questions
30% increase in successful result rate
Multi-industry validation - Finance, manufacturing, multiple sectors
Multi-language testing - Global deployment readiness

Query Type Analysis:

Simple queries - Maintained existing performance levels
Multi-hop questions - Significant improvement in accuracy
Complex scenarios - Material quality gains across all datasets
Groundedness - No regression in hallucination prevention

Live Demonstration Results

Agentic Retrieval in Action

Complex Query Processing:

Input: "Which tent fits more people?" (with context about TrailMaster vs SkyView)
Processing:
├── Query Planning: Understand comparative capacity question
├── Context Analysis: Previous discussion about specific tent models
├── Query Decomposition: Two separate capacity searches
├── Parallel Execution: TrailMaster capacity + SkyView capacity
└── Result Synthesis: Comparative capacity information

Output: Structured comparison with capacity details and citations

Security Enforcement

Access Control Validation:

Document Set:
├── Doc 1: User has direct access → Visible
├── Doc 2: User lacks permission → Hidden
└── Doc 3: User via group membership → Visible

Search Results: Only documents 1 and 3 returned

Agent-Driven Development

Application Generation:

Input: “Build outdoor gear app with my search index”
Analysis: Automatic schema discovery + data sampling
Output: Complete Next.js application with search, facets, branding
Result: Functional e-commerce interface without manual development

Session Highlights

“We started with whatever elements we had in the room and we’re transitioning to a purpose-built system.” - Pablo Castro

“The reality is that if you think about how RAG was constructed at that point… it was built out of the tools available in the room.” - Pablo Castro

“What we are announcing today is what we call agentic retrieval… applying the same methods we use to create agents out there in our own search engine.” - Pablo Castro

“An interesting effect of all this super-smart retrieval systems and the copilots we build on top of them is that they’ll find everything. So it becomes super important that you have proper access control policies.” - Pablo Castro

“I’m highly incompetent in anything that has to do with UI, so I would’ve never been able to do something like this.” - Pablo Castro (on agent-generated application)

Implementation Guide

Getting Started with Agentic Retrieval

1. Knowledge Agent Creation

# Create Knowledge Agent (illustrative pseudocode; method and parameter names are not exact SDK signatures)
knowledge_agent = search_client.create_knowledge_agent(
    name="product_agent",
    data_sources=["contoso_products_index"],
    model="gpt-4o",
    policy={
        "relevance_threshold": 0.7,
        "max_queries": 5
    }
)

# Use Agent for Retrieval
response = knowledge_agent.retrieve(
    chat_history=[
        {"role": "user", "content": "What are popular tents?"},
        {"role": "assistant", "content": "TrailMaster and SkyView..."},
        {"role": "user", "content": "Which fits more people?"}
    ]
)

2. Multi-Modal Document Processing

Portal Setup Process:
1. Import and Vectorize Data wizard
2. Select data source (Blob Storage, OneLake, etc.)
3. Enable AI Document Intelligence
4. Configure image verbalization with GPT-4o
5. Set up embedding model for vectorization
6. Enable Knowledge Store for component addressability
7. Configure Semantic Ranker for quality improvement

3. Access Control Implementation

# Document with Access Control
document = {
    "id": "secure_doc_1",
    "content": "Confidential business information...",
    "users": ["alice@contoso.com"],
    "groups": ["finance_team", "management"],
    "roles": ["Storage Blob Data Reader"]
}

# Query with User Context
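results = search_client.search(
    search_text="quarterly results",
    # Illustrative parameter name: the preview REST API carries the user's token
    # in the x-ms-query-source-authorization request header
    user_token=user_access_token  # Automatic permission filtering
)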

Best Practices for Enterprise Deployment

Data Pipeline Optimization

Use AI Document Intelligence for complex multi-modal documents
Enable image verbalization when visual content contains critical information
Configure Knowledge Store for application integration and citation support
Set up incremental indexing for continuous data synchronization

Security Configuration

Map existing ACLs from source systems to search index fields
Use managed identity for secure service-to-service communication
Enable Entra ID integration for automatic group expansion
Test permission propagation with representative user scenarios

Performance Tuning

Use Semantic Ranker for improved result quality
Configure relevance thresholds based on application requirements
Monitor agentic retrieval costs vs. traditional search approaches
Implement caching strategies for frequently accessed content
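
A caching layer in front of retrieval has to respect the access control discussed earlier, so results cached for one user are never served to another. The sketch below is one assumed design, a small time-to-live cache keyed by user and normalized query; SearchCache and its parameters are hypothetical and not part of any Azure SDK.

```python
import time
from typing import Callable


class SearchCache:
    """Time-to-live cache keyed by (user, normalized query)."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: dict[tuple[str, str], tuple[float, list]] = {}

    def get_or_search(self, user_id: str, query: str, search_fn: Callable[[str], list]) -> list:
        """Return cached results if still fresh, otherwise run search_fn and store its output."""
        key = (user_id, query.strip().lower())
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]
        results = search_fn(query)
        self._entries[key] = (now, results)
        return results
```

Including the user in the cache key trades some hit rate for safety: two users asking the same question may legitimately see different documents. A short TTL also limits how long a revoked permission can linger in cached results.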

Advanced Applications

Enterprise RAG Architecture

Comprehensive Implementation Stack:

Application Layer: Copilot/Agent Interface
├── Agentic Retrieval: Complex query processing
├── Multi-Modal Processing: Document + visual understanding
├── Access Control: Entra ID integration
├── Data Integration: Logic Apps + native connectors
└── Foundation: Azure AI Search with semantic ranking

Development Acceleration

MCP-Enabled Workflows:

Automatic application generation from search index schemas
Schema discovery and data pattern recognition
UI component optimization based on facetable fields
Citation and reference system integration

Security and Compliance

Enterprise-Grade Features:

Document-level access control with dynamic group membership
Sensitivity label support for classified content
Audit trail integration for compliance reporting
RBAC policy enforcement across AI-powered systems

Resources and Links

Official Documentation

Agentic Retrieval Announcement - Complete technical overview and implementation guide
Agentic Retrieval Evaluation Results - Detailed performance metrics and methodology
Multi-Modal Updates - Document processing and AI integration features
What’s New in Azure AI Search - Comprehensive feature updates and roadmap

Sample Code and Demos

Agentic Retrieval Demo - Working implementation of knowledge agents
Multi-Modal Demo - Document processing with visual content
Azure MCP Repository - Model Context Protocol server for Azure services
Foundry MCP Server - AI Foundry integration examples

Security and Compliance

Private Preview: Sensitivity Labels - Enterprise document classification support

About the Speaker

Pablo Castro
CVP & Distinguished Engineer, AI Platform
Microsoft
Corporate Vice President leading the Azure AI Search team, with a focus on state-of-the-art information understanding and retrieval systems. Expert in information retrieval, machine learning, distributed systems, and database systems. Co-founder of Lagash Systems (acquired by Mercado Libre).

This session reveals the transformation of enterprise AI from simple RAG implementations to sophisticated, purpose-built knowledge retrieval systems that understand context, process multi-modal content, and enforce enterprise security policies. Pablo Castro demonstrates how Azure AI Search has evolved beyond traditional search into an intelligent knowledge platform that powers the next generation of enterprise AI applications.
