Bring Your Own Model to Windows using Windows ML
Session Date: May 20, 2025
Duration: 1 hour
Venue: Build 2025 Conference - BRK225
Speakers: Ryan Demopoulos (Principal Product Manager, Microsoft), Xiaoxi Han (Senior Software Engineer, Microsoft)
Link: [Microsoft Build 2025 Session BRK225]

Executive Summary
Ryan Demopoulos and Xiaoxi Han unveil Windows ML 2.0 Experimental 1, a ground-up redesign of Microsoft’s AI inferencing framework for Windows. This session demonstrates how developers can scale across CPU, GPU, and NPU with a single codebase, automated dependency management, and intelligent device selection policies. The live coding demonstration showcases building a ResNet-50 image classification app from scratch, while Powder’s early adoption shows the platform powering real-world, AI-native gaming experiences.
Key Topics Covered
1. The Windows AI Revolution
Windows Ecosystem Dominance
Ryan’s Opening Vision: > “Windows powers the vast majority of desktop and laptop PCs all around the world. That alone is sometimes staggering to think about… when you apply it to the revolution in AI that we are seeing all across the world, it really means that that AI revolution is going to unfold on Windows more than any other place.”
Hardware Diversity Challenge:
- Range: $300 laptops to $3,000+ high-performance desktops
- Silicon Variety: Multiple NPU manufacturers, diverse GPU architectures, CPU variations
- Developer Burden: Targeting hardware diversity traditionally complex and time-consuming
- Windows ML Goal: Write once, run optimally everywhere across all hardware configurations
The Three Core Objectives
Developer Empowerment Framework:
- Cross-Hardware Scalability - Single codebase running across all Windows hardware diversity
- Performance Maximization - Optimal utilization of each individual PC’s capabilities
- Dependency Simplification - Eliminate complex dependency management, procurement, and updates
2. Live Hardware Demonstration: One App, Four Platforms
Multi-Device Performance Showcase
Identical ResNet-50 Application Running Across:
Hardware Platform Matrix:
- Surface Laptop 7 (Qualcomm Snapdragon NPU)
- Intel Surface Device (Lunar Lake NPU)
- AMD Machine (AMD NPU)
- Dell Alienware (NVIDIA RTX 4090 GPU)
Demonstration Results:
- Same executable - 2MB app size across all platforms
- Automatic hardware detection - No device-specific configuration required
- Optimal performance - Each platform utilizing best available processor
- Unified prediction results - 99%+ confidence golden retriever classification across all devices
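For context on where that confidence number comes from, here is a minimal, illustrative sketch of typical image-classifier post-processing: softmax over the model’s output logits, then top-1 selection. This is not code from the session; the helper name is an assumption for illustration.

```csharp
using System;
using System.Linq;

// Illustrative only: convert ResNet-50 output logits into a top-1
// prediction with a confidence score (softmax, then argmax).
static (int ClassIndex, float Confidence) Top1(float[] logits)
{
    float max = logits.Max();                                 // subtract max for numerical stability
    float[] exps = logits.Select(l => MathF.Exp(l - max)).ToArray();
    float sum = exps.Sum();

    int best = Array.IndexOf(exps, exps.Max());               // index of highest-probability class
    return (best, exps[best] / sum);                          // e.g., "golden retriever" at 0.99+
}
```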
Cross-Silicon Intelligence
Windows ML’s Platform Adaptability:
- NPU Diversity - Three different manufacturers, different chip designs, automatic optimization
- GPU Integration - Seamless fallback to discrete graphics for maximum performance
- CPU Compatibility - Universal baseline ensuring compatibility on any Windows 11 PC
- Future Hardware Support - Automatic compatibility with unreleased silicon architectures
3. Windows ML 2.0 Architecture: Built from the Ground Up
From DirectML to Complete Redesign
The Performance Imperative: > “We need something in the platform that’s faster, faster in two different ways. First, just wall clock faster, something that can strip away the layers and the abstractions between your app code and the silicon… But we also need the platform to be able to move at the speed of AI innovation.”
Windows AI Foundry Integration:
Windows AI Foundry Architecture:
- Windows ML Runtime (Core inferencing engine)
- Foundry Local (Ready-to-use catalog models)
- Built-in Windows AI APIs (OCR, text intelligence, etc.)
- Public APIs (Direct Windows ML access)
- Hardware Certification Program (IHV partnership)
ONNX Runtime Foundation
Technical Infrastructure:
- Open Standard ONNX Models - Mature, fast inferencing runtime with proven performance
- 20%+ Performance Improvement - Typical inference gains after ONNX conversion
- PyTorch Conversion - Direct path from PyTorch training to Windows deployment
- Execution Provider Architecture - Translation layer between runtime and diverse silicon
Execution Provider Innovation
Hardware Abstraction Excellence:
- Dedicated GPU Providers - Optimized for wide range of GPU hardware
- Refreshed NPU Providers - Partnership-developed with hardware manufacturers
- CPU Excellence - ONNX’s proven CPU support for universal compatibility
- First-Class Treatment - Each processor type treated as primary target, not fallback
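To make the “translation layer” idea concrete, here is a conceptual sketch of what an execution provider abstracts. This is not the actual ONNX Runtime interface; the names are illustrative only.

```csharp
// Conceptual sketch, not the real ONNX Runtime interface: an execution
// provider advertises which graph operators it can handle and supplies
// silicon-specific kernels; the runtime partitions the model graph across
// registered providers, with CPU as the universal baseline.
interface IExecutionProvider
{
    string Name { get; }                        // e.g., "QNN", "TensorRT", "CPU"
    bool Supports(string onnxOperator);         // e.g., "Conv", "MatMul"
    float[] Run(string onnxOperator, float[][] inputs);  // hardware-specific kernel
}
```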
4. Dependency Management Revolution
The Current Developer Pain Point
Xiaoxi’s Problem Statement: > “The status quo today is that you also need to grab an AI runtime out of the sea of complex options, and that runtime becomes part of your app, where the burden of maintaining it falls on you… It’s just the norm, and it sucks.”
Traditional Complexity:
Legacy Dependency Management:
- Runtime Selection: Choose from complex AI runtime options
- Hardware Add-ons: ONNX execution providers for specific hardware
- Model Variants: Multiple model copies optimized for different processors
- Installer Logic: Complex hardware detection and model selection
- Maintenance Burden: Updates, patches, compatibility testing
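For contrast, a sketch of what that status quo looks like with a self-managed ONNX Runtime, where the app bundles the runtime and hard-codes execution provider selection. The QNN option value and the detection helper here are illustrative assumptions, not code from the session.

```csharp
using System.Collections.Generic;
using Microsoft.ML.OnnxRuntime;

// Legacy pattern: the app ships its own runtime plus per-hardware
// execution providers, and the installer/app picks between them.
var options = new SessionOptions();
if (DeviceHasQualcommNpu())   // hand-written hardware detection logic
{
    options.AppendExecutionProvider("QNN",
        new Dictionary<string, string> { ["backend_path"] = "QnnHtp.dll" });
}
// ...plus separate branches (and separate model variants) for other silicon.
var session = new InferenceSession("model.qdq.onnx", options);

static bool DeviceHasQualcommNpu() => false;  // stand-in for real detection code
```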
Windows ML Simplified Approach
One NuGet Package Solution:
Streamlined Development:
- Single NuGet: Microsoft.Windows.AI.MachineLearning
- Automatic Runtime: Bootstrapper API installs Windows ML runtime
- Store Servicing: Microsoft handles runtime updates and fixes
- Hardware Scanning: Automatic execution provider downloads
- Model Resource Packs: Microsoft Store automatic model selection
Intelligent Resource Management:
- Hardware-Specific Downloads - Only necessary execution providers downloaded per device
- Minimal App Size - No bundled execution providers or multiple model variants
- Store Integration - Model variants with hardware metadata for automatic selection
- Zero Installer Logic - No complex hardware detection required in application code
5. Device Policies: Intelligent Performance Optimization
Smart Workload Distribution
Ryan’s Performance Philosophy: > “Device policies are basically a way for you to describe what outcome you want when you run your AI workload with Windows ML.”
Policy Options and Behavior:
Windows ML Device Policies:
- Max Performance: Select discrete GPU for fastest inference
- Min Overall Power: Choose NPU for battery optimization
- Default/Unspecified: CPU for universal compatibility and accuracy
- Explicit Selection: Direct NPU/GPU specification when desired
- Future: Workload splitting across multiple processors
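In code, these policies reduce to a single property on the session options. MaxPerformance is the value shown on stage; the MinOverallPower value below is inferred from the policy list above rather than demonstrated live.

```csharp
string modelPath = "resnet50.onnx";
var sessionOptions = new SessionOptions();

// Describe the desired outcome; Windows ML picks the processor.
sessionOptions.DeviceSelectionPolicy = DeviceSelectionPolicy.MaxPerformance;      // discrete GPU
// sessionOptions.DeviceSelectionPolicy = DeviceSelectionPolicy.MinOverallPower;  // NPU, battery-friendly
// Leaving the policy unspecified falls back to CPU for universal compatibility.

var session = new InferenceSession(modelPath, sessionOptions);
```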
Live Policy Demonstration
Real-Time Hardware Switching:
- CPU Baseline - Universal compatibility demonstration
- GPU Engagement - Task Manager showing graphics engine activation
- NPU Utilization - Neural processing unit spike demonstration
- One-Line Changes - Single property modification redirects entire workload
Future Workload Splitting
Multi-Processor Coordination:
- Single AI Workload - Distributed across multiple processor types
- Maximum Performance - Utilize all available hardware simultaneously
- Device Policy Integration - Seamless integration with existing policy system
- Performance Multiplication - Greater than sum-of-parts optimization
6. Live Coding Demonstration: ResNet-50 from Scratch
AI Toolkit VS Code Extension
Model Conversion Innovation: > “We have just added a new functionality to it that makes this much easier… What this tool does is it provides a streamlined experience to get a model from Hugging Face, convert it to ONNX, optimize, and quantize it.”
Model Lab Capabilities:
AI Toolkit New Features:
- Hugging Face Integration: Direct model sourcing
- ONNX Conversion: Automatic format transformation
- Optimization: IHV-specific performance tuning
- Quantization: Size and power consumption optimization
- 11 Popular Models: Pre-optimized for Windows ML
Hardware-Specific Optimization Workflows:
- Convert to QNN - Qualcomm NPU optimization
- Convert to AMD NPU - AMD-specific neural processing optimization
- Convert to Intel NPU - Intel neural processing unit tuning
- Future Convergence - Single model working optimally across all NPUs
Complete Application Development
From Zero to AI in Minutes:
```csharp
// Initialization: the bootstrapper downloads the Windows ML runtime and
// the execution providers matching this machine's hardware.
var infrastructure = new Infrastructure();
await infrastructure.DownloadPackagesAsync();
await infrastructure.RegisterExecutionProviderLibrariesAsync();

// Session creation with a device policy.
var sessionOptions = new SessionOptions();
sessionOptions.DeviceSelectionPolicy = DeviceSelectionPolicy.MaxPerformance;
var session = new InferenceSession(modelPath, sessionOptions);

// Inference (inputs is the collection of named input tensors).
var results = session.Run(inputs);
```

Live Development Results:
- Hardware Detection - Automatic Qualcomm NPU utilization
- Performance Switching - GPU engagement with single property change
- Power Optimization - NPU selection for battery conservation
- Cross-Platform Guarantee - Same code working on AMD, Intel, NVIDIA hardware
Model Compilation API
Performance Optimization:
- Pre-compilation - Avoid runtime compilation delays for large models
- Disk Caching - Save compiled models for instant loading
- Multi-gigabyte Models - Essential for LLM and complex AI workloads
- Seamless Integration - Optional optimization with minimal code changes
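A sketch of the pre-compile-and-cache pattern this API enables. The type and method names below (ModelCompilationOptions and friends) are assumptions for illustration, not the confirmed API surface.

```csharp
using System;
using System.IO;

// Compile once, cache to disk, and reuse: avoids long runtime compilation
// on every launch of a multi-gigabyte model.
// (Type/method names are illustrative, not the confirmed API.)
string modelPath = "model.onnx";
string cacheDir = AppContext.BaseDirectory;
string compiledPath = Path.Combine(cacheDir, "model.compiled.onnx");
var sessionOptions = new SessionOptions();

if (!File.Exists(compiledPath))
{
    var compileOptions = new ModelCompilationOptions(sessionOptions);
    compileOptions.SetInputModelPath(modelPath);
    compileOptions.SetOutputModelPath(compiledPath);
    compileOptions.CompileModel();   // one-time, hardware-specific compilation
}

// Subsequent sessions load the cached, pre-compiled model instantly.
var session = new InferenceSession(compiledPath, sessionOptions);
```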
7. Industry Partnership and Certification Program
Hardware Vendor Collaboration
Certification Program Vision: > “As these manufacturers bring new hardware to market, at the same time, they will either refresh or create new execution providers to coincide with that new hardware… You don’t have to change any of your app code.”
Partner Ecosystem:
IHV Partnership Program:
- NVIDIA: GPU and future hardware optimization
- AMD: NPU and graphics integration
- Intel: Neural processing and CPU optimization
- Qualcomm: Snapdragon NPU development
- Certification Process: Quality assurance and accuracy testing
Hardware Partner Testimonials
Manish Sirdeshmukh (Hardware Perspective): > “The Windows ML runtime offers exactly that by auto install of the runtime and execution providers specific to the hardware it’s running on. ISVs do not need to select EPs at compile time.”
Jesse Clayton (NVIDIA): > “Windows ML helps Nvidia and other hardware vendors deliver optimizations to ISVs and users, while reducing the barriers to adoption.”
8. Developer Partner Success Stories
Enterprise Adoption Feedback
Volker Rölke (Adobe Premiere Pro/After Effects): > “Premiere Pro and After Effects are leading professional video editing applications that often handle terabytes of video footage. Our goal is to adopt the new Windows ML once it matures enough to handle the heavy ML workloads.”
Carl Woodward (Development Perspective): > “We look forward to replacing model load and inference code from multiple SDKs with Windows ML. This will simplify our code and testing whenever runtimes update.”
Rapid Development Success
Luyan Zhang (Simplicity Achievement): > “The simplicity amazed me. Following Microsoft’s easy approach (get an ONNX model, add it to your app, and integrate it into your code), we converted a complex AI feature to Windows ML in just three days.”
Barthélémy Kiss (Powder Strategic Advantage): > “Powder is an early adopter of Windows ML, and it has enabled us to integrate models three times faster, transforming speed into a key strategic advantage.”
9. Powder: Real-World AI Gaming Application
Automatic Gameplay Highlight Detection
Barthélémy’s AI-Native Vision: > “We transform gameplay into highlights automatically. How does it work? You just play. We record, find great moments, and package them for sharing… It’s the kind of AI-native experience that just starts to be possible.”
Powder Technical Achievement:
AI Gaming Platform:
- Real-Time Analysis: Live gameplay AI model execution
- Semantic Understanding: Audio and visual game comprehension
- Cross-Hardware Deployment: Single model across all NPUs
- Zero SDK Integration: No game developer modification required
- Automatic Content Creation: AI-generated highlight reels
Live Gaming Demonstration
Technical Implementation:
- AMD Strix Halo - ASUS Z30 Copilot+ PC demonstration
- Level Completion Detection - Custom AI model trained for game events
- Background Processing - Vision model running continuously during gameplay
- Instant Results - Two level completions detected and packaged automatically
- Cinematic Highlights - 40 minutes of Skull and Bones condensed to key moments
Business Impact and Scale
Small Team Achievement:
- 15-Person Team - 10 developers achieving enterprise-grade AI deployment
- Cross-Silicon Success - “We do the work once, and it works everywhere”
- Deployment Simplification - No deep silicon knowledge required
- Strategic Advantage - 3x faster integration enabling competitive differentiation
Session Highlights
“Windows powers the vast majority of desktop and laptop PCs all around the world… that AI revolution is going to unfold on Windows more than any other place.” - Ryan Demopoulos
“The status quo today is that you also need to grab an AI runtime out of the sea of complex options… It’s just the norm, and it sucks.” - Xiaoxi Han
“We need something in the platform that’s faster, faster in two different ways. First, just wall clock faster… But we also need the platform to be able to move at the speed of AI innovation.” - Ryan Demopoulos
“The simplicity amazed me… we converted a complex AI feature to Windows ML in just three days.” - Luyan Zhang (Developer Partner)
“We do the work once, and it works everywhere.” - Barthélémy Kiss (Powder)
Technical Architecture Deep Dive
Windows ML 2.0 Core Components
Runtime Architecture:
- ML Layer: Initialization APIs, Generative AI helpers (WinRT with C/managed projections)
- ONNX Runtime Layer: Full ONNX Runtime API surface via flat C APIs
- Execution Providers: Hardware-specific optimization layers
- Device Policies: Intelligent workload distribution system
- Certification Program: IHV partnership ensuring future compatibility
Development Integration Stack
Developer Experience:
- Single NuGet Package: Microsoft.Windows.AI.MachineLearning
- AI Toolkit VS Code Extension: Model conversion and optimization
- Automatic Dependency Management: Runtime and execution provider downloads
- Store Integration: Model resource packs with hardware metadata
- GitHub Resources: Sample code and documentation
Hardware Compatibility Matrix
Supported Platforms:
- NPU Support: Qualcomm Snapdragon, Intel Lunar Lake, AMD Strix
- GPU Support: NVIDIA RTX series, AMD graphics, Intel integrated
- CPU Support: Universal compatibility across all Windows 11 PCs
- Future Hardware: Automatic compatibility through certification program
- Performance Optimization: Hardware-specific execution providers
Implementation Guidelines
Getting Started with Windows ML 2.0
**Prerequisites:**
- Windows 11 PC with any supported hardware (CPU minimum)
- Visual Studio with .NET target framework 10.0.26100.0
- AI Toolkit VS Code Extension for model conversion
- Microsoft.Windows.AI.MachineLearning NuGet package
**Basic Implementation Steps:**
1. Install Windows ML NuGet package
2. Initialize Infrastructure object for runtime setup
3. Download applicable execution providers automatically
4. Create InferenceSession with device policy
5. Run inference with ONNX model
Performance Optimization Strategies
**Device Policy Selection:**
- **Max Performance**: Discrete GPU utilization for fastest inference
- **Min Overall Power**: NPU selection for battery-optimized applications
- **Default**: CPU for universal compatibility and highest accuracy
- **Explicit Hardware**: Direct NPU/GPU specification when requirements known
**Model Optimization:**
- Use AI Toolkit VS Code Extension for hardware-specific optimization
- Implement model compilation API for large models (LLMs, complex networks)
- Leverage quantization for size and power consumption reduction
- Consider hardware-specific model variants through Store resource packs
Enterprise Deployment Considerations
**Production Readiness:**
- Current release is Experimental 1 - not for production use
- Windows ML 2.0 Stable planned for later 2025
- Feedback essential for stable release timeline and feature completeness
- Consider pilot programs with Experimental 1 for evaluation
**Hardware Planning:**
- Future hardware automatically supported through certification program
- No application updates required for new silicon compatibility
- Investment protection through forward compatibility guarantee
- Performance improvements delivered automatically through execution provider updates
Resources and Further Learning
Official Documentation and Tools
- Try Windows ML - Documentation and getting started guides
- Windows ML Build Repository - Sample applications in C#, Python, C++
- AI Toolkit VS Code Extension - Model conversion and optimization
- Windows AI Foundry Overview - Complete AI platform documentation
Community and Support
- The Hub Expert Meetups - Direct access to Windows AI team at Build
- GitHub Issues - Bug reports and feature requests
- Windows AI Feedback - General Windows AI Foundry feedback
- Feedback Hub - Windows feedback collection
About the Speakers
Ryan Demopoulos
Principal Product Manager
Microsoft
An 18-year product management veteran on the Windows development platform, currently focused on Windows AI platform technologies. Previously worked on WinUI, Windows App SDK, and various Windows platform components. Passionate about curling, family, and gaming (Dota 2, roguelike deck builders).
Xiaoxi Han
Senior Software Engineer
Microsoft
Senior Software Engineer focused on Windows AI platform technologies. Previously worked on Windows Subsystem for Android, PC Game Pass, and other Windows technologies. Deep expertise in AI runtime development and cross-platform optimization.
This comprehensive session demonstrates Microsoft’s commitment to democratizing AI development on Windows through intelligent platform design, automatic hardware optimization, and unprecedented developer experience simplification. Windows ML 2.0 represents a fundamental shift toward write-once, run-everywhere AI applications with optimal performance across the entire Windows hardware ecosystem.