Building Your GenAI Agents on VCF with Private AI Services

⚡ TL;DR

Today at VMware Explore's general session, you saw Chris Wolf demonstrate Intelligent Assist for VMware Cloud Foundation, which provides AI-powered assistance for our users. In this blog, we'll step behind the curtain to see how these capabilities run in VCF, built on AI features that our customers can also use to create their own AI experiences with their own private data.

📝 Summary

VMware Private AI services enable administrators to:

- safely and securely import and share approved AI models (Model Gallery and Model Governance);
- scale and run Models as a Service for their organization (Model Runtime and ML API gateway);
- create knowledge bases and regularly refresh their data in a fully supported vector database for building RAG applications (Data Indexing and Retrieval Service, in partnership with Data Services Manager; see the retrieval sketch below); and
- give developers a UI where they can compose models, knowledge bases, and tools into agents (Agent Builder).

The Intelligent Assist service uses these capabilities to run the Intelligent Assist agent, and VCF engineering teams use them as a common AI platform to deliver joint services and AI workflows. Customers can use these same capabilities for their own teams.

These features give private cloud administrators what they need to safely download, validate, and share models with teams across their cloud. Learn how to safely onboard popular models from upstream, verify that a model's behavior meets your enterprise's expectations and requirements, and ensure that behavior doesn't drift over time in this blog post.

Now that you have models securely imported and shared with the right people in your organization, you will want to run them efficiently and at scale. Gone are the days of every division running its own separate copy of the same popular models; instead, your team can provide Models as a Service using the Model Runtime. Deploy models on a fully maintained runtime stack directly within VCF, then horizontally scale them as they come under load with no end-user impact, since users broker their requests through the ML API gateway (see the sketch below). This also gives you the flexibility to perform rolling upgrades of models with zero end-user impact. Deploying models this way lets separate lines of business, or tenants within a Cloud Service Provider, keep their data isolated from one another while maintaining high GPU utilization.
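To make the brokered-request model concrete, here is a minimal sketch of what a developer's call through a central gateway could look like. It assumes the gateway exposes an OpenAI-compatible chat completions endpoint; the URL, token, and model name are hypothetical placeholders, not actual Private AI services values.

```python
# Minimal sketch: calling a shared model through a central ML API gateway.
# Assumes an OpenAI-compatible endpoint; the URL, token, and model name are
# hypothetical placeholders, not the actual VCF Private AI services API.
from openai import OpenAI

client = OpenAI(
    base_url="https://ml-gateway.example.internal/v1",  # hypothetical gateway URL
    api_key="YOUR_TOKEN",                               # credential from your platform team
)

response = client.chat.completions.create(
    model="shared-llama",  # hypothetical name of a model published as a service
    messages=[{"role": "user", "content": "Summarize last night's backup alerts."}],
)
print(response.choices[0].message.content)
```

Because every team calls the same stable endpoint, the platform team can scale model replicas up and down, or roll a new model version out behind the gateway, without any client-side changes.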
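The knowledge-base capability mentioned above pairs naturally with this pattern. The following sketch shows one generic way a RAG lookup against a pgvector-backed Postgres database could work; the table schema, connection string, and model names are illustrative assumptions, not the Data Indexing and Retrieval Service's actual interface.

```python
# Hedged sketch: retrieve context from a pgvector-backed knowledge base,
# then answer a question with a gateway-hosted model (RAG). The table name,
# columns, connection details, and model names are illustrative only.
import psycopg
from openai import OpenAI

client = OpenAI(base_url="https://ml-gateway.example.internal/v1", api_key="YOUR_TOKEN")

question = "How do I rotate certificates in VCF?"

# Embed the question (assumes an OpenAI-compatible embeddings endpoint).
emb = client.embeddings.create(model="shared-embedder", input=question).data[0].embedding
emb_literal = "[" + ",".join(str(x) for x in emb) + "]"  # pgvector's text format

# Nearest-neighbor search over pre-indexed document chunks.
with psycopg.connect("postgresql://app@dsm-postgres.example.internal/kb") as conn:
    rows = conn.execute(
        "SELECT chunk_text FROM kb_chunks ORDER BY embedding <=> %s::vector LIMIT 4",
        (emb_literal,),
    ).fetchall()

context = "\n\n".join(r[0] for r in rows)
answer = client.chat.completions.create(
    model="shared-llama",
    messages=[
        {"role": "system", "content": f"Answer using only this context:\n{context}"},
        {"role": "user", "content": question},
    ],
)
print(answer.choices[0].message.content)
```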
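Finally, on the governance point about behavior not drifting over time: one simple, generic technique (not a description of Model Governance itself) is to replay a fixed "golden" prompt set against the deployed model on a schedule and flag responses that no longer match approved references.

```python
# Generic drift-check sketch: replay approved prompts and flag divergence.
# The prompts, expected substrings, and model name are illustrative only.
from openai import OpenAI

client = OpenAI(base_url="https://ml-gateway.example.internal/v1", api_key="YOUR_TOKEN")

GOLDEN_SET = [
    # (prompt, substring the approved answer must contain)
    ("What is the capital of France?", "Paris"),
    ("Is 2 + 2 equal to 5? Answer yes or no.", "no"),
]

for prompt, expected in GOLDEN_SET:
    reply = client.chat.completions.create(
        model="shared-llama",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # keep output as deterministic as possible for comparison
    ).choices[0].message.content
    status = "OK" if expected.lower() in reply.lower() else "DRIFT?"
    print(f"[{status}] {prompt}")
```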