Technical Deep Dive: How DigitalOcean and AMD Delivered a 2x Production Inference Performance Increase for Character.ai


2026-01-13


⚡ TL;DR

Character.ai worked with DigitalOcean and AMD to optimize inference on AMD Instinct™ MI300X and MI325X GPUs orchestrated with DigitalOcean Kubernetes (DOKS), doubling production inference throughput.

📝 Summary

Sections of the original post: Background: How Character.ai worked with DigitalOcean and AMD to optimize performance; Technical deep dive overview; Technical optimizations; DP1 / TP8 / EP8 with AITER; DP2 / EP4 / TP4 with AITER; Infrastructure setup; Managed Kubernetes; Cached Weights; The New AI Systems Paradigm.

By Piyush Srivastava and Karnik Modi. Published January 13, 2026 (13 min read).

Character.ai, a leading AI entertainment platform with about 20 million users worldwide, wanted to optimize GPU performance and lower inference costs for its application, which must deliver low latency at large scale. They approached DigitalOcean and AMD to reach this goal. Working closely together, the Character.ai, AMD, and DigitalOcean teams optimized the AMD Instinct™ MI300X and MI325X GPU platforms, doubling production inference throughput.

In optimized configurations, DigitalOcean delivered high request density per node while maintaining strong p90 time to first token and sustained token-generation throughput, outperforming prior deployments on generic, non-optimized GPU infrastructure. These gains came from platform-level optimizations: parallelization strategies tailored to large Mixture-of-Experts (MoE) models, efficient FP8 execution paths, optimized kernels from AITER (AMD's AI Tensor Engine for ROCm), topology-aware GPU allocation, and production-ready Kubernetes orchestration through DigitalOcean Kubernetes (DOKS). Together, these capabilities allowed Character.ai to scale inference predictably without increasing operational burden.
In this post, we will explore the specific orchestration and tuning strategies that made these gains possible.
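The two parallelization configurations named in the post, DP1 / TP8 / EP8 and DP2 / EP4 / TP4, describe different ways of splitting one 8-GPU node. The sketch below illustrates the arithmetic of such a split; it is not the teams' actual implementation. It assumes an 8-GPU node, assumes each data-parallel (DP) replica occupies a contiguous block of GPUs, and assumes the expert-parallel (EP) group of a replica spans the same GPUs as its tensor-parallel (TP) group (so ep == tp, which matches both configurations). The names `ParallelLayout`, `replica_groups`, `expert_assignment`, and the 64-expert count are hypothetical and used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical per-node layout: dp replicas, each spanning tp GPUs.

    Sketch assumption: a replica's expert-parallel (EP) group uses the same
    GPUs as its tensor-parallel (TP) group, so ep must equal tp.
    """
    dp: int  # number of independent model replicas on the node
    tp: int  # GPUs per replica for tensor parallelism
    ep: int  # GPUs per replica that shard the MoE experts

    def validate(self, gpus_per_node: int = 8) -> None:
        # Every GPU on the node belongs to exactly one replica.
        if self.dp * self.tp != gpus_per_node:
            raise ValueError(
                f"dp*tp = {self.dp * self.tp}, expected {gpus_per_node} GPUs"
            )
        # Experts are sharded only across the replica's own GPUs in this sketch.
        if self.ep != self.tp:
            raise ValueError("this sketch assumes ep == tp")


def replica_groups(layout: ParallelLayout, gpus_per_node: int = 8) -> list[list[int]]:
    """Split local GPU indices into contiguous blocks, one block per DP replica."""
    layout.validate(gpus_per_node)
    return [
        list(range(r * layout.tp, (r + 1) * layout.tp))
        for r in range(layout.dp)
    ]


def expert_assignment(num_experts: int, ep: int) -> dict[int, list[int]]:
    """Assign MoE expert IDs to EP ranks in equal, contiguous shards."""
    if num_experts % ep != 0:
        raise ValueError("num_experts must be divisible by ep in this sketch")
    per_rank = num_experts // ep
    return {
        rank: list(range(rank * per_rank, (rank + 1) * per_rank))
        for rank in range(ep)
    }


if __name__ == "__main__":
    NUM_EXPERTS = 64  # hypothetical expert count, not taken from the post
    for layout in (ParallelLayout(dp=1, tp=8, ep=8), ParallelLayout(dp=2, tp=4, ep=4)):
        groups = replica_groups(layout)
        shards = expert_assignment(NUM_EXPERTS, layout.ep)
        print(f"DP{layout.dp}/TP{layout.tp}/EP{layout.ep}: replicas={groups}, "
              f"experts per GPU={len(shards[0])}")
```

Under these assumptions, DP1 / TP8 / EP8 runs one replica across all eight GPUs with 8 experts per GPU, while DP2 / EP4 / TP4 runs two independent replicas of four GPUs each with 16 experts per GPU. Roughly, the second layout trades more expert weights per GPU for smaller collective-communication groups and two replicas serving requests in parallel; which layout wins depends on the model and traffic pattern.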

Original post: https://www.digitalocean.com/blog/technical-deep-dive-character-ai-amd