Business Honor
10 June, 2025
Red Hat launches AI Inference Server to accelerate scalable, flexible LLM deployment across hybrid cloud environments.
Red Hat has released its AI Inference Server, a new offering designed to speed up and simplify the deployment of large language models (LLMs) across hybrid cloud environments. Available as part of Red Hat OpenShift AI and Red Hat Enterprise Linux AI (RHEL AI), as well as a standalone product, the AI Inference Server is built on the open-source vLLM project, which originated at the University of California, Berkeley. It provides multi-GPU support, high-throughput generative AI inference, large input context handling, and batch streaming.
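To make the serving model concrete: a vLLM server (started with `vllm serve <model>`) exposes an OpenAI-compatible REST API, which is how clients typically consume this kind of inference service. The sketch below assembles a request for that endpoint; the URL and model name are illustrative placeholders, not details from the announcement.

```python
import json
from urllib.request import Request, urlopen

# Placeholder endpoint: a local vLLM server listens on port 8000 by default
# and serves an OpenAI-compatible completions API.
BASE_URL = "http://localhost:8000/v1/completions"

def build_completion_request(model: str, prompt: str,
                             max_tokens: int = 64,
                             stream: bool = False) -> dict:
    """Assemble an OpenAI-compatible completion payload.

    Setting stream=True asks the server to stream tokens back as they
    are generated, matching the streaming support described above.
    """
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": max_tokens,
        "stream": stream,
    }

def send(payload: dict) -> bytes:
    """POST the payload to the server (requires a running vLLM instance)."""
    req = Request(
        BASE_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urlopen(req) as resp:
        return resp.read()

payload = build_completion_request(
    model="meta-llama/Llama-3.1-8B-Instruct",  # placeholder model name
    prompt="Summarize hybrid cloud inference in one sentence.",
)
# send(payload)  # uncomment once a vLLM server is running locally
```

Because the API surface is OpenAI-compatible, the same client code works whether the backend runs on NVIDIA, AMD, or other supported accelerators.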
The platform is compatible with various accelerators like Google TPUs, NVIDIA GPUs, and AMD GPUs, allowing businesses to avoid vendor lock-in and scale infrastructure seamlessly.
Red Hat also emphasizes hardware diversity, supporting Arm, Intel, Power, and IBM Z architectures, consistent with its long-term vision of open and flexible AI solutions.
Red Hat's leadership underscored the open, partner-first mindset behind their AI strategy. NVIDIA, for instance, employs the vLLM inference engine in its own products, illustrating a cooperative ethos even among competing companies. vLLM has emerged as one of the most widely used open-source inference projects, reflecting Red Hat's desire to develop shared frameworks that drive the open-source ecosystem's growth.
In addition to the AI Inference Server, Red Hat extended OpenShift Virtualization's reach to all of the top public cloud providers—AWS, Microsoft Azure, Google Cloud, and Oracle Cloud—so that enterprises can modernize their traditional virtual machine workloads along with containers on a single hybrid and multicloud platform.
Red Hat is focusing on the Asia-Pacific region, particularly South Asia, to facilitate AI adoption and infrastructure modernization through regional independent software vendors and systems integrators. The company also plans to incorporate agent-based AI features for increased automation and operational efficiency.