Qualcomm shares soared 18% after the company announced a new set of "chip-based accelerator cards" for AI inference.
QCOM is best known for mobile chips and connectivity, but it’s now entering the realm of data-centre AI inference accelerators.
From Qualcomm's announcement:

Qualcomm Technologies, Inc. today announced the launch of its next-generation AI inference-optimized solutions for data centers: the Qualcomm® AI200 and AI250 chip-based accelerator cards, and racks. Building off the Company’s NPU technology leadership, these solutions offer rack-scale performance and superior memory capacity for fast generative AI inference at high performance per dollar per watt—marking a major leap forward in enabling scalable, efficient, and flexible generative AI across industries.
Qualcomm AI200 introduces a purpose-built rack-level AI inference solution designed to deliver low total cost of ownership (TCO) and optimized performance for large language & multimodal model (LLM, LMM) inference and other AI workloads. It supports 768 GB of LPDDR per card for higher memory capacity and lower cost, enabling exceptional scale and flexibility for AI inference.
The Qualcomm AI250 solution will debut with an innovative memory architecture based on near-memory computing, providing a generational leap in efficiency and performance for AI inference workloads by delivering greater than 10x higher effective memory bandwidth and much lower power consumption. This enables disaggregated AI inferencing for efficient utilization of hardware while meeting customer performance and cost requirements.
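To put the 768 GB per-card capacity and the "greater than 10x higher effective memory bandwidth" claim in context, here is a rough back-of-envelope sketch. Apart from those two figures, every number in it is an illustrative assumption: the model size, FP8 quantisation, context length, batch size and the 500 GB/s baseline bandwidth are placeholders, not Qualcomm specifications.

```python
def weight_bytes(params_billion: float, bytes_per_param: float = 1.0) -> float:
    """Memory needed for model weights (FP8 ~ 1 byte per parameter)."""
    return params_billion * 1e9 * bytes_per_param


def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   context_len: int, batch: int, bytes_per_elem: float = 1.0) -> float:
    """KV cache size: 2 (K and V) * layers * kv_heads * head_dim * tokens * batch."""
    return 2 * layers * kv_heads * head_dim * context_len * batch * bytes_per_elem


def decode_tokens_per_sec(effective_bw_gbps: float, bytes_per_step_gb: float, batch: int) -> float:
    """Memory-bound decode: one step streams weights + KV cache once and yields `batch` tokens."""
    return effective_bw_gbps / bytes_per_step_gb * batch


CARD_GB = 768  # LPDDR capacity per AI200 card, from the announcement

# Illustrative workload (assumed, not a Qualcomm spec): a 400B-parameter dense
# model served in FP8, 120 layers, 8 KV heads of dimension 128, 32k context, batch of 16.
weights_gb = weight_bytes(400) / 1e9
kv_gb = kv_cache_bytes(120, 8, 128, 32_768, 16) / 1e9
print(f"weights ~{weights_gb:.0f} GB + KV cache ~{kv_gb:.0f} GB "
      f"= ~{weights_gb + kv_gb:.0f} GB of the {CARD_GB} GB card")

# Decode throughput under an assumed 500 GB/s baseline of effective bandwidth
# versus the claimed >10x figure for the AI250 architecture.
baseline_bw = 500
claimed_bw = baseline_bw * 10
step_gb = weights_gb + kv_gb
print(f"~{decode_tokens_per_sec(baseline_bw, step_gb, 16):.0f} tok/s baseline vs "
      f"~{decode_tokens_per_sec(claimed_bw, step_gb, 16):.0f} tok/s at 10x bandwidth")
```

The point of the arithmetic: weights plus KV cache determine what fits on a card, and because decode is typically memory-bandwidth bound, any genuine multiple on effective bandwidth translates almost directly into decode throughput.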
This doesn't sound like a threat to Nvidia in training, but there is a fight for inference that also involves Nvidia, AMD, Intel and others. These cards would feed data centres, and any gain in efficiency could curb power use, though many would argue that cheaper, more efficient inference simply invites more AI usage.
We will also need real-world results to bear out the claim of "greater than 10x higher effective memory bandwidth", and to see the software stack that goes with it.