So, you know how everyone's trying to make AI faster? Amazon.com, Inc. (AMZN)'s cloud division, AWS, thinks it has a new answer. On Friday, it announced a collaboration with AI hardware company Cerebras Systems. The goal is pretty straightforward: deliver what they claim will be the world's fastest AI inference for large language models (LLMs).
Think of it as a split-brain approach to running AI models. The new solution marries AWS's own Trainium chips with Cerebras's CS-3 systems. They're calling the method "inference disaggregation." In simple terms, it splits the AI workload into two stages. AWS's Trainium handles the "prefill" stage—the part where the model processes your initial input or question. Meanwhile, the Cerebras CS-3 takes over the "decode" stage, which is the heavy lifting of generating the actual output, word by word.
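To make the split concrete, here's a minimal toy sketch of the idea in Python. This is not AWS or Cerebras code—the function names and the fake "KV cache" are purely illustrative—but it shows why the two stages can live on different chips: they only communicate through the cache that prefill hands off to decode.

```python
# Toy model of "inference disaggregation": prefill and decode are
# separate functions that share state only via a KV cache.
# All names and logic here are illustrative stand-ins, not real LLM code.

def prefill(prompt_tokens):
    """Stage 1 (Trainium's job in the announced design): process the
    whole prompt in one pass and build the cache decoding will reuse.
    Here the "cache" is just a list of numbers."""
    return [len(tok) for tok in prompt_tokens]

def decode(kv_cache, max_new_tokens):
    """Stage 2 (the CS-3's job in the announced design): generate output
    one token at a time, reading and extending the handed-off cache."""
    output = []
    for _ in range(max_new_tokens):
        nxt = sum(kv_cache) % 1000   # stand-in for the model's next-token step
        output.append(nxt)
        kv_cache = kv_cache + [nxt]  # each decode step extends the cache
    return output

# The only data crossing the stage boundary is the cache itself,
# which is what makes running the stages on different hardware possible.
cache = prefill(["why", "is", "the", "sky", "blue"])
tokens = decode(cache, max_new_tokens=3)
print(len(tokens))  # 3 generated tokens
```

The design intuition: prefill is one big parallel pass (throughput-bound), while decode is a sequential token-by-token loop (latency-bound), so each stage can be matched to hardware that suits its shape.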
Why do this? Speed. "The result will be inference that's an order of magnitude faster and higher performance than what's available today," said David Brown, Vice President at AWS. That's corporate-speak for "way, way faster."
For customers, the path to this speed runs through Amazon Bedrock. The companies say the technology will be deployed within AWS data centers and accessible via Bedrock starting in the next couple of months. This makes AWS the first cloud provider to offer Cerebras's specialized hardware configured specifically for this disaggregated inference task.
The plan doesn't stop there. Later this year, AWS says it will add support for Amazon's own Nova model and other open-source models using this same Cerebras-powered infrastructure.
"Partnering with AWS... will bring the fastest inference to a global customer base," noted Cerebras CEO Andrew Feldman. It's a significant win for Cerebras, which is also known for providing massive computing capacity to Microsoft-backed OpenAI. The company now finds its hardware in the heart of AWS's cloud, built on the AWS Nitro System, which promises enterprise-grade security and isolation.
This move is also a clear shot across the bow in the AI chip wars. Cerebras competes directly with giants like NVIDIA Corp (NVDA) and Advanced Micro Devices, Inc. (AMD). By integrating a specialist like Cerebras into its cloud offering, AWS isn't just building its own chips (Trainium); it's also creating a best-of-breed hardware ecosystem. It's a strategy that gives cloud customers more options and potentially better performance for specific tasks—in this case, making LLMs respond blisteringly fast.
In the market, Amazon shares were down 0.98% at $207.48 at the time of publication on Friday.