Nvidia’s meteoric rise has coincided with the proliferation and popularity of OpenAI’s ChatGPT. The company controls roughly 80% of the high-end GPU market, placing it alongside firms such as TSMC and ASML that dominate their respective specialties in the global chip ecosystem. Its GPUs have become the preferred choice for businesses operating AI-accelerated data centers. Nvidia’s H100 GPU, built on its Hopper architecture, and the successor Blackwell architecture are effectively the brains that allow for split-second decision-making by AI software and systems. Nvidia’s chips give developers direct access to the GPUs’ parallel computation capabilities, enabling them to apply GPU technology well beyond its originally intended role in graphics. [1] As a result, GPUs’ ability to handle large amounts of data simultaneously and perform mathematical calculations rapidly has made them the go-to choice for LLM training and inference, and the chief enabler of key technologies such as machine learning and artificial intelligence.
However, Nvidia may face serious challenges in the coming years as companies such as Cerebras, Groq, and AMD challenge it as the market shifts from training to inference. At its essence, model training involves feeding a model large datasets, which demands extreme processing power and significant upfront cost. This is largely a one-time expense, however: once trained, a model typically requires only fine-tuning, which relies on far less intensive techniques. [2] For these purposes, Nvidia’s GPUs have served as the quintessential hardware for building LLMs. Nvidia’s moat currently revolves around its Compute Unified Device Architecture (CUDA) platform, the software toolkit and application programming interface (API) developers use to get the most out of its GPUs, including for building LLMs.
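To make CUDA’s role concrete, the sketch below is a minimal CUDA C++ program that adds two vectors across thousands of GPU threads in parallel. It is an illustrative example of the data-parallel programming model CUDA exposes, not code from Nvidia or any LLM framework, and the array size and launch configuration are arbitrary choices.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread adds one pair of elements. This data-parallel pattern,
// scaled up to matrix math over billions of parameters, is what underlies
// LLM training and inference on GPUs.
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                      // ~1M elements (arbitrary size)
    size_t bytes = n * sizeof(float);

    // Allocate and fill host arrays.
    float *ha = new float[n], *hb = new float[n], *hc = new float[n];
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Allocate device memory and copy inputs to the GPU.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(da, db, dc, n);
    cudaDeviceSynchronize();

    // Copy the result back and spot-check one value.
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);               // expected 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    delete[] ha; delete[] hb; delete[] hc;
    return 0;
}
```

The value of CUDA is less this single kernel than the surrounding ecosystem of compilers, libraries, and tooling built around it, which is why the platform is so often described as Nvidia’s moat.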
To help WTW better understand the forces influencing the evolving AI market, the WTW Research Network (WRN) partnered with the University of Pennsylvania’s Wharton School and its Mack Institute’s Collaborative Innovation Program (CIP). Building on our previous work with the CIP and their Executive MBA students, Green Algorithms – AI and Sustainability, the WRN has sought to further examine the LLM competitive landscape including new disruptions and opportunities for optimization and efficiency. Part 1 looks at GenAI’s impact on risk management frameworks while Part 2 explores LLM Effectiveness at Scale. This piece rounds out the series, providing a look at the future of hardware computing and examining the implications for the market going forward as the industry moves from the training to the inference phase.
In addition to powering LLMs, the AI chip market affects both edge and cloud AI platforms. GPUs are a superior product for powering AI for three main reasons: they employ parallel processing, GPU systems scale up to supercomputing heights, and the GPU software stack for AI is broad and deep. [3] These factors have allowed LLMs to proliferate at breakneck speed, aided by a nearly 7,000-fold increase in computing performance over the last twenty years. But the AI landscape is undergoing a shift from training to inference, which will necessitate a move away from architectures built for massive training workloads toward specialized solutions designed for the lighter workloads of inference.
According to Gary Dickerson, CEO of Applied Materials, there has been more investment in AI chips over the last eighteen months than in the previous eighteen years, with a twelve-month lead worth approximately $100 billion to a company. The explosion in AI is being driven by the need to process vast amounts of unstructured data, which accounts for 80% of all data. [4] As computing power and processing capabilities increase, the chip industry may look past the GPU-driven training that has heretofore dominated the LLM space.
As the market shifts from training to inference, AI chips will take the reins in driving the AI boom going forward. Designed specifically to accelerate artificial intelligence tasks, these specialized processors deliver significant improvements in performance, efficiency, and cost-effectiveness compared to general-purpose central processing units (CPUs) and GPUs.
Our study with Wharton identified a number of key challenges and opportunities Nvidia’s competitors may face in the coming years, summarized in the table below. Economies of scale, innovation lags, the need for collaboration, CAPEX requirements, regulation, brand trust, and the ability to adapt are among the most prominent factors that both startups and incumbents in the AI chip space will have to navigate.

| Opportunities | Challenges |
| --- | --- |
| **Agility in Innovation:** Ability to iterate and bring cutting-edge solutions to market faster than incumbents | **Regulatory Compliance:** Must navigate stringent data and security laws |
| **Collaboration Potential:** Partnerships with generative AI companies for tailored solutions | **Capital Requirements:** Need significant investment to scale and produce at competitive costs |
| **Specialization in Emerging Needs:** Startups can focus on niche demands such as edge computing, energy-efficient designs, or photonic chips | **Market Entry Barriers:** High R&D costs and competition with dominant players like Nvidia |
One such candidate that hopes to take on Nvidia directly is Cerebras with its Wafer Scale Engine (WSE)-3 chip. The WSE-3 packs 900,000 AI cores onto a single processor, integrating an entire GPU cluster’s worth of computing power into one chip. Its high-bandwidth, low-latency design is powered by 4 trillion transistors, a more than 50 percent increase over the previous generation thanks to the use of newer chipmaking technology. The WSE-3 measures 72 square inches and holds 50 times more computing power than Nvidia’s H100 GPU, which by contrast is roughly 1 square inch and contains 80 billion transistors. [5]
| | WSE-3 | Nvidia H100 | Cerebras Advantage |
| --- | --- | --- | --- |
| Chip size | 46,225 mm² | 814 mm² | 57X |
| Cores | 900,000 | 16,896 FP32 + 528 Tensor | 52X |
| On-chip memory | 44 Gigabytes | 0.05 Gigabytes | 880X |
| Memory bandwidth | 21 Petabytes/sec | 0.003 Petabytes/sec | 7,000X |
| Fabric bandwidth | 214 Petabytes/sec | 0.0576 Petabytes/sec | 3,715X |
Established chipmakers, hyperscalers, and startups such as AMD, Intel, Graphcore, Amazon, and Alphabet are also trying to break Nvidia’s near-monopoly by taking advantage of the inference phase. AMD’s MI325X AI chip and Ryzen processors, along with Intel’s Gaudi series and Xeon processors, are among the leading contenders to compete with Nvidia’s GPUs and CUDA platform. Anthropic, an OpenAI competitor, has opted to use Amazon’s Trainium AI chips for both training and inference of its Claude models. Moreover, the shift in AI processing from data centers to edge devices could challenge Nvidia’s dominance. Large companies such as Apple and Broadcom are collaborating to build specialized system-on-chip (SoC) products that run AI models on personal devices with neural processors for superior privacy, heat reduction, and speed. Having ended its relationship with Nvidia entirely, Apple now designs its own processors for mobile phones, Macs, and wearable devices through its Apple silicon series, and is set to release its AI server processor “Baltra” in 2026 to power the AI services and features built into the company’s operating systems [6].
The term AI accelerator is increasingly used to describe more specialized AI chips, such as neural processing units (NPUs) or tensor processing units (TPUs). While general-purpose GPUs are effective when used as AI accelerators, purpose-built AI hardware can offer similar or better computational power with improved energy efficiency and greater throughput for AI workloads. This is where emerging startups such as Cerebras and Groq are disrupting the chip ecosystem: by focusing on AI-specific processors that prioritize energy efficiency and scaling, they are turning gaps in the current market to their advantage. While challenging Nvidia, these firms must contend with the extremely high cost of high-end wafers, photonics, and chip-making technology. However, as adoption of their products grows, AI chip disruptors will be able to take advantage of economies of scale and bring their products down to a commercially attractive price point.
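At the heart of the workloads these accelerators target is dense linear algebra. The sketch below is a deliberately naive CUDA C++ matrix multiply, included only to illustrate the multiply-accumulate pattern that NPUs, TPUs, and GPU tensor cores execute at massive scale and low energy per operation; the matrix size and launch configuration are arbitrary assumptions, and production kernels rely on tiling, mixed precision, and vendor libraries instead.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Naive C = A * B for square N x N row-major matrices.
// One thread computes one output element; AI accelerators are built to
// perform enormous volumes of these multiply-accumulate operations.
__global__ void matmul(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += A[row * N + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

int main() {
    const int N = 512;                               // arbitrary matrix size
    size_t bytes = N * N * sizeof(float);

    // Allocate matrices in unified memory and fill A and B with ones,
    // so every element of C should come out equal to N.
    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < N * N; ++i) { A[i] = 1.0f; B[i] = 1.0f; }

    dim3 threads(16, 16);
    dim3 blocks((N + 15) / 16, (N + 15) / 16);
    matmul<<<blocks, threads>>>(A, B, C, N);
    cudaDeviceSynchronize();

    printf("C[0] = %.0f (expected %d)\n", C[0], N);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

Because this one operation dominates AI compute, hardware that executes it at lower precision and lower energy per operation can undercut general-purpose GPUs on inference cost.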
The shift from training to inference is taking place in a rapidly changing technological landscape, with implications not only for Nvidia’s dominant position but for chatbot rollouts and AI adoption across the board. The specialized nature of AI inference chips has opened the door to disruptors and startup innovators who see a need for custom-built chips for AI infrastructure. While hyperscalers have traditionally been the main buyers of Nvidia’s premier GPUs to power their hardware, makers of inference chips tend to target a broader market spanning much of the Fortune 500 as well as smaller enterprises. As companies look to build out their AI infrastructure and integrate GenAI into their business practices, specialized chips are proving to be the more economical alternative to Nvidia’s more expensive, heavyweight GPUs.
Inference will account for a growing share of AI’s compute demand as widespread adoption by businesses and societies continues to accelerate. Morgan Stanley estimates that inference requirements and the consolidation of enterprise data across cloud, edge, and personal devices will make up more than 75% of power and computational demand, while Barclays predicts that Nvidia will serve only 50% of the frontier inference market [7]. The inference phase will focus on driving down the cost of operating AI models, which inherently require less compute power and simpler IT and software rollouts, with companies having to strike a balance between developing models and putting them into production. Greater precision, improved efficiency, cost-effectiveness, and scalability will be the determining factors that companies weigh in the coming years in order to gain and/or retain a competitive edge in deploying GenAI capabilities.
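To illustrate why precision and efficiency weigh so heavily at the inference stage, the back-of-envelope sketch below estimates the memory needed just to hold a model’s weights at different numeric precisions. The 70-billion-parameter figure is an arbitrary illustrative assumption rather than a reference to any specific model, and real deployments must also budget for activations, key/value caches, and batching.

```cpp
#include <cstdio>

// Rough weight-memory estimate for serving a model at different precisions.
// Assumes an illustrative 70-billion-parameter model; actual serving memory
// also includes activations, key/value caches, and framework overhead.
int main() {
    const double params = 70e9;                     // assumed parameter count
    const struct { const char* name; double bytes; } precisions[] = {
        {"FP32", 4.0}, {"FP16/BF16", 2.0}, {"INT8", 1.0}, {"INT4", 0.5},
    };
    for (const auto& p : precisions) {
        double gb = params * p.bytes / 1e9;         // gigabytes for weights alone
        printf("%-10s ~%6.0f GB of weight memory\n", p.name, gb);
    }
    // Halving the precision roughly halves the memory and bandwidth footprint,
    // which is the lever specialized inference chips are designed to pull.
    return 0;
}
```

Lower-precision serving is one reason inference-focused hardware and deployments can be markedly cheaper to operate than training-class infrastructure, provided accuracy remains acceptable for the use case.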
WTW hopes you found the general information provided in this publication informative and helpful. The information contained herein is not intended to constitute legal or other professional advice and should not be relied upon in lieu of consultation with your own legal advisors. In the event you would like more information regarding your insurance coverage, please do not hesitate to reach out to us. In North America, WTW offers insurance products through licensed entities, including Willis Towers Watson Northeast, Inc. (in the United States) and Willis Canada Inc. (in Canada).