Nvidia’s meteoric rise has coincided with the proliferation and popularity of OpenAI’s ChatGPT. The company controls roughly 80% of the high-end GPU market, placing it alongside firms such as TSMC and ASML that dominate their respective specialties in the global chip ecosystem. Its GPUs have become the preferred choice for businesses operating AI-accelerated data centers. Nvidia’s H100 GPU, built on its Hopper architecture, and the successor Blackwell architecture are effectively the brains that allow for split-second decision-making by AI software and systems. Nvidia’s chips give developers direct access to the GPUs’ parallel computation capabilities, enabling them to apply GPU technology well beyond its originally intended role in graphics. [1] As a result, GPUs’ ability to handle large amounts of data simultaneously and perform mathematical calculations rapidly has made them the go-to choice for LLM training and inference, and the chief enabler of key technologies such as machine learning and artificial intelligence.
However, Nvidia may face serious challenges in the coming years as companies such as Cerebras, Groq, and AMD challenge it as the market shifts from training to inference. At its essence, model training involves feeding a model large datasets, which demands extreme processing power and significant upfront cost. This is largely a one-time expense, however: once trained, a model typically requires only fine-tuning, which relies on far less intensive techniques. [2] For these purposes, Nvidia’s GPUs have served as the quintessential hardware for building LLMs. Nvidia’s moat currently revolves around its Compute Unified Device Architecture (CUDA) platform, the software toolkit and application programming interface (API) developers use to get the most out of its GPUs, including for building LLMs.
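To make CUDA’s role concrete, the sketch below is a minimal CUDA C++ program that adds two vectors across thousands of GPU threads in parallel. It is an illustrative example of the data-parallel programming model CUDA exposes, not code from Nvidia or any LLM framework, and the array size and launch configuration are arbitrary choices.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Each GPU thread adds one pair of elements. This data-parallel pattern,
// scaled up to matrix math over billions of parameters, is what underlies
// LLM training and inference on GPUs.
__global__ void vectorAdd(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                      // ~1M elements (arbitrary size)
    size_t bytes = n * sizeof(float);

    // Allocate and fill host arrays.
    float *ha = new float[n], *hb = new float[n], *hc = new float[n];
    for (int i = 0; i < n; ++i) { ha[i] = 1.0f; hb[i] = 2.0f; }

    // Allocate device memory and copy inputs to the GPU.
    float *da, *db, *dc;
    cudaMalloc(&da, bytes); cudaMalloc(&db, bytes); cudaMalloc(&dc, bytes);
    cudaMemcpy(da, ha, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(db, hb, bytes, cudaMemcpyHostToDevice);

    // Launch enough 256-thread blocks to cover all n elements.
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(da, db, dc, n);
    cudaDeviceSynchronize();

    // Copy the result back and spot-check one value.
    cudaMemcpy(hc, dc, bytes, cudaMemcpyDeviceToHost);
    printf("c[0] = %f\n", hc[0]);               // expected 3.0

    cudaFree(da); cudaFree(db); cudaFree(dc);
    delete[] ha; delete[] hb; delete[] hc;
    return 0;
}
```

The value of CUDA is less this single kernel than the surrounding ecosystem of compilers, libraries, and tooling built around it, which is why the platform is so often described as Nvidia’s moat.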
To help WTW better understand the forces influencing the evolving AI market, the WTW Research Network (WRN) partnered with the University of Pennsylvania’s Wharton School and its Mack Institute’s Collaborative Innovation Program (CIP). Building on our previous work with the CIP and their Executive MBA students, Green Algorithms – AI and Sustainability, the WRN has sought to further examine the LLM competitive landscape including new disruptions and opportunities for optimization and efficiency. Part 1 looks at GenAI’s impact on risk management frameworks while Part 2 explores LLM Effectiveness at Scale. This piece rounds out the series, providing a look at the future of hardware computing and examining the implications for the market going forward as the industry moves from the training to the inference phase.
In addition to powering LLMs, the AI chip market affects both edge and cloud AI platforms. GPUs are a superior product for powering AI for three main reasons: they employ parallel processing, GPU systems scale up to supercomputing heights, and the GPU software stack for AI is broad and deep. [3] These factors have allowed LLMs to proliferate at breakneck speed, aided by a nearly 7,000-fold increase in computing performance over the last twenty years. But the AI landscape is undergoing a shift from training to inference, which will necessitate a move away from architectures built for massive training workloads toward specialized solutions designed for the lighter workloads of inference.
According to Gary Dickerson, CEO of Applied Materials, there has been more investment in AI chips over the last eighteen months than in the previous eighteen years, with a twelve-month lead worth approximately $100 billion to a company. The explosion in AI is being driven by the need to process vast amounts of unstructured data, which accounts for 80% of all data. [4] As computing power and processing capabilities increase, the chip industry may look past the GPU-driven training that has heretofore dominated the LLM space.
As the market shifts from training to inference, AI chips will take the reins in driving the AI boom going forward. Designed specifically to accelerate artificial intelligence tasks, these specialized processors deliver significant improvements in performance, efficiency, and cost-effectiveness compared to general-purpose central processing units (CPUs) and GPUs.
Our study with Wharton identified a number of key challenges and opportunities Nvidia’s competitors may face in the coming years, summarized in the table below. Economies of scale, innovation lags, the need for collaboration, CAPEX requirements, regulation, brand trust, and the ability to adapt are among the most prominent factors that both startups and incumbents in the AI chip space will have to navigate.

| Opportunities | Challenges |
| --- | --- |
| **Agility in Innovation:** Ability to iterate and bring cutting-edge solutions to market faster than incumbents | **Regulatory Compliance:** Must navigate stringent data and security laws |
| **Collaboration Potential:** Partnerships with generative AI companies for tailored solutions | **Capital Requirements:** Need significant investment to scale and produce at competitive costs |
| **Specialization in Emerging Needs:** Startups can focus on niche demands such as edge computing, energy-efficient designs, or photonic chips | **Market Entry Barriers:** High R&D costs and competition with dominant players like Nvidia |
One such candidate that hopes to take on Nvidia directly is Cerebras with its Wafer Scale Engine (WSE)-3 chip. The WSE-3 packs 900,000 AI cores onto a single processor, integrating an entire GPU cluster’s worth of computing power into one chip. Its high-bandwidth, low-latency design is powered by 4 trillion transistors, a more than 50 percent increase over the previous generation thanks to the use of newer chipmaking technology. The WSE-3 measures 72 square inches and holds 50 times more computing power than Nvidia’s H100 GPU, which by contrast is roughly 1 square inch and contains 80 billion transistors. [5]
| | WSE-3 | Nvidia H100 | Cerebras Advantage |
| --- | --- | --- | --- |
| Chip size | 46,225 mm² | 814 mm² | 57X |
| Cores | 900,000 | 16,896 FP32 + 528 Tensor | 52X |
| On-chip memory | 44 Gigabytes | 0.05 Gigabytes | 880X |
| Memory bandwidth | 21 Petabytes/sec | 0.003 Petabytes/sec | 7,000X |
| Fabric bandwidth | 214 Petabytes/sec | 0.0576 Petabytes/sec | 3,715X |
Established chipmakers, hyperscalers, and startups such as AMD, Intel, Graphcore, Amazon, and Alphabet are also trying to break Nvidia’s near-monopoly by taking advantage of the inference phase. AMD’s MI325X AI chip and Ryzen processors, along with Intel’s Gaudi series and Xeon processors, are among the leading contenders to compete with Nvidia’s GPUs and CUDA platform. Anthropic, an OpenAI competitor, has opted to use Amazon’s Trainium AI chips for both training and inference of its Claude models. Moreover, the shift in AI processing from data centers to edge devices could challenge Nvidia’s dominance. Large companies such as Apple and Broadcom are collaborating to build specialized system-on-chip (SoC) products that run AI models on personal devices with neural processors for superior privacy, heat reduction, and speed. Having ended its relationship with Nvidia entirely, Apple now designs its own processors for mobile phones, Macs, and wearable devices through its Apple silicon series, and is set to release its AI server processor “Baltra” in 2026 to power the AI services and features built into the company’s operating systems [6].
The term AI accelerator is increasingly used to describe more specialized AI chips, such as neural processing units (NPUs) or tensor processing units (TPUs). While general-purpose GPUs are effective when used as AI accelerators, purpose-built AI hardware can offer similar or better computational power with improved energy efficiency and greater throughput for AI workloads. This is where emerging startups such as Cerebras and Groq are disrupting the chip ecosystem: by focusing on AI-specific processors that prioritize energy efficiency and scaling, they are turning gaps in the current market to their advantage. While challenging Nvidia, these firms must contend with the extremely high cost of high-end wafers, photonics, and chip-making technology. However, as adoption of their products grows, AI chip disruptors will be able to take advantage of economies of scale and bring their products down to a commercially attractive price point.
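At the heart of the workloads these accelerators target is dense linear algebra. The sketch below is a deliberately naive CUDA C++ matrix multiply, included only to illustrate the multiply-accumulate pattern that NPUs, TPUs, and GPU tensor cores execute at massive scale and low energy per operation; the matrix size and launch configuration are arbitrary assumptions, and production kernels rely on tiling, mixed precision, and vendor libraries instead.

```cpp
#include <cstdio>
#include <cuda_runtime.h>

// Naive C = A * B for square N x N row-major matrices.
// One thread computes one output element; AI accelerators are built to
// perform enormous volumes of these multiply-accumulate operations.
__global__ void matmul(const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < N && col < N) {
        float acc = 0.0f;
        for (int k = 0; k < N; ++k)
            acc += A[row * N + k] * B[k * N + col];
        C[row * N + col] = acc;
    }
}

int main() {
    const int N = 512;                               // arbitrary matrix size
    size_t bytes = N * N * sizeof(float);

    // Allocate matrices in unified memory and fill A and B with ones,
    // so every element of C should come out equal to N.
    float *A, *B, *C;
    cudaMallocManaged(&A, bytes);
    cudaMallocManaged(&B, bytes);
    cudaMallocManaged(&C, bytes);
    for (int i = 0; i < N * N; ++i) { A[i] = 1.0f; B[i] = 1.0f; }

    dim3 threads(16, 16);
    dim3 blocks((N + 15) / 16, (N + 15) / 16);
    matmul<<<blocks, threads>>>(A, B, C, N);
    cudaDeviceSynchronize();

    printf("C[0] = %.0f (expected %d)\n", C[0], N);
    cudaFree(A); cudaFree(B); cudaFree(C);
    return 0;
}
```

Because this one operation dominates AI compute, hardware that executes it at lower precision and lower energy per operation can undercut general-purpose GPUs on inference cost.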
The shift from training to inference is taking place in a rapidly changing technological landscape, with implications not only for Nvidia’s dominant position but for chatbot rollouts and AI adoption across the board. The specialized nature of AI inference chips has opened the door to disruptors and startup innovators who see a need for custom-built chips for AI infrastructure. While hyperscalers have traditionally been the main buyers of Nvidia’s premier GPUs to power their hardware, makers of inference chips tend to target a broader market spanning much of the Fortune 500 as well as smaller enterprises. As companies look to build out their AI infrastructure and integrate GenAI into their business practices, specialized chips are proving to be the more economical alternative to Nvidia’s more expensive, heavyweight GPUs.
Inference will account for a growing share of AI’s compute demand as widespread adoption by businesses and societies continues to accelerate. Morgan Stanley estimates that inference requirements and the consolidation of enterprise data across cloud, edge, and personal devices will make up more than 75% of power and computational demand, while Barclays predicts that Nvidia will serve only 50% of the frontier inference market [7]. The inference phase will focus on driving down the cost of operating AI models, which inherently require less compute power and simpler IT and software rollouts, with companies having to strike a balance between developing models and putting them into production. Greater precision, improved efficiency, cost-effectiveness, and scalability will be the determining factors that companies weigh in the coming years in order to gain and/or retain a competitive edge in deploying GenAI capabilities.
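To illustrate why precision and efficiency weigh so heavily at the inference stage, the back-of-envelope sketch below estimates the memory needed just to hold a model’s weights at different numeric precisions. The 70-billion-parameter figure is an arbitrary illustrative assumption rather than a reference to any specific model, and real deployments must also budget for activations, key/value caches, and batching.

```cpp
#include <cstdio>

// Rough weight-memory estimate for serving a model at different precisions.
// Assumes an illustrative 70-billion-parameter model; actual serving memory
// also includes activations, key/value caches, and framework overhead.
int main() {
    const double params = 70e9;                     // assumed parameter count
    const struct { const char* name; double bytes; } precisions[] = {
        {"FP32", 4.0}, {"FP16/BF16", 2.0}, {"INT8", 1.0}, {"INT4", 0.5},
    };
    for (const auto& p : precisions) {
        double gb = params * p.bytes / 1e9;         // gigabytes for weights alone
        printf("%-10s ~%6.0f GB of weight memory\n", p.name, gb);
    }
    // Halving the precision roughly halves the memory and bandwidth footprint,
    // which is the lever specialized inference chips are designed to pull.
    return 0;
}
```

Lower-precision serving is one reason inference-focused hardware and deployments can be markedly cheaper to operate than training-class infrastructure, provided accuracy remains acceptable for the use case.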
WTW hopes you found the general information provided in this publication informative and helpful. The information contained herein is not intended to constitute legal or other professional advice and should not be relied upon in lieu of consultation with your own legal advisors. In the event you would like more information regarding your insurance coverage, please do not hesitate to reach out to us. In North America, WTW offers insurance products through licensed entities, including Willis Towers Watson Northeast, Inc. (in the United States) and Willis Canada Inc. (in Canada).