Article | WTW Research Network Newsletter

Reshaping the GenAI Landscape: Part 3 - The Future of Hardware Computing

By Sonal Madhok and Omar Samhan | June 20, 2025

As AI shifts from training to inference, specialized chips may soon challenge Nvidia’s GPU dominance by offering cost-effective, energy-efficient alternatives, reshaping hardware needs across enterprises.

LLMs: GPUs for Training and Inference

Nvidia’s meteoric rise has coincided with the proliferation and popularity of OpenAI’s ChatGPT. Its roughly 80% share of the high-end AI chip market puts it alongside firms such as TSMC and ASML, each of which dominates its own specialty in the global chip ecosystem. Nvidia’s GPUs have become the preferred choice for businesses operating AI-accelerated data centers. The H100 GPU, built on the company’s “Hopper” architecture, and its successor Blackwell architecture are effectively the brains that allow AI software and systems to make split-second decisions. By giving developers direct access to the GPU’s parallel computation capabilities, Nvidia has enabled the hardware to be applied far beyond its originally intended graphics role. [1] As a result, the GPU’s ability to handle large amounts of data simultaneously and to perform mathematical calculations at speed has made it the go-to choice for LLM training and inference, and the chief enabler of key technologies such as machine learning and artificial intelligence.

However, Nvidia may face stringent bottlenecks in the coming years as companies such as Cerebras, Groq, and AMD challenge it amid the shift from training to inference. At its essence, model training means feeding a model large datasets, which requires extreme processing power and expensive upfront investment. It is, however, largely a one-time expense: once trained, a model needs only fine-tuning, which employs far less intensive techniques. [2] For these purposes, Nvidia’s GPUs have served as the quintessential building blocks of LLMs. Nvidia’s moat currently rests on its Compute Unified Device Architecture (CUDA) platform, the software toolkit and application programming interface (API) that developers use to get the most out of its GPUs, including when building LLMs.
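As a concrete illustration of how developers typically tap that parallelism, the sketch below runs the same matrix multiplication (the core operation inside LLM layers) on a CPU and then on an Nvidia GPU through PyTorch, which sits on top of the CUDA toolkit. This is a minimal sketch under stated assumptions, not a benchmark: the matrix size and the use of PyTorch are illustrative choices, not details from this article.

```python
# Minimal sketch: the same matrix multiplication on CPU vs. an Nvidia GPU.
# Assumes PyTorch with CUDA support and a CUDA-capable GPU; sizes are illustrative.
import time
import torch

def timed_matmul(device: str, n: int = 4096) -> float:
    """Multiply two n x n matrices on the given device and return elapsed seconds."""
    a = torch.randn(n, n, device=device)
    b = torch.randn(n, n, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # make sure setup has finished before timing
    start = time.perf_counter()
    _ = a @ b                     # thousands of multiply-accumulates run in parallel on a GPU
    if device == "cuda":
        torch.cuda.synchronize()  # wait for the asynchronous GPU kernel to complete
    return time.perf_counter() - start

cpu_s = timed_matmul("cpu")
if torch.cuda.is_available():
    gpu_s = timed_matmul("cuda")  # dispatched through the CUDA software stack (e.g. cuBLAS)
    print(f"CPU: {cpu_s:.3f}s  GPU: {gpu_s:.3f}s")
else:
    print(f"CPU only: {cpu_s:.3f}s (no CUDA device found)")
```

On typical hardware the GPU run completes far faster than the CPU run, and the framework reaches the hardware through CUDA libraries rather than hand-written kernels, which is exactly the software lock-in the moat argument above describes.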

To help WTW better understand the forces influencing the evolving AI market, the WTW Research Network (WRN) partnered with the University of Pennsylvania’s Wharton School and its Mack Institute’s Collaborative Innovation Program (CIP). Building on our previous work with the CIP and their Executive MBA students, Green Algorithms – AI and Sustainability, the WRN has sought to further examine the LLM competitive landscape including new disruptions and opportunities for optimization and efficiency. Part 1 looks at GenAI’s impact on risk management frameworks while Part 2 explores LLM Effectiveness at Scale. This piece rounds out the series, providing a look at the future of hardware computing and examining the implications for the market going forward as the industry moves from the training to the inference phase.

Challenges to Nvidia’s Dominance: More or Less Computing Power?

In addition to powering LLMs, the AI chip market spans both edge and cloud AI platforms. GPUs are a superior product for powering AI for three main reasons: they employ parallel processing, GPU systems scale up to supercomputing heights, and the GPU software stack for AI is broad and deep. [3] These three factors have allowed LLMs to proliferate at breakneck speed, riding a nearly 7,000-fold increase in computing performance over the last twenty years. But the AI landscape is undergoing a shift from training to inference, which will require architectures to move from the massive compute of training toward specialized solutions built for the lighter workloads of inference.

According to Gary Dickerson, CEO of Applied Materials, more has been invested in AI chips in the last eighteen months than in the previous eighteen years, with a twelve-month lead netting a company approximately $100 billion. The explosion in AI is being driven by the need to process vast amounts of unstructured data, which accounts for 80% of all data. [4] As computing power and processing capabilities increase, the chip industry may look past the GPU-driven training that has so far dominated the LLM space.

As the market shifts from training to inference, AI chips will take the reins in driving the AI boom going forward. Designed specifically to accelerate artificial intelligence tasks, these specialized processors deliver significant improvements in performance, efficiency, and cost-effectiveness compared to general-purpose central processing units (CPUs) and GPUs.
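The difference in workload is easy to see in code. The toy sketch below contrasts one training step, which runs a forward pass, a backward pass, and a weight update, with one inference call, which runs only the forward pass with gradient tracking disabled. The tiny model, batch, and labels are placeholder assumptions for illustration, not a real LLM.

```python
# Toy illustration of why inference is lighter than training: a training step runs a
# forward pass, a backward pass, and a weight update, while inference runs only the
# forward pass with gradient tracking disabled. The model and data are placeholders.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 10))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 512)            # a batch of dummy inputs
y = torch.randint(0, 10, (32,))     # dummy labels

# One training step: compute loss, backpropagate gradients, update every weight.
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()                     # extra compute and memory for gradients
optimizer.step()

# One inference step: forward pass only, no gradients or optimizer state needed.
model.eval()
with torch.inference_mode():
    predictions = model(x).argmax(dim=-1)
```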

Table 1: Implications for Startups in the AI Chip Market


Source: Wharton, WTW

Opportunities
  • Agility in Innovation: ability to iterate and bring cutting-edge solutions to market faster than incumbents
  • Collaboration Potential: partnerships with generative AI companies for tailored solutions
  • Specialization in Emerging Needs: startups can focus on niche demands such as edge computing, energy-efficient designs, or photonic chips

Challenges
  • Regulatory Compliance: must navigate stringent data and security laws
  • Capital Requirements: significant investment needed to scale and produce at competitive costs
  • Market Entry Barriers: high R&D costs and competition with dominant players like Nvidia

Our study with Wharton identified a number of key challenges and opportunities that Nvidia’s competitors may face in the coming years. Economies of scale, innovation lags, the need for collaboration, CAPEX requirements, regulation, brand trust, and the ability to adapt are among the most prominent factors that both startups and incumbents in the AI chip space will have to contend with.

One challenger that hopes to take on Nvidia directly is Cerebras with its Wafer Scale Engine (WSE)-3 chip. The WSE-3 packs 900,000 AI cores onto a single processor, integrating an entire GPU cluster’s worth of computing power onto one chip. Its high-bandwidth, low-latency design is powered by 4 trillion transistors, a more than 50 percent increase over the previous generation thanks to newer chipmaking technology. The WSE-3 measures 72 square inches and holds 50 times more computing power than Nvidia’s H100 GPU, which by contrast is about 1 square inch and contains 80 billion transistors. [5]

Table 2: The Cerebras WSE-3 surpasses all other processors in AI-optimized cores, memory speed and on-chip fabric bandwidth


Source: Wharton, WTW

  WSE-3 Nvidia H100 Cerebras Advantage
Chip size 46,225 mm² 814 mm² 57X
Cores 900,000 16,896 FP32 + 528 Tensor 52X
On-chip memory 44 Gigabytes 0.05 Gigabytes 880X
Memory bandwidth 21 Petabytes/sec 0.003 Petabytes/sec 7,000X
Fabric bandwidth 214 Petabytes/sec 0.0576 Petabytes/sec 3,715X

Hyperscalers, incumbents, and startups such as Amazon, Alphabet, AMD, Intel, and Graphcore are also trying to break Nvidia’s near-monopoly by taking advantage of the inference phase. AMD’s MI325X AI chip and Ryzen processors, and Intel’s Gaudi series and Xeon processors, are among the leading contenders to compete with Nvidia’s GPUs and CUDA API. Anthropic, an OpenAI competitor, has opted to use Amazon’s Trainium AI chips for both training and inference of its Claude models. Moreover, the shift in AI processing from data centers to edge devices could challenge Nvidia’s dominance. Large companies such as Apple and Broadcom are collaborating to build specialized system-on-chip (SoC) products that run AI models on personal devices with neural processors, offering superior privacy, heat reduction, and speed. Having ended its relationship with Nvidia entirely, Apple now designs its own processors for mobile phones, Macs, and wearable devices under its Apple silicon series, and is set to release its AI server processor "Baltra" in 2026 to power the AI services and features built into the company’s operating systems [6].

The term AI accelerator increasingly describes more specialized AI chips, such as neural processing units (NPUs) or tensor processing units (TPUs). While general-purpose GPUs are effective when used as AI accelerators, purpose-built AI hardware can offer similar or better computational power with improved energy efficiency and greater throughput for AI workloads. This is where emerging startups such as Cerebras and Groq are disrupting the chip ecosystem. By focusing on AI-specific processors that prioritize energy efficiency and scaling, these companies can turn current market gaps to their advantage. In challenging Nvidia, however, they must contend with the extremely high cost of high-end wafers, photonics, and chipmaking technology. As adoption of their products grows, AI chip disruptors should be able to exploit economies of scale and bring their prices down to a commercially attractive point.
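One lever behind those efficiency and throughput claims is reduced numerical precision, which many inference-oriented accelerators support natively. The back-of-the-envelope sketch below shows how the memory needed just to hold a model’s weights shrinks as precision drops; the 7-billion-parameter model size is an illustrative assumption, not a figure from this article.

```python
# Back-of-the-envelope arithmetic: weight memory needed to serve a model at different
# numeric precisions. The 7-billion-parameter size is an illustrative assumption.
PARAMS = 7_000_000_000

bytes_per_weight = {"FP32": 4, "FP16/BF16": 2, "INT8": 1, "INT4": 0.5}

for fmt, nbytes in bytes_per_weight.items():
    gigabytes = PARAMS * nbytes / 1e9
    print(f"{fmt:>10}: {gigabytes:,.1f} GB of weights")

# Roughly: FP32 ~28 GB, FP16 ~14 GB, INT8 ~7 GB, INT4 ~3.5 GB. Lower precision means
# less memory traffic per token, which is why reduced-precision support features so
# prominently in inference-oriented accelerators.
```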

Conclusion

The shift from training to inference is unfolding in a rapidly changing technological landscape that will affect not only Nvidia’s dominant position but also chatbot rollouts and AI adoption across the board. The specialized nature of AI inference chips has opened the door to disruptors and startup innovators who see a need for custom-built chips for AI infrastructure. While hyperscalers have traditionally been the main buyers of Nvidia’s premier GPUs to power their hardware, inference chip makers tend to target a broader market spanning much of the Fortune 500 as well as smaller enterprises. As companies look to build out their AI infrastructure and integrate GenAI into their business practices, specialized chips are proving a more economical alternative to Nvidia’s more expensive, heavyweight GPUs.

Inference will account for a growing share of AI’s compute needs as widespread adoption by businesses and societies continues to accelerate. Morgan Stanley estimates that inference requirements and the consolidation of enterprise data across cloud, edge, and personal devices will make up more than 75% of power and computational demand, while Barclays predicts that Nvidia will serve only 50% of the frontier inference market [7]. The inference phase will focus on driving down the cost of operating AI models, which inherently require less compute power and simpler IT and software rollouts, with companies having to strike a balance between deploying models and putting them into production. Greater precision, improved efficiency, cost-effectiveness, and scalability will be the determining factors companies weigh in the coming years to gain or retain a competitive edge in deploying GenAI capabilities.

Energy Efficiency

  • Shift workloads to edge AI to optimize inference costs, improve latency, and enable hybrid cloud-edge solutions for scalable and sustainable AI (a minimal routing sketch follows this list).
  • Partner with renewable energy providers to power energy-efficient data centers.
  • Invest in energy-efficient hardware, such as reduced-precision TPUs and QPUs, to enhance performance while minimizing environmental impact.
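To make the hybrid cloud-edge idea in the first bullet concrete, here is a hypothetical routing sketch: requests that are small or privacy-sensitive are answered by an on-device model, and everything else falls back to a cloud endpoint. The function names, token threshold, and privacy flag are invented placeholders, not part of any product described above.

```python
# Hypothetical sketch of a hybrid edge/cloud routing policy: answer on-device when a
# small local model is likely adequate, otherwise fall back to a larger cloud model.
# run_on_device, call_cloud_endpoint, and the threshold are illustrative placeholders.
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    needs_private_data: bool = False   # e.g. personal or regulated data stays on-device

MAX_EDGE_PROMPT_TOKENS = 512           # assumed capability limit of the local model

def run_on_device(prompt: str) -> str:
    return f"[edge model answer to: {prompt[:40]}...]"    # placeholder local inference

def call_cloud_endpoint(prompt: str) -> str:
    return f"[cloud model answer to: {prompt[:40]}...]"   # placeholder remote inference

def route(request: Request) -> str:
    prompt_tokens = len(request.prompt.split())   # crude stand-in for a tokenizer
    if request.needs_private_data or prompt_tokens <= MAX_EDGE_PROMPT_TOKENS:
        return run_on_device(request.prompt)       # low latency, no per-call cloud cost
    return call_cloud_endpoint(request.prompt)     # heavier queries go to the data center

print(route(Request("Summarize today's meeting notes")))
```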

Enterprise Cost Optimization

  • Build cost-transparent ecosystems using open-source AI tools and adaptable hardware to enhance accessibility and reduce costs.
  • Offer subscription-based pricing models for hybrid deployments, making solutions more affordable for enterprises.
  • Invest in specialized hardware to reduce long-term operational costs and improve performance.

Privacy & Security

  • Embed privacy-by-design features into hardware and software to ensure compliance in sensitive industries.
  • Partner with privacy-focused edge AI startups to meet regional regulatory requirements and address trust gaps.
  • Focus on achieving certifications and aligning with regulatory standards to build trust and enhance adoption.

Innovation & Collaboration

  • Build ecosystems that integrate startup innovations into LLM stacks through partnerships or acquisitions.
  • Advocate for AI hardware standardization to simplify integration and shift competition toward software innovation.
  • Form alliances with chip startups to reduce reliance on Nvidia and leverage strengths in edge AI solutions.
  • Leverage open-source AI models with adaptable hardware to lower adoption barriers for cost-sensitive industries and improve collaboration.

References

  1. Think Topics | IBM
  2. AI inference vs. training: What is AI inference?
  3. Why GPUs Are Great for AI
  4. AI Chips: Challenges and Opportunities
  5. This New AI Chip Makes Nvidia’s H100 Look Puny in Comparison
  6. Apple reportedly building AI server processor with help from Broadcom
  7. How ‘inference’ is driving competition to Nvidia’s AI chip dominance
