OpenAI x Broadcom - The OpenAI Podcast Ep. 8

ai-infrastructure, openai, inference, hardware
Originally from youtube.com

My notes

Summary

OpenAI and Broadcom announce a partnership, already 18 months old, to co-design a custom AI chip plus the full rack/system around it. The plan is to deploy 10 incremental gigawatts of capacity, starting in late 2026 and rolling out over 3 years. The pitch is vertical integration: optimizing from transistor to token to wring out more “intelligence per watt” for inference-heavy, OpenAI-specific workloads. Leadership frames AI compute as the biggest joint industrial project in human history and a future critical utility on par with railroads and the internet.

Key Insight

  • Scale trajectory (concrete numbers): OpenAI’s first cluster was 2 MW, then 200 MW, then ~2 GW by end of 2025. Current partnerships push toward ~30 GW. The Broadcom deal alone is 10 GW, described as “a drop in the bucket.”
  • 2 GW serves ~10% of the world’s population on ChatGPT plus research, Sora, and API. Demand is framed as effectively unbounded.
  • Jevons paradox in practice: “You optimize by 10x and there’s 20x more demand.” Cheaper/faster/smarter models consistently increase usage rather than reduce compute needs.
  • Training vs. inference chips diverge: training chips maximize TFLOPS and networking (the unit is a cluster, not one chip); inference chips lean more toward memory and memory bandwidth relative to compute. Custom silicon lets you specialize per workload.
  • AI designing its own chips: OpenAI applied its own models to chip design, achieving “massive area reductions” by pouring compute into already human-optimized components. The claim is that this compresses roughly a month of expert optimization work, though it has not yet produced optimizations humans couldn’t reach.
  • Why build now: OpenAI spent years giving feedback to chip startups that “just didn’t listen.” Owning the roadmap (“control your own destiny”) is the real driver, not cost alone. They target underserved workloads where vertical integration is a clear edge; the goal is not to replace NVIDIA/AMD GPUs, which they still need heavily for flexible research.
  • Packaging roadmap (Broadcom): to get past the ~800 mm² reticle limit, first multiple dies in 2D, then 3D stacking (Z dimension), then integrated optics (announced 100 Tbit/s switching with optics on-chip). Cluster performance is expected to “keep doubling at least every 6 to 12 months.”
  • Timeline reality check: railroads took ~a century to become critical infrastructure, internet ~30 years. They explicitly say this rollout takes longer than 5 years.
  • Agent-driven demand: features like Pulse (personalized background agent) are limited to the Pro tier purely because of compute scarcity. The stated end-state is a 24/7 agent per person, with 10 billion humans implying 10 billion chips, far beyond current capacity.
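
A rough back-of-the-envelope sketch of the arithmetic behind the scale and Jevons bullets above, using only the figures quoted in these notes. The function names are my own, and dividing the ~30 GW partnership target by the 10-billion-person end state is my own combination of two separately quoted numbers, not something stated in the episode.

```python
# Back-of-the-envelope arithmetic using only figures quoted in the notes.
# All helper names are hypothetical; nothing here comes from OpenAI or Broadcom.
MW = 1e6
GW = 1e9

# First cluster, the next one, and the ~end-of-2025 figure.
cluster_watts = [2 * MW, 200 * MW, 2 * GW]


def growth_factors(sizes):
    """Ratio of each cluster size to the one before it."""
    return [later / earlier for earlier, later in zip(sizes, sizes[1:])]


def net_compute_multiplier(efficiency_gain, demand_growth):
    """Compute needed after an efficiency gain, relative to before.

    Jevons-style: if demand grows faster than efficiency, the result is > 1.
    """
    return demand_growth / efficiency_gain


def watts_per_person(total_watts, people):
    """Average power budget per person if capacity were split evenly."""
    return total_watts / people


if __name__ == "__main__":
    print(growth_factors(cluster_watts))        # step-to-step growth in cluster size
    print(net_compute_multiplier(10, 20))       # "optimize by 10x, 20x more demand"
    print(10 * GW / (30 * GW))                  # Broadcom deal as a share of ~30 GW
    print(watts_per_person(30 * GW, 10e9))      # assumption: ~30 GW over 10 billion people
```

Read this way, the quoted 10x optimization against 20x demand still doubles net compute need, and even the full ~30 GW spread over 10 billion people works out to only a few watts each, which is one way to see why 10 GW gets called “a drop in the bucket.”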