Why Chinese AI Is Suddenly So Good (ft. DeepSeek, Seedance 2.0)
Originally from youtube.com
Summary
Chinese AI caught up to US labs not by matching hardware (export controls lock them out of Nvidia's top-tier GPUs, which TSMC fabricates) but by rewriting the software layer: extreme Mixture-of-Experts fragmentation, memory compression, and hand-tuned low-level GPU code. On the data side, ByteDance's ownership of Douyin gives Chinese multimodal models (Seedance 2.0, Doubao) a structural advantage in uncompressed, metadata-rich video that US firms can't legally scrape.
Key Insight
- AI stack has 3 layers: hardware (GPUs, TSMC/Nvidia bottleneck), model (Transformer-based LLMs), data (multimodal training fuel). China was cut off at layer 1, innovated at layers 2-3.
- Nvidia Blackwell B200 GPU packs 208 billion transistors, costs $30k-$40k, and you need tens of thousands wired together to train a frontier model. Nvidia = $4.5T market cap as of March 2026.
- TSMC controls ~70% of global chip manufacturing and >90% of advanced AI chips. It's a Taiwanese firm, but it depends on US software and patents, so US export rules block it from selling advanced chips to China.
- DeepSeek’s two breakthroughs at model layer:
- Extreme MoE: 256 tiny experts, only 8 activated per query (vs OpenAI’s dozens of large experts). Rest of the brain stays asleep.
- Multi-head Latent Attention (MLA): compresses short-term memory (KV cache) by >90%, dramatically cutting GPU memory cost.
- DeepSeek bypassed Nvidia's CUDA abstraction by writing PTX (Parallel Thread Execution) assembly-level code: "manual transmission" vs CUDA's "automatic." This squeezed more performance from older, pre-ban GPUs.
- Result: DeepSeek trained a world-class model for ~$6M vs OpenAI’s hundreds of millions. Then open-sourced it, turning the model into a platform.
- Pure reinforcement learning failure mode: when DeepSeek rewarded only correct math answers, the model developed "language mixing," a bizarre English/Chinese hybrid. They had to layer in small amounts of human-labeled data to restore readability.
- US multimodal data wall: American labs have already scraped what’s legally scrapable. Synthetic data and lawsuits are now the ceiling.
- ByteDance’s data moat: owns Douyin (750M DAU), uncompressed native video plus engagement metadata (exact swipe-away timestamp, lighting, angle). Seedance 2.0’s physics consistency (water splashes, reflections, audio sync) beats Sora because of this data structure, not just model design.
- Doubao (ByteDance) surpassed DeepSeek as China’s #1 AI chatbot because it’s multimodal end-to-end (images, video, voice) while DeepSeek is text-only.
- China’s blind spot: Douyin data is Chinese-culture-biased. Their models will hit the same data wall when trying to generalize to Western physics/cityscapes.
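The extreme-MoE routing described above (256 small experts, only 8 activated per token) can be sketched in a few lines. Everything here is illustrative: the dimensions, the softmax-over-top-k router, and the toy dense experts are assumptions for the sketch, not DeepSeek's actual architecture.

```python
import numpy as np

def topk_moe_forward(x, gate_w, experts, k=8):
    """Route one token through only k of n experts (sparse MoE sketch).

    x        : (d,) token embedding
    gate_w   : (d, n_experts) router weights
    experts  : list of n_experts toy expert matrices, each (d, d)
    """
    logits = x @ gate_w                      # router score per expert
    top = np.argsort(logits)[-k:]            # indices of the k best experts
    weights = np.exp(logits[top])
    weights /= weights.sum()                 # softmax over the selected k only
    # Only k experts do any work; the other n-k stay "asleep".
    return sum(w * (x @ experts[i]) for i, w in zip(top, weights))

rng = np.random.default_rng(0)
d, n = 64, 256                               # 256 tiny experts, as in the notes
x = rng.standard_normal(d)
gate = rng.standard_normal((d, n))
experts = [rng.standard_normal((d, d)) for _ in range(n)]
y = topk_moe_forward(x, gate, experts, k=8)
print(y.shape)                               # output has the input's dimension
```

The point of the fragmentation is in the last line of the function: compute scales with k = 8, not with n = 256, so parameter count grows without growing per-token FLOPs.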
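The >90% KV-cache saving attributed to Multi-head Latent Attention comes down to caching one small latent vector per token instead of full K/V vectors for every head in every layer. A back-of-envelope sketch, using hypothetical shapes (60 layers, 128 heads of dim 128, latent dim 512, fp16) rather than DeepSeek's real configuration:

```python
def standard_kv_bytes(n_layers, n_heads, head_dim, seq_len, bytes_per=2):
    # Standard attention caches full K and V for every head in every layer
    # (factor of 2 for K and V; bytes_per=2 assumes fp16).
    return 2 * n_layers * n_heads * head_dim * seq_len * bytes_per

def mla_kv_bytes(n_layers, latent_dim, seq_len, bytes_per=2):
    # MLA-style cache: one shared low-rank latent per token per layer,
    # from which K and V are re-projected at attention time.
    return n_layers * latent_dim * seq_len * bytes_per

seq = 32_000                                   # long-context example
full = standard_kv_bytes(60, 128, 128, seq)    # hypothetical frontier-ish shape
latent = mla_kv_bytes(60, 512, seq)
saving = 1 - latent / full
print(f"cache shrinks {full/2**30:.1f} GiB -> {latent/2**30:.2f} GiB "
      f"({saving:.1%} saved)")
```

With these illustrative numbers the cache shrinks by a factor of 64, comfortably past the >90% figure in the notes; the saving is what lets long contexts fit on memory-constrained GPUs.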
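The pure-RL failure mode can be made concrete with a toy outcome-only reward: if only answer correctness is scored, English/Chinese mixing costs the policy nothing. The ASCII-based consistency proxy and the `lam` penalty below are illustrative assumptions only; per the notes, DeepSeek's actual fix was mixing in small amounts of human-labeled data, not a reward penalty.

```python
def language_consistency(text):
    """Toy proxy: share of characters in the dominant script (ASCII vs not)."""
    ascii_frac = sum(c.isascii() for c in text) / max(len(text), 1)
    return max(ascii_frac, 1 - ascii_frac)

def rl_reward(text, is_correct, lam=0.0):
    # Outcome-only reward (lam=0): mixed-language text scores exactly the
    # same as clean text whenever the final answer is right.
    return float(is_correct) + lam * language_consistency(text)

clean = "The answer is 42."
mixed = "答案 is 42 因为 the sum 等于 42."

# With outcome-only reward, language drift is free for the policy:
assert rl_reward(clean, True) == rl_reward(mixed, True)
# A small consistency term breaks the tie in favor of readable output:
assert rl_reward(clean, True, lam=0.1) > rl_reward(mixed, True, lam=0.1)
```

The first assertion is the bug in miniature: the optimizer sees no gradient toward readable language, so any internally convenient hybrid survives.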