Gemma 4 Has Landed

Tags: gemma4 · google · open-weights · apache2 · multimodal · mixture-of-experts · edge-ai · on-device
Originally from youtube.com

My notes

Summary

Google released Gemma 4 as four models (two workstation, two edge) with a genuine Apache 2.0 license - no use restrictions, no “don’t compete with us” clauses. All four models natively support vision, function calling, and reasoning, with the two edge models (E2B, E4B) also adding native audio including speech-to-translated-text in a single on-device pass.

Key Insights

  • The license is the headline. Apache 2.0 means fine-tune, modify, deploy commercially with zero strings. Previous Gemma models had custom licenses that pushed developers toward Llama/Qwen. This closes that gap.
  • MoE efficiency at 26B total / ~3.8B active: The 26B MoE model activates only ~3.8B parameters per token (128 experts; 8 routed + 1 shared active). Roughly the intelligence of a 27B dense model at the compute cost of a ~4B model - runs on consumer GPUs.
  • Native vs bolted-on: Earlier open models treated audio/vision/function-calling as afterthoughts (pipe audio through Whisper, hope the prompt template works). Gemma 4 bakes all of these into the architecture from pretraining.
  • Edge audio encoder compression: The audio encoder shrank from 681M to 305M parameters, the disk footprint from 390 MB to 87 MB, and the frame duration was cut from 160 ms to 40 ms. Result: faster, lighter, more responsive on-device transcription.
  • 256K context on workstation models, 128K even on edge models. Gemma 3n was capped at 32K.
  • Native aspect-ratio vision encoder replaces the old fixed-resolution approach. Better OCR, document understanding, and multi-image (video frame) reasoning.
  • Function calling is architecture-level, not prompt-coercion. Optimized for multi-turn agentic flows with multiple tools - shows up in agentic benchmarks.
  • QAT checkpoints released so quantized versions maintain quality close to full-precision. Important for edge deployment.
  • Serverless deployment path: Google Cloud Run now supports G4 GPU (Nvidia RTX Pro 6000, 96 GB VRAM), enabling serverless hosting of the full 31B model that scales to zero.
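For the Cloud Run path, a hedged sketch of what a GPU-backed, scale-to-zero deploy might look like. The service name, image URI, region, and especially the `--gpu-type` value are placeholders I made up, not values from the release; `gcloud run deploy` does take `--gpu`, `--gpu-type`, and `--min-instances` flags, but check `gcloud run deploy --help` for the exact G4 type string your project and region support:

```shell
# Hypothetical deploy of a Gemma 4 serving container to Cloud Run with a GPU.
# Image URI, service name, region, and gpu-type string are assumptions.
gcloud run deploy gemma4-service \
  --image=us-docker.pkg.dev/my-project/gemma/gemma4-server:latest \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-rtx-pro-6000 \
  --min-instances=0 \
  --max-instances=1
```

With `--min-instances=0` the service scales to zero when idle, so you pay for GPU time only while requests are being served.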
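Back-of-envelope arithmetic on the figures quoted in the bullets above. The input numbers (26B total / 3.8B active, 681M → 305M parameters, 390 MB → 87 MB, 160 ms → 40 ms frames) come from the notes; the derived ratios are my own quick check:

```python
# MoE workstation model: 26B total parameters, ~3.8B active per token.
total_params = 26e9
active_params = 3.8e9
active_fraction = active_params / total_params
print(f"active fraction per token: {active_fraction:.1%}")  # ~14.6% of weights

# Edge audio encoder: parameter count and disk footprint shrink.
param_ratio = 681 / 305  # 681M -> 305M parameters, ~2.2x smaller
disk_ratio = 390 / 87    # 390 MB -> 87 MB on disk, ~4.5x smaller

# Frame duration 160 ms -> 40 ms: 4x more frames per second of audio,
# i.e. the on-device transcript can refresh 4x as often.
frames_per_sec_old = 1000 // 160  # 6 frames per second
frames_per_sec_new = 1000 // 40   # 25 frames per second
print(param_ratio, disk_ratio, frames_per_sec_old, frames_per_sec_new)
```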