Anthropic Out of Compute: OpenAI's GPU Bet Was Right
A TikTok video claims Anthropic's recent Claude rate-limit cuts are capacity rationing, not policy, and that OpenAI's GPU pre-buy has turned into a moat.
OpenAI Privacy Filter: open-weight on-device PII redaction
OpenAI released Privacy Filter, an Apache 2.0-licensed 1.5B-parameter token classifier for context-aware PII redaction. It runs locally and hits 96% F1.
RAG vs long context vs fine-tuning: when to use each
Quick framing of when to reach for RAG, long-context prompting, or fine-tuning with LLMs, plus how RAFT combines retrieval with fine-tuning.
ChatGPT as a quick personal stylist: skin tone and outfit prompts
A short TikTok tip about uploading a headshot to ChatGPT for skin-tone analysis, flattering color recommendations, and summer outfit visualisations.
Introducing GPT-5.5
OpenAI's GPT-5.5 is their strongest agentic model yet, matching GPT-5.4 latency while using fewer tokens. API pricing: $5/1M in, $30/1M out.
Introducing ChatGPT Images 2.0
OpenAI launched ChatGPT Images 2.0 on 21 April 2026, pitched as a new era of image generation, though public details remained sparse at the time of capture.
Fix Claude Design sameness with a Firecrawl brand system
Claude Design outputs look identical by default. Extract brand JSON via Firecrawl and upload it as a Design System so every generation stays on brand.
Claude Design: Anthropic's AI-Native Interface Generator
Anthropic launched Claude Design, a prompt-driven tool for wireframes, mockups, slides, and templates, with design system integration via GitHub or local folders.
Introducing Claude Design by Anthropic Labs
Anthropic launched Claude Design, a collaborative AI design tool powered by Opus 4.7 with a full design-system workflow and tight Claude Code handoff to engineers.
Gemini Embeddings 2: text, image, video, audio in one vector space
Google's Gemini Embeddings 2 natively maps text, images, video, audio, and documents into one vector space, removing per-modality pipelines and conversion loss.
Aperture by Tailscale: Identity-Based AI Gateway for LLM Requests
Tailscale's Aperture (alpha) is a centralized AI gateway using Tailscale identity to route LLM requests with spending limits, access control, and telemetry.
Darkbloom: Private AI Inference on Apple Silicon
Darkbloom routes encrypted AI requests to idle Apple Silicon Macs, an "Airbnb of GPU compute." Roughly 50% cheaper than OpenRouter, with hardware attestation.
Fireworks AI - Fastest Inference for Generative AI
Fireworks AI is an inference platform for open-source generative models, marketed with a latency drop from 2s to 350ms but no public pricing or benchmarks.
Friends Don't Let Friends Use Ollama
Ollama wraps llama.cpp but skipped attribution, forked ggml badly, and pivoted to VC-backed cloud. llama.cpp delivers up to 1.8x throughput on the same hardware.
Gemini for macOS - your native AI desktop app
Google shipped a native Gemini macOS app with one-keystroke Option+Space access and optional screen sharing. Free, Apple Silicon only, macOS Sequoia 15.0+.
exo: Cluster Macs to Run Frontier AI Models Locally
exo clusters Apple Silicon Macs into a distributed AI inference pool, running DeepSeek v3.1 671B and Kimi K2 locally with RDMA over Thunderbolt 5.
Opus 4.7 explained in 30 seconds
A 30-second rundown of Opus 4.7: gains on coding benchmarks, 3x higher screenshot resolution, new X high reasoning tier, and a /ultra-review slash command.
Qwen3.6-35B-A3B: Agentic Coding Power, Now Open to All
Alibaba open-sourced Qwen3.6-35B-A3B, a 35B MoE with 3B active params scoring 73.4 on SWE-bench Verified and integrating with Claude Code via an OpenAI-compatible API.
Why Chinese AI Is Suddenly So Good (ft. DeepSeek, Seedance 2.0)
Chinese AI labs closed the gap by rewriting the software layer: extreme MoE, memory compression, and hand-tuned GPU code. Douyin adds a video data moat.
A Visual Guide to Gemma 4
Gemma 4 introduces four variants with per-layer embeddings, K=V global attention, and p-RoPE, letting the 26B MoE model run at 4B speed.