# Introducing GPT-5.5

> OpenAI's GPT-5.5 is their strongest agentic model yet, matching GPT-5.4 latency while using fewer tokens. API pricing: $5/1M in, $30/1M out.

Published: 2026-04-23
URL: https://daniliants.com/insights/introducing-gpt-5-5/
Tags: gpt-5-5, openai, agentic-ai, coding-agents, computer-use, llm-benchmarks, codex, token-efficiency

---

## Summary

OpenAI released GPT-5.5 on 23 April 2026, positioning it as their strongest agentic model to date: better at sustained multi-step tasks, coding, computer use, and knowledge work than GPT-5.4, while matching its latency and using fewer tokens for the same work. API pricing lands at $5/1M input and $30/1M output tokens, with GPT-5.5 Pro at $30/$180. The model is already used weekly by 85%+ of OpenAI's internal teams across finance, comms, and engineering.

## Key Insight

**Token efficiency matters more than raw benchmark scores.** GPT-5.5 uses fewer tokens than GPT-5.4 to complete the same Codex tasks while delivering better results. For API-based workflows this directly cuts cost; the "more capable = more expensive" assumption no longer holds for this generation.
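At the published rates, fewer tokens for the same task translates directly into a lower bill. A minimal sketch of the arithmetic, using the $5/$30 per-1M rates from above; the token counts are invented for illustration, not measured figures:

```python
# Cost per request at per-1M-token rates (GPT-5.5: $5 in, $30 out).
def api_cost(input_tokens: int, output_tokens: int,
             in_rate: float = 5.0, out_rate: float = 30.0) -> float:
    """Return cost in USD for one request."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# Hypothetical task: same prompt, but the newer model finishes
# with ~25% fewer output tokens (numbers are made up).
before = api_cost(20_000, 12_000)
after = api_cost(20_000, 9_000)
print(f"${before:.3f} vs ${after:.3f}")  # → $0.460 vs $0.370
```

The savings scale linearly with volume, so token efficiency compounds across any high-throughput agentic pipeline.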

**The shift is from Q&A to sustained execution.** GPT-5.5 is engineered to persist across multi-step loops: plan, use tools, check output, iterate without human babysitting. The real unlock is not smarter single responses but reliable long-horizon task completion. The NVIDIA engineer quote ("losing access feels like a limb amputation") signals how fast dependency forms once a model can actually finish work end-to-end.
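The plan/tools/check/iterate loop described above can be sketched generically. This is a toy illustration, not OpenAI's implementation; `call_model` and `run_tool` are hypothetical placeholders for a real model call and tool executor:

```python
# Minimal sketch of a long-horizon agent loop: plan the next step,
# act via a tool, record the result, and iterate until done or a
# step budget runs out. Placeholders stand in for real APIs.
from dataclasses import dataclass, field

@dataclass
class Step:
    action: str
    result: str

@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)
    done: bool = False

def call_model(state: AgentState) -> str:
    """Placeholder: a real agent asks the model for the next action."""
    return "finish" if state.history else "run_tests"

def run_tool(action: str) -> str:
    """Placeholder: a real agent executes the tool and captures output."""
    return f"ok: {action}"

def run_agent(goal: str, max_steps: int = 10) -> AgentState:
    state = AgentState(goal=goal)
    for _ in range(max_steps):      # bounded, so the loop cannot run forever
        action = call_model(state)  # plan
        if action == "finish":
            state.done = True
            break
        result = run_tool(action)   # act
        state.history.append(Step(action, result))  # check, then iterate
    return state

final = run_agent("fix failing unit test")
print(final.done, len(final.history))  # → True 1
```

The point of the sketch: reliability comes from the outer loop (bounded retries, recorded results), not from any single model response.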

**Concrete benchmark numbers to know:**

- SWE-Bench Pro (real GitHub issue resolution): 58.6%
- Terminal-Bench 2.0 (complex CLI workflows): 82.7%
- OSWorld-Verified (autonomous computer use): 78.7%
- GDPval (44-occupation knowledge work): 84.9%
- Tau2-bench Telecom (customer service workflows): 98.0%
- FrontierMath Tier 4 (hardest math): 35.4% (vs 27.1% for 5.4)
- Long-context 1M token retrieval (BFS): 45.4% vs Claude Opus 4.6 at 41.2%

**Computer use is now practically viable.** 78.7% on OSWorld-Verified is the number that matters: this is real GUI automation (click, type, navigate) without special APIs. It opens up browser automation options that don't need Playwright scripting.

**Load balancing self-optimisation is a noteworthy precedent.** GPT-5.5's inference team used Codex to analyse production traffic and write custom heuristic algorithms for GPU partitioning, yielding a >20% token-generation speedup. AI-written infrastructure optimisations have shipped to production; the loop is closing.
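The actual heuristics OpenAI shipped are not public. As a toy sketch of the general idea, traffic-driven partitioning might size GPU pools from observed per-workload load shares; every name and number below is invented for illustration:

```python
# Toy traffic-driven GPU partitioning: allocate GPUs proportionally to
# each workload's observed traffic share, with at least one GPU each.
# NOT OpenAI's algorithm; purely illustrative.
def partition_gpus(load_share: dict[str, float], total_gpus: int) -> dict[str, int]:
    alloc = {w: max(1, round(share * total_gpus))
             for w, share in load_share.items()}
    # Greedy trim: if rounding overshoots the budget, shave the largest pools.
    while sum(alloc.values()) > total_gpus:
        biggest = max(alloc, key=alloc.get)
        alloc[biggest] -= 1
    return alloc

traffic = {"chat": 0.55, "codex": 0.30, "batch": 0.15}  # invented shares
print(partition_gpus(traffic, 64))  # → {'chat': 35, 'codex': 19, 'batch': 10}
```

A real system would add constraints (memory per model, failover headroom, rebalance hysteresis); the sketch only shows the shape of a load-derived heuristic.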

**OpenAI internal usage is the real signal.** 85%+ weekly active use across non-engineering teams (Finance reviewed 71,637 pages of K-1 tax forms; Comms automated Slack triage; GTM saved 5-10 hrs/week on reports) suggests the productivity gains are real and broad, not just for developers.