CanIRun.ai - Can your machine run AI models?
Summary
CanIRun.ai uses browser APIs to estimate which open-weight AI models (0.8B to 1T params) a user’s machine can actually run locally. The HN discussion is dense with practical hardware comparisons - Apple Silicon vs. AMD Strix Halo, RAM allocation tricks, and clear guidance on where local models genuinely beat cloud and where they fall short.
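The browser-side estimate presumably reduces to comparing a model's quantized weight footprint against the machine's available memory. A minimal Python sketch of that arithmetic (the 20% overhead factor and thresholds are my assumptions for illustration, not CanIRun.ai's actual logic):

```python
def model_fits(params_billion: float, quant_bits: int, ram_gb: float,
               overhead: float = 1.2) -> bool:
    """Rough check: do the quantized weights (plus ~20% headroom for
    KV cache and runtime overhead) fit in the machine's memory?"""
    weights_gb = params_billion * quant_bits / 8  # 1B params at 8-bit ~ 1 GB
    return weights_gb * overhead <= ram_gb

# A 9B model at 4-bit needs ~4.5 GB of weights: fits easily in 16 GB
print(model_fits(9, 4, 16))
# A 122B model at 4-bit needs ~61 GB: fits in ~80 GB of usable unified RAM
print(model_fits(122, 4, 80))
```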
Key Insight
The real sweet spot for local models is structured data tasks, not coding or general QA:
- One HN veteran with 100+ hours of local model experimentation concluded: small models (Qwen3.5 9B) are excellent for tool use, information extraction, and embedded applications, but not for coding agents or general knowledge. For coding, cloud frontier models win every time.
- GLM-OCR (0.9B) runs fast enough on an entry-level MacBook for large-volume document OCR - outperforming expensive legacy software (ABBYY FineReader etc.) at near-zero cost. Azure Doc Intelligence costs $1.50/1,000 pages; GLM-OCR costs electricity.
- Concrete local use cases from HN: XML data extraction with inconsistent formatting, text cleanup and summarisation via Raycast + LM Studio, email classification (~$3/month with Haiku, local would be $0).
Hardware reality checks:
- AMD Ryzen AI Max+ 395 (Strix Halo, 128 GB unified RAM) at $2,800: can run Qwen3.5-122B-A10B but at slow speeds. Practical max for comfortable local inference is ~80 GB quantized models when running a desktop environment simultaneously.
- Linux kernel params `ttm.pages_limit=31457280 ttm.page_pool_size=31457280` unlock dynamic VRAM allocation up to 110–120 GB on Strix Halo - set once, no further reboots needed.
- Apple Silicon still has the easiest zero-config unified memory experience; AMD Strix Halo is competitive but needs one-time Linux kernel tuning.
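The magic number in those params decodes directly: the ttm limits are counted in 4 KiB pages, so 31457280 pages is exactly 120 GiB (my arithmetic, not stated in the thread):

```python
pages = 31_457_280       # value passed to ttm.pages_limit / ttm.page_pool_size
page_size = 4096         # bytes: the standard x86-64 page size
gib = pages * page_size / 2**30
print(gib)  # 120.0 GiB of allocatable VRAM
```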
- Memory bandwidth is the real bottleneck: models with ~10B active params at 4–6 bit quantization feel usable; more than that starts feeling sluggish even with 128 GB of RAM.
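The bandwidth ceiling follows from a common back-of-envelope: each generated token streams every active parameter's weights through memory once, so tokens/s ≈ bandwidth ÷ bytes of active weights. A hedged sketch (the 256 GB/s figure is an assumption in the Strix Halo ballpark, not a quoted spec):

```python
def max_tokens_per_sec(bandwidth_gbs: float, active_params_b: float,
                       quant_bits: int) -> float:
    """Upper bound on decode speed: memory bandwidth divided by the
    bytes of weights read per token (active params x bits / 8)."""
    bytes_per_token = active_params_b * 1e9 * quant_bits / 8
    return bandwidth_gbs * 1e9 / bytes_per_token

# ~256 GB/s unified memory, 10B active params, 4-bit quant
print(round(max_tokens_per_sec(256, 10, 4), 1))  # 51.2 tok/s ceiling
```

Real throughput lands well below this bound once compute, KV-cache reads, and the desktop environment take their share, which matches the "usable but not fast" reports above.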
MoE architecture changes the calculus:
- Models like Qwen3-Coder-Next (80B total / 3B active) run at good speeds because only 3B params are active per forward pass. Look at active param count, not total param count, when evaluating local runnability.
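The two MoE numbers answer different questions: total params set the RAM bill, active params set the speed. Illustrative arithmetic for the 80B-total / 3B-active example at 4-bit (the bandwidth figure is again an assumption, not a spec):

```python
def moe_profile(total_b: float, active_b: float, quant_bits: int,
                bandwidth_gbs: float) -> tuple[float, float]:
    """RAM is paid on total params (all experts stay resident);
    the per-token bandwidth cost is paid only on active params."""
    ram_gb = total_b * quant_bits / 8
    tok_per_s = bandwidth_gbs / (active_b * quant_bits / 8)  # decode ceiling
    return ram_gb, tok_per_s

ram, speed = moe_profile(80, 3, 4, 256)
print(f"{ram:.0f} GB resident, ~{speed:.0f} tok/s ceiling")  # 40 GB, ~171 tok/s
```

This is why an 80B MoE can feel faster than a 14B dense model on the same machine, as long as the full weight set fits in memory.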
Cheap cloud alternative to local for coding:
- Alibaba Model Studio coding plan (Qwen3.5-plus): was $3/month, now ~$10/month - still far cheaper than Claude Pro for coding use cases. Practical fallback when out of Claude tokens.