# CanIRun.ai - Can your machine run AI models?

> CanIRun.ai estimates which AI models your hardware can run locally. The real sweet spot for local models is structured data tasks, not coding.

Published: 2026-03-16
URL: https://daniliants.com/insights/canirunai-can-your-machine-run-ai-models/
Tags: local-llm, hardware, inference, ollama, model-selection, apple-silicon, amd-strix-halo, on-device-ai

---

## Summary

CanIRun.ai uses browser APIs to estimate which open-weight AI models (0.8B to 1T params) a user's machine can actually run locally. The HN discussion is dense with practical hardware comparisons - Apple Silicon vs. AMD Strix Halo, RAM allocation tricks, and clear guidance on where local models genuinely beat cloud and where they fall short.
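
At its simplest, that kind of runnability estimate is a sizing check: do the quantized weights, plus runtime overhead, fit in the memory you can actually hand to the model? A minimal sketch of that arithmetic is below; the 20% overhead factor, the 4-bit quant, and the 96 GB budget are illustrative assumptions, not CanIRun.ai's actual methodology.

```python
# Rough "can I run it?" arithmetic: quantized weight size plus overhead vs.
# the memory available for inference. Overhead factor and memory budget are
# illustrative assumptions, not CanIRun.ai's methodology.

def fits_in_memory(params_b: float, quant_bits: float, mem_gb: float,
                   overhead: float = 1.2) -> bool:
    """True if the quantized weights (plus KV-cache/runtime overhead) fit."""
    weights_gb = params_b * quant_bits / 8  # billions of params * bits -> GB
    return weights_gb * overhead <= mem_gb

for name, params_b in [("0.8B", 0.8), ("9B", 9), ("122B", 122), ("1T", 1000)]:
    print(f"{name} at 4-bit, 96 GB budget: {fits_in_memory(params_b, 4, 96)}")
```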

## Key Insight

**The real sweet spot for local models is structured data tasks, not coding or general QA:**

- One HN veteran with 100+ hours of local model experimentation concluded that small models (Qwen3.5 9B) are excellent for tool use, information extraction, and embedded applications - not for coding agents or general knowledge. For coding, cloud frontier models win every time.
- GLM-OCR (0.9B) runs fast enough on an entry-level MacBook for large-volume document OCR - outperforming expensive legacy software (ABBYY FineReader etc.) at near-zero cost. Azure Doc Intelligence costs $1.50 per 1,000 pages; GLM-OCR costs electricity.
- Concrete local use cases from HN: XML data extraction with inconsistent formatting (see the sketch after this list), text cleanup and summarisation via Raycast + LM Studio, and email classification (~$3/month with Haiku, local would be $0).
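
For the XML-extraction use case, a minimal sketch of what this looks like with a small local model served over Ollama's HTTP API. The model tag, prompt, and field names are illustrative assumptions, not the commenter's actual setup.

```python
# Hypothetical example: asking a small local model (via Ollama's default
# localhost:11434 endpoint) to pull structured fields out of messy XML.
import json
import urllib.request

def extract_fields(xml_snippet: str, model: str = "qwen2.5:7b") -> str:
    # Prompt is illustrative; the point is that inconsistent tag names don't
    # matter to the model the way they do to a hand-written XPath.
    prompt = (
        "Extract the invoice number, date, and total from this XML as JSON. "
        "Tag names are inconsistent, so rely on meaning, not structure.\n\n"
        + xml_snippet
    )
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps({"model": model, "prompt": prompt, "stream": False}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

print(extract_fields("<doc><inv no='A-113'/><dt>2026-02-01</dt><amt>419.50</amt></doc>"))
```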

**Hardware reality checks:**

- AMD Ryzen AI Max+ 395 (Strix Halo, 128 GB unified RAM) at $2,800: can run Qwen3.5-122B-A10B, but slowly. The practical ceiling for comfortable local inference is ~80 GB of quantized model weights while a desktop environment runs alongside.
- Linux kernel params `ttm.pages_limit=31457280 ttm.page_pool_size=31457280` unlock dynamic VRAM allocation up to 110–120 GB on Strix Halo - a one-time change, with no ongoing tuning or further reboots after that.
- Apple Silicon still has the easiest zero-config unified memory experience; AMD Strix Halo is competitive but needs one-time Linux kernel tuning.
- Memory bandwidth is the real bottleneck: a model with ~10B active params at 4–6 bit quants feels usable; beyond that it starts feeling sluggish even with 128 GB of RAM (see the back-of-envelope below).
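
A back-of-envelope for why the bandwidth point holds: during decode, every active weight has to be streamed from memory once per generated token, so tokens/s is roughly bandwidth divided by the size of the active weights. The 256 GB/s figure is an assumption in the right ballpark for Strix Halo-class unified memory, not a number from the thread, and real throughput lands below this ceiling.

```python
# Decode speed is roughly memory-bandwidth-bound: each generated token streams
# all active weights from RAM once, so tokens/s <= bandwidth / active-weight size.
# 256 GB/s is an assumed ballpark for Strix Halo-class unified memory.

def decode_ceiling_tok_s(active_params_b: float, quant_bits: float,
                         bandwidth_gb_s: float = 256.0) -> float:
    active_weight_gb = active_params_b * quant_bits / 8  # billions of params -> GB
    return bandwidth_gb_s / active_weight_gb

for active_b, bits in [(3, 4), (10, 4), (10, 6), (30, 4)]:
    print(f"{active_b}B active @ {bits}-bit: ~{decode_ceiling_tok_s(active_b, bits):.0f} tok/s ceiling")
```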

**MoE architecture changes the calculus:**

- Models like Qwen3-Coder-Next (80B total / 3B active) run at good speeds because only 3B params are active per forward pass. Look at active param count, not total param count, when evaluating local runnability.
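
In other words: total params set the RAM bill (all experts must be resident), while active params set the speed (only the routed experts are read per token). A sketch with the 80B/3B shape above, reusing the same assumed 4-bit quant and bandwidth ballpark as the previous snippet:

```python
# MoE calculus: RAM is driven by total params (everything must be resident),
# speed by active params (only routed experts stream per token).
# Quant bits and bandwidth are the same illustrative assumptions as above.

def moe_profile(total_b: float, active_b: float, quant_bits: float = 4,
                bandwidth_gb_s: float = 256.0):
    ram_gb = total_b * quant_bits / 8                      # whole model resident
    tok_s = bandwidth_gb_s / (active_b * quant_bits / 8)   # only active weights per token
    return ram_gb, tok_s

for label, total_b, active_b in [("80B MoE, 3B active", 80, 3), ("80B dense", 80, 80)]:
    ram, tps = moe_profile(total_b, active_b)
    print(f"{label}: ~{ram:.0f} GB RAM, ~{tps:.0f} tok/s ceiling")
```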

**Cheap cloud alternative to local for coding:**

- Alibaba Model Studio coding plan (Qwen3.5-plus): was $3/month, now ~$10/month - still far cheaper than Claude Pro for coding use cases, and a practical fallback when you run out of Claude tokens.