The Dark Side of AI No One Talks About

Summary

AI crawlers routinely ignore robots.txt, scrape staging environments, and feed on AI-generated content in a self-reinforcing loop Jamie Indigo calls “tech mad cow disease.” The piece provides a concrete defensive playbook, from log file analysis to the Johari Window framework for brand narrative control, backed by specific data points such as 80%+ daily AI ranking volatility and an entity confidence threshold of 50-55% below which LLMs won’t cite a page.

Key Insights

  • AI ranking volatility is extreme. Dan Petrovic’s tracker shows 8/10 AI results shift daily, with volatility at ~80%. Optimizing for a stable AI presence is fundamentally different from traditional SERP stability.
  • Content homogeneity trap. The jump in published content in 2024 matched the combined growth from 2010-2018, most of it AI-generated from the same statistical patterns. Google’s Martin Splitt warned that low-quality content may be skipped before rendering even begins.
  • Robots.txt is theater without layered defense. Perplexity rotated user agents to bypass blocks (documented by Cloudflare). Many LLMs train on Common Crawl, so blocking AI-specific bots while allowing Common Crawl leaves content exposed anyway.
  • Two distinct crawler types to monitor. Training crawlers collect data for model building; user-initiated crawlers fire in real-time for RAG queries. Both hit staging, internal, and dev environments if exposed.
  • Entity confidence threshold matters. Gus Pelogia’s Entity Tracker shows that if an entity’s confidence score drops below 50-55%, LLMs won’t cite the page at all. Ambiguity kills visibility.
  • Apple’s “Illusion of Thinking” paper findings. LLM accuracy collapses as task complexity grows, reasoning effort drops as difficulty rises (consistent with token-cost optimization), and instructions aren’t followed consistently.
  • Johari Window for defensive SEO. Hidden areas (internal assets) need aggressive access restriction. Blind spots (external narratives about your brand) need social listening. Unknown quadrant is addressed through data philanthropy and original research publication.
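The daily-shift figure in the first bullet reduces to a simple overlap metric. A minimal sketch, assuming you snapshot the set of cited sources for a query each day (the function name and sample domains are illustrative):

```python
def daily_volatility(yesterday, today):
    """Fraction of yesterday's cited results that no longer appear today."""
    if not yesterday:
        return 0.0
    kept = len(set(yesterday) & set(today))
    return 1 - kept / len(yesterday)

# Example: 8 of 10 results replaced overnight -> 80% volatility,
# matching the ~80% figure from Dan Petrovic's tracker.
yesterday = ["a.com", "b.com", "c.com", "d.com", "e.com",
             "f.com", "g.com", "h.com", "i.com", "j.com"]
today = ["a.com", "b.com", "k.com", "l.com", "m.com",
         "n.com", "o.com", "p.com", "q.com", "r.com"]
print(daily_volatility(yesterday, today))  # 0.8
```

Tracking this number over time, per query, is what separates a stable AI presence from one-off citations.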
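The log-file side of the playbook, separating training crawlers from user-initiated (RAG) crawlers and flagging hits to non-production paths, can be sketched as below. The user-agent substrings are a plausible, non-exhaustive list, bot names change often, and (as the Perplexity example shows) user agents can be spoofed, so UA matching is a starting signal, not proof of identity:

```python
import re

# Illustrative UA substrings -- assumptions, not a maintained registry.
TRAINING_BOTS = ("GPTBot", "CCBot", "ClaudeBot", "Google-Extended")
USER_BOTS = ("ChatGPT-User", "PerplexityBot", "Perplexity-User")

# Matches the request and user-agent fields of a combined-format access log.
LOG_LINE = re.compile(
    r'"(?:GET|POST) (?P<path>\S+) [^"]*" \d+ \d+ "[^"]*" "(?P<ua>[^"]*)"'
)

def classify(log_lines):
    """Bucket requests by crawler type; flag hits to non-production paths."""
    report = {"training": [], "user_initiated": [], "staging_hits": []}
    for line in log_lines:
        m = LOG_LINE.search(line)
        if not m:
            continue
        path, ua = m.group("path"), m.group("ua")
        if any(bot in ua for bot in TRAINING_BOTS):
            report["training"].append((path, ua))
        elif any(bot in ua for bot in USER_BOTS):
            report["user_initiated"].append((path, ua))
        else:
            continue
        # Any AI crawler reaching these prefixes means the environment
        # is exposed and needs access control, not just robots.txt.
        if path.startswith(("/staging", "/dev", "/internal")):
            report["staging_hits"].append((path, ua))
    return report
```

A non-empty `staging_hits` bucket is the actionable output: robots.txt directives won’t help there, only authentication or IP allowlisting will.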