Transparency

How accurate is Airno?

No AI detector is perfect. Here's an honest look at what Airno can and can't do, how our confidence scores work, and where we fall short.

Accuracy at a glance

~85–92%
Text Detection

On 200+ word samples of unedited AI text (ChatGPT, Claude, Gemini). Accuracy drops on heavily edited or mixed content.

~78–85%
Image Detection

On uncompressed AI-generated images (DALL-E, Midjourney, Stable Diffusion). JPEG compression and social media re-encoding reduce accuracy.

These ranges are based on internal benchmarks and may vary by content type, length, and AI model used.

Confidence tiers

80–100%
Likely AI-Generated

Strong signals from multiple detectors. High agreement across statistical, neural, and pattern analysis.

60–79%
Possibly AI-Generated

Moderate AI signals detected. Some detectors flag AI patterns but not all agree strongly.

40–59%
Mixed / Uncertain

Ambiguous result. Could be heavily edited AI text, AI-assisted writing, or unusual human writing.

20–39%
Possibly Human-Written

Few AI signals found. Most detectors see human-like patterns, though some minor flags may exist.

0–19%
Likely Human-Written

Strong human writing signals — natural burstiness, varied vocabulary, and organic sentence structure.
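For the curious, the tier boundaries above translate directly into a simple lookup. This is a minimal sketch using the published ranges; the function name is ours, not Airno's internal API:

```python
def confidence_tier(score: float) -> str:
    """Map a 0-100 confidence score to the published tier labels."""
    if score >= 80:
        return "Likely AI-Generated"
    if score >= 60:
        return "Possibly AI-Generated"
    if score >= 40:
        return "Mixed / Uncertain"
    if score >= 20:
        return "Possibly Human-Written"
    return "Likely Human-Written"
```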

What we detect

Airno's ensemble of 7 detection models looks for these specific signals. No single signal is conclusive — we combine them to produce a confidence score.
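One common way to combine per-method scores is a weighted average. The sketch below illustrates the idea only — the weights and method names are hypothetical; Airno's actual weighting is not published:

```python
def combine_scores(method_scores: dict[str, float],
                   weights: dict[str, float]) -> float:
    """Weighted average of per-method scores (each 0-100).
    Weights here are illustrative, not Airno's real values."""
    total = sum(weights.values())
    return sum(method_scores[m] * w for m, w in weights.items()) / total
```

For example, if the statistical, neural, and pattern methods returned 90, 80, and 70 with weights 0.4/0.4/0.2, the combined score would be 82.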

Statistical Analysis

  • Perplexity (word predictability)
  • Burstiness (sentence length variance)
  • Vocabulary richness / type-token ratio
  • Zipf distribution deviation
  • Sentence length entropy
  • Hapax legomena ratio (unique word frequency)
  • Function word over-reliance
  • Paragraph length uniformity
  • Transition word density
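Two of the signals above — burstiness and type-token ratio — are simple enough to sketch in a few lines. These are textbook formulations under our own simplifying assumptions (naive sentence and word splitting), not Airno's production code:

```python
import re
import statistics

def burstiness(text: str) -> float:
    """Std dev of sentence lengths (in words). Human writing tends to
    vary sentence length more; near-zero values can signal AI text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths) if len(lengths) > 1 else 0.0

def type_token_ratio(text: str) -> float:
    """Vocabulary richness: unique words divided by total words."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0
```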

Neural Classification

  • RoBERTa-based transformer scoring
  • Fine-tuned on human vs. AI-generated datasets
  • Contextual embedding analysis
  • Token-level probability estimation

Pattern Matching

  • 190+ known AI phrase patterns
  • Hedging language density ("it could be argued", "perhaps")
  • Sentence starter repetition
  • Vague citation detection ("studies show", "experts say")
  • Listing/enumeration patterns
  • AI-specific vocabulary ("delve", "tapestry", "multifaceted")
  • Formulaic paragraph openings
  • Punctuation variety analysis
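Pattern matching boils down to counting hits against a phrase list and normalizing by length. The sketch below uses a tiny hypothetical subset of patterns drawn from the examples above; the full 190+ pattern list is not published:

```python
import re

# Hypothetical subset of AI-associated phrases; illustrative only.
AI_PATTERNS = [
    r"\bdelve\b", r"\btapestry\b", r"\bmultifaceted\b",
    r"\bit could be argued\b", r"\bstudies show\b", r"\bexperts say\b",
]

def pattern_density(text: str) -> float:
    """Pattern matches per 100 words across the known-phrase list."""
    words = len(text.split())
    if words == 0:
        return 0.0
    hits = sum(len(re.findall(p, text, re.IGNORECASE)) for p in AI_PATTERNS)
    return 100.0 * hits / words
```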

Image Analysis

  • EXIF metadata forensics
  • Frequency domain artifact detection (DCT/FFT)
  • CNN-based artifact classification
  • Color consistency and gradient analysis

Honest limitations

False Positives (human text flagged as AI)

  • Formal academic writing with structured vocabulary can trigger AI patterns
  • Non-native English speakers sometimes write with patterns similar to AI output
  • Technical documentation and legal text tend toward uniform structure
  • Estimated false positive rate: ~8–15% depending on content type

False Negatives (AI text missed)

  • Heavily edited or paraphrased AI text is harder to detect
  • Short text samples (<30 words) lack enough signal for reliable detection
  • Some newer AI models generate text that's harder to distinguish
  • AI text with intentional "humanization" (typos, slang) can evade detection
  • Estimated false negative rate: ~10–18% on unedited AI text

What this means for you

Airno is a signal, not a verdict. Our scores indicate probability, not certainty. We recommend using Airno as one data point alongside human judgment — never as the sole basis for consequential decisions like academic integrity rulings.

Tips for better results

  • Submit 100+ words for the most reliable results (200+ words ideal)
  • Unedited text gives the clearest signal — editing dilutes AI patterns
  • For images, use the original file — screenshots and re-compressed images lose forensic data
  • Results marked "Mixed / Uncertain" deserve human review before acting on them
  • Check the per-method breakdown — if detectors disagree, the result is less certain