Technical · March 30, 2026

How AI detection works

A plain-language explanation of ensemble detection, why single-model classifiers fail, and how Airno combines seven independent detectors to produce a calibrated confidence score.

The single-model problem

Most AI detection tools are, at their core, a single fine-tuned classifier. A neural network is trained on a labeled dataset of human and AI-generated text, learns to distinguish the two, and returns a probability score. This works reasonably well until conditions shift: a new model releases, a user lightly paraphrases AI output, or the writing style is unusual.

A single classifier has a single failure mode. When it is wrong, it is wrong with high confidence. There is no internal cross-check, no signal that the model is operating outside its training distribution. The result is a number that looks authoritative but may be meaningless.

Ensemble detection is the answer to this problem. Instead of asking one model for its verdict, you ask several independent models, each trained on different signals and using different methods, and then combine their outputs. Disagreement between detectors is itself a signal: it tells you the case is ambiguous and that you should hold the result loosely.

What ensemble detection means

An ensemble is a collection of models that each independently analyze the same input. Their outputs are combined, typically via weighted averaging or voting, to produce a final score. The ensemble is stronger than any individual member because the members fail in different ways. A statistical detector might miss heavily paraphrased AI text but catch low-perplexity prose. A neural classifier might be fooled by unusual human writing but catch the formulaic structures in AI output. Pattern matching might trigger on specific phrases that the statistical detector misses entirely.

Airno runs 7 detectors in parallel. The three text detectors' weights are tuned to reflect their relative reliability:

  • Statistical analysis (30%): Perplexity, burstiness, vocabulary richness, and Zipf distribution
  • RoBERTa classifier (40%): Fine-tuned transformer, the highest individual accuracy
  • Pattern matching (30%): 190+ linguistic rules targeting known AI writing signatures
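The combination step itself is simple. A minimal sketch of weighted averaging over the three text detectors, using the weights listed above (the per-detector scores here are made up for illustration):

```python
# Weights mirror the list above; detector names are illustrative labels.
WEIGHTS = {"statistical": 0.30, "roberta": 0.40, "patterns": 0.30}

def ensemble_score(scores: dict) -> float:
    """Combine per-detector probabilities (0..1) into one weighted score."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)

scores = {"statistical": 0.62, "roberta": 0.81, "patterns": 0.70}
print(round(ensemble_score(scores), 3))  # 0.30*0.62 + 0.40*0.81 + 0.30*0.70 = 0.72
```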

For images, four additional detectors run: CNN artifact detection, frequency domain analysis, metadata forensics, and noise consistency analysis.

How each detector works

Statistical fingerprinting

AI-generated text has measurable statistical properties that differ from human writing. The most reliable signals are:

  • Perplexity: Language models generate text by choosing the most probable next token. This produces low-perplexity text, in which each word feels predictable. Human writing is messier and higher-perplexity.
  • Burstiness: Human writing varies in sentence length and complexity in ways that feel inconsistent. AI text tends to flatten this variation, producing unnaturally uniform output.
  • Vocabulary richness: AI models favor common tokens. Human writing, especially expert writing, includes rare vocabulary, domain-specific terms, and deliberate stylistic choices that raise the type-token ratio in distinctive ways.
  • Zipf distribution: In natural language, word frequency follows a power law. AI models perturb this distribution in ways that statistical analysis can detect.
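Perplexity and Zipf analysis require a reference language model and a large sample, but burstiness and vocabulary richness can be sketched with the standard library alone. A toy illustration (tokenization and the metrics are deliberately simplified):

```python
import re
import statistics

def burstiness(text: str) -> float:
    """Std deviation of sentence lengths in words: higher = more variation."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    return statistics.pstdev(lengths) if len(lengths) > 1 else 0.0

def type_token_ratio(text: str) -> float:
    """Unique words / total words: a rough proxy for vocabulary richness."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0

flat = "The cat sat here. The dog sat here. The bird sat here."
varied = "Rain. The old lighthouse keeper climbed the spiral stairs, counting each step."
print(burstiness(flat), burstiness(varied))  # uniform text scores 0.0
```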

RoBERTa and DeBERTa fine-tuning

Transformer-based classifiers, specifically RoBERTa and DeBERTa, are pre-trained on large text corpora and then fine-tuned on labeled datasets of human vs. AI-generated text. Fine-tuning teaches the model to recognize the subtle distributional differences between the two.

RoBERTa (Robustly Optimized BERT Pre-training Approach) and DeBERTa (Decoding-enhanced BERT with Disentangled Attention) are used together because they have complementary strengths. RoBERTa is faster and handles most cases well. DeBERTa is more accurate on difficult cases due to its disentangled attention mechanism, which separates token content from positional context.

Fine-tuned neural classifiers carry the most weight in Airno's ensemble (40%) because they demonstrate the highest individual accuracy on benchmark datasets. However, they are also the most brittle: new model releases can temporarily reduce accuracy until the classifier is retrained on examples from the new model.
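At inference time, a fine-tuned classifier of this kind emits one raw logit per class (human, AI), and a softmax turns those logits into probabilities. A sketch of that final step with made-up logits (a real deployment would obtain the logits from a library such as Hugging Face Transformers):

```python
import math

def softmax(logits: list) -> list:
    """Convert raw classifier logits into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits from a fine-tuned detector, ordered [human, ai].
human_p, ai_p = softmax([-1.2, 2.3])
print(f"P(AI-generated) = {ai_p:.3f}")
```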

Linguistic pattern matching

AI language models have identifiable stylistic habits. They overuse transition phrases ("furthermore," "it is worth noting," "in conclusion"), hedge aggressively ("it is important to consider," "one might argue"), and rely on formulaic paragraph structures with predictable openings and closings.

Airno's pattern detector contains 190+ rules that target these habits. Rules are organized by category: hedging language, transition overuse, vague academic citation patterns, list-heavy structures, and model-specific phrases that appear frequently in GPT, Claude, and Gemini output. When a rule fires, the matching span is flagged for display in the results so users can see exactly which phrases triggered detection, not just that detection occurred.
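Airno's actual rule set is not published, so the sketch below uses a couple of invented rules from the categories described above (hedging, transition overuse) to show the mechanism: each rule is a regular expression, and matched spans are returned so they can be displayed to the user.

```python
import re

# Illustrative rules only; the real detector has 190+ of these.
RULES = {
    "hedging": r"\bit is (important|worth noting) to\b|\bone might argue\b",
    "transitions": r"\bfurthermore\b|\bin conclusion\b|\bmoreover\b",
}

def match_rules(text: str) -> list:
    """Return (category, matched span) pairs so users see exactly what fired."""
    hits = []
    for category, pattern in RULES.items():
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            hits.append((category, m.group(0)))
    return hits

print(match_rules("Furthermore, it is important to consider the context."))
```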

CNN artifact detection for images

AI image generators, whether GAN-based or diffusion-based (like Stable Diffusion, DALL-E, and Midjourney), leave forensic traces in the images they produce. A convolutional neural network (CNN) trained on labeled AI-generated and human-photographed images can learn to recognize these traces.

Complementary analysis runs in the frequency domain. Fourier transforms convert images into their constituent spatial frequencies. AI-generated images have characteristic frequency domain signatures, regular artifacts that appear because of how upsampling and deconvolution operations work in generative models. These artifacts are invisible to the human eye but detectable in the frequency spectrum.
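As a library-free toy illustration of the idea (a real pipeline would run a 2-D FFT over the whole image), a pixel row containing a perfectly regular artifact concentrates its energy in the frequency bin matching the artifact's period:

```python
import cmath

def dft_magnitudes(signal: list) -> list:
    """Naive discrete Fourier transform; returns magnitude per frequency bin."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
        for k in range(n)
    ]

# Toy pixel row with a regular period-4 artifact, standing in for the
# upsampling patterns described above: 32 samples, repeating every 4.
row = [100, 110, 100, 90] * 8
mags = dft_magnitudes(row)
peak = max(range(1, len(mags) // 2), key=lambda k: mags[k])
print(peak)  # the dominant non-DC bin corresponds to the period-4 pattern
```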

Metadata forensics inspects EXIF data and file structure for inconsistencies or missing fields common in AI-generated images. Real photographs typically include camera model, GPS data, and lens information. AI generators may produce images with absent or contradictory metadata.
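A metadata check reduces to comparing the fields present against those a camera would normally write. The sketch below works over a plain dict of EXIF-style tags (field names follow common EXIF conventions; real extraction would use a library such as Pillow, and the generator tag is hypothetical):

```python
# Camera fields a genuine photograph would typically carry.
EXPECTED_FIELDS = {"Make", "Model", "DateTimeOriginal", "LensModel"}

def missing_camera_fields(exif: dict) -> list:
    """Return the expected camera fields absent from the metadata."""
    return sorted(EXPECTED_FIELDS - exif.keys())

camera_photo = {"Make": "Canon", "Model": "EOS R5",
                "DateTimeOriginal": "2026:03:01 09:12:44", "LensModel": "RF 50mm"}
ai_image = {"Software": "image-gen-1.0"}  # hypothetical generator tag

print(missing_camera_fields(camera_photo))  # nothing missing
print(missing_camera_fields(ai_image))      # all four camera fields missing
```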

Why confidence intervals matter

A detection score of 73% means different things depending on whether the 7 detectors returned {62%, 71%, 75%, 74%, 70%, 77%, 81%} or {12%, 91%, 95%, 20%, 88%, 93%, 88%}. In the first case, detectors are in rough agreement and the score is reliable. In the second, there is sharp disagreement and the score should be held loosely.

Airno computes and displays the variance across detector outputs. A low-variance result at 73% is more actionable than a high-variance result at 73%. This matters particularly in high-stakes contexts: an educator confronting a student, a journalist verifying a source, an HR professional reviewing a candidate.
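Using the two score sets from the example above, a few lines of standard-library Python make the distinction concrete: similar means, very different variances.

```python
import statistics

# The two detector-output sets from the 73% example above.
agreeing   = [0.62, 0.71, 0.75, 0.74, 0.70, 0.77, 0.81]
conflicted = [0.12, 0.91, 0.95, 0.20, 0.88, 0.93, 0.88]

for name, scores in [("agreeing", agreeing), ("conflicted", conflicted)]:
    mean = statistics.mean(scores)
    var = statistics.pvariance(scores)
    print(f"{name}: mean={mean:.2f} variance={var:.4f}")
```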

The confidence score also degrades gracefully with input length. Very short text (under 30 words) provides minimal statistical signal. Airno reports a reliability indicator alongside the score. Text that is too short to analyze reliably receives a lower reliability rating, signaling that the confidence score is less trustworthy regardless of its value.
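A length-based reliability gate can be sketched in a few lines. The under-30-words cutoff comes from the paragraph above; the upper band boundary here is an invented placeholder:

```python
def reliability(text: str) -> str:
    """Map input length to a reliability band. The 30-word floor follows the
    text above; the 150-word boundary is illustrative only."""
    n_words = len(text.split())
    if n_words < 30:
        return "low"
    if n_words < 150:
        return "medium"
    return "high"

print(reliability("Too short to judge reliably."))  # low
```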

How Airno combines detector outputs

For text, the final score is a weighted average of the detector outputs: statistical analysis contributes 30%, the RoBERTa neural classifier 40%, and pattern matching 30%. These weights were derived empirically from held-out validation sets, not chosen arbitrarily.

Weighting is not static. Future versions of Airno will adjust weights dynamically based on input characteristics: for very long text, statistical methods become more reliable and receive higher weight. For text that matches few pattern rules but has low perplexity, the neural classifier dominates. The goal is to maximize accuracy across the full distribution of inputs, not just the average case.
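The dynamic re-weighting idea can be sketched as a nudge toward statistical methods for long inputs, followed by renormalization. Since this describes planned behavior, the threshold and adjustment size below are entirely hypothetical:

```python
BASE_WEIGHTS = {"statistical": 0.30, "roberta": 0.40, "patterns": 0.30}

def adjusted_weights(n_words: int) -> dict:
    """Shift weight toward statistics for long text; values are illustrative."""
    w = dict(BASE_WEIGHTS)
    if n_words > 1000:  # long text: statistical signals grow more reliable
        w["statistical"] += 0.10
        w["roberta"] -= 0.10
    total = sum(w.values())
    return {k: v / total for k, v in w.items()}  # renormalize to sum to 1

print(adjusted_weights(2000))
```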

The per-detector breakdown is always visible in results. This transparency is deliberate. Users who want to understand why a particular score was returned can inspect which detectors agreed and which disagreed, giving them enough information to decide how much weight to give the result in their own judgment.

What detection cannot do

No detection system can achieve 100% accuracy. AI models improve rapidly, and detection systems must be continually retrained to keep up. Heavily edited or paraphrased AI text is genuinely harder to detect because the statistical fingerprints are partially erased. Very short texts lack sufficient signal for reliable classification.

Airno is honest about this. We report a false positive rate of approximately 8–15% and a false negative rate in a similar range. Detection results are evidence to inform judgment, not a verdict to replace it.

See the detection in action.

Try Airno free