Where things stand in 2026
By 2026, AI writing tools are embedded in student workflows across virtually every institution. A 2025 survey by Inside Higher Ed found that 76% of undergraduate students reported using AI writing assistance at least occasionally, up from 43% in 2023. The question for faculty has shifted from “are students using AI?” to “which uses are acceptable, and how do we tell?”
Detection tools can answer part of that second question. They cannot answer it completely. Understanding the limits of detection is as important as understanding what it catches.
What detection tools can and cannot do
Can do reliably
- ✓ Flag unedited or minimally edited AI-generated text at 80-95% accuracy or better
- ✓ Provide probabilistic evidence that warrants further investigation
- ✓ Show which signals are elevated (ensemble detectors), helping distinguish false positives (see the sketch after these lists)
- ✓ Serve as one consistent data point across a cohort of submissions
Cannot do reliably
- ✕ Prove that a specific student used AI to generate a submission
- ✕ Reliably detect heavily paraphrased or humanizer-processed AI text
- ✕ Distinguish AI-assisted from AI-generated (both may score similarly)
- ✕ Perform equally well across ESL and native English writers
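To make the "probabilistic evidence" point concrete, here is a minimal sketch of how an ensemble breakdown might be read. The detector names and the 0.7 cutoff are illustrative assumptions, not any specific tool's actual output format:

```python
# Minimal sketch: reading a per-detector ensemble breakdown. Detector
# names and the 0.7 cutoff are illustrative, not any tool's real API.

ELEVATED = 0.7  # assumed cutoff for calling a signal "elevated"

def elevated_signals(breakdown: dict[str, float]) -> list[str]:
    """Return the names of signals at or above the cutoff."""
    return [name for name, score in breakdown.items() if score >= ELEVATED]

sample = {"perplexity": 0.82, "burstiness": 0.74, "semantic": 0.31}
print(elevated_signals(sample))  # ['perplexity', 'burstiness']
# Two statistical signals elevated while the semantic signal stays low:
# evidence worth investigating, not proof of AI use.
```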
False positive risk by student population
False positive rates are not uniform across student populations. Faculty working with specific student groups should calibrate their detection thresholds accordingly (a code sketch of such calibration follows the profiles below):
ESL / international students
High FP risk: Textbook-correct formal English from non-native speakers closely matches AI output patterns. Independent studies have published false positive rates as high as 61% for ESL essays. Use higher thresholds and weight evidence of the student's own voice more heavily.
STEM and pre-med students
Medium-high FP risk: Technical writing with consistent terminology and structured argumentation scores high on statistical detectors. Lab reports and methods sections are particularly susceptible. Per-detector breakdowns help: high pattern and statistical scores with a low semantic score are more consistent with a false positive.
Law and policy students
Medium-high FP risk: Legal writing uses boilerplate language, standardized hedges, and formal clause structures that are inherently low-perplexity. Brief and memo formats are distinctive genres that some detectors have not been specifically trained on.
Graduate students
Medium FP risk: Advanced academic writers have internalized formal conventions that overlap with AI patterns. Thesis-level work provides longer text samples, which improves detection accuracy, but the writing style remains formally constrained.
Native English undergrads (casual register)
Low FP risk: Personal essays, reflective writing, and informal analyses with idiosyncratic voice produce low false positive rates. Detection is most reliable and most fair in this category.
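Here is the promised sketch of population-adjusted calibration. Every number is an illustrative placeholder, not a validated threshold; the false-positive rule encodes the "high pattern and statistical, low semantic" heuristic from the STEM entry above:

```python
# Sketch of population-adjusted thresholds plus the false-positive
# pattern heuristic. All numbers are placeholders, not calibrations.

THRESHOLDS = {
    "esl": 0.90,             # highest bar: published FP rates up to 61%
    "stem": 0.85,
    "law_policy": 0.85,
    "graduate": 0.80,
    "undergrad_casual": 0.70,
}

def worth_investigating(composite: float, breakdown: dict[str, float],
                        population: str) -> bool:
    """Apply the population threshold, then the FP-pattern check."""
    if composite < THRESHOLDS[population]:
        return False
    # High pattern and statistical scores with a low semantic score is
    # more consistent with a false positive than with AI generation.
    likely_false_positive = (breakdown["pattern"] > 0.8
                             and breakdown["statistical"] > 0.8
                             and breakdown["semantic"] < 0.4)
    return not likely_false_positive

print(worth_investigating(
    0.91, {"pattern": 0.95, "statistical": 0.90, "semantic": 0.30}, "stem"))
# False: the per-detector pattern undercuts the high composite score
```

The point is the shape of the logic, not the numbers: a high composite score is a trigger for scrutiny that the per-detector pattern can then strengthen or undercut.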
Building a fair response process
A high AI detection score should trigger a process, not a penalty. The following framework reflects best practices from institutions that have formalized their AI integrity policies:
Step 1: Run a second detector
If Turnitin or your institutional tool returns a high score, run the same text through Airno for a multi-signal breakdown. A high score on all detectors is more significant than a high score on one. If the semantic model (DeBERTa-v3) specifically is elevated, that is harder to explain as a false positive than a statistical-only flag.
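As a sketch of this corroboration logic (the 0.8 and 0.7 cutoffs and the signal names are assumptions for illustration, not any tool's documented behavior):

```python
# Sketch of Step 1's corroboration rule. The 0.8/0.7 cutoffs and the
# signal names are assumptions, not any tool's documented behavior.

def corroborated(first_tool_score: float,
                 second_breakdown: dict[str, float]) -> str:
    if first_tool_score < 0.8:
        return "no flag to corroborate"
    if all(score >= 0.7 for score in second_breakdown.values()):
        return "elevated on all signals: proceed to Step 2"
    if second_breakdown.get("semantic", 0.0) >= 0.7:
        return "semantic signal elevated: hard to explain as a false positive"
    return "partial agreement: treat the first flag as weak evidence"

print(corroborated(0.92, {"pattern": 0.90, "statistical": 0.85,
                          "semantic": 0.40}))
# partial agreement: treat the first flag as weak evidence
```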
Step 2: Examine the submission holistically
Compare the writing to the student's other work from the course. Look for: voice inconsistency, unusual sophistication for the student's level, absence of their documented weaknesses, suspiciously clean grammar for a student who typically struggles, and formatting that does not match their usual style.
Step 3: Request a conversation
Ask the student to discuss the paper in a meeting. Prepare 3-4 specific questions about claims, choices, or arguments in the submission that are not answerable from the text alone. A student who wrote the paper can usually explain it; a student who submitted AI output typically cannot discuss it with any depth.
Step 4: Request a follow-up writing sample
For high-stakes cases, ask the student to write a short in-class response to a specific aspect of their submission topic. In-class writing under time pressure is much harder to fake. Significant style divergence between the submission and the in-class sample is meaningful evidence.
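For a rough sense of what "style divergence" can mean in measurable terms, the sketch below compares two crude proxies: average sentence length and vocabulary richness. This is an illustration, not a forensic method, and any cutoff applied to its output is an arbitrary assumption:

```python
# Crude stylometric comparison between a submission and an in-class
# sample. Sentence length and type-token ratio are rough proxies only.
import re

def style_profile(text: str) -> tuple[float, float]:
    """Return (average sentence length in words, type-token ratio)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-z']+", text.lower())
    avg_sentence_len = len(words) / max(len(sentences), 1)
    type_token_ratio = len(set(words)) / max(len(words), 1)
    return avg_sentence_len, type_token_ratio

def style_divergence(submission: str, in_class_sample: str) -> float:
    """Largest relative difference across the two proxy measures."""
    len_a, ttr_a = style_profile(submission)
    len_b, ttr_b = style_profile(in_class_sample)
    return max(abs(len_a - len_b) / max(len_a, len_b, 1e-9),
               abs(ttr_a - ttr_b) / max(ttr_a, ttr_b, 1e-9))

# A large divergence (say, above 0.3, an arbitrary cutoff) supports,
# but never proves, the concern raised by the detector score.
```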
Step 5: Document and escalate only with multiple supporting signals
A detection score alone is not sufficient for academic integrity referral at most institutions. Document: the detector score and breakdown, the voice comparison, the discussion outcomes, and the in-class writing result. Refer only when multiple independent signals converge.
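A minimal sketch of the convergence rule, assuming four evidence types and a three-signal minimum that you should replace with your institution's actual standard:

```python
# Sketch of Step 5's documentation and convergence rule. The field
# names and the three-signal minimum are assumptions, not policy.
from dataclasses import dataclass

@dataclass
class CaseRecord:
    detector_flagged: bool         # score plus per-detector breakdown
    voice_inconsistent: bool       # comparison with prior coursework
    discussion_unconvincing: bool  # outcome of the Step 3 meeting
    in_class_diverged: bool        # Step 4 writing sample result

    def should_refer(self) -> bool:
        signals = [self.detector_flagged, self.voice_inconsistent,
                   self.discussion_unconvincing, self.in_class_diverged]
        # The detector score alone never suffices; require convergence.
        return self.detector_flagged and sum(signals) >= 3

print(CaseRecord(True, True, False, True).should_refer())  # True
```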
Policy language that works
Course and syllabus policies that have held up well under challenge tend to share certain characteristics:
Specific about what AI use is and is not permitted
Works well: 'AI tools may be used for research, brainstorming, and grammar checking. Submitting AI-generated text as your own written work is prohibited.'
Avoid: Vague language like 'Use of AI is not permitted,' which fails to account for legitimate AI-assisted research.
Disclosure-based where appropriate
Works well: 'If you used AI assistance in producing this work, disclose how in a brief note at the end. Failure to disclose AI use when it occurred is itself a violation.'
Avoid: Blanket prohibitions that make ESL students reluctant to use clearly legitimate grammar-correction tools.
Clear about consequences
Works well: 'Submissions flagged for investigation under this policy will be reviewed according to the department academic integrity procedure, which may result in a zero for the assignment or referral to the Dean.'
Avoid: Vague statements like 'consequences will follow.'
Redesigning assignments for AI resistance
Detection is reactive. Assignment design is proactive. Assignments that are genuinely difficult for AI to complete as assigned reduce detection workload and produce better student work:
- Require engagement with course-specific materials not widely available online (unpublished readings, classroom discussions, lab data)
- Assign personal reflection or narrative components that require specific autobiographical content
- Use iterative assignments where each submission builds on written feedback from the previous one (the AI cannot fake the progression)
- Include a brief in-class discussion or presentation component for major written assignments
- Ask students to submit process artifacts alongside final drafts: outlines, annotated sources, revision notes
The detection tool selection question
For individual faculty use, Airno is free and provides the per-detector breakdown that makes it possible to assess whether a high score is likely a false positive. Students worried about false positives can use the same tool to check their own work before it reaches an instructor.
For institutional deployment with plagiarism detection integration, Turnitin remains the most common choice. Its AI detection is less granular but integrates directly into submission workflows. See Can Turnitin Detect ChatGPT? for a detailed breakdown of its performance and limitations.
For a complete tool comparison, see Best AI Detectors 2026. For specific guidance on the false positive problem that disproportionately affects ESL students, see AI Detection False Positives.
Investigate a submission before you meet with the student
A full eight-detector breakdown shows which signals are elevated and helps distinguish likely false positives from likely AI content. Free, no account needed.