by TARA GARCIA MATHEWSON

Stanford study found AI detectors are biased against non-native English speakers
Taylor Hahn, who teaches at Johns Hopkins University, got an alert this past spring while grading a student paper for a communications course. He had uploaded the assignment to Turnitin, software used by more than 16,000 academic institutions across the globe to spot plagiarized text and, since April, to flag AI-generated writing.
Turnitin labeled more than 90 percent of the student’s paper as AI-generated. Hahn set up a Zoom meeting with the student and explained the finding, asking to see notes and other materials used to write the paper.
“This student, immediately, without prior notice that this was an AI concern, they showed me drafts, PDFs with highlighter over them,” Hahn said. He was convinced Turnitin’s tool had made a mistake.
In another case, Hahn worked directly with a student on an outline and drafts of a paper, only to have the majority of the submitted paper flagged by Turnitin as AI-generated.
Over the course of the spring semester, Hahn noticed a pattern of these false positives. Turnitin’s tool was much more likely to flag international students’ writing as AI-generated.

As Hahn started to see this trend, a group of Stanford computer scientists designed an experiment to better understand the reliability of AI detectors on writing by non-native English speakers. They published a paper last month, finding a clear bias. While they didn’t run their experiment with Turnitin, they found that seven other AI detectors flagged writing by non-native speakers as AI-generated 61 percent of the time. On about 20 percent of papers, that incorrect assessment was unanimous. Meanwhile, the detectors almost never made such mistakes when assessing the writing of native English speakers.
Read more at The Markup.