Monday, January 9, 2023

Conclusion: Inconclusive

I finally had a few minutes to play around with Edward Tian’s AI-written text detector (explained here). The results leave me satisfied that, under the right conditions, AI-written text can be detected.

But I received enough errors from the analyzer to surmise that Tian and those like him have a way to go before the detection is reliable enough to be useful.

I tried Tian’s detector with five different bits of text:

1. Text generated by AI, which I copied and pasted into a Word document (seen here) and rearranged slightly to make a coherent essay. By “rearranged,” I mean I re-ordered the paragraphs; I did not alter the text at all.

2. A smaller portion of the same AI-generated text.

3. A portion of an essay I wrote a few years ago on the same subject I tasked the AI to write about.

4. An excerpt from a short story I wrote earlier this week.

5. An excerpt from an essay submitted by one of my English students last semester.

Here are the results:

1. Tian’s GPTZero returned a cryptic error (see sample below).

2. Successfully identified the text as AI-generated.

3. Successfully identified the text as human-generated.

4. Successfully identified the text as human-generated.

5. Returned a cryptic error, similar to the sample already mentioned.

I suspect one source of the errors could be metadata carried over by the copy/paste from Microsoft Word, though I got the errors both when I pasted normally and when I pasted with the plain text option.

I’d like to keep conducting experiments with this website, using text I’m unfamiliar with, and certainly using text where I’m unsure whether it is AI-generated or not. I’ll keep playing with this to see if the error rate goes down, or if I can figure out what is causing the errors. At the very least, less-cryptic error messages would be beneficial.

And with only two tests to differentiate between human and AI, there are likely ways for those who want to cheat using AI to game the detector. Of the two, “perplexity,” as explained in the linked article, is a bit cryptic; I’m not exactly sure what AI finds perplexing. And “burstiness,” or variation in the complexity and length of sentences, seems easy to game, though I suspect it would have to be humans doing the gaming rather than AI, at least for the time being.
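To make the two measures a little more concrete, here is a minimal Python sketch. It is not GPTZero’s actual method, which I have not seen. It assumes “burstiness” can be approximated by the spread of sentence lengths (in words), and it uses the textbook definition of perplexity: the exponential of the average negative log-probability a language model assigns to each token. The function names and the list of token probabilities are hypothetical; a real detector would get those probabilities from a language model.

```python
import math
import re
import statistics


def sentence_lengths(text):
    """Split text on ., ! or ? and return the word count of each sentence."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    return [len(s.split()) for s in sentences]


def burstiness(text):
    """Assumed proxy: population standard deviation of sentence lengths."""
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return 0.0  # a single sentence has no variation to measure
    return statistics.pstdev(lengths)


def perplexity(token_probs):
    """Exponential of the average negative log-probability per token.

    token_probs is a hypothetical list of probabilities (each > 0) that
    some language model assigned to the tokens of a text.
    """
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log)


if __name__ == "__main__":
    uniform = "One two three. One two three. One two three."
    varied = "Go. I went to the store and bought some milk."
    print(burstiness(uniform), burstiness(varied))
    print(perplexity([0.5, 0.5]))
```

Under these assumptions, text whose sentences are all the same length scores zero on burstiness, and text the model finds predictable (high token probabilities) scores low on perplexity. That would explain why simply varying sentence length by hand looks like an easy way to game the first measure.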

While I’m interested to see where this technology goes (both in AI-generated text and in AI-text detection), I’m more interested in the general reaction I’m seeing locally from English teachers. The consensus seems to be that we’ll eventually be able to detect AI-written text on a consistent basis, and that nothing we’re doing in our courses should change to counter AI text development.

While cheating cheaters will always cheat, I’ve seen from experience that students get more out of writing assignments they’re invested in than out of just another run-of-the-mill ivory tower essay. But that’s just me using new technological developments to drag out the same ol’ soapbox, so maybe take what I say with a grain of salt.

