
Since ChatGPT launched in late 2022, students have been among its most avid adopters. When the rapid growth in users stalled in the late spring of ’23, it briefly looked like the AI bubble might be popping, but growth resumed that September; the cause of the stall was simply summer break. Even as other kinds of organizations struggle to use a tool that can be strikingly powerful and surprisingly inept in turn, AI’s utility to students asked to produce 1,500 words on Hamlet or the Great Leap Forward was immediately obvious, and it is the source of the current campaigns by OpenAI and others to offer student discounts as a form of customer acquisition.
Every year, 15 million or so undergraduates in the United States produce papers and exams running to billions of words. While the output of any given course is student assignments — papers, exams, research projects, and so on — the product of that course is student experience. “Learning results from what the student does and thinks,” as the great educational theorist Herbert Simon once noted, “and only as a result of what the student does and thinks.” The assignment itself is a MacGuffin, with the shelf life of sour cream and an economic value that rounds to zero dollars. It is valuable only as a way to compel student effort and thought.
The utility of written assignments relies on two assumptions: The first is that to write about something, the student has to understand the subject and organize their thoughts. The second is that grading student writing amounts to assessing the effort and thought that went into it. At the end of 2022, the logic of this proposition — never ironclad — began to fall apart completely. The writing a student produces and the experience they have can now be decoupled as easily as typing a prompt, which means that grading student writing might now be unrelated to assessing what the student has learned to comprehend or express.
Generative AI can be useful for learning. These tools are good at creating explanations for difficult concepts, practice quizzes, study guides, and so on. Students can write a paper and ask for feedback on diction, or see what a rewrite at various reading levels looks like, or request a summary to check if their meaning is clear. Engaged uses have been visible since ChatGPT launched, side by side with the lazy ones. But the fact that AI might help students learn is no guarantee it will help them learn.
After observing that student action and thought is the only possible source of learning, Simon concluded, “The teacher can advance learning only by influencing the student to learn.” Faced with generative AI in our classrooms, the obvious response for us is to influence students to adopt the helpful uses of AI while persuading them to avoid the harmful ones. Our problem is that we don’t know how to do that.
I am an administrator at New York University, responsible for helping faculty adapt to digital tools. Since the arrival of generative AI, I have spent much of the last two years talking with professors and students to try to understand what is going on in their classrooms. In those conversations, faculty have been variously vexed, curious, angry, or excited about AI, but as last year was winding down, for the first time one of the frequently expressed emotions was sadness. This came from faculty who were, by their account, adopting the strategies my colleagues and I have recommended: emphasizing the connection between effort and learning, responding to AI-generated work by offering a second chance rather than simply grading down, and so on. Those faculty were telling us our recommended strategies were not working as well as we’d hoped, and they were saying it with real distress.
Earlier this semester, an NYU professor told me how he had AI-proofed his assignments, only to have the students complain that the work was too hard. When he told them those were standard assignments, just worded so current AI would fail to answer them, they said he was interfering with their “learning styles.” A student asked for an extension, on the grounds that ChatGPT was down the day the assignment was due. Another said, about work on a problem set, “You’re asking me to go from point A to point B, why wouldn’t I use a car to get there?” And another, when asked about their largely AI-written work, replied “Everyone is doing it.” Those are stories from a 15-minute conversation with a single professor.
We are also hearing a growing sense of sadness from our students about AI use. One of my colleagues reports students being “deeply conflicted” about AI use: many originally adopted it as an aid to studying but keep using it with a mix of justification and unease. Some observations she’s collected:
- “I’ve become lazier. AI makes reading easier, but it slowly causes my brain to lose the ability to think critically or understand every word.”
- “I feel like I rely too much on AI, and it has taken creativity away from me.”
- On using AI summaries: “Sometimes I don’t even understand what the text is trying to tell me. Sometimes it’s too much text in a short period of time, and sometimes I’m just not interested in the text.”
- “Yeah, it’s helpful, but I’m scared that someday we’ll prefer to read only AI summaries rather than our own, and we’ll become very dependent on AI.”
Much of what’s driving student adoption is anxiety. In addition to the ordinary worries about academic performance, students feel time pressure from jobs, internships, or extracurriculars, and anxiety about GPA and transcripts for employers. It is difficult to say “Here is a tool that can basically complete assignments for you, thus reducing anxiety and saving you 10 hours of work without eviscerating your GPA. By the way, don’t use it that way.” But for assignments to be meaningful, that sort of student self-restraint is critical.
Self-restraint is also, on present evidence, not universally distributed. Last November, a Reddit post appeared in r/nyu, under the heading “Can’t stop using Chat GPT on HW.” (The poster’s history is consistent with their being an NYU undergraduate as claimed.) The post read:
I literally can’t even go 10 seconds without using Chat when I am doing my assignments. I hate what I have become because I know I am learning NOTHING, but I am too far behind now to get by without using it. I need help, my motivation is gone. I am a senior and I am going to graduate with no retained knowledge from my major.
Given these and many similar observations in the last several months, I’ve realized many of us working on AI in the classroom have made a collective mistake, believing that lazy and engaged uses lie on a spectrum, and that moving our students toward engaged uses would also move them away from the lazy ones.
Faculty and students have been telling me that this is not true, or at least not true enough. Instead of a spectrum, uses of AI are independent options. A student can take an engaged approach to one assignment, a lazy approach on another, and a mix of engaged and lazy on a third. Good uses of AI do not automatically dissuade students from also adopting bad ones; an instructor can introduce AI for essay feedback or test prep without that stopping their student from also using it to write most of their assignments.
Our problem is that we have two problems. One is figuring out how to encourage our students to adopt creative and helpful uses of AI. The other is figuring out how to discourage them from adopting lazy and harmful uses. Those are both important, but the second one is harder.
It is easy to explain to students that offloading an assignment to ChatGPT creates no more benefit for their intellect than moving a barbell with a forklift does for their strength. We have been alert to this issue since late 2022, and students have consistently reported understanding that some uses of AI are harmful. Yet forgoing easy shortcuts has proven to be as difficult as following a workout routine, and for the same reason: The human mind is incredibly adept at rationalizing pleasurable but unhelpful behavior.
Using these tools can certainly make it feel like you are learning. In her explanatory video “AI can do your homework. Now what?,” the documentarian Joss Fong describes it this way:
Education researchers have this term “desirable difficulties,” which describes this kind of effortful participation that really works but also kind of hurts. And the risk with AI is that we might not preserve that effort, especially because we already tend to misinterpret a little bit of struggling as a signal that we’re not learning.
This preference for the feeling of fluency over desirable difficulties was identified long before generative AI. It’s why students regularly report they learn more from well-delivered lectures than from active learning, even though we know from many studies that the opposite is true. One recent paper was evocatively titled “Measuring actual learning versus feeling of learning.” Another concludes that instructor fluency increases perceptions of learning without increasing actual learning.
This is a version of the debate we had when electronic calculators first became widely available in the 1970s. Though many people present calculator use as unproblematic, K-12 teachers still ban them when students are learning arithmetic. One study suggests that students use calculators as a way of circumventing the need to understand a mathematics problem (i.e., the same thing you and I use them for). In another experiment, when using a calculator programmed to “lie,” four in 10 students simply accepted the result that a woman born in 1945 was 114 in 1994. Johns Hopkins students with heavy calculator use in K-12 had worse math grades in college, and many claims about the positive effect of calculators take improved test scores as evidence, which is like concluding that someone can run faster if you give them a car. Calculators obviously have their uses, but we should not pretend that overreliance on them does not damage number sense, as everyone who has ever typed 7 x 8 into a calculator intuitively understands.
Studies of cognitive bias with AI use are starting to show similar patterns. A 2024 study with the blunt title “Generative AI Can Harm Learning” found that “access to GPT-4 significantly improves performance … However, we additionally find that when access is subsequently taken away, students actually perform worse than those who never had access.” Another found that students who have access to a large language model overestimate how much they have learned. A 2025 study from Carnegie Mellon University and Microsoft Research concludes that higher confidence in GenAI is associated with less critical thinking. As with calculators, there will be many tasks where automation is more important than user comprehension, but for student work, a tool that improves the output but degrades the experience is a bad tradeoff.
In 1980 the philosopher John Searle, writing about AI debates at the time, proposed a thought experiment called “The Chinese Room.” Searle imagined an English speaker with no knowledge of the Chinese language sitting in a room with an elaborate set of instructions, in English, for looking up one set of Chinese characters and finding a second set associated with the first. When a piece of paper with words in Chinese written on it slides under the door, the room’s occupant looks it up, draws the corresponding characters on another piece of paper, and slides that back. Unbeknownst to the room’s occupant, Chinese speakers on the other side of the door are slipping questions into the room, and the pieces of paper that slide back out are answers in perfect Chinese. With this imaginary setup, Searle asked whether the room’s occupant actually knows how to read and write Chinese. His answer was an unequivocal no.
When Searle proposed that thought experiment, no working AI could approximate that behavior; the paper was written to highlight the theoretical difference between acting with intent versus merely following instructions. Now it has become just another use of actually existing artificial intelligence, one that can destroy a student’s education.
The recent case of William A., as he was known in court documents, illustrates the threat. William was a student in Tennessee’s Clarksville-Montgomery County School System who struggled to learn to read. (He would eventually be diagnosed with dyslexia.) As is required under the Individuals with Disabilities Education Act, William was given an individualized educational plan by the school system, designed to provide a “free appropriate public education” that takes a student’s disabilities into account. As William progressed through school, his educational plan was adjusted, allowing him additional time plus permission to use technology to complete his assignments. He graduated in 2024 with a 3.4 GPA and an inability to read. He could not even spell his own name.
To complete written assignments, as described in the court proceedings, “William would first dictate his topic into a document using speech-to-text software”:
He then would paste the written words into an AI software like ChatGPT. Next, the AI software would generate a paper on that topic, which William would paste back into his own document. Finally, William would run that paper through another software program like Grammarly, so that it reflected an appropriate writing style.
This process is recognizably a practical version of the Chinese Room for translating between speaking and writing. That is how a kid can get through high school with a B+ average and near-total illiteracy.
A local court found that the school system had violated the Individuals with Disabilities Education Act and ordered it to provide William with hundreds of hours of compensatory tutoring. The county appealed, maintaining that since William could follow instructions to produce the requested output, he’d been given an acceptable substitute for knowing how to read and write. On February 3, an appellate judge handed down a decision affirming the original judgment: William’s schools failed him by concentrating on whether he had completed his assignments, rather than whether he’d learned from them.
Searle took it as axiomatic that the occupant of the Chinese Room could neither read nor write Chinese; following instructions did not substitute for comprehension. The appellate court judge similarly ruled that William A. had not learned to read or write English: Cutting and pasting from ChatGPT did not substitute for literacy. And what I and many of my colleagues worry is that we are allowing our students to build custom Chinese Rooms for themselves, one assignment at a time.