Invitation to cheat: AI outperforms law professors in Stanford study

Law professors overwhelmingly preferred AI-generated answers over those written by fellow law professors – and flagged the AI answers as potentially misleading or harmful far less often, a study conducted by Stanford University has found.

Led by Stanford Law School Professor Julian Nyarko, the study, entitled Law Professors Prefer AI Over Peer Answers, was conducted with 16 law professors across US law schools and tested whether large language models could serve as effective tutors for contract law courses. In a blind evaluation of nearly 3,000 anonymized comparisons, professors rated AI responses significantly higher than answers written by other professors, with AI winning 75 per cent of head-to-head matchups.

“This study challenges important assumptions about AI’s role in legal education,” said Professor Nyarko, who co-authored the paper with academics from Yale, NYU, University of Chicago, and other institutions.

“We focused on law precisely because it requires judgment, nuanced reasoning, and the ability to navigate ambiguity – not just factual recall.”

The study is particularly notable because previous AI evaluations have focused primarily on subjects with clear right-or-wrong answers. Legal reasoning, by contrast, demands careful analysis of competing arguments and defensible conclusions.

“We were frankly surprised by the magnitude of the results,” Professor Nyarko added. “These weren’t just simple questions with obvious answers. Many of them required synthesizing complex material, applying it to new situations, and explaining legal concepts in ways that would help students develop their own analytical skills.”

Participants created 40 representative contract law questions that students might ask after class or during office hours, wrote their own answers, and then evaluated responses without knowing whether they came from AI or other participating professors. The AI systems performed comparably to the best human instructor in the study.

Perhaps most striking: professors flagged AI responses as pedagogically harmful only 3.5 per cent of the time, compared to 12 per cent for peer-written answers.

“In most fields where AI gets tested, there’s a right answer. In law, there often isn’t,” said Sarath Sanga, co-author and a professor at Yale Law School.

“Two opposing arguments can both be good. What we wanted to know is whether AI can meet the latent professional standard that lawyers use to evaluate each other’s arguments. In this case, the answer was yes.”

The research team took “extensive precautions” to ensure the study’s validity. They calibrated AI responses to match the length and structure of human answers, used multiple evaluation methods, and had professors assess whether responses might mislead or confuse students.

Alejandro Salinas, first author of the study and a researcher at Nyarko’s liftlab, emphasised the educational implications.

He said: “Our study shifts attention to what AI tutoring can contribute to learning in judgment-rich fields like law. We find that, when evaluated by legal educators, AI tutors can offer high-quality, on-demand support that complements classroom instruction, and may broaden access to expert guidance.”

The study also examined specific AI models, including commercial tutoring systems and Google’s NotebookLM, finding varying levels of performance. However, even when context limitations affected AI responses, professors still frequently preferred them to human-written alternatives.
