Publishers must not feed the machine munching through the academy

Allowing Big Tech to train AIs on academic output will only exacerbate the threat posed to teaching and research, says Martyn Hammersley

September 26, 2024

Martyn Hammersley

Informa's controversial deal allowing academic articles and books to be used to train Microsoft's AI systems raises questions about academic publishers' responsibilities and their relationships with authors. And those questions are only likely to become more salient as, in defiance of objections from authors, publishers press ahead with further similar deals.

According to Informa, which owns Taylor and Francis, Routledge and other academic imprints, the deal "will extend the use of AI within our business and underlines the unique value of our Intellectual Property"; its "total AI partnership revenues" are "over $75m in 2024". We should not be surprised by this desire to further exploit the academic material the company controls, I suppose. But how does this deal square with Informa's claim that its responsibilities to academic authors are central?

Large language models (LLMs) are already munching through the academy in various ways. Most obviously, they are causing considerable difficulties in the assessment of student work. An essay produced with the help of an LLM says much more about the software's capabilities than about those of the student. Improving the performance of LLMs will make that problem worse because it will be even harder to distinguish bot-written essays from human-written ones. Perhaps degrees should be awarded to the software developers rather than to the students in future?

Of course, much effort is currently being devoted to finding modes of assessment that avoid the problem and to educating students and academics in how to employ the technology responsibly in teaching and learning. There are even those who view the role of AI positively. However, this often seems to be a matter of simply accepting what is regarded as inevitable; such optimism is hard to square with what is actually happening at ground level.

Similar issues arise in the context of research, with increasing discussion of how LLMs are being – and could be – used to produce journal articles and books. Here, interesting issues arise about the relationship between enquiry and writing. Some social scientists have long argued that these are more or less equivalent: that, as sociologist Laurel Richardson put it many years ago, "writing is a method of inquiry". If that is true, perhaps AI can simply take over, especially in the humanities and social sciences – if these are "talking sciences", as another sociologist, Harold Garfinkel, once claimed, on the grounds that their practitioners are engaged in simply "shoving words around".

But while shoving words around may be a fair description of too much published research in those fields, it is far from universally true. And, even if it were, we might ask whether AI programs can shove words around as effectively as humans, to develop new empirical analyses and theories. Do LLMs not merely reorder and reformulate what they have munched their way through? They may be able to summarise an article effectively, but can they produce an insightful critique of it? This is surely essential if knowledge develops through criticism, as Popper and others have argued.

Perhaps we ought not to dismiss so quickly the ability of AI ever to become genuinely creative. Might the writing really be on the wall for researchers, in some fields at least? But it must be asked: should an academic publisher be accelerating this process?

Another issue concerns the fact that Informa did not even tell authors about the deal, never mind consult them on it: it was first reported (somewhat cryptically) in a market-focused update in May, and was then picked up by several news outlets. What does this tell us about the attitudes of large publishers? The implication is that academic authors are merely content providers and that companies have a free hand to do whatever they wish with that content. In other words, what is involved is simply a market relationship that is to be exploited as effectively as possible.

Finally, there is the question of whether Informa is legally entitled to use academic material in this way. That could be true as regards journal articles, where authors have been forced to sign away their copyright. The case of books, particularly those published before the development of LLMs, is less clear. According to Informa, since even early contracts give it rights to publish, sell, distribute and license the published content, this covers the proposed new use. However, whether that is the case could probably only be decided in court.

As for the suggestion that authors will receive enhanced royalties, it is not clear how this would occur or who would gain. Either way, the key question remains: why would improving the performance of LLMs be regarded as desirable from an academic point of view?

This software can perhaps serve as a labour-saving tool, but are its benefits worth the problems it causes? And who bears those costs, and who reaps the benefits? In the case of deals with Big Tech to allow LLM training, I suggest that the answers to those questions are obvious.

Martyn Hammersley is emeritus professor of educational and social research at the Open University.

Readers' comments (3)

#1 Submitted by anonanon_1 on September 27, 2024 - 4:02pm

So, Microsoft and other companies steal copyrighted material to feed their LLMs, and the response of the publishers Informa and Sage is to demand payment for this, rather than preventing it - irrespective of the academic consequences?

#2 Submitted by DocStock on October 10, 2024 - 8:52am

But they aren't "stealing" this material. You guys transferred copyright to the publishers in order to get published. That was the Faustian bargain that academia made and y'all are now paying for it. I guess maybe researchers should have listened when all the open access activists called for us to boycott publishers decades ago? Ah, well. Too late now.

#3 Submitted by anonanon_1 on November 11, 2024 - 8:27pm

Copyright was only transferred on journal articles, not books, though newer book contracts have clauses that allow publishers to make deals like this, or so they claim. Open access would hardly solve the problem!
