Researchers have launched an ambitious new metric to rate the reproducibility of a scientific paper, hoping to discourage academics from publishing shaky but eye-catching results in an effort to rack up citations and advance their careers.
The “R-factor” would rate a scientific claim from zero to one, measured by looking at subsequent papers that cite the claim and dividing the number of those that verify it by the number that attempt to do so.
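The calculation described is a simple ratio over the subset of citing papers that actually try to test the claim. A minimal sketch in Python (the function name and outcome labels are illustrative, not taken from the proposal):

```python
def r_factor(citing_outcomes):
    """Compute an R-factor from a list of citing papers' outcomes.

    Each outcome is 'confirm', 'refute', or 'mention'. Only papers that
    attempt to test the claim ('confirm' or 'refute') enter the
    calculation -- citations that merely mention it are ignored.
    """
    attempts = [o for o in citing_outcomes if o in ("confirm", "refute")]
    if not attempts:
        return None  # no one has tried to reproduce the claim yet
    return attempts.count("confirm") / len(attempts)

# Example: of 8 citing papers, 4 merely mention the claim,
# 2 confirm it and 2 fail to confirm it, so R = 2 / 4 = 0.5
outcomes = ["mention"] * 4 + ["confirm", "confirm", "refute", "refute"]
print(r_factor(outcomes))  # 0.5
```

The hard part in practice, as the article goes on to note, is classifying each citing paper into those categories in the first place.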
It is a direct challenge to other metrics, like the widely criticised journal impact factor, that are simply based on how many citations a paper gets, regardless of whether they are positive or negative.
Josh Nicholson, one of the inventors of the metric and chief research officer at Authorea, a New York-based research writing software company, said that attaching an R-factor to a paper should give researchers “a bit of pause before they actually publish stuff” because the metric will rise or fall depending on whether later work corroborates their findings.
A flashy finding published in a prestigious journal could end up as an albatross around the neck of its author if its R-factor turns out to be low, for example.
The average R-factor of all a researcher’s claims would be a bit like their “batting average in baseball”, according to a paper outlining the idea, published on the bioRxiv preprint server.
In this paper, the authors calculate the R-factor of three prominent studies, and they find that one – which was published in Science and has gained more than 400 citations – has an R-factor of 0.5, giving a “warning that the claim might be untrue”.
The idea has attracted criticism. A blogger writing under a pseudonym has argued that the measure is “simplistic” because it gives all subsequent studies equal weight, even if some have much bigger sample sizes than others.
Dr Nicholson said that the metric was indeed “very simple”, but this was “a strength of it”. The journal impact factor had caught on because “it’s so easy to understand”, he said, and a rival metric that was too complex would not be adopted.
The R-factor also includes, as a subscript, the total number of studies used in the calculation. This is because the score alone hides how much evidence lies behind it: a paper could achieve a perfect score on the strength of 100 confirming studies, or only 10 – or even just one.
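The subscript convention could be rendered as follows, assuming (as the text suggests) that the subscript is simply the count of attempted replications; the function name is hypothetical:

```python
def r_factor_with_n(confirming, refuting):
    """Return the R-factor together with the number of attempts (the subscript)."""
    n = confirming + refuting
    if n == 0:
        return None  # undefined until someone attempts a replication
    return confirming / n, n

# A perfect score of 1.0 can rest on very different amounts of evidence:
print(r_factor_with_n(100, 0))  # (1.0, 100) -- well supported
print(r_factor_with_n(1, 0))    # (1.0, 1)   -- a single attempt
```

Reporting the pair rather than the bare ratio is what lets a reader distinguish those two cases.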
Yuri Lazebnik, another member of the team hoping to advance the metric, said that the R-factor could incentivise scientists to publish so-called “negative” results that fail to confirm a hypothesis, because “if you have a negative result, and you know that at the very least it will contribute to lowering the R-factor of this paper from a competing lab … it might be a really good motivation [to publish negative results]”.
But one major challenge is categorising whether papers support, refute or merely cite a previous claim, a huge task that potentially involves reading and interpreting hundreds or even thousands of articles to get the R-factor of a single paper.
So far, the team has done this manually with about 12,000 papers, and it has now launched a website where members of the public can help out. The aim is to gather enough examples to train a machine learning system to do the categorisation automatically. At present, a computer can accomplish about half the work, leaving humans to do the rest, but it should improve over time.