Thursday, January 19, 2012

The Problem of 'Overwhelm'

I wrote my previous post, Holy Cow, Peer Review, as a provocative one, of course. The reason why I think the matter nevertheless requires attention is that I see a problem looming: the problem of what I call 'overwhelm'. It looms on several levels. First of all, the capacity of the pool of reviewers able and willing to deal with the growing manuscript flow – several times over, given that many articles 'cascade' down the journal pecking order and need to be peer-reviewed at every stage – is reaching its limits. Secondly, and arguably more importantly, it is becoming ever more difficult for researchers to read all that they might want to read. In many areas there is simply too much that's relevant.

In 1995 I wrote an article entitled "Keeping the Minutes of Science". I still take that phrase fairly literally. Research articles in journals are an account, for the record, of the research work done and the results achieved. Thinking along those lines (I realise that not everybody agrees with me) leads to a few consequences. 1) Articles are not primarily written for the reader-researcher; they are mainly an obligation for the author-researcher (though the two groups overlap to a very large degree). The adage is 'publish-or-perish', remember? It's not 'read-or-rot'. 2) As a result, it is only proper that payment for publication comes from the side of the author. That is most fortunate, because it also makes Open Access possible. 3) 'Minutes' are often filed quickly and are not meant to be read widely. Only when there is a problem are they retrieved and perused.

You will have spotted the paradox of, on the one hand, articles not being read widely and, on the other hand, the desirability of Open Access. Well, that is where the 'minutes' analogy falls down, because articles do play a role in conveying the scientific knowledge and insights gained by the authors.

However, that doesn't mean that the information and knowledge bundled up in an article can realistically be conveyed by conventional reading alone. This is where the problem of 'overwhelm' starts to bite. There is just too much to read for any normal researcher*. So, more and more, articles need to be 'read' by computers to help researchers ingest the information they contain. That is why Open Access is so important, especially the BOAI-compliant OA that is governed by licences such as CC-BY: it makes large-scale computer-aided reading possible.
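To make the idea of computer-aided 'reading' concrete, here is a toy sketch – the corpus and stopword list are invented for illustration; real text mining would run over thousands of openly licensed full texts – counting term frequencies across a set of abstracts, the kind of bulk machine processing that licences like CC-BY make legally possible at scale:

```python
from collections import Counter
import re

# Hypothetical corpus: in practice, many thousands of CC-BY
# licensed abstracts harvested from open repositories.
abstracts = [
    "Peer review of manuscripts delays publication of results.",
    "Open access publication accelerates the sharing of results.",
    "Manuscripts in open repositories reach readers without delay.",
]

def term_frequencies(texts, stopwords=frozenset({"of", "the", "in", "without"})):
    """Count how often each term occurs across the whole corpus."""
    counts = Counter()
    for text in texts:
        for term in re.findall(r"[a-z]+", text.lower()):
            if term not in stopwords:
                counts[term] += 1
    return counts

freqs = term_frequencies(abstracts)
print(freqs.most_common(3))
```

Trivial as it is, even this only works if the articles are machine-readable and licensed for bulk processing; the same holds for any more sophisticated text-mining pipeline.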

The notion of using computers to read and ingest large amounts of research articles means that presentation becomes unimportant. Or rather, important in different ways. Computer-readability and interoperability come first, and visual elements such as layouts are practically irrelevant. In a comment on my Holy Cow, Peer Review post, Scott Epstein makes the point that visual presentation is a value-add that distinguishes publishers from outfits like ArXiv. The publishers' presentations are nice-to-have, of course, but far less 'need-to-have' than they seem to assume. Computers can deal very well with information presented in an ArXiv fashion, provided the computer interoperability is taken care of.

My personal contention is also that, once computers are involved, peer review is largely redundant beyond a basic level that can well be taken care of by an endorsement system. We shouldn't be so scared of potential 'rubbish' being included. Analyses of large amounts of information will most likely expose potential 'rubbish' in a way that makes it recognisable, similar to how 'outliers' are identified in scatter graphs (and ignored, if that seems the right thing to do, in the judgment of the researcher looking at the results).
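As a sketch of how such outlier-spotting might work computationally – using a simple standard-deviation rule, which is only one of many possible methods, and with invented numbers – the principle is no more mysterious than this:

```python
import statistics

def flag_outliers(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the
    mean, analogous to spotting outliers in a scatter graph."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]

# A cluster of plausible measurements with one 'rubbish' value.
measurements = [9.8, 10.1, 9.9, 10.0, 10.2, 9.7, 10.3, 58.0]
print(flag_outliers(measurements, threshold=2.0))  # [58.0]
```

The point is not the particular rule, but that the flagged value is exposed, not suppressed: the researcher looking at the results decides whether to ignore it.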

Scott also mentions 'curation'. Curation is important, but is not necessarily – or even often – done by reviewers or journal editors, in my experience. Scott seems to agree, as demonstrated by his choice of words: "journal editors also (or should) curate the content." A system that allows for (semantic) crowd-curation of scientific assertions, which subsequently could be recognised by computers in any articles in which these assertions are repeated, is likely to be a better use of available resources, with the added benefit of not relying on a small number of 'experts' but instead, using a much wider pool of potential expertise.
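As a purely hypothetical sketch – the data model and the verbatim-matching rule are entirely invented – crowd-curation could be as simple as a store of normalised assertions, each accumulating endorsements from curators, matched against the text of any article in which they are repeated:

```python
from collections import defaultdict

def normalise(assertion: str) -> str:
    """Lower-case and collapse whitespace, so trivially different
    phrasings of the same assertion coincide."""
    return " ".join(assertion.lower().split())

class AssertionStore:
    def __init__(self):
        self.endorsements = defaultdict(set)  # assertion -> curator ids

    def endorse(self, assertion, curator):
        self.endorsements[normalise(assertion)].add(curator)

    def found_in(self, article_text):
        """Return (assertion, endorsement count) for every curated
        assertion repeated verbatim in the article."""
        text = normalise(article_text)
        return [(a, len(c)) for a, c in self.endorsements.items() if a in text]

store = AssertionStore()
store.endorse("Protein X binds receptor Y", "curator-1")
store.endorse("protein x binds receptor y", "curator-2")
article = "We confirm that protein X binds receptor Y in vitro."
print(store.found_in(article))
```

A real system would need semantic rather than verbatim matching, but the shape is the same: a wide pool of curators, not a small panel of experts, and recognition by computers wherever the assertion recurs.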

I would like to finish with a few quotes from Richard Smith, former editor of the British Medical Journal and currently a board member of the Public Library of Science:
"...peer review is a faith-based rather than evidence-based process, which is hugely ironic when it is at the heart of science."

"...we should scrap prepublication peer review and concentrate on postpublication peer review, which has always been the ‘real’ peer review in that it decides whether a study matters or not. By postpublication peer review I do not mean the few published comments made on papers, but rather the whole ‘market of ideas,’ which has many participants and processes and moves like an economic market to determine the value of a paper. Prepublication peer review simply obstructs this process."
 Jan Velterop

*See "On the Impossibility of Being Expert" by Fraser and Dunstan, BMJ Vol 341, 18-25 December 2010

Wednesday, January 11, 2012

Holy Cow, Peer Review

Looking at it as dispassionately as possible, one could conclude that peer review is the only remaining significant raison d’être of formal scientific publishing in journals. Imagine that scientists, collectively, decided that sharing results is of paramount importance (a truism), but that peer review is no longer considered important. If you imagine that, the whole publishing edifice suddenly looks very different. More like ArXiv (where, by the way, I found this interesting article).

A recent report estimates that the “total revenues for the scientific, technical and medical publishing market are estimated to rise by 15.8% over the next three years – from $26bn in 2011 to just over $30bn in 2014.” If we assume an annual output of 1 million articles, this revenue – which, for practical purposes, equals the cost to science of access to research publications – equates to a cost of $30,000 per article, and even if the output is 1.5 million articles, it’s still $20,000 per article.

So the real question is: is peer review worth that much? It’s not that peer review has no benefits at all. At issue is the cost to science of such benefits as there may be. And although post-publication peer review could easily be done by those who feel the inclination to do so, when and where it seems worth the effort, it may not happen very often, of course, as there are few incentives. Isn't an endorsement system a viable alternative?
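The endorsement idea can be stated very compactly. The sketch below is a toy gate, not arXiv's actual policy (its real endorsement rules differ, and the names and quorum value here are invented): a manuscript may be posted once enough already-established researchers vouch for it.

```python
# Toy endorsement gate: a manuscript may be posted once it is
# endorsed by at least `quorum` researchers who are themselves
# already established on the platform.
def may_post(endorsers, established, quorum=2):
    return len(set(endorsers) & established) >= quorum

established = {"alice", "bob", "carol"}
print(may_post(["alice", "dave"], established))  # False: one established endorser
print(may_post(["alice", "bob"], established))   # True
```

The cheapness of such a check, compared with organising two or three full reviews per submission (several times over, down the cascade), is the whole argument.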

Of course, ArXiv-oid publishing platforms also carry a cost, but per article it’s likely to be only a small fraction of the amounts mentioned above. In the case of ArXiv it is about $7 per article, each of which is also completely Open Access. Seven dollars! That’s the size of a rounding error on amounts of $20,000 – $30,000.

Peer review made sense in an era when publishing necessarily claimed expensive resources, such as paper to print on, physical distribution, shelf space in libraries, et cetera. One had to be careful and spend those resources on articles that were likely to be worth it, and even then restrict what was spent on individual articles by imposing maximum lengths and the like. Also, finding the articles worth reading was difficult and the choices and guidance journal editors and editorial boards made were welcome.

How all this has changed with the advent of the Web. There is hardly any need for restrictions on the number and length of articles anymore, and searching – not to mention finding – articles that are relevant to the specific project a researcher is working on has become dramatically easier. As a result, the filtering and selecting functions of journals have become rather redundant.

“All very well, but what about the quality assurance that peer review provides?” Well, it is debatable whether peer review does that reliably, though I’m willing to accept that it might. However, given its costs, can we really not deal with a lack of this quality assurance, in the light of the benefits of universal and inexpensive Open Access that ArXiv-oid platforms could bring? Are we not dealing with it right now? We all know that almost all articles eventually meet their accepting journal editor, and it’s difficult to imagine that every article we find with a literature web search is of sufficient ‘quality’ (whatever that means anyway) for our purposes. And yes, we will encounter ‘rubbish’ articles. Don’t we now, with nigh-universal peer review? But we deal with outliers in data all the time, and it is my conviction that we can deal with outliers in the literature just as well. In any case, ArXiv-oid platforms with an endorsement system will to a large degree prevent excesses.

Scientists are people, and as such not too well equipped to make completely rational choices. Besides, the ‘ego-system’ of qualifying for grants, tenure, et cetera, has its own rationality (akin to the prisoner's dilemma). But the prospect of being able to save tens of billions of dollars each year – even after allowing generous sums for running ArXiv-oids with endorsement systems instead of peer review – savings that could be used for research (the amounts saveable are not far off the annual NIH research budget!), must be food for some serious thought. Let's see if we can think this through. It's not fair to expect scientists themselves to break the cycle. But funding bodies?

I realise that what I'm proposing here is the 'furthest point', but that's where we have to hook up the tightrope, if we want to be able to traverse the chasm separating today from what might be, no?

Jan Velterop