Thursday, January 19, 2012

The Problem of 'Overwhelm'

I wrote my previous post, Holy Cow, Peer Review, as a provocative one, of course. The reason why I think the matter requires attention, though, is that I see a problem looming: the problem of what I call 'overwhelm'. This problem looms on several levels. First of all, the capacity of the pool of reviewers able and willing to deal with the growing manuscript flow – several times over, given that many articles 'cascade' down the journal pecking order and need to be peer-reviewed at every stage – is reaching its limits. Secondly, and arguably more importantly, it is increasingly difficult for researchers to read all that they might want to read. In many areas there is simply too much that's relevant.

In 1995 I wrote an article entitled "Keeping the Minutes of Science". I still take that phrase fairly literally. Research articles in journals are an account, for the record, of the research work done and the results achieved. Thinking along those lines (I realise that not everybody agrees with me) leads to a few consequences. 1) Articles are not primarily written for the reader-researcher, but are mainly an obligation for the author-researcher (though the two groups overlap to a very large degree). The adage is 'publish-or-perish', remember? It's not 'read-or-rot'. 2) As a result, it is only proper that payment for publication comes from the side of the author. That is most fortunate, because that also makes Open Access possible. 3) 'Minutes' are often filed quickly and are not meant to be read widely. Only when there is a problem are they retrieved and perused.

You will have spotted the paradox of, on the one hand, articles not being read widely and, on the other hand, the desirability of Open Access. Well, that is where the 'minutes' analogy falls down, because articles do play a role in conveying the scientific knowledge and insights gained by the authors.

However, that doesn't mean that the information and knowledge bundled up in an article can be conveyed by it being read in the conventional way. This is where the problem of 'overwhelm' starts to bite. There is just too much to read for any normal researcher*. So more and more, articles need to be 'read' by computers to help researchers ingest the information they contain. That is why Open Access is so important, especially the BOAI-compliant OA that is governed by licences such as CC-BY. It makes large-scale computer-aided reading possible.

The notion of using computers for reading and ingesting large amounts of research articles means that presentation becomes unimportant. Or rather, important in different ways. Computer-readability and interoperability come first, and visual elements such as lay-outs are practically irrelevant. In a comment by Scott Epstein to my Holy Cow, Peer Review post the point is made that visual presentation is a value-add that distinguishes publishers from outfits like ArXiv. The publishers' presentations are nice-to-have, of course, but far less 'need-to-have' than they seem to assume. Computers can deal very well with information presented in an ArXiv fashion, provided the computer interoperability is taken care of.

My personal contention is also that, once computers are involved, peer review is largely redundant beyond a basic level that can well be taken care of by an endorsement system. We shouldn't be so scared of potential 'rubbish' being included. Analyses of large amounts of information will most likely expose potential 'rubbish' in a way that makes it recognisable, similar to how 'outliers' are identified in scatter graphs (and ignored, if that seems the right thing to do, in the judgment of the researcher looking at the results).

Scott also mentions 'curation'. Curation is important, but is not necessarily – or even often – done by reviewers or journal editors, in my experience. Scott seems to agree, as demonstrated by his choice of words: "journal editors also (or should) curate the content." A system that allows for (semantic) crowd-curation of scientific assertions, which subsequently could be recognised by computers in any articles in which these assertions are repeated, is likely to be a better use of available resources, with the added benefit of not relying on a small number of 'experts' but instead, using a much wider pool of potential expertise.

I would like to finish with a few quotes from Richard Smith, ex Editor of the British Medical Journal and currently Board Member of the Public Library of Science:
"...peer review is a faith-based rather than evidence-based process, which is hugely ironic when it is at the heart of science."

"...we should scrap prepublication peer review and concentrate on postpublication peer review, which has always been the ‘real’ peer review in that it decides whether a study matters or not. By postpublication peer review I do not mean the few published comments made on papers, but rather the whole ‘market of ideas,’ which has many participants and processes and moves like an economic market to determine the value of a paper. Prepublication peer review simply obstructs this process."
 Jan Velterop



*See "On the Impossibility of Being Expert" by Fraser and Dunstan, BMJ Vol 341, 18-25 December 2010



Wednesday, January 11, 2012

Holy Cow, Peer Review

Looking at it as dispassionately as possible, one could conclude that peer review is the only remaining significant raison d'être of formal scientific publishing in journals. Imagine that scientists, collectively, decided that sharing results was of paramount importance (a truism), but that peer review was no longer considered important. If you imagine that, then the whole publishing edifice would suddenly look very different. More like ArXiv (where, by the way, I found this interesting article).

A recent report estimates that the “total revenues for the scientific, technical and medical publishing market are estimated to rise by 15.8% over the next three years – from $26bn in 2011 to just over $30bn in 2014.” If we assume an annual output of 1 million articles, this revenue – which, for practical purposes, equals the cost to science of access to research publications – equates to a cost of $3000 per article, and even if the output is 1.5 million articles, it’s still $2000 per article.

So the real question is: is peer review worth that much? It’s not that peer review might not have benefits at all. At issue is the cost to science of such benefits as there may be. And although post-publication peer review could easily be done, by those who feel the inclination to do so, when and where it seems to be worth the effort, it may not happen very often, of course, as there are few incentives. Isn't the endorsement system a viable alternative?

Of course, ArXiv-oid publishing platforms also carry a cost, but per article it’s likely to be only a small fraction of the amounts mentioned above. In the case of ArXiv it is about $7 per article, each of which is also completely Open Access. Seven dollars! That’s the size of a rounding error on the amounts of $2000 - $3000.
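To put the 'rounding error' remark in numbers, here is a minimal Python sketch. It uses only the per-article figures quoted in this post ($7 for ArXiv, $2000 to $3000 for journal publishing); the function and variable names are illustrative and not taken from any source.

```python
# Figures quoted in the post; nothing here is new data.
ARXIV_COST_PER_ARTICLE = 7     # dollars per article, as quoted for ArXiv
JOURNAL_COST_LOW = 2000        # dollars per article, lower estimate in the post
JOURNAL_COST_HIGH = 3000       # dollars per article, upper estimate in the post


def share_of(cost: float, reference: float) -> float:
    """Return `cost` expressed as a percentage of `reference`."""
    return 100.0 * cost / reference


# ArXiv's per-article cost as a percentage of each journal estimate.
share_of_low = share_of(ARXIV_COST_PER_ARTICLE, JOURNAL_COST_LOW)
share_of_high = share_of(ARXIV_COST_PER_ARTICLE, JOURNAL_COST_HIGH)

print(f"{share_of_low:.2f}% of $2000, {share_of_high:.2f}% of $3000")
```

On these figures the ArXiv cost per article comes to roughly a third of one percent of the lower estimate, and less than that of the higher one.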

Peer review made sense in an era when publishing necessarily claimed expensive resources, such as paper to print on, physical distribution, shelf space in libraries, et cetera. One had to be careful and spend those resources on articles that were likely to be worth it, and even then restrict what was spent on individual articles by imposing maximum lengths and the like. Also, finding the articles worth reading was difficult, and the choices made and guidance offered by journal editors and editorial boards were welcome.

How all this has changed with the advent of the Web. There is hardly any need for restrictions on the number and length of articles anymore, and searching for – not to mention finding – articles that are relevant to the specific project a researcher is working on has become dramatically easier. As a result, the filtering and selecting functions of journals have become rather redundant.

“All very well, but what about the quality assurance that peer review provides?” Well, it is debatable whether peer review does that reliably, though I’m willing to accept that it might. However, given its costs, can we really not deal with a lack of this quality assurance in the light of the benefits of universal and inexpensive Open Access that ArXiv-oid platforms could bring? Are we not dealing with it right now? We all know that almost all articles eventually meet their accepting journal editor, and it’s difficult to imagine that every article we find with a literature web search is of sufficient ‘quality’ (whatever that means anyway) for our purposes. And yes, we will encounter ‘rubbish’ articles. Don’t we now, with nigh universal peer review? But we deal with outliers in data all the time, and it is my conviction that we can deal with outliers in the literature just as well. Anyway, ArXiv-oid platforms with an endorsement system will to a large degree prevent excesses.

Scientists are people, and as such not too well equipped to make completely rational choices. Besides, the ‘ego-system’ of qualifying for grants, tenure, et cetera, has its own rationality (akin to how to deal with the prisoner's dilemma). But the prospect of being able to save tens of billions of dollars each year, even after allowing generous sums for running ArXiv-oids with endorsement systems instead of peer review, must be food for some serious thought. Those savings could be used for research (the amounts saveable are not far off the annual NIH research budget!). Let's see if we can think this through. It's not fair to expect scientists themselves to break the cycle. But funding bodies?

I realise that what I'm proposing here is the 'furthest point', but that's where we have to hook up the tightrope, if we want to be able to traverse the chasm separating today from what might be, no?

Jan Velterop


Wednesday, December 14, 2011

PDF resurrected

This blog is devoted to open access. Please subscribe. To the concept of open access to scientific information, that is.

Not for the sake of open access in itself. No l'art pour l'art. But for the sake of enabling scientists to make use of any information that is relevant to their research in any way that makes sense for them. In that spirit, please allow me to divert to writing about something that is not open access as such, but does help scientists to get to and use available knowledge more efficiently and conveniently.

Much – actually, the overwhelming majority – of the scientific literature is made available in the form of PDFs. There are good reasons for that. Easily downloaded, easily stored on your hard disk, easily printed nicely, integrity guaranteed to a satisfactory degree ('version of record'), et cetera. But in a web-connected world, having static, 'dead' documents like PDFs also has major drawbacks. Many scientists would like to look 'beyond the PDF'. Me, too. I am on record as having used the awkward verb 'depedefy' and as having made the case that that is just what should be done. No longer.

What has changed? Well, Utopia Documents. It is a scientific PDF-reader that connects the articles you have in PDF format to the web. Any PDF document that's not just a bitmap (image). The published articles, manuscripts written in MS Word that you have saved on your laptop or deposited in repositories as PDFs, even whole books. When you have Utopia Documents installed (it's free, and available for Mac, Windows and Linux from www.getutopia.org), the advantages of PDFs remain, and the disadvantages begin to melt away.

The current version of Utopia Documents is optimised for the life sciences – molecular biology, biochemistry, preclinical medicine, and the like. But this is clearly only the beginning. New functionalities and links to resources are continually being added and upgrades will be released regularly. Progress will continue to be made for the foreseeable future and beyond, but that's no real reason to wait before using the PDF-reader, of course.

Conflict of interest: no conflict, actually, just interest. I'm a fan. I will do what I can to advocate Utopia Documents because I think it is a wonderful tool for scientists, potentially making their lives easier and their research more effective. And my colleagues and I will assist those who developed it, and are continually improving it, in ensuring its sustainability.

Please help. All you have to do is download the software, start using it, and tell your friends and colleagues about Utopia Documents.

Jan Velterop


Tuesday, December 13, 2011

Science Publishing: All About Submission

I think 'gold' open access publishing needs to be supported by submission fees rather than article publication fees, as is now generally the case.

The basic reason I am in favour of submission fees is that they make scientific publishing really the service industry that it is, its main task nowadays having nothing to do with publishing per se, but mainly with arranging peer review and quality assurance of one sort or another. 'Publisher' is therefore a bit of a misnomer by now, a relic of the past. Publishing, as in 'making public', is very easy and people can do it by themselves, in the way that one does on a blog, for instance. Or even by depositing a manuscript in an institutional repository, which is publishing, in the sense of 'making public'. Science publishers should really be called 'quality assurance providers' or something in that vein. Because that is what a modern STM publisher is. A (perhaps too simplistic, but quite useful) model may be the 'exam' model. Submitting a paper is not unlike applying for, say, a driver's test, for which you pay, irrespective of the outcome. An article's scientific robustness is being tested; little else is of relevance (or rather, should be of relevance).

Apart from this, there are some clearly beneficial consequences of a submission-fee system.
  • It discourages spurious submissions and encourages pitching at the right journal at the right level
  • It therefore relieves pressure on the peer review system (fewer unnecessary rounds of peer review)
  • It relates any fees paid to the main work done by 'publishers'
  • It allows any prestige journals (insofar as they have a reason to exist) not to have to worry about high rejection rates and the related necessity of high article publication fees otherwise needed for OA to the small number of accepted articles (the Nature and Science argument)
  • It spreads the amount needed by a 'publisher' over a larger number of articles – the accepted plus the rejected – leading to the possibility of lower average fees
  • It removes the suspicion that OA journals might be tempted to accept more than they should just because of the money that accepted articles bring
To be fair, there are also downsides.
  • The need to be able to justify rejections properly, particularly if challenged (after all, submitters have paid for an assessment)
  • The reality that other publishers offer free submission (although this argument may not cut too much ice, given that it was also used against author-side payment, which turned out not as deadly to the model as was thought by opponents of OA)
The last point is probably what keeps publishers from going in the direction of submission fees. I do hope that one of the more visionary publishers dares to take the plunge.
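The fee-spreading benefit listed above (the amount a 'publisher' needs being divided over accepted plus rejected articles) can be illustrated with a minimal Python sketch. All figures below are hypothetical, chosen only to make the arithmetic visible, and the function names are illustrative rather than part of any existing fee model.

```python
def publication_fee(total_cost: float, accepted: int) -> float:
    """Fee per accepted article when only accepted articles pay."""
    return total_cost / accepted


def submission_fee(total_cost: float, submitted: int) -> float:
    """Fee per submission when every submitted article pays."""
    return total_cost / submitted


# Hypothetical journal: 1000 submissions, 100 of them accepted,
# and 300,000 (in any currency) needed to cover all costs.
TOTAL_COST = 300_000
SUBMITTED = 1000
ACCEPTED = 100

fee_if_publication = publication_fee(TOTAL_COST, ACCEPTED)   # 3000.0
fee_if_submission = submission_fee(TOTAL_COST, SUBMITTED)    # 300.0

print(f"publication fee: {fee_if_publication:.0f}")
print(f"submission fee:  {fee_if_submission:.0f}")
```

With these assumed numbers the same total is raised either way, but the fee per paying author falls by the same factor as the rejection rate implies: ten times lower when nine in ten submissions are rejected.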

Jan Velterop

Sunday, December 11, 2011

Hybrid journals – double or quits?

Hybrid journals – journals that combine toll access to some articles with open access to others – do not generally enjoy a good press. Terms such as 'double-dipping' are used. This is not justified, as a general rule. I can't guarantee that double-dipping never happens, but I don't think it is generally the case. Publishers could do more to disabuse the library and research communities of the notion that it is, though.

That said, it is difficult, because even the basic understanding of how a subscription system works is often lacking outside (and even sometimes inside echelons of) the publishing community. One of the difficulties is that deciding on the price of subscriptions depends on a number of prior assumptions. There are possibly more than these three, but they are the most important ones: 1) how many subscriptions will we be able to sell; 2) how many submissions will we get and how many of those will be accepted for publication (i.e. what will the costs be); and 3) what margins can we expect to contribute to overheads and profit (or surplus, in the case of a not-for-profit publisher).

Typically, a publisher will have a portfolio of journals of which some do well, some just break even, and some make a loss if all costs, including overheads, are fully allocated. Hybrid journals will be found in all three categories. So what does 'double-dipping' mean? Are loss-making journals 'half-dipping'? Is 'double-half-dipping', in the case of those loss making journals, just 'single dipping'? Does it even make sense to think in those terms?

I think not. The objective way to look at it is to see the subscription price as the price to be paid for the non-OA articles that are published in a hybrid journal. That may be low or high if expressed in subscription price per non-OA article, but that is what a subscription to a hybrid journal is. Incidentally, comparing subscription prices per article (p/a) across a library collection will show a very wide range, and the inclusion or exclusion of hybrid journals is not likely to make any difference in the distribution of p/a in that range.

It may be helpful to think of a hybrid journal as twin journals sharing the same title, Editor, Editorial Board and editorial policy: one subscription-based, and one OA.

The OA articles in a hybrid journal are just as much OA as in any OA journal as long as they give the reader/user the same rights (of access and re-use), i.e. as long as they are covered by a licence such as the Creative Commons Attribution License (CC-BY) or the CC Attribution and Share Alike License (CC-BY-SA) and not the CC Attribution Non-Commercial License (CC-BY-NC). Sticking to CC-BY-NC licences, which does happen, is a sign of insecurity on the part of a publisher or of a lack of understanding as to what the purpose of open access actually is. Though there may be a number of cases where the publisher has overcome that insecurity but just hasn't thought about changing the licences yet.

As said above, hybrid journals do not generally enjoy a good press, but I have heard positive comments about them as well in the scientific community. Those relate to the notion that the editorial policy (the acceptance/rejection policy) of hybrid journals is not influenced by the potential financial contribution coming from the authors. The 'open choice' is typically given as an option only after the article has passed peer review and is accepted. I don't think acceptance and rejection policies of any respectable OA journal are influenced by the prospect of authors paying, and I certainly don't know of any such practices at the OA publishers I am familiar with, but it is an extra assurance hybrid journals offer that that is indeed not the case.

Jan Velterop

Saturday, December 10, 2011

The future of today is not what it used to be

One of the threads on the Liblicense-L discussion forum, on 'the future of the subscription model', has been running for quite a while now. Without much consensus. A few exchanges on the desirability and practicability of an 'author-side-paid' open access model as an alternative, strayed into a system financed by submission fees rather than fees for publication, and I am inviting views on that idea. The exchange on the forum, in chronological order (unedited save for a few typos and the insertion of a few hyperlinks):
(7 December 2011) There was an earlier comment on this thread (which I lost, alas) to the effect that one way to build an author-pays service is with a fee for submission rather than for publication. This is a great idea, and in a world without ruinous competition (John D. Rockefeller's phrase), it would work beautifully, as it aligns the cost to authors with the actual cost of delivering the service. But what happens when your competitor offers a free Christmas promotion? Or if eLife takes 10 years to figure out a business model? In a competitive market, you can never be smarter than your stupidest competitor, and if that competitor wants to give away the store, I can see your store loaded onto someone else's truck.
 

Joe Esposito
(7 December 2011) Joe, isn't this already happening? And isn't this why a system based on submission fees hasn't successfully emerged yet?

The competition (subscription-based journals) are offering free promotions (to authors) all the time. They have found people who pay them through the back door (librarians, paying for subscriptions, as long as it lasts). "In a competitive market you can never be smarter than your stupidest competitor." The words are yours.

This discussion is called "Future of the Subscription Model". The fundamental issue here is that the subscription model is simply not suited to an environment where maximum distribution is possible without marginal cost, and what is being distributed is not consumable (in the sense that it disappears if you consume it). In the bible there is a story about 'loaves and fish'. Allegorical (I presume). But scientific information in the internet environment is like the biblical loaves and fish. Albeit not food for the body, but food for thought. Scientific thought.

Jan Velterop
(9 December 2011) Oh, gosh, Jan, where to begin? This is just plain wrong. There is nothing "back door" about having a librarian pay for something. And it would be a wonderful world if maximum distribution were possible without marginal cost, but in fact there are huge costs to that distribution, if by "distribution" you mean that you persuade people actually to read something. Moving bits around costs nothing, and presumably this is what you mean, but the bits on my hard drive are meaningless unless I engage with them.

We have here the old saw about non-rival goods. It does not apply to media. Media is not a product but something that must engage human attention. That's a scarce thing. There is no superabundance of information when you take into account that someone has to be thinking about the information.

But it really is unfortunate that you insist on making this a binary game. I don't know if I could possibly have been more lavish in my admiration for the author-pays model.  It sits side by side with the subscription model and other forms of traditional (that is, toll-access) publishing. Who has to choose? Over time, the different economics of these models will influence the nature of the content such that you will get different things from subscriptions than you do for author-pays. Is there anything wrong with that? Why would anyone accuse a radio of not being a television?

Joe Esposito

(9 December 2011) Joe, I fear we are talking at cross-purposes. My frame of reference is primarily STM journals. In that frame of reference I just don't recognise your definition of 'distribution' as "actually persuading people to read something", unless 'something' means literally that. Sure, publishers try to stimulate downloads, since they help them make the case to librarians that they should renew their subscriptions and licences. But it's a numbers game, in which 'something' pretty much means 'anything', and the marginal cost of extra downloads is negligible.

Indeed, the bits on your hard drive are meaningless unless you engage with them. Researchers do engage with the information they have access to, and they would like to have even more to engage with, but all this engaging isn't necessarily in the form of reading nowadays. Perhaps it can be described as 'meta-reading', but it is more and more about extracting facts and assertions, collating them with those from a large number of publications, connecting and relating them, analysing them, and using the information gleaned as a basis for further thinking, experimentation, et cetera. Occasionally articles are still being read linearly, but even then, particularly if they are being read online (or nowadays also in PDF when the PDF is opened with the likes of the scientific reader Utopia - free from getutopia.org), they serve as a starting point for further navigation of information and knowledge.

Human attention is indeed a scarce thing. And that attention is less and less being attracted by journals per se, let alone their publishers. It's the connections between facts and information across a vast array of publications and databases that attract attention. Fragmentation of information in all manner of different journals, some of which are accessible and some not, is the scourge of many a scientist. The observation that most articles are being accessed only after having been found with general search tools such as Google is testimony to the fact that practically nobody relies anymore on the choices journals and their publishers make.

The role of a publisher nowadays is to provide the service to a scientist of having his or her contributions peer reviewed and subsequently added to the common pool (well, ocean) of knowledge and information in a standardised, accessible and attributable way. That role is not satisfactorily played with the encumbrances of the subscription model.

You are right that the subscription model may be suitable to content of a certain nature. Magazines – even scientific ones – and the like come to mind. Nothing wrong with that. For the mainstay of scientific communication, however, the model is not suitable any longer. Of course it will amble along for a time. Quite a long time, even. Inertia is a pretty strong force.

Jan Velterop

The discussion is likely to continue...



Friday, December 09, 2011

Mandatory Academic Freedom?

From the Internet Freedom Conference in The Hague, 8-9 December 2011 (or from following it on the internet), Dimitar Poposki (Twitter name @) sent the following tweet: "Q: Can Open Access to taxpayers scientific research be considered as a mandatory academic freedom?"

The answer is 'no'. 
 
First of all, what is 'academic freedom'? From J. Peter Byrne, "Academic Freedom", 99 Yale Law Journal 251, 252-253 (1989):
The First Amendment protects academic freedom. This simple proposition stands explicit or implicit in numerous judicial opinions, often proclaimed in fervid rhetoric. Attempts to understand the scope and foundation of a constitutional guarantee of academic freedom, however, generally result in paradox or confusion. The cases, shorn of panegyrics, are inconclusive, the promise of rhetoric reproached by the ambiguous realities of academic life.
The problems are fundamental: There has been no adequate analysis of what academic freedom the Constitution protects or of why it protects it. Lacking definition or guiding principle, the doctrine floats in law, picking up decisions as a hull does barnacles.
Einstein defined it rather pithily as:
"The right to search for truth and to publish and teach what one holds to be true."
And added to it a duty:
"One must not conceal any part of what one has recognised to be true."
In that vein, I would think it easy to make the case that another duty is publishing with open access any research carried out with public money.

But 'mandatory freedom' is an oxymoron. That's why the answer is 'no'.