Sunday, September 09, 2012

'Pixels of information'

My friend Barend Mons wrote to me, and I think his letter is worth sharing on this blog. I checked with him, and he agrees that it can be shared here.
Dear Jan,

I'm writing to you inspired by your remark that "OA is not a goal in itself but one means to an end: more effective knowledge discovery".

What we need for eScience is Open Information to support the Knowledge Discovery process. As eScience can be pictured as 'science that can not be done without a computer', computer reasonable information is the most important element to be 'open'. 
You're right, Barend. That's why I think CC-BY is a necessary element of open access. 
As we discussed many times before, computer reasoning and 'in silico' knowledge discovery leads essentially to 'hypotheses' not to final discoveries. There are two very important next steps. First, what I would call 'in cerebro' validation, mainly browsing the suggestions provided by computer algorithms mining the literature and 'validating' individual assertions (call them triples if you wish) in their original context. 'Who asserted it, where, based on what experimental evidence, assay...?' etc. In other words, why should I believe (in the context of my knowledge discovery process) this individual element of my 'hypothesis-graph' to be 'true' or 'valid'? Obviously in the end, the entire hypothesis put forward by a computer algorithm and 'pre'-validated by human reasoning based on 'what we collectively already know' needs to be experimentally proven (call it 'in origine' validation).

What I would like to discuss in a bit more depth is the 'in cerebro' part. For practical purposes I here define 'everything we collectively know', or at least what we have 'shared' as the 'explicitome' (I hope Jon Eisen doesn't include that in his 'bad -omes'), essentially a huge dynamic graph of 'nanopublications' or actually rather 'cardinal assertions' where identical, repetitive nanopublications have already been aggregated and assigned an 'evidence factor'.  Whenever a given assertion (connecting triple) is not a 'completely established fact' (the sort of assertion you repeat in a new narrative without the need to add a reference/citation) we will go to narrative text 'forever' to 'check the validity' in my opinion.

Major computer power is now exploited for various intelligent ways to infer the 'implicitome' of what we implicitly know (sorry, Jon, should you ever see this!), but triples captured in RDF are certainly no replacement for narrative in terms of setting out good reasoning, why conclusions are warranted, an extensive description of materials and methods, etc. So the 'validation' of triples outside their context will be a very important process in eScience for many decades to come. In fact your earlier metaphor of the 'minutes of science' fits perfectly in this model. 'Why would I believe this particular assertion?' ... 'Well, look in the minutes by whom, where and based on what evidence it was made.'

Now here is a very relevant part of the OA discussion: The time when some people thought that OA was a sort of charity model for scientific publishing is definitely over, with profitable OA publishers around us. The only real difference is: do we (the authors) pay up front, or do we refuse that (for whatever good reason, see below) and now the reader has to pay 'after the fact'. So let's first agree that there is no 'moral superiority', whatever that is, in OA over the traditional subscription model.  
Not sure if I agree, Barend. OK, let's leave morals out of it, but first of all, articles in subscription journals can also be made open access via the so-called 'green' route of depositing the accepted manuscript in an open repository; and secondly, OA at source, the so-called 'gold' route, is definitely practically and transparently the superior way to share scientific information with anyone who needs or wants it.
We have also seen the downsides of OA, for instance for researchers in developing countries who may still have great difficulty finding the substantial fees to publish in the leading Open Access journals.

I believe, however, that we have a great paradigm shift right in front of us. Computer reasoning and ultralight 'RDF graphs' distributing the results to (inter alia) mobile devices will allow global open distribution of such 'pixels of information' at affordable costs, even in developing countries. Obviously, an associated practice will be to 'go and check' the validity of individual assertions in these graphs. That is exactly where the 'classical' narrative article will continue to have its great value. It is clear that reviewing, formatting, cross-linking and sustainably providing the 'minutes of science' is costly and that the community will have to pay for these costs via various routes. I feel that it is perfectly defensible that those articles for which the publishing costs have not been paid for by the authors, and that are still being provided by classical publishing houses, should continue to 'have a price'. As long as all nanopublications (let's say the assertions representing the 'dry facts' contained in the narrative legacy as well as data in databases) are exposed in Open (RDF) Spaces for people and computers to reason with, the knowledge discovery process will be enormously accelerated. Some people may still resent that they may have to pay (at least for some time to come) for narrative that was published following the 'don't pay now – subscribe later' adage. We obviously believe that the major players from the 'subscription age' have a responsibility, but also a very strong incentive to develop new methods and business models that allow a smooth transition to eScience-supportive publication without becoming extinct before they can adapt.

Your views are certainly worth a serious and in-depth discussion, Barend. I invite readers of this blog to join in and engage in that discussion.

Jan Velterop

Tuesday, August 07, 2012

Open access – gold versus green

Recently, Andrew Adams contributed to the 'gold' vs. 'green' open access discussion and he wrote this on the LIBLICENSE list (edited for typos):
There are on the order of 10,000 research institutions and more than ten times as many journals. Persuading 10,000 institutions to adopt OA deposit mandates seems to me a quicker and more certain route to obtain OA than persuading 100,000 journals to go Gold (and finding more money to bribe them into it, it would appear – money which is going to continue to be demanded by them in perpetuity, not accepted as a transitional fee – there's nothing so permanent as a temporary measure). (Full message here.)
The LIBLICENSE list moderator would not post my response, so I'm giving it here:

10,000 research institutes means, in terms of Harnadian 'green' mandates, a need for 10,000 repositories; 100,000 journals (if there were so many; I've only ever heard numbers in the order of 20-25,000 [recently confirmed as in the order of 28K]) does not mean 100,000 publishers. Besides, there is no existential reason for institutions to have a repository and 'green' mandate. The fact that others have repositories and it doesn't have one itself does not harm a research institution in the same way that not being 'gold' (or at least having a 'gold' option) does existentially harm journals in an environment of more and more 'gold' journals.

As for costs, there are two things that seem to escape the attention of 'green' advocates (by which I mean those who see no place for 'gold' open access at this stage on the basis that 'green' would be a faster route to OA and would be cheaper):
  1. 'Green' fully depends on the prolongation of the subscription model. Without subscription revenues no journals, hence no peer-reviewed articles, hence nothing to self-archive but manuscripts, arXiv-style. (That would be fine by me, actually, with post-publication peer review mechanisms overlaying arXiv-oids). The cost of maintaining subscriptions is completely ignored by exclusively 'green' advocates, who always talk about 'green' costing next to nothing. They are talking about the *marginal* cost of 'green', and compare it to the *integral* cost of 'gold'.
  2. Exclusively 'green' advocates do not seem to understand that for 'gold' journals, publishers are not in any position to "demand money". They can only offer their services in exchange for a fee if those who would pay the fee are willing to pay it. That's known as 'competition', or as a 'functioning market'. By its very nature, it drives down prices. This is in contrast to the monopoloid subscription market, a dysfunctional market, where the price drivers face upwards. Sure, some APCs increased since the early beginnings of 'gold' OA publishing, when 'gold' publishers found out they couldn't do it for amounts below their costs. But generally, the average APCs per 'gold' article are lower – much lower – than the average publisher revenues per subscription article. And this average per-article subscription price will still have to be coughed up in order to keep 'green' afloat.
Price-reducing mechanisms would even work faster if and when the denizens of the ivory tower were to reduce their culturalism and anglo-linguism that currently prevails, in which case we could rapidly see science publishing emerge in places like China, India, and other countries keen on establishing their place in a global market, competing on price. APCs could tumble. Some call this 'predatory gold OA publishing'. Few seem to realise that the 'prey' is the subscription model.

The recently published Finch Report expresses a preference for immediate, 'libre', open access, and sees 'gold' as more likely to be able to deliver that than 'green'. Meanwhile, 'green' is a way to deliver OA (albeit delayed and not libre) in cases where 'gold' is not feasible yet. That is an entirely sensible viewpoint, completely compatible with the letter – and I think also the spirit – of the Budapest Open Access Initiative (BOAI). Incidentally, referring to the BOAI is characterised as "fetishism" (sic) by Andrew Adams.

Comparing 'green' and 'gold' is almost, to borrow a phrase from Stevan Harnad, "comparing apples and orang-utans". The Finch report is not mistaken to see 'green' as (in the words of Michael Jubb) an "impoverished type of open access, with embargo periods, access only to an authors’ manuscript, without links and semantic enrichment; and severe limitations on the rights of use." After all, in the 'green' ID/OA scheme (ID = Immediate Deposit and OA meaning 'Optional Access' here) favoured by Harnad c.s., deposited articles may be made open if and when the publisher permits.

Besides, 'gold' implies also 'green' ('gold' articles can be deposited, without embargo or limits on use, anywhere, and by anyone), where 'green' does not imply 'gold'. A Venn diagram might look like this (below).

The Finch group has come to its conclusions because they have clearly learnt the lessons of the last decade. There is nothing – repeat: *nothing* – that prevents academics from eschewing the services of "rent-seeking" (as Adams put it) publishers. They could easily self-organise (though I realise that both the words 'could' and 'easily' are probably misplaced). To expect publishers (for-profit and not-for-profit ones alike) to refuse to provide services that academics are seeking from them is silly.

For the avoidance of doubt, I am not against 'green' OA (in spite of what some 'green'-only advocates assert), especially not where there is no other option. The choice is not so much for or against 'green' or 'gold', but emphatically for full, unimpeded open access, however it is delivered, as long as it is "permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself." You recognise this last phrase? Indeed, the precise wording of the BOAI.

Jan Velterop

Wednesday, August 01, 2012

The triumph of cloud cuckoo land over reality?

It should be abundantly clear that Open Access policies by Finch, RCUK, Wellcome Trust and many others are very important for the development of universal OA, in that they not only indicate practical ways of achieving it, but also signal to the scholarly community and the wider society interested in scientific knowledge and its advance that OA should be the norm.

The 'sin' that RCUK, Finch and the Wellcome Trust committed is that they didn't formulate their policies according to strict Harnadian orthodoxy. It's not that they forbid Harnadian OA (a.k.a. 'green'), oh no. It is that they see the 'gold' route to OA as worthy of support as well. Harnad, as ultimate arbiter of Harnadian OA (though he has acolytes), would like to see funder and institutional OA policies focus entirely and only on Harnadian OA, and would want them, to all intents and purposes, to forbid the 'gold' route. In Harnad's view, the 'gold' route comes into play (as 'downsized gold', whatever that means) only once all scholarly journal literature is OA according to Harnadian rules. These rules are quite specific:
  • articles must be published in peer-reviewed subscription journals; 
  • institutions must mandate their subsequent deposit in an institutional repository (not, for instance in a global subject repository); 
  • there must be no insistence on OA immediately upon publication (his big idea is ID/OA – Immediate Deposit / Optional [sic] Access); 
  • there must be no insistence on CC-BY or equivalent (which would make re-use and text-mining possible – OA in his view should just be ocular access, not machine-access).
It must be difficult to comply with these rules, and seeing his recent applause, subsequently followed by withdrawal of support, for the RCUK policy, even Harnad himself finds it difficult to assess whether his rules are 'properly' adhered to. It also seems as if his main focus is not OA but mandated deposit in institutional repositories. Probably hoping that that will eventually lead to OA. He would like to see 'gold' OA — OA at source — considered only if and when it is "downsized Gold OA, once Green OA has prevailed globally, making subscriptions unsustainable and forcing journals to downsize." It is the equivalent of opening the parachute only a split second before hitting the ground. It would be the triumph of a dogmatically serial process over a pragmatically parallel one. The triumph of cloud cuckoo land over reality.

Open Access is more than worth having. Different, complementary, ways help achieve it. There are many roads leading to Rome.

Jan Velterop
OA advocate

Monday, June 11, 2012

Small publications, large implications

When I recently enjoyed lunch with Steve Pettifer of Manchester University (the ‘father’ of Utopia Documents), the conversation turned to nanopublications. Ah, you want to know what nanopublications are. Nanopublications are machine-readable, single, attributable, scientific assertions.

Steve posed the question “why would any scientist believe a nanopublication, particularly if out of context?” Indeed, why would they? Why should they, well versed as scientists are in the art of critical thinking? They won’t, at least not without seeing the appropriate context.

Herein lies a great opportunity.

Let me explain. Nanopublications, or rather, their core in the form of machine-readable subject-predicate-object triples, can be incorporated in (vast) collections of such triples and used for reasoning, to discover new knowledge, or to make explicit hitherto tacit or hidden knowledge. Triples can therefore be very valuable to science. (The Open PHACTS project is in the process of establishing the value of this approach for drug discovery.) Many, perhaps most, scientific articles contain such single assertions, which could be presented as nanopublications.
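
For readers who like to see such things spelled out, here is a minimal sketch in Python of what 'reasoning over a collection of triples' could look like in its simplest form: chaining assertions that share a concept, so that an indirect connection becomes explicit. The triples and the function name `candidate_links` are made up purely for illustration; real systems work on RDF stores with far more sophisticated inference, and whatever comes out is a hypothesis to be checked, not a finding.

```python
from collections import defaultdict

# Hypothetical toy assertions as (subject, predicate, object) triples.
# These are illustrative placeholders, not real scientific findings.
TRIPLES = [
    ("gene_A", "encodes", "protein_B"),
    ("protein_B", "inhibits", "enzyme_C"),
    ("enzyme_C", "is_implicated_in", "disease_D"),
]


def candidate_links(triples, start, max_hops=3):
    """Return chains of two or more triples leading away from `start`.

    Each chain is a list of triples in which the object of one triple
    is the subject of the next. A chain is only a candidate hypothesis.
    """
    by_subject = defaultdict(list)
    for triple in triples:
        by_subject[triple[0]].append(triple)

    chains = []
    frontier = [[triple] for triple in by_subject[start]]
    # Extend every chain by one hop per iteration, up to max_hops triples.
    for _ in range(max_hops - 1):
        next_frontier = []
        for chain in frontier:
            last_object = chain[-1][2]
            for triple in by_subject[last_object]:
                longer = chain + [triple]
                chains.append(longer)
                next_frontier.append(longer)
        frontier = next_frontier
    return chains


if __name__ == "__main__":
    for chain in candidate_links(TRIPLES, "gene_A"):
        print(" -> ".join(f"{s} {p} {o}" for s, p, o in chain))
```

Each printed chain suggests a possible connection between its first and last concept. Every link in it still has to be judged in its original context, which is the point of the rest of this post.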

In a recent Nature Genetics commentary called ‘The Value of Data’, Barend Mons et al. addressed this issue with the metaphor of the chicken and the egg. Now that eggs (individual assertions) are being distributed (‘traded’), their value (they all look roughly the same) can only be truly assessed by knowing the parents. Scientists will always want to personally judge whether a crucial connecting assertion in a given hypothesis is one they can accept as valid. The ability to locate where the assertion came from, in which article, in which journal, by which author, and when it was published – in short the ‘provenance’ of individual scientific assertions functioning in computer reasoning – is crucial for that. As is the ability to access the article in question.
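
To make the idea of provenance a little more concrete, here is a small Python sketch of an assertion that carries its source along with it. The field names, the sample records and the helper functions are assumptions of mine for illustration only; they do not reflect any actual nanopublication schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nanopublication:
    """One assertion plus the provenance needed to go and check it.

    Hypothetical structure for illustration; not an official schema.
    """
    subject: str
    predicate: str
    obj: str
    author: str
    article: str
    journal: str
    year: int

    @property
    def assertion(self):
        return (self.subject, self.predicate, self.obj)


def provenance_of(assertion, nanopubs):
    """Return every recorded source of an assertion, oldest first."""
    sources = [n for n in nanopubs if n.assertion == assertion]
    return sorted(sources, key=lambda n: n.year)


def distinct_articles(assertion, nanopubs):
    """Count the distinct articles in which the assertion was made."""
    return len({n.article for n in provenance_of(assertion, nanopubs)})


# Placeholder records; authors, articles and journals are invented.
NANOPUBS = [
    Nanopublication("protein_B", "inhibits", "enzyme_C",
                    "Author X", "Article 1", "Journal J", 2011),
    Nanopublication("protein_B", "inhibits", "enzyme_C",
                    "Author Y", "Article 2", "Journal K", 2009),
    Nanopublication("gene_A", "encodes", "protein_B",
                    "Author Z", "Article 3", "Journal J", 2010),
]

if __name__ == "__main__":
    claim = ("protein_B", "inhibits", "enzyme_C")
    for source in provenance_of(claim, NANOPUBS):
        print(source.year, source.author, source.article, source.journal)
    print("asserted in", distinct_articles(claim, NANOPUBS), "articles")
```

A scientist who meets the assertion in a computer-generated hypothesis can then ask who made it, where and when, and go to the article itself to judge it.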

Scientific publishers should, in their quest to add value to research publications, expose and clearly present the nanopublications contained in the articles they publish, particularly those that are believed (e.g. by the author, or the reviewers) to be unique and new. What’s more, they should make them openly and freely available, like they do with abstracts, even publishers that are not yet convinced that they should change their business models and make all their content open access. And they should not just make nanopublications open and accessible to human readers, but also to machines, because only machines are able to effectively process large numbers of nanopublications, treating each one as a ‘pixel’ of the larger picture that a researcher is building up.

So what’s the opportunity?

Well, openly accessible nanopublications are very useful for scientific discovery, they are attributable (to author, article, and journal) and scientists don’t just believe them when they see them, particularly if the assertion is new to them or when they find it in a process of computer-assisted (in silico) reasoning. Researchers will be eager to investigate their source, i.e. check out the article from which the nanopublication comes. They may cite the nanopublication, and in doing so, cite the article. An obvious win-win situation for scientists (in their roles of users and authors) and publishers alike.

What are we waiting for?

Jan Velterop

Sunday, April 29, 2012

OA not just for institutionalised scientists

On the Global Open Access List, an email list, a thread has developed on 'Open Access Priorities: Peer Access and Public Access'. Of course, true open access means access both for peers (meaning fellow-scientists, in this case, not just members of the UK House of Lords) and for the general public at large, so the discussion is really about what is more important and what is the more persuasive argument to get research scientists to make their publications available with open access. And should that argument mainly be quasi-legal, in the form of institutional mandates?

My view is this:

Is it not so that when there is no wide cultural or societal support for whatever law or mandate, more effort is generally being spent on evasion than on compliance and enforcement turns out to be like mopping up with the tap still running? If one should be taking examples from US politics, the 'war on drugs' is the one to look at.

Forcing scientists into open access via mandates and the like is only ever likely to be truly successful if it is rooted in an already changing culture. An academic culture with an expectation that research results are openly available to all. By the shame that researchers will be made to feel in the lab, at dinner parties, or in the pub, if their results are not published with open access. Of course that will still be mainly peer-pressure, but changing hearts and minds of peers is greatly helped if there were a societal substrate in which the open culture can grow. Mandates or not, OA will never happen if scientists aren't convinced from within. An appeal to them as human beings and members of society is more likely to achieve that than mandates, in my view. The latter should back up a general change of heart, not be a substitute for it.

What 'the general public' means should not be misunderstood and construed to be only those interested in medical literature. It includes all those interested in the other 999 areas as well. Ex-scientists, retired scientists, start-ups and SMEs, scientists interested in another discipline or cross-discipline topics, students, lawyers, reporters, teachers, even hobbyists. Einstein wasn't an institutionalised scientist when he worked on his most important work; he was a patent clerk.

Of course, those OA evangelists who wish to pursue mandates should be pursuing mandates. I encourage them to keep doing just that. But to narrow the efforts of OA evangelism to what is stubbornly being called "the quickest route", in spite of it being no more than a hypothesis which certainly over the last decade and a half hasn't proved itself to be as effective as first thought, is a mistake.

By all means where there are opportunities to promote mandates let us do that, but not at the expense of making the moral and societal responsibility case for OA.

Jan Velterop

Wednesday, April 11, 2012

'Enriching' Open Access articles

I've been asked what the relevance is of my previous post to Open Access. The relevance of Utopia Documents to Open Access may not be immediately clear, but it is certainly there. Though Utopia Documents doesn't make articles open that aren't, it provides 'article-of-the-future-like' functionality for any PDFs, OA or not. It opens them up in terms of Web connectivity, as it were, and it is completely publisher-independent. So PDFs in open repositories – even informal, author-manuscript ones – and from small OA publishers can have the same type of functionality that hitherto only larger publishers could afford to provide, and then only for HTML versions of articles.

PDFs often get a bad press, as you probably know, yet according to statistics from many publishers, PDFs still represent by far the largest share of scientific article downloads. PDFs have great advantages, but until now, also disadvantages relative to HTML versions, particularly with regard to the latter's Web connectedness (this – open – article is worth reading). This digital divide, however, has now been bridged! The Utopia Documents PDF-viewer is built around the concept of connecting hitherto static PDFs to the Web, and it bridges the 'linkability gap' between HTML and PDF, making the latter just as easily connected to whatever the Internet has on offer as the former (as long as you are online, of course).

The new – wholly renewed – version (2.0) of the Utopia Documents scientific PDF-viewer has now been released. It is free and downloads are currently available for Mac and Windows (and a Linux version is expected soon). Version 2.0 automatically shows Altmetrics (see how the article is doing), Mendeley (see related articles available there), Sherpa/RoMEO (check its open archiving status), etcetera, and connects directly to many more scientific and laboratory information resources on the Web, straight from the PDF.

Utopia Documents allows you, if you so wish, to experience dynamically enriched scientific articles. Articles from whichever publisher or OA repository, since Utopia Documents is completely publisher-independent, providing enrichment for any modern PDF*, even 'informal' ones made by authors of their manuscript (e.g. via 'Save as PDF') and deposited in institutional repositories.

'Enrichment' means, among other things, easy Web connectivity, directly from highlighted text in the PDF, to an ever-expanding variety of data sources and scientific information and search tools. It also means the possibility to extract any tables into a spreadsheet format, and a 'toggle' that converts numerical tables into easy-to-read scatter plots. It means up-to-date Altmetrics, whenever available, that let you see how articles are doing. It means a comments function that lets you carry out relevant discussions that stay right with the paper, rather than necessarily having to go off onto a blog somewhere. It means being able to quickly flick through the images and illustrations in an article. It means that existing PDFs from whatever source are 'converted', as it were, on-the-fly, to what some publishers call 'articles of the future'. (The original PDF is in no way altered; the 'conversion' is virtual).

With Utopia Documents, publishers, repositories, libraries, even individuals with PDFs on their personal sites, can offer enriched scientific articles just by encouraging their users to read PDFs with the free Utopia Documents PDF-viewer, and so get more out of the scientific literature at hand than would otherwise be possible. Utopia Documents is indeed truly free, and not even registration is needed (except for adding comments).

Utopia Documents is usable in all scientific disciplines, but its default specialist web resources are currently optimised for the biomedical/biochemical spectrum.

Friday, April 06, 2012

Pee Dee Effing Brilliant

Are you a scientist or student? Life sciences? Do you ever read research literature in PDF format?

Did it ever occur to you that it might be useful, or at least convenient, if scientific articles in PDF format were a bit more 'connected' to the rest of the web? And would enable you, for instance, directly from the text, to:
  • look up more information about a term or phrase you're encountering (e.g. a gene, a protein, etc.)
  • look up the latest related articles (e.g. in PubMed, Mendeley)
  • see, in real time, how the article is doing (Altmetrics)
  • search (NCBI databases, protein databases, Google, Wikipedia, Quertle, etc.)
  • share comments with fellow researchers
Well, all of that – and much more – is now possible. All you have to do is view your PDFs in the new Utopia Documents.

Utopia Documents has been developed by researchers from the University of Manchester, is completely free, and available for Mac, Windows and Linux. It works with all PDFs* irrespective of their origin**.

I invite you – urge you – to try it out, tell your colleagues and friends, and ask them to tell theirs. And tweet and blog about it. Registration is not necessary, except if you want to make use of the public 'comment' function. Feedback is highly appreciated. Either as a comment on this blog, or directly to the Utopia crew. And testimonials, too, obviously.

Disclosure: I work with these guys. A lot. They are brilliant and yet pragmatic. Driven by a desire to make life easier for scientists and students alike.

*With the exception of bitmap-only PDFs (scans)
**From any publisher, and even including 'informal' PDFs as can be found in repositories, or those that you have created yourself from a manuscript written in Word, for instance

Thursday, February 23, 2012

They’re changing a clause, and even some laws, yet everything stays the way it was.

The title captures the feeling of frustration with the often glacial pace of changes we regard as necessary and inevitable. So we try to influence the speed of change, and one time-honoured tool we take out of the box is the boycott. Boycotts are a way to get things off your chest; even to get some guilt relief, but although there are notable exceptions, they rarely change things fundamentally. Take the Elsevier Science boycott. I understand the feeling behind it, but if their prices were reduced to half of what they are now, or even if they went out of business, would that really be a solution to the problems with which scientific communication wrestles? As many a boycott does, this one, too, is likely to result in ‘changing a clause, changing some laws, yet everything staying the way it was’.

A boycott doesn’t alter the fact that we view publishers as publishers. That's how they view themselves, too. However, that is the underlying problem. Perhaps publishers were publishers, in the past, but they are no longer. Any dissemination of knowledge that results from their activities is not much more than a side effect. No, publishers’ role is not to ‘publish’; it is to feed the need of the scientific ego-system for public approbation, and of its officialdom for proxies for validation and scientific prowess assessment in order to make their decisions about tenure, promotion and grants easier and swifter.

Crazy line of thought, no? Well, maybe, but look at what happens in physics. The actual publishing – dissemination of information and knowledge – takes place by posting in arXiv. Yet a good portion of articles in arXiv – quite possibly the majority, does anyone have the numbers? – are subsequently submitted to journals and formally ‘published’. Why? Well, "peer review" is the stock answer. And acquiring impact factors (even though especially in physics one would expect scientists to pay heed to Einstein’s dictum that “not everything that can be counted counts and not everything that counts can be counted”).

Clearly, officialdom in physics is prepared to pay, to the tune of thousands of dollars per article, for the organization of the peer review ritual and the acquisition of impact factor ‘tags’ that come with formal publication of a ‘version of record’. So be it. If officialdom perceives these things as necessary and is willing to pay, ‘publishers’ are of course happy to provide them.

But one of the biggest problems in science communication, the free flow of information, seems to have been solved in physics, as arXiv is completely open. If arXiv-like platforms were to exist in other disciplines as well, and if a cultural expectation were to emerge that papers be posted on those platforms before submission to journals, and their posting be accepted as a priority claim, we would have achieved free flow of information in those other areas as well.

I suspect that the essence of the Federal Research Public Access Act (FRPAA) is about achieving a situation like the one that exists in physics with arXiv. Given that arXiv has done no discernible damage to publishers (at least as far as I’m aware, and, reportedly, also according to the publishing arms of the American Physical Society and the UK Institute of Physics), pushing for the Research Works Act (RWA) instead of making the case for extending an arXiv-like ‘preprint’ system to disciplines beyond physics seems an extraordinary lapse of good judgement.

On the other hand, the concern that publishers have about the academic community not being willing for long to pay the sort of money they now do for what is little more than feeding the officialdom monster, is a realistic concern. Unfortunately for them, stopping the evolution of science communication in its tracks is simply not an option. Perhaps the current boycott is one of the rare successful ones, and perhaps it will spur publishers on to reconsider their role and position. There are definitely ways for a publisher to play a beneficial role. Just a small example: I was told of a recent case where the peer reviewer expressed his frustration with the words “Imagine if before it was sent to me for review a professional editor actually read all 40 pages and discovered the heinous number of basic grammatical issues, spelling errors, and typos, and sent it back to the authors or to an English correction service before I had to spend more time on that, rather than on the actual scientific content.”

Personally, I think open arXiv-like set-ups in disciplines other than physics are the way forward. Publishers should – and truly forward-looking ones may – establish those themselves, if they don’t want to be reduced to an afterthought in the scientific communication game.

We live in hope, though not holding our breath.

Jan Velterop

Sunday, February 05, 2012

Collaborate, don't frustrate

We have seen a fair amount of activity on the web in the last few weeks with regard to protests, even boycotts, aimed at prominent publishers. Most of it seems to be about money. When money is tight, it leads to a fight.

We are in the huge pickle of a dysfunctional system. And that’s certainly not just the publishers’ fault. They just make the most of the system that is there and that is being kept alive by the scientific community at large. See my previous post. All publishers are commercial and all want to optimize their profits, although some, the not-for-profit outfits, optimize their ‘results’ or their ‘surplus’. Same thing, really. It’s just the way the capitalist system works. The system is dysfunctional because there is no competition. The scientific community allows it to exist without competition. Relying on subscriptions for their income makes journals, and their publishers, monopoloid in an environment where content is non-rivalrous. If the only options to get from A to B – and you have to get from A to B – are a train or walking, because there are no roads, then the train company has a hold on you. And on your money. The situation in science publishing is scarcely different.

So the solution is introducing competition. ‘Gold’ Open Access publishing does just that, albeit perhaps in a fairly primitive way, so far. It’s typically a game of new entrants. But in order to be truly successful, the scientific community at large has to buy in to it. Literally ‘buy’ into it. Publishers can lead the horse to the Open Access water, but they can’t make it drink.

I won’t hold my breath. And there is so much else in science publishing, besides money matters, that needs to be improved.

Just one example: fragmentation. Fragmentation is a big, frustrating problem. Particularly for the efficient and effective ingestion of information. But it need not be so bad. Although science publishers are bound by antitrust rules, there are areas of a pre-competitive nature where they are allowed to collaborate. Think standards, think CrossRef. Those forms of collaboration, for the benefit of science, could be expanded. Other standards could be introduced, to do with data linking, for instance, with data representation, computer-readability, interoperability. Things like structured abstracts. Perhaps even ontologies and agreed vocabularies for scientific concepts, analogous to biological and chemical nomenclature. User licences could be standardized, pre-competitively. Et cetera. There are some sophisticated features around, but their wide adoption all too often suffers from the not-invented-here syndrome. Publishers, too, live in an ego-system of their own.

And it is not just in pre-competitive areas where fragmentation could be remedied. There are areas that you could call ‘post-competitive’, where collaborations between publishers and standardisations of practices and technologies could be of tremendous value to the scientific community, without costing the publishers much, or even anything. Take fragmentation again. Even if the subscription system were to be kept alive, publishers could, PubMedCentral-like, deposit all the journal articles they publish in discipline-central global databases, after, say, a year. The vast majority of the realizable economic value of annual subscriptions is realized within a year (that’s why the subscriptions are annual), and although open access after a year is not ideal, it would be a massive improvement over the current situation with very little cost to the publishers. And unlike PubMedCentral, the publishers should, collectively and proactively, set up and organize these open repositories. Asking funding agencies to help support the future maintenance of such repositories should not be too difficult. It's a conservation issue the responsibility for which cannot and should not be put on the shoulders of potentially fickle private enterprise. 

Another area of post-competitive collaboration, or at least cooperation, would be the so-called ‘enrichment’ of journal articles. In html as well as in their pdf manifestations. Every publisher seems to have its own ideas, and that’s all very well, but it doesn’t make life easier for researchers. Why not pool these ideas and apply them as widely as possible? There is hardly any competitive cost to that, if any at all, and a great deal of potential benefit to the scientific community, the professed aim of virtually all scientific publishers.

It clearly is not beyond the publishers to work together and create something very useful. Just look at CrossRef. It is an example worthy of being the paradigm for publisher attitudes and behaviour with regard to pre-competitive and post-competitive collaborations. 

Jan Velterop

Publishers are not evil

Commercial publishers, as a class, are not evil. To think so is wrong. They have just been doing what the scientific community can't or won't do by itself. And like most businesses, they charge what they can get away with. It’s known as ‘the market’. They can’t be criticised for existing and functioning in a perfectly legal capitalist market and regulatory environment. That doesn’t mean they can’t be criticised. Individual publishers can be criticised for their actions and inactions. As an industry, among the things they can be criticised for are not evolving fast enough, given the environmental change that the web has brought about. But so can the academic community. The reliance on old and now effectively dysfunctional systems and habits from a bygone era is mutual.

Centuries ago, in Europe, non-Christians were forbidden to belong to the guilds, which made it impossible for them to be any kind of craftsman, essentially leaving them with few other options than being an unskilled labourer, trader, or money lender. So some became very wealthy and thus became the target of envy. And accused of usury and the like. Just for doing the only thing they were allowed to do and society needed someone to do. It’s more complicated than that, but it captures the essence.

The relevance of this to science publishing? Well, at a certain point, when science had grown into a sizeable, professional and global pursuit, academics didn’t, or couldn’t, organise publishing properly anymore on that global scale. University presses were, by definition, rather local, and so were scientific societies. Commercial publishers stepped into the breach, some became very wealthy, and are now the target of envy. Or at least of criticism of their wealth. And accused of greed and the like. Just for doing some of the things the academic community needs or thinks it needs, in the environment of a ‘market’ (starting in the 1950s with e.g. internationalisation of science communication, abolishing the sort of author charges the scientific societies were levying for their journals, standardisation of article structures, language, et cetera).

Lesson: if you leave it to outsiders to provide your essential services, because you can’t, or won’t, truly assimilate and embed those outsiders, and provide the services from within your own circles, you risk losing control and you cannot blame the outsiders for taking the opportunities you give them.

Jan Velterop

PS. The first Open Access publisher was a commercial publisher. The largest publisher of Open Access articles today is a commercial publisher. Why are there not more scientist-led initiatives like PLoS?

Thursday, January 19, 2012

The Problem of 'Overwhelm'

I wrote my previous post, Holy Cow, Peer Review, as a provocative one, of course. The reason, though, why I think the matter requires attention is that I see a problem looming: the problem of what I call 'overwhelm'. This problem looms on several levels. First of all the capacity of the pool of reviewers able and willing to deal with the growing manuscript flow – several times over, given that many articles 'cascade' down the journal pecking order and need to be peer-reviewed at every stage – is reaching its limits. And secondly, arguably more importantly, the increasing difficulty for researchers to read all that they might want to read. In many areas there is simply too much that's relevant.

In 1995 I wrote an article entitled "Keeping the Minutes of Science". I still take that phrase fairly literally. Research articles in journals are an account, for the record, of the research work done and the results achieved. Thinking along those lines (I realise that not everybody agrees with me) leads to a few consequences. 1) Articles are not primarily written for the reader-researcher, but mainly an obligation for the author-researcher (though the two groups overlap to a very large degree). The adage is 'publish-or-perish', remember? It's not 'read-or-rot'. 2) As a result, it is only proper that payment for publication comes from the side of the author. That is most fortunate, because that also makes Open Access possible. 3) 'Minutes' are often filed quickly and are not meant to be read widely. Only when there is a problem, are they retrieved and perused.

You will have spotted the paradox of, on the one hand, articles not being read widely and, on the other hand, the desirability of Open Access. Well, that is where the 'minutes' analogy falls down. Because articles do play a role in conveying the scientific knowledge and insights gained by the authors.

However, that doesn't mean that the information and knowledge bundled up in an article can be conveyed by it being read in the conventional way. This is where the problem of 'overwhelm' starts to bite. There is just too much to read for any normal researcher*. So more and more, articles need to be 'read' by computers to help researchers ingest the information they contain. That is why Open Access is so important, especially the BOAI-compliant OA that is governed by licences such as CC-BY. It makes large-scale computer-aided reading possible.

The notion of using computers for reading and ingesting large amounts of research articles means that presentation becomes unimportant. Or rather, important in different ways. Computer-readability and interoperability come first, and visual elements such as lay-outs are practically irrelevant. In a comment by Scott Epstein to my Holy Cow, Peer Review post the point is made that visual presentation is a value-add that distinguishes publishers from outfits like ArXiv. The publishers' presentations are nice-to-have, of course, but far less 'need-to-have' than they seem to assume. Computers can deal very well with information presented in an ArXiv fashion, provided the computer interoperability is taken care of.

My personal contention is also that peer-review is largely redundant beyond a basic level that can well be taken care of by an endorsement system, once computers are involved. We shouldn't be so scared of potential 'rubbish' being included. Analyses of large amounts of information will most likely expose potential 'rubbish' in a way that makes it recognisable, similar to how 'outliers' are identified in scatter graphs (and ignored, if that seems the right thing to do, in the judgment of the researcher looking at the results).

Scott also mentions 'curation'. Curation is important, but is not necessarily – or even often – done by reviewers or journal editors, in my experience. Scott seems to agree, as demonstrated by his choice of words: "journal editors also (or should) curate the content." A system that allows for (semantic) crowd-curation of scientific assertions, which subsequently could be recognised by computers in any articles in which these assertions are repeated, is likely to be a better use of available resources, with the added benefit of not relying on a small number of 'experts' but instead, using a much wider pool of potential expertise.

I would like to finish with a few quotes from Richard Smith, ex Editor of the British Medical Journal and currently Board Member of the Public Library of Science:
"...peer review is a faith-based rather than evidence-based process, which is hugely ironic when it is at the heart of science."

"...we should scrap prepublication peer review and concentrate on postpublication peer review, which has always been the ‘real’ peer review in that it decides whether a study matters or not. By postpublication peer review I do not mean the few published comments made on papers, but rather the whole ‘market of ideas,’ which has many participants and processes and moves like an economic market to determine the value of a paper. Prepublication peer review simply obstructs this process."

Jan Velterop

*See "On the Impossibility of Being Expert" by Fraser and Dunstan, BMJ Vol 341, 18-25 December 2010

Wednesday, January 11, 2012

Holy Cow, Peer Review

Looking at it as dispassionately as possible, one could conclude that peer review is the only remaining significant raison d’être of formal scientific publishing in journals. Imagine that scientists, collectively, decided that sharing results was of paramount importance (a truism), but that peer review was no longer considered important. If you imagine that, then the whole publishing edifice would suddenly look very different. More like ArXiv (where, by the way, I found this interesting article).

A recent report estimates that the “total revenues for the scientific, technical and medical publishing market are estimated to rise by 15.8% over the next three years – from $26bn in 2011 to just over $30bn in 2014.” If we assume an annual output of 1 million articles, this revenue – which, for practical purposes, equals the cost to science of access to research publications – equates to a cost of some $26,000 to $30,000 per article, and even if the output is 1.5 million articles, it’s still roughly $17,000 to $20,000 per article.

So the real question is: is peer review worth that much? It’s not that peer review might not have benefits at all. At issue is the cost to science of such benefits as there may be. And although post-publication peer review could easily be done, by those who feel the inclination to do so, when and where it seems to be worth the effort, it may not happen very often, of course, as there are few incentives. Isn't the endorsement system a viable alternative?

Of course, ArXiv-oid publishing platforms also carry a cost, but per article it’s likely to be only a small fraction of the amounts mentioned above. In the case of ArXiv it is about $7 per article, each of which is also completely Open Access. Seven dollars! That’s the size of a rounding error on amounts in the tens of thousands of dollars.

Peer review made sense in an era when publishing necessarily claimed expensive resources, such as paper to print on, physical distribution, shelf space in libraries, et cetera. One had to be careful and spend those resources on articles that were likely to be worth it, and even then restrict what was spent on individual articles by imposing maximum lengths and the like. Also, finding the articles worth reading was difficult and the choices and guidance journal editors and editorial boards made were welcome.

How all this has changed with the advent of the Web. There is hardly any need for restrictions on the number and length of articles anymore, and searching – not to mention finding – articles that are relevant to the specific project a researcher is working on has become dramatically easier. As a result, the filtering and selecting functions of journals have become rather redundant.

“All very well, but what about the quality assurance that peer review provides?” Well, it is debatable that peer review does that reliably, though I’m willing to accept that it might. However, given its costs, can we really not deal with a lack of this quality assurance in the light of the benefits of universal and inexpensive Open Access that ArXiv-oid platforms could bring? Are we not dealing with it right now? We all know that almost all articles eventually meet their accepting journal editor, and it’s difficult to imagine that every article we find with a literature web search is of sufficient ‘quality’ (whatever that means anyway) for our purposes. And yes, we will encounter ‘rubbish’ articles. Don’t we now, with nigh universal peer review? But we deal with outliers in data all the time, and it is my conviction that we can deal with outliers in the literature just as well. Anyway, ArXiv-oid platforms with an endorsement system will to a large degree prevent excesses.

Scientists are people, and as such not too well equipped to make completely rational choices. Besides, the ‘ego-system’ of qualifying for grants, tenure, et cetera, has its own rationality (akin to how to deal with the prisoner's dilemma). But the prospect of being able to save tens of billions of dollars each year, even after allowing generous sums for running ArXiv-oids with endorsement systems instead of peer review, which savings could be used for research (the amounts saveable are not far off the annual NIH research budget!), must be food for some serious thought. Let's see if we can think this through. It's not fair to expect scientists themselves to break the cycle. But funding bodies?

I realise that what I'm proposing here is the 'furthest point', but that's where we have to hook up the tightrope, if we want to be able to traverse the chasm separating today from what might be, no?

Jan Velterop