Tuesday, February 05, 2013

Transitions, transitions


Although I am generally very skeptical of any form of exceptionalism, political, cultural, academic, or otherwise, I do think that scholarly publishing is quite different from professional and general non-fiction publishing. The difference is the relationship between authors and readers. That relationship is far more of a two-way affair for scholarly literature than for any other form of publishing.

Broad and open dissemination of research results, knowledge, and insights has always been the hallmark of science. When the Elseviers/Elzevirs (no relation to the current company of the same name, which was started by Mr. Robbers [his last name; I can’t help it] a century and a half after the Elsevier family stopped their business), among the first true ‘publishers’, started to publish scholarship, for example the writings of Erasmus, they used the technology of the day to spread knowledge as widely as was then possible.

In those days, publishing meant ‘to make public’. And ‘openness’ was primarily to do with escaping censorship. (Some members of the Elsevier family went as far as to establish a pseudonymous imprint, Pierre Marteau, in order to secure freedom from censorship). But openness in a wider sense — freedom from censorship as well as broad availability — has, together with peer-review, been a constituent part of what is understood by the notions of scholarship and science since the Enlightenment. Indeed, science can be seen as a process of continuous and open review, criticism, and revision, by people who understand the subject matter: ‘peers’.

The practicalities of dissemination in print dictated that funds must be generated to defray the cost of publishing. And pre-publication peer review emerged as a way to limit waste of precious paper and its distribution cost by weeding out what wasn’t up to standards of scientific rigour and therefore not worth the expense needed to publish. The physical nature of books and journals, and of their transportation by stagecoach, train, ship, lorry, and the like, made it completely understandable and acceptable that scientific publications had to be paid for. Usually by means of subscriptions. However, scientific information never really was a physical good. It only looked like that, because of the necessary physicality of the information carriers. The essence of science publishing was the service of making public. You paid for the service, though it felt like paying for something tangible.

The new technology of the internet, specifically the development of web browsers (remember Mosaic?), changed the publishing environment fundamentally. The need for carriers that had to be physically transported all but disappeared from the equation. The irresistible possibility of unrestrained openness emerged. But something else happened as well. With the disappearance of physical carriers of information, software, etc. the perception of value changed. The psychology of paying for physical carriers, such as books, journals, CDs, DVDs is very different from the psychology of paying for intangibles, such as binary strings downloaded from the web, with no other carrier than wire, or optical cable, or even radio waves. In order to perceive value, the human expectation — need, even — for physical, tangible goods in exchange for payment is very strong, though not necessarily rational, especially where we have been used to receiving physical goods in exchange for money for a very long time. That is not to say that we wouldn’t be prepared to value and to pay for intangibles, like services. We do that all the time. But it has to be clear to us what exactly the value of a service is — something we often find more difficult, reportedly, than for physical goods.

This is a conundrum for science publishers. Carrying on with what they are used to, but then presented as a service and not ‘supported’ by physical goods any longer, can look very ‘thin’. Yet it is clear that the assistance publishers provide to the process of science communication is a service par excellence. Mainly to authors ('publish-or-perish') and less so to readers (‘read-or-rot’ isn’t a strong adage). Hence the author-side payment pioneered by open access publishers (Article Processing Charges, or APCs).

Although it would be desirable to make the move to open access electronic publishing swiftly, the reality of inertia in the ‘system’ dictates that there be a transition period and method. This transition is sought in many different ways: new, born-OA journals that gradually attract more authors; hybrid journals that accept OA articles against author-side payment; ‘green’ mandates, which require authors to self-archive a copy of their published articles; unmediated, ‘informal’ publishing such as in arXiv; even publishing on blogs.

What may be an underestimated transition, and no doubt a controversial one, is a model (a kind of ‘freemium’ model?) that’s gradually changing from restrictive to more and more open, extending the ‘free’, ‘open’ element and reducing the features that have to be paid for by the user. I don’t even think it is recognized as a potential transition model at all at the moment, and that may mean opportunities are being missed. Let’s take a look at an example. If you don’t have a subscription you can’t see the full text. However, where only a short time ago you saw only the title and the abstract, you now see those, plus keywords and the abbreviations used in the article, its outline in some detail, and all the figures with their captions (hint to authors: put as much of the essence of your paper in the captions as you can). All useful information. It is not a great stretch to imagine that the references are added to what non-subscribers can see (indeed, some publishers already do that), and even the important single scientific assertions in an article, possibly in the form of ‘nanopublications’, on the way to eventual complete openness.

Of course, it is not the same as full, BOAI-compliant open access, but in areas where ‘ocular’ access is perhaps less important than the ability to use and recombine factual data found in the literature, it may provide important steps during what may otherwise be quite a protracted transition from toll-access to open access, from a model based on physical product analogies to one based on the provision of services that science needs.

Jan Velterop

Saturday, January 19, 2013

On knowledge sharing — #upgoerfive

This post was written with the #upgoerfive text editor, using only the most common 1000 words in English.

At one time there was a man who some people thought was god. Other people thought he was sent to the world by god. This man had two water animals you could eat and five pieces of other food and he wanted the many people who were with him to have enough to eat. But two water animals and five other pieces of food were not enough for the people if they all had to eat. So the man who some people thought was god and others that he was sent by god, made the food last until all the people had had enough to eat. This was a wonder. The people saw this and did not know if they could believe what they saw. But when it seemed true that he had a power that no other men or women had, they believed the man was really god or sent by god, because he could do what other men could never do at all. This story became very well known. And many people believe it is about food.

But I think it is not about food. I think it is about food for thought. About what we know, not about what we eat. Because if we give food that we have to others, we do not have it anymore for us to eat. But if we tell others what we know, they know it, too, and we still know it as well. So we can not share our food and still have it all, but we can share what we know and still have it all. We should share what we know if it is good for us all. Especially people who work on knowing more and more every day, as their job. They are paid by us all to work in their jobs on knowing more and more, and they really should share what they come to know with us, and in such a way that we can understand it, too.

Jan Velterop

Tuesday, January 15, 2013

Imagine if funding bodies did this


There is apparently a widespread fear that if a ‘gold’ (author-side paid) open access model for publishing scientific research is supported by funding bodies, the so-called article processing fees, paid for by funders on behalf of authors, might see unbridled increases. This fear is not unwarranted if the problem is not addressed properly. If funders agree to pay whatever publishers charge, they undermine the potential for competition among publishers and provide them with an incentive to maximize their income, while at the same time removing any price sensitivity on the part of the publishing researcher. However, it is not very difficult to address this problem.

In order to avoid untrammeled article processing fee increases, funding bodies should foster competition amongst publishers, and create price sensitivity to article processing charges in researchers publishing their results.

Imagine if they did the following:
  • Require open access publishing of research results;
  • Include in any grants a fixed amount for publishing results in open access journals;
  • Allow researchers to spend either more or less than that amount on article processing charges, any surplus to be used for the research itself, or any shortfall to be paid from the research budget;
  • Require any excess paid over and above the fixed amount to be justified by the researcher to the funder;
  • Provide a fixed amount for more than one publication if the research project warrants that, but so that researchers have an incentive to limit the number of published articles instead of salami-slicing the results into as many articles as possible, again by giving them discretion over how the fixed amounts are spent. 
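
The arithmetic behind these rules can be sketched in a few lines. This is only an illustration under assumptions of my own: the function name, the field names, and the idea of settling all articles of one grant together are hypothetical, not anything a funder has specified.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PublicationSettlement:
    surplus_for_research: float              # unspent allowance, free to use for the research itself
    shortfall_from_research: float           # overspend, to be covered from the research budget
    apcs_needing_justification: List[float]  # individual APCs above the fixed amount


def settle_publication_budget(
    fixed_amount: float, funded_articles: int, apcs_paid: List[float]
) -> PublicationSettlement:
    """Hypothetical settlement of a grant's publication allowance.

    The allowance is fixed_amount times the number of articles the funder
    agreed to fund; the researcher decides how to spend it.
    """
    if fixed_amount < 0 or funded_articles < 0 or any(a < 0 for a in apcs_paid):
        raise ValueError("amounts and counts must be non-negative")
    allowance = fixed_amount * funded_articles
    balance = allowance - sum(apcs_paid)
    return PublicationSettlement(
        surplus_for_research=max(balance, 0.0),
        shortfall_from_research=max(-balance, 0.0),
        apcs_needing_justification=[a for a in apcs_paid if a > fixed_amount],
    )
```

Because the allowance does not grow with the number of articles actually published, slicing the results into extra papers eats into the research budget, and choosing a cheaper journal leaves more money for the research. That is the price sensitivity the list above is meant to create.
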
Jan Velterop

Sunday, September 09, 2012

'Pixels of information'

My friend Barend Mons wrote to me, and I think his letter is worth sharing on this blog. I checked with him, and he agrees.
Dear Jan,

I'm writing to you inspired by your remark that "OA is not a goal in itself but one means to an end: more effective knowledge discovery".

What we need for eScience is Open Information to support the Knowledge Discovery process. As eScience can be pictured as 'science that can not be done without a computer', computer reasonable information is the most important element to be 'open'. 
You're right, Barend. That's why I think CC-BY is a necessary element of open access. 
As we discussed many times before, computer reasoning and 'in silico' knowledge discovery leads essentially to 'hypotheses' not to final discoveries. There are two very important next steps. First, what I would call 'in cerebro' validation, mainly browsing the suggestions provided by computer algorithms mining the literature and 'validating' individual assertions (call them triples if you wish) in their original context. 'Who asserted it, where, based on what experimental evidence, assay...?' etc. In other words, why should I believe (in the context of my knowledge discovery process) this individual element of my 'hypothesis-graph' to be 'true' or 'valid'? Obviously in the end, the entire hypothesis put forward by a computer algorithm and 'pre'-validated by human reasoning based on 'what we collectively already know' needs to be experimentally proven (call it 'in origine' validation).

What I would like to discuss in a bit more depth is the 'in cerebro' part. For practical purposes I here define 'everything we collectively know', or at least what we have 'shared' as the 'explicitome' (I hope Jon Eisen doesn't include that in his 'bad -omes'), essentially a huge dynamic graph of 'nanopublications' or actually rather 'cardinal assertions' where identical, repetitive nanopublications have already been aggregated and assigned an 'evidence factor'.  Whenever a given assertion (connecting triple) is not a 'completely established fact' (the sort of assertion you repeat in a new narrative without the need to add a reference/citation) we will go to narrative text 'forever' to 'check the validity' in my opinion.

Major computer power is now exploited for various intelligent ways to infer the 'implicitome' of what we implicitly know (sorry, Jon, should you ever see this!), but triples captured in RDF are certainly no replacement for narrative in terms of reading a good reasoning, why conclusions are warranted, extensive description of materials and methods etc. So the 'validation' of triples outside their context will be a very important process in eScience for many decades to come. In fact your earlier metaphor of the 'minutes of science' fits perfectly in this model. 'Why would I believe this particular assertion?' ... 'Well, look in the minutes by whom, where and based on what evidence it was made.'

Now here is a very relevant part of the OA discussion: The time when some people thought that OA was a sort of charity model for scientific publishing is definitely over, with profitable OA publishers around us. The only real difference is: do we (the authors) pay up front, or do we refuse that (for whatever good reason, see below) and now the reader has to pay 'after the fact'. So let's first agree that there is no 'moral superiority', whatever that is, in OA over the traditional subscription model.  
Not sure if I agree, Barend. OK, let's leave morals out of it, but first of all, articles in subscription journals can also be made open access via the so-called 'green' route of depositing the accepted manuscript in an open repository; and secondly, OA at source, the so-called 'gold' route, is definitely practically and transparently the superior way to share scientific information with anyone who needs or wants it.
We have also seen the downsides of OA, for instance for researchers in developing countries who may still have great difficulty finding the substantial fees to publish in the leading Open Access journals.

I believe, however, that we have a great paradigm shift right in front of us. Computer reasoning and ultralight 'RDF graphs' distributing the results to (inter alia) mobile devices will allow global open distribution of such 'pixels of information' at affordable costs, even in developing countries. Obviously, an associated practice will be to 'go and check' the validity of individual assertions in these graphs. That is exactly where the 'classical' narrative article will continue to have its great value. It is clear that reviewing, formatting, cross-linking and sustainably providing the 'minutes of science' is costly and that the community will have to pay for these costs via various routes. I feel that it is perfectly defensible that those articles for which the publishing costs have not been paid for by the authors, and that are still being provided by classical publishing houses, should continue to 'have a price'. As long as all nanopublications (let's say the assertions representing the 'dry facts' contained in the narrative legacy as well as data in databases) are exposed in Open (RDF) Spaces for people and computers to reason with, the knowledge discovery process will be enormously accelerated. Some people may still resent that they may have to pay (at least for some time to come) for narrative that was published following the 'don't pay now, subscribe later' adage. We obviously believe that the major players from the 'subscription age' have a responsibility, but also a very strong incentive, to develop new methods and business models that allow a smooth transition to eScience-supportive publication without becoming extinct before they can adapt.

Best,
Barend
Your views are certainly worth a serious and in-depth discussion, Barend. I invite readers of this blog to join in and engage in that discussion.

Jan Velterop

Tuesday, August 07, 2012

Open access – gold versus green

Recently, Andrew Adams contributed to the 'gold' vs. 'green' open access discussion and he wrote this on the LIBLICENSE list (edited for typos):
There are on the order of 10,000 research institutions and more than ten times as many journals. Persuading 10,000 institutions to adopt OA deposit mandates seems to me a quicker and more certain route to obtain OA than persuading 100,000 journals to go Gold (and finding more money to bribe them into it, it would appear – money which is going to continue to be demanded by them in perpetuity, not accepted as a transitional fee – there's nothing so permanent as a temporary measure). (Full message here.)
The LIBLICENSE list moderator would not post my response, so I'm giving it here:

10,000 research institutes means, in terms of Harnadian 'green' mandates, a need for 10,000 repositories; 100,000 journals (if there were so many; I've only ever heard numbers in the order of 20-25,000 [recently confirmed as in the order of 28K]) does not mean 100,000 publishers. Besides, there is no existential reason for institutions to have a repository and 'green' mandate. The fact that others have repositories and it doesn't have one itself does not harm a research institution in the same way that not being 'gold' (or at least having a 'gold' option) does existentially harm journals in an environment of more and more 'gold' journals.

As for costs, there are two things that seem to escape the attention of 'green' advocates (by which I mean those who see no place for 'gold' open access at this stage on the basis that 'green' would be a faster route to OA and would be cheaper):
  1. 'Green' fully depends on the prolongation of the subscription model. Without subscription revenues no journals, hence no peer-reviewed articles, hence nothing to self-archive but manuscripts, arXiv-style. (That would be fine by me, actually, with post-publication peer review mechanisms overlaying arXiv-oids). The cost of maintaining subscriptions is completely ignored by exclusively 'green' advocates, who always talk about 'green' costing next to nothing. They are talking about the *marginal* cost of 'green', and compare it to the *integral* cost of 'gold'.
  2. Exclusively 'green' advocates do not seem to understand that for 'gold' journals, publishers are not in any position to "demand money". They can only offer their services in exchange for a fee if those who would pay the fee are willing to pay it. That's known as 'competition', or as a 'functioning market'. By its very nature, it drives down prices. This is in contrast to the monopoloid subscription market, a dysfunctional market, where the price drivers face upwards. Sure, some APCs have increased since the early beginnings of 'gold' OA publishing, when 'gold' publishers found out they couldn't do it for amounts below their costs. But generally, the average APC per 'gold' article is lower, much lower, than the average publisher revenue per subscription article. And this average per-article subscription price will still have to be coughed up in order to keep 'green' afloat.
Price-reducing mechanisms would work even faster if and when the denizens of the ivory tower were to reduce the culturalism and anglo-linguism that currently prevail, in which case we could rapidly see science publishing emerge in places like China, India, and other countries keen on establishing their place in a global market, competing on price. APCs could tumble. Some call this 'predatory gold OA publishing'. Few seem to realise that the 'prey' is the subscription model.

The recently published Finch Report expresses a preference for immediate, 'libre', open access, and sees 'gold' as more likely to be able to deliver that than 'green'. Meanwhile, 'green' is a way to deliver OA (albeit delayed and not libre) in cases where 'gold' is not feasible yet. That is an entirely sensible viewpoint, completely compatible with the letter – and I think also the spirit – of the Budapest Open Access Initiative (BOAI). Incidentally, referring to the BOAI is characterised as "fetishism" (sic) by Andrew Adams.

Comparing 'green' and 'gold' is almost, to borrow a phrase from Stevan Harnad, "comparing apples and orang-utans". The Finch report is not mistaken to see 'green' as (in the words of Michael Jubb) an "impoverished type of open access, with embargo periods, access only to an authors’ manuscript, without links and semantic enrichment; and severe limitations on the rights of use." After all, in the 'green' ID/OA scheme (ID = Immediate Deposit and OA meaning 'Optional Access' here) favoured by Harnad c.s., deposited articles may be made open if and when the publisher permits.

Besides, 'gold' implies also 'green' ('gold' articles can be deposited, without embargo or limits on use, anywhere, and by anyone), where 'green' does not imply 'gold'. A Venn diagram might look like this (below).

The Finch group has come to its conclusions because they have clearly learnt the lessons of the last decade. There is nothing (repeat: *nothing*) that prevents academics from eschewing the services of "rent-seeking" (as Adams put it) publishers. They could easily self-organise (though I realise that both the words 'could' and 'easily' are probably misplaced). To expect publishers (for-profit and not-for-profit ones alike) to refuse to provide services that academics are seeking from them is silly.

For the avoidance of doubt, I am not against 'green' OA (in spite of what some 'green'-only advocates assert), especially not where there is no other option. The choice is not so much for or against 'green' or 'gold', but emphatically for full, unimpeded open access, however it is delivered, as long as it is "permitting any users to read, download, copy, distribute, print, search, or link to the full texts of these articles, crawl them for indexing, pass them as data to software, or use them for any other lawful purpose, without financial, legal, or technical barriers other than those inseparable from gaining access to the internet itself." You recognise this last phrase? Indeed, the precise wording of the BOAI.

Jan Velterop

Wednesday, August 01, 2012

The triumph of cloud cuckoo land over reality?

It should be abundantly clear that Open Access policies by Finch, RCUK, Wellcome Trust and many others are very important for the development of universal OA, in that they not only indicate practical ways of achieving it, but also signal to the scholarly community and the wider society interested in scientific knowledge and its advance that OA should be the norm.

The 'sin' that RCUK, Finch and the Wellcome Trust committed is that they didn't formulate their policies according to strict Harnadian orthodoxy. It's not that they forbid Harnadian OA (a.k.a. 'green'), oh no. It is that they see the 'gold' route to OA as worthy of support as well. Harnad, as ultimate arbiter of Harnadian OA (though he has acolytes), would like to see funder and institutional OA policies focus entirely and only on Harnadian OA, and would want them, to all intents and purposes, to forbid the 'gold' route. In Harnad's view, the 'gold' route comes into play (as 'downsized gold', whatever that means) only once all scholarly journal literature is OA according to Harnadian rules. These rules are quite specific:
  • articles must be published in peer-reviewed subscription journals; 
  • institutions must mandate their subsequent deposit in an institutional repository (not, for instance in a global subject repository); 
  • there must be no insistence on OA immediately upon publication (his big idea is ID/OA: Immediate Deposit / Optional [sic] Access);
  • there must be no insistence on CC-BY or equivalent (which would make re-use and text-mining possible; OA in his view should just be ocular access, not machine-access).
It must be difficult to comply with these rules, and seeing his recent applause, subsequently followed by withdrawal of support, for the RCUK policy, even Harnad himself finds it difficult to assess whether his rules are 'properly' adhered to. It also seems as if his main focus is not OA but mandated deposit in institutional repositories. Probably hoping that that will eventually lead to OA. He would like to see 'gold' OA — OA at source — considered only if and when it is "downsized Gold OA, once Green OA has prevailed globally, making subscriptions unsustainable and forcing journals to downsize." It is the equivalent of opening the parachute only a split second before hitting the ground. It would be the triumph of a dogmatically serial process over a pragmatically parallel one. The triumph of cloud cuckoo land over reality.

Open Access is more than worth having. Different, complementary, ways help achieve it. There are many roads leading to Rome.

Jan Velterop
OA advocate

Monday, June 11, 2012

Small publications, large implications


When I recently enjoyed lunch with Steve Pettifer of Manchester University (the ‘father’ of Utopia Documents), the conversation turned to nanopublications. Ah, you want to know what nanopublications are. Nanopublications are machine-readable, single, attributable, scientific assertions.

Steve posed the question “why would any scientist believe a nanopublication, particularly if out of context?” Indeed, why would they? Why should they, well versed as scientists are in the art of critical thinking? They won’t, at least not without seeing the appropriate context.

Herein lies a great opportunity.

Let me explain. Nanopublications, or rather, their core in the form of machine-readable subject-predicate-object triples, can be incorporated in (vast) collections of such triples and used for reasoning, to discover new knowledge, or to make explicit hitherto tacit or hidden knowledge. Triples can therefore be very valuable to science. (The Open PHACTS project is in the process of establishing the value of this approach for drug discovery.) Many, perhaps most, scientific articles contain such single assertions, which could be presented as nanopublications.

In a recent Nature Genetics commentary called ‘The Value of Data’, Barend Mons et al. addressed this issue with the metaphor of the chicken and the egg. Now that eggs (individual assertions) are being distributed (‘traded’), their value (they all look roughly the same) can only be truly assessed by knowing the parents. Scientists will always want to personally judge whether a crucial connecting assertion in a given hypothesis is one they can accept as valid. The ability to locate where the assertion came from, in which article, in which journal, by which author, and when it was published – in short the ‘provenance’ of individual scientific assertions functioning in computer reasoning – is crucial for that. As is the ability to access the article in question.
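
To make the idea of an 'egg' that carries the name of its 'parents' a little more concrete, here is a minimal sketch of how an assertion and its provenance might be held together. It is an illustration only: the class and field names are my own assumptions, and it is not the actual nanopublication format or any RDF library's API.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Provenance:
    """Where an assertion came from (field names are hypothetical)."""
    author: str
    article: str
    journal: str
    published: str


@dataclass(frozen=True)
class Nanopublication:
    """A single assertion (a triple) together with its provenance."""
    subject: str
    predicate: str
    obj: str
    provenance: Provenance


def sources_for(
    subject: str, predicate: str, obj: str, collection: Iterable[Nanopublication]
) -> List[Provenance]:
    """Return the provenance of every nanopublication making this exact assertion."""
    wanted = (subject, predicate, obj)
    return [
        n.provenance
        for n in collection
        if (n.subject, n.predicate, n.obj) == wanted
    ]
```

A reasoning tool could use something like `sources_for` to answer the scientist's question "who asserted this, and where?", after which the scientist still needs access to the article itself to judge the assertion in context.
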

Scientific publishers should, in their quest to add value to research publications, expose and clearly present the nanopublications contained in the articles they publish, particularly those that are believed (e.g. by the author, or the reviewers) to be unique and new. What’s more, they should make them openly and freely available, like they do with abstracts, even publishers that are not yet convinced that they should change their business models and make all their content open access. And they should not just make nanopublications open and accessible to human readers, but also to machines, because only machines are able to effectively process large numbers of nanopublications, treating each one as a ‘pixel’ of the larger picture that a researcher is building up.

So what’s the opportunity?

Well, openly accessible nanopublications are very useful for scientific discovery, they are attributable (to author, article, and journal) and scientists don’t just believe them when they see them, particularly if the assertion is new to them or when they find it in a process of computer-assisted (in silico) reasoning. Researchers will be eager to investigate their source, i.e. check out the article from which the nanopublication comes. They may cite the nanopublication, and in doing so, cite the article. An obvious win-win situation for scientists (in their roles of users and authors) and publishers alike.

What are we waiting for?

Jan Velterop