Tuesday, November 10, 2009

Fruit and frugivores

Publishers often don't seem to clearly see their real role in the world of scientific knowledge exchange, but libraries don't seem to, either. Both roles have evolved considerably in the last decade and a half.

For publishers it means that owning and selling content is now becoming a relic of the past, and only still exists because of the considerable inertia in the system. Their role is now clearly what it always was in disguise, namely a service to authors. The need for print and distribution made it perhaps inevitable to get a distorted view of the situation, and selling the carrier – paper – was easily confused with selling the content, but a publisher's real 'market' was, and is, authors. That's also – as appropriate for a 'market' – where the competition takes place. Only authors have a meaningful choice between publishers; libraries (readers) do not.

As for libraries, their role in the past has always included dealing with publishers. After all, they are in charge of the incoming collection of literature, and making sure that their constituency of readers has access to what it needs. Now that the role of publishers has become more clear, namely the role of service providers to authors (which is clearest with open access publishers), many librarians still feel the need to be involved, this time on behalf of their constituency of authors. But authors never were the library's true constituency, at least not when it comes to the authors' dealings with publishers.

So why should libraries take it upon themselves to play the intermediary between authors and publishers? Have authors asked for that? Have university administrators asked for that? Have funders asked them to take up that role? Or is it a consequence of the publishers asking libraries to support OA publishing? I suspect it is the latter. But libraries could decline. They are not in charge of research funding, so why should they be involved in paying for the necessary publication of research results, the cost of which is an integral part of doing research? It certainly isn't provided for in their collection budget.

There is of course nothing against libraries being in charge of the 'outgoing' collection – the papers written by researchers at the library's institution – as well as the incoming one. But it is my impression that a clarity of understanding of such a role is missing, particularly of the budgetary implications.

Take this blog entry in "The Scholarly Kitchen". It is fairly typical for such comments to take as read the idea that libraries pay for OA article processing charges. But logical it isn't. And any comparison between the cost of subscriptions and the cost of OA publishing is bound to be misleading, as it is, in the expression of Stevan Harnad, comparing apples with orangutans. (The analogy may be more appropriate than might first appear: fruit/frugivores – articles/researchers ['textivores'?].)

Library 'membership' of an OA publisher is a supporter-scheme for OA. A stimulus in OA's early stages. It can never be – and shouldn't be – a subscription substitute. Susan Klimley, the Serials and Electronic Resources Librarian in the Health Sciences Library at Columbia University, who is quoted in the blog post, got it right (I paraphrase): The library creating author funds to pay article processing fees helps to reinforce a fundamental disconnect between who creates, and who pays for, article publication. Setting up a pot of money is not going to solve that problem. Authors need to be more sensitized to the cost of producing information, and author publishing funds work against that aim.

Jan Velterop

Monday, November 09, 2009

Preparing for the previous war

Whilst ‘green’ OA and ‘gold’ OA may be equivalent when it comes to open access, to be frank, there is a difference in usefulness. The matter is one of practice rather than of principle. The issue is PDFs.

Gold OA almost always includes an HTML version as well as a PDF. And if anything is missing, it is the PDF. Green OA, on the other hand, more often than not offers just the PDF, and not a machine-readable HTML or XML version. Both are, of course, fine for the traditional form of knowledge intake, via the eye, by reading the articles. But they are not both suitable for computer-assisted intake, via machine-reading and text-mining. That is not easily possible, in practice, with PDFs, and not at all with bitmap PDFs (at least not without cumbersome procedures involving printing and optical character recognition, or OCR).
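
To make that contrast concrete, here is a minimal sketch in Python (the article snippet is invented, and pdfminer is mentioned only as one example of the extra tooling a PDF would require): an HTML deposit yields machine-readable text with nothing more than the standard library, whereas a PDF needs a dedicated extraction library, and a bitmap PDF needs OCR, before any mining can start.

```python
# A minimal sketch (illustrative only): pulling mineable text out of an
# HTML article version with nothing but the Python standard library.
# The sample_html string below is invented for the example.
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the visible text content of an HTML document."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

sample_html = """
<html><body>
  <h1>Expression of gene X in tissue Y</h1>
  <p>We observed a twofold increase in the expression of gene X.</p>
</body></html>
"""

extractor = TextExtractor()
extractor.feed(sample_html)
print(" ".join(extractor.chunks))

# By contrast, even a text-based PDF already needs a dedicated extraction
# library (pdfminer, for instance), and a bitmap (scanned) PDF needs OCR
# before any text mining can begin at all.
```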

Not having machine-readable access may not be a problem for everyone, but in disciplines where there is a growing over-abundance of new papers, traditional human reading is not an option if one wants to stay truly up-to-date. In areas such as the ‘-omics’ (genomics, proteomics, metabolomics), but not only in those, the ability to perform text-mining is of crucial importance.

There is no reason in principle why a machine-readable version of one’s paper should not be deposited in one’s repository, and advocates of ‘only green OA’, ‘primarily green OA’, or ‘green OA first’ ought to encourage HTML deposits. They are readable by machine and human eye alike, and therefore vastly superior for the purpose of knowledge sharing.

OA to PDFs may be better than non-OA, and that of course remains the case. But relying on OA PDFs for knowledge sharing and dissemination is not dissimilar to ‘preparing for the previous war’.

Jan Velterop

Saturday, November 07, 2009

On OA, language barriers, and the meaning of 'ambush'

I missed the original “Open and Shut?” blog post, but reading Walt Crawford’s “Cites & Insights” for November 2009, I saw that Richard Poynder “seems to suggest that [I] have been an effective agent for ‘ambushing the OA movement’”. Ambushing? Not being a native speaker of English, I thought I’d better look up whether ‘to ambush’ could have a meaning other than “staging a surprise attack”, and read Poynder’s original article. Actually, Poynder, in his post “Open Access: Whom would you back” of 10 March 2009, doesn’t just ‘suggest’ that I have been an effective agent for ambushing the OA movement, but he asserts: “Velterop began to mastermind stage two of the publisher's strategy for ambushing the OA movement: accelerating take-up of Hybrid OA in order to marginalise Green OA.” Perilously close to libel, Mister Poynder!

Poynder is entitled to his views, of course, but it would be nice if he could expound them without misrepresenting and insulting people (yes, I am offended, and an apology on his blog would be appreciated!). He doesn’t do OA any favours, either, with a blog post that is teeming with inaccuracies, conjectures, and mistaken inferences, and given that, it doesn’t surprise me that he even misses the fact that BioMed Central, now Springer, actively promotes repositories (green!) and offers services to universities to install them. Fortunately for OA, there are many more people like me, who truly work on advocating OA in its wider sense, and who are not drawn into what in my view is a narrow-minded pseudo-orthodoxy that only sees green.

The idea that whatever I did or advocated with regard to accelerating gold OA was in any way an ‘attack’ on green OA (“…in order to marginalise…” even) is preposterous, and that it could be a surprise is nothing less than absurd. The surprise is more likely that anybody could see advocating OA in general and working on gold OA as an attack on green OA.

How advocating gold OA as one of the routes to OA could be an attack on green OA is a complete mystery. The original Budapest Initiative recommended two complementary strategies, which later came to be called ‘green’ and ‘gold’ open access by Stevan Harnad. Both were hailed as welcome strategies to achieve Open Access, and Harnad, as well as I, and all the other participants of the meeting that effectively kick-started the ‘movement’, signed the Initiative. Poynder was not on the OA scene yet. Although the Budapest Initiative spoke of ‘OA journals’, a little while later the Bethesda Statement clarified that Open Access is a property of individual works, not necessarily journals or publishers. Poynder has a problem with so-called hybrid journals, but according to ‘Bethesda’, the OA articles in hybrid journals are true open access. No surprises there, no attack on the OA movement, no ambush. Just genuine, pure OA. Of articles in otherwise traditional journals.

Hybrid journals were an attempt at transitioning existing journals to OA. In some cases it worked (Nucleic Acids Research), and in other cases not (yet). I would be the last to deny that the hybrid model is problematic. Because it gives a choice to authors, it cannot impose either the traditional model or the OA publishing model. And because of the widespread, but naïve, perception that a journal’s subscription price is, or should be, proportional to the number of papers published, it is not understood and sometimes severely criticised. Publishers, therefore, have good reason to dislike the hybrid model as well. They will, I suspect, move in the direction of full OA (what might be called the “pay-or-go-away” or “POGA” model), or revert to subscriptions/licences (the “licence-sphere”, or “L-sphere” model).

Poynder brings up the affordability issue. And he complains about the level of article charges. That’s to the point, and fair comment. But it seemingly hasn’t dawned upon him that gold OA is open to competition, and these charges are bound to converge on a level that reflects this competition. Green OA, on the other hand, relies on the L-sphere, with its monopoloid characteristics, remaining intact for the foreseeable future. Dismissing the role of gold OA publishers in moving OA forward, because they see it as a business opportunity, is deeply misguided. It is like dismissing companies for making equipment to generate clean energy and reduce CO2 emissions on the grounds that they may benefit from doing that. Or venting the opinion that what these companies do is bad, because there might be even better techniques. Quite absurd.

Poynder seems to have it in for publishers, any publishers, be they OA publishers or not, and sees any differences between OA publishers and traditional subscription publishers as “a figment of OA advocates' imagination.” As one of the early OA advocates, I couldn’t disagree more. Besides, if OA is about publisher bashing and money only, then it’s bound to fail. Sure, a more economical system may be a desirable side effect of OA, but can’t be the core aim of it all. The mistake Poynder (and, I’m afraid, his guru Harnad) make(s) is to see so-called ‘green OA’ – seemingly not even OA as such – as an end in itself. It isn’t, and it shouldn’t be. The ultimate goal is to universally share (scientific and scholarly) knowledge, in what I call the noösphere (a term taken from Pierre Teilhard de Chardin), a ‘knowledge-sphere’ around the world that everyone can ‘inhale’. And OA is just one of the methods to share knowledge. Any OA. Including gold OA, and even ‘delayed OA’ (after all, ignoring the value of opening up older knowledge is devaluing older knowledge). Is delayed OA ideal? No, of course not. But OA itself is not ideal and is no more – or less – than one of the first steps to be taken to come to true knowledge sharing, to a true noösphere. OA is mostly about sharing documents, often enough just in PDF format. Access to documents is great, but it still leaves formidable barriers to knowledge sharing intact. One of the examples I have in mind is the language barrier.

English may be the lingua franca of scholarly exchange, but the notion that there is no unique scientific knowledge available in other languages is absurd. And even the notion that, if it is available in English, its true availability is universal is a wholly unrealistic one.

But it’s not just the barrier put up by different languages. Even between native speakers of English a lot of knowledge that is published and openly available in English is nonetheless lost. Lost in ambiguity. Researchers are famously (infamously?) sloppy with their language. And publishers, although they sometimes ameliorate the worst excesses, do not, on the whole, seem to set a lot of store by disambiguation of scientific literature. OA publishers are no better than traditional ones in that regard.

Open Access is a most significant element in getting to a global noösphere, and although it’s clearly not the only element, all efforts to promote OA, in any form, help. Unlike dismissing gold OA, which doesn’t.

Jan Velterop


Sunday, March 15, 2009

Open wider

There seems to be a bit of a discussion between Joe Esposito and Stevan Harnad on Liblicense, loosely about the significance of OA and of peer-review.

Quoting Joe Esposito (reacting to an article by Richard Poynder on 'Open and Shut'):
"The real thrust of the world of open access is neither green nor gold, but what I have termed "unwashed," that is, the vast and growing – and growing and growing and growing – world of material that is not peer-reviewed. Take Poynder's own article, for example, or posts to this list. Look at the material that is accumulating in IRs, arXiv, and elsewhere; think about all the blogs and Twitter feeds.

The evidence is mounting that many advocates of open access have never actually used the Internet. The myth persists that OA publishing is just like traditional publishing except that it is free to the user. While there are some segments of OA that are just that, it is a shrinking part of the open access material that is being generated. And it is minuscule compared to what we will see in the years to come.

This doesn't mean peer review is going away. It simply means that peer review is evolving to conform to the characteristics of the online medium, just as the novel grew with the printed page and tennis is a game played around a net. Increasingly peer review will be post-publication, not pre-publication. I suspect all this talk about Gold and Green is a waste of everybody's time."
Quoting Stevan Harnad (reacting to Joe Esposito):
"Or could it be that some of the opponents of Open Access to the 2.5 million articles published annually in the planet's 25,000 peer-reviewed scholarly and scientific journals have never actually done any scholarly or scientific research, hence never published in a refereed journal, and never had any need to consult one for their scholarly and scientific research?"
Open Access to peer-reviewed material is important, but to reduce scholarly knowledge exchange to just peer-reviewed articles is to ignore the massive amounts of data and knowledge that are shared in other ways. I see the importance of unrefereed published scientific material increasing. Dramatically. As long as it is open.

But what does that do to the trustworthiness of that information? Isn't the whole point of peer review to make sure that what is published conforms to accepted standards of scientific inquiry, so that the reader can have a certain amount of trust in the results that are presented? Well, of course. But what is presented in journal articles are mostly results derived from data. Interpretations and annotations of data. Seldom the data themselves. Journal publishing evolved in the past, when physically sharing actual raw data was nigh on impossible, so almost every scientist had to rely on the interpretations as published in journals. But now that we can share the raw data (see Tim Berners-Lee's call for sharing raw data), and tools to manipulate those raw data become widely available, relying on journal articles may well take a back seat. And now that instant comment on data as well as on journal articles has become possible, with blogs, Twitter, and what not, review after publication is a reality of today (albeit not yet used all that widely).

Furthermore, technology is emerging that can quickly identify whether data and articles are essentially in line with the scientifically accepted knowledge of today and merely confirmatory in nature, which makes the outliers stand out. Those can either be scientific rubbish or potential breakthroughs, and a peer-review process is well spent on them, ante (the technology is a great tool for editors!) or post publication.

Peer-review may or may not survive in the way it is now. But it seems clear to me that openness of published articles as well as raw data is, after initial hesitant steps, bound to show explosive growth.

Jan Velterop

Monday, March 09, 2009

Harold Varmus...

...on the Daily Show with Jon Stewart: video

Sunday, March 08, 2009

Getting the right arguments right

Congressman John Conyers has publicly responded, on the Huffington Post, to the call from Larry Lessig (initiator of Creative Commons) and Mike Eisen (initiator of PLoS) to speak up.

Peter Suber, in turn, responded in detail to John Conyers. Admirable detail, and a discussion with well-articulated arguments like this is the way forward, in my view. In an earlier post, 'Aiming at the right target', Peter says "Let's not make it easy for the bill's supporters to say that the critics simply don't understand". He is right, and in that vein I feel that I should humbly offer some advice.

In one of his arguments he points out the problem with the old NIH policy, which, he says, "...had the effect of steering publicly-funded research into journals accessible only to subscribers, and whose subscription prices have been rising faster than inflation for three decades". It is the second half of this sentence that is misleading. Technically it is true, of course, especially if he refers to average prices. But it is misleading by omission.

There are at least two reasons why the comment about inflation has to be put in context in order to avoid being misleading:
  1. The average journal prices have risen faster than inflation, but so has the average number of articles published in them, reflecting the above-inflation rise in scientific output. The correct measure should not be the average journal price, but the average price per article published. That may still have risen faster than inflation (I haven't done the math), but having those data would either turn the argument of inflated prices into a real one or render it irrelevant. (A small worked example with invented numbers follows this list.)
  2. Secondly, scientific journal publishing is a global pursuit. Inflated prices may just as easily be an effect of a precipitously plunging currency (on the library's side), or a steeply rising one (in the publisher's country), as of publishers' pricing policies. Indeed, if much of the work hadn't been sweat-shop-ized, outsourced to low-wage countries, price rises might have been much bigger. As with so many of the goods we purchase these days.
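
As promised above, here is a small worked example in Python, with deliberately invented figures rather than real subscription or inflation data:

```python
# Invented, illustrative figures only (not real subscription or inflation
# data): a journal's price can outpace inflation while its price per
# published article does not, provided the article count grows as well.
price_then, price_now = 1000.0, 2000.0    # subscription price, a decade apart
articles_then, articles_now = 100, 250    # articles published per year
cumulative_inflation = 0.30               # assumed inflation over that decade

journal_price_rise = price_now / price_then - 1
per_article_rise = (price_now / articles_now) / (price_then / articles_then) - 1

print(f"Journal price rise:     {journal_price_rise:+.0%}")  # +100%, well above inflation
print(f"Price per article rise: {per_article_rise:+.0%}")    # -20%, below inflation
print(f"Cumulative inflation:   {cumulative_inflation:+.0%}")
```
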
I think the arguments for open access are strong enough without the inflation red herring.

Jan Velterop

Tuesday, March 03, 2009

Footing the Bill

Should you need any further evidence that the American democracy is in essence a lobbyocracy, the anti-open-access bill of congressman John Conyers provides it. Of course it isn’t called the ‘Anti Open Access Bill’, but the “Fair Copyright in Research Works Act”. But then, this is the way of the world these days: euphemania.

The Americans are not alone in living in a lobbyocracy, where powerful special interests rule the roost. In other countries the people do as well. Take Australia. But Australians don’t seem to do euphemisms. They call a spade a shovel, and they have a web site to address these matters, unambiguously called lobbyocracy.org, exposing money flows in politics. (By the way, lobbyocracy.info and lobbyocracy.us are still available today, March 3rd 2009, should anyone want to do the same in the US.)

It is amazing how misguided the reasoning behind Conyers' bill is (Peter Suber does a sterling job exposing the fallacies in his Newsletter). "Fair copyright in research works", huh? For a scientist, fair copyright is a notion used to ensure attributed plagiarism, otherwise known as ‘citation’. It is one of the most important things about copyright. No, it is the most important thing about copyright. For a researcher.

For publishers it’s different. For them, copyright, or rather, the transfer of copyright, is a form of payment for the services they render. Though they call themselves publishers, those services can hardly be called publishing any longer (in the sense of ‘making public’). They are procedural services resulting in the labelling of an article as ‘peer-reviewed and accepted by’ a given journal. The act of publishing happens on the web these days, and anyone can do it. This is, of course, precisely the problem. The publishers’ business models are based on the idea that it is they who are publishing. They did, but that’s the past, when print was the only means of dissemination, of making public.

That said, the 'publishers' do fulfill a role that is needed in science. Researchers are required to publish in peer-reviewed journals. Essential for survival in the ego-system. ‘Publish or Perish’, remember? Of course, they also need to read, although the imperative isn’t quite there. No such thing as ‘Read or Rot’, after all. But to publish is the key to any career as a scientist at all. This fact should inform the business models: he who has the most interest pays.

Back to copyright. For publishers who think they publish, the transfer of copyright is just a way in which the author pays for the publishers’ services. If the value of that copyright is eroded – or in the view of some publishers even nullified – by funders’ mandates and embargoes, they have a problem. The most straightforward way out of that is of course substituting a monetary charge for the transfer of copyright. This is what the open access publishers have understood. The so-called ‘gold’ open access model.

So what should traditional publishers do? (I’m assuming that an egregious bill such as Conyers’ will fail.) Should they refuse articles that come with open access mandates attached? After all, they do not come with the required ‘payment’ of full copyright transfer. And embargoes are problematic (although the argument that articles have appreciable economic value after the typical embargo period of 12 months is rather weak, to say the least, seeing that almost all of a publisher's revenues are realised in advance, as the subscription and licensing model demands). Refusing is hardly possible if they want to stay in business at all, since authors are obliged by their funders to withhold anything other than a temporary transfer of copyright (generally for a period of a year) and have no choice. Here, too, open access publishers have the advantage. After all, they simply do refuse articles that come without payment. With some discretionary exceptions, their policy could be expressed with the slogan “Pay, or just go away!”

But there is one thing I rarely see or hear: the notion that mandates with embargoes are a threat to ‘gold’ open access publishers as well. Especially the mandates with short embargoes of, say, six months. What if researchers can wait that long to see most articles? And authors to publish their articles? Neither on the side of the reader nor on that of the writer would there be an incentive to pay for the necessary service that publishers do provide, be it in the form of transfer of copyright or plain money.

Which brings me to my final point. Payment for ‘gold’ open access publishing the way it is done now is also problematic. The reason is that payment for the services of a publisher is loaded entirely onto the published articles (and the same is true for ‘toll-access’ publishing as well, of course). And yet, much of the work relates to articles that do not make it through the peer-review process and are rejected. A truly fair system would charge a submission fee, for which the publisher would organise the peer-review process. Like a driver’s test. You don’t just pay when you’ve passed and get your driver’s licence. You pay every time you take the test. It would probably also mean an alleviation of the peer-review burden, since submissions would be carefully pitched to the journal of the appropriate level for the article, and not be allowed to cascade down the journal hierarchy. (A rough, purely hypothetical illustration of the arithmetic follows below.)
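
Here is the promised rough illustration, in Python, of the driver's-test idea; all figures are invented for the sake of the example:

```python
# Invented numbers, purely to illustrate the submission-fee idea above.
# If the cost of organising peer review is spread over all submissions
# rather than loaded onto accepted articles only, the fee per author drops.
submissions_per_year = 1000
acceptance_rate = 0.25                  # assumed share of submissions accepted
cost_per_reviewed_submission = 300.0    # assumed cost of handling one submission

total_review_cost = submissions_per_year * cost_per_reviewed_submission
accepted_articles = submissions_per_year * acceptance_rate

fee_per_accepted_article = total_review_cost / accepted_articles   # current model
fee_per_submission = total_review_cost / submissions_per_year      # driver's-test model

print(f"Charge per accepted article: {fee_per_accepted_article:.0f}")  # 1200
print(f"Charge per submission:       {fee_per_submission:.0f}")        # 300
```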

Could that be a bill to put before Congress? Requiring that all scientific research is published with open access and that the only charges scientific journals can make are submission charges?

Jan Velterop

Saturday, February 14, 2009

Industry-funded research IFfy?

In his column Bad Science, in The Guardian on Saturday 14 February, Ben Goldacre drew attention to an article in the British Medical Journal by Tom Jefferson et al in which the observation was reported that...
"Publication in prestigious journals is associated with partial or total industry funding, and this association is not explained by study quality or size."
The Impact Factor (IF) of the journals in which research funded by the public sector was published averaged 3.74, and the IF of the journals in which industry-funded research was published averaged 8.78. As Impact Factors go, that is a substantial difference. And, as Jefferson et al indicate, there was no discernible difference in terms of quality, methodological rigour, sample size, et cetera between the articles in question. Goldacre doesn't have an explanation. The suggestion given in his column (he admits it is an "unkind suggestion") is that it may have to do with journals' interest in advertisements and reprint orders – which can indeed be massive – from the very same industry that funds the research these journals publish. He doesn't say it, but this could mean, of course, that the journals accept articles based on research funded by industry, particularly the pharmaceutical industry, more readily than articles based on publicly-funded research.

I don't have an explanation for the phenomenon, either, but I doubt that journals accept industry-funded articles more easily than public sector articles. For a start, most publishers do not have in-house Editors-in-Chief who decide what's published and what not. That doesn't mean the publishers cannot have an influence on those Editors, but often it is already so difficult for them to get Editors to comply with everyday, sensible wishes, that I think this would be rather far-fetched. For publishers that do have in-house Editors-in-Chief, such influence may be more easily exerted.

A hypothesis I can imagine, however, is different and less sinister, although it, too, has to do with the massive numbers of reprints disseminated by the pharmaceutical industry. But this hypothesis would reverse cause and effect. Might it be that, because of the wide dissemination, availability, and visibility of these reprints, the industry-funded articles are cited more often? After all, we know that articles are not only cited because they are the most appropriate ones, but also simply because they are the appropriate ones known to the author. (Sort of like when you ask a 'randomer' – a word I learnt from my 18-year-old daughter and that I guess means random person – for the best restaurant in town: you are likely to get the best restaurant he or she knows, which is not necessarily the best restaurant in town.) If articles based on industry-funded research are cited more often, the journals in which they appear get a higher Impact Factor.

If this hypothesis holds water, it would mean that wide availability is one of the important factors – with dissemination and visibility, and of course relevance – for being cited. In other words, could the results described in the BMJ article constitute evidence that open access could have a similar effect on Impact Factors as that – still hypothetically – caused by the massive numbers of reprints that the pharmaceutical industry purchases and disseminates?

Food for further study, I would think.

Jan Velterop

Tuesday, February 10, 2009

Deploring or exploring?

When Homo sapiens was still in the early stages of his evolutionary development, he hadn't yet figured out many uses for water other than drinking it. And perhaps bathing and swimming in it. This is conjecture, of course, but the earliest evidence of the use of boats, or even just rafts, dates from much later than the emergence of Homo sapiens, so assuming that he was just using water to drink may be an acceptable point of departure for my story.

Water is one of the most abundant resources on earth, but if you're just using it to drink, you don't get much of its potential out of it. When people invented rafts, and developed boats – probably in the form of dug-out logs – a whole new world, literally, opened up to them. All of a sudden they didn’t have to see expanses of water as impediments to getting to the other side, and once navigation was thus discovered, waterways and seas became the most important transportation routes, upon which empires were eventually built. The rest is history, to use a cliché.

There is something similar going on with the way we use information. The image that I have in mind is that there are virtual oceans of information available to humans, but that the only use we make of that information is ‘by the drink’ – by reading articles or bits of articles. That way, the knowledge contained in the ever growing seas of information (just think of the amounts of information coming out of, say, microarray experiments) is unlikely to come out in full. There remains an enormous amount of “unknown knowns” (apologies for using a Rumsfeldism) if we do not find a way to do more with information than read articles and books, or consult databases. We have to develop ways of extracting knowledge out of large amounts of information. Thousands of papers, and thousands of database entries. Or hundreds of thousands. We can’t read those. We have to invent the equivalents of rafts and boats to navigate information. And still read, but manageable amounts (after all, we still drink, too).

In whatever information navigation we already do, we stay very close to the coast, and only to the coasts we know. We search. And we pretend that we are navigating the vast expanse of knowledge that search capabilities on the internet have opened up. But are we? Is searching not a retrograde step in terms of knowledge discovery? Aren’t we inclined to search for knowledge and relations between bits of information we already know to exist? And so foster more homophily in the process than before, when large-scale search wasn’t yet possible? And stay in our knowledge comfort-zone. Look for confirmation rather than for falsification. We should give chance more of a chance. Serendipitous discoveries are, after all, the 'stuff' of which breakthroughs are made.

Some people deplore the fact that more and more information becomes available. They talk of information overload or overabundance. And if the only thing you can imagine doing with it is read (‘drink’), then you may have reason to be negative about it. If you think like this you may seek solutions in selection, in limiting access, in having the choices made for you. But if you can imagine truly navigating the ever growing seas of information, you will not deplore the abundance, but instead, start exploring it.

Jan Velterop