Gold OA almost always includes an HTML version as well as a PDF, and if anything is missing, it is the PDF. Green OA, on the other hand, more often than not offers just the PDF, and not a machine-readable HTML or XML version. Both are, of course, fine for the traditional form of knowledge intake, via the eye, by reading the articles. But they are not both suitable for computer-assisted intake, via machine-reading and text-mining. In practice, that is not easily possible with PDFs, and not possible at all with bitmap PDFs (at least not without cumbersome procedures involving printouts, scanning and optical character recognition, or OCR).
Not having machine-readable access may not be a problem for everyone, but in disciplines where there is a growing over-abundance of new papers, traditional human reading is not an option if one wants to stay truly up-to-date. In areas such as the ‘-omics’ (genomics, proteomics, metabolomics), but not only in those, the ability to perform text-mining is of crucial importance.
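To make the point about machine-readability concrete, here is a minimal sketch of how little is needed to get at the text of an HTML article and count terms in it, using only the Python standard library. The sample markup, the `TextExtractor` class and the `term_counts` function are hypothetical illustrations, not part of any repository's tooling, and real text-mining pipelines do far more than count words.

```python
from collections import Counter
from html.parser import HTMLParser
import re


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document, skipping script and style."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def term_counts(html, terms):
    """Return how often each term occurs as a whole word in the HTML text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    words = Counter(re.findall(r"[a-z0-9]+", text))
    return {term: words[term.lower()] for term in terms}


# Hypothetical article fragment, for illustration only.
sample = """
<html><head><title>Example</title><style>p {color: red}</style></head>
<body><h1>Proteomics of yeast</h1>
<p>Genomics and proteomics data were combined.</p></body></html>
"""

counts = term_counts(sample, ["genomics", "proteomics", "metabolomics"])
print(counts)
```

Nothing comparable works on a bitmap PDF: there is no text to walk through until an OCR step has tried to reconstruct it.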
There is no reason in principle why a machine-readable version of one’s paper could not be deposited in one’s repository, and advocates of ‘only green OA’, ‘primarily green OA’, or ‘green OA first’ ought to encourage HTML deposits. These are readable by machine and human eye alike, and therefore vastly superior for the purpose of knowledge sharing.
OA to PDFs may be better than no OA at all, and that of course remains the case. But relying on OA PDFs for knowledge sharing and dissemination is not dissimilar to ‘preparing for the previous war’.
Jan Velterop