Friday, May 30, 2008

The meanings of 'free'

I've received questions about Knewco's WikiProfessional: how free it is, and whether it is free as in 'free beer' or free as in 'free speech'.

Life's never simple: it's a combination of both.

WikiProfessional's million minds approach does rely on user input. That's nothing new in science – in fact, the whole scientific knowledge edifice relies on user input. The user-generated content in WikiProfessional is indeed free as in 'free speech'. The relationship-concept matrix (the knowlet database: dynamic, relational, and constantly recalculated, reacting to any infusion of new knowledge) is also free to users, but free as in 'free beer'. It took considerable effort to develop and build it – and to maintain it – so it is (or will be) paid for, by advertising and sponsorships, we hope. The users 'pay' as in 'paying' a visit and 'paying' attention, which we can then use to attract appropriate advertisers. (For some reason we haven't quite figured out yet how to survive on plain air, and we need to generate income to sustain our activities.)

It is important to distinguish the knowlet part and the wiki part in the WikiProfessional database. Knewco (the Knowledge Navigation and Expert Wiki Company) owns the first one and the knowlet is patented. In due time, there will be feeds available from the knowlet database to whoever wants them (or pays for them; this might typically be a premium service).

The wiki part of the database on the other hand contains publicly as well as privately available authority and community contributions. We don't 'have' those; we just use those, as anyone else can do, at least with regard to the public ones (one has to approach the 'owners', authorities – NLM, Swissprot/Uniprot, etc. – for these authoritative databases). With respect to the community annotations and contributions, those are freely available under a CC-BY licence (Creative Commons Attribution Licence), and eventually we may have this available in a suitable form for downloading. There may be a potentially fruitful collaboration with Open Progress with regard to standardizing the download/exchange format.

Meanwhile, go to WikiProfessional, use the system, give us feedback, register and contribute, and work with us on spreading scientific knowledge via collaborative intelligence.

Jan Velterop

Wednesday, May 28, 2008

A rose by any other name

"Doctors often exude an air of omniscience, but in truth they are surprisingly ignorant."
Thus began an article in this week’s Economist. Harsh language, but many a doctor, or other professional, including scientists, will recognize himself or herself in these words. The article in The Economist isn’t specifically about that, but the sense of information overload is surely a major contributory factor to this 'surprising ignorance'. After all, a lot of the information one gets to digest is ambiguous, redundant, fragmented or inconsistent, to name a few problems. As Herbert Simon, an American political scientist, once observed: “What information consumes is rather obvious: it consumes attention. Hence a wealth of information creates a poverty of attention.” The problem of the information glut in a nutshell.

Today saw the launch of an attempt to combat this abundance, redundancy, fragmentation and inconsistency: WikiProfessional.

The idea is that the combined efforts of a ‘million minds’ would be able, in a collaborative intelligence exercise, to refine a system that 'distills' the essence of established knowledge and also points to new knowledge that has a high likelihood of being established soon. What it all entails is explained in an open access article in Genome Biology.

The concept (so to speak) is so far optimized for the life sciences and medicine, but there is no reason why it shouldn’t work in other areas as well. And in languages other than English. It is based on concepts, and those are of course valid in any language. It’s just that the words or descriptions used for them are different. As Shakespeare already noted in Romeo and Juliet: "What's in a name? That which we call a rose by any other name would smell as sweet."

Just imagine what that means. One of the beauties of the concept approach (as opposed to the keyword approach) is that search terms in one language could, for instance, yield search results in another. Think of Chinese researchers searching with Chinese terms for English literature (they can read English, but may find it more difficult to come up with search terms in English, in the same way that I find it sometimes easier to search with Dutch terms), yet getting served up with English search results. Things like that. Wonderful.
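To make the difference with keyword search concrete, here is a minimal sketch in Python. It is purely illustrative: the term table, the concept identifiers (`C001`, `C002`) and the three documents are all invented for this example, and it is not a description of how WikiProfessional is actually implemented. The point is only that terms in any language resolve to the same concept, and documents are then retrieved by concept rather than by the words typed.

```python
# Hypothetical illustration only: every term, concept ID and document
# below is invented for this example.

# Terms in several languages all point to one language-neutral concept.
TERM_TO_CONCEPT = {
    "heart attack": "C001",
    "myocardial infarction": "C001",
    "hartaanval": "C001",   # Dutch
    "心肌梗死": "C001",       # Chinese
    "rose": "C002",
    "roos": "C002",         # Dutch
}

# Documents are indexed by the concepts they mention, not by their words.
DOCUMENTS = {
    "doc1": {"language": "en", "concepts": {"C001"}},
    "doc2": {"language": "en", "concepts": {"C002"}},
    "doc3": {"language": "en", "concepts": {"C001", "C002"}},
}


def search(term, language="en"):
    """Return IDs of documents in `language` that mention the concept behind `term`."""
    concept = TERM_TO_CONCEPT.get(term.strip().lower())
    if concept is None:
        return []  # unknown term: no concept, so no results
    return sorted(
        doc_id
        for doc_id, doc in DOCUMENTS.items()
        if doc["language"] == language and concept in doc["concepts"]
    )


# A Dutch or Chinese search term yields the same English-language results.
print(search("hartaanval"))
print(search("心肌梗死"))
```

A keyword index would have found nothing for the Dutch or Chinese term, because neither string occurs in the English documents. The concept layer in between is what makes the match possible.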

(I have to declare an interest: I’m running Knewco, the company behind WikiProfessional.)

Jan Velterop

Sunday, May 25, 2008

Wiki temperatures

In the Chronicle of Higher Education Jeffrey Young reports on a 'frozen' Wikipedia being more academically useful for students than the current version, which can be – and is – edited all the time, sometimes resulting in a lot of heat. There is something tremendously attractive in having unfettered editing possibilities, but also in having stable, authoritative articles in such an extremely useful web resource as Wikipedia. In an academic environment, one would ideally have both. WikiProfessional, which is specifically conceived for the academic and professional environment, actually gives both. On the one hand it presents stable, vetted and authoritative knowledge, yet on the other hand it gives the utterly useful and necessary option for knowledge to be supplemented and annotated in real time by anyone wishing to do so. Both the authoritative version and the community annotations and additions are presented side by side. Only when annotations and additions are deemed acceptable by the professional or academic community in question – peer-reviewed in one way or another – are they elevated to the level of 'received knowledge'.

For open access, WikiProfessional presents a nice additional opportunity: 'annotations' can be links to particularly appropriate and relevant articles. And if such links were made to freely available versions of the articles in question, this would give WikiProfessional some of the functionality of a federated repository, not just enhancing an article's exposure and findability, but at the same time putting it in the right context in the Concept Web. This, in turn, may well further increase the chances of such an article being cited.

Jan Velterop

Thursday, May 15, 2008

Dealing with abundance – getting more out of the science literature than you thought possible

Open access is adding to the abundance of scientific information available to us. It is to be expected that this abundance will be growing fast, with the growth of open access. This is good, because only comprehensive and unfettered access to the science literature will make it possible for us to be truly abreast of the scientific progress that's being made.

On the other hand, however, it will present us with even more challenges than we already face in terms of being able to deal with all that information. In certain disciplines, reading all the papers relevant to our research topic means digesting thousands of papers per year – enough to fill our entire working time. Without assistance from the processing capabilities and speed of computers, we cannot hope to keep up with emerging trends in our chosen fields.

Few scientists can properly cope with mushrooming information, and were they to read all the articles relevant to them, they would find that these almost always contain a very large amount of information already known to them. That redundant information is usually provided for the sole purpose of context and readability. The amount of actual new information is often surprisingly small and could have been conveyed in one or two sentences if the context were clear. Yet the essence of the scientific discourse is captured in those few sentences. The surrounding text of articles is, if you wish, the packaging in which the essence is transported, analogous to the mass of fluffy stuff that surrounds a breakable item that's being shipped: emballage.

At Knewco, the company that I now work for, we aim to provide an environment for concentrating this scientific discourse – 'distilling' it from the abundance of sources, if you wish – and make it more productive by making it computer-processable. Very few scientists can read and digest all the articles and database entries that they would need to read and digest in order to synthesize the essence of the knowledge they need. So what we do is to enable and foster collaborative intelligence between machine processing power and human brainpower. Knewco 'distills' information to the essence of knowledge content from millions of documents, enriching it in the process with linked concepts and context.

This is not the same as making it possible to locate the one right document out of the abundance available. It is identifying 'atoms' of knowledge about a given concept from the literature and combining these atoms into 'molecules' of knowledge (we call those "knowlets" – a knowlet connects facts). Just as a graph can give you in one glance the essence of an enormous array of numbers, the knowlet gives you the essence of an enormous amount of scientific literature. It's like reading out of a picture instead of text. And as "a picture is worth more than a thousand words", a knowlet could be said to be worth more than the text of a thousand articles. Knowledge redesigned, as it were.
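The atoms-and-molecules picture can be illustrated with a toy sketch in Python. This is emphatically not Knewco's patented method, which is not described here. It is a simple stand-in that assumes an 'atom' is a (subject, relation, object) statement found in one source, and that a 'knowlet' for a concept is the tally of all such statements about it. The `ATOMS` list, the paper labels and the `build_knowlets` helper are hypothetical names made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical 'atoms': single statements, each attributed to one (invented) source.
ATOMS = [
    ("aspirin", "inhibits", "COX-1", "paper-A"),
    ("aspirin", "inhibits", "COX-1", "paper-B"),
    ("aspirin", "treats", "headache", "paper-C"),
    ("ibuprofen", "treats", "headache", "paper-D"),
]


def build_knowlets(atoms):
    """Group atoms by subject concept and count how often each statement recurs.

    Redundant restatements across papers collapse into one entry with a
    higher count, so each distinct fact appears once per concept.
    """
    knowlets = defaultdict(Counter)
    for subject, relation, obj, _source in atoms:
        knowlets[subject][(relation, obj)] += 1
    return knowlets


knowlets = build_knowlets(ATOMS)

# One compact view per concept, instead of four separate papers to read.
for concept, facts in knowlets.items():
    print(concept, dict(facts))
```

Even in this crude form, the redundancy across papers disappears: two papers saying the same thing become one fact with a weight of two, which is the sense in which the essence is 'distilled' from its packaging.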

Perhaps more importantly, since a knowlet is a computer artifact, it can be used to identify related information, predict trends and intersections in data (see it as a kind of topology of knowledge), be used in combination with other knowlets of more complex concepts, and be updated in real time to keep information current up to the minute.

For technology of this kind to be optimally effective for scientific knowledge discovery, access to the literature is not sufficient by itself. It goes without saying that the source documents must be computer-readable to be optimally usable. Publishers as well as repositories may wish to take this to heart if they are serious about helping to speed up the pace of scientific progress.

Jan Velterop