Tuesday, May 31, 2005

PubChem Again

From Peter Murray-Rust

As major users of PubChem, Henry Rzepa and I have
written an Open statement of its value. You are free
to copy, quote from, redistribute or re-use this
document.

AN OPEN STATEMENT IN SUPPORT OF NIH/NCBI's PubChem

We write as scientists committed to the sharing of
chemical information on the public Internet. For 12
years we have developed and promoted the technology
and culture of a global, open approach and have freely
contributed specifications, software and data. We are
frequently invited to present these at international
conferences, including many in the US.
We have done this in the belief that it creates a
better infrastructure for scientific research, of benefit
to all.

We believe that bioscience and healthcare have benefited
greatly from the aggregation of and free access to
research data such as genomes, protein structures and
sequences. Scientists and lay people alike can search
the international databases for disease information,
and drill down to find the most recent information about
the molecular basis.

We wish to emphasize in the strongest terms the current
and future value of the NCBI/NIH's PubChem to the
scientific and medical community. (SPARC's recent statement
[http://www.arl.org/sparc/oa/PubChemlet.html] outlines
the concerns that PubChem may be closed down or severely
restricted.) We have been using the molecules in PubChem
and promoting their value in research.

The substances in PubChem are those critical for research
in bioscience and healthcare. They include:
-naturally occurring substances in living organisms;
-synthetic substances with known biological action,
including drugs and toxic materials;
-reagents for performing standard biological and chemical
assays such as in drug discovery.

Each PubChem entry is curated and includes:

-the precise chemical formula for the substance;
-one or more common names for the compound;
-properties by which the compound can be identified and on
which its physical and biological activities depend;
-the new InChI identifier from the International Union of
Pure and Applied Chemistry.

Through this, PubChem has become
a unique resource for reference and discovery, especially
on the Internet.

Until PubChem, virtually no chemical information was freely
available (i.e. without a library or a subscription to an
information supplier). It is generally not possible to look
up freely the chemical formulae of common drugs, food
additives, or materials in the environment. Yet much of
this information was first published many decades or centuries
ago. PubChem provides a reliable, instant resource for anyone.

As an example of its value, the UK had a recent concern about
a red dye in chili powder. From home we were able to use
PubChem to find out what the chemical formula of this material
is and what its reported toxicity and biological properties are.
We know of no other freely available resource that we could
have used with confidence.

In our laboratories we are using PubChem for systematic research
and are enhancing its value by publishing the results to the
world. We have systematically computed the properties of over
200,000 molecules and published our peer-reviewed results freely.
These properties are typical of those used in computer-aided drug
discovery or the prediction of the safety of compounds. We have
automated the process so that eventually all molecules in PubChem
will have this information. Using InChI we have recently created
a web site so that anyone can use search engines (e.g. Google(TM)
or MSN(TM)) on this database without prior chemical knowledge.
This is typical of the way in which information-driven science
builds on, and enhances, existing knowledge.

Even now it is very difficult for many bioscientists to read
papers which include chemical names. We have therefore recently
urged [1] that scientific publishers should link their electronic
publications to PubChem to help the reader understand the chemistry.
Since PubChem also provides important biological background to many
entries it enhances the scientific process, speeding it up and
reducing the chance of error. We see PubChem as a universal tool
for authors, helping them to reduce the chance of mistakes.

Finally we re-emphasize the global nature of scientific information.
By sharing resources freely we detect and correct errors, and
encourage innovation in the way we access information. Many
developments in bioscience and healthcare come not from the wet
laboratory, but through computational knowledge-driven methods.
PubChem represents the start of such a process in chemical bioscience.
No one site holds the totality of the world's knowledge, and through
the Web we create distributed resources from which all of us benefit.

Peter Murray-Rust, Reader in Molecular Informatics, University of
Cambridge, UK

Henry Rzepa, Professor of Chemistry, Imperial College, UK

This statement may be copied and redistributed under Creative Commons
(Attribution-NonCommercial-NoDerivs 2.0). We urge others to circulate
it freely and use it to promote the continued use of PubChem.

[1] Preprints for BMC Bioinformatics in our Institutional Repository:
http://www.dspace.cam.ac.uk/handle/1810/34580
http://www.dspace.cam.ac.uk/handle/1810/34579

statement ends

Peter Murray-Rust
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069 Fax: +44 1223 763076

Friday, May 20, 2005

Open Access - Open Data

Peter Murray-Rust (Unilever Centre for Molecular Informatics, Cambridge, UK) has posted a few items on open access/open data in the Cambridge Institutional Repository:

A short invited presentation to JISC2005 - the UK's information infrastructure organisation, in which he emphasizes that machines read publications as well as humans.

An invited overview for BioMedCentral Bioinformatics on Open chemical data in biosciences and how it can be accelerated by forward-looking publishers and funders.

An accompanying technical article for BioMedCentral Bioinformatics on the extraction of such data in XML.

Thursday, May 19, 2005

PubChem and Chemical Abstracts

Quoting Eugene Garfield:
"It is remarkable that the same society that accepted millions of dollars in grants from the NSF for establishing the chemical registry system, now objects to the government's use of the data."

He puts his finger right on the sore spot.

Tuesday, May 17, 2005

A new forum

The Parachute aims to provide a forum for those who have thoughts and ideas about how and why openness is good for science and scholarship. The Parachute is not limited to open access to scholarly literature, but explores all areas where openness contributes to better research with more results that benefit society.

Contributions are welcome at theparachute@btinternet.com. All contributions will be moderated. Short contributions will be posted in the blog itself. For longer ones, an abstract or excerpt should be provided with a link to a site where the full contribution resides. Comments to any contributions are encouraged.