Tuesday, May 31, 2005

PubChem Again

From Peter Murray-Rust

As major users of PubChem, Henry Rzepa and I have
written an Open statement of its value. You are free
to copy, quote from, redistribute or re-use this


We write as scientists committed to the sharing of
chemical information on the public Internet. For 12
years we have developed and promoted the technology
and culture of a global, open approach and have freely
contributed specifications, software and data. We are
frequently invited to present these at international
conferences, including many in the US.
We have done this in the belief that it creates a
better infrastructure for scientific research of benefit

We believe that bioscience and healthcare have benefited
greatly from the aggregation of and free access to
research data such as genomes, protein structures and
sequences. Scientists and lay people alike can search
the international databases for disease information,
and drill down to find the most recent information about
the molecular basis.

We wish to emphasize in the strongest terms the current
and future value of the NCBI/NIH's PubChem to the
scientific and medical community.
(SPARC's recent statement
[http://www.arl.org/sparc/oa/PubChemlet.html] outlines
the concerns that PubChem may be closed down or severely
restricted.) We have been using the molecules in PubChem
and promoting their value in research.

The substances in PubChem are those critical for research
in bioscience and healthcare. They include:
-naturally occurring substances in living organisms;
-synthetic substances with known biological action,
including drugs and toxic materials;
-reagents for performing standard biological and chemical
assays, such as in drug discovery.

Each PubChem entry is curated and includes:
-the precise chemical formula for the substance;
-one or more common names for the compound;
-properties by which the compound can be identified and on
which its physical and biological activities depend;
-the new InChI identifier from the International Union of
Pure and Applied Chemistry.

Through this PubChem has become
a unique resource for reference and discovery, especially
on the Internet.

Until PubChem, virtually no chemical information was freely
available (i.e. without a library or a subscription to an
information supplier). It is generally not possible to look
up freely the chemical formulae of common drugs, food
additives, or materials in the environment. Yet much of
this information was first published many decades or centuries
ago. PubChem provides a reliable, instant resource for anyone.

As an example of its value, the UK had a recent concern about
a red dye in chili powder. From home we were able to use
PubChem to find out what the chemical formula of this material
is and what its reported toxicity and biological properties are.
We know of no other freely available resource that we could
have used with confidence.

In our laboratories we are using PubChem for systematic research
and are enhancing its value by publishing the results to the
world. We have systematically computed the properties of over
200,000 molecules and published our peer-reviewed results freely.
These properties are typical of those used in computer-aided drug
discovery or the prediction of the safety of compounds. We have
automated the process so that eventually all molecules in PubChem
will have this information. Using InChI we have recently created
a web site so that anyone can use search engines (e.g. Google(TM)
or MSN(TM)) on this database without prior chemical knowledge.
This is typical of the way in which information-driven science
builds on, and enhances, existing knowledge.

Even now it is very difficult for many bioscientists to read
papers which include chemical names. We have therefore recently
urged [1] that scientific publishers should link their electronic
publications to PubChem to help the reader understand the chemistry.
Since PubChem also provides important biological background to many
entries it enhances the scientific process, speeding it up and
reducing the chance of error. We see PubChem as a universal tool
for authors, helping them to reduce the chance of mistakes.

Finally, we re-emphasize the global nature of scientific information.
By sharing resources freely we detect and correct errors, and
encourage innovation in the way we access information. Many
developments in bioscience and healthcare come not from the wet
laboratory, but through computational knowledge-driven methods.
PubChem represents the start of such a process in chemical bioscience.
No one site holds the totality of the world's knowledge, and through
the Web we create distributed resources from which all of us benefit.

Peter Murray-Rust, Reader in Molecular Informatics, University of
Cambridge, UK

Henry Rzepa, Professor of Chemistry, Imperial College, UK

This statement may be copied and redistributed under Creative Commons
(Attribution-NonCommercial-NoDerivs 2.0). We urge others to circulate
it freely and use it to promote the continued use of PubChem.

[1] Preprints for BMC Bioinformatics in our Institutional Repository:

statement ends

Peter Murray-Rust
Chemistry Department, Cambridge University
Lensfield Road, CAMBRIDGE, CB2 1EW, UK
Tel: +44-1223-763069 Fax: +44 1223 763076

