Scientists can rapidly develop chemical modelling on the Internet. Henry Rzepa and Benjamin Whitaker explain.
Scientists and mathematicians have extended the written language by developing special symbols to describe more complex notions.
But for many people, difficulties still arise in grasping scientific ideas from a printed page limited to words, the occasional diagram, and a collection of scientific hieroglyphs. It is hard to describe complex spatial relationships in words, particularly when they are also changing in time. Two-dimensional pictures are often inadequate if they present only one perspective.
To illustrate how important this can be, consider the drug thalidomide. It has two forms, known as R and S: the first is thought to be safe, while the second can cause foetal abnormalities when taken by pregnant women.
The difference between these two forms can only be understood if the chemical structure of the molecule is considered in three dimensions, and it takes a highly experienced and confident chemist to translate a two-dimensional diagram, together with the R/S symbolic notation, into the laboratory synthesis of a safe pharmaceutical product. The tragic results of the failure to understand the importance of this molecular handedness are only too well known.
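The dependence of the R/S label on three-dimensional geometry can be made concrete with a small sketch. It assumes the four groups attached to a chiral centre have already been ranked by the usual priority rules (1 highest, 4 lowest) and that their coordinates are known. The function name `handedness` and the coordinates are illustrative inventions, not data for thalidomide. The sign of a triple product tells us whether groups 1, 2 and 3 run clockwise (R) or anticlockwise (S) when group 4 points away from the viewer.

```python
def handedness(p1, p2, p3, p4):
    """Return 'R' or 'S' for four substituent positions given in priority order.

    p1..p3 are the three highest-priority groups, p4 the lowest.
    All points are (x, y, z) tuples in a right-handed coordinate system.
    """
    # Vectors from the lowest-priority group to the other three.
    a = tuple(u - v for u, v in zip(p1, p4))
    b = tuple(u - v for u, v in zip(p2, p4))
    c = tuple(u - v for u, v in zip(p3, p4))
    # Cross product b x c.
    cross = (
        b[1] * c[2] - b[2] * c[1],
        b[2] * c[0] - b[0] * c[2],
        b[0] * c[1] - b[1] * c[0],
    )
    # Signed volume a . (b x c): negative means 1 -> 2 -> 3 is clockwise
    # when viewed with group 4 pointing away, which is the R arrangement.
    volume = sum(x * y for x, y in zip(a, cross))
    if volume == 0:
        raise ValueError("the four points are coplanar, so no handedness is defined")
    return "R" if volume < 0 else "S"


# Illustrative (made-up) positions: group 4 points down, groups 1-3 sit above it.
P1 = (0.0, 1.0, 0.33)
P2 = (0.866, -0.5, 0.33)
P3 = (-0.866, -0.5, 0.33)
P4 = (0.0, 0.0, -1.0)


def mirror(point):
    """Reflect a point in the yz-plane by negating its x coordinate."""
    return (-point[0], point[1], point[2])


original = handedness(P1, P2, P3, P4)
reflected = handedness(mirror(P1), mirror(P2), mirror(P3), mirror(P4))
```

Reflecting every atom in a mirror plane flips the sign of the volume, so `original` and `reflected` carry opposite labels even though a flat drawing of the two arrangements would show the same bonds.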
Computer networking allows many of the restrictions of the printed page to be overcome. The new medium is the Internet, and it is our belief that this genuinely represents a major sea-change in scientific communication. The liberating aspect of the new technology lies in the fact that the scientist, teacher or student can now easily acquire information and test its value, integrity and semantic content within seconds, without fear of introducing errors or misinterpreting symbolic notation.
At this point, it may seem ironic that we are attempting to communicate this concept using the "old technology". So for those readers who wish to explore for themselves and understand what all the fuss is about, we have made a version of this article available, along with some pointers to other sites of special interest, in the form of a Uniform Resource Locator (URL). A URL is the notation for a collection of data and ideas available via the Internet using a program known as a World Wide Web browser. NCSA Mosaic and Mosaic Netscape are good examples, available free of charge by anonymous file transfer from archive sites such as src.doc.ic.ac.uk (in the sub-directory Web). Enter the URL location and you can try out these ideas for yourself. At home you will need a PC or Macintosh computer, a modern modem and an Internet dial-up account.
As an illustration of the future, imagine the following scenario. As a scientist or teacher, you might wish to communicate the structure of a molecule such as thalidomide (or indeed something which might be 100 times larger and more complex), and be absolutely sure that your audience does not go away and synthesise the S instead of the R form in the laboratory. Along with the symbolic structure, you might also wish to pass on detailed toxicology data, some spectral information, accurate three-dimensional coordinates for the molecule determined from X-ray crystallography, the structures of a few dozen analogues, together with details of the enzymes involved in metabolic pathways, and so forth.
The first step is to store all this information as computer files in an internationally agreed format. In Internet terms, these formats are encapsulated in a set of standards known as MIME types. We ourselves are in the process of starting the ratification of a collection of chemical MIME types to serve the interests of molecular scientists.
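As a rough sketch of what a MIME type does in practice, the snippet below uses Python's standard `mimetypes` module to associate file extensions with chemical types. The type names `chemical/x-pdb` and `chemical/x-xyz`, and the example URL, are assumptions chosen for illustration; the article itself does not name the proposed types.

```python
import mimetypes

# A private table, so the example does not alter the process-wide defaults.
chem_types = mimetypes.MimeTypes()

# Assumed, illustrative registrations: extension -> chemical MIME type.
chem_types.add_type("chemical/x-pdb", ".pdb")
chem_types.add_type("chemical/x-xyz", ".xyz")

# Hypothetical URL; only the file extension matters for the lookup.
url = "http://example.org/molecules/thalidomide.pdb"
mime_type, encoding = chem_types.guess_type(url)
```

Once a type such as `chemical/x-pdb` is agreed, any server can label a coordinate file with it and any browser can recognise what kind of data has arrived.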
The next step is to allocate a URL to each of these data files, and to write a description of the project in which each set of data is described in context and then linked together. This kind of structured document is known as hypertext, and the things that tie it together are, not surprisingly, known as hyperlinks. Hyperlinks are simply a mechanism implemented in the World Wide Web which allows the user to retrieve the data by clicking with a mouse pointer at an obviously marked spot in the document. The underlying Internet protocols then ensure that an exact digital copy of this information is transmitted to the user's own computer.
The MIME mechanism enables the user to specify precisely what will happen to the data once it arrives. For example, three-dimensional coordinates for thalidomide can be transparently passed to a special computer program which transforms these into a 3D image of the molecule. Readers who have molecular modelling software can rotate the molecule in three dimensions and begin to appreciate fully the distinction between the R and S forms. Most importantly, much of the jargon has been replaced by more tangible concepts such as molecular models, and the idea that science after all may really be fun.
We have coined the term "hyperactive molecules" for this overall concept. Hyperlinks can also be invoked to "mark up" the molecule, perhaps by highlighting the chiral centre specified by the R/S notation in the same sense that one might italicise a word on a printed page. In the same vein, other forms of information associated with the molecule can be passed to other programs to further enhance the value of the communication process.
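The hand-off from browser to helper program described above can be sketched as a lookup table keyed by MIME type. This is a minimal illustration, not how any particular browser is implemented: the table `HELPERS`, the functions `parse_xyz` and `dispatch`, the type name `chemical/x-xyz`, and the water coordinates are all assumptions made for the example.

```python
from email.message import Message


def parse_xyz(body):
    """Parse simple XYZ-style text: atom count, comment line, then 'El x y z' rows."""
    lines = body.strip().splitlines()
    count = int(lines[0])
    atoms = []
    for line in lines[2:2 + count]:
        symbol, x, y, z = line.split()
        atoms.append((symbol, float(x), float(y), float(z)))
    return atoms


# Assumed helper table: MIME type -> function that handles data of that type.
HELPERS = {
    "chemical/x-xyz": parse_xyz,      # would feed a 3D viewer in a real setup
    "text/plain": lambda body: body,  # plain text is passed through unchanged
}


def dispatch(content_type_header, body):
    """Pick a helper from the Content-Type header and hand it the data."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    mime_type = msg.get_content_type()  # drops parameters such as charset
    helper = HELPERS.get(mime_type)
    if helper is None:
        raise ValueError("no helper configured for " + mime_type)
    return helper(body)


# Illustrative coordinates for water, not measured data.
SAMPLE = """3
water, illustrative coordinates
O 0.000 0.000 0.117
H 0.000 0.757 -0.469
H 0.000 -0.757 -0.469
"""

atoms = dispatch("chemical/x-xyz; charset=us-ascii", SAMPLE)
```

The reader never retypes a coordinate: the same bytes the author published arrive in `atoms`, ready to be drawn, rotated or marked up by whichever program the user has chosen for that type.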
The result is that a very rich dataset can be attached to the primary document, and readers are able to extract the exact data they need without fear of transcription errors. Readers can explore the implications of what they are reading, for example, by checking some aspect of the data against their own knowledge. We believe the key point is that the reader is no longer passive. We have here a medium which transcends the two-dimensionality of the printed page. Readers can take control of the perspective of the object they are looking at, and manipulate it to suit their conceptual framework.
This will have major implications for the teaching of chemistry at all levels, and it is also not difficult to imagine how other scientific subjects might benefit from these concepts. For example, complex and semantically correct mathematical notation can be transferred, error-free, via a hyperlink and passed to programs such as Mathematica or Maple for visualisation or further manipulation.
Sound, music and video are other forms of information easily handled by the medium. Yet few scientists have begun to apply these mechanisms to their own disciplines. We are currently wielding the equivalent of stone implements to communicate ideas. It is a breathtaking feeling to think of what might be accomplished within a very few years.
Henry Rzepa is reader in organic chemistry, Imperial College, London (rzepa@ic.ac.uk and WWW). Benjamin Whitaker is a lecturer in organic chemistry (benn@chem.leeds.ac.uk) at the University of Leeds.