This excellent introduction to a growing area of computing in chemistry will interest students, scientists and academics.
Walter Thiel studied chemistry at the University of Marburg (West Germany) from 1966 to 1971, where he subsequently obtained his doctorate with A. Schweig in 1973. After a post-doctoral stint at the University of Texas at Austin with M. J. S. Dewar (1973–1975), he obtained his habilitation from the University of Marburg in 1981. He was appointed Professor of Theoretical Chemistry at the University of Wuppertal (West Germany) in 1983 and Professor of Chemistry at the University of Zurich (Switzerland) in 1992. In 1987 he was a visiting professor at the University of California at Berkeley. Since 1999 he has been a director at the Max Planck Institute for Coal Research in Mülheim an der Ruhr (Germany), and since 2001 an honorary professor at the neighbouring University of Düsseldorf (Germany).
This is currently the only book available on the development of knowledge-based, and related, expert systems in chemistry and toxicology. Written by a pioneer in the field, it shows how computers can work with qualitative information where precise numerical methods are not satisfactory. An underlying theme is the current concern in society about the conflicts between basing decisions on reasoned judgements and wanting precise decisions and measurable effectiveness. As well as explaining how the computer programs work, the book provides insights into how personal and political factors influence scientific progress. The introduction of regulations such as REACH in Europe and modifications to UN and OECD Guidelines on assessment of chemical hazard mean that the use of toxicity prediction is at a turning point. They put a heavy burden on the chemical industry but, for the first time, allow for the use of computer prediction to support or replace in vivo and in vitro experiments. There is increasing recognition among scientists and regulators that qualitative computer methods have much to offer and that in some circumstances they may be more reliable and informative than quantitative methods. This excellent introduction to a field where employment opportunities are growing is aimed at students, scientists and academics with a knowledge of chemistry.
Chapter 1 Artificial Intelligence – Making Use of Reasoning
Chapter 2 Synthesis Planning by Computer
Chapter 3 Other Programs to Support Chemical Synthesis Planning
Chapter 4 International Repercussions of the Harvard LHASA Project
Chapter 5 Structure Representation
Chapter 6 Structure, Sub-Structure and Super-Structure Searching
Chapter 7 Protons that Come and Go
Chapter 8 Aromaticity and Stereochemistry
Chapter 9 Derek – Predicting Toxicity
Chapter 10 Other Alert-Based Toxicity Prediction Systems
Chapter 11 Rule Discovery
Chapter 12 The 2D–3D Debate
Chapter 13 Making Use of Reasoning: Derek for Windows
Chapter 14 Predicting Metabolism
Chapter 15 Relative Reasoning
Chapter 16 Predicting Biodegradation
Chapter 17 Other Applications and Potential Applications of Knowledge-Based Prediction in Chemistry
Chapter 18 Evaluation and Validation of Knowledge-Based Systems
Chapter 19 Combining Predictions
Chapter 20 A Subjective View of the Future
Subject Index
CHAPTER 1 Artificial Intelligence – Making Use of Reasoning
Launched by half a dozen young men at a run, a three-metre long paper dart can fly successfully, dare we even say "gracefully", the length of a research station canteen before making an unfortunate landing in the director of research's Christmas lunch. It is just a question of getting the aerodynamics right. My school mathematics teacher reminded us on most days (several times on some) that all science is mathematics. But was it only the power of numbers he had in mind? Does science come down to a sweatshop full of equations mindlessly crunching numbers, real and imaginary?
Contrary to the perceptions of many people outside science, as well as too many inside it, science is not about proving facts: it is about testing hypotheses and theories; ultimately, it is about people and their opinions. Simple, rigid application of rules of aerodynamics may get you a paper dart that flies but in many fields human decision making is best supported by reasoned argument or the use of analogy and not much helped by numerical answers. The minimum braking distance for a car travelling at forty miles per hour is twenty-four metres, according to the Driving Manual from the Driving Standards Agency. Assuming you can countenance the required mixing of miles and metres, does this information help you to drive more safely? Have you any more idea than I have how far ahead an imaginary twenty-four metre boundary-marker precedes you along the road?
And there is a further problem. "Numbers out" implies "numbers in", so what do you do if you have no numbers to put in? A regrettably popular solution is to invent them – or at least to come up with dubious estimates to feed into a model that demands them, which is close to invention. It is the only option if you want to apply numerical methods and to give numbers to the people asking for solutions. That numbers make people feel comfortable is a bigger problem than it may at first appear to be, too. Uncritical recipients of numerical answers tend to believe them, and to act on them, without probing very deeply. More sceptical recipients want to judge for themselves how meaningful the answers are but often find that the kind of supporting evidence associated with a numerical method is not much help. Many are the controversies over whether this or that numerical method is more precise but they are missing the point if the data are far less precise than the method. Perhaps numbers are unnecessary – even unsuitable – for expressing some kinds of scientific knowledge.
There are circumstances in which numerical methods are highly reliable. Aeroplanes stay up in the sky and make it safely to earth where they are supposed to. Chemical plants run twenty-four hours a day, year in year out. Numerical methods work routinely in physical chemistry laboratories, and toxicology and pharmacology departments. But it is unlikely that the designers of the three-metre paper dart that took flight at the start of this chapter did any calculations at all. My guess is that they just went with a gut feeling based on years of experience making little ones.
This book is about uses of artificial intelligence (AI) and databases in computational chemistry and related science where qualitative output may be of more practical use than quantitative output. It touches on quantitative structure–activity relationships (QSAR) and how they can inform qualitative predictions, but it is not about QSAR. Neither is it a book about molecular modelling. Both subjects are well-covered in too many books to list comprehensively. A few examples are given in the references at the end of this chapter. This book focuses on less widely described and yet, probably, more widely-used applications of AI in chemistry.
The term "artificial intelligence" carries with it notions of thinking computers but, as a radio personality in former times would have had it, it all depends on what you mean by intelligence. If you type "Liebig Consender" into the Google™ search box, Google™ responds with "Did you mean Liebig Condenser" and provides a list of corresponding links without waiting for an answer. That is worryingly like intelligent behaviour whether it is intelligent behaviour or not. Arguments continue about whether tests for artificial intelligence such as the Turing test are valid and whether a categorical test or set of tests can be devised. Perhaps it is sufficient to require that to be intelligent a system must be able to learn, be able to reason, be creative, and be able to explain itself persuasively. Currently, no artificial intelligence system can claim to have all of these characteristics. Individual systems typically have two or three.
To count as intelligent, solving problems needs to involve a degree of novel thinking, i.e. creativity. Restating the known, specific answer to a question requires only memory. Compare the following questions and answers. The first answer merely reproduces a single fact. Generating the second answer, simple though it is, requires reasoning and a degree of creativity.
"Where's the sugar?"
"In the sugar bowl".
"Where will the sugar be in this supermarket?"
"A lot of supermarkets put it near the tea and coffee, so it could be along the aisle labelled 'tea and coffee'. Alternatively, it might be in the aisle labelled 'baking'. Let's try 'baking' first – it is nearer".
One of the first computer systems to behave like an expert using a logical sequence of questions and answers to solve a problem was MYCIN, a system to support medical diagnosis.
"Doctor, I keep getting these terrible headaches".
"Sorry to hear that. Is there any pattern to when the headaches occur?"
"Now you ask, they do seem to come mostly on Sunday mornings".
"And what do you do on Saturday evenings?"
The doctor's questions are not arbitrary. You can see how they are directed by the patient's responses. You can probably see where they are leading, too, but the doctor would still want to ask further questions to rule out all the possibilities before jumping to the obvious conclusion about the patient's Saturday nights out on the town. The aim of the MYCIN experiment was to design a computer system capable of choosing appropriate sequences of questions similarly, in order to reach a diagnosis efficiently.
This kind of reasoning is common throughout science although it often does not involve a dialogue; the questions may be implicit in a process of thought rather than consciously asked. Suppose you know that:
many α,β-unsaturated aldehydes cause skin sensitisation;
for activity to be expressed a compound must penetrate the skin;
compounds with low fat/water partition coefficients do not penetrate the skin easily;
many imines can be hydrolysed easily in living systems to generate aldehydes.
Actually, the story for skin sensitisers is better understood and can be more fully and more usefully described than this, but what we have will do for the purposes of illustration. Suppose you are shown the structure of a novel α,β-unsaturated imine and asked for an assessment of its potential to cause skin sensitisation. You will be aware that the imine might be converted into a potentially skin-sensitising aldehyde. If you have access to suitable methods you will get an estimate of the fat/water partition coefficient for the imine in order to make a judgement about whether it will penetrate the skin (most likely you will use a calculated logP value as a measure of fat/water partition coefficient, but there is more about that later in this book). You will presumably have the gumption to consider the partition coefficient for the aldehyde as well, in case the imine is unstable enough to hydrolyse on the surface of the skin.
Depending on the information, you will come up with conclusions and explanations such as:
"the query substance is likely to be a skin sensitiser because it has the right partition coefficient to penetrate the skin and the potential to be converted into an α,β-unsaturated aldehyde – a class of compounds including many skin sensitisers";
"the query is not likely to be a skin sensitiser because although it is an imine which could be converted into an α,β-unsaturated aldehyde – a class of compounds including many skin sensitisers – both compounds have such low fat/water partition coefficients that they are unlikely to penetrate the skin";
"the situation is equivocal because the imine has too high a fat/water partition coefficient to penetrate the skin easily but the related aldehyde has a lower fat/water partition coefficient and I do not know how readily the imine will hydrolyse to the aldehyde on the skin surface."
Systems in which a reasoning engine solves problems by applying rules from a knowledge base compiled by human experts were originally called "expert systems", on the grounds that they behave like experts. In this book they are distinguished by being called "knowledge-based systems". They use reasoning to varying degrees and they are creative in the sense that they solve novel problems and make predictions. The particular strength of the best of them is their ability to explain themselves. For example, there is fairly good understanding of why α,β-unsaturated aldehydes are skin sensitisers. The human compilers of a knowledge base can include that information so that the expert system can present it to a user when it makes a prediction and can explain how it reached its conclusion.
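The skin-sensitisation example above can be sketched as a tiny knowledge-based system that applies hand-written rules and explains its conclusion. This is only an illustration in Python, not the implementation of any system described in this book. The logP window (`LOGP_MIN`, `LOGP_MAX`), the `Query` fields and the `assess` function are hypothetical names and values chosen for the sketch; real thresholds would have to come from expert knowledge.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# ASSUMPTION: an illustrative logP window for skin penetration.
# These numbers are placeholders, not values taken from the text.
LOGP_MIN = 1.0
LOGP_MAX = 4.0


@dataclass
class Query:
    """Facts about the query compound, supplied by the user (hypothetical fields)."""
    is_unsaturated_aldehyde: bool
    is_unsaturated_imine: bool
    logp: float                            # logP of the query compound
    aldehyde_logp: Optional[float] = None  # logP of the aldehyde formed on hydrolysis


def penetrates(logp: float) -> bool:
    """Rule: only compounds inside the logP window penetrate the skin easily."""
    return LOGP_MIN <= logp <= LOGP_MAX


def assess(q: Query) -> Tuple[str, str]:
    """Apply the rules in turn and return (conclusion, explanation)."""
    alert = ("alpha,beta-unsaturated aldehydes are a class of compounds "
             "including many skin sensitisers")
    if q.is_unsaturated_aldehyde:
        if penetrates(q.logp):
            return "likely", f"{alert}; the partition coefficient allows skin penetration"
        return "unlikely", f"{alert}, but the compound is unlikely to penetrate the skin"
    if q.is_unsaturated_imine:
        route = "the imine could be hydrolysed to an aldehyde; " + alert
        if penetrates(q.logp):
            return "likely", f"{route}; the imine can penetrate the skin"
        if q.aldehyde_logp is not None and penetrates(q.aldehyde_logp):
            return "equivocal", (f"{route}; the imine is unlikely to penetrate the skin "
                                 "but the aldehyde could, and the ease of hydrolysis "
                                 "on the skin surface is unknown")
        return "unlikely", f"{route}, but neither compound is likely to penetrate the skin"
    return "no alert", "no rule in the knowledge base matches the query"


if __name__ == "__main__":
    conclusion, reason = assess(Query(False, True, logp=6.0, aldehyde_logp=3.0))
    print(conclusion, "-", reason)
```

The point of the sketch is the second element of the returned pair: each conclusion carries the chain of rules that produced it, which is the kind of explanation a statistical model cannot readily supply.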
Given access to structures and biological data for lots of compounds, you might discover the rule that α,β-unsaturated aldehydes are often skin sensitisers, assuming you were not overwhelmed by the quantity of data. Knowledge-based systems as defined here make no attempt to discover rules from patterns in data – they simply apply the rules put into them by human experts. In terms of the criteria for intelligence, they are unable to learn for themselves. The more general term, "expert system", was later extended to include systems that generate their own models by statistical methods and apply them. While these systems are perhaps nearer to all-rounders in the stakes for showing intelligence than knowledge-based systems, they fall down on explaining themselves. They cannot go beyond presenting the statistical evidence for their rules.
A speaker remarked at a meeting I attended that "An expert system is one that gives the answers an expert would give ... including the wrong ones". It might be fairer to compare consulting a knowledge-based system (which is what he was talking about at the time) with consulting a group of human experts rather than one, since knowledge bases are normally compiled from collective knowledge, not just individual knowledge, but his warning stands. Other people have, only half-jokingly, suggested that an expert system is one suitable only for use by an expert. That may be over-cautious but users of expert systems should at least be thinking and well-informed: it is what you would expect of someone taking advice from a team of experts.
CHAPTER 2 Synthesis Planning by Computer
Organic synthesis chemists are used to working with ideas and rules of thumb. They are not inclined to plan reaction sequences to novel compounds on the basis of kinetic or thermodynamic calculations – indeed, they are rarely in the position to do so because data of sufficient reliability are not available for the calculations – but they have a reasonable success rate. How do they do it? Could a computer emulate the thinking of a chemist who works out a practical synthetic route to a complicated organic compound?
The tale is told of a conversation over a few beers one evening between three eminent chemists famed for their work in organic synthesis – Elias J. Corey, Alexander R. Todd and Robert B. Woodward. Corey, it is said, expressed the view that computers would eventually be capable of matching or even outclassing human reasoning; soon there would be machines capable of designing chemical syntheses just as well as chemists do. Todd and Woodward were sceptical, it is said, arguing that chemical synthesis was an art more than a science, calling for imagination and creativity well beyond the capacity of a computer. Corey saw how a computer might reason like a chemist and he proposed to set up a project to demonstrate the feasibility of his ideas. The story may be apocryphal but it does not matter if it is. The exciting thing is that Corey recognised a new challenge well beyond the everyday goals of most researchers and took it on. He was not alone in seeing and taking up the challenge – there were others who will feature in this chapter and the next – but his project proliferated like the mustard tree in the parable so that by now every chemist is familiar with at least one spin-off computer application that roosts in its branches.
Corey's project to develop a synthesis-planning program, OCSS ("Organic Chemical Simulation of Synthesis"), started in the 1960s and was described in a paper in Science in 1969. By 1971, when a paper was submitted to the Journal of the American Chemical Society, the program had been re-implemented as LHASA (Logic and Heuristics Applied to Synthetic Analysis) and the project was expanding.
Right from the start the plan was to develop a computer system that did not just think like a chemist, but communicated like one, too. Computer graphics was in its infancy. The computer mouse was yet to come to public notice – Douglas Engelbart filed his application for a patent in 1967 – but there were systems that linked a graphics tablet, or "bit pad", to a vector graphics screen (a line is displayed on a vector graphics screen by scanning the electron beam between the coordinates of the ends of the line, whereas in a television or a modern personal computer system the screen is scanned systematically from side to side and top to bottom and the beam is activated at the right moments to illuminate the pixels on the screen that lie on the line). Other researchers interested in using computers for chemistry were developing representations of chemical structures to suit computers, but in this project the computer would be expected to use the representations favoured by organic chemists – structural diagrams. In their paper in 1969, Corey and Wipke wrote, "The following general requirements for the computer system were envisaged at the outset: (i) that it be an 'interactive system' allowing facile graphical communication of both input and output in a form most convenient and natural for the chemist ...".
A structural diagram is full of implicit information for a chemist that would not be perceived by someone not trained in chemistry. It is not a picture of a molecule, in as much as there can be a picture of one; it tells you what is connected to what, and how, but it does not tell you the three dimensional locations of atoms: like the map of the London Underground it is a graph. To make useful inferences, the computer needs to be able to "see" the graph like a chemist sees it, and so a chemical perception module in LHASA fills checklists for the atoms and bonds in a molecule for use in subsequent processing. For example, if a carbon atom is found to be bonded through a double bond to one oxygen atom and through a single bond to another oxygen atom which itself bears a hydrogen atom, the carbon atom can be flagged as the centre of a carboxylic acid group; if an atom is at a fusion point between two rings (which would have implications for its reactivity) it can be flagged as a "fusion atom".
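The two perception examples just described can be sketched in a few lines of Python. This is a hedged illustration of the idea, not LHASA's actual code: the data layout (a dictionary of element symbols plus a list of bonds with bond orders), the function names, and the assumption that ring membership has already been worked out and is passed in as lists of atom indices are all choices made for the sketch. Treating any bond shared by two rings as a fusion bond is also a simplification that will over-flag atoms in bridged systems.

```python
from collections import defaultdict


def adjacency(bonds):
    """Build {atom: {neighbour: bond_order}} from (atom, atom, order) triples."""
    adj = defaultdict(dict)
    for a, b, order in bonds:
        adj[a][b] = order
        adj[b][a] = order
    return adj


def flag_carboxylic_carbons(atoms, bonds):
    """Flag carbons double-bonded to one O and single-bonded to an O that bears an H."""
    adj = adjacency(bonds)
    flagged = set()
    for i, element in atoms.items():
        if element != "C":
            continue
        carbonyl = any(atoms[j] == "O" and order == 2
                       for j, order in adj[i].items())
        hydroxyl = any(atoms[j] == "O" and order == 1
                       and any(atoms[k] == "H" for k in adj[j] if k != i)
                       for j, order in adj[i].items())
        if carbonyl and hydroxyl:
            flagged.add(i)
    return flagged


def ring_bonds(ring):
    """Bonds of a ring given as an ordered list of atom indices."""
    return {frozenset((ring[k], ring[(k + 1) % len(ring)]))
            for k in range(len(ring))}


def flag_fusion_atoms(rings):
    """Flag atoms lying on a bond shared by two rings (rings supplied by the caller)."""
    seen, fusion = set(), set()
    for ring in rings:
        for bond in ring_bonds(ring):
            if bond in seen:
                fusion |= bond  # both ends of a shared bond are fusion atoms
            seen.add(bond)
    return fusion


if __name__ == "__main__":
    # Acetic acid skeleton: C0-C1(=O2)-O3-H4 (methyl hydrogens omitted for brevity)
    atoms = {0: "C", 1: "C", 2: "O", 3: "O", 4: "H"}
    bonds = [(0, 1, 1), (1, 2, 2), (1, 3, 1), (3, 4, 1)]
    print(flag_carboxylic_carbons(atoms, bonds))

    # Two six-membered rings sharing the bond between atoms 4 and 5
    print(flag_fusion_atoms([[0, 1, 2, 3, 4, 5], [4, 5, 6, 7, 8, 9]]))
```

Note that two rings joined at a single atom, as in a spiro compound, share no bond, so that atom is not flagged as a fusion atom.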
Computer perception of a molecule may put the computer in the position to think about it the way a chemist would, but how does a chemist think of ways to synthesise even a simple molecule? The question embodies a host of others each of which probably has more than one answer. Corey would have been well-placed to look for answers suited to computer-implementation, having formulated his ideas for the retrosynthetic approach to chemical synthesis design for which he was later to receive a Nobel Prize in Chemistry, and his thinking on the subject and his work on a computer system must surely have fed each other.
Excerpted from Knowledge-Based Expert Systems in Chemistry by Philip Judson. Copyright © 2009 Philip Judson. Excerpted by permission of The Royal Society of Chemistry.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.