This handbook provides the first-ever inside view of today's integrated approach to rational drug design. Chemoinformatics experts from large pharmaceutical companies, as well as from chemoinformatics service providers and academia, demonstrate what can be achieved today by harnessing the power of computational methods for the drug discovery process.
With the user rather than the developer of chemoinformatics software in mind, this book describes the successful application of computational tools to real-life problems and presents solution strategies to commonly encountered problems. It shows how almost every step of the drug discovery pipeline can be optimized and accelerated by using chemoinformatics tools -- from the management of compound databases to targeted combinatorial synthesis, virtual screening and efficient hit-to-lead transition.
An invaluable resource for drug developers and medicinal chemists in academia and industry.
Tudor I. Oprea is Professor of Biochemistry and Molecular Biology and Chief, Division of Biocomputing at the University of New Mexico School of Medicine, Albuquerque (USA). He was born in Timisoara (Romania) where he did all his studies including his Ph.D. thesis under the supervision of Francisc Schneider. He was a post-doctoral fellow at Washington University with Garland Marshall, and Los Alamos National Laboratory with Angel Garcia. He worked six years at AstraZeneca in Sweden, before moving to New Mexico as full Professor in 2002. He received the Hansch Award from the QSAR and Modeling Society in 2002. He is interested in chemoinformatics, virtual screening, QSAR, and lead and drug discovery.
Chemoinformatics experts from large pharmaceutical companies, as well as from chemoinformatics service providers and academia, demonstrate what can be achieved today by harnessing the power of computational methods for the drug discovery process.
From the contents:
* Chemoinformatics in Lead Discovery
* Molecular Complexity and Screening Set Design
* Algorithmic Engines in Virtual Screening
* Pharmacophore-Based Virtual Screening
* Enhancing Hit Quality and Diversity
* Molecular Diversity in Lead Discovery
* In Silico Lead Optimization
* Using Databases and Libraries
* Combinational Libraries Based on Privileged Substructures
* Strategies for Directed Compound Acquisition
* Predictive QSAR Models in Database Mining
* Drug Discovery in Academia - a Case Study
With the user rather than the developer of chemoinformatics software in mind, the successful application of computational tools for commonly encountered tasks is described in detail, and numerous real-life examples are given. An invaluable resource for drug developers and medicinal chemists in academia and industry.
Garland R. Marshall
1.1 Introduction
The first issue to be discussed is the definition of the topic. What is chemoinformatics and why should you care? There is no clear definition, although a consensus view appears to be emerging. "Chemoinformatics is the mixing of those information resources to transform data into information and information into knowledge for the intended purpose of making better decisions faster in the area of drug lead identification and organization" according to one view. Hann and Green suggest that chemoinformatics is simply a new name for an old problem, a viewpoint I share. Enough reviews, and even a book by Leach and Gillet, focus on the topic that there is little doubt about what is meant, despite the absence of a precise, generally accepted definition.
One aspect of a new emphasis is the sheer magnitude of chemical information that must be processed. For example, Chemical Abstracts Service adds over three-quarters of a million new compounds to its database annually, for which large amounts of physical and chemical property data are available. Some groups regularly generate hundreds of thousands to millions of compounds through combinatorial chemistry and screen them for biological activity. Even more compounds are generated and screened in silico in the search for a magic bullet for a given disease. Each of these two processes for generating information about chemistry has its own limitations. Experimental approaches have practical limitations despite automation; each in vitro bioassay consumes a finite amount of reagents, including valuable cloned and expressed receptors. Computational chemistry has to establish relevant criteria by which to select compounds of interest for synthesis and testing. The prediction of affinities with current methodology is only now approaching sufficient accuracy to be useful.
Let me emphasize the magnitude of the problem with a simple example. I was once asked to estimate the number of compounds covered by a typical issued patent for a drug of commercial interest. The patent that I selected to analyze was for enalapril, a prominent prodrug ACE inhibitor with a well-established commercial market. Given the parameters as outlined in the patent covering enalapril, an estimation of the total number of compounds included in the generic claim for enalaprilat, the active ingredient, was made. The following is the reference formula as described by the patent and simplified with R6 = OH, and R2 and R7 = H:
Thus, one can simply enumerate the members of each class of substituent and combine them combinatorially. The following details the manner in which the number of each substituent was determined with the help of Chris Ho (Marshall and Ho, unpublished).
Substituent R: R is described as a lower alkoxy. The patent states that substituents are "otherwise represented by any of the variables including straight and branched chain hydrocarbon radicals from one to six carbon atoms, for example, methyl, ethyl, isopentyl, hexyl or vinyl, allyl, butenyl and the like." DBMAKER was used to generate a database of compounds containing any combination of one to six carbon atoms, interspersed with occasional double and triple bonds, as well as all possible branching patterns. Constraints were employed to forbid the generation of chemically impossible constructs. Concord 3.01 was used to generate and validate the chemical integrity of all compounds. As a minimal estimate, 290 unique substituents were generated.
Substituent R3: This substituent is identical to substituent R, except that it is an alkyl instead of an alkoxy. Again, 290 unique substituents of six or fewer carbon atoms were generated.
Substituent R1: R1 is described as a substituted lower alkyl wherein the substituent is a phenyl group. The patent is vague with regard to where this phenyl group should reside. If the phenyl group always resides at the carbon farthest away from the main chain, then again, 290 different substituents will result. However, if the phenyl group can reside anywhere along the 1- to 6-member chain, then approximately 1000 substituents are chemically and sterically possible.
Substituents R4 & R5: These two substituents are described by the patent as being lower alkyl groups, which may be linked to form a cyclic 4- to 6-membered ring in this position. This produces two scenarios: if these groups remain unlinked, then, as before, 290 substituents are found at each position.
To determine the number of possible compounds when R4 and R5 are cyclized, a different approach was used. The patent states, "R4 and R5 when joined through the carbon and nitrogen atoms to which they are attached form a 4- to 6-membered ring". The preferred ring has the formula:
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
The patent is again vague in describing the generation of these cyclic systems. However, given that R4 and R5 are each 1-6 carbon alkyl groups with various branching patterns that are linked together, what results is a 4- to 6-membered ring system that may contain none, one or two side chains depending upon how R4 and R5 are connected. The overall requirement is that the total number of atoms comprising this ring system be less than or equal to 12.
To construct these ring systems, two databases were generated. The first database ("ring database") contained three compounds - a 4-, 5- and 6-membered ring as specified by the patent. The second database ("side-chain database") was constructed by cleaving each of the 290 alkyl compounds in half. One would assume that the first half of the alkyl chain would generate the ring, leaving the second half to dangle and form a side chain. A program DBCROSS (Ho, unpublished) was then used to join one compound from the ring database with up to two structures from the side-chain database at chemically appropriate substitution sites. Again, the overall requirement was that the number of atoms be less than or equal to 12. Approximately 4100 different cyclic systems were generated in this manner.
[MATHEMATICAL EXPRESSION NOT REPRODUCIBLE IN ASCII]
Summation:
R4/R5 noncyclic: $(290)(1000)(290)(290)(290) = 7.07 \times 10^{12}$
R4/R5 cyclized: $(290)(1000)(290)(4100) = 3.44 \times 10^{11}$
Sum $= 7.41 \times 10^{12}$. There are 3 chiral centers in this molecule (the carbons where R1, R3 and R5 are attached to the backbone), so multiplying by $2^3 = 8$ gives $5.93 \times 10^{13}$, or more than 59 trillion compounds included in the patent.
Note: If the phenyl group of substituent R1 is limited to the position farthest from the parent chain, then the number of compounds drops to $1.72 \times 10^{13}$, or more than 17 trillion compounds included in the patent.
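The tally above can be checked with a few lines of Python. This is only a sketch of the arithmetic, using the substituent counts quoted in the text (290, roughly 1000, and roughly 4100); the constant and function names are illustrative and do not come from the original analysis. Because the text appears to truncate its intermediate figures, an unrounded subtotal may differ from the quoted one in the last digit.

```python
from math import prod

# Substituent counts as reported in the text (the 1000 and 4100 are approximate).
ALKYL = 290           # R, R3, and unlinked R4 or R5
R1_ANYWHERE = 1000    # phenyl allowed anywhere along the R1 chain
R1_TERMINAL = 290     # phenyl restricted to the carbon farthest from the main chain
CYCLIC_R4_R5 = 4100   # ring systems formed when R4 and R5 are linked
STEREOISOMERS = 2 ** 3  # three chiral centers


def patent_count(r1_count):
    """Total compounds for a given number of R1 substituents."""
    noncyclic = prod([ALKYL, r1_count, ALKYL, ALKYL, ALKYL])  # R, R1, R3, R4, R5
    cyclic = prod([ALKYL, r1_count, ALKYL, CYCLIC_R4_R5])     # R, R1, R3, ring
    return (noncyclic + cyclic) * STEREOISOMERS


total_anywhere = patent_count(R1_ANYWHERE)
total_terminal = patent_count(R1_TERMINAL)
print(f"phenyl anywhere: {total_anywhere:.2e}")
print(f"phenyl terminal: {total_terminal:.2e}")
```

Both totals depend entirely on the estimated substituent counts, so they are best read as order-of-magnitude figures.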
Actually, the number of compounds included in the patent is severalfold larger as esters of enalaprilat such as enalapril were also included. Of the 100 trillion or so compounds included in the patent, how many could be predicted to lack druglike properties (molecular weight too large? logP too high?)? How many would be predicted to be inactive on the basis of the known structure-activity data available on angiotensin-converting enzyme (ACE) inhibitors such as captopril? How many would be predicted to be inactive now that a crystal structure of a complex of ACE with an inhibitor has been published? Given the structure-activity relationships (SAR) available on the inhibitors, what could one determine regarding the active site of ACE? What novel classes of compound could be suggested on the basis of the SAR of inhibitors? On the basis of the new crystal structure of the complex? Do the most potent compounds share a set of properties that can be identified and used to optimize a novel lead structure? Can a predictive equation relating properties and affinity for the isolated enzyme be established? Can a similar equation relating properties and in vitro bioassay effectiveness be established? These are representative questions facing the current drug design community and one focus of chemoinformatics.
One significant tool that is employed is molecular modeling. Because I have been involved more directly with computational chemistry and molecular modeling, there is a certain bias in my perspective. This is the reason I have used "A Personal View" as part of the title. I have also chosen a historical presentation and focused largely on those contributions that significantly impacted my thinking. This approach, of course, has its own limitation, and I apologize to my colleagues for any distortions or omissions.
1.2 Historical Evolution
With the advent of computers and the ability to store and retrieve chemical information, serious efforts to compile relevant databases and construct information retrieval systems began. One of the first efforts to have a substantial long-term impact was Olga Kennard's collection of crystal structure information for small molecules. The Cambridge Structural Database (CSD) stores crystal structures of small molecules and provides a fertile resource for geometrical data on molecular fragments for calibration of force fields and validation of results from computational chemistry. As protein crystallography gained momentum, the need for a common repository of macromolecular structural data led to the Protein Data Bank (PDB), originally located at Brookhaven National Laboratory. These efforts focused on the accumulation and organization of experimental results on the three-dimensional structure of molecules, both large and small. Todd Wipke recognized the need for a chemical information system to handle the increasing numbers of small molecules generated in industry, and thus MDL and MACCS were born.
With the advent of computers and the availability of oscilloscopes, the idea of displaying a three-dimensional structure on the screen was obvious, with rotation providing depth cueing. Cyrus Levinthal and colleagues utilized the primitive computer graphics facilities at MIT to generate rotating images of proteins and nucleic acids to provide insight into the three-dimensional aspects of these structures without having to build physical models. His paper in Scientific American in 1965 was sensational and inspired others (including myself) to explore computer graphics (1966/1967) as a means of coping with the 3D nature of chemistry. Physical models (Dreiding stick figures, CPK models, etc.) were useful, accepted tools for medicinal chemists, but physical overlap of two or more compounds was difficult, and exploration of the potential energy surface was hard to correlate with a given conformation of a physical model.
As more and more chemical data accumulated with its implicit information content, a multitude of approaches began to extract useful information. Certainly, the shape and variability in geometry of molecular fragments from CSD was mined to provide fragments of functional groups for a variety of purposes. As series of compounds were tested for biological activity in a given assay, the desire to distill the essence of the chemical requirements for such activity to guide optimization was generated. Initially, the efforts focused on congeneric series as the common scaffold presumably eliminated the molecular alignment problem with the assumption that all molecules bound with a common orientation of the scaffold. This was the intellectual basis of the Hansch approach (quantitative structure-activity relationships, QSAR), in which substituent parameters from physical chemistry were used to correlate chemical properties with biological activity for a series of compounds with the same substitution pattern on the congeneric scaffold.
1.3 Known versus Unknown Targets
Intellectually, the application of molecular modeling has dichotomized into those methods dealing with biological systems where no structural information at the atomic level is known (the unknown receptor), and those systems, now relatively common, where a three-dimensional structure is known from crystallography or NMR spectroscopy. The Washington University group has spent most of its efforts over the last three decades focused on the common problem encountered where one has little structural information. Others, such as Peter Goodford and Tak Kuntz, have taken the lead in developing approaches to therapeutic targets where the structure of the target was available at atomic resolution. The seminal work of Goodford and colleagues on designing inhibitors of the 2,3-diphosphoglycerate (DPG) binding site on hemoglobin for the treatment of sickle-cell disease certainly stimulated many others to obtain crystal structures of their therapeutic target. The most dramatic example of computer-aided drug design of which I am aware is the development of superoxide dismutase mimetics of below 500 molecular weight by Dennis Riley of Metaphore Pharmaceuticals. By understanding the redox chemistry of manganese superoxide dismutase, Riley was able to design a totally novel pentaazacrown scaffold complexed with manganese (Figure 1.1) that catalyzes the conversion of superoxide to hydrogen peroxide at diffusion-controlled rates. This is the first example of a synthetic enzyme with a catalytic rate equal to or better than nature's best. The advances in molecular biology provided the means of cloning and expressing proteins in sufficient quantities to screen a variety of conditions for crystallization. Thus, it is almost expected that a crystal structure is available for any therapeutic target of interest. Unfortunately, many therapeutic targets such as G-protein-coupled receptors are still significant challenges to structural biology.
1.4 Graph Theory and Molecular Numerology
Considerable literature developed around the ability of numerical indices derived from graph theoretical considerations to correlate with SAR data. This was a source of mystery to me for some time. A colleague, Ioan Motoc, from Romania, with experience in this arena and a very strong intellect, helped me understand the ability of various indices to be useful parameters in QSAR equations. Ioan correlated various indices with more physically relevant (at least to me) variables such as surface area and molecular volume. Since computational time was at a premium during the early days of QSAR and such indices could be calculated with minimal computations, they played a useful role and continue to be used. As a chemist, however, I am much more comfortable with parameters such as surface area or volume.
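To make concrete how cheaply such an index can be computed, here is a minimal sketch that calculates the Wiener index: the sum of shortest-path bond counts over all pairs of heavy atoms. The text does not name a specific index, so the Wiener index is an assumption, chosen only as a well-known example. The adjacency-list encoding of the carbon skeletons and the function name are likewise illustrative.

```python
from collections import deque


def wiener_index(adjacency):
    """Sum of shortest-path bond counts over all unordered atom pairs."""
    total = 0
    for source in adjacency:
        # Breadth-first search gives topological distances from `source`.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            atom = queue.popleft()
            for neighbor in adjacency[atom]:
                if neighbor not in distance:
                    distance[neighbor] = distance[atom] + 1
                    queue.append(neighbor)
        total += sum(distance.values())
    return total // 2  # every pair was counted once from each end


# Hydrogen-suppressed carbon skeletons of two C4H10 isomers (illustrative encoding).
n_butane = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
isobutane = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

print(wiener_index(n_butane))   # 10
print(wiener_index(isobutane))  # 9
```

The more compact, branched isomer receives the smaller value, which gives some intuition for why an index of this kind can track bulk properties such as surface area or volume.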
(Continues...)
Excerpted from Chemoinformatics in Drug Discovery Copyright © 2005 by WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim. Excerpted by permission.
All rights reserved. No part of this excerpt may be reproduced or reprinted without permission in writing from the publisher.
Excerpts are provided by Dial-A-Book Inc. solely for the personal use of visitors to this web site.
Seller: Anybook.com, Lincoln, United Kingdom
Condition: Good. Volume 23. This is an ex-library book and may have the usual library/used-book markings inside. This book has hardback covers. Clean from markings. In good all-round condition. No dust jacket. Please note the image in this listing is a stock photo and may not match the covers of the actual item. 1200 grams. ISBN: 9783527307531. Seller Inventory # 9199075
Quantity: 1 available
Seller: Charlie Byrne's Bookshop, Galway, GALWA, Ireland
Hardcover. Condition: As New. In mint Condition. Volume 23. Seller Inventory # AB01150
Seller: Charlie Byrne's Bookshop, Galway, GALWA, Ireland
Hardcover. Condition: As New. In mint Condition. Volume 23. Seller Inventory # AB01151