Application of structural biology in new drug discovery (1)


Structure-based drug design (SBDD) is a relatively new approach to drug design. As understanding has grown of the three-dimensional structures of drug target proteins and of the sites where they interact with lead compounds or natural ligands, a molecular design strategy based on target-ligand interactions has emerged. Rapid advances in protein expression, purification, and crystallography have provided detailed structural information on disease-relevant protein targets. At the same time, advances in chemical synthesis, including potent new reagents, protecting groups, catalytic transformations, and multi-step synthesis strategies, offer great potential for innovation in structure-based drug design. SBDD has revolutionized the research strategy of medicinal chemistry, as well as the way new drugs are screened and optimized.


Figure (I): Structure-based drug design process

Roche used the SBDD approach to develop saquinavir, a protease inhibitor that made the treatment of HIV/AIDS possible and demonstrated the enormous potential of this design approach. A large number of new drugs were subsequently developed for hypertension, HIV/AIDS, various cancers, and other human diseases. More recently, SBDD has been applied to the development of drugs against novel coronaviruses.

Structure-based design strategies require information on the shape and charge characteristics of the target protein's binding site, together with a solved crystal structure of the protein-ligand complex that shows how the two interact. Structure determination at the molecular level often reveals the active conformation of the ligand, which serves as the starting point for molecular design. With this information, a structure-based strategy can optimize a lead compound while preserving its scaffold, thereby improving the activity and selectivity of the drug. The process relies on a fundamental discipline: macromolecular crystallography.

In the following, we will describe how macromolecular crystallography is involved in each stage of SBDD and how it can guide the development of innovative drugs.

A. Target identification and selection
1. Protein function prediction

The first step in modern drug research and development is to find, identify, and prepare the molecular targets against which drugs will be screened. Statistical analyses show that more than 50% of the drugs currently on the market act on protein receptors, which reflects the importance of proteins as drug targets.

Protein function is verified mainly by experiment; where experiments do not provide the required information, function can also be inferred from sequence similarity. However, a large number of protein sequences still show no close similarity to any protein of known function, and in these cases the protein structure must be predicted and analyzed to obtain functionally relevant clues.

Predicting protein function from structure relies on the following approaches.

1) Fold matching
The first step in structure-based function prediction is to find proteins with similar folds through structural comparison and alignment.
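
As a rough illustration of fold comparison, the sketch below scores two C-alpha traces by the root-mean-square difference between their internal distance matrices, a measure that needs no superposition. It assumes equal-length chains with a known one-to-one residue correspondence; real fold-matching tools must also solve the alignment problem, which is omitted here. The function name and the toy coordinates are hypothetical.

```python
import math
from itertools import combinations


def distance_rmsd(coords_a, coords_b):
    """RMS difference between the internal distance matrices of two chains.

    Assumes residue i of chain A corresponds to residue i of chain B.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("chains must have the same number of residues")
    if len(coords_a) < 2:
        raise ValueError("need at least two residues")
    squared_sum = 0.0
    n_pairs = 0
    for i, j in combinations(range(len(coords_a)), 2):
        d_a = math.dist(coords_a[i], coords_a[j])
        d_b = math.dist(coords_b[i], coords_b[j])
        squared_sum += (d_a - d_b) ** 2
        n_pairs += 1
    return math.sqrt(squared_sum / n_pairs)


# Toy coordinates (hypothetical, not taken from any real structure).
compact = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (1.5, 1.5, 0.0), (0.0, 1.5, 1.5)]
# The same shape translated in space: internal distances are unchanged.
compact_shifted = [(x + 10.0, y - 5.0, z + 2.0) for x, y, z in compact]
# A fully extended chain: a very different "fold".
extended = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]

same_fold_score = distance_rmsd(compact, compact_shifted)
different_fold_score = distance_rmsd(compact, extended)
```

A score near zero indicates matching internal geometry, while larger values indicate different shapes. Note that this measure cannot distinguish a structure from its mirror image, one reason practical tools use more elaborate scoring.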

2) Surface clefts and binding pockets
Surface clefts (channel-like, groove-like, or shallow depression-like regions) and pockets on the protein surface are also important clues to function. In enzymes, for example, one of the two largest surface clefts is usually the active or catalytic site.

Figure (II) Schematic diagram of the structure of the drug binding pocket in K-Ras protein

3) Residue template methods
A protein's specific function often depends on a small number of residues clustered in a small region of its three-dimensional structure. The catalytic function of an enzyme is carried out by a few catalytic residues at the active site. Similarly, a few specific residues on the surface of a DNA-binding protein determine its ability to bind a particular DNA sequence, or the protein binds DNA through one of its own structural motifs, such as the common helix-turn-helix (HTH) motif. The specific arrangement and conformation of these residues are essential to function, so they are highly conserved and have changed very little during evolution.
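
The idea of a residue template can be sketched as a search for residues of the right types whose mutual distances match a stored pattern. The example below looks for a three-residue template of the Ser-His-Asp kind. It is a minimal sketch: each residue is reduced to a single representative point, and the coordinates, template distances, and tolerance are illustrative placeholders rather than measured geometry. The function name `match_template` is hypothetical.

```python
import math
from itertools import permutations


def match_template(residues, template, tolerance=1.0):
    """Find residue triples that match a three-residue distance template.

    residues: list of (residue_name, residue_number, (x, y, z)).
    template: dict with "names" (three distinct residue names) and
              "distances" (d01, d02, d12) between the template points.
    Returns a list of residue-number triples in template order.
    """
    expected = template["distances"]
    hits = []
    for a, b, c in permutations(residues, 3):
        if (a[0], b[0], c[0]) != template["names"]:
            continue
        observed = (
            math.dist(a[2], b[2]),
            math.dist(a[2], c[2]),
            math.dist(b[2], c[2]),
        )
        if all(abs(o - e) <= tolerance for o, e in zip(observed, expected)):
            hits.append((a[1], b[1], c[1]))
    return hits


# Toy structure: one representative point per residue (placeholder values).
toy_residues = [
    ("SER", 195, (0.0, 0.0, 0.0)),
    ("HIS", 57, (3.0, 0.0, 0.0)),
    ("ASP", 102, (6.0, 0.5, 0.0)),
    ("SER", 214, (20.0, 20.0, 20.0)),  # right type, wrong place
    ("GLY", 193, (1.0, 1.0, 1.0)),     # nearby, wrong type
]
# Illustrative template distances in angstroms (assumed, not measured).
triad_template = {"names": ("SER", "HIS", "ASP"), "distances": (3.0, 6.0, 3.0)}

hits = match_template(toy_residues, triad_template)  # [(195, 57, 102)]
```

The exhaustive triple search is cubic in the number of residues, which is acceptable for a toy example; a practical implementation would pre-filter by residue type and use a spatial index.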

4) Phylogenetic relationships
Because the function-determining regions of a protein structure are the most likely to be conserved during evolution, phylogenetic analysis, for example of tree-determinant residues, can be used to infer and assess protein function.
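
A very simple way to look for conserved positions is to score each column of a multiple sequence alignment by its Shannon entropy: a column with entropy zero is identical in every sequence. The sketch below is a simplified stand-in for phylogenetic analysis, since it ignores the tree relating the sequences. The toy alignment and the function names are hypothetical, and gap characters, if present, would simply be counted as another symbol.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conserved_positions(alignment, max_entropy=0.0):
    """Return 0-based column indices whose entropy is at most max_entropy."""
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    return [
        index
        for index, column in enumerate(zip(*alignment))
        if column_entropy(column) <= max_entropy
    ]


# Toy alignment of four hypothetical homologs.
toy_alignment = [
    "MKHSD",
    "MRHSE",
    "MKHTD",
    "LKHSD",
]

conserved = conserved_positions(toy_alignment)  # [2], the invariant H column
```

Raising `max_entropy` relaxes the criterion so that columns with occasional substitutions also count as conserved.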

5) Machine learning techniques
In addition to inferring function from structural similarity to known proteins, computational approaches such as statistical methods, data mining, and machine learning (for example, support vector machines and neural networks) can be used to predict catalytic residues, protein-protein binding sites, and enzyme family classification.

2. Evaluation of druggability
To yield an effective, selective, orally active drug molecule, the protein target must have a suitable binding pocket. Binding pockets are of two types: one allows the drug to reach its site of action, and the other mediates the interaction between the drug and its molecular target once the drug has arrived there.

Figure (III) Determination of druggability based on protein structure

Generic criteria for a druggable target are a binding site of suitable size (generally able to accommodate compounds with molecular weights up to 500 Da), appropriate lipophilicity, and sufficient hydrogen-bonding sites. All of these properties can be judged from the protein crystal structure. In addition, variable binding sites and multiple structural conformations in the protein can greatly increase the probability of drug binding.

Structure-based assessment of target druggability focuses on predicting whether the protein has ligand-binding sites that are complementary to the properties of drug-like molecules. Two main approaches are used to identify ligand-binding sites automatically.

The first approach uses geometric factors alone: a suitable binding pocket is identified by examining the shape of the protein in 2D or 3D. Spatially, a protein can be divided into three regions. The first is the core region, a dense interior composed of the protein's own atoms. The second is the solvent-contact interface, the outer surface where the protein touches the surrounding solvent. The third is the cavity region, which is generally part of the solvent-contact interface but extends deep into the body of the protein to form a recessed space. Drug binding pockets generally lie within this cavity region.
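
A purely geometric pocket search can be sketched with a grid: a solvent point counts as buried along an axis when protein atoms are found on both sides of it. The example below is loosely modeled on grid-scanning pocket detectors, but it is only a sketch under stated assumptions: atoms are treated as spheres of one fixed radius, only the three coordinate axes are scanned, and the five-atom "cup" is a hypothetical toy, not a real protein. All names and default values are illustrative.

```python
import math

AXES = ((1, 0, 0), (0, 1, 0), (0, 0, 1))


def inside_protein(point, atoms, radius):
    """True if the point lies within `radius` of any atom center."""
    return any(math.dist(point, atom) <= radius for atom in atoms)


def ray_hits_protein(point, direction, atoms, radius, step, max_steps):
    """Walk from the point along a direction and report whether an atom is met."""
    for k in range(1, max_steps + 1):
        probe = tuple(p + k * step * d for p, d in zip(point, direction))
        if inside_protein(probe, atoms, radius):
            return True
    return False


def buriedness(point, atoms, radius=1.8, step=1.0, max_steps=8):
    """Number of axes (0 to 3) along which the point is enclosed on both sides."""
    score = 0
    for axis in AXES:
        opposite = tuple(-d for d in axis)
        if ray_hits_protein(point, axis, atoms, radius, step, max_steps) and \
           ray_hits_protein(point, opposite, atoms, radius, step, max_steps):
            score += 1
    return score


def pocket_points(atoms, grid, min_buriedness=2, radius=1.8):
    """Grid points outside the protein that are enclosed on enough axes."""
    return [
        p for p in grid
        if not inside_protein(p, atoms, radius)
        and buriedness(p, atoms, radius) >= min_buriedness
    ]


# Toy "cup": walls along x and y, a floor below, open at the top.
cup_atoms = [
    (4.0, 0.0, 0.0), (-4.0, 0.0, 0.0),
    (0.0, 4.0, 0.0), (0.0, -4.0, 0.0),
    (0.0, 0.0, -4.0),
]
grid = [
    (float(x), float(y), float(z))
    for x in range(-6, 7, 2)
    for y in range(-6, 7, 2)
    for z in range(-6, 7, 2)
]
pockets = pocket_points(cup_atoms, grid)
```

In this toy case the center of the cup is enclosed along the x and y axes but open along z, so it passes a threshold of two enclosed axes, while points far outside the cup do not.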

The second approach judges the specific physicochemical properties of the protein cavity surface. A druggable protein target should have physicochemical properties that complement those of the drug-like molecule itself. The size and shape of the pocket, together with measures of its hydrophobicity and polarity (such as the relative number of hydrogen bond donors to acceptors and the ratio of polar to hydrophobic atoms within the cavity), are therefore important indicators when identifying drug binding pockets.
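
The two ratios mentioned above can be computed from a list of the atoms lining a pocket. The sketch below uses a deliberately crude, assumed typing scheme (nitrogen as donor, oxygen as acceptor, carbon and sulfur as hydrophobic); real tools assign these roles from the full chemical environment of each atom. The mapping, the function name, and the toy pocket are all hypothetical.

```python
from collections import Counter

# Crude element-based typing (an assumption made for illustration only).
ATOM_CLASS = {
    "N": "donor",
    "O": "acceptor",
    "C": "hydrophobic",
    "S": "hydrophobic",
}


def pocket_descriptors(elements):
    """Summarize a pocket from the element symbols of its lining atoms."""
    counts = Counter(ATOM_CLASS[e] for e in elements if e in ATOM_CLASS)
    donors = counts["donor"]
    acceptors = counts["acceptor"]
    hydrophobic = counts["hydrophobic"]
    polar = donors + acceptors
    return {
        "donors": donors,
        "acceptors": acceptors,
        "hydrophobic": hydrophobic,
        # Ratios are None when the denominator is zero.
        "donor_acceptor_ratio": donors / acceptors if acceptors else None,
        "polar_hydrophobic_ratio": polar / hydrophobic if hydrophobic else None,
    }


# Toy pocket lined by 12 carbons, 3 nitrogens, and 6 oxygens.
toy_pocket = ["C"] * 12 + ["N"] * 3 + ["O"] * 6
descriptors = pocket_descriptors(toy_pocket)
```

Descriptors of this kind are what a discriminant function would take as input; any thresholds applied to them would need to be calibrated on targets with known outcomes.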

Three approaches are commonly used to assess target druggability from structure: first, predicting and evaluating potential drug binding sites with structural algorithms; second, applying discriminant functions based on the physicochemical properties of protein binding pockets; and third, comparing and screening a large compound library against a reference set of targets labeled with known outcomes (that is, known difficulty of drug development).