Computer Aided Drug Design Tools: Open Source/Open Access for Non-commercial Use, Part 1

April 22, 2013

Computer-Aided Drug Design (CADD) is an emerging area of research in the present drug discovery process and has become a broad field within current drug development. Its main aim is to shorten the time taken by traditional drug development procedures by using purpose-built software.
CADD tools serve many purposes, including visualizing molecules, molecular modelling, homology modelling, molecular dynamics, QSAR studies, docking, activity testing and prediction, structure- and activity-related analysis, and the study of binding between ligands and proteins or between two proteins.
Nowadays many of these tools are available free of charge for non-commercial use, often as open source software developed by various organizations and individuals. This makes the technology easier to understand, apply and reuse when developing new molecular entities against various diseases.
In this post, I would like to look at which of these tools are available as open source.
For this, we first need to know about CRDD (Computational Resources for Drug Discovery). CRDD is one of the modules of OSDD (Open Source Drug Discovery), a forum set up to initiate and develop a vision of affordable healthcare for the developing world. The OSDD concept aims to combine the power of genomics and computational technologies and to encourage the participation of young, talented researchers from universities and industry. It seeks to provide a global platform where the best minds can collaborate and collectively work on the complex problems of discovering novel therapies for neglected diseases such as tuberculosis.
Through CRDD, OSDD brings young, talented researchers from around the world working in drug design and discovery together and makes computational resources for their work available on one platform.
In this post I concentrate mainly on three topics that are central to drug design and discovery: 1) target identification and validation, 2) virtual screening and 3) drug design.
Drugs fail in the clinic for two main reasons: the first is that they do not work, and the second is that they are not safe. As such, one of the most important steps in developing a new drug is target identification and validation. A target is a broad term that can be applied to a range of biological entities, including proteins, genes and RNA. A good target needs to be efficacious, safe, meet clinical and commercial needs and, above all, be ‘druggable’. A ‘druggable’ target is accessible to the putative drug molecule, whether a small molecule or a larger biological, and, upon binding, elicits a biological response that can be measured both in vitro and in vivo.
Target-based drug discovery starts with a thorough understanding of the disease mechanisms and of the roles of enzymes, receptors and other proteins in the disease pathology. Target identification mainly draws on genome annotation, proteome annotation, potential targets, protein structure and siRNA/miRNA.
Gene Annotations:

Genome sequencing techniques are becoming more advanced, and the number of sequenced genomes is therefore increasing exponentially. One of the major challenges in contemporary science is to annotate the available sequence data. Annotation defines the coding regions in a genome as well as their physical locations. It also provides the number and spatial distribution of repeat regions and evolutionary information about whole genomes.

Several computational tools have been developed to cut down the time and expense involved in the experimental annotation procedure.
Servers integrated for CADD.
A web server for locating probable protein-coding regions in a nucleotide sequence using a Fourier transform approach (Issac, B., Singh, H., Kaur, H. and Raghava, G.P.S. (2002) Bioinformatics 18:196).
This server predicts genes (protein-coding regions) in eukaryotic genomes, including introns and exons, using similarity-aided (double) and consensus ab initio methods (Issac B, Raghava GP. (2004) Genome Res. 14(9):1756-66).
A web server for predicting genes in a DNA sequence.
A genome-wide BLAST server. It allows users to search their sequences against sequenced genomes and annotated proteomes, and it integrates various tools for analysing the BLAST search results.
A support vector machine (SVM)-based approach to identify protein-coding regions in human genomic DNA.
Spectral Repeat Finder (SRF) is a program to find repeats through an analysis of the power spectrum of a given DNA sequence. By repeat we mean the repeated occurrence of a segment of N nucleotides within a DNA sequence. SRF is an ab initio technique as no prior assumptions need to be made regarding either the repeat length, its fidelity, or whether the repeats are in tandem or not (Sharma D, Issac B, Raghava GP, Ramaswamy R. (2004) Bioinformatics. 20(9):1405-12)
Genome-wide sequence similarity search using FASTA. It allows users to search their sequences against sequenced genomes and the proteomes they encode, and it integrates various tools for analysing the FASTA search results (Issac, B. and Raghava, G.P.S. (2002) Biotechniques 33:548-56).
A suite of datasets and tools for evaluating gene prediction methods.
MyPattern Finder is a program for detecting a 'motif' in a DNA sequence using either an exact search method (Option A (1.0)) or an alignment technique (Option B (1.0)).
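Both the Fourier-transform gene-locating server and SRF described above work from the power spectrum of a DNA sequence. The sketch below assumes the common mapping of a sequence onto four 0/1 indicator sequences (one per base) and sums their squared DFT magnitudes at a frequency index k. Power at k = N/3 reflects the codon (period-3) pattern that protein-coding DNA tends to show, while a strong peak at another index k hints at a repeat with a period of about N/k bases. It is a naive O(N^2) illustration of the general idea, not the code behind either tool; the function names and the window and step sizes are hypothetical.

```python
import cmath


def spectral_power(seq: str, k: float) -> float:
    """S(k): sum over A, C, G, T of |DFT of that base's 0/1 indicator sequence|^2
    at frequency index k (k need not be an integer)."""
    seq = seq.upper()
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of the indicator sequence: only positions holding `base` contribute
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k * pos / n)
            for pos, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total


def period3_signal(seq: str) -> float:
    """Power at k = N/3, where coding DNA tends to show a peak (codon periodicity)."""
    return spectral_power(seq, len(seq) / 3)


def dominant_period(seq: str) -> float:
    """Repeat-finding idea: the strongest peak at index k suggests a period of about N / k."""
    n = len(seq)
    k_peak = max(range(1, n // 2 + 1), key=lambda k: spectral_power(seq, k))
    return n / k_peak


def sliding_period3(seq: str, window: int = 120, step: int = 30):
    """(start, period-3 power) for each window; window and step sizes are hypothetical."""
    return [
        (start, period3_signal(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]
```

For example, `period3_signal("ATG" * 40)` is about 4800, `period3_signal("A" * 120)` is essentially 0, and `dominant_period("AACT" * 25)` returns 4.0. Because harmonics of a period also produce peaks, the strongest peak is not always the fundamental period, so a practical repeat finder would inspect all significant peaks rather than just the maximum.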
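The BLAST and FASTA servers above rely on heuristics that avoid a full dynamic-programming comparison against every genome. FASTA, for instance, first looks up short exact word matches (k-tuples) between the query and a database sequence and counts them along each diagonal; BLAST seeds its alignments with short word hits in a related way. The sketch below shows only that seeding step under simplifying assumptions (exact DNA words, no scoring matrix, no extension); the function names and the default word size k = 4 are hypothetical.

```python
from collections import defaultdict


def build_word_index(seq: str, k: int):
    """Map every k-letter word in the database sequence to its start positions."""
    index = defaultdict(list)
    for pos in range(len(seq) - k + 1):
        index[seq[pos:pos + k]].append(pos)
    return index


def find_seeds(query: str, target: str, k: int = 4):
    """Return (query_pos, target_pos) pairs where a k-word matches exactly."""
    index = build_word_index(target, k)
    seeds = []
    for qpos in range(len(query) - k + 1):
        # .get() on a defaultdict does not insert missing keys
        for tpos in index.get(query[qpos:qpos + k], []):
            seeds.append((qpos, tpos))
    return seeds


def best_diagonal(query: str, target: str, k: int = 4):
    """FASTA-style step: count seeds per diagonal (target_pos - query_pos) and
    return (diagonal, hit_count) for the densest one, or None if no seeds."""
    counts = defaultdict(int)
    for qpos, tpos in find_seeds(query, target, k):
        counts[tpos - qpos] += 1
    if not counts:
        return None
    return max(counts.items(), key=lambda item: item[1])
```

For example, `best_diagonal("GATTACA", "CCCCGATTACACCCC")` returns `(4, 4)`: four word hits lie on diagonal 4, meaning the query lines up with the target starting at position 4. A real search would then extend and score that region.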
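MyPattern Finder's exact-search option (Option A) amounts to reporting every position where the motif occurs with no mismatches. A minimal sketch, assuming 0-based coordinates, overlapping matches and an optional scan of the reverse-complement strand (the description above does not say whether the tool searches both strands); the function name is hypothetical and the alignment-based Option B is not covered.

```python
def find_motif(seq: str, motif: str, both_strands: bool = True):
    """Exact motif search. Returns (strand, start) tuples with 0-based starts in
    forward-strand coordinates; overlapping matches are all reported."""
    seq, motif = seq.upper(), motif.upper()
    m = len(motif)
    hits = [("+", i) for i in range(len(seq) - m + 1) if seq[i:i + m] == motif]
    if both_strands:
        # reverse complement of the motif; a forward match to it is a reverse-strand hit
        rc = motif.translate(str.maketrans("ACGT", "TGCA"))[::-1]
        if rc != motif:  # palindromic motifs would otherwise be reported twice
            hits += [("-", i) for i in range(len(seq) - m + 1) if seq[i:i + m] == rc]
    return hits
```

For example, `find_motif("GAATTCAGGAATT", "GAAT")` returns `[('+', 0), ('+', 8), ('-', 2)]`: two forward matches plus one match of the reverse complement `ATTC`.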
Can be used for
Archaea, Metagenomes, Eukaryotes, Viruses, Phages, Plasmids, EST and cDNA
Hidden Markov model
Microbial genomes
Markov model
Hidden Markov model
Vertebrates and C. elegans
Hidden Markov model
Ab initio method
Bacteria, Viruses and Eukaryotes
HMM and similarity-based searches
Animals, Humans, Plants, Fungi, Protists
Neural network
Vertebrates, Arabidopsis, Maize
Ab initio method
Web Interface on Libraries
Standalone Software
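Several of the gene finders listed above are based on hidden Markov models: the genome is treated as an observed sequence of bases emitted by a hidden sequence of states (for example coding and non-coding), and the Viterbi algorithm recovers the most probable state path. The sketch below is a toy two-state example with hypothetical probabilities (non-coding assumed AT-rich, coding assumed GC-rich); real gene finders use many more states, such as codon positions, exon types and splice sites.

```python
import math


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for `seq` under an HMM, computed in log space."""
    # best[t][s]: best log-probability of any path that ends in state s at position t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(seq)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = score + math.log(emit_p[s][seq[t]])
            back[t][s] = prev
    # trace back from the best final state
    state = max(best[-1], key=best[-1].get)
    path = [state]
    for t in range(len(seq) - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    return "".join(reversed(path))


# Hypothetical toy parameters: 'N' = non-coding (AT-rich), 'C' = coding (GC-rich).
STATES = ("N", "C")
START = {"N": 0.5, "C": 0.5}
TRANS = {"N": {"N": 0.9, "C": 0.1}, "C": {"N": 0.1, "C": 0.9}}
EMIT = {
    "N": {"A": 0.35, "T": 0.35, "G": 0.15, "C": 0.15},
    "C": {"A": 0.15, "T": 0.15, "G": 0.35, "C": 0.35},
}
```

With these toy parameters, `viterbi("ATATATATGCGCGCGCATATATAT", STATES, START, TRANS, EMIT)` labels the middle GC block as coding and returns `'NNNNNNNNCCCCCCCCNNNNNNNN'`. Working with log probabilities avoids numerical underflow on long genomic sequences.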

Can be used for


Similarity-based gene prediction program where additional cDNA/EST and/or protein sequences are used to predict gene structures via spliced alignments

JIGSAW (formerly "Combiner")

Combines multiple sources of evidence (output from gene finders, splice site prediction programs and sequence alignments) to predict gene models.
GlimmerHMM is based on a Generalized Hidden Markov Model (GHMM). Although the gene finder conforms to the overall mathematical framework of a GHMM, it additionally incorporates splice site models adapted from the GeneSplicer program and a decision tree adapted from GlimmerM. It also uses Interpolated Markov Models for the coding and noncoding models. Currently, GlimmerHMM's GHMM structure includes introns of each phase, intergenic regions, and four types of exons (initial, internal, final, and single).

GeneZilla is based on the Generalized Hidden Markov Model (GHMM). It evolved out of the ab initio eukaryotic gene finder TIGRscan, which was developed at The Institute for Genomic Research.
Twinscan: mammals, Caenorhabditis (worm), dicot plants and Cryptococci. N-SCAN: human and Drosophila
TWINSCAN extends the probability model of GENSCAN, allowing it to exploit homology between two related genomes. Separate probability models are used for conservation in exons, introns, splice sites and UTRs, reflecting the differences among their patterns of evolutionary conservation.

N-SCAN (a.k.a. TWINSCAN 3.0) models the phylogenetic relationships between the aligned genome sequences, context-dependent substitution rates, and insertions and deletions. N-SCAN has been used to generate predictions for the entire human genome and for the genome of the fruit fly Drosophila melanogaster.

prokaryotic and eukaryotic genomes
Manatee is a web-based gene evaluation and genome annotation tool that can view, modify and store annotation for prokaryotic and eukaryotic genomes. The Manatee interface allows biologists to quickly identify genes and make high-quality functional assignments using a multitude of genome analysis tools. These tools include, but are not limited to, GO classifications, BER and BLAST search data, paralogous families, and annotation suggestions generated from automated analysis.
alignment of multiple genomic sequences
CRITICA (Coding Region Identification Tool Invoking Comparative Analysis)

CRITICA combines traditional approaches to the problem with a novel comparative analysis. If, in a nucleotide alignment, a pair of ORFs can be found whose conceptually translated products are more conserved than would be expected from the amount of conservation at the nucleotide level, this is evolutionary evidence that the DNA sequences are protein-coding. Regions found by this method are used to generate traditional dicodon frequencies for further analysis, which are then used to predict probable protein-coding regions.

SGP2 predicts genes by comparing anonymous genomic sequences from two different species. It combines tblastx, a sequence similarity search program, with geneid, an ab initio gene prediction program.

Eukaryotes (Homo sapiens, Plasmodium falciparum, Plasmodium vivax)
Phat is an HMM-based gene finder, originally developed for gene finding in Plasmodium falciparum.

EuGène exploits probabilistic models such as Markov models to discriminate coding from non-coding sequences, or to distinguish real splice sites from false ones (using various mathematical models).

Eukaryotic genomic sequences
It allows protein homology information to be used in the prediction.
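GlimmerHMM (above) scores coding and noncoding DNA with Interpolated Markov Models, and EuGène likewise uses Markov models to separate coding from non-coding sequence. An interpolated model blends Markov chains of several orders; the sketch below simplifies this to a single fixed-order chain (order 2 by default, an assumption) trained on example sequences, and scores a candidate region by its log-likelihood ratio. The function names and training sequences are hypothetical, and real coding models are usually also specific to the codon position.

```python
import math
from collections import defaultdict


def train_markov(seqs, order=2, pseudocount=1.0):
    """Estimate P(next base | preceding `order` bases) from training sequences,
    adding a pseudocount so unseen transitions keep a small probability."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        for i in range(order, len(seq)):
            counts[seq[i - order:i]][seq[i]] += 1
    model = {}
    for context, following in counts.items():
        total = sum(following.values()) + 4 * pseudocount
        model[context] = {b: (following[b] + pseudocount) / total for b in "ACGT"}
    return model


def log_prob(seq, model, order=2):
    """Log-likelihood of seq under the chain; unseen contexts fall back to 1/4 per base."""
    uniform = {b: 0.25 for b in "ACGT"}
    return sum(
        math.log(model.get(seq[i - order:i], uniform)[seq[i]])
        for i in range(order, len(seq))
    )


def coding_score(seq, coding_model, noncoding_model, order=2):
    """Log-likelihood ratio: positive means seq looks more like the coding training set."""
    return log_prob(seq, coding_model, order) - log_prob(seq, noncoding_model, order)
```

Trained on `["GCGCGGCCGCGC"]` as the hypothetical coding set and `["ATATTAATATAT"]` as the non-coding set, `coding_score("GCGCGC", ...)` is positive (about 2.77, i.e. 4 ln 2) and `coding_score("ATATAT", ...)` is negative. An interpolated model would replace the uniform fallback with a weighted mix of lower-order chains.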
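CRITICA's comparative evidence rests on the gap between DNA-level and protein-level conservation: synonymous codon changes lower nucleotide identity but leave the translated sequence unchanged. The sketch below only computes the ingredients of that comparison for two gap-free, in-frame aligned ORFs (nucleotide identity, amino acid identity under the standard genetic code, and the number of synonymous codon differences). It does not reproduce CRITICA's statistical test; the function names are hypothetical and the input is assumed to be uppercase ACGT.

```python
BASES = "TCAG"
# Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ..., GGG ('*' = stop).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(orf):
    """Translate an in-frame ORF codon by codon."""
    return "".join(CODON_TABLE[orf[i:i + 3]] for i in range(0, len(orf) - 2, 3))


def compare_orfs(orf1, orf2):
    """Return (nucleotide identity, amino acid identity, synonymous codon differences)
    for two gap-free, in-frame aligned ORFs of equal length."""
    if len(orf1) != len(orf2) or len(orf1) % 3 != 0:
        raise ValueError("ORFs must be aligned, of equal length, and a multiple of 3")
    nt_identity = sum(a == b for a, b in zip(orf1, orf2)) / len(orf1)
    prot1, prot2 = translate(orf1), translate(orf2)
    aa_identity = sum(a == b for a, b in zip(prot1, prot2)) / len(prot1)
    synonymous = 0
    for i in range(0, len(orf1), 3):
        c1, c2 = orf1[i:i + 3], orf2[i:i + 3]
        # codons differ in DNA but encode the same amino acid
        if c1 != c2 and CODON_TABLE[c1] == CODON_TABLE[c2]:
            synonymous += 1
    return nt_identity, aa_identity, synonymous
```

For `"ATGGCTGAAAAACTG"` and `"ATGGCCGAGAAGCTA"`, the nucleotide identity is 11/15 (about 0.73), but both ORFs translate to `MAEKL`, so the amino acid identity is 1.0 and all 4 differing codons are synonymous. That excess of protein-level over DNA-level conservation is the kind of signal described for CRITICA above.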

A database of human genes, their products and their involvement in diseases. It offers concise information about the functions of all human genes that have an approved symbol, as well as selected others. It is especially useful for researchers working in functional genomics and proteomics. The data are collected using knowledge discovery and data mining techniques and are accessed through a proprietary Guidance System that makes more or less intelligent suggestions to the user about where and how the information may be retrieved.
TRANSFAC is a transcription factor database. It compiles data about gene regulatory DNA sequences and protein factors binding to them. On this basis, programs are developed that help to identify putative promoter or enhancer structures and to suggest their features.
EpoDB is a database of genes related to vertebrate red blood cells. A detailed description of EpoDB can be found in Chapter 5. The database includes DNA sequences, structural features and potential transcription factor binding sites.
A database of plant promoters.
A database of plant promoters.
RegulonDB provides curated information on gene organization and regulation in E. coli. Current information is provided at the gene, operon and regulon levels. Future expansion will include information on regulation beyond transcription initiation.
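Transcription factor binding data such as those compiled in TRANSFAC are commonly summarised as position weight matrices, and scanning a sequence with a log-odds matrix is a standard way to flag putative binding sites. A minimal sketch, assuming a uniform 0.25 background, a pseudocount of 0.5 and a hypothetical four-position count matrix (not taken from TRANSFAC); the function names are also hypothetical, and real scanning tools additionally search the reverse strand and calibrate their thresholds.

```python
import math


def log_odds_pwm(counts, pseudocount=0.5, background=0.25):
    """Convert per-position base counts into log2-odds scores against a uniform background."""
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            b: math.log2((column.get(b, 0) + pseudocount) / total / background)
            for b in "ACGT"
        })
    return pwm


def scan(seq, pwm, threshold):
    """Return (position, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(seq) - width + 1):
        score = sum(pwm[j][seq[i + j]] for j in range(width))
        if score >= threshold:
            hits.append((i, round(score, 3)))
    return hits


# Hypothetical 4-position count matrix (not from TRANSFAC) that favours "TATA".
COUNTS = [
    {"T": 9, "A": 1},
    {"A": 10},
    {"T": 8, "A": 2},
    {"A": 9, "G": 1},
]
```

With this matrix, `scan("GGGTATAGGG", log_odds_pwm(COUNTS), threshold=4.0)` returns `[(3, 6.636)]`, the single TATA window. The threshold trades sensitivity against false positives, which matters because short matrices match by chance frequently in long genomic sequences.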

Ashok Kumar

Chemistry Blogger

Hi, I am Dr. T. Ashok Kumar, A Chemistry Research Scholar and blogger. Here you can find some interesting articles related to Chemistry with added spices and ingredients.

