Berkeley Drosophila Genome Project (BDGP)
Genie: Help

Genie: A Gene Finder Based on Generalized Hidden Markov Models

In a collaboration between the Computational Biology Group at the University of California, Santa Cruz, headed by David Haussler, and the Genome Informatics Group at Lawrence Berkeley National Laboratory (LBNL), headed by Frank Eeckman, we have developed a new gene-finding program called Genie. The program and data sets were created by David Kulp and Martin Reese.

Genie uses a statistical model of genes in DNA. A Generalized Hidden Markov Model (GHMM) provides the framework for describing the grammar of a legal parse of a DNA sequence. Probabilities are assigned to transitions between states in the GHMM and to the generation of each nucleotide base given a particular state. Machine learning techniques are applied to optimize these probabilities using a standardized gene data set, which we provide for the community to test gene-finding tools.
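
To make the idea of transition and emission probabilities concrete, here is a minimal sketch of Viterbi decoding for an ordinary two-state hidden Markov model. It is not Genie's model: the state names, the probability values, and the function name viterbi are all invented for illustration, and each state here emits exactly one base per step, whereas a generalized HMM allows a state to emit a whole variable-length segment (for example, an entire exon).

```python
import math

# Hypothetical two-state model: "N" (non-coding) and "C" (coding).
# Every probability below is made up for illustration; none comes from Genie.
STATES = ("N", "C")
START = {"N": 0.9, "C": 0.1}
TRANS = {
    "N": {"N": 0.95, "C": 0.05},
    "C": {"N": 0.10, "C": 0.90},
}
EMIT = {
    "N": {"A": 0.30, "C": 0.20, "G": 0.20, "T": 0.30},
    "C": {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
}


def viterbi(seq):
    """Return the most probable state path (one label per base) for seq."""
    if not seq:
        return ""
    # score[s] is the log-probability of the best path that ends in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            # Best predecessor state for arriving in s at this position.
            prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[prev] + math.log(TRANS[prev][s])
                            + math.log(EMIT[s][base]))
            pointers[s] = prev
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state to recover the full path.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return "".join(reversed(path))


if __name__ == "__main__":
    print(viterbi("ATATATGCGCGCGCGCGCGCGCATAT"))
```

Working in log space avoids numerical underflow on long sequences. In a trained model the three tables would be estimated from annotated genes rather than written by hand.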

Performance of Genie

Genie's performance was tested on a second data set, provided by Burset and Guigo (1996). This data set of 570 genes from different organisms was used in that paper to compare gene-finding methods. In the tables below, Genie's results are added to the results reproduced from the Burset and Guigo paper. The first table shows the results for gene finders that do not use any information about existing protein homologs in the databases. The second table shows Genie's performance when information about existing protein homologs is used for prediction.

Genie without homology information from the protein database:

             Base-level        Exon-level
Method       Sn    Sp    AC    Sn    Sp    (Sn+Sp)/2  ME    WE
Genie        0.78  0.84  0.77  0.61  0.64  0.62       0.15  0.16
FGENEH       0.77  0.85  0.78  0.61  0.61  0.61       0.15  0.11
GeneID       0.63  0.81  0.67  0.44  0.45  0.45       0.28  0.24
GeneParser2  0.66  0.79  0.66  0.35  0.39  0.37       0.29  0.17
GenLang      0.72  0.75  0.69  0.50  0.49  0.50       0.21  0.21
GRAILII      0.72  0.84  0.75  0.36  0.41  0.38       0.25  0.10
SORFIND      0.71  0.85  0.73  0.42  0.47  0.45       0.24  0.14
Xpound       0.61  0.82  0.68  0.15  0.17  0.16       0.32  0.13

Table Captions:
Base-level: Prediction accuracy per base coding/non-coding
Exon-level: Prediction accuracy with respect to exact prediction of exon start and end points
Sn: Sensitivity
Sp: Specificity
AC: Approximate Correlation
ME: Missing Exons: fraction of true exons that are not identified at all
WE: Wrong Exons: fraction of predicted exons that do not overlap any true exon
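
The exon-level columns can be illustrated with a short sketch. It assumes exons are given as (start, end) pairs with inclusive coordinates, that a predicted exon counts as correct only when both ends match a true exon exactly, and that "not identified at all" means no overlap with any predicted exon. The function names overlaps and exon_level_metrics are hypothetical, and the original evaluation may aggregate over sequences differently.

```python
def overlaps(a, b):
    """True if the closed intervals a=(start, end) and b=(start, end) share a base."""
    return a[0] <= b[1] and b[0] <= a[1]


def exon_level_metrics(true_exons, predicted_exons):
    """Compute exon-level Sn, Sp, their average, ME and WE for one annotation."""
    true_set, pred_set = set(true_exons), set(predicted_exons)
    if not true_set or not pred_set:
        raise ValueError("need at least one true and one predicted exon")
    # An exon is exactly right only if both start and end coincide.
    exact = len(true_set & pred_set)
    sn = exact / len(true_set)
    sp = exact / len(pred_set)
    # Missing exons: true exons that no predicted exon overlaps.
    missing = sum(1 for t in true_set
                  if not any(overlaps(t, p) for p in pred_set))
    # Wrong exons: predicted exons that overlap no true exon.
    wrong = sum(1 for p in pred_set
                if not any(overlaps(p, t) for t in true_set))
    return {
        "Sn": sn,
        "Sp": sp,
        "(Sn+Sp)/2": (sn + sp) / 2,
        "ME": missing / len(true_set),
        "WE": wrong / len(pred_set),
    }


if __name__ == "__main__":
    truth = [(10, 20), (50, 80), (100, 120), (200, 250)]
    predicted = [(10, 20), (55, 80), (300, 320)]
    print(exon_level_metrics(truth, predicted))
```

In this made-up example the prediction (55, 80) overlaps a true exon but misses its start, so it counts neither as correct nor as wrong, and the true exon (50, 80) is not counted as missing.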

AC is defined as:

AC = 0.5 * (TP/(TP+FN) + TP/(TP+FP) + TN/(TN+FP) + TN/(TN+FN)) - 1

TP: true positives
TN: true negatives
FP: false positives
FN: false negatives
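
As a worked check of the AC formula, the sketch below computes the base-level measures from the four counts. It assumes the gene-finding convention in which Sn = TP/(TP+FN) and Sp = TP/(TP+FP), which are the first two terms of the AC sum; the function name base_level_metrics and the example counts are hypothetical.

```python
def base_level_metrics(tp, tn, fp, fn):
    """Return (Sn, Sp, AC) from per-base counts of coding/non-coding calls.

    Raises ZeroDivisionError if any of the four denominators is zero,
    for example when a sequence contains no coding bases at all.
    """
    sn = tp / (tp + fn)          # fraction of true coding bases predicted coding
    sp = tp / (tp + fp)          # fraction of predicted coding bases that are coding
    tn_rate = tn / (tn + fp)     # fraction of true non-coding bases predicted non-coding
    tn_pred = tn / (tn + fn)     # fraction of predicted non-coding bases that are non-coding
    ac = 0.5 * (sn + sp + tn_rate + tn_pred) - 1
    return sn, sp, ac


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    sn, sp, ac = base_level_metrics(tp=80, tn=900, fp=10, fn=10)
    print(f"Sn={sn:.2f} Sp={sp:.2f} AC={ac:.2f}")
```

With these definitions AC ranges from -1 to 1: a perfect prediction makes all four ratios equal to 1 and gives AC = 1, while a prediction that is wrong at every base gives AC = -1.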

In the first six columns, higher values indicate better performance. In the last two columns, lower values indicate better performance.

Genie with homology information from the protein database:

             Base-level        Exon-level
Method       Sn    Sp    AC    Sn    Sp    (Sn+Sp)/2  ME    WE
Genie        0.95  0.91  0.91  0.77  0.74  0.76       0.04  0.13
GeneID+      0.91  0.90  0.88  0.73  0.70  0.71       0.07  0.13
GeneParser3  0.86  0.91  0.86  0.56  0.58  0.57       0.14  0.09

Genie was presented at the 4th Conference on Intelligent Systems for Molecular Biology (ISMB) in St. Louis, June 1996.

Genie uses a neural network recognizer for splice sites; this splice site predictor program can independently be accessed via our splice site web server.

Read abstract.

Download paper (56 Kb compressed postscript).

Try our Genie Web server.

Please send comments or questions about the web site to bdgp@fruitfly.org