Martin G. Reese and Frank H. Eeckman
Lawrence Berkeley Laboratory
Genome Informatics Group
1 Cyclotron Road
Berkeley, CA 94720
{martinr,eeckman}@genome.lbl.gov

David Kulp and David Haussler
Baskin Center for Computer Engineering and Information Sciences
University of California
Santa Cruz, CA 95064
{dkulp,haussler}@cse.ucsc.edu

One of the hardest problems in gene finding is determining the complete gene structure correctly. Splice site sensors are the key signal sensors for this problem. We replaced the existing splice site sensors in Genie with two novel neural networks based on dinucleotide frequencies. With these sensors, Genie shows significant improvements in the sensitivity and specificity of gene structure identification. In tests on a standard set of annotated genes, Genie correctly identified 82% of coding nucleotides (sensitivity) with a specificity of 81%, versus 74% and 81%, respectively, for the older system. In further splice site experiments, we also examined correlations between splice site scores and intron and exon lengths, as well as the effect of distance to the nearest splice site on false positive rates.
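
To make the two accuracy figures concrete, the following minimal sketch computes nucleotide-level sensitivity and specificity from per-base coding labels. It assumes the convention common in gene-finding evaluations, where sensitivity is TP / (TP + FN) and specificity is TP / (TP + FP); the text above does not spell out its definitions, so this is an assumption. The function name `nucleotide_accuracy` and the toy labels are hypothetical and are not part of Genie.

```python
def nucleotide_accuracy(true_coding, predicted_coding):
    """Return (sensitivity, specificity) at the nucleotide level.

    Both arguments are equal-length sequences of 0/1 flags, one per base,
    where 1 marks a coding nucleotide. Assumed definitions (gene-finding
    convention, not taken from the text above):
        sensitivity = TP / (TP + FN)
        specificity = TP / (TP + FP)
    """
    if len(true_coding) != len(predicted_coding):
        raise ValueError("label sequences must have the same length")

    pairs = list(zip(true_coding, predicted_coding))
    tp = sum(1 for t, p in pairs if t and p)      # coding, predicted coding
    fn = sum(1 for t, p in pairs if t and not p)  # coding, missed
    fp = sum(1 for t, p in pairs if not t and p)  # non-coding, predicted coding

    # Guard against empty denominators (no coding bases or no predictions).
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, specificity


# Toy example: 4 true coding bases, 3 recovered, 1 false prediction.
annotated = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
sn, sp = nucleotide_accuracy(annotated, predicted)
```

Under this convention, specificity is the fraction of predicted coding nucleotides that are truly coding, which differs from the true-negative rate used elsewhere in statistics.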