E. Frise, M.G. Reese and G.M. Rubin.
Department of Molecular and Cell Biology,
Life Science Addition #3200,
University of California,
Berkeley, CA 94720-3200.
ORFFinder uses a 5th-order Markov model to predict the most likely open reading frame (ORF). We trained the Markov model on a collection of 809 Drosophila melanogaster cDNA sequences from GenBank. The program locates putative frameshifts by comparing in-frame and out-of-frame Markov model scores before and after every sequence position. Detected frameshifts are corrected automatically, and the resulting reading frame is evaluated against a knowledge base. The frameshift positions are reported for further detailed analysis, and the corrected ORF is translated, with the amino acid sequence shown in the program output.
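The abstract does not say exactly how the Markov scores are computed or compared, so the following is only a minimal sketch of the idea under our own assumptions: a frame-periodic 5th-order Markov model for coding sequence (one table per codon position), a homogeneous 5th-order background model, add-one smoothing, a fixed 60-nt window on each side of every position, and a gain threshold. The function names, the synthetic codon set, and all parameter values are hypothetical, and the knowledge-base evaluation step is not modelled.

```python
import math
import random
from collections import defaultdict

ORDER = 5  # 5th-order: each base is conditioned on the preceding 5 bases


def train(seqs, periodic):
    """Count (context, next base) pairs. A periodic (coding) model keeps one
    table per codon position and assumes every training sequence starts at
    codon position 0; a non-periodic (background) model uses a single table."""
    tables = [defaultdict(lambda: defaultdict(int)) for _ in range(3 if periodic else 1)]
    for s in seqs:
        for i in range(ORDER, len(s)):
            tables[i % 3 if periodic else 0][s[i - ORDER:i]][s[i]] += 1
    return tables


def log_prob(tables, table, ctx, base):
    """log P(base | ctx) with add-one smoothing (an assumption of this sketch)."""
    counts = tables[table].get(ctx, {})
    return math.log((counts.get(base, 0) + 1) / (sum(counts.values()) + 4))


def frame_scores(seq, coding, background):
    """Per-position log-likelihood ratio (coding vs background) for frames
    0, 1, 2, where frame f means codons start at positions congruent to f mod 3."""
    r = [[0.0] * len(seq) for _ in range(3)]
    for i in range(ORDER, len(seq)):
        ctx, base = seq[i - ORDER:i], seq[i]
        bg = log_prob(background, 0, ctx, base)
        for f in range(3):
            r[f][i] = log_prob(coding, (i - f) % 3, ctx, base) - bg
    return r


def find_frameshifts(seq, coding, background, window=60, threshold=10.0):
    """Flag positions p where the best frame in [p - window, p) differs from
    the best frame in [p, p + window). Returns (position, frame_before,
    frame_after) at the peak of each run of flagged positions."""
    r = frame_scores(seq, coding, background)
    pre = [[0.0] for _ in range(3)]  # prefix sums give O(1) window scores
    for f in range(3):
        for v in r[f]:
            pre[f].append(pre[f][-1] + v)
    hits, run = [], []
    for p in range(window, len(seq) - window + 1):
        before = [pre[f][p] - pre[f][p - window] for f in range(3)]
        after = [pre[f][p + window] - pre[f][p] for f in range(3)]
        fb = max(range(3), key=before.__getitem__)
        fa = max(range(3), key=after.__getitem__)
        # Gain of switching frames at p over the best single frame (0 if fb == fa).
        gain = before[fb] + after[fa] - max(before[f] + after[f] for f in range(3))
        if gain > threshold:
            run.append((gain, p, fb, fa))
        elif run:
            hits.append(max(run))
            run = []
    if run:
        hits.append(max(run))
    return [(p, fb, fa) for _, p, fb, fa in hits]


# Toy demo on synthetic data (not Drosophila cDNAs): "coding" sequence is
# built from a small, made-up codon set; background is uniform random DNA.
CODONS = ["GCC", "GAG", "GGC", "GAC", "GCT"]
rng = random.Random(0)
coding = train(["".join(rng.choice(CODONS) for _ in range(100)) for _ in range(50)], True)
background = train(["".join(rng.choice("ACGT") for _ in range(40000))], False)
a = "".join(rng.choice(CODONS) for _ in range(60))
b = "".join(rng.choice(CODONS) for _ in range(60))
query = a + b[1:]  # one base deleted at position 180: frame 0 becomes frame 2
hits = find_frameshifts(query, coding, background)
print(hits)
```

On this toy input the sketch should report a single frameshift close to position 180, switching from frame 0 to frame 2. Comparing the two windows in isolation is a simplification; a real implementation would also need to decide between insertions and deletions and handle several nearby frameshifts.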
To recognize the correct start codon, ORFFinder combines a neural network with the in-frame coding probability. A feed-forward neural network is trained on the nucleotide sequences surrounding the annotated start codons in the cDNA dataset described above (similar to Brunak et al. (1991), JMB 220: 49-65). The program combines the neural network predictions with the difference between the 5th-order Markov model coding scores before and after each candidate start codon to derive the most likely translation initiation site.
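A minimal sketch of how the two signals could be combined for each candidate ATG. Several pieces are our assumptions rather than details from the abstract: the network weights are hand-set stand-ins (not weights trained on the cDNA set), the window size (`UP`, `DOWN`), the scoring span, and the linear combination `alpha * logit(nn) + beta * delta` are hypothetical, and the per-frame coding scores in the demo are toy values standing in for 5th-order Markov log-likelihood ratios.

```python
import math

UP, DOWN = 6, 6  # hypothetical window: 6 nt upstream of ATG, ATG, 6 nt downstream
WIN = UP + 3 + DOWN


def one_hot(window):
    """Encode a nucleotide window as 4 inputs per position (A, C, G, T)."""
    return [1.0 if base == b else 0.0 for base in window for b in "ACGT"]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def nn_score(window, hidden, out):
    """Forward pass of a one-hidden-layer feed-forward network.
    hidden: list of (weights, bias) per hidden unit; out: (weights, bias)."""
    x = one_hot(window)
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x)) + b) for ws, b in hidden]
    ws, b = out
    return sigmoid(sum(w * hi for w, hi in zip(ws, h)) + b)


def pick_start(seq, frame_llr, hidden, out, span=30, alpha=1.0, beta=0.1):
    """Return the ATG position with the highest combined score
    alpha * logit(nn) + beta * (coding score after ATG - coding score before).
    frame_llr[f][i] is a coding log-likelihood ratio for position i in frame f,
    where frame f means codons start at positions congruent to f mod 3."""
    best = None
    for pos in range(UP, len(seq) - 3 - DOWN + 1):
        if seq[pos:pos + 3] != "ATG":
            continue
        p = nn_score(seq[pos - UP:pos + 3 + DOWN], hidden, out)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logit finite
        llr = frame_llr[pos % 3]  # the frame this ATG would start
        delta = sum(llr[pos:pos + span]) - sum(llr[max(0, pos - span):pos])
        score = alpha * math.log(p / (1.0 - p)) + beta * delta
        if best is None or score > best[0]:
            best = (score, pos)
    return None if best is None else best[1]


# Hypothetical stand-in weights (NOT trained): one hidden unit that fires on an
# A at position -3 relative to the ATG, a Kozak-like feature.
w = [0.0] * (4 * WIN)
w[4 * (UP - 3) + 0] = 4.0  # input for "A" at window index UP - 3
hidden = [(w, -2.0)]
out = ([4.0], -2.0)

seq = "C" * 20 + "ATG" + "C" * 20 + "CAAAATG" + "GCT" * 20  # ATGs at 20 and 47
# Toy coding scores: frame 2 looks coding from position 47 onward.
frame_llr = [[0.0] * len(seq) for _ in range(3)]
for i in range(47, len(seq)):
    frame_llr[2][i] = 1.0
print(pick_start(seq, frame_llr, hidden, out))  # -> 47
```

Here both signals favor the ATG at position 47: the stand-in network rewards its upstream A at -3, and the coding score rises sharply after it. In practice the relative weights `alpha` and `beta` would have to be fitted on annotated start codons.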
We will present data on the performance of ORFFinder on known and newly sequenced cDNAs.