Table of Contents
            The challenge of annotating a complete eukaryotic genome: A case study in Drosophila melanogaster
        Abstract 
        Tutorial goals 
        Tutorial organization 
        What is a gene? 
        What are annotations? 
        How does an annotation differ from a gene? 
        Transcription and translation 
        Schematic gene structure 
        Sequence feature types 
        DNA transcription unit features 
        mRNA features 
        PPT Slide 
        Definitions for data modeling 
        Annotation 
        Annotation process overview 
        Types of sequence data 
        Auxiliary data 
        Computational annotation tools 
        Database resources 
        Biological issues in annotation 
        Engineering issues in annotation 
        Engineering issues in annotation 
        Engineering issues in annotation 
        Engineering issues in annotation 
        Engineering issues in annotation 
        Drosophila melanogaster 
        Drosophila Genome Project 
        Goals of the Drosophila Genome Project 
        Sequencing at the BDGP 
        The BDGP sequence annotation process 
        What sequence to start with? 
        Which analyses need to be run? 
        Which analyses need to be run and how? 
        What public sequence data sets are needed? 
        Which analyses need to be run and how? 
        How do you achieve computational throughput? 
        What do you do with the results? 
        Is human curation needed? 
        Gene Skimmer 
        Gene Skimmer 
        CloneCurator 
        PPT Slide 
        How do we annotate gene/protein function? 
        Ontology browser 
        PPT Slide 
        Ontology browser:  searching for terms 
        How do you distribute the data? 
        Ribbon 
        Ribbon 
        How do you manage the data? 
        How do you maintain annotations? 
        Integrated annotation systems
        Integrated annotation systems: ACeDB
        ACeDB
        Genotator
        Magpie
        GAIA
        TIGR Human Gene Index
        Computational analysis tools 
        Gene finding: Prokaryotes vs. Eukaryotes
        Gene finding: Prokaryotes vs. Eukaryotes
        Integrated gene finding 
        Integrated gene finding: Dynamic programming
        Integrated gene finding: Dynamic programming
        Integrated gene finding: Linear and Quadratic Discriminant Analysis (LDA/QDA)
        Integrated gene finding: Feed-forward neural networks
        Approaches to gene finding: Hidden Markov models 
        Approaches to gene finding: Generalized hidden Markov models 
        Gene finding software 
        Promoter recognition 
        Promoter recognition (cont.) 
        Promoter recognition (cont.) 
        Promoter recognition (cont.) 
        Example: NNPP 
        Promoter recognition (cont.) 
        Splice site prediction 
        Splice site prediction (cont.) 
        Splice site prediction (cont.) 
        Start codon prediction 
        Poly-adenylation signal prediction 
        Prediction of coding potential 
        Prediction of coding potential (cont.) 
        Prediction of coding potential (cont.) 
        Prediction of coding potential (cont.) 
        Prediction of coding potential (cont.) 
        Prediction of coding exons 
        “Integrated” gene models: LDA/QDA
        “Integrated” gene models: NN
        “Integrated” gene models: Artificial intelligence approaches
        “Integrated” gene models: Artificial intelligence approaches
        “Integrated” gene models: HMMs
        “Integrated” gene models: GHMMs
        Example: Genie
        “Integrated” gene models: GHMMs
        EST/cDNA alignment for gene finding: Spliced alignments 
        EST/cDNA alignment  
        EST/cDNA alignment (cont.) 
        Repeat finders 
        Repeat finders (cont.) 
        Homology searching 
        Gene family searching 
        The genome annotation experiment (GASP1) 
        PPT Slide 
        Goals of the experiment 
        Adh contig 
        Adh paper (to appear in Genetics) 
        Raw sequence: Adh.fa 
        Drosophila data sets provided to participants 
        Timetable 
        Resources for assessing predictions 
        Curated data sets for assessing predictions 
        Curated data sets for assessing predictions 
        Curated data sets for assessment 
        Submission format 
        Sample submission 
        Submissions 
        Submissions (cont.) 
        Submissions (cont.) 
        Submissions (cont.) 
        Submissions (cont.) 
        Submissions (cont.) 
        Submissions (cont.) 
        Submissions (cont.) 
        Submissions (cont.) 
        Submissions (cont.) 
        Submissions (cont.) 
        Submissions (cont.) 
        Submissions (cont.) 
        Submissions (cont.) 
        Submissions (cont.) 
        Submissions (cont.) 
        Submissions (cont.) 
        Submissions (cont.) 
        Submission classes 
        Submission classes (cont.) 
        Gene finding techniques 
        Measuring success 
        Definitions and formulae 
        Genes: True positives (TP) 
        Genes: False positives (FP) 
        Genes: False Negatives (FN) 
        Toy example 1 (1) 
        Genes: Missing Genes (MG) 
        Genes: Wrong Genes (WG) 
        Toy example 1 (2) 
        Genes: Std 1 versus Std 3 
        Toy example 1 (3) 
        Genes: Std 1 and Std 3 versus “real” gene structure
        Toy example 1 (4) 
        Toy example 1 (5): Exon level 
        Genes: Joined genes (JG) 
        Genes: Split genes (SG) 
        Definition: “Joined” and “split” genes
        Toy example 2 (1) 
        Annotation experiment results 
        Results: Base level 
        Results: Exon level 
        Results: Gene level
        Results: Gene level
        Results (protein homology): Base level
        Results (protein homology): Exon level
        Results (protein homology): Gene level
        Transcription Start Site (TSS): Standard 1 
        TSS: Standard 3 
        Results: TSS recognition 
        Interesting gene examples: bubblegum 
        Adh/Adhr (Alcohol dehydrogenase/Adh related) 
        Adh/Adhr (cont.)
        osp (outspread) 
        cact (cactus) 
        kuz (kuzbanian) 
        beat (beaten path) 
        Idgf1, Idgf2, Idgf3 (Imaginal Disc Growth Factor)
        Idgf1, Idgf2, Idgf3 (cont.)
        Conclusion of GASP1 
        Conclusion of GASP1 (cont.)
        Discussion of GASP1
        Conclusions on annotating complete eukaryotic genomes 
        Conclusions on annotating complete eukaryotic genomes (cont.) 
        Discussion on annotating complete eukaryotic genomes  
        Acknowledgments 
	 Authors: Martin G. Reese, Nomi L. Harris, George Hartzell, Suzanna E. Lewis
       Email:  [email protected]   
	  Home Page:  http://www.fruitfly.org/GASP1   
	  Other information: Tutorial #3, presented at the ISMB '99 conference in Heidelberg, Germany, on August 6, 1999, including the annotation experiment GASP1
	
	