Date: Oct 1996

ftp://www-hgc.lbl.gov/pub/genesets/OtherDataSets/ASHBURNER_97

This directory contains a set of unique Drosophila melanogaster coding
region transcripts. The data sets are:

    unique-50-genes.80.fa
    unique-50-genes.50.fa

These data sets were made from the ./nuclear_cds.embl file at
ftp://ftp.ebi.ac.uk/pub/databases/edgp/sequence_sets/ to build a unique
coding region transcript data set for, e.g., training coding region
predictors. The original nuclear_cds.embl data set was compiled by Takis
Benos (EBI) and Michael Ashburner (EBI & Cambridge) with help from Aubrey
de Grey (Cambridge). Additional Drosophila melanogaster data sets can be
obtained at the above FTP site.

BLASTN was used in an all-against-all comparison of the nucleic acid
sequences. From any clump of sequences with an overall identity above 80%
(or 50%), only one member was chosen (at random) for inclusion.

=====
Martin Reese (LBNL)
mgreese@lbl.gov
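
The redundancy filter described above (all-against-all comparison, clumping, one random member kept per clump) can be sketched as follows. This is only an illustration under stated assumptions, not the script used to build these files: it assumes the BLASTN results have already been reduced to (id_a, id_b, percent_identity) triples, it assumes clumps are formed by single linkage (any pair above the threshold joins two clumps), and the function name pick_representatives and the seed parameter are hypothetical.

```python
import random


def pick_representatives(seq_ids, hits, threshold, seed=0):
    """Keep one randomly chosen sequence per clump of similar sequences.

    seq_ids   -- list of all sequence identifiers
    hits      -- iterable of (id_a, id_b, percent_identity) triples
                 (assumed to be pre-parsed from all-against-all BLASTN output)
    threshold -- identity cutoff in percent, e.g. 80 or 50
    seed      -- hypothetical parameter, used only to make the sketch repeatable
    """
    # Union-find forest: every sequence starts in its own clump.
    parent = {s: s for s in seq_ids}

    def find(x):
        # Follow parent links to the clump root, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Assumption: single linkage, so one pair above the cutoff merges two clumps.
    for a, b, identity in hits:
        if a != b and identity > threshold:
            parent[find(a)] = find(b)

    # Group sequence ids by clump root.
    clumps = {}
    for s in seq_ids:
        clumps.setdefault(find(s), []).append(s)

    # Choose one member of each clump at random.
    rng = random.Random(seed)
    return sorted(rng.choice(members) for members in clumps.values())


if __name__ == "__main__":
    ids = ["g1", "g2", "g3", "g4"]
    pairs = [("g1", "g2", 92.0), ("g2", "g3", 85.0), ("g3", "g4", 60.0)]
    print(pick_representatives(ids, pairs, threshold=80))
    print(pick_representatives(ids, pairs, threshold=50))
```

With the toy input above, the 80% cutoff leaves two clumps (g1, g2, g3 together and g4 alone), while the 50% cutoff merges all four sequences into a single clump, so the lower cutoff yields the smaller, more stringent set.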