This directory contains real splice sites and faked splice sites. The
faked splice sites were selected from a window of +/- 40bp around the
actual splice sites, D (Donor) and A (Acceptor).

The data set of 202 cleaned Drosophila melanogaster genes is divided
into a test and a training data set (PART0). The data is from:

ftp://www-hgc.lbl.gov/pub/genesets/Drosophila/GENIE_96/multi_exon_GB.sets

This data set was created to compare different splice site models.
Both Donor and Acceptor data sets have 230bp of the exon/intron and
230bp of the following intron/exon.

===================
Martin Reese, 31jul97
mgreese@lbl.gov
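
The windowing described above can be illustrated with a minimal Python sketch. It is not the procedure used to build this data set. It only shows one way to list candidate faked positions within +/- 40bp of a real site and to cut 230bp of sequence on each side of a site. The function names, the 0-based coordinates, the random sampling of faked positions, and the truncation at sequence ends are all assumptions made for illustration.

```python
import random

WINDOW = 40   # +/- 40bp window around the real splice site (from the text)
FLANK = 230   # 230bp kept on each side of a site (from the text)


def candidate_decoys(real_pos, seq_len, window=WINDOW):
    """Return 0-based positions within +/- window of real_pos, excluding real_pos.

    Hypothetical helper: the window is clipped to the sequence bounds.
    """
    lo = max(0, real_pos - window)
    hi = min(seq_len - 1, real_pos + window)
    return [p for p in range(lo, hi + 1) if p != real_pos]


def pick_decoys(real_pos, seq_len, n, seed=0):
    """Pick n distinct faked positions at random (assumed sampling scheme)."""
    rng = random.Random(seed)
    return sorted(rng.sample(candidate_decoys(real_pos, seq_len), n))


def flanks(seq, pos, flank=FLANK):
    """Return (upstream, downstream) sequence around pos, truncated at the ends."""
    return seq[max(0, pos - flank):pos], seq[pos:pos + flank]
```

For example, candidate_decoys(100, 1000) lists the 80 positions from 60 to 140 other than 100, and flanks(seq, 500) returns the 230bp before position 500 and the 230bp starting at it.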