This directory contains real and fake splice sites and a window of
+/- 40bp around the actual splice sites, D (Donor) and A (Acceptor).

The cleaned data set of 269 genes is divided into a test and a
training data set (PART0). The data are from
ftp://www-hgc.lbl.gov/pub/genesets/data_1995/multi_exon_GB.cleaned269.sets.
This data set was created to compare different splice site models.

The Donor data sets have 7bp of the exon and 8bp of the following
intron (starting with GT).

The Acceptor data sets have 70bp in the intron (ending with AG) and
20bp of the following exon.

===================
Martin Reese, 24jun97
mgreese@lbl.gov
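
The window layout described above (Donor: 7bp exon + 8bp intron starting
with GT; Acceptor: 70bp intron ending with AG + 20bp exon) can be sketched
in Python as follows. This is only an illustration: the function names are
hypothetical, and it assumes each window is available as a plain string of
bases, since the on-disk file format is not described here. The checks test
the boundary dinucleotide only; they say nothing about whether a site is
real or fake.

```python
# Window lengths as stated in this README.
DONOR_EXON_LEN = 7
DONOR_INTRON_LEN = 8
ACCEPTOR_INTRON_LEN = 70
ACCEPTOR_EXON_LEN = 20


def _split(window, left_len, right_len, label):
    """Normalize a window string and cut it at the exon/intron boundary."""
    seq = window.strip().upper()
    expected = left_len + right_len
    if len(seq) != expected:
        raise ValueError(
            f"{label} window must be {expected}bp, got {len(seq)}bp"
        )
    return seq[:left_len], seq[left_len:]


def split_donor(window):
    """Return (exon, intron) for a 15bp donor window."""
    return _split(window, DONOR_EXON_LEN, DONOR_INTRON_LEN, "donor")


def split_acceptor(window):
    """Return (intron, exon) for a 90bp acceptor window."""
    return _split(window, ACCEPTOR_INTRON_LEN, ACCEPTOR_EXON_LEN, "acceptor")


def donor_starts_with_gt(window):
    """True if the intron part of a donor window starts with GT."""
    _, intron = split_donor(window)
    return intron.startswith("GT")


def acceptor_ends_with_ag(window):
    """True if the intron part of an acceptor window ends with AG."""
    intron, _ = split_acceptor(window)
    return intron.endswith("AG")
```

For example, split_donor("CAGAAAGGTAAGTAT") separates the 7bp exon part
"CAGAAAG" from the 8bp intron part "GTAAGTAT". A window of the wrong length
raises ValueError rather than being silently truncated.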