From: Jorja Henikoff
To: mgreese@lbl.gov
Subject: Re: Information on submissions
Date: Wed, 21 Jul 1999 11:12:55 -0700 (PDT)

7/21/99
Fly Annotation Experiment - BLOCKS
-----------------------------------------------------------------------------
1. List of credits:

Jorja & Steven Henikoff                   henikoff@muller.fhcrc.org
Fred Hutchinson Cancer Research Center    FAX: 206-667-5889
1100 Fairview AV N, A1-162, PO Box 19024
Seattle, WA 98109-1024
http://blocks.fhcrc.org
-----------------------------------------------------------------------------
2. Short method:

Searched 2.9mb DNA sequence vs Blocks+ (15June99, http://blocks.fhcrc.org)
and vs blocks extracted from Smart 3.0 multiple alignments
(http://coot-embl-heidelberg.de/SMART/) using BLIMPS, which translates the
DNA sequence in all 6 frames for comparison with blocks. Post-processed
BLIMPS search results with the "blkprob" program to compute expected values,
using combined (for multiple blocks from the same family) cutoff evalue = 10.
Sorted results by location on each strand.
-----------------------------------------------------------------------------
3. Detailed method:

Searched 2.9mb DNA sequence vs Blocks+ (15June99, http://blocks.fhcrc.org)
and vs blocks extracted from Smart 3.0 multiple alignments
(http://coot-embl-heidelberg.de/SMART/) using BLIMPS 3.2.5, which translates
the DNA sequence in all 6 frames for comparison with blocks. Post-processed
BLIMPS search results with the "blkprob" program to compute expected values,
using combined (for multiple blocks from the same family) cutoff evalue = 10.
Sorted results by location on each strand.

Removed the following types of results:
  a. Unsupported alignments to single blocks known to be compositionally
     biased (not supported by other blocks from the family).
  b. Unsupported alignments to single blocks with evalue > 1.

Manually investigated regions where blocks from different families hit in
the same region.
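
The removal rules above could be sketched roughly as follows. This is only an illustration in Python, not the procedure actually used: the Hit record, its field names, and the filter_hits function are hypothetical, each hit is assumed to carry a single E-value, and a hit is assumed to be "supported" when at least one other block from the same family was hit on the same strand.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one block hit (not the blkprob output format)."""
    family: str
    block: str
    strand: str
    start: int
    evalue: float
    biased: bool = False  # block flagged as compositionally biased


def filter_hits(hits, combined_cutoff=10.0, single_cutoff=1.0):
    """Apply the cutoff and removal rules, then sort by strand and location."""
    # Keep only hits at or below the overall E-value cutoff.
    kept = [h for h in hits if h.evalue <= combined_cutoff]

    # Count distinct blocks hit per (family, strand).
    distinct = {(h.family, h.strand, h.block) for h in kept}
    blocks_per_family = Counter((fam, strand) for fam, strand, _ in distinct)

    result = []
    for h in kept:
        # Assumption: "supported" means another block of the family also hit.
        supported = blocks_per_family[(h.family, h.strand)] > 1
        if not supported and (h.biased or h.evalue > single_cutoff):
            continue  # rules a and b: drop unsupported single-block hits
        result.append(h)

    return sorted(result, key=lambda h: (h.strand, h.start))
```

In this sketch a family with two or more distinct blocks on a strand keeps all of its hits, while a lone block survives only if it is not flagged as biased and its E-value is at most 1.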
The GFF file contains one record for each block, so if multiple blocks were
hit for a family there will be multiple records. Documentation for hits can
be found at:
http://blocks.fhcrc.org/blocks-bin/getblock.sh?

Searching each strand took about 24 hours and manual post-processing about
4 hours by one person.
-------------------------------------------------------------------------
4. References

J.G. Henikoff, S. Henikoff & S. Pietrokovski, "New features of the Blocks
Database servers", Nucl. Acids Res. 27:226-228 (1999).

S. Henikoff & J.G. Henikoff, "A Protein Family Classification Method for
Analysis of Large DNA Sequences." Proceedings of the 27th Ann. Hawaii Intl.
Conf. on System Sciences pp. 265-274 (1994).

S. Henikoff & J.G. Henikoff, "Protein family classification based on
searching a database of blocks." Genomics 19:97-107 (1994).
-------------------------------------------------------------------------
5. URLs

http://blocks.fhcrc.org
http://blocks.fhcrc.org/blocks-bin/getblock.sh?