From: benson@ecology.biomath.mssm.edu
To: "@msvax.mssm.edu":@msvax.mssm.edu:mgreese@lbl.gov
Cc: benson@ecology.biomath.mssm.edu
Subject: Re: Information on submissions
Date: Fri, 23 Jul 1999 15:53:25 -0400 (EDT)

Dear Martin,

Below is the additional information you requested for my submission to the compfly project for ISMB99.

Unfortunately, I will not be able to attend the tutorial. I will be there on the 9th, 10th and 11th and will give a talk on the afternoon of the 10th. I hope we will have time to talk about the results.

Best regards,
Gary Benson

---------------------------------------------------------------------

Tandem Repeats Finder Program -- Compfly Annotation for ISMB99

1. Gary Benson
   Department of Biomathematical Sciences
   Box 1023
   The Mount Sinai School of Medicine
   One Gustave L. Levy Place
   New York, NY 10029-6574
   (212) 241-5777 work
   (212) 860-4630 fax
   email: benson@ecology.biomath.mssm.edu
   url: www.mssm.edu/biomath/benson.html

2. The annotation was obtained by running Tandem Repeats Finder v2.02 on the entire 2.9 Mb file using default parameters. The run time was 40 seconds. The output from Tandem Repeats Finder was converted to GFF format with a Perl program, with no manual intervention.

3. Tandem Repeats Finder locates approximate tandem repeats in nucleotide sequences. A tandem repeat is two or more contiguous, approximate copies of a pattern of nucleotides. The program identifies tandem repeats without the user having to specify either the pattern or the pattern size. Repeats with pattern sizes from 1 to 500 bases are detected.

The program uses a theoretical model of tandem repeats that specifies how much similarity must exist between adjacent copies. For example, the user could specify that the copies be, on average, at least 80% similar. Additionally, an average rate of indels between corresponding positions in adjacent copies is specified, for example 10%.
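Under this model, comparing corresponding positions of two adjacent copies can be treated as a series of coin tosses in which heads (a match) comes up with probability equal to the average similarity, for example 0.8. As an illustration only, the Python sketch below computes one distribution of this kind: the probability that n such tosses contain a run of at least k heads, i.e. the chance that two aligned copies share at least one exact k-base match. The function name `prob_run_at_least` is hypothetical, indels are ignored, and this is not the program's actual set of detection criteria, which are given in the reference below (item 4).

```python
def prob_run_at_least(n: int, k: int, p: float) -> float:
    """Return P(at least one run of >= k consecutive heads in n Bernoulli(p) tosses).

    Here a "head" stands for a matching position between two adjacent copies,
    and p is the average similarity (e.g. 0.8 for 80%). Indels are ignored.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if k <= 0:
        return 1.0  # an empty run is always present
    if k > n:
        return 0.0  # not enough tosses for a run of length k

    # state[j] = probability that no run of length k has occurred yet and the
    # current trailing run of heads has length j (0 <= j < k).
    state = [1.0] + [0.0] * (k - 1)
    for _ in range(n):
        new_state = [0.0] * k
        # Tails: whatever the current run length, it resets to 0.
        new_state[0] = sum(state) * (1.0 - p)
        # Heads: extend the current run. A run reaching length k is absorbed
        # (counted as a success), so it is dropped from the state vector.
        for j in range(k - 1):
            new_state[j + 1] = state[j] * p
        state = new_state
    return 1.0 - sum(state)


if __name__ == "__main__":
    # Chance that two adjacent copies of a 40-bp pattern at 80% similarity
    # share at least one exact 5-base match (indels ignored).
    print(f"{prob_run_at_least(40, 5, 0.8):.4f}")
```

Raising p or lowering k increases this probability, which is the trade-off a detector faces when choosing how long a word match to require.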
Based on the statistical model, we developed a number of statistical criteria for detection, derived from the distributions of head runs in Bernoulli trials (coin tosses) in which the probability of heads equals the average similarity. The program uses k-tuple matches (short word matches) to locate candidate tandem repeats. The required number of matches and their overall separation are based on these statistical distributions. Candidates are then validated by alignment using wraparound dynamic programming.

Output from the program consists of at least two HTML files, which can be viewed in a web browser. The first is a summary table of the repeats detected, giving their location, pattern size, number of copies and nucleotide content; this is the information that was converted to GFF format. The second file contains the alignment of each tandem repeat against its consensus sequence. The summary table and the alignment file are linked, so clicking on an entry in the summary table brings up the corresponding alignment. The output files from this annotation project can be viewed on our web server at:
http://c3.biomath.mssm.edu/trf/Adh.fa.2.7.7.80.10.50.500.1.html

4. Reference: G. Benson, "Tandem repeats finder: a program to analyze DNA sequences," Nucleic Acids Research (1999), Vol. 27, No. 2, pp. 573-580.

5. Website: http://c3.biomath.mssm.edu/trf.html
Tandem Repeats Finder is available for use over the Internet, and Unix and PC versions can be downloaded for local use.
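The candidate-finding step described in item 3 can be sketched roughly as follows. This Python example is a simplified, hypothetical illustration, not the program's code: it indexes every k-tuple, records each exact match between a k-tuple and an earlier occurrence at distance d (the presumed pattern size), and flags distance d when enough such matches fall within a recent window. The function `find_candidates` and its parameters `k`, `min_window` and `min_fraction` are assumptions made for this sketch; the real program derives its thresholds from the head-run distributions, and the wraparound dynamic programming that validates candidates is omitted here.

```python
from collections import defaultdict, deque
from math import ceil


def find_candidates(seq, k=4, max_period=500, min_window=20, min_fraction=0.25):
    """Return a list of (start, period) candidate tandem repeats (0-based start).

    Hypothetical simplification: k, min_window and min_fraction are fixed
    illustrative thresholds, not values derived from head-run statistics.
    """
    seq = seq.upper()
    last_seen = defaultdict(deque)  # k-tuple -> earlier positions within max_period
    matches = defaultdict(deque)    # distance d -> recent positions matching at distance d
    last_hit = {}                   # distance d -> last position where the test passed
    candidates = []

    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        seen = last_seen[word]
        # Forget occurrences too far back to be the previous copy of a pattern.
        while seen and i - seen[0] > max_period:
            seen.popleft()

        for j in seen:
            d = i - j  # candidate pattern size
            window = max(d, min_window)
            q = matches[d]
            q.append(i)
            # Keep only matches at distance d inside the window (i - window, i].
            while q[0] <= i - window:
                q.popleft()
            if len(q) >= ceil(min_fraction * window):
                # Report a new candidate unless this continues a recent one.
                if d not in last_hit or i - last_hit[d] > window:
                    candidates.append((q[0] - d, d))
                last_hit[d] = i

        seen.append(i)
    return candidates


if __name__ == "__main__":
    example = "TTGACCATGCG" + "ACGGTCA" * 6 + "GCATTAGCCT"
    for start, period in find_candidates(example):
        print(start, period)
```

For the example sequence, a period-7 candidate starting at position 11 is reported, together with candidates at some multiples of 7, because copies two or more periods apart also match. Resolving such multiples, and confirming each candidate by alignment, is left to later steps.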