Data structure for representation of maximal repeats in strings
Diploma thesis home page
Author: Michal Linhard
Thesis advisor: Tibor Hegedus
Last changed: May 01 2007
Abstract
In this diploma thesis we present data structure for representation of maximal
repeats in strings
- R3 tree, based on well known data structure - suffix tree. It requires O(n)
space and it can be
constructed in O(n) time and space for string of length n over constant-sized
alphabet. We formalize repeat in
string S as triple (p1, p2, l), where p1, p2 are two distinct positions in S
and l is the length of the repeat.
We formulate query for maximal repeats in S in the form of the function findPairs(p1,
k, S)
that returns all pairs (p2, l) such that (p1, p2, l) is maximal repeat with
l >= k. R3 tree allows
computation of findPairs queries in optimal time O(z), where z is the number
of found pairs. We
also describe design and functionality of R3lib
- library written in C, for finding maximal repeats
in arbitrary binary data, that works with proposed structure.
Documents
Name |
Last change |
Link |
Diploma thesis - final version |
may 01 2007 |
pdf |
Diploma thesis TeX sources + figures - final version |
may 01 2007 |
zip |
Diploma thesis - working version |
apr 24 2007 |
pdf |
Diploma thesis - working version |
apr 23 2007 |
pdf |
Diploma thesis - working version |
apr 10 2007 |
pdf |
Diploma thesis - working version |
mar 26 2007 |
pdf |
Diploma thesis - working version |
mar 22 2007 |
pdf |
Fourth presentation for seminar (slovak) |
feb 2 2007 |
pdf, odp |
Third presentation for seminar (slovak) |
nov 29 2006 |
pdf, odp |
Second presentation for seminar (slovak) |
sep 27 2006 |
pdf, odp |
Presentation for diploma thesis seminar (slovak) |
jun 29 2006 |
ppt |
References
- Brenda S. Baker, On finding duplication in strings and
software, technical report, AT&T Bell Laboratories, February, 1993
[source]
[.ps.gz]
- Manber, U., & Myers, G. (1993). Suffix arrays: A
new method for on-line string searches. SIAM Journal on Computing, 22(5),
935--948 [source]
[.ps]
- E. Ukkonen: On-line construction of suffix-trees. Algorithmica 14
(1995), 249-260 [source]
[.pdf]
- M. Crochemore, W. Rytter: Text Algorithms. Oxford University Press,
New York, 1994 [source]
- P. Ko and S. Aluru, "Space-efficient linear time construction of suffix
arrays," Combinatorial Pattern Matching, pp. 200-210, 2003.
[source] [.pdf]
- Juha Kärkkäinen and Peter Sanders: Simple linear work suffix array construction,
in Proc. 30th International Colloquium on
Automata, Languages and Programming (ICALP '03). LNCS 2719, Springer,
2003, pp. 943-955 [source]
[.pdf]
- S. Kurtz and C. Schleiermacher: REPuter: Fast Computation of Maximal
Repeats in Complete Genomes, Bioinformatics, 15(5):426-427, 1999. [source][.pdf]
- S. Kurtz, and E. Ohlebusch: Replacing Suffix Trees with Enhanced Suffix
Arrays, M.I. Abouelhoda, Journal of Discrete Algorithms, 2:53-86, 2004.
[source][.pdf]
- M.I. Abouelhoda, S. Kurtz, and E. Ohlebusch: The Enhanced Suffix Array
and its Applications to Genome Analysis, In Proceedings of the Second
Workshop on Algorithms in Bioinformatics, pages 449-463. Lecture Notes in
Computer Science 2452, Springer-Verlag, 2002. [source][.pdf]
- R. Dementiev, J. Mehnert, and J. Kärkkäinen: Better External Memory
Suffix Array Construction. In Workshop on Algorithm Engineering &
Experiments, Vancouver, 2005. [source][.pdf]
- Jens Mehnert: External Memory Suffix Array Construction, Diploma
Thesis [source][.pdf]
- R. Dementiev, L. Kettner, P. Sanders: Stxxl: Standard Template Library
for XXL Data Sets. Technical Report 2005/18, Fakultät für Informatik,
University of Karlsruhe [source][.pdf]
- Stefan Burkhardt, Juha Kärkkäinen: Fast Lightweight Suffix Array Construction
and Checking, CPM 2003, LNCS 2676, pp. 55–69, 2003 [.pdf]
- Klaus-Bernd Schurmann, Jens Stoye: An Incomplex Algorithm for Fast Suffix
Array Construction, Proceedings of the 7th Workshop on Algorithm Engineering
and Experiments, 2005 [.pdf]
- S. Lee, K. Park, Efficient Implementations of Suffix Array Construction
Algorithms, Proceedings of the 15th Australasian Workshop on Combinatorial
Algorithms, pp. 64-72, 2004 [.ps]
- D. K. Kim, J. S. Sim, H. Park, K. Park, Linear-time construction of
suffix arrays, Proceedings of the 14th Annual Symposium on Combinatorial
Pattern Matching, pp. 186-199, 2003 [.ps]
- Juha Kärkkäinen, Peter Sanders and Stefan Burkhardt, Linear work suffix
array construction, Journal of ACM, 2005. [.pdf]
- Giovanni Manzini, Two space saving tricks for linear time LCP computation,
Technical Report TR-INF-2004-02-03-UNIPMN, [.pdf]
- Moritz G. Maaß. Linear Bidirectional On-line Construction of Affix Trees.
In Proceedings of the 11th Annual Symposium on Combinatorial Pattern Matching
(CPM), volume 1848 of LNCS, pages 320-334. Springer, jun 2000 [source][.ps.gz]
- J. Stoye. Affixbäume. Diploma thesis, 1995. [source][.ps]
- J. Stoye. Affix trees. Diploma thesis, 1995. [source][.ps]
- Shunsuke Inenaga, Hiromasa Hoshino, Ayumi Shinohara, Masayuki Takeda, Setsuo
Arikawa: On-Line Construction of Symmetric Compact Directed Acyclic Word
Graphs, Proc. 8th Symposium on String Processing and Information Retrieval
(SPIRE2001), IEEE-CS Press, pp. 96-110, November 2001 [source][.pdf]
Relevant software
Links
Changelog
May 01 2007 - Final version of diploma thesis added.
April 30 2007 - Pre-Final version of diploma thesis added.
April 25 2007 - Link added.
April 24 2007 - New working version of thesis.
April 23 2007 - New working version of thesis.
April 10 2007 - R3lib project page link added, new working version of diploma
thesis.
March 22 2007 - New version of diploma thesis home page in english. Old version
can be found here : Old version of page.