Fast pattern matching in indexed texts
β Scribed by Jean Senellart
- Book ID
- 104326551
- Publisher
- Elsevier Science
- Year
- 2000
- Tongue
- English
- Weight
- 309 KB
- Volume
- 237
- Category
- Article
- ISSN
- 0304-3975
No coin nor oath required. For personal study only.
β¦ Synopsis
We present a simple and e cient algorithm for matching regular expression with texts, using full inverted text. It is based on the max-ow=min-cut algorithm, traditionaly employed to resolve linear problems. Our procedure constructs an optimal set of nodes for any automaton. They constitute the set of starting points for the pattern matching procedure. The algorithm is presented in two forms. The ΓΏrst one, named Index-Text, selects areas on which are applied classical algorithms (that have an O(nf(m)) complexity where n is the size of the text, and m the size of the automaton) for regular expression matching. The second one, named Index-Index, selects a set of entries of the full inverted text. This set contains the points of departure for algorithms applied to full inverted text. In each case, the set is chosen to optimize the ensuing matching procedure. As a result, classical algorithms are at least accelerated by a constant factor. The algorithm presented can be viewed as a ΓΏlter either on the text or on the full inverted text. We show that for very large databases of natural language texts, our (pre-)algorithm combined with the simplest linear regular expression matching algorithm is faster than those that sort texts in a preprocessing step and that have a log(n + |W |) complexity. In the special case of string matching, we give an average-case analysis of the algorithm.
π SIMILAR VOLUMES
We consider the complexity of problems related to two-dimensional texts (2D-texts) described succinctly. In a succinct description, larger rectangular subtexts are defined in terms of smaller parts in a way similar to that