𝔖 Bobbio Scriptorium
✦   LIBER   ✦

Fast pattern matching in indexed texts

✍ Scribed by Jean Senellart


Book ID
104326551
Publisher
Elsevier Science
Year
2000
Tongue
English
Weight
309 KB
Volume
237
Category
Article
ISSN
0304-3975

No coin nor oath required. For personal study only.

✦ Synopsis


We present a simple and e cient algorithm for matching regular expression with texts, using full inverted text. It is based on the max-ow=min-cut algorithm, traditionaly employed to resolve linear problems. Our procedure constructs an optimal set of nodes for any automaton. They constitute the set of starting points for the pattern matching procedure. The algorithm is presented in two forms. The ΓΏrst one, named Index-Text, selects areas on which are applied classical algorithms (that have an O(nf(m)) complexity where n is the size of the text, and m the size of the automaton) for regular expression matching. The second one, named Index-Index, selects a set of entries of the full inverted text. This set contains the points of departure for algorithms applied to full inverted text. In each case, the set is chosen to optimize the ensuing matching procedure. As a result, classical algorithms are at least accelerated by a constant factor. The algorithm presented can be viewed as a ΓΏlter either on the text or on the full inverted text. We show that for very large databases of natural language texts, our (pre-)algorithm combined with the simplest linear regular expression matching algorithm is faster than those that sort texts in a preprocessing step and that have a log(n + |W |) complexity. In the special case of string matching, we give an average-case analysis of the algorithm.


πŸ“œ SIMILAR VOLUMES


Fast two-dimensional pattern matching
✍ Ricardo Baeza-Yates; Mireille RΓ©gnier πŸ“‚ Article πŸ“… 1993 πŸ› Elsevier Science 🌐 English βš– 561 KB
On the Complexity of Pattern Matching fo
✍ Piotr Berman; Marek Karpinski; Lawrence L. Larmore; Wojciech Plandowski; Wojciec πŸ“‚ Article πŸ“… 2002 πŸ› Elsevier Science 🌐 English βš– 293 KB

We consider the complexity of problems related to two-dimensional texts (2D-texts) described succinctly. In a succinct description, larger rectangular subtexts are defined in terms of smaller parts in a way similar to that