✦ LIBER ✦

Fast pattern matching in indexed texts

✍ Scribed by Jean Senellart

Book ID: 104326551
Publisher: Elsevier Science
Year: 2000
Tongue: English
Weight: 309 KB
Volume: 237
Category: Article
ISSN: 0304-3975
DOI: 10.1016/s0304-3975(98)00173-x

No coin nor oath required. For personal study only.

✦ Synopsis

We present a simple and e cient algorithm for matching regular expression with texts, using full inverted text. It is based on the max-ow=min-cut algorithm, traditionaly employed to resolve linear problems. Our procedure constructs an optimal set of nodes for any automaton. They constitute the set of starting points for the pattern matching procedure. The algorithm is presented in two forms. The ÿrst one, named Index-Text, selects areas on which are applied classical algorithms (that have an O(nf(m)) complexity where n is the size of the text, and m the size of the automaton) for regular expression matching. The second one, named Index-Index, selects a set of entries of the full inverted text. This set contains the points of departure for algorithms applied to full inverted text. In each case, the set is chosen to optimize the ensuing matching procedure. As a result, classical algorithms are at least accelerated by a constant factor. The algorithm presented can be viewed as a ÿlter either on the text or on the full inverted text. We show that for very large databases of natural language texts, our (pre-)algorithm combined with the simplest linear regular expression matching algorithm is faster than those that sort texts in a preprocessing step and that have a log(n + |W |) complexity. In the special case of string matching, we give an average-case analysis of the algorithm.

📜 SIMILAR VOLUMES

Pattern matching in Huffman encoded text

Pattern matching in Huffman encoded texts

✍ Shmuel T. Klein; Dana Shapira 📂 Article 📅 2005 🏛 Elsevier Science 🌐 English ⚖ 320 KB

Fast Pattern-Matching via k-bit Filterin

Fast Pattern-Matching via k-bit Filtering Based Text Decomposition

✍ Kulekci, M. O.; Vitter, J. S.; Xu, B. 📂 Article 📅 2010 🏛 Oxford University Press 🌐 English ⚖ 250 KB

Fast two-dimensional pattern matching

✍ Ricardo Baeza-Yates; Mireille Régnier 📂 Article 📅 1993 🏛 Elsevier Science 🌐 English ⚖ 561 KB

Segmented Gray-Code Kernels for Fast Pat

Segmented Gray-Code Kernels for Fast Pattern Matching

✍ Wanli Ouyang, ; Renqi Zhang, ; Wai-Kuen Cham, 📂 Article 📅 2013 🏛 IEEE 🌐 English ⚖ 950 KB

Adapting the Knuth–Morris–Pratt algorith

Adapting the Knuth–Morris–Pratt algorithm for pattern matching in Huffman encoded texts

✍ Dana Shapira; Ajay Daptardar 📂 Article 📅 2006 🏛 Elsevier Science 🌐 English ⚖ 185 KB

On the Complexity of Pattern Matching fo

On the Complexity of Pattern Matching for Highly Compressed Two-Dimensional Texts

✍ Piotr Berman; Marek Karpinski; Lawrence L. Larmore; Wojciech Plandowski; Wojciec 📂 Article 📅 2002 🏛 Elsevier Science 🌐 English ⚖ 293 KB

We consider the complexity of problems related to two-dimensional texts (2D-texts) described succinctly. In a succinct description, larger rectangular subtexts are defined in terms of smaller parts in a way similar to that