This book constitutes the proceedings of the 17th International Symposium on String Processing and Information Retrieval, SPIRE 2010, held in Los Cabos, Mexico, in October 2010. The 26 long and 13 short papers presented were carefully reviewed and selected from 109 submissions. The volume also conta
String Processing and Information Retrieval: 17th International Symposium, SPIRE 2010, Los Cabos, Mexico, October 11-13, 2010, Proceedings (Lecture Notes in Computer Science, 6393)
✍ Scribed by Edgar Chavez (editor), Stefano Lonardi (editor)
- Publisher
- Springer
- Year
- 2010
- Tongue
- English
- Leaves
- 421
- Category
- Library
No coin nor oath required. For personal study only.
✦ Synopsis
This book constitutes the proceedings of the 17th International Symposium on String Processing and Information Retrieval, SPIRE 2010, held in Los Cabos, Mexico, in October 2010.
The 26 long and 13 short papers presented were carefully reviewed and selected from 109 submissions. The volume also contains 2 invited talks. The papers are structured in topical sections on crowdsourcing and recommendation; indexes and compressed indexes; theory; string algorithms; compressions; querying and search user experience; document analysis and comparison; compressed indexes; and string matching.
✦ Table of Contents
Title
Preface
Organization
Table of Contents
Crowdsourcing and Recommendation
Querying the Web Graph
The Ranking Problem
Using Hyperlinks to Rank Web Search Results
PageRank
HITS and Its Variants
Conclusion
References
Incremental Algorithms for Effective and Efficient Query Recommendation
Introduction
Related Work
Incremental Algorithms for Query Recommendation
Static solutions
Incremental algorithms
Quality Metrics
Experiments
Experimental Setup
Results
Efficiency Evaluation
Conclusions
References
Fingerprinting Ratings for Collaborative Filtering — Theoretical and Empirical Analysis
Introduction
Preliminaries
Rank Correlation Fingerprints
Empirical Analysis
Related Work
Conclusion
References
On Tag Spell Checking
Introduction
Model Description
Experiments
Conclusions
References
Indexes and Compressed Indexes
Compressed Self-indices Supporting Conjunctive Queries on Document Collections
Introduction, Results and Previous Work
Preliminary Concepts and Notation
Succinct Encodings for Document Reporting
Efficient Support for Conjunctive Queries
A Simple Worst-Case Algorithm for Conjunctive Queries
An Adaptive Algorithm for Conjunctive Queries
Experimental Results
Conclusions
References
String Retrieval for Multi-pattern Queries
Introduction
Comparison with Previous Work
Preliminaries
Suffix Trees and Suffix Arrays
Wavelet Tree
Weight-Balanced Wavelet Tree
Any-One Index
Index Construction
Query Answering
Efficient Indexes for K-Mine and K-Repeats Problems
Query Answering
Top-K Queries
Generalization to Multi-pattern Queries
References
Colored Range Queries and Document Retrieval
Introduction
Listing
Top-k Queries
Counting
Further Applications to Information Retrieval
References
Range Queries over Untangled Chains
Introduction
The U-Chains Structure
Implementation Details
Partitioning the Points
Untangling Chains
Experimental Results
Partitioning
Construction
Query Generation
Range Searching with Different Partitioning Methods
Conclusions
References
Theory
Multiplication Algorithms for Monge Matrices
Introduction and Related Work
Basic Concepts
Highest Score Matrices
Core Sensitive Multiplication
Representing C= A⊗B
Obtaining the Core
Analysis
Experimental Results
Conclusions
References
Why Large Closest String Instances Are Easy to Solve in Practice
Introduction
Preliminaries
Previous Results
A Randomized Algorithm for Closest String
Approximation Guarantee for CSP-Greedy
Probabilistic Analysis of CSP-Greedy
Experimental Results
Smoothed Analysis of {\it CSP-Greedy}
References
A PTAS for the Square Tiling Problem
Motivation
Background and Definitions
Rectangle Tiling
The Approximation
Conclusions and Open Problems
References
On the Hardness of Counting and Sampling Center Strings
Introduction
Hardness Results
Counting and Sampling with Fixed Parameters
Conclusion
References
String Algorithms I
Counting and Verifying Maximal Palindromes
Introduction
Palindromes in Strings
Our Contribution
Related Work
Preliminaries
Palindromes and Parameterized Matching
Inferring a String from Maximal Palindromes
Problem
Linear-Time Algorithm to Compute Maximal Palindromes from a String
Our Algorithm to Compute a String from Maximal Palindromes
Conclusions and Future Work
References
Identifying SNPs without a Reference Genome by Comparing Raw Reads
Introduction
Preliminaries
Comparative Micro-assembly Model
Algorithm kisSnp for Mouth Detection
Creating the Mouths
Complexity
Checking for Read Coherence
Applications to Simulated Read Data
Finding the SNPs
Execution Time
Applications to Real Read Data
Conclusion and Future Work
References
Dynamic Z-Fast Tries
Introduction
Notation and Tools
The Components of a Z-Fast Trie
The Compacted Trie
The Dictionary
Implementing the Dictionary: Short vs. Long Strings
Jump Pointers
Space Usage
Queries
Updates
References
Improved Fast Similarity Search in Dictionaries
Introduction and Previous Results
Approximate Dictionary Matching
Experimental Results
Conclusions and Future Work
References
Compression
Training Parse Trees for Efficient VF Coding
Introduction
Variable-Length-to-Fixed-Length Codes
Training Parse Trees
Experimental Results
Compression Ratios and Speeds
Effects of Training
Speeding-Up by Sampling
References
Algorithms for Finding a Minimum Repetition Representation of a String
Introduction
Problem Setting
Algorithms
General Algorithm
Algorithm for a Constant Repetition Weight Function
Concluding Remarks
References
Faster Compressed Dictionary Matching
Introduction
Preliminaries
A Review of BWT
A Review of XBW
Compressed Prefix Matching with XBW
Compressing Belazzougui's Scheme
Our Index for Compressed Dictionary Matching
Conclusion
References
Relative Lempel-Ziv Compression of Genomes for Large-Scale Storage and Retrieval
Introduction
Background and Basic Tools
Storing and Accessing Related Genomes
Experimental Results
Discussion
References
Querying and Search User Experience
Standard Deviation as a Query Hardness Estimator
Introduction
Related Work
Ranking List Score Dispersion as a Predictor
Results
Conclusions
References
Using Related Queries to Improve Web Search Results Ranking
Introduction
Generative Model
Exclusive Subtopics
Naive Bayes
Naive Bayes Generative Model
Numerical Experiments
Models Comparison
Discussions and Conclusions
References
Fast Evaluation
Evaluation of Query Performance Prediction Methods by Range
Introduction
Current Evaluation Approaches
Evaluating by Range
Experiments and Results
Conclusions
References
Mining Large Query Induced Graphs towards a Hierarchical Query Folksonomy
Introduction
Related Work
Click Induced Graph
Graph Clustering and Induced Query Folksonomy
Experimental Evaluation
References
String Algorithms II
Finite Automata Based Algorithms for the Generalized Constrained Longest Common Subsequence Problems
Introduction
Preliminaries
Finite Automata Based Algorithms
STR-IC-LCS Problem
STR-EC-LCS Problem
SEQ-IC-LCS Problem
SEQ-EC-LCS Problem
Complexity
Conclusion
References
Restricted LCS
Introduction
Related Work
Our Contribution
Preliminaries
Hardness Result
Exact Algorithms
Approximation Algorithms
References
Extracting Powers and Periods in a String from Its Runs Structure
Introduction
Preliminaries
Lyndon Representations of Runs
Inferring Powers from Runs
Computation of Local Periods
Factor-Primitivity Queries
References
On Shortest Common Superstring and Swap Permutations
Introduction
Motivation
Previous Work
Our Contributions
Hardness of Swapped Restricted Superstring
Exact Algorithm for Swapped Restricted Superstring with Repetitions
A Dynamic Programming Approach
A Faster Solution
Open Questions
References
Document Analysis and Comparison
A Self-Supervised Approach for Extraction of Attribute-Value Pairs from Wikipedia Articles
Introduction
Related Work
Extracting Attribute-Value Pairs from Wikipedia
Wikipedia Processor
Classifier
Filter
Extractor
Experimental Results
Conclusions
References
Temporal Analysis of Document Collections: Framework and Applications
Time in Information Retrieval
Time Annotated Document Model
Temporal Document Measures
Temporal Document Similarity
Applications
References
Text Comparison Using Soft Cardinality
Introduction
Related Work
Soft Cardinality
Estimating Cardinality of Set Union Using Pairwise Intersections
Decoupling and Combining Affinity and Importance
Preliminary Results
Experimental Setup
Results
Conclusions
References
Hypergeometric Language Model and Zipf-LikeScoring Function for Web Document Similarity Retrieval
Introduction
Previous Work
Hypergeometric Language Model for Query Generation
Meta Search Engine and Zipf-Like Scoring Function for Generated Queries
Experiments
Results and Discussion
Conclusions
References
Compressed Indexes
Dual-Sorted Inverted Lists
Introduction
Related Work
Intersection Algorithms for Inverted Lists
Data Structures for Inverted Lists
Algorithms for Ranked Retrieval
Wavelet Trees
Our Data Representation
Full-Text Retrieval
Ranked Retrieval
Conclusions and Future Work
References
CST++
Introduction
A Guided Tour through Compressed Suffix Trees
Our Results in Context
A New Representation of Super-Cartesian Trees
New Compressed Suffix Tree
References
Succinct Representations of Dynamic Strings
Introduction
Related Work
Our Results
Preliminaries
Collections of Searchable Partial Sums
Strings over Small Alphabets
Strings over General Alphabets
Applications
Concluding Remarks
References
Computing Matching Statistics and Maximal Exact Matches on Compressed Full-Text Indexes
Introduction
Preliminaries
Matching Statistics by Backward Search
Computing Maximal Exact Matches by Backward Search
Experimental Results
References
The Gapped Suffix Array: A New Index Structure for Fast Approximate Matching
Introduction
Definitions
Approximate String Matching
The Gapped Suffix Array
Definitions
Searching Using the Gapped Suffix Array
Computing the Gapped Suffix Array
Computing the Gapped LCP Array gLCP
Representing the Array gSA in Reduced Space
Conclusion
References
String Matching
Parameterized Searching with Mismatches for Run-Length Encoded Strings
Introduction
Problem Description
Parameterized String Matching via Parametric Graph Matching
Conclusion
References
Fast Bit-Parallel Matching for Network and Regular Expressions
Introduction
Preliminary
Fast Bit-Parallel Algorithm for Extended Network Expressions
Basic NFA Simulation Algorithm
Bit-Parallel Implementation
Main Results
Extension for General Regular Expressions
Experimental Results
Conclusion
References
String Matching with Variable Length Gaps
Introduction
Previous Work
Our Results
Technical Overview
Algorithm
Multi-string Matching
Relevant Occurrences
The Algorithm
Analysis
Correctness
Time and Space Complexity
References
Approximate String Matching with Stuck Address Bits
Introduction
Pattern Matching with Address Errors
Problem Definition
Our Results
The Stuck Bits Problem
Pattern Matching under Stuck Bits Errors
Total Time for the Stuck Bits Problem
Transient Stuck Bits Problem
Verifying the Existence of a Transient Stuck Bit Matching
Finding the Stuck Bits Location
Faster Verification of Transient Stuck Bit Matching
Total Time for the Transient Stuck Bits Problem
Conclusions and Open Problems
References
Erratum
Range Queries over Untangled Chains
Author Index
📜 SIMILAR VOLUMES
This book constitutes the proceedings of the 17th International Symposium on String Processing and Information Retrieval, SPIRE 2010, held in Los Cabos, Mexico, in October 2010. The 26 long and 13 short papers presented were carefully reviewed and selected from 109 submissions. The volume also conta
<p>This book constitutes the proceedings of the 18th International Symposium on String Processing and Information Retrieval, SPIRE 2011, held in Pisa, Italy, in October 2011. <br>The 30 long and 10 short papers together with 1 keynote presented were carefully reviewed and selected from 102 submissio
<p>This book constitutes the proceedings of the 18th International Symposium on String Processing and Information Retrieval, SPIRE 2011, held in Pisa, Italy, in October 2011. <br>The 30 long and 10 short papers together with 1 keynote presented were carefully reviewed and selected from 102 submissio
<span>This volume contains the papers presented at the 13th International Symposium on String Processing and Information Retrieval (SPIRE), held October 11-13, 2006, in Glasgow, Scotland. The SPIRE annual symposium provides an opportunity for both new and established researchers to present original
<span>This volume of the Lecture Notes in Computer Science series provides a c- prehensive, state-of-the-art survey of recent advances in string processing and information retrieval. It includes invited and research papers presented at the 9th International Symposium on String Processing and Informa