Computational Linguistics and Intelligent Text Processing: 8th International Conference, CICLing 2007, Mexico City, Mexico, February 18-24, 2007, Proceedings (Lecture Notes in Computer Science, 4394)

✍ Scribed by Alexander Gelbukh (editor)

Publisher: Springer
Year: 2007
Tongue: English
Leaves: 662
Category: Library

✦ Synopsis

This book constitutes the refereed proceedings of the 8th International Conference on Computational Linguistics and Intelligent Text Processing, CICLing 2007, held in Mexico City, Mexico, in February 2007.

The 53 revised full papers presented together with 3 invited papers cover all current issues in computational linguistics research and present intelligent text processing applications.

✦ Table of Contents

Title
Preface
Organization
Table of Contents
Integration of Linguistic Resources for Verb Classification: FrameNet Frame, WordNet Verb and Suggested Upper Merged Ontology
Introduction
Background and Motivation
Extension of FrameNet Verb Coverage
Direct Retrieval – WordNet Synset
WordNet Relation Links and Frame as Domain
Affinity of Candidate Synsets with Domain Frame
Linking FrameNet Frame with SUMO Concept
Data Evaluation
Evaluation Result
Conclusion
References
French EuroWordNet Lexical Database Improvements
Introduction
EuroWordNet: Presentation and Limits
Improvement Made to the Relationships
A Usable Database
Update of the Semantic Relationships
Inserting Definitions into EuroWordNet Thesaurus
Wikipedia
Definition Extraction in French Language
General Process
Results Analysis
Conclusion
Building a Large-Scale Commonsense Knowledge Base by Converting an Existing One in a Different Language
Introduction
The Method
Translating the English OpenMind Using a Commercial MT Software
Translating the English ConceptNet with Heuristic Translation Rules
Combining Two Translation Results
Manual Evaluations of E-K Translated Concepts
Concluding Remarks and Future Work
References
Conquering Language: Using NLP on a Massive Scale to Build High Dimensional Language Models from the Web
Introduction
Estimating Language Presence
Extracting Language Modeling Information
Fetching Web Pages and Character Encoding
Extracting Text from Web Pages
Language Identification and Web Crawling
Natural Language Processing
Processing Output and Example
Possible Applications
Related Research and Conclusion
References
On Heads and Coordination in Valence Acquisition
Introduction
Motivation and Outline
The IPI PAN Corpus
Poliqarp
Distinguishing Syntactic and Semantic Heads
Coordination
XML Representation
Extending the Poliqarp Query Language
Simple Constructions
Coordination
Conclusion
Chinese Terminology Extraction Using Window-Based Contextual Information
Introduction
Related Work
Algorithm Design
The Preprocessing Module
Automatic Term Extraction
Terminology Verification
Experiment and Discussion
Performance of the Two Approaches
The Hybrid Approach
Conclusion
References
Baby-Steps Towards Building a Spanglish Language Model
What Is Spanglish?
Linguistic Features of Spanglish
Code-Switching
Borrowing
Code-Mixing
Examples of Shallow Phenomena
Language Models
Data Collection
Tools of the Trade
Test Phase and Results
SML Test
UTI Test
Final Remarks
Current and Future Work
Latent Variable Models for Causal Knowledge Acquisition
Introduction
Related Work
Statistical Models for Causal Knowledge Acquisition
Model Structures
Model Estimation
Causality Detection
Experiments
Settings
Results (1): The Effectiveness of Incorporating Dependencies Between Two Events into Causal Models
Results (2): The Effectiveness of Class Labels
Examples
Conclusion
Finite-State Technology as a Programming Environment
Introduction
A Motivating Example
An Alternative Implementation
Comparison and Evaluation
Discussion
Morphological Disambiguation of Turkish Text with Perceptron Algorithm
Introduction
Morphological Disambiguation
Representation
Problem Definition
Methodology
Baseline Trigram-Based Model
Perceptron Algorithm
Experiments
Data Set
Features
Optimal Parameter and Feature Selection
Results
Conclusions
Part-of-Speech Tagging Using Word Probability Based on Category Patterns
Introduction
N-Gram POS-Tagging Models and Korean Language Characteristics
Word N-Gram Is Not of Practical Use for Korean
Morphotactic Constraints Within a Korean Word and Previous Alternatives
Korean POS-Tagging Using Word Probability Based on Category Patterns
Application of Category-Pattern-Based Model to Bayesian Models for POS-Tagging
Parameter Training and POS Assigning
Experimentation and Application
Conclusions and Further Work
References
Handling Conjunctions in Named Entities
Introduction
Problem Description
Related Work
Experimental Setup
Corpus and Data Preparation
The Tag Set
Encoding
The Algorithms
Baseline
Classifiers
Results
Evaluation Scheme
Classification Results
Analysis
Conjunction Category Indicators
Error Analysis
Conclusions and Future Work
ANERsys: An Arabic Named Entity Recognition System Based on Maximum Entropy
Introduction
Named Entity Recognition in Arabic
The Maximum Entropy Approach
The Developed Resources
ANERcorp: Two Corpora for Training and Test
ANERgazet: Integrating Web-Based Gazetteers
Experiments and Results
Conclusions and Future Work
References
Applying Machine Learning to Chinese Entity Detection and Tracking
Introduction
Related Work
Mention Boundary Detection
Character-Based Model Combined with Word Knowledge
Boundary Detection with Conditional Random Fields
Character-Based N-Gram Features and Wordlist-Based Features
Head and Extent Combination
Entity Attribute Identification
Coreference Resolution
Experimental Results
Mention Boundary Detection Results
Entity Attribute Identification Results
Entity Tracking Results
Conclusion
References
Evaluation of an Automatic Extension of Temporal Expression Treatment to Catalan
Introduction
History of TERSEO Extensions
Extension to Catalan
Evaluation of the New Extension
Corpus Development
Results
Conclusions
A Generalized Approach to Word Segmentation Using Maximum Length Descending Frequency and Entropy Rate
Introduction
Related Work
Proposed Method
A Walk-Through Example
Evaluation and Experimental Results
Conclusion and Future Work
References
Tagging Sentence Boundaries in Biomedical Literature
Introduction
Methods
Special SBD Issues in Biomedical Literature
Rule-Based Approach for Biomedical Literature SBD
Results
Discussion
References
Probabilistic Classifications with TBL
Introduction
Transformation Based Learning
Probability Estimation with TBL Classifiers
The Proposed Method
Equivalence Class Partitioning
Smoothing
Related Work
Experiments
Cross Entropy and Perplexity
Rejection Curve
Active Learning
Conclusions
The Non-associativity of Polarized Tree-Based Grammars
Introduction
Existing Polarity Systems
XMG Colors
PUGs
General Polarity Systems
Conclusion
Dependency Analysis of Clauses Using Parse Tree Kernels
Introduction
Problem Setting
Clausal Dependency Identification
Parse Tree Kernels
Clause Dependency Analysis with Parse Tree Kernels
Dependency Relation in Korean Clauses
Clause Representation
Multi-class Classification
Experiments
Conclusions
Unsupervised Method for Parsing Coordinated Base Noun Phrases
Introduction
Motivation
Previous Work
Syntactic Parsing
Web Counts
Approach and Statistical Modeling
The Models
Experimental Setup and Results
Web Counts
Evaluation Measures
Conclusion
Text Categorization for Improved Priors of Word Meaning
Introduction
Finding Predominant Senses
Creating the Domain Corpora
The GigaWord Corpus
The Classifier
The Domain Corpora
Domain Rankings
Experiments and Evaluation
Hand-Labelled Versus Automatically Classified
Senseval
Domain Salient Words
Discussion and Future Research
Case-Sensitivity of Classifiers for WSD: Complex Systems Disambiguate Tough Words Better
Introduction
Prediction Factors
Case Factors
System Factors
Optimal Ensembling Method
Evaluation
Test Setting
Base System Complexity vs Tough / Easy Words
Base System Complexity vs Best Optimal Ensembles
Discussion
Conclusions and Future Work
References
Word Clustering for Collocation-Based Word Sense Disambiguation
Introduction
Related Work
The Yarowsky Algorithm
Word Clustering
Extending the Collocation List
Experiment
Data Set
Experimental Setup
Experiment Results
Discussion of Results
Relationship Between F-Measure with Word-Class and Corpus
Error Analysis
Conclusion and Future Work
References
Lexical Constellations and the Structure of Meaning: A Prototype Application to WSD
Introduction
Meaning by Constellation and WSD
DFA and WSD
The Algorithm in Action
Some Considerations and Conclusions
References
Rule-Based Protein Term Identification with Help from Automatic Species Tagging
Introduction
Related Work
Data and Ontology
Hybrid Approaches to TI
Assigning Potential CM Identifiers to Protein Mentions
Term Disambiguation
Results
Conclusions
Unsupervised Discrimination of Person Names in Web Contexts
Introduction
Lexical Features
Second Order Context Representation
Cluster Stopping
Experimental Data
Evaluation
Experimental Results
Discussion and Conclusions
Learning for Semantic Parsing
Introduction
Sample Applications and Their MRLs
Systems for Learning Semantic Parsers
Scissor
Wasp
Krisp
Experimental Evaluation
Future Research
Conclusions
The Usefulness of Conceptual Representation for the Identification of Semantic Variability Expressions
Introduction
Conceptual Representation Space with Latent Semantic Analysis
WordNet Domains
WordNet Alignment with SUMO
Latent Semantic Analysis
A Walk-Through Example
Evaluation
Microsoft Paraphrase Corpus
Evaluation Measures
Results
Discussion
Conclusions
Characterizing Humour: An Exploration of Features in Humorous Texts
Introduction
Related Work
Datasets for Computational Humour
One-Liners
Humorous News Articles
Automatic Humour Recognition
Negative Datasets
Text Classification
Classification Results
Characteristics of Verbal Humour
Human Centeredness
Polarity Orientation
Discussion and Conclusions
Representing Emotions with Linguistic Acuity
Introduction
Related Work
Emotions in Soap Opera
Types of Emotional Cues
Annotating Text with Emotions
Linguistic Analysis
Words with Explicit Emotional Senses
Disambiguating Arguments and Expressions
Reassessing the Emotional Weight of a Given Expression
Strengthening Vague Expressions into Marked Ones
Encoding Emotions with Parsing
Discussion and Conclusion
An Evaluation of UNL Usability for High Quality Multilingualization and Projections for a Future UNL++ Language
Introduction
Embedding a Comparative Task-Related Evaluation of UNL in a Real Translation Task
Steps of the Experiment
Measured Results
Potential Actual and Future Gains
Aspects to Improve in the UNL Way
Common Development Web Platform
UNL Specification and UW Construction
Need for Open Source and More Tools
A New Impetus: The U++ Consortium
Goals
Roadmap
Spreading UNL Usage
Conclusion
References
Transfer-Based MT from Spanish into Basque: Reusability, Standardization and Open Source
Introduction
General Architecture
The De-formatter
The Analyzer
The Transfer Module
The Generation
The Re-formatter
Formats for Interaction
Linguistic Data
Dictionaries
Grammars
The Programs
Evaluation
Conclusions
References
Dependency-Based Chinese-English Statistical Machine Translation
Introduction
The Dependency Based Translation Model
The Training of Treelet Mapping
Data Preparation
Training of Treelet Translation Probability
Chinese-English Specific Treatment
The Decoder
Experiments and Results
Training and Test Data
Experiments
Discussions
Conclusions and Future Work
References
Asymmetric Hybrid Machine Translation for Languages with Scarce Resources
Introduction
Natural Language Processing Resources
Monolingual Natural Language Processing
Towards Asymmetric Hybrid Translation Machine
Evaluation Results
Conclusions
References
CL-Guided Korean-English MT System for Scientific Papers
Introduction
Related Work
Customizing Korean-English MT System for Scientific Papers
Korean POS Tagger and Syntactic Analyzer
Pattern-Based Generation
Long Sentence Processing
CL-Guided Machine Translation
CL Rules for Source Language Rewriting
Target Language Rewriting Using Example Expressions
Evaluation
Conclusion
References
Comparing and Integrating Alignment Template and Standard Phrase-Based Statistical Machine Translation
Introduction
Survey of Phrase-Based SMT
Comparisons of SP-SMT and AT-SMT
Sequence Alignment Model
Feature Functions
Decoding Process
Integration of AT into SP-SMT
Experiments
Corpus
Feature
Language
Integration Method
Conclusions
Dependency Analysis and CBR to Bridge the Generation Gap in Template-Based NLG
Introduction
Case Based Reasoning Techniques and Dependency Analysis
Case Based Reasoning Techniques and Technologies
The Role of Taxonomies in Computing Similarity
Dependency Analysis
Using Dependency Analysis to Build a Case-Base for Template Selection
Basic Operation of the Case-Based Template-Selection Module
The Resources Required: Case Base and Vocabulary
Constructing the Case Base from the Dependency Trees for the Corpus
Evaluation and Discussion
Conclusions and Future Work
Experiments on Generating Questions About Facts
Introduction
Why Question Generation?
AutoTutor
Related Work
Our Approach to Question Generation
The Question Generation Mark-Up Language
AIML
Question Generation Mark-Up Language (QG-ML)
Example of a Category
Interpreter of the Mark-Up Language
Evaluation and Experimental Results
Factual Question Generation
Future Work
Conclusions
Expert vs. Non-expert Tutoring: Dialogue Moves, Interaction Patterns and Multi-utterance Turns
Introduction
Our Previous Work
Study of Tutorial Interaction Patterns
Tutor-Student Interaction Patterns
Student-Tutor Interaction Patterns
Study of Multi-utterance Turns
Conclusions and Future Work
A Competitive Term Selection Method for Information Retrieval
Introduction
Term Selection and Weighting
Entropy
Transition Point
Term Enrichment
Union of Entropy and TP
Experiments
Data Description
Results
Discussion
Incorporating Passage Feature Within Language Model Framework for Information Retrieval
Introduction
Previous Work
Passage Retrieval
Language Model in Information Retrieval
Incorporating Passage Information into the Language Model Framework
Experiments and Results
Experiment Design
Experimental Results
Conclusion
References
Enhancing Cross-Language Question Answering by Combining Multiple Question Translations
Introduction
Proposed Methods
Method 1: “Selecting the Best Translation”
Method 2: “Combining Passages from Several Translations”
Method 3: “Constructing a Question Reformulation”
Experimental Results
Experimental Setup
Results
Conclusions and Future Work
References
The Negative Effect of Machine Translation on Cross-Lingual Question Answering
Introduction
State of the Art
Taxonomy of the MT Errors for CL-QA
Wrong Word-by-Word Translation
Wrong Translated Sense
Wrong Syntactic Structure
Wrong Interrogative Particle
Wrong Lexical-Syntactic Category
Unknown Words
Wrong Proper Name
Our Approach to CL-QA
Solution to Wrong Word-by-Word Translation
Solution to Wrong Translated Sense
Solution to Wrong Syntactic Structure
Solution to Wrong Interrogative Particle
Solution to Wrong Lexical-Syntactic Category
Solution to Unknown Words
Solution to Wrong Proper Name
Evaluation
Conclusion and Future Work
Using Clustering Approaches to Open-Domain Question Answering
Introduction
Sentence Clustering for Cluster-Based Language Model
Main Idea
One-Sentence-Multi-Topic
Experiments with Sentence Retrieval
Pattern-Similarity-Based Clustering to Learn Answer Patterns
Architecture for Answer Pattern Learning
Sentence Clustering and Pattern Extraction
Vertical Clustering
Horizontal Clustering
Experiments with Unsupervised Answer Pattern Learning
Conclusion and Future Work
References
A Little Known Fact Is . . . Answering Other Questions Using Interest-Markers
Introduction
Answering "Other" Questions
Finding the Wikipedia Article
Extracting Target-Specific Interest Markers
Finding Interesting Sentences
Ranking Interesting Sentences
Universal Interest Markers
Results and Analysis
Related Work
Summary and Future Work
Adapting the JIRS Passage Retrieval System to the Arabic Language
Introduction
Retrieving Passages in Arabic
The Arabic-JIRS Passage Retrieval System
Experiments and Results
Conclusions and Further Work
Test-Bed for Arabic Question Answering
Preliminary Results
Using Question-Answer Pairs in Extractive Summarization of Email Conversations
Introduction
Previous and Related Work
The Data
Extractive Summarization
Question-Answer Pair Detection
Integrating Question-Answer Pairs with Extractive Sentences
Postprocessing Extracted Sentences
Email Summarization Interface
Conclusion and Future Work
NEO-CORTEX: A Performant User-Oriented Multi-Document Summarization System
Introduction
Background and Related Work
System Overview
Similarity
Overlap
Final Sentence Ranking
Evaluating Summary Quality
Tuning the Parameters for the DUC Task
Adaptations for DUC 2006 Task
Managing the Topics
Finding the Best Metrics for DUC 2006
Managing the Sentence Length
Results
Conclusion and Future Work
Event-Based Summarization Using Time Features
Introduction
Related Work
Event Representation on Time Line
Event Weighting and Sentence Selection
Experiments and Discussion
Preliminary Evaluation on Two Clusters
Evaluation on Ten Clusters
Discussion
Conclusions and Future Work
References
NLP-Based Curation of Bacterial Regulatory Networks
Introduction
A Regulatory Network Extraction System
A Markup Language for Mining Bacterial Regulatory Networks
Analyzed Corpora, RegulonDB and the Manual Curation Process
Evaluating the Network Extraction Task
Evaluating Manual/Automatic Curation Strategies
References
Discussion
Exploiting Category Information and Document Information to Improve Term Weighting for Text Categorization
Introduction
Feature Selection and Term Weighting
Category Information and Document Information
Intuitionistic Term Weighting Factor Constraints
A Partial Probability Model for Term Weighting
Model Formulation
Global Term Weighting Factor
Relation with idf
Statistical Characteristics
Experiments
Results and Discussions
Comprehensible Samples
Conclusions
On the Impact of Lexical and Linguistic Features in Genre- and Domain-Based Categorization
Introduction
Genres and Domains
Methodology
Development of a Pilot Corpus
Feature Selection for Scientific Texts
Classifiers Used
Evaluation Framework
Experimentations
Domain Classification
Genre Classification
Further Analysis: Micro vs. Macro-precision
Analysis of the Discriminatory Descriptors
Domain Descriptors
Genre Descriptors
Conclusion
Clustering Narrow-Domain Short Texts by Using the Kullback-Leibler Distance
Introduction
The Kullback-Leibler Distance
Description of the Corpora
The CICLing-2002 Corpus
The hep-ex Corpus of CERN
The KnCr Corpus of MEDLINE
Preprocessing
Description of the FSTs Used
Experimental Results
Performance Measurement
Results
Conclusions
A Mixed Trigrams Approach for Context Sensitive Spell Checking
Introduction
A Mixed Trigrams Approach
Mixed Trigrams
Confusion Sets
Levenshtein Distance
Method
Conditional Probability Estimation for the Central Word
Experimental Settings
Algorithm Training
Test Data
Performance Measures
Experimental Results
Hit and False Positive Rates
Coverage
Conclusions and Future Work
Combining Methods for Detecting and Correcting Semantic Hidden Errors in Arabic Texts
Introduction
Semantic Hidden Errors
Detecting Semantic Irregularities
Co-occurrence-Collocation Method
Context-Vector Method
Vocabulary-Vector Method
Latent Semantic Analysis Method
Voting Method
Correcting Semantic Errors
Context of Work
Testing and Results
Evaluation of the Detection Component
Evaluation of the Correction Component
Conclusion
References
Author Index