
A heuristic method based on a statistical approach for Chinese text segmentation

Scribed by Christopher C. Yang; K. W. Li


Publisher: John Wiley and Sons
Year: 2005
Tongue: English
Weight: 251 KB
Volume: 56
Category: Article
ISSN: 1532-2882


Synopsis


Abstract

The authors propose a heuristic method for Chinese automatic text segmentation based on a statistical approach. This method is developed based on statistical information about the association among adjacent characters in Chinese text. Mutual information of bi-grams and significant estimation of tri-grams are utilized. A heuristic method with six rules is then proposed to determine the segmentation points in a Chinese sentence. No dictionary is required in this method. Chinese text segmentation is important in Chinese text indexing and thus greatly affects the performance of Chinese information retrieval. Due to the lack of delimiters of words in Chinese text, Chinese text segmentation is more difficult than English text segmentation. Besides, segmentation ambiguities and occurrences of out-of-vocabulary words (i.e., unknown words) are the major challenges in Chinese segmentation. Many research studies dealing with the problem of word segmentation have focused on the resolution of segmentation ambiguities. The problem of unknown word identification has not drawn much attention. The experimental results show that the proposed heuristic method is promising for segmenting unknown words as well as known words. The authors further investigated the distribution of the errors of commission and the errors of omission caused by the proposed heuristic method and benchmarked it against a previously proposed technique, boundary detection. It is found that the heuristic method outperformed the boundary detection method.
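The abstract does not give the paper's formulas or its six rules, so the following is only a minimal sketch of the general idea of dictionary-free segmentation from character association. It assumes the standard pointwise mutual information, log2(P(xy) / (P(x) P(y))), for adjacent character pairs, and a single hypothetical threshold rule in place of the authors' six rules; the tri-gram estimation is not modelled. The function names and the `threshold` parameter are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter


def bigram_mutual_information(corpus):
    """Return {bigram: PMI} for adjacent character pairs.

    `corpus` is a list of unsegmented sentences (strings).
    PMI here is the standard log2(P(xy) / (P(x) * P(y))); the paper's
    exact definition is an assumption.
    """
    char_counts = Counter()
    bigram_counts = Counter()
    for sentence in corpus:
        char_counts.update(sentence)
        bigram_counts.update(
            sentence[i:i + 2] for i in range(len(sentence) - 1)
        )

    total_chars = sum(char_counts.values())
    total_bigrams = sum(bigram_counts.values())

    scores = {}
    for bigram, count in bigram_counts.items():
        p_xy = count / total_bigrams
        p_x = char_counts[bigram[0]] / total_chars
        p_y = char_counts[bigram[1]] / total_chars
        scores[bigram] = math.log2(p_xy / (p_x * p_y))
    return scores


def segment(sentence, scores, threshold=0.0):
    """Split between adjacent characters whose PMI is below `threshold`.

    Hypothetical single rule: unseen pairs are treated as having no
    association, so a boundary is always placed between them.
    """
    if not sentence:
        return []
    words = [sentence[0]]
    for i in range(1, len(sentence)):
        pair = sentence[i - 1:i + 1]
        if scores.get(pair, float("-inf")) >= threshold:
            words[-1] += sentence[i]   # strong association: same word
        else:
            words.append(sentence[i])  # weak association: new word
    return words
```

In use, the scores would be estimated once from a large unsegmented corpus and then passed to `segment` for each sentence; a suitable threshold would have to be tuned empirically.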


SIMILAR VOLUMES


A statistical framework for multivariate
Alison J. Burnham; John F. MacGregor; Roman Viveros | Article | 1999 | John Wiley and Sons | English | 124 KB

A statistical framework is developed to contrast methods used for parameter estimation for a latent variable multivariate regression (LVMR) model. This model involves two sets of variables, X and Y, both with multiple variables and sharing a common latent structure with additive random errors. The m

A novel acceleration method for the vari
Michel A. Tournour; Noureddine Atalla | Article | 1998 | John Wiley and Sons | English | 151 KB

The acoustic radiation of general structures with Neumann's boundary condition using the Variational Boundary Element Method (VBEM) is considered. The classical numerical implementation of the VBEM suffers from the computation cost associated with double surface integration. To alleviate this limitation,