Automated indexing of the hazardous substances data bank
Written by Carlo Nuss; Hua Florence Chang; Dorothy Moore; George C. Fonger
- Publisher
- Wiley (John Wiley & Sons)
- Year
- 2005
- Language
- English
- File size
- 307 KB
- Volume
- 40
- Category
- Article
- ISSN
- 0044-7870
Synopsis
Abstract
The Hazardous Substances Data Bank (HSDB), a factual data file produced and maintained by the Specialized Information Services (SIS) Division of the National Library of Medicine (NLM), contains over 4600 records on potentially hazardous chemicals. To improve information retrieval from HSDB, SIS has undertaken the development of an automated indexing protocol in collaboration with NLM's Indexing Initiative group. The Indexing Initiative investigates methods whereby automated indexing may partially or completely substitute for human indexing. Three main methodologies are applied: the MetaMap Indexing method, which maps text to concepts in the Unified Medical Language System (UMLS) Metathesaurus; the Trigram Phrase Matching method, which uses character trigrams to match text to Metathesaurus concepts; and a variant of the PubMed Related Citations method to find MeSH terms related to input text. The UMLS concepts generated by the first two methods are mapped to MeSH main headings through the Restrict-to-MeSH algorithm. The resulting MeSH terms are then clustered into a ranked list of recommended indexing terms. The purpose of the poster is to present our experience in applying these automated indexing methodologies to a large data file with highly structured records, a variety of text and data formats, and complex technical and biomedical terminology.
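To make the idea of matching text to concepts by character trigrams more concrete, here is a minimal sketch in Python. It is not the Indexing Initiative's implementation: the padding scheme, the Dice similarity score, the 0.3 threshold, the function names, and the four-entry `CONCEPTS` list are all assumptions chosen for illustration, standing in for the much larger Metathesaurus vocabulary.

```python
def char_trigrams(text: str) -> set:
    """Return the set of character trigrams of a normalized string.

    The text is lowercased, whitespace is collapsed, and the string is
    padded with spaces so that word boundaries contribute trigrams too.
    """
    normalized = " ".join(text.lower().split())
    padded = f"  {normalized} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def dice(a: set, b: set) -> float:
    """Dice coefficient between two trigram sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def rank_concepts(phrase: str, concepts: list, threshold: float = 0.3) -> list:
    """Score each concept name against the phrase and return the matches
    at or above the (assumed) threshold, best first."""
    grams = char_trigrams(phrase)
    scored = [(name, dice(grams, char_trigrams(name))) for name in concepts]
    kept = [(name, score) for name, score in scored if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical mini vocabulary; a real system would draw on the Metathesaurus.
CONCEPTS = ["Benzene", "Toluene", "Lead Poisoning", "Hazardous Substances"]

if __name__ == "__main__":
    for name, score in rank_concepts("hazardous substance", CONCEPTS):
        print(f"{name}: {score:.2f}")
```

Because the comparison works on character fragments rather than whole words, small spelling or inflection differences (singular versus plural, for example) still produce a high score, which is one plausible reason such a method suits noisy technical text.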