In this issue
By Boyce, Bert R.
- Publisher: John Wiley and Sons
- Year: 1997
- Language: English
- Size: 17 KB
- Volume: 48
- Category: Article
- ISSN: 0002-8231
Synopsis
Three articles herein are bibliometric in nature; the remainder concern automatic classification and indexing.

Bookstein shows that the informetric laws can all be transformed into close approximations of Lotka's law. A model with a general stable component providing an expected value for the counts is supplemented by a generating function for a random component, effectively supplied by the family of compound Poisson distributions. This introduction of ambiguities does not affect the regularity.

Liu compares citation data from seven Chinese journals and translation data from several sources to determine that a strong relationship exists between the number of items translated into Chinese from a particular country and the number of citations of items from that country in Chinese journals. The correlation between translations into Chinese from a language and citations to items in that language is even stronger.

Exon and Punch repeated the 1981 Paustian analysis, which noted a weak but significant positive correlation between numbers of items borrowed on interlibrary loan and the collection size of borrowing academic libraries, finding a correlation of considerably greater strength. There is no evidence that building a collection will reduce interlibrary borrowing and thus lead to a self-sufficient collection.

Leung and Kan compute a z-score for each word in a "positive training set" (words from titles and abstracts of documents that have been indexed by a particular controlled term) and in a "negative training set" (words from documents that have not been so indexed). The difference between a word's z-scores in the two sets is used to weight it in both cases. A vector of the training-set words with their difference weights is created for each controlled term, and an indexing score for that controlled term and document is generated: the sum of the weights times the frequencies of the words in the document, divided by the number of words in the document. If this indexing score is greater than the sum of the average indexing scores in the positive and negative indexing sets, the controlled term is assigned. In an extensive INSPEC sample, 88% of the positive evaluation set was properly indexed and 8% of the negative evaluation set was improperly indexed; in a similar MEDLINE sample the percentages were 88% and 6%. Reruns of the experiment with changing thresholds indicate that the sum of the average indexing scores in the two training sets is an optimal threshold.

Using abstracts from eight different disciplines, Haas manually selected sets of terms representing the domain of each discipline. Words that did not appear, or appeared with a domain marker, in a standard college dictionary were termed seed words. Words in a 1- to 9-word window about the seed words were extracted and categorized as stop words, domain terms that matched the targets, and general words. In all disciplines, the percentage of target terms extracted increases, and the percentage of domain terms among all words extracted decreases, as window size grows. The extraction process is more successful in the sciences, which have more domain terms occurring in sequences, than in other disciplines.

May provides an interesting contribution to the e-mail filtering problem by suggesting grouping by message type rather than topic. Over 1300 messages were grouped into question, response, announcement, and administrative categories based on matching with selected pattern strings of target text, and these groups were compared with the investigator's manual groupings. Cohen's Kappa indicates agreement beyond chance between the 897 messages classified by the automatic method and those categorized manually. Unfortunately, 34% of the messages could not be automatically classified.

Lin shows that display structures of related documents provide a way of handling large retrieved sets. In a trial using index-term-generated vectors for documents in a conventionally retrieved set, Kohonen's algorithm modifies term weights to strengthen the links between close document vectors and finally partitions the set. The areas formed correspond in size to the word frequencies, and their closeness represents co-occurrence.

Chen and others look at linkages between two specialized vocabularies with a 30% term overlap. Terms were weighted using the product of term frequency and inverse document frequency, and clustered using an asymmetric function which also penalized terms of very high document occurrence and set a maximum cluster size of 100. These vocabularies can be browsed by user-designed queries or used in a spreading activation process with the Hopfield net algorithm. Terms suggested by biologists to link concepts in the two areas are in the conjoined thesaurus 60 to 85% of the time.
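Bookstein's random component, drawn from the family of compound Poisson distributions, can be illustrated with a short simulation: a Poisson-distributed number of random increments is summed, so the stable expected value is the rate times the mean increment. This is a sketch under assumed, purely illustrative parameters (rate 3, increments uniform on {1, 2, 3}), not Bookstein's actual model:

```python
import math
import random

def poisson(lam, rng):
    """Draw from Poisson(lam) via Knuth's method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def compound_poisson(lam, increment, rng):
    """Sum a Poisson(lam)-distributed number of i.i.d. increments."""
    return sum(increment(rng) for _ in range(poisson(lam, rng)))

rng = random.Random(1)
# increments uniform on {1, 2, 3}, so E[increment] = 2
samples = [compound_poisson(3.0, lambda r: 1 + r.randrange(3), rng)
           for _ in range(10000)]
mean = sum(samples) / len(samples)  # expected value: 3 * 2 = 6
```

The sample mean stays near the stable expected value while individual counts vary, which is the sense in which the random component does not disturb the regularity.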
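The Leung and Kan scoring rule summarized above reduces to a weighted sum compared against a threshold. A minimal sketch; the tokenization, example weights, and average-score values are hypothetical, and only the score formula and threshold test follow the summary:

```python
from collections import Counter

def index_score(doc_words, weights):
    """Sum of term weight times in-document frequency, divided by document length."""
    counts = Counter(doc_words)
    total = sum(weights.get(w, 0.0) * n for w, n in counts.items())
    return total / len(doc_words)

def assign_term(doc_words, weights, avg_pos_score, avg_neg_score):
    """Assign the controlled term if the score exceeds the sum of the average
    indexing scores from the positive and negative training sets."""
    return index_score(doc_words, weights) > (avg_pos_score + avg_neg_score)

# weights: per-word z-score difference (positive-set z minus negative-set z)
weights = {"neural": 2.1, "network": 1.7, "finance": -1.3}
doc = "neural network training for neural control".split()
assigned = assign_term(doc, weights, 0.4, 0.2)  # score 5.9/6 exceeds 0.6
```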
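Haas's window extraction can be sketched similarly; the seed, stop, and target sets below are illustrative stand-ins for her manually selected lists:

```python
def window_extract(tokens, seeds, window, stop_words, targets):
    """Collect words within `window` positions of any seed word and label each
    as a stop word, a matching domain (target) term, or a general word."""
    picked = set()
    for i, tok in enumerate(tokens):
        if tok in seeds:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            picked.update(j for j in range(lo, hi) if j != i)
    labeled = []
    for j in sorted(picked):
        w = tokens[j]
        kind = "stop" if w in stop_words else "target" if w in targets else "general"
        labeled.append((w, kind))
    return labeled

tokens = "the epitope binds the receptor with high affinity".split()
out = window_extract(tokens, seeds={"epitope"}, window=2,
                     stop_words={"the", "with"}, targets={"receptor", "affinity"})
```

Widening `window` pulls in more of the target terms but also more general words, which mirrors the trade-off Haas reports as window size grows.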
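The weighting used by Chen and others, term frequency times inverse document frequency, is a standard scheme; a minimal sketch with toy documents. Note that a term occurring in every document receives weight zero, in the same spirit as their penalty on terms of very high document occurrence:

```python
import math
from collections import Counter

def tfidf(docs):
    """Weight each term in each document by tf * log(N / df)."""
    n = len(docs)
    df = Counter()           # document frequency: in how many docs a term appears
    for d in docs:
        df.update(set(d))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(d).items()}
            for d in docs]

docs = [["gene", "protein", "gene"], ["protein", "pathway"], ["pathway", "cell"]]
weights = tfidf(docs)
```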
SIMILAR VOLUMES
The articles by Egghe, by Rousseau and Van Hooydonk, and by Kokol and Kokol are bibliometric in nature, while Persin et al. deals with efficient data structures for information retrieval. Wolfram examines a hypertext structure for information retrieval. Dillon and Schaap are concerned with readers'
Bookstein shows that information on usage counts and linkages between patrons and items can be retained and utilized while maintaining confidentiality. If one links the patron ID file with the historical use data by using a transforming function which computes easily from patron to circulation data,
According to Robbin and Frost-Kumpf, data produc-