Natural Language Annotation for Machine Learning: A guide to corpus-building for applications
By James Pustejovsky and Amber Stubbs
- Publisher: O'Reilly Media
- Year: 2012
- Language: English
- Pages: 97
- Edition: Early Release
- Category: Library
Synopsis
Create your own natural language training corpus for machine learning. This example-driven book walks you through the annotation cycle, from selecting an annotation task and creating the annotation specification to designing the guidelines, creating a "gold standard" corpus, and then beginning the actual data creation with the annotation process.
Systems exist for analyzing existing corpora, but making a new corpus can be extremely complex. To help you build a foundation for your own machine learning goals, this easy-to-use guide includes case studies that demonstrate four different annotation tasks in detail. You'll also learn how to use a lightweight software package for annotating texts and adjudicating the annotations.
This book is a perfect companion to O'Reilly's Natural Language Processing with Python, which describes how to use existing corpora with the Natural Language Toolkit.
Table of Contents
Natural Language Annotation for Machine Learning
Conventions Used in This Book
Safari® Books Online
From the Authors
The Importance of Language Annotation
The Layers of Linguistic Description
What is Natural Language Processing?
A Brief History of Corpus Linguistics
What is a Corpus?
Early Use of Corpora
Corpora Today
Kinds of Annotation
Language Data and Machine Learning
Structured Pattern Induction
The Annotation Development Cycle
Model the phenomenon
Annotate with the Specification
Train and Test the algorithms over the corpus
Evaluate the results
Revise the Model and Algorithms
Summary
Defining a goal
The Statement of Purpose
Refining your Goal: Informativity versus Correctness
The scope of the annotation task
Where will the corpus come from?
How will the result be achieved?
Background research
Organizations and Conferences
Assembling your dataset
Read speech
Metadata
Pre-processed data
Existing Corpora
Distributions within corpora
Summary
Some Example Models and Specs
Film genre classification
Adding Named Entities
Semantic Roles
Adopting (or not Adopting) Existing Models
Creating your own Model and Specification: Generality versus Specificity
Using Existing Models and Specifications
Using Models without Specifications
ISO Standards
Annotation format standards
Annotation specification standards
Other standards affecting annotation
Summary
Annotated corpora
Unique labels - movie reviews
Multiple labels - film genres
Text Extent Annotation: Named Entities
In-line annotation
Stand-off annotation by tokens
Stand-off annotation by character location
Linked Extent Annotation: Semantic Roles
Summary
Appendix. Bibliography
Similar Volumes
Create your own natural language training corpus for machine learning. This example-driven book walks you through the annotation cycle, from selecting an annotation task and creating the annotation specification to designing the guidelines, creating a "gold standard" corpus, and then begin…
Create your own natural language training corpus for machine learning. Whether you're working with English, Chinese, or any other natural language, this hands-on book guides you through a proven annotation development cycle: the process of adding metadata to your training corpus to help ML al…