𝔖 Scriptorium
✦   LIBER   ✦

The Web As Corpus: Theory and Practice

✍ Scribed by Maristella Gatto


Publisher
Bloomsbury Academic
Year
2014
Tongue
English
Leaves
255
Category
Library

✦ Synopsis


Is the internet a suitable linguistic corpus? How can we use it in corpus techniques? What are the special properties that we need to be aware of? This book answers those questions.
The Web is an exponentially increasing source of language and corpus linguistics data. From user-generated Web 2.0 content to gigantic static information resources, the breadth and depth of information available is breathtaking - and bewildering. This book explores the theory and practice of "web as corpus". It looks at the most common tools and methods used and features a plethora of examples based on the author's own teaching experience. This book also bridges the gap between studies in computational linguistics, which emphasize technical aspects, and studies in corpus linguistics, which focus on the implications for language theory and use.

✦ Table of Contents


Front Cover
Half Title
Title Page
Copyright
Contents
List of Figures
List of Tables
Preface
Acknowledgements
Introduction
1. Corpus Linguistics: Basic Principles
Introduction
1. Theory, approach, methods: Corpus linguistics as a research field
2. Key issues in corpus linguistics
2.1. Authenticity
2.2. Representativeness
2.3. Balance and sampling
2.4. Size
2.5. Types of corpora
3. Corpus, concordance, collocation: Tools and analysis
3.1. Corpus creation
3.2. Corpus analysis
3.2.1. Wordlists and keywords
3.2.2. Concordances
3.3. Collocation and basic statistics
3.4. Colligation and semantic associations
Conclusion
Study questions and activities
Suggestions for further reading
2. The Body and the Web: An Introduction to the Web as Corpus
Introduction
1. Corpus linguistics and the web
2. The web as corpus: A ‘body’ of texts?
3. The corpus and the web: Key issues
3.1. Authenticity
3.2. Representativeness
3.3. Size
3.4. Composition
3.4.1. Medium
3.4.2. Language
3.4.3. Topics
3.4.4. Registers, (web) genres, and text types
3.5. Copyright
4. From ‘body’ to ‘web’: New issues
4.1. Dynamism
4.2. Reproducibility
4.3. Relevance and reliability
Conclusion
Study questions and activities
Suggestions for further reading
3. Challenging Anarchy: Web Search from a Corpus Perspective
Introduction
1. The corpus and the search
2. Search engine basics: Crawling, indexing, searching, ranking
3. Google and the others: An overview of commercial search engines
4. Challenging anarchy: Mastering advanced web search
4.1. An overview of web search options
4.2. Limits and potentials of ‘webidence’
4.3. Phrase search and collocation
4.4. Phraseology and patterns
4.5. Provenance, site, domain and more: Searching subsections of the web
4.6. Testing translation candidates
5. Query complexity: Web search from a corpus perspective
Conclusion
Study questions and activities
Suggestions for further reading
4. Beyond Ordinary Search Engines: Concordancing the Web
Introduction
1. Beyond ordinary search engines: Concordancing the web
2. WebCorp Live
3. Concordancing the web in the foreign language classroom
3.1. Exploring collocation: The case of scenery
3.2. Investigating neologisms and phrasal creativity
4. Beyond web concordancing tools: The web as/for corpus
5. Towards a linguist’s search engine: The case of WebCorpLSE
5.1. The Web as/for Corpus: From WebCorp Live to WebCorpLSE
5.2. Using WebCorpLSE to explore contemporary English
5.2.1. Synchronic English Web Corpus and Diachronic English Web Corpus
5.2.2. Birmingham Blog Corpus
Conclusion
Study questions and activities
Suggestions for further reading
5. Building and Using Comparable Web Corpora: Tools and Methods
Introduction
1. Building DIY web corpora
2. From words to corpus: The ‘bootstrap’ process
2.1. Compiling a domain-specific corpus with BootCaT
2.2. Compiling specialized corpora with WebBootCaT
3. Building and using comparable web corpora for translation practice
Conclusion
Study questions and activities
Suggestions for further reading
6. Sketches of Language and Culture from Large Web Corpora
Introduction
1. From web as corpus to corpus as web: Introducing large general purpose web corpora
2. Mega–corpus, mini–Web: The case of ukWaC
2.1. Selecting ‘seed URLs’ and crawling
2.2. Post-crawl cleaning and annotation
3. Exploring large web corpora: Tools and resources
3.1. The Sketch Engine: From concordance lines to word sketches
3.2. The Sketch Difference function
4. Case study: Sketches of culture from the BNC to the web
4.1. A very difficult word…
4.2. Sketches of culture from the British National Corpus
4.3. Sketches of culture from ukWaC
4.3.1. Culture as object
4.3.2. Culture as subject
4.3.3. Modifiers of culture
4.3.4. Culture as modifier
4.3.5. The pattern culture and/or NOUN
4.4. A culture of: The changing face of culture in contemporary society
Conclusion
Study questions and activities
Suggestions for further reading
7. From Download to Upload: The Web as Corpus in the Web 2.0 Era
Introduction
1. From download to upload: Web users as prosumers
2. Web 2.0 as corpus. The case of Wikipedia as a multilingual corpus
3. The corpus in the cloud? The challenges that lie ahead
Suggestions for further reading
Conclusion
References
Index


📜 SIMILAR VOLUMES


Corpus Stylistics: Theory and Practice
✍ Dan McIntyre, Brian Walker 📂 Library 📅 2019 🏛 Edinburgh University Press 🌐 English

This theoretical and practical guide to using corpus linguistic techniques in stylistic analysis focuses on how to use off-the-shelf corpus software, such as AntConc, Wmatrix, and the Brigham Young University (BYU) corpus interface.

Corpus Stylistics: Theory and Practice
✍ Dan McIntyre, Brian Walker 📂 Library 📅 2022 🏛 Edinburgh University Press 🌐 English

A theoretical and practical guide to using corpus linguistic techniques in stylistic analysis. The use of corpora in stylistics has increased substantially in recent years but until now there has been no book detailing the theoretical basis and methodological practices of corpus stylistic

Corpus Linguistics: Method, Theory and P
✍ Tony McEnery, Andrew Hardie 📂 Library 📅 2011 🏛 Cambridge University Press 🌐 English

Corpus linguistics is the study of language data on a large scale - the computer-aided analysis of very extensive collections of transcribed utterances or written texts. This textbook outlines the basic methods of corpus linguistics, explains how the discipline of corpus linguistics developed and su

Corpus Linguistics: Method, Theory and P
✍ Tony McEnery, Andrew Hardie 📂 Library 📅 2012 🏛 Cambridge University Press 🌐 English

Corpus linguistics is the study of language data on a large scale - the computer-aided analysis of very extensive collections of transcribed utterances or written texts. This textbook outlines the basic methods of corpus linguistics, explains how the discipline of corpus linguistics developed and su