Python Web Scraping

✍ Scribed by Richard Lawson, Katharine Jarmul

Publisher: Packt Publishing
Year: 2017
Tongue: English
Leaves: 215
Edition: 2nd ed
Category: Library

✦ Synopsis

Abstract: Chapter 5: Dynamic Content; An example dynamic web page; Reverse engineering a dynamic web page; Edge cases; Rendering a dynamic web page; PyQt or PySide; Debugging with Qt; Executing JavaScript; Website interaction with WebKit; Waiting for results; The Render class; Selenium; Selenium and Headless Browsers; Summary; Chapter 6: Interacting with Forms; The Login form; Loading cookies from the web browser; Extending the login script to update content; Automating forms with Selenium; Summary; Chapter 7: Solving CAPTCHA; Registering an account; Loading the CAPTCHA image

✦ Table of Contents

Content: Cover
Credits
Copyright
About the Authors
About the Reviewers
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Introduction to Web Scraping
When is web scraping useful?
Is web scraping legal?
Python 3
Background research
Checking robots.txt
Examining the Sitemap
Estimating the size of a website
Identifying the technology used by a website
Finding the owner of a website
Crawling your first website
Scraping versus crawling
Downloading a web page
Retrying downloads
Setting a user agent
Sitemap crawler
ID iteration crawler
Link crawlers
Advanced features
Parsing robots.txt
Supporting proxies
Throttling downloads
Avoiding spider traps
Final version
Using the requests library
Summary
Chapter 2: Scraping the Data
Analyzing a web page
Three approaches to scrape a web page
Regular expressions
Beautiful Soup
Lxml
CSS selectors and your Browser Console
XPath Selectors
LXML and Family Trees
Comparing performance
Scraping results
Overview of Scraping
Adding a scrape callback to the link crawler
Summary
Chapter 3: Caching Downloads
When to use caching?
Adding cache support to the link crawler
Disk Cache
Implementing DiskCache
Testing the cache
Saving disk space
Expiring stale data
Drawbacks of DiskCache
Key-value storage cache
What is key-value storage?
Installing Redis
Overview of Redis
Redis cache implementation
Compression
Testing the cache
Exploring requests-cache
Summary
Chapter 4: Concurrent Downloading
One million web pages
Parsing the Alexa list
Sequential crawler
Threaded crawler
How threads and processes work
Implementing a multithreaded crawler
Multiprocessing crawler
Performance
Python multiprocessing and the GIL
Optical character recognition
Further improvements
Solving complex CAPTCHAs
Using a CAPTCHA solving service
Getting started with 9kw
The 9kw CAPTCHA API
Reporting errors
Integrating with registration
CAPTCHAs and machine learning
Summary
Chapter 8: Scrapy
Installing Scrapy
Starting a project
Defining a model
Creating a spider
Tuning settings
Testing the spider
Different Spider Types
Scraping with the shell command
Checking results
Interrupting and resuming a crawl
Scrapy Performance Tuning
Visual scraping with Portia
Installation
Annotation
Running the Spider

✦ Subjects

Python; Computer programming

📜 SIMILAR VOLUMES

Web Scraping with Python

📁 Web Scraping with Python

✍ Ryan Mitchell 📂 Library 📅 2015 🏛 O'Reilly Media, Inc. 🌐 English

Python Web Scraping Cookbook

📁 Python Web Scraping Cookbook

✍ Michael Heydt 📂 Library 📅 2018 🏛 Packt 🌐 English

Python Web Scraping Cookbook is a solution-focused book that will teach you techniques to develop high-performance Scrapers, and deal with cookies, hidden form fields, Ajax-based sites, proxies, and more. By the end of this book, you will be able to scrape websites more efficiently with more accurat

Web Scraping with Python

📁 Web Scraping with Python

✍ Ryan Mitchell 📂 Library 📅 2018 🏛 O'Reilly Media, Inc. 🌐 English

Python Web Scraping Cookbook

📁 Python Web Scraping Cookbook

✍ Heydt, Michael 📂 Library 📅 2018 🏛 Packt Publishing 🌐 English

Python Web Scraping Cookbook

📁 Python Web Scraping Cookbook

✍ Heydt, Michael;Zeng, Jay 📂 Library 📅 2018 🏛 Packt Publishing, Limited 🌐 English