Mining the Social Web: Data Mining Facebook, Twitter, LinkedIn, Google+, GitHub, and More

✍ Scribed by Russell, Matthew A


Publisher: O'Reilly Media
Year: 2013
Tongue: English
Leaves: 448
Edition: Second edition
Category: Library

✦ Synopsis


Facebook, Twitter, and LinkedIn generate a tremendous amount of valuable social data, but how can you find out who's making connections with social media, what they’re talking about, or where they’re located? This concise and practical book shows you how to answer these questions and more. You'll learn how to combine social web data, analysis techniques, and visualization to help you find what you've been looking for in the social haystack, as well as useful information you didn't know existed.

Each standalone chapter introduces techniques for mining data in different areas of the social Web, including blogs and email. All you need to get started is a programming background and a willingness to learn basic Python tools.

Get a straightforward synopsis of the social web landscape
Use adaptable scripts on GitHub to harvest data from social network APIs such as Twitter, Facebook, and LinkedIn
Learn how to employ easy-to-use Python tools to slice and dice the data you collect
Explore social connections in microformats with the XHTML Friends Network
Apply advanced mining techniques such as TF-IDF, cosine similarity, collocation analysis, document summarization, and clique detection
Build interactive visualizations with web technologies based upon HTML5 and JavaScript toolkits

"Data from the social Web is different: networks and text, not tables and numbers, are the rule, and familiar query languages are replaced with rapidly evolving web service APIs. Let Matthew Russell serve as your guide to working with social data sets old (email, blogs) and new (Twitter, LinkedIn, Facebook). Mining the Social Web is a natural successor to Programming Collective Intelligence: a practical, hands-on approach to hacking on data from the social Web with Python." —Jeff Hammerbacher

✦ Table of Contents


Copyright
Table of Contents
Managing Your Expectations
Python-Centric Technology
Improvements Specific to the Second Edition
Conventions Used in This Book
Using Code Examples
Safari® Books Online
Acknowledgments for the Second Edition
Acknowledgments from the First Edition
Part I. A Guided Tour of the Social Web
Prelude
Chapter 1. Mining Twitter: Exploring Trending Topics, Discovering What People Are Talking About, and More
1.2. Why Is Twitter All the Rage?
1.3.1. Fundamental Twitter Terminology
1.3.2. Creating a Twitter API Connection
1.3.3. Exploring Trending Topics
1.3.4. Searching for Tweets
1.4. Analyzing the 140 Characters
1.4.1. Extracting Tweet Entities
1.4.2. Analyzing Tweets and Tweet Entities with Frequency Analysis
1.4.3. Computing the Lexical Diversity of Tweets
1.4.4. Examining Patterns in Retweets
1.4.5. Visualizing Frequency Data with Histograms
1.5. Closing Remarks
1.6. Recommended Exercises
1.7. Online Resources
Chapter 2. Mining Facebook: Analyzing Fan Pages, Examining Friendships, and More
2.2. Exploring Facebook’s Social Graph API
2.2.1. Understanding the Social Graph API
2.2.2. Understanding the Open Graph Protocol
2.3. Analyzing Social Graph Connections
2.3.1. Analyzing Facebook Pages
2.3.2. Examining Friendships
2.5. Recommended Exercises
2.6. Online Resources
Chapter 3. Mining LinkedIn: Faceting Job Titles, Clustering Colleagues, and More
3.2. Exploring the LinkedIn API
3.2.1. Making LinkedIn API Requests
3.2.2. Downloading LinkedIn Connections as a CSV File
3.3. Crash Course on Clustering Data
3.3.1. Clustering Enhances User Experiences
3.3.2. Normalizing Data to Enable Analysis
3.3.3. Measuring Similarity
3.3.4. Clustering Algorithms
3.4. Closing Remarks
3.5. Recommended Exercises
3.6. Online Resources
Chapter 4. Mining Google+: Computing Document Similarity, Extracting Collocations, and More
4.2. Exploring the Google+ API
4.2.1. Making Google+ API Requests
4.3. A Whiz-Bang Introduction to TF-IDF
4.3.1. Term Frequency
4.3.2. Inverse Document Frequency
4.3.3. TF-IDF
4.4.1. Introducing the Natural Language Toolkit
4.4.2. Applying TF-IDF to Human Language
4.4.3. Finding Similar Documents
4.4.4. Analyzing Bigrams in Human Language
4.4.5. Reflections on Analyzing Human Language Data
4.5. Closing Remarks
4.6. Recommended Exercises
4.7. Online Resources
Chapter 5. Mining Web Pages: Using Natural Language Processing to Understand Human Language, Summarize Blog Posts, and More
5.1. Overview
5.2. Scraping, Parsing, and Crawling the Web
5.2.1. Breadth-First Search in Web Crawling
5.3. Discovering Semantics by Decoding Syntax
5.3.1. Natural Language Processing Illustrated Step-by-Step
5.3.2. Sentence Detection in Human Language Data
5.3.3. Document Summarization
5.4. Entity-Centric Analysis: A Paradigm Shift
5.4.1. Gisting Human Language Data
5.5. Quality of Analytics for Processing Human Language Data
5.7. Recommended Exercises
5.8. Online Resources
Chapter 6. Mining Mailboxes: Analyzing Who’s Talking to Whom About What, How Often, and More
6.1. Overview
6.2.1. A Primer on Unix Mailboxes
6.2.2. Getting the Enron Data
6.2.3. Converting a Mail Corpus to a Unix Mailbox
6.2.4. Converting Unix Mailboxes to JSON
6.2.5. Importing a JSONified Mail Corpus into MongoDB
6.2.6. Programmatically Accessing MongoDB with Python
6.3. Analyzing the Enron Corpus
6.3.1. Querying by Date/Time Range
6.3.2. Analyzing Patterns in Sender/Recipient Communications
6.3.3. Writing Advanced Queries
6.3.4. Searching Emails by Keywords
6.4. Discovering and Visualizing Time-Series Trends
6.5. Analyzing Your Own Mail Data
6.5.1. Accessing Your Gmail with OAuth
6.5.2. Fetching and Parsing Email Messages with IMAP
6.5.3. Visualizing Patterns in GMail with the “Graph Your Inbox” Chrome Extension
6.6. Closing Remarks
6.7. Recommended Exercises
6.8. Online Resources
Chapter 7. Mining GitHub: Inspecting Software Collaboration Habits, Building Interest Graphs, and More
7.1. Overview
7.2. Exploring GitHub’s API
7.2.1. Creating a GitHub API Connection
7.2.2. Making GitHub API Requests
7.3. Modeling Data with Property Graphs
7.4.1. Seeding an Interest Graph
7.4.2. Computing Graph Centrality Measures
7.4.3. Extending the Interest Graph with “Follows” Edges for Users
7.4.4. Using Nodes as Pivots for More Efficient Queries
7.4.5. Visualizing Interest Graphs
7.6. Recommended Exercises
7.7. Online Resources
Chapter 8. Mining the Semantically Marked-Up Web: Extracting Microformats, Inferencing over RDF, and More
8.2. Microformats: Easy-to-Implement Metadata
8.2.1. Geocoordinates: A Common Thread for Just About Anything
8.2.2. Using Recipe Data to Improve Online Matchmaking
8.2.3. Accessing LinkedIn’s 200 Million Online Résumés
8.3. From Semantic Markup to Semantic Web: A Brief Interlude
8.4. The Semantic Web: An Evolutionary Revolution
8.4.1. Man Cannot Live on Facts Alone
8.4.2. Inferencing About an Open World
8.5. Closing Remarks
8.6. Recommended Exercises
8.7. Online Resources
Part II. Twitter Cookbook
Chapter 9. Twitter Cookbook
9.1.3. Discussion
9.2.1. Problem
9.2.3. Discussion
9.3.3. Discussion
9.4.3. Discussion
9.5.3. Discussion
9.6.3. Discussion
9.7.3. Discussion
9.8.3. Discussion
9.9.1. Problem
9.9.3. Discussion
9.10.3. Discussion
9.11.3. Discussion
9.12.2. Solution
9.12.3. Discussion
9.13.3. Discussion
9.14.3. Discussion
9.15.3. Discussion
9.16.1. Problem
9.16.3. Discussion
9.17.3. Discussion
9.18.3. Discussion
9.19.3. Discussion
9.20.3. Discussion
9.21.3. Discussion
9.22.3. Discussion
9.23.2. Solution
9.23.3. Discussion
9.24.3. Discussion
9.25.3. Discussion
9.27. Recommended Exercises
9.28. Online Resources
Part III. Appendixes
Appendix A. Information About This Book’s Virtual Machine Experience
Overview
OAuth 1.0A
OAuth 2.0
Appendix C. Python and IPython Notebook Tips & Tricks
Index
About the Author

✦ Subjects


Computer Science;Programming;Science;Technology;Nonfiction;Reference;Social Science;Social Media;Business;Computers;Technical;Coding


📜 SIMILAR VOLUMES


Mining the social web: data mining Faceb
✍ Matthew A. Russell 📂 Library 📅 2013 🏛 O'Reilly Media 🌐 English

How can you tap into the wealth of social web data to discover who’s making connections with whom, what they’re talking about, and where they’re located? With this expanded and thoroughly revised edition, you’ll learn how to acquire, analyze, and summarize data from all corners of the social web, in

Mining the social web [data mining Faceb
✍ Russell, Matthew A 📂 Library 📅 2013 🏛 O'Reilly Media 🌐 English

Facebook, Twitter, LinkedIn, Google+, and other social web properties generate a wealth of valuable social data, but how can you tap into this data and discover who’s connecting with whom, which insights are lurking just beneath the surface, and what people are talking about? This book show

Mining the Social Web, 2nd Edition: Data
✍ Matthew A. Russell 📂 Library 📅 2013 🏛 O'Reilly Media 🌐 English

How can you tap into the wealth of social web data to discover who's making connections with whom, what they're talking about, and where they're located? With this expanded and thoroughly revised edition, you'll learn how to acquire, analyze, and summarize data from all corners of the social web, in

Mining the Social Web Data Mining Facebo
✍ Matthew A. Russell, Mikhail Klassen 📂 Library 📅 2019 🏛 O’Reilly Media 🌐 English

This book will teach you a few things that you’ll be thankful to learn and will add a few indispensable tools to your toolbox, but perhaps even more importantly, it will tell you a story and entertain you along the way. It’s a story about data science involving social websites, the data that’s tucke

Mining the Social Web: Analyzing Data fr
✍ Matthew A. Russell 📂 Library 📅 2011 🏛 O'Reilly Media 🌐 English

Facebook, Twitter, and LinkedIn generate a tremendous amount of valuable social data, but how can you find out who's making connections with social media, what they’re talking about, or where they’re located? This concise and practical book shows you how to answer these questions and more. Y