Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists

✍ Scribed by Janert, Philipp K


Publisher: O'Reilly Media
Year: 2010;2011
Tongue: English
Leaves: 532
Category: Library

✦ Synopsis


Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.

Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.


Use graphics to describe data with one, two, or dozens of variables
Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments
Mine data with computationally intensive methods such as simulation and clustering
Make your conclusions understandable through reports, dashboards, and other metrics programs
Understand financial calculations, including the time-value of money
Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations
Become familiar with different open source programming environments for data analysis
"Finally, a concise reference for understanding how to conquer piles of data."--Austin King, Senior Web Developer, Mozilla

"An indispensable text for aspiring data scientists."--Michael E. Driscoll, CEO/Founder, Dataspora

✦ Table of Contents


Contents
Preface
Before We Begin
Conventions Used in This Book
How to Contact Us
Acknowledgments
Data Analysis
What's in This Book
What's with the Workshops?
What's with the Math?
What You'll Need
What's Missing
Part I. Graphics: Looking at Data
Chapter 2. A Single Variable: Shape and Distribution
Dot and Jitter Plots
Histograms and Kernel Density Estimates
The Cumulative Distribution Function
Rank-Order Plots and Lift Charts
Only When Appropriate: Summary Statistics and Box Plots
Workshop: NumPy
Further Reading
Scatter Plots
Conquering Noise: Smoothing
Logarithmic Plots
Banking
Linear Regression and All That
Showing What's Important
Graphical Analysis and Presentation Graphics
Workshop: matplotlib
Further Reading
Examples
The Task
Smoothing
Don't Overlook the Obvious!
The Correlation Function
Optional: Filters and Convolutions
Workshop: scipy.signal
Further Reading
Chapter 5. More Than Two Variables: Graphical Multivariate Analysis
False-Color Plots
A Lot at a Glance: Multiplots
Composition Problems
Novel Plot Types
Interactive Explorations
Workshop: Tools for Multivariate Graphics
Further Reading
A Data Analysis Session
Workshop: gnuplot
Further Reading
Part II. Analytics: Modeling Data
Chapter 7. Guesstimation and the Back of the Envelope
Principles of Guesstimation
How Good Are Those Numbers?
Optional: A Closer Look at Perturbation Theory and Error Propagation
Workshop: The Gnu Scientific Library (GSL)
Further Reading
Models
Arguments from Scale
Mean-Field Approximations
Common Time-Evolution Scenarios
Case Study: How Many Servers Are Best?
Workshop: Sage
Further Reading
The Binomial Distribution and Bernoulli Trials
The Gaussian Distribution and the Central Limit Theorem
Power-Law Distributions and Non-Normal Statistics
Other Distributions
Optional: Case Study - Unique Visitors over Time
Workshop: Power-Law Distributions
Further Reading
Genesis
Statistics Defined
Statistics Explained
Controlled Experiments Versus Observational Studies
Optional: Bayesian Statistics - The Other Point of View
Workshop: R
Further Reading
How to Average Averages
The Standard Deviation
Least Squares
Further Reading
Part III. Computation: Mining Data
A Warm-Up Question
Monte Carlo Simulations
Resampling Methods
Workshop: Discrete Event Simulations with SimPy
Further Reading
What Constitutes a Cluster?
Distance and Similarity Measures
Clustering Methods
Pre- and Postprocessing
Other Thoughts
A Special Case: Market Basket Analysis
A Word of Warning
Workshop: Pycluster and the C Clustering Library
Further Reading
Chapter 14. Seeing the Forest for the Trees: Finding Important Attributes
Principal Component Analysis
Visual Techniques
Kohonen Maps
Workshop: PCA with R
Further Reading
Chapter 15. Intermezzo: When More Is Different
A Horror Story
Some Suggestions
What About Map/Reduce?
Workshop: Generating Permutations
Further Reading
Part IV. Applications: Using Data
Chapter 16. Reporting, Business Intelligence, and Dashboards
Business Intelligence
Corporate Metrics and Dashboards
Data Quality Issues
Workshop: Berkeley DB and SQLite
Further Reading
Chapter 17. Financial Calculations and Modeling
The Time Value of Money
Uncertainty in Planning and Opportunity Costs
Cost Concepts and Depreciation
Should You Care?
Is This All That Matters?
Workshop: The Newsvendor Problem
Further Reading
Introduction
Some Classification Terminology
Algorithms for Classification
The Process
The Secret Sauce
The Nature of Statistical Learning
Workshop: Two Do-It-Yourself Classifiers
Further Reading
Chapter 19. Epilogue: Facts Are Not Reality
Appendix A
Appendix B
Appendix C
Index

✦ Subjects


Computer Science;Programming;Reference;Science;Technology;Nonfiction;Technical;Computers;Artificial Intelligence;Mathematics


📜 SIMILAR VOLUMES


Data Analysis with Open Source Tools: A
✍ Philipp K. Janert 📂 Library 📅 2010 🏛 O'Reilly Media 🌐 English

Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a busines

Data Analysis with Open Source Tools
✍ Philipp K. Janert 📂 Library 📅 2010 🏛 O'Reilly Media 🌐 English

These days it seems like everyone is collecting data. But all of that data is just raw information -- to make that information meaningful, it has to be organized, filtered, and analyzed. Anyone can apply data analysis tools and get results, but without the right approach those results may be useless

Practical Data Analysis: Transform, mode
✍ Hector Cuesta 📂 Library 📅 2013 🏛 Packt Publishing 🌐 English

Plenty of small businesses face big amounts of data but lack the internal skills to support quantitative analysis. Understanding how to harness the power of data analysis using the latest open source technology can lead them to providing better customer service, the visualization of customer needs,

Hands-On Data Analysis with Pandas: A Py
✍ Stefanie Molin 📂 Library 📅 2021 🏛 Packt Publishing 🌐 English

Get to grips with pandas - a versatile and high-performance library for manipulating, processing, cleaning, and crunching datasets in Python. Key Features: • Perform efficient data analysis and manipulation tasks using pandas 1.x • Implement pandas in different real-world domains with the help of

Quantitative Data Analysis with Minitab:
✍ Duncan Cramer 📂 Library 📅 1996 🌐 English

Quantitative Data Analysis with Minitab explains statistical tests for Minitab users using the same formula-free, non-technical approach as the very successful SPSS version. Students are introduced to the basic commands of the package and shown how quantitative data analysis techniques can be implem