Python Data Science Essentials

✍ Scribed by Boschetti, Alberto; Massaron, Luca

Publisher: Packt Publishing
Year: 2016
Tongue: English
Edition: 2nd edition
Category: Library

✦ Synopsis

Cover -- Copyright -- Credits -- About the Authors -- About the Reviewer -- www.PacktPub.com -- Table of Contents -- Preface
Chapter 1: First Steps -- Introducing data science and Python -- Installing Python -- Python 2 or Python 3? -- Step-by-step installation -- The installation of packages -- Package upgrades -- Scientific distributions -- Anaconda -- Leveraging conda to install packages -- Enthought Canopy -- PythonXY -- WinPython -- Explaining virtual environments -- conda for managing environments -- A glance at the essential packages -- NumPy -- SciPy -- pandas -- Scikit-learn -- Jupyter -- Matplotlib -- Statsmodels -- Beautiful Soup -- NetworkX -- NLTK -- Gensim -- PyPy -- XGBoost -- Theano -- Keras -- Introducing Jupyter -- Fast installation and first test usage -- Jupyter magic commands -- How Jupyter Notebooks can help data scientists -- Alternatives to Jupyter -- Datasets and code used in the book -- Scikit-learn toy datasets -- The MLdata.org public repository -- LIBSVM data examples -- Loading data directly from CSV or text files -- Scikit-learn sample generators -- Summary
Chapter 2: Data Munging -- The data science process -- Data loading and preprocessing with pandas -- Fast and easy data loading -- Dealing with problematic data -- Dealing with big datasets -- Accessing other data formats -- Data preprocessing -- Data selection -- Working with categorical and text data -- A special type of data - text -- Scraping the Web with Beautiful Soup -- Data processing with NumPy -- NumPy's n-dimensional array -- The basics of NumPy ndarray objects -- Creating NumPy arrays -- From lists to unidimensional arrays -- Controlling the memory size -- Heterogeneous lists -- From lists to multidimensional arrays -- Resizing arrays -- Arrays derived from NumPy functions -- Getting an array directly from a file -- Extracting data from pandas -- NumPy's fast operations and computations -- Matrix operations -- Slicing and indexing with NumPy arrays -- Stacking NumPy arrays -- Summary
Chapter 3: The Data Pipeline -- Introducing EDA -- Building new features -- Dimensionality reduction -- The covariance matrix -- Principal Component Analysis (PCA) -- PCA for big data - RandomizedPCA -- Latent Factor Analysis (LFA) -- Linear Discriminant Analysis (LDA) -- Latent Semantical Analysis (LSA) -- Independent Component Analysis (ICA) -- Kernel PCA -- T-SNE -- Restricted Boltzmann Machine (RBM) -- The detection and treatment of outliers -- Univariate outlier detection -- EllipticEnvelope -- OneClassSVM -- Validation metrics -- Multilabel classification -- Binary classification -- Regression -- Testing and validating -- Cross-validation -- Using cross-validation iterators -- Sampling and bootstrapping -- Hyperparameter optimization -- Building custom scoring functions -- Reducing the grid search runtime -- Feature selection -- Selection based on feature variance -- Univariate selection -- Recursive elimination -- Stability and L1-based selection -- Wrapping everything in a pipeline -- Combining features together and chaining transformations -- Building custom transformation functions -- Summary
Chapter 4: Machine Learning -- Preparing tools and datasets -- Linear and logistic regression -- Naive Bayes -- K-Nearest Neighbors -- Nonlinear algorithms -- SVM for classification -- SVM for regression -- Tuning SVM -- Ensemble strategies -- Pasting by random samples -- Bagging with weak classifiers -- Random subspaces and random patches -- Random Forests and Extra-Trees -- Estimating probabilities from an ensemble -- Sequences of models - AdaBoost -- Gradient tree boosting (GTB) -- XGBoost -- Dealing with big data -- Creating some big datasets as examples -- Scalability with volume -- Keeping up with velocity -- Dealing with variety -- An overview of Stochastic Gradient Descent (SGD) -- Approaching deep learning -- A peek at Natural Language Processing (NLP) -- Word tokenization -- Stemming -- Word tagging -- Named Entity Recognition (NER) -- Stopwords -- A complete data science example - text classification -- An overview of unsupervised learning -- Summary
Chapter 5: Social Network Analysis -- Introduction to graph theory -- Graph algorithms -- Graph loading, dumping, and sampling -- Summary
Chapter 6: Visualization, Insights, and Results -- Introducing the basics of matplotlib -- Curve plotting -- Using panels -- Scatterplots for relationships in data -- Histograms -- Bar graphs -- Image visualization -- Selected graphical examples with pandas -- Boxplots and histograms -- Scatterplots -- Parallel coordinates -- Wrapping up matplotlib's commands -- Introducing Seaborn -- Enhancing your EDA capabilities -- Interactive visualizations with Bokeh -- Advanced data-learning representations -- Learning curves -- Validation curves -- Feature importance for RandomForests -- GBT partial dependence plots -- Creating a prediction server for ML-AAS -- Summary
Appendix: Strengthen Your Python Foundations -- Your learning list -- Lists -- Dictionaries -- Defining functions -- Classes, objects, and OOP -- Exceptions -- Iterators and generators -- Conditionals -- Comprehensions for lists and dictionaries -- Learn by watching, reading, and doing -- MOOCs -- PyCon and PyData -- Interactive Jupyter -- Don't be shy, take a real challenge
Index