This practical guide provides nearly 200 self-contained recipes to help you solve machine learning challenges you may encounter in your daily work. If you're comfortable with Python and its libraries, including pandas and scikit-learn, you'll be able to address specific problems such as loading data
Python Machine Learning Cookbook: Practical Solutions from Preprocessing to Deep Learning (draft)
Scribed by Chris Albon
- Publisher
- O'Reilly
- Year
- 2017
- Tongue
- English
- Leaves
- 170
- Category
- Library
Table of Contents
Chapter 1. 1.0 Introduction

The first step in any machine learning endeavor is to get the raw data into our system. The raw data can be held in a log file, dataset file, or database. Furthermore, we will often want to get data from multiple sources. The recipes in this chapter look at methods of loading data from a variety of sources, including CSV files and SQL databases. We also cover methods of generating simulated data with desirable properties for experimentation. Finally, while there are many ways to load data in the Python ecosystem, we will focus on using the pandas library's extensive set of methods for loading external data and scikit-learn, an open source machine learning library for Python, for generating simulated data.

1.1 Loading A Sample Dataset

Problem
You need to load a pre-existing sample dataset.

Solution
scikit-learn comes with a number of popular datasets for you to use.

    # Load scikit-learn's datasets
    from sklearn import datasets

    # Load the digits dataset
    digits =
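The preview of recipe 1.1 is cut off at the assignment to digits. As a minimal sketch of how such a load usually looks, and assuming the recipe relies on scikit-learn's load_digits function (the preview does not show which call the author actually uses), the variable names features and target below are illustrative only:

```python
# Load scikit-learn's bundled datasets module
from sklearn import datasets

# Assumption: the truncated recipe calls load_digits(); the preview does not confirm this
digits = datasets.load_digits()

# The returned object exposes the feature matrix and the target vector
features = digits.data      # one row per image, one column per pixel
target = digits.target      # the digit label for each image

# Inspect the dimensions of both arrays
print(features.shape)
print(target.shape)
```

The bundled digits dataset ships with scikit-learn itself, so this sketch needs no network access or external file.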
Chapter 2. 2.0 Introduction

Data wrangling is a broad term used, often informally, to describe the process of transforming raw data into a clean and organized format ready for further preprocessing or final use. For us, data wrangling is only one step in preprocessing our data, but it is an important step. The most common data structure used to "wrangle" data is the data frame, which can be both intuitive and incredibly versatile. Data frames are tabular, meaning that they are based on rows and columns like you would see in a spreadsheet. Here is a data frame created from data about passengers on the Titanic:

    # Load library
    import pandas as pd

    # Create URL
    url = 'https://raw.githubusercontent.com/chrisalbon/simulated_datasets/master/titanic.csv'

    # Load data
    df = pd.read_csv(url)

    # Show the first 5 rows
    df.head(5)

       Name                                 PClass  Age    Sex     Survived  SexCode
    0  Allen, Miss Elisabeth Walton         1st     29.00  female  1         1
    1  Allison, Miss Helen Loraine          1st     2.00   female  0         1
    2  Allison, Mr Hudson Joshua Creighton
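To make the rows-and-columns idea concrete without downloading the CSV, here is a small sketch that rebuilds only the two complete rows visible in the preview as an in-memory data frame. The age threshold of 18 and the variable name adults are illustrative assumptions, not part of the book's recipe:

```python
import pandas as pd

# Only the two complete rows shown in the preview; the real file has many more
df = pd.DataFrame({
    "Name": ["Allen, Miss Elisabeth Walton", "Allison, Miss Helen Loraine"],
    "PClass": ["1st", "1st"],
    "Age": [29.0, 2.0],
    "Sex": ["female", "female"],
    "Survived": [1, 0],
    "SexCode": [1, 1],
})

# A data frame is tabular: shape gives (number of rows, number of columns)
print(df.shape)

# Select rows with a boolean condition on a column (threshold chosen for illustration)
adults = df[df["Age"] >= 18]
print(adults["Name"].tolist())
```

Selecting rows by a condition on one column is the kind of spreadsheet-like operation that the tabular layout makes straightforward.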
Chapter 3. 3.0 Introduction

Quantitative data is the measurement of something, whether class size, monthly sales, or student scores. The natural way to represent these quantities is numerically (e.g. 29 students, $529,392 in sales). In this chapter, we will cover numerous strategies for transforming raw numerical data into features purpose-built for machine learning algorithms.

3.1 Rescaling A Feature

Problem
You need to rescale the values of a numerical feature to be between two values.

Solution
Use scikit-learn's MinMaxScaler to rescale a feature array:

    # Load libraries
    from sklearn import preprocessing
    import numpy as np

    # Create feature
    x = np.array([[-500.5],
                  [-100.1],
                  [0],
                  [100.1],
                  [900.9]])

    # Create scaler
    minmax_scale = preprocessing.MinMaxScaler(feature_range=(0, 1))

    # Scale feature
    x_scale = minmax_scale.fit_transform(x)

    # Show feature
    x_scale

    array([[ 0.        ],
           [ 0.28571429],
           [ 0.35714286],
           [ 0.42857143],
           [ 1.        ]])

Discussion
Rescaling is a common preprocessing task in ma
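The scaled values in recipe 3.1 can be checked by hand with the usual min-max formula, (x - min) / (max - min), stretched to the target range. The following is a plain-Python sketch of that arithmetic, not scikit-learn's implementation; the function name min_max_rescale and its parameters are hypothetical:

```python
def min_max_rescale(values, new_min=0.0, new_max=1.0):
    """Linearly map values so the smallest becomes new_min and the largest new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread to rescale
        raise ValueError("cannot rescale a feature whose values are all equal")
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


# Same feature values as in the recipe, as a flat list
feature = [-500.5, -100.1, 0.0, 100.1, 900.9]
rescaled = min_max_rescale(feature)

# Round for display, as NumPy does when printing the array
print([round(v, 8) for v in rescaled])
```

For example, the second value works out to (-100.1 + 500.5) / (900.9 + 500.5) = 400.4 / 1401.4, or about 0.2857, which agrees with the array shown in the recipe.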
SIMILAR VOLUMES
With Early Release ebooks, you get books in their earliest form--the author's raw and unedited content as he or she writes--so you can take advantage of these technologies long before the official release of these titles. You'll also receive updates when significant changes are made, new chapters ar
Vectors, matrices, and arrays -- Loading data -- Data wrangling -- Handling numerical data -- Handling categorical data -- Handling text -- Handling dates and times -- Handling images -- Dimensionality reduction using feature extraction -- Dimensionality reduction using feature selection -- Model eva
This practical guide provides nearly 200 self-contained recipes to help you solve machine learning challenges you may encounter in your daily work. If you're comfortable with Python and its libraries, including pandas and scikit-learn, you'll be able to address specific problems such as loading data