Data Analysis: A Gentle Introduction for Future Data Scientists

✍ Scribed by Graham Upton, Dan Brawn

Publisher: Oxford University Press
Year: 2023
Tongue: English
Leaves: 161
Category: Library

✦ Table of Contents

Cover
Titlepage
Copyright
Contents
Preface
1 First steps
1.1 Types of data
1.2 Sample and population
1.2.1 Observations and random variables
1.2.2 Sampling variation
1.3 Methods for sampling a population
1.3.1 The simple random sample
1.3.2 Cluster sampling
1.3.3 Stratified sampling
1.3.4 Systematic sampling
1.4 Oversampling and the use of weights
2 Summarizing data
2.1 Measures of location
2.1.1 The mode
2.1.2 The mean
2.1.3 The trimmed mean
2.1.4 The Winsorized mean
2.1.5 The median
2.2 Measures of spread
2.2.1 The range
2.2.2 The interquartile range
2.3 Boxplot
2.4 Histograms
2.5 Cumulative frequency diagrams
2.6 Step diagrams
2.7 The variance and standard deviation
2.8 Symmetric and skewed data
3 Probability
3.1 Probability
3.2 The rules of probability
3.3 Conditional probability and independence
3.4 The total probability theorem
3.5 Bayes' theorem
4 Probability distributions
4.1 Notation
4.2 Mean and variance of a probability distribution
4.3 The relation between sample and population
4.4 Combining means and variances
4.5 Discrete uniform distribution
4.6 Probability density function
4.7 The continuous uniform distribution
5 Estimation and confidence
5.1 Point estimates
5.1.1 Maximum likelihood estimation (mle)
5.2 Confidence intervals
5.3 Confidence interval for the population mean
5.3.1 The normal distribution
5.3.2 The Central Limit Theorem
5.3.3 Construction of the confidence interval
5.4 Confidence interval for a proportion
5.4.1 The binomial distribution
5.4.2 Confidence interval for a proportion (large sample case)
5.4.3 Confidence interval for a proportion (small sample)
5.5 Confidence bounds for other summary statistics
5.5.1 The bootstrap
5.6 Some other probability distributions
5.6.1 The Poisson and exponential distributions
5.6.2 The Weibull distribution
5.6.3 The chi-squared (χ²) distribution
6 Models, p-values, and hypotheses
6.1 Models
6.2 p-values and the null hypothesis
6.2.1 Two-sided or one-sided?
6.2.2 Interpreting p-values
6.2.3 Comparing p-values
6.2.4 Link with confidence interval
6.3 p-values when comparing two samples
6.3.1 Do the two samples come from the same population?
6.3.2 Do the two populations have the same mean?
7 Comparing proportions
7.1 The 2 × 2 table
7.2 Some terminology
7.2.1 Odds, odds ratios, and independence
7.2.2 Relative risk
7.2.3 Sensitivity, specificity, and related quantities
7.3 The R × C table
7.3.1 Residuals
7.3.2 Partitioning
8 Relations between two continuous variables
8.1 Scatter diagrams
8.2 Correlation
8.2.1 Testing for independence
8.3 The equation of a line
8.4 The method of least squares
8.5 A random dependent variable, Y
8.5.1 Estimation of σ²
8.5.2 Confidence interval for the regression line
8.5.3 Prediction interval for future values
8.6 Departures from linearity
8.6.1 Transformations
8.6.2 Extrapolation
8.6.3 Outliers
8.7 Distinguishing x and Y
8.8 Why 'regression'?
9 Several explanatory variables
9.1 AIC and related measures
9.2 Multiple regression
9.2.1 Two variables
9.2.2 Collinearity
9.2.3 Using a dummy variable
9.2.4 The use of multiple dummy variables
9.2.5 Model selection
9.2.6 Interactions
9.2.7 Residuals
9.3 Cross-validation
9.3.1 k-fold cross-validation
9.3.2 Leave-one-out cross-validation (LOOCV)
9.4 Reconciling bias and variability
9.5 Shrinkage
9.5.1 Standardization
9.6 Generalized linear models (GLMs)
9.6.1 Logistic regression
9.6.2 Loglinear models
10 Classification
10.1 Naive Bayes classification
10.2 Classification using logistic regression
10.3 Classification trees
10.4 The random forest classifier
10.5 k-nearest neighbours (kNN)
10.6 Support-vector machines
10.7 Ensemble approaches
10.8 Combining variables
11 Last words
Further reading
Index

📜 SIMILAR VOLUMES

📁 Exploring data : an introduction to data analysis for social scientists

✍ Catherine Marsh 📂 Library 📅 2008 🏛 Cambridge : Polity 🌐 English

xxiii, 304 pages : 25 cm

📁 Statistics for Data Scientists: An Introduction to Probability, Statistics, and Data Analysis

✍ Maurits Kaptein, Edwin van den Heuvel 📂 Library 📅 2020 🏛 Springer 🌐 English

This book provides an undergraduate introduction to analysing data for data science, computer science, and quantitative social science students. It uniquely combines a hands-on approach to data analysis – supported by numerous real data examples and reusable [R] code – with a rigorous treat

📁 Missing Data: A Gentle Introduction

✍ Patrick E. McKnight PhD, Katherine M. McKnight PhD, Souraya Sidani PhD, Aurelio 📂 Library 📅 2007 🏛 The Guilford Press 🌐 English

While most books on missing data focus on applying sophisticated statistical techniques to deal with the problem after it has occurred, this volume provides a methodology for the control and prevention of missing data. In clear, nontechnical language, the authors help the reader understand

📁 Data Management: a Gentle Introduction

✍ Bas van Gils 📂 Library 📅 2020 🏛 Van Haren Publishing 🌐 English

📁 Introduction to Data Analysis with R for Forensic Scientists

✍ James Michael Curran 📂 Library 📅 2010 🏛 CRC Press 🌐 English

Statistical methods provide a logical, coherent framework in which data from experimental science can be analyzed. However, many researchers lack the statistical skills or resources that would allow them to explore their data to its full potential. Introduction to Data Analysis with R for Forensic S

📁 Introduction to Data Visualization & Storytelling: A Guide For The Data Scientist

✍ Jose Berengueres; Ali Fenwick; Marybeth Sandell 📂 Library 📅 2019 🏛 Stokes-Hamilton 🌐 English