𝔖 Scriptorium
✦   LIBER   ✦

From Concepts to Code: Introduction to Data Science

โœ Scribed by Adam P. Tashman


Publisher: Chapman and Hall/CRC
Year: 2024
Tongue: English
Leaves: 386
Edition: 1
Category: Library

✦ Synopsis


The breadth of problems that can be solved with data science is astonishing, and this book provides the required tools and skills for a broad audience. The reader takes a journey into the forms, uses, and abuses of data and models, and learns how to critically examine each step. Python coding and data analysis skills are built from the ground up, with no prior coding experience assumed. The necessary background in computer science, mathematics, and statistics is provided in an approachable manner.

Each step of the machine learning lifecycle is discussed, from business objective planning to monitoring a model in production. This end-to-end approach supplies the broad view necessary to sidestep many of the pitfalls that can sink a data science project. Detailed examples are provided from a wide range of applications and fields, from fraud detection in banking to breast cancer classification in healthcare. The reader will learn the techniques to accomplish tasks that include predicting outcomes, explaining observations, and detecting patterns.

Improper use of data and models can introduce unwanted effects and dangers to society. A chapter on model risk provides a framework for comprehensively challenging a model and mitigating weaknesses. When data is collected, stored, and used, it may misrepresent reality and introduce bias. Strategies for addressing bias are discussed.

From Concepts to Code: Introduction to Data Science leverages content developed by the author for a full-year data science course suitable for advanced high school or early undergraduate students. This course is freely available and it includes weekly lesson plans.

✦ Table of Contents


Cover
Half Title
Title Page
Copyright Page
Dedication
Contents
Acknowledgments
Preface
Symbols
1. Introduction
1.1. What Is Data Science?
1.2. Relationships Are of Primary Importance
1.3. Modeling and Uncertainty
1.4. Pipelines
1.4.1. The Data Pipeline
1.4.2. The Data Science Pipeline
1.5. Representation
1.6. For Everyone
1.7. Target Audience
1.8. How this Book Teaches Coding
1.9. Course and Code Package
1.10. Why Isn't Data Science Typically Done with Excel?
1.11. Goals and Scope
1.12. Exercises
2. Communicating Effectively and Earning Trust
2.1. Master Yourself
2.2. Technical Competence
2.3. Know Your Audience
2.4. Tell Good Stories
2.5. State Your Needs
2.6. Assume Positive Intent
2.7. Help Others
2.8. Take Ownership
2.9. Chapter Summary
2.10. Exercises
3. Data Science Project Planning
3.1. Defining the Project Objectives
3.2. A Questionnaire for Defining the Objectives
3.3. Analytical Framing
3.4. Planning Data Collection and Usage
3.5. Data Quantity and Coverage
3.6. Sourcing Data
3.7. Chapter Summary
3.8. Exercises
4. An Overview of Data
4.1. Data Types
4.2. Statistical Data Types
4.3. Datasets and States of Data
4.4. Data Sources and Data Veracity
4.5. Data Ingestion
4.5.1. Data Velocity and Volume
4.5.2. Batch versus Streaming
4.5.3. Web Scraping and APIs
4.6. Data Integration
4.7. Levels of Data Processing
4.7.1. Trusted Zone
4.7.2. Standardizing Data
4.7.3. Natural Language Processing
4.7.4. Protecting Identity
4.7.5. Refined Zone
4.8. The Structure of Data at Rest
4.8.1. Structured Data
4.8.2. Semi-structured Data
4.8.3. Unstructured Data
4.9. Metadata
4.10. Representativeness and Bias
4.11. Data Is Never Neutral
4.12. Chapter Summary
4.13. Exercises
5. Computing Preliminaries and Setup
5.1. Hardware
5.1.1. Processor
5.1.2. Memory (RAM)
5.1.3. Storage
5.1.4. Motherboard
5.2. Software
5.2.1. Modules
5.3. I/O
5.3.1. Directories and Paths
5.3.2. File Formats
5.4. Shell, Terminal, and Command Line
5.5. Version Control
5.5.1. Git
5.5.2. GitHub
5.5.3. GitHub Setup and Course Repo Download
5.6. Exploring the Code Repo
5.7. Coding Tools
5.7.1. IDEs
5.8. Cloud Computing
5.9. Chapter Summary
5.10. Appendix: Going Further with Git and GitHub
5.10.1. Syncing with the Upstream Repository
5.10.2. Initializing a Git Repo
5.10.3. Tracking Changes
5.11. Exercises
6. Data Processing
6.1. California Wildfires
6.1.1. Running Python with the CLI
6.1.2. Setting the Relative Path
6.1.3. Variables
6.1.4. Strings
6.1.5. Importing Data
6.1.6. Text Processing
6.1.7. Getting Help
6.2. Counting Leopards
6.2.1. Extracting DataFrame Attributes
6.2.2. Subsetting
6.2.3. Creating and Appending New Columns
6.2.4. Sorting
6.2.5. Saving the DataFrame
6.3. Patient Blood Pressure
6.3.1. Data Validation
6.3.2. Imputation
6.3.3. Data Type Conversion
6.3.4. Extreme Observations
6.4. Chapter Summary
6.5. Exercises
7. Data Storage and Retrieval
7.1. Relational Databases
7.1.1. Primary Key
7.1.2. Foreign Key
7.2. SQL
7.3. Music Query: Single Table
7.4. Music Query: Multiple Tables
7.5. Houses, Lakes, and Lake Houses
7.5.1. Data Warehouse
7.5.2. Data Lake
7.5.3. Data Lakehouse
7.6. Chapter Summary
7.7. Exercises
8. Mathematics Preliminaries
8.1. Set Theory
8.2. Functions
8.3. Differential Calculus
8.4. Probability
8.5. Matrix Algebra
8.6. Chapter Summary
8.7. Exercises
9. Statistics Preliminaries
9.1. Descriptive Statistics
9.2. Inferential Statistics
9.2.1. One-Sample Test of the Mean
9.2.2. Confidence Intervals
9.3. Chapter Summary
9.4. Exercises
10. Data Transformation
10.1. Transforms for Treating Noise
10.1.1. Moving Average
10.1.2. Limiting Extreme Values
10.2. Transforms for Treating Scale
10.2.1. Order of Magnitude and Use of Logarithm
10.2.2. Standardization
10.2.3. Normalization
10.3. Transforms for Treating Data Representation
10.3.1. Count Vectorizer
10.3.2. One-Hot Encoding
10.4. Other Common Methods for Creating Predictors
10.4.1. Binarization
10.4.2. Discretization
10.4.3. Additional Common Transformations
10.4.4. Storing Transformed Data
10.5. Chapter Summary
10.6. Exercises
11. Exploratory Data Analysis
11.1. Check Fraud
11.2. World Happiness
11.3. Use and Limitations of Summary Statistics
11.4. Graphical Excellence
11.5. Chapter Summary
11.6. Exercises
12. An Overview of Machine Learning
12.1. A Simple Tool for Decision Making
12.2. Supervised Learning
12.3. Unsupervised Learning
12.4. Semi-supervised Learning
12.5. Reinforcement Learning
12.6. Generalization
12.7. Loss Functions
12.8. Hyperparameter Tuning
12.9. Metrics
12.10. Chapter Summary
12.11. Exercises
13. Modeling with Linear Regression
13.1. Mathematical Framework
13.1.1. Parameter Estimation for Linear Regression
13.2. Being Thoughtful about Predictors
13.3. Predicting Housing Prices
13.3.1. Data Splitting
13.3.2. Data Scaling
13.3.3. Model Fitting
13.3.4. Interpreting the Parameter Estimates
13.3.5. Model Performance Evaluation
13.3.6. Comparing Models
13.3.7. Calculating R-Squared
13.4. Chapter Summary
13.5. Appendix: Parameter Estimation in Matrix Form
13.6. Exercises
14. Classification with Logistic Regression
14.1. Mathematical Framework
14.1.1. Parameter Estimation for Logistic Regression
14.2. Detecting Breast Cancer
14.2.1. Interpreting the Parameter Estimates
14.2.2. Revisiting Gradient Descent
14.2.3. Model Performance Evaluation
14.3. Chapter Summary
14.4. Exercises
15. Clustering with K-Means
15.1. Clustering Concepts
15.2. K-Means
15.2.1. K-Means Hand Calculation
15.2.2. Performance Evaluation
15.3. Clustering Foods by Nutritional Value
15.4. Chapter Summary
15.5. Exercises
16. Elements of Reproducible Data Science
16.1. Sharing Code
16.2. Testing
16.3. Containers
16.3.1. Installing Docker
16.3.2. Common Docker Commands
16.3.3. Dockerizing an ML Application
16.4. Chapter Summary
16.5. Exercises
17. Model Risk
17.1. Model Documentation
17.2. Conceptual Soundness
17.3. Data and Inputs
17.4. Outcomes Analysis
17.5. Model Benchmarking
17.6. Sensitivity Analysis
17.7. Stress Testing
17.8. Ongoing Model Performance Monitoring
17.8.1. Diving Deeper on Monitoring
17.9. Case Study: Fair Lending Risk
17.9.1. Fair Lending Background
17.9.2. Numerical Example
17.10. Chapter Summary
17.11. Exercises
18. Next Steps
18.1. Building Blocks
18.2. Advanced Technique: Regularization
18.3. Advanced Machine Learning Models
18.3.1. Tree-Based Models
18.3.2. Artificial Neural Networks
18.4. Additional Languages
18.5. Resources
18.6. Applications
18.7. Final Thoughts
Bibliography
Index


📜 SIMILAR VOLUMES


From Concepts to Code: Introduction to D
โœ Adam Tashman ๐Ÿ“‚ Library ๐Ÿ“… 2024 ๐Ÿ› CRC Press ๐ŸŒ English

The breadth of problems that can be solved with data science is astonishing, and this book provides the required tools and skills for a broad audience. The reader takes a journey into the forms, uses, and abuses of data and models, and learns how to critically examine each step. Python coding and da

Beginning C# Objects: From Concepts to C
โœ Jacquie Barker ๐Ÿ“‚ Library ๐Ÿ“… 2004 ๐ŸŒ English

Learning to design objects effectively with C# is the goal of Beginning C# Objects: From Concepts to Code - a comprehensive yet approachable guide to object oriented programming using UML and today's hottest programming language, which is C#. This book is a guide for anyone wanting to learn the C# l

Beginning Java Objects: From Concepts to
โœ Jacquie Barker ๐Ÿ“‚ Library ๐Ÿ“… 2023 ๐Ÿ› Apress ๐ŸŒ English

<p>As a programming language, Java's object-oriented nature is key to creating powerful, reusable code and applications that are easy to maintain and extend.&nbsp;That being said, many people learn Java syntax without truly understanding its object-oriented roots, setting them up to fail to harness

Beginning C# Objects: From Concepts to C
โœ Jacquie Barker, Grant Palmer (auth.) ๐Ÿ“‚ Library ๐Ÿ“… 2004 ๐Ÿ› Apress ๐ŸŒ English

<p><em>Beginning C# Objects: From Concepts to Code</em> is a comprehensive, yet approachable guide for anyone interested in learning the C# language, beginning with the basics.</p><p>To begin, this book addresses the two fundamental concepts that programmers must grasp in order to write a profession

Beginning C# Objects: From Concepts to C
โœ Jacquie Barker, Grant Palmer ๐Ÿ“‚ Library ๐Ÿ“… 2004 ๐Ÿ› Apress ๐ŸŒ English

This book addresses the two fundamental concepts that programmers must grasp in order to write a professional object-oriented C# application, then introduces object terminology so you can translate an object model into C# code with ease.