The Beginner's Guide to Data Science
Scribed by Robert Ball, Brian Rague
- Publisher
- Springer
- Year
- 2022
- Tongue
- English
- Leaves
- 251
- Edition
- 1st ed. 2022
- Category
- Library
No coin nor oath required. For personal study only.
Synopsis
This book discusses the principles and practical applications of data science, addressing key topics including data wrangling, statistics, machine learning, data visualization, natural language processing and time series analysis. Detailed investigations of techniques used in the implementation of recommendation engines and the proper selection of metrics for distance-based analysis are also covered.
Utilizing numerous comprehensive code examples, figures, and tables to help clarify and illuminate essential data science topics, the authors provide an extensive treatment and analysis of real-world questions, focusing especially on the task of determining and assessing answers to these questions as expeditiously and precisely as possible. This book addresses the challenges related to uncovering the actionable insights in “big data,” leveraging database and data collection tools such as web scraping and text identification.
This book is organized into 11 chapters, structured as independent treatments of the following crucial data science topics:
- Data gathering and acquisition techniques including data creation
- Managing, transforming, and organizing data to ultimately package the information into an accessible format ready for analysis
- Fundamentals of descriptive statistics intended to summarize and aggregate data into a few concise but meaningful measurements
- Inferential statistics that allow us to infer (or generalize) trends about the larger population based only on the sample portion collected and recorded
- Metrics that measure some quantity such as distance, similarity, or error and which are especially useful when comparing one or more data observations
- Recommendation engines, a set of algorithms designed to predict (or recommend) a particular product, service, or other item of interest that a user or customer may wish to buy or use
- Machine learning implementations and associated algorithms, comprising core data science technologies with many practical applications, especially predictive analytics
- Natural Language Processing, which expedites the parsing and comprehension of written and spoken language in an effective and accurate manner
- Time series analysis, techniques to examine and generate forecasts about the progress and evolution of data over time
Data science provides the methodology and tools to accurately interpret an increasing volume of incoming information in order to discern patterns, evaluate trends, and make the right decisions. The results of data science analysis provide real world answers to real world questions. Professionals working on data science and business intelligence projects as well as advanced-level students and researchers focused on data science, computer science, business and mathematics programs will benefit from this book.
Table of Contents
Preface
Is This a Textbook or a Practitioner's Book?
Programming Examples and Images
Contents
Chapter 1: Introduction to Data Science
1.1 Superpowers
1.2 What Is Data Science?
1.3 Predicting the Future
1.4 Understand the Process by Focusing on the End
1.4.1 Actionable Insights
1.4.2 Tell Stories with Data
1.4.3 Communicate Complex Results
1.4.4 Create Consumable Predictive Products
1.4.5 Aligning Business Goals with the Data Science Process
1.5 It Is All About the Question!
1.5.1 Classification Questions
1.5.2 Anomaly Detection
1.5.3 Prediction/Forecasting
1.5.4 Clustering
1.5.5 Recommendations
1.5.6 Data Science Project Examples
1.6 Understanding vs Specific Tools
1.7 Data Science Life Cycle
1.8 Python vs R
1.9 Big Data, Data Analytics, and Data Science
Chapter 2: Data Collection
2.1 Data Creation
2.2 IRB Approval
2.3 HCI: A Case Study
2.4 Data Gathering
2.5 Databases
2.6 Downloading Data
2.7 Web Scraping
2.8 Why Web Scraping?
2.9 What Does It Really Mean to Perform Web Scraping?
2.9.1 Download a Webpage
2.9.2 Parse the Webpage
2.10 Web Scraping with BeautifulSoup and Selenium
Chapter 3: Data Wrangling
3.1 Data vs Information vs Knowledge
3.2 From Data to Information
3.3 Pandas
3.3.1 Series and Dataframe Basics
3.3.2 Dropping or Removing Data
3.3.3 Adding, Modifying Data, and Mapping
3.3.4 Changing Datatypes of Series or Columns
3.3.5 Conditionals in Dataframes and Series
3.3.6 loc and iloc Functions
3.3.7 Binning
3.3.8 Reshaping with Pivot, Pivot_Table, Groupby, Stack, Unstack, and Transpose
3.3.9 Understanding Dataframe Indexes
3.3.10 Common Statistics Functions, Counting, and Sorting
3.3.11 Different Encodings for Categorical Data
Chapter 4: Crash Course on Descriptive Statistics
4.1 Min and Max
4.2 Count
4.3 Mean
4.4 Standard Deviation
4.5 “Bell Curve” or Normal Distribution or Gaussian Distribution
4.6 Median
4.7 Quantile and Boxplots
4.8 Pandas “Describe” Function
4.9 Z-Score
4.10 Mode
4.11 Data Visualization Using Distributions
4.12 Basic Distribution Concepts
4.13 Probability
4.14 Percentile
4.15 Cumulative Distribution Function (CDF) and Probability Density Function (PDF)
4.16 Percent Point Function (PPF)
4.17 Skewness
4.18 Exponential Distribution
4.19 Poisson Distribution
4.20 Additional Distributions and Reading
4.21 Transformations
4.22 Correlation
Chapter 5: Inferential Statistics
5.1 Independent and Dependent Variables
5.2 Chi-Squared Analysis
5.3 Chi-Squared Example: Titanic Gender Example
5.4 Chi-Square Example: Titanic Age Example
5.5 Chi-Square Example: Titanic Passenger Class Example
5.6 T-test Example: Fare and Gender
5.7 ANOVA Example: Price Differences Between Passenger Classes
5.8 Two-Way ANOVA Example: How Gender and Passenger Class Together Affect Fare Price
Chapter 6: Metrics
6.1 Distance Metrics: Movies Example
6.1.1 KNN with Euclidean Distance
6.1.2 KNN with Jaccard Similarity Index
6.1.3 KNN with Weighted Jaccard Similarity Index
6.1.4 KNN with Levenshtein Distance
6.1.5 KNN with Cosine Similarity
6.1.6 Combining Metrics and Filters Together
6.1.7 Mahalanobis Distance
6.1.8 Additional Metrics
6.2 Regression Metrics: Diet Example
6.2.1 Mean Squared Error (MSE)
6.2.2 Root Mean Squared Error (RMSE)
6.2.3 Mean Absolute Error (MAE)
6.2.4 R² or R Squared: Coefficient of Determination
6.2.5 Adjusted R-Squared (R̄²)
6.3 Prediction Metrics
6.3.1 Accuracy
6.3.2 Confusion Matrix
6.3.3 Classification Report
Chapter 7: Recommendation Engines
7.1 Knowledge-Based Recommendation Engines
7.2 Content Based
7.3 Collaborative Filtering
7.4 Specialty Types
Chapter 8: Machine Learning
8.1 Machine Learning Overview and Terminology
8.2 Decision Trees
8.3 Linear Regression
8.4 Logistic Regression
8.5 SVM (Support Vector Machine)
8.6 Neural Networks
8.7 Ensemble Algorithms
8.8 Cross Validation, Hyperparameter Tuning, and Pipelining
8.9 Dimensionality Reduction and Feature Selection
8.9.1 Feature Selection with RFE (Recursive Feature Elimination)
8.9.2 Dimensionality Reduction with PCA (Principal Component Analysis)
8.9.3 Dimensionality Reduction and Feature Selection with Examples
Chapter 9: Natural Language Processing (NLP)
9.1 Bag of Words
9.2 TFIDF (Term Frequency-Inverse Document Frequency)
9.3 Naïve Bayes
9.4 Stemming, Lemmatization, and Parts of Speech
9.5 WordNet
9.6 Natural Language Understanding and Natural Language Generation
9.7 Collocations/N-Grams
9.8 Scoring Collocations
9.9 Sentiment and Emotion
Chapter 10: Time Series
10.1 Seasonality
10.2 Time Invariant, Structural Breaks, and Piecewise Analysis
10.3 Stationarity, Autocorrelation, and Partial Autocorrelation
10.4 Autoregression Models
10.5 Smoothing and Holt-Winters Method
10.6 Time Series with Neural Networks
10.7 Real-Time Analysis
10.8 Stock Market
10.9 Facebook Prophet
Chapter 11: Final Product
11.1 Presentation
11.2 Information Visualization Theory Basics
11.3 Software Engineering
SIMILAR VOLUMES
Data Science is an emerging field and all the domains are becoming more dependent on data. In this book "Beginner's Guide to Data Science", the author gives an introduction to Data Science. The book also puts forward the different real-life examples of data science, the phases involved, the
This book is a comprehensive guide for beginners to learn Python Programming, especially its application for Data Science. While the lessons in this book are targeted at the absolute beginner to programming, people at various levels of proficiency in Python, or any other programming languages can al