Applied Data Science in Tourism: Interdisciplinary Approaches, Methodologies, and Applications
Scribed by Roman Egger
- Publisher
- Springer
- Year
- 2022
- Language
- English
- Pages
- 647
- Series
- Tourism on the Verge
- Category
- Library
Synopsis
Access to large data sets has led to a paradigm shift in the tourism research landscape. Big data is enabling a new form of knowledge gain, while at the same time shaking the epistemological foundations and requiring new methods and analysis approaches. It allows for interdisciplinary cooperation between computer sciences and social and economic sciences, and complements the traditional research approaches. This book provides a broad basis for the practical application of data science approaches such as machine learning, text mining, social network analysis, and many more, which are essential for interdisciplinary tourism research. Each method is presented in principle and viewed analytically; its advantages and disadvantages are weighed up, and typical fields of application are presented. The correct methodical application is presented with a "how-to" approach, together with code examples, making the book accessible to a wider reader base including researchers, practitioners, and students entering the field.
Table of Contents
Preface
Purpose of the Book and Potential Audience
What This Book Is Not!
Features of This Book
Acknowledgments
Contents
Notes on Contributors
Abbreviations and Acronyms
Introduction: Data Science in Tourism
A Brief Introduction and the Structure of This Book
Data Science and Tourism
Data Science in Tourism and the Structure of This Book
Chapters 1-5: Theoretical Fundamentals
Chapters 6-14: Machine Learning
Chapters 15-20: Natural Language Processing
Chapters 21-26: Additional Methods
Conclusion
References
Industry Insights from Data Scientists: Q&A Session
Interview 1
Interview 2
Interview 3
Interview 4
Interview 5
Concluding Remarks
Part I: Theoretical Fundaments
AI and Big Data in Tourism
1 Introduction
2 AI, Machine Learning, and Data Science
3 AI for Big Data
3.1 AI for Problem Framing
3.2 AI for Data Gathering
3.3 AI for Data Cleaning and Preparation
3.4 AI for Data Processing
3.5 AI for Data Exploitation
4 AI and Big Data in Tourism
Further Readings
Other Sources
References
Epistemological Challenges
1 Introduction
2 Epistemological Evolution
3 Epistemological Challenges: Data Science in Tourism Research
3.1 Topic Formulation and Relevance for Academia and Industry
3.2 Data and Its Access and Collection
3.3 Data Pre-processing
3.4 Feature Engineering
3.5 Data Analysis
3.6 Model Evaluation and Model Tuning
3.7 Interpretation of Results
4 Conclusion
References
Data Science and Interdisciplinarity
1 Introduction
2 Problem Identification
3 Data Science Is an Interdisciplinary Area
4 The Importance of Core Competencies in Data Science
5 Conclusion
References
Data Science and Ethical Issues
1 Introduction
2 Ethics
2.1 Data Science and Ethics
3 Data Science Ethics Issues
3.1 Privacy
3.2 Data Validity
3.3 Algorithm Fairness and Bias
4 Big Data
5 Artificial Intelligence and Machine Learning
6 Conclusion
References
Web Scraping
1 Introduction and Theoretical Foundations
1.1 Open Data
1.2 APIs
1.3 Scraping Data
1.4 Legal Perspectives of Text and Data Mining
1.5 Typical Use Cases of Web Scraping in Tourism
1.6 BeautifulSoup
1.7 Selenium
1.8 Scrapy
2 Practical Demonstration
Further Readings and Other Sources
Blogposts
References
Part II: Machine Learning
Machine Learning in Tourism: A Brief Overview
1 Introduction and Theoretical Foundations
1.1 The Machine Learning Process
1.2 Unsupervised Learning
1.2.1 Clustering
1.2.2 Dimensionality Reduction
1.3 Supervised Learning
1.3.1 Classification
1.3.2 Regression
1.4 Reinforcement Learning
1.5 Neural Networks
1.6 Machine Learning Limitations and Challenges
1.7 Auto-ML
Further Readings and Other Sources
References
Feature Engineering
1 Introduction
1.1 Definitions
1.2 Feature Engineering Cycle
2 Combining Features
2.1 Normalization
2.2 Discretization
2.3 Missing Data
2.4 Descriptive Features
3 Reducing Features
3.1 Feature Importance
3.2 Feature Selection
4 Expanding Features
4.1 Computable Features
4.2 One-Hot Encoding
4.3 Decomposing Complex Features
4.4 External Data
5 Practical Demonstration: Airbnb Pricing
5.1 Dataset and EDA
5.2 Data Split
5.3 Feature Transformers
5.4 Indexing Categorical Features
5.5 Set Features: Amenities and Host Verifications
5.6 Decomposing Complex Features: host_since
5.7 EDA
5.8 Imputation
5.9 Base Model
5.10 First Iteration
5.11 Feature Selection: Dropping Amenities
5.12 Second Iteration
5.13 Expanding Using External Data: SkyTrain Stations
5.14 Case Study Wrap-Up
6 Conclusions
Further Readings and Other Sources
References
Clustering
1 Introduction and Theoretical Foundations
1.1 Hierarchical Cluster Analysis
1.2 Partitioning
1.3 Density-Based Spatial Clustering of Applications with Noise
1.4 Cluster Evaluation and Profiling
2 Practical Demonstration
2.1 k-Means Clustering
2.2 Hierarchical Clustering
2.2.1 Top-Down Clustering
2.2.2 Agglomerative (Bottom-Up) Clustering
2.2.3 DBSCAN
3 Research-Case
References
Dimensionality Reduction
1 Introduction and Theoretical Foundations
1.1 PCA
1.2 tSNE
1.3 UMAP
2 Practical Demonstration
Further Readings and Other Sources
References
Classification
1 Introduction and Theoretical Foundations
1.1 Motivation and Basic Concepts
1.2 Evaluation
1.2.1 Generalization Error
1.2.2 Hold-Out Method and Cross Validation
1.2.3 Hyperparameter Selection
1.2.4 Evaluation Measures
Confusion Matrix and Evaluation Measures Computed Therefrom
ROC Analysis
Categorical Cross-Entropy
1.3 Data Preprocessing
1.3.1 One-Hot Encoding
1.3.2 Feature Scaling/Normalization
1.3.3 Projection Methods
1.3.4 Missing Values Imputation
1.3.5 General Caveat
1.4 Classification Methods
1.4.1 K-Nearest Neighbor
Advantages
Disadvantages
1.4.2 Logistic Regression
Advantages
Disadvantages
1.4.3 Naïve Bayes
Advantages
Disadvantages
1.4.4 Decision Trees
Advantages
Disadvantages
1.4.5 Random Forest
Advantages
Disadvantages
1.4.6 Gradient Tree Boosting
Advantages
Disadvantages
1.4.7 Support Vector Machines
Advantages
Disadvantages
1.4.8 Artificial Neural Networks
Advantages
Disadvantages
2 Practical Demonstration
2.1 Use Case
2.2 The Data Set
2.3 Descriptive Analysis of the Raw Data Set
2.4 Aggregation of Data: Creation of User Profiles
2.5 Classification of Visitors: Model Building
2.5.1 Summary of Results
2.6 Application of the Models
Further Readings and Other Sources
References
Regression
1 Introduction and Theoretical Foundations
1.1 Motivation and Basic Concepts
1.2 Evaluation
1.3 Regression Methods
1.3.1 Linear Regression
Advantages
Disadvantages
1.3.2 Regression Trees
Advantages
Disadvantages
1.3.3 Regression Tree Ensembles
Advantages
Disadvantages
1.3.4 Support Vector Regression
Advantages
Disadvantages
1.3.5 Artificial Neural Networks
Advantages
Disadvantages
2 Practical Demonstration
2.1 Use Case
2.2 The Data
2.3 Splitting Training and Test Data
2.4 Prediction of Visitors' Turnover: Model Building
2.4.1 Summary of Results
2.5 Application of the Model
Further Readings and Other Sources
References
Hyperparameter Tuning
1 Introduction and Theoretical Foundations
1.1 Motivations
1.2 Techniques
1.2.1 Manual Search
1.2.2 Grid Search
1.2.3 Random Search
1.2.4 Bayesian Optimization
1.2.5 Genetic Algorithms
1.3 Summary
2 Practical Demonstration
2.1 Data Preprocessing and Visualization
2.2 Modelling
2.2.1 Manual Search
2.2.2 Grid Search
2.2.3 Random Search
2.2.4 Bayesian Optimization
2.2.5 Genetic Algorithms
2.2.6 Conclusion
3 Research Case
Further Readings and Other Sources
References
Model Evaluation
1 Introduction
2 Performance of Classification Models
2.1 Performance at Fixed Operating Conditions
2.1.1 Classification Accuracy
2.1.2 Recall (Sensitivity)
2.1.3 Precision
2.1.4 F1
2.1.5 Specificity
2.2 ROC Curves, P-R Curves
3 Regression
3.1 Evaluation Scores
3.1.1 Mean Square Error (MSE)
3.1.2 Root Mean Square Error (RMSE)
3.1.3 Mean Absolute Error (MAE)
3.1.4 Coefficient of Determination (R2)
4 Overfitting
4.1 Random Sampling
4.2 Cross-Validation
4.3 Leave-One-Out
5 Practical Demonstration
5.1 Confusion Matrix
5.2 ROC Curve
5.3 Lift Curve
5.4 Data Over- or Undersampling
6 Research Case
Further Readings and Other Sources
References
Interpretability of Machine Learning Models
1 Introduction and Theoretical Foundations
1.1 Introduction to Explainability
1.2 Why Are Some Models Uninterpretable?
1.3 For Whom Can Explainability Be Useful or Necessary?
1.4 Why Should One Care About the Interpretability of ML Systems?
1.4.1 Providing Trust
1.4.2 Complying to Regulations
1.4.3 Understanding Predictions
1.4.4 Creating Better Models
1.5 Explainability Frameworks
1.5.1 Model Agnostic Strategy
1.5.2 LIME
1.5.3 ELI5
1.5.4 Anchors
1.5.5 Counterfactuals
1.5.6 SHAP
1.5.7 Deep Learning
1.5.8 Cloud Platforms
1.6 Fairness and Adversarial Attacks
2 Practical Demonstration
2.1 Data Description
2.2 Data Preparation
2.3 Classification Model
2.4 Explicability
2.4.1 SHAP: Global Interpretation
2.4.2 SHAP: Local Interpretation
2.4.3 Lime
2.5 Conclusions
3 Research-Case
Further Readings and Other Sources
References
Part III: Natural Language Processing
Natural Language Processing (NLP): An Introduction
1 Introduction and Theoretical Foundations
2 Text Analysis in Tourism
3 NLP Techniques
4 Text Preparation and Pre-processing
4.1 Language Detection
4.2 Tokenisation
4.3 Lowercasing and Removal of Punctuation
4.4 Expand Contractions
4.5 Removal of Stop Words
4.6 Removal of URLs, HTML Tags, and Emotions/Emojis
4.7 Correction of Spelling
4.8 Stemming and Lemmatisation
4.9 Part of Speech Tagging (POS)
4.10 Named Entity Recognition (NER)
4.11 Feature Extraction
4.12 Visual EDA
5 Challenges of Working with Text
6 Practical Demonstration
6.1 Tips for Using Python for an NLP Study
Further Readings and Other Sources
References
Text Representations and Word Embeddings
1 Introduction and Theoretical Foundations
1.1 One Hot Encoding
1.2 Bag-of-Words (CountVectorizer)
1.3 TF-IDF
1.4 Word Embeddings
1.4.1 Word2vec
1.4.2 Doc2Vec
1.4.3 fastText
1.4.4 GloVe
1.4.5 ELMo
1.4.6 BERT
1.5 Visualization of Multidimensional Data
1.6 The Future of Embeddings
1.7 Embeddings in Tourism-Related Research
2 Practical Demonstration
2.1 BOW
2.2 TF-IDF
2.3 Word2vec
2.4 BERT
Further Readings and Other Sources
References
Sentiment Analysis
1 Introduction
2 Theoretical Foundations
3 Practical Demonstration
4 Research Case 1: Lexicon-Based Sentiment Analysis
5 Research Case 2: Machine Learning Sentiment Analysis
Further Readings and Other Sources
References
Topic Modelling
1 Introduction and Theoretical Foundations
2 Topic Modelling Approaches
2.1 Latent Dirichlet Allocation (LDA)
2.1.1 LDA Hyperparameters
2.2 Non-negative Matrix Factorisation (NMF)
2.3 Correlation Explanation (CorEX)
2.4 Top2Vec
2.5 BERTopic
3 Topic Modelling Limitations and Challenges
3.1 Evaluating and Interpreting Topics
4 Topic Modelling in Tourism Studies
5 Topic Model Toolkits and Software Solutions
6 Practical Demonstration
6.1 LDA: Data Preparation and Preprocessing
6.2 Topic Modelling with CorEx
Further Readings and Other Sources
References
Entity Matching: Matching Entities Between Multiple Data Sources
1 Introduction and Theoretical Foundations
1.1 Entity Matching Problem Statement
1.2 Entity Matching Examples in the Travel Industry
1.3 Overview of the Stages of an Entity Matching Approach
2 Practical Demonstration
2.1 Data Formatting and Pre-processing
2.2 Candidate Generation
2.3 Record Pair Comparison (Threshold-based)
2.4 Record Pair Comparison (Neural-based)
3 Summary
Further Readings and Other Sources
References
Knowledge Graphs
1 Introduction and Theoretical Foundations
1.1 Fundamentals
1.2 Modeling the Domain
2 Steps Toward Building a Tourism Knowledge Graph
2.1 Knowledge Graph Construction
2.2 Knowledge Graph Identification
2.3 Storing, Querying, and Using the Knowledge Graph
3 Practical Demonstration and How-To Guidelines
3.1 Hints and Tips
4 Research Case
Further Readings and Other Sources
References
Part IV: Additional Methods
Network Analysis
1 Introduction and Theoretical Foundations
1.1 Network Analysis in a Nutshell
2 Practical Demonstration
3 A Worked Example
4 Research-Case
Further Readings and Other Sources
References
Time Series Analysis
1 Introduction and Theoretical Foundations
2 Practical Demonstration
2.1 Research Case
2.2 Forecasting Methods
2.2.1 Seasonal Naïve
2.2.2 Single Exponential Smoothing (SES)
2.2.3 Error Trend Seasonal (ETS)
2.2.4 Forecasting Combination Method
2.3 Measures of Forecasting Accuracy
3 Results
Further Readings and Other Sources
References
Agent-Based Modelling
1 Introduction and Theoretical Foundations
1.1 Tourism as a Complex System
1.2 ABM Benefits
1.3 Background of ABM
1.4 Key ABM Features
1.4.1 Agents
1.4.2 Environment
1.4.3 System-level
1.4.4 Interactions
1.5 Challenges When Applying ABM
1.6 Tourism-related ABM
2 Practical Demonstration
2.1 Defining the Model Purpose
2.2 Conceptual Model Set-up
2.3 Model Description
2.4 Model Components
2.4.1 Agents
2.4.2 Environment
2.4.3 System-Level Variables
2.4.4 Simulated Time
2.4.5 Interactions
2.5 Software
2.6 Analysis
2.6.1 Verification
2.6.2 Validation
2.6.3 Analysing Findings
3 Research Case
3.1 Contribution of Method
3.2 Analysis
Further Readings and Other Sources
Agent-Based Modelling
Overview, Design Concepts, and Details (ODD) + Decision-making (ODD+D) Protocols
Software Selection
GIS
Analysis
References
Geographic Information System (GIS)
1 Introduction
2 Theoretical Foundations
2.1 Data
2.2 Analysis
2.3 Creating Maps
3 Practical Demonstration
Further Readings and Other Sources
Books
Websites
Tourism Applications
References
Visual Data Analysis
1 Introduction and Theoretical Foundations
1.1 Data Visualization Techniques
1.2 Data Analysis Workflow
1.3 Data Visualization in the Data Analysis Workflow
2 Demonstration
2.1 Datasets
2.2 Data Analysis Workflow
2.2.1 Discover
2.2.2 Wrangle
2.2.3 Profile
2.2.4 Model
2.2.5 Report
References
Software and Tools
1 RapidMiner (by Wolfram Höpken)
2 Orange (by Ajda Pretnar)
3 KNIME Analytics Platform (by Stefan Helfrich)
4 WEKA (by Tony C Smith)
5 SAS Viya (by Piere Paolo Ippolito)
6 BigML (by BigML)
7 Dataiku (by Laura Wiest)
8 DataRobot (by DataRobot)
References
Glossary
Index
SIMILAR VOLUMES
Big data and data science are transforming our world today in ways we could not have imagined at the beginning of the twenty-first century. The accompanying wave of innovation has sparked advances in healthcare, engineering, business, science, and human perception, among others. The tre
This book emphasizes that learning efficiency of the learners can be increased by providing personalized course materials and guiding them to attune with suitable learning paths based on their characteristics such as learning style, knowledge level, emotion, motivation, self-efficacy and many
This book offers an introduction to the topic of data science based on the visual processing of data. It deals with ethical considerations in the digital transformation and presents a process framework for the evaluation of technologies. It also explains special features and findings on the failure
The present volume contains a selection of papers that were read at the conference entitled Cognitive Approaches to English, an international event organized to mark the 30th anniversary of English studies at the Faculty of Philosophy (Josip Juraj Strossmayer University, Osijek), which was held in O