Data Classification and Incremental Clustering in Data Mining and Machine Learning

✍ Written by Sanjay Chakraborty, Sk Hafizul Islam, Debabrata Samanta

Publisher: Springer
Year: 2022
Language: English
Pages: 210
Series: EAI/Springer Innovations in Communication and Computing
Category: Library

✦ Synopsis

This book is a comprehensive, hands-on guide to the basics of data mining and machine learning, with a special emphasis on supervised and unsupervised learning methods. It stresses the new ways of thinking needed to master machine learning on the Python, R, and Java programming platforms. The book first provides an understanding of data mining, machine learning, and their applications, giving special attention to classification and clustering techniques. The authors discuss data mining and machine learning techniques with case studies and examples, and present hands-on coding examples of some well-known supervised and unsupervised learning techniques on three popular coding platforms: R, Python, and Java. The book explains some of the most popular classification techniques (K-NN, Naïve Bayes, decision tree, random forest, support vector machine, etc.), along with a basic description of artificial neural networks and deep neural networks. It is useful for professionals, students studying data mining and machine learning, and researchers in supervised and unsupervised learning techniques.

✦ Table of Contents

Preface
Acknowledgement
Contents
About the Authors
Chapter 1: Introduction to Data Mining and Knowledge Discovery
1.1 Introduction
1.2 Architecture of Data Mining System
1.2.1 Knowledge Discovery Process (KDD)
1.2.2 Nature of Data
1.2.3 Data Mining Techniques
1.2.3.1 Classification Analysis
1.2.3.2 Case Study Example
1.2.3.3 Mining Frequent Patterns and Association Rule Learning
1.2.3.4 Clustering Analysis
1.2.4 Anomaly or Outlier Detection
1.3 Regression Analysis and Prediction
1.3.1 Sequence Analysis
1.3.2 Limitations of Data Mining
1.4 Applications of Data Mining
1.5 Incremental Data Mining
1.5.1 Benefits of Incremental Data Mining
1.6 Regression Methods
1.6.1 Illustration
1.6.2 Assumptions of Linear Regression
1.7 Evaluating Model Performances
1.7.1 Conclusion
1.8 Exercise
1.9 Interview Questions
References
Chapter 2: A Brief Concept on Machine Learning
2.1 Introduction
2.1.1 Case Study
2.1.2 Interesting Definitions
2.2 Artificial Intelligence (AI) vs. Data Mining vs. ML vs. Deep Learning
2.2.1 An Illustration
2.3 How to use ML?
2.4 Types of Data and ML Algorithms
2.4.1 Supervised Learning
2.4.2 Learning Without Supervision
2.4.3 Learning That Is Semi-supervised
2.5 Conclusion
2.6 Exercise
References
Chapter 3: Supervised Learning-Based Data Classification and Incremental Clustering
3.1 Introduction
3.1.1 Case Study
3.2 Classification
3.2.1 K-nearest Neighbour Classification
3.2.2 Probabilistic Learning with Naïve Bayes Classification
3.2.3 Divide and Conquer Classification
3.2.3.1 Rule-Based Learning
3.2.3.2 Decision Tree-Based Learning
3.3 Decision Tree Pruning
3.4 Postpruning Operations
3.5 Random Forests Classification
3.6 Support Vector Machine Classification
3.7 Multi-class SVM
3.8 R Framework
3.9 Conclusion
3.10 Exercises
References
Chapter 4: Data Classification and Incremental Clustering Using Unsupervised Learning
4.1 Introduction
4.2 Literature Review
4.3 Types of Clustering
4.4 Popular Clustering Techniques
4.4.1 Flowchart of K-Means Clustering
4.4.2 Applications of K-Means Clustering
4.4.3 K-Medoids Clustering
4.5 Flowchart of DBSCAN Clustering Algorithm
4.6 Illustrations
4.7 Hierarchical vs. K-Means Clustering
4.8 Outlier Analysis
4.9 Conclusion
References
Chapter 5: Research Intention Towards Incremental Clustering
5.1 Introduction
5.2 Problem Definition
5.3 Solution Approach
5.4 Incremental Clustering
5.5 Incremental K-Means Clustering
5.5.1 Proposed Incremental K-Means Clustering Algorithm
5.5.2 Proposed Model of Incremental K-Means Clustering
5.5.3 Illustrative Examples of Incremental K-Means Clustering
5.5.4 Benefits and Applications of Incremental K-Means Clustering
5.6 Incremental DBSCAN Clustering
5.6.1 Explanation of Pseudocode
5.6.2 Proposed Model of Incremental DBSCAN Clustering
5.6.3 Benefits of Incremental DBSCAN Clustering
5.7 Effects of Cluster Metadata on Incremental Clustering
5.8 Experiment and Result Analysis
5.8.1 Air Pollution Database
5.8.2 Java Platform
5.8.3 Weka Framework
5.8.4 Explanation of the Typical K-Means Algorithm Code
5.8.5 Explanation of the Incremental K-Means Algorithm Code
5.8.6 Result Analysis
5.8.7 Clustering vs. Incremental Clustering
5.9 Conclusion
5.10 Exercise
References
Chapter 6: Real-Time Application with Data Mining and Machine Learning
6.1 Introduction
6.1.1 Data Analysis in Finance
6.1.2 Industry of Retail
6.1.3 Telecommunications Sector
6.1.4 Analysing Biological Data
6.1.5 Additional Scientific Uses
6.2 Detection of Intruders
6.2.1 Choosing a DM Methodology
6.3 Applications of Machine Learning
6.4 Incremental Clustering
6.5 Supervised Learning
6.5.1 Input Dataset Description
6.5.2 Classification Algorithms with Pseudocode on Python Environment
6.5.3 Output
6.6 Classification on Iris Dataset in R Platform
6.6.1 Input Dataset Description
6.6.2 Pseudocode
6.6.3 Process
6.6.4 Output
6.7 Unsupervised Learning
6.7.1 Partitional Clustering with K-Means Clustering
6.7.2 Pseudocode of K-Means (DB,k)
6.7.3 Cluster Means
6.7.4 Elbow Method to Find the Optimum Value of K
6.8 Density-Based Clustering
6.8.1 Pseudocode: DBSCAN (DB, epsilon, Minpts)
6.8.2 Output
6.9 Hierarchical Clustering Using Agglomerative Clustering
6.9.1 Output of Complete Linkage Method
6.10 Grid-Based Clustering
6.10.1 Coding Snapshots of the Experiment
6.11 Explanation of the Typical K-Means Algorithm Code
6.12 Applications of Classifiers for Protein-Protein Prediction Between SARS-CoV-2 and Human
6.12.1 Pseudocode (Python Platform)
6.13 Deep Multilayer Perceptron (DMLP) for Protein-Protein Prediction Between SARS-CoV-2 and Homo sapiens
6.13.1 Pseudocode (Python Platform)
6.14 Conclusion
References
Chapter 7: Feature Subset Selection Techniques with Machine Learning
7.1 Introduction
7.2 Irrelevant Variants (Without Variant Selection): A Problem
7.3 Algorithms for Machine Learning Were Utilized in This Research
7.4 Methodology for Feature Selection
7.5 Approaches for Reducing Dimension
7.5.1 Variant Picking
7.5.2 Random Forest (RF)
7.5.3 Naïve Bayes
7.6 Supervised Filter Model Based on Relevance and Redundancy
7.7 Wrapper Model Under Supervision
7.8 Future Research on Feature Selection
7.8.1 Machine Learning with Extreme Data
7.8.2 Variable Selection via the Internet
7.8.3 Deep Learning (DL) and Variable Selection
7.8.4 Selection Properties of Variant Variants
7.8.5 Searching Engine Optimisation
7.9 Classification Accuracy
7.10 Conclusion
References
Chapter 8: Data Mining-Based Variant Subset Features
8.1 Introduction
8.2 Review of the Literature
8.3 Extraction of Variants
8.4 Variant Selection and Extraction Approaches
8.5 Choosing of Variant Subset Features
8.6 Unsupervised Variable Selection
8.7 Model of an Unsupervised Filter
8.8 Unsupervised Wrapper Approach
8.9 Selection of Semi-supervised Variants
8.10 Evaluation of Variants
8.10.1 Specification for Stopping
8.10.2 Experimenting with Variable Selection Techniques
8.10.3 I-RELIEF
8.10.4 Interactive
8.10.5 Genetic Algorithm (GA)
8.11 Conclusion
References
Index

📜 SIMILAR VOLUMES

Classification, Clustering, and Data Mining Applications

📁 Classification, Clustering, and Data Mining Applications

✍ David Banks, Leanna House, Frederick R. McMorris, Phipps Arabie, Wolfgang Gaul 📂 Library 📅 2004 🏛 Springer 🌐 English

This volume describes new methods with special emphasis on classification and cluster analysis. These methods are applied to problems in information retrieval, phylogeny, medical diagnosis, microarrays, and other active research areas.

Machine Learning and Data Mining

📁 Machine Learning and Data Mining

✍ Igor Kononenko, Matjaz Kukar 📂 Library 📅 2007 🏛 Woodhead Publishing 🌐 English

Data mining is often referred to by real-time users and software solutions providers as knowledge discovery in databases (KDD). Good data mining practice for business intelligence (the art of turning raw software into meaningful information) is demonstrated by the many new techniques and development

Data Mining and Machine Learning in Cybersecurity

📁 Data Mining and Machine Learning in Cybersecurity

✍ Sumeet Dua, Xian Du 📂 Library 📅 2011 🏛 CRC Press 🌐 English

With the rapid advancement of information discovery techniques, machine learning and data mining continue to play a significant role in cybersecurity. Although several conferences, workshops, and journals focus on the fragmented research topics in this area, there has been no single interdisciplinar

Data Mining and Machine Learning in Cybersecurity

📁 Data Mining and Machine Learning in Cybersecurity

✍ Sumeet Dua, Xian Du 📂 Library 📅 2011 🏛 CRC 🌐 English

Machine Learning and Data Mining in Aerospace Technology

📁 Machine Learning and Data Mining in Aerospace Technology

✍ Aboul Ella Hassanien, Ashraf Darwish, Hesham El-Askary 📂 Library 📅 2020 🏛 Springer International Publishing 🌐 English

This book explores the main concepts, algorithms, and techniques of Machine Learning and data mining for aerospace technology. Satellites are the ‘eagle eyes’ that allow us to view massive areas of the Earth simultaneously, and can gather more data, more quickly, than tools on the ground. Consequ