Mastering Java Machine Learning
Scribed by Kamath, Dr Uday
- Publisher: Packt Publishing
- Year: 2017
- Tongue: English
- Leaves: 557
- Category: Library
Table of Contents
Cover......Page 1
Copyright......Page 3
Credits......Page 5
Foreword......Page 7
About the Authors......Page 9
About the Reviewers......Page 10
www.PacktPub.com......Page 12
Customer Feedback......Page 13
Table of Contents......Page 17
Preface......Page 29
Chapter 1: Machine Learning Review......Page 39
Machine learning – history and definition......Page 41
What is not machine learning?......Page 42
Machine learning – concepts and terminology......Page 43
Machine learning – types and subtypes......Page 47
Datasets used in machine learning......Page 50
Machine learning applications......Page 53
Practical issues in machine learning......Page 54
Process......Page 56
Machine learning – tools and datasets......Page 60
Datasets......Page 63
Summary......Page 64
Chapter 2: Practical Approach to Real-World Supervised Learning......Page 67
Formal description and notation......Page 69
Basic label analysis......Page 70
Univariate feature analysis......Page 71
Data transformation and preprocessing......Page 72
Handling missing values......Page 73
Outliers......Page 75
Discretization......Page 76
Is sampling needed?......Page 77
Undersampling and oversampling......Page 78
Training, validation, and test set......Page 79
Filter approach......Page 84
Embedded approach......Page 89
Linear Regression......Page 90
Naïve Bayes......Page 92
Logistic Regression......Page 93
Decision Trees......Page 94
K-Nearest Neighbors (KNN)......Page 97
Support vector machines (SVM)......Page 99
Ensemble learning and meta learners......Page 103
Bootstrap aggregating or bagging......Page 104
Boosting......Page 105
Model assessment......Page 106
Model evaluation metrics......Page 107
Confusion matrix and related metrics......Page 108
ROC and PRC curves......Page 109
Model comparisons......Page 110
Comparing two algorithms......Page 111
Comparing multiple algorithms......Page 113
Case Study – Horse Colic Classification......Page 114
Machine learning mapping......Page 115
Features analysis......Page 116
Weka experiments......Page 118
RapidMiner experiments......Page 121
Results, observations, and analysis......Page 130
Summary......Page 131
References......Page 133
Chapter 3: Unsupervised Machine Learning Techniques......Page 135
Issues in common with supervised learning......Page 136
Notation......Page 137
Principal component analysis (PCA)......Page 138
Random projections (RP)......Page 141
Multidimensional Scaling (MDS)......Page 142
Kernel Principal Component Analysis (KPCA)......Page 143
Manifold learning......Page 144
k-Means......Page 146
DBSCAN......Page 147
Mean shift......Page 148
Expectation maximization (EM) or Gaussian mixture modeling (GMM)......Page 150
Hierarchical clustering......Page 151
Self-organizing maps (SOM)......Page 153
Spectral clustering......Page 154
Affinity propagation......Page 156
Internal evaluation measures......Page 158
External evaluation measures......Page 160
Outlier algorithms......Page 161
Statistical-based......Page 162
Distance-based methods......Page 163
Density-based methods......Page 164
Clustering-based methods......Page 167
High-dimensional-based methods......Page 168
One-class SVM......Page 170
Supervised evaluation......Page 172
Tools and software......Page 173
Data quality analysis......Page 174
Data sampling and transformation......Page 175
Feature analysis and dimensionality reduction......Page 176
Clustering models, results, and evaluation......Page 181
Observations and clustering analysis......Page 183
Outlier models, results, and evaluation......Page 184
Summary......Page 186
References......Page 187
Chapter 4: Semi-Supervised and Active Learning......Page 191
Semi-supervised learning......Page 193
Representation, notation, and assumptions......Page 194
Self-training SSL......Page 195
Co-training SSL or multi-view SSL......Page 196
Cluster and label SSL......Page 197
Transductive graph label propagation......Page 199
Transductive SVM (TSVM)......Page 201
Tools and software......Page 203
Business problem......Page 205
Datasets and analysis......Page 206
Experiments and results......Page 208
Active learning......Page 210
Active learning approaches......Page 211
Uncertainty sampling......Page 212
Query by disagreement (QBD)......Page 213
Advantages and limitations......Page 215
How does it work?......Page 216
Advantages and limitations......Page 217
Data Collection......Page 218
Feature analysis and dimensionality reduction......Page 219
Models, results, and evaluation......Page 220
Pool-based scenarios......Page 221
Stream-based scenarios......Page 222
Analysis of active learning results......Page 224
Summary......Page 225
References......Page 226
Chapter 5: Real-Time Stream Machine Learning......Page 229
Assumptions and mathematical notations......Page 230
Basic stream processing and computational techniques......Page 231
Stream computations......Page 232
Sliding windows......Page 233
Sampling......Page 234
Concept drift and drift detection......Page 235
Partial memory......Page 236
Detection methods......Page 237
Adaptation methods......Page 239
Linear algorithms......Page 240
Non-linear algorithms......Page 243
Ensemble algorithms......Page 245
Model validation techniques......Page 249
Incremental unsupervised learning using clustering......Page 252
Partition based......Page 253
Hierarchical based and micro clustering......Page 254
Density based......Page 258
Grid based......Page 260
Validation and evaluation techniques......Page 262
Inputs and outputs......Page 267
Distance-based clustering for outlier detection......Page 268
How does it work?......Page 269
Tools and software......Page 273
Data sampling and transformation......Page 276
Models, results, and evaluation......Page 277
Supervised learning experiments......Page 278
Clustering experiments......Page 280
Outlier detection experiments......Page 281
Analysis of stream learning results......Page 284
Summary......Page 286
References......Page 287
Chapter 6: Probabilistic Graph Modeling......Page 289
Chain rule and Bayes' theorem......Page 290
Random variables, joint, and marginal distributions......Page 291
Marginal independence and conditional independence......Page 292
Distribution queries......Page 293
Graph concepts......Page 294
Graph structure and properties......Page 295
Bayesian networks......Page 296
Reasoning patterns......Page 298
Independencies, flow of influence, D-Separation, I-Map......Page 300
Inference......Page 301
Elimination-based inference......Page 302
Propagation-based techniques......Page 309
Sampling-based techniques......Page 313
Learning......Page 314
Learning parameters......Page 316
Learning structures......Page 321
Parameterization......Page 326
Independencies......Page 327
Learning......Page 329
Conditional random fields......Page 330
Tree augmented network......Page 331
Markov chains......Page 332
Hidden Markov models......Page 334
Most probable path in HMM......Page 335
Posterior decoding in HMM......Page 336
Tools and usage......Page 337
OpenMarkov......Page 338
Weka Bayesian Network GUI......Page 340
Machine learning mapping......Page 341
Feature analysis......Page 342
Models, results, and evaluation......Page 344
Analysis of results......Page 346
Summary......Page 347
References......Page 348
Chapter 7: Deep Learning......Page 351
Inputs, neurons, activation function, and mathematical notation......Page 352
Structure and mathematical notations......Page 353
Activation functions in NN......Page 354
Training neural network......Page 355
Vanishing gradients, local optimum, and slow training......Page 362
Rectified linear activation function......Page 364
Restricted Boltzmann Machines......Page 365
Autoencoders......Page 370
Unsupervised pre-training and supervised fine-tuning......Page 374
Deep feed-forward NN......Page 375
Deep Autoencoders......Page 377
Deep Belief Networks......Page 378
Deep learning with dropouts......Page 380
Sparse coding......Page 382
Convolutional Neural Network......Page 383
CNN Layers......Page 391
Recurrent Neural Networks......Page 394
Tools and software......Page 401
Feature analysis......Page 402
Basic data handling......Page 403
Multi-layer perceptron......Page 404
Convolutional Network......Page 407
Variational Autoencoder......Page 409
DBN......Page 411
Parameter search using Arbiter......Page 412
Results and analysis......Page 413
Summary......Page 414
References......Page 415
Chapter 8: Text Mining and Natural Language Processing......Page 419
NLP, subfields, and tasks......Page 421
Text clustering......Page 422
Information extraction and named entity recognition......Page 423
Word sense disambiguation......Page 424
Automating question and answers......Page 425
Text processing components and transformations......Page 426
How does it work?......Page 427
Inputs and outputs......Page 428
Stemming or lemmatization......Page 429
Local/global dictionary or vocabulary......Page 430
Lexical features......Page 431
Syntactic features......Page 432
Vector space model......Page 434
Similarity measures......Page 437
Feature selection and dimensionality reduction......Page 438
Feature selection......Page 439
Dimensionality reduction......Page 440
Text categorization/classification......Page 441
Probabilistic latent semantic analysis (PLSA)......Page 443
Clustering techniques......Page 447
Evaluation of text clustering......Page 453
Hidden Markov models for NER......Page 454
Maximum entropy Markov models for NER......Page 456
Deep learning and NLP......Page 458
Mallet......Page 462
KNIME......Page 463
Topic modeling with mallet......Page 464
Machine Learning mapping......Page 465
Data sampling and transformation......Page 466
Feature analysis and dimensionality reduction......Page 468
Models, results, and evaluation......Page 469
Summary......Page 470
References......Page 471
Chapter 9: Big Data Machine Learning – The Final Frontier......Page 475
What are the characteristics of Big Data?......Page 477
General Big Data framework......Page 478
Big Data cluster deployment frameworks......Page 479
Data acquisition......Page 483
Data storage......Page 484
Data processing and preparation......Page 487
Visualization and analysis......Page 488
H2O as Big Data Machine Learning platform......Page 489
H2O architecture......Page 490
Tools and usage......Page 492
Business problem......Page 497
Experiments, results, and analysis......Page 498
Spark architecture......Page 501
Machine Learning in MLlib......Page 504
Experiments, results, and analysis......Page 505
Real-time Big Data Machine Learning......Page 510
SAMOA as a real-time Big Data Machine Learning framework......Page 512
Machine Learning algorithms......Page 514
Tools and usage......Page 515
The future of Machine Learning......Page 516
Summary......Page 518
References......Page 519
Vector......Page 521
Transpose of a matrix......Page 522
Matrix multiplication......Page 523
Singular value decomposition (SVD)......Page 526
Bayes' theorem......Page 529
Mean......Page 530
Standard deviation......Page 531
Covariance......Page 532
Binomial distribution......Page 533
Gaussian distribution......Page 534
Error propagation......Page 535
Index......Page 537
Subjects
Computer Science; Programming
SIMILAR VOLUMES
Design, build, and deploy your own machine learning applications by leveraging key Java machine learning libraries
About This Book
- Develop a sound strategy to solve predictive modelling problems using the most popular machine learning Java libraries
- Explore a broad variet