Machine Learning in Python for Process Systems Engineering: Achieving operational excellence using process data

✍ Scribed by Ankur Kumar, Jesus Flores-Cerrillo

Year: 2022
Language: English
Pages: 352
Category: Library

✦ Synopsis

Perhaps you are reading this book because you too have been inspired by the capabilities
of machine learning and would like to use it to solve problems being faced by your
organization. However, you might be struggling to find a definitive guide that can help you
decide which specific methodology to choose among the myriad of available
methodologies. You may have come across a nice research article that showcases an
interesting process systems application of a ML method. However, you might be facing
difficulties trying to understand the intricate details of the algorithm. We won’t be surprised
if you have struggled to find a data-science book that caters to the needs of a process
systems engineer, considers unique characteristics of industrial process systems, and
uses industrial-scale process systems for illustrations. We, the authors, have been in that
phase. A process engineer will arguably find it more relevant and useful to learn principal
component analysis (PCA) by working through a process monitoring application (the most
popular application area of PCA in process industry) and learning how to compute the
monitoring metrics. Similar arguments could be made for several other popular ML
methods. There is a gap in available machine learning resources for industrial
practitioners, and this book attempts to fill that gap.

In one sense, we wrote this book for our younger selves; a book that we wish had existed
when we started experimenting with machine learning techniques. Drawing from our
years of experience in developing data-driven industrial solutions, this book has been
written with the focus on de-cluttering the world of machine learning, giving a
comprehensive exposition of ML tools that have proven useful in process industry,
providing step-by-step elucidation of implementation details, cautioning against the
pitfalls and listing various tips & tricks that we have encountered over the years, and using
datasets from industrial-scale process systems for illustrations. We strongly believe in
‘learning by doing’ and therefore we encourage the readers to work through in-chapter
illustrations as they follow along the text. For the reader’s assistance, Jupyter notebooks with
complete code implementations are available for download. We have chosen Python as
the coding language for the book as it is convenient to use, has a large collection of ML
libraries, and is the de facto standard language for ML. No prior experience with Python
is assumed. The book has been designed to teach machine learning from scratch and
upon completion, the reader will feel comfortable using ML techniques.

✦ Table of Contents

Preface
Part 1 Introduction and Fundamentals
• Chapter 1 Machine Learning for Process Systems Engineering
o 1.1 What are Process Systems
▪ 1.1.1 Characteristics of process data
o 1.2 What is Machine Learning
▪ 1.2.1 Machine learning workflow
▪ 1.2.2 Type of machine learning systems
o 1.3 Machine Learning Applications in Process Industry
▪ 1.3.1 Decision hierarchy levels in a process plant
▪ 1.3.2 Application areas
o 1.4 ML Solution Deployment
o 1.5 The Future of Process Data Science
• Chapter 2 The Scripting Environment
o 2.1 Introduction to Python
o 2.2 Introduction to Spyder and Jupyter
o 2.3 Python Language: Basics
o 2.4 Scientific Computing Packages: Basics
▪ 2.4.1 Numpy
▪ 2.4.2 Pandas
o 2.5 Typical ML Script
• Chapter 3 Machine Learning Model Development: Workflow and Best Practices
o 3.1 ML Model Development Workflow
o 3.2 Data Pre-processing: Data Transformation
▪ 3.2.1 (Robust) Data centering & scaling
▪ 3.2.2 Feature extraction
▪ 3.2.3 Feature engineering
▪ 3.2.4 Workflow automation via pipelines
o 3.3 Model Evaluation
▪ 3.3.1 Regression metrics
▪ 3.3.2 Classification metrics
▪ 3.3.3 Holdout method / cross-validation
▪ 3.3.4 Residual analysis
o 3.4 Model Tuning
▪ 3.4.1 Overfitting & underfitting
▪ 3.4.2 Train/validation/test split
▪ 3.4.3 K-fold cross-validation
▪ 3.4.4 Regularization
▪ 3.4.5 Hyperparameter optimization via GridSearchCV
• Chapter 4 Data Pre-processing: Cleaning Process Data
o 4.1 Signal De-noising
▪ 4.1.1 Moving window average filter
▪ 4.1.2 SG filter
o 4.2 Variable Selection/Feature Selection
▪ 4.2.1 Filter methods
▪ 4.2.2 Wrapper methods
▪ 4.2.3 Embedded methods
o 4.3 Outlier Handling
▪ 4.3.1 Univariate methods
▪ 4.3.2 Multivariate methods
▪ 4.3.3 Data-mining methods
o 4.4 Handling Missing Data
Part 2 Classical Machine Learning Methods
• Chapter 5 Dimension Reduction and Latent Variable Methods (Part 1)
o 5.1 PCA: An Introduction
▪ 5.1.1 Mathematical background
▪ 5.1.2 Dimensionality reduction for polymer manufacturing process
o 5.2 Process Monitoring via PCA for Polymer Manufacturing Process
▪ 5.2.1 Process monitoring/fault detection indices
▪ 5.2.2 Fault detection
▪ 5.2.3 Fault diagnosis
o 5.3 Variants of Classical PCA
▪ 5.3.1 Dynamic PCA
▪ 5.3.2 Multiway PCA
▪ 5.3.3 Kernel PCA
o 5.4 PLS: An Introduction
▪ 5.4.1 Mathematical background
o 5.5 Soft Sensing via PLS for Pulp & Paper Manufacturing Process
o 5.6 Process Monitoring via PLS for Polyethylene Manufacturing Process
▪ 5.6.1 Fault detection indices
▪ 5.6.2 Fault detection
o 5.7 Variants of Classical PLS
• Chapter 6 Dimension Reduction and Latent Variable Methods (Part 2)
o 6.1 ICA: An Introduction
▪ 6.1.1 Mathematical background
▪ 6.1.2 Complex chemical process: Tennessee Eastman Process
▪ 6.1.3 Deciding number of ICs
o 6.2 Process Monitoring via ICA for Tennessee Eastman Process
▪ 6.2.1 Fault detection indices
▪ 6.2.2 Fault detection
o 6.3 FDA: An Introduction
▪ 6.3.1 Mathematical background
▪ 6.3.2 Dimensionality reduction for Tennessee Eastman Process
o 6.4 Fault Classification via FDA for Tennessee Eastman Process
• Chapter 7 Support Vector Machines & Kernel-based Learning
o 7.1 SVMs: An Introduction
▪ 7.1.1 Mathematical background
▪ 7.1.2 Hard margin vs soft margin classification
o 7.2 The Kernel Trick for Nonlinear Data
▪ 7.2.1 Mathematical background
o 7.3 SVDD: An Introduction
▪ 7.3.1 Mathematical background
▪ 7.3.2 OC-SVM vs SVDD
▪ 7.3.3 Bandwidth parameter and SVDD illustration
o 7.4 Process Fault Detection via SVDD
o 7.5 SVR: An Introduction
▪ 7.5.1 Mathematical background
o 7.6 Soft Sensing via SVR in a Polymer Processing Plant
o 7.7 Soft Sensing via SVR for Debutanizer Column in a Petroleum Refinery
• Chapter 8 Finding Groups in Process Data: Clustering & Mixture Modeling
o 8.1 Clustering: An Introduction
▪ 8.1.1 Multimode semiconductor manufacturing process
o 8.2 Centroid-based Clustering: K-Means
▪ 8.2.1 Determining the number of clusters via elbow method
▪ 8.2.2 Silhouette analysis for quantifying clusters quality
▪ 8.2.3 Pros and cons
o 8.3 Density-based Clustering: DBSCAN
▪ 8.3.3 Pros and cons
o 8.4 Probabilistic Clustering: Gaussian Mixtures
▪ 8.4.1 Mathematical background
▪ 8.4.2 Determining the number of clusters
o 8.5 Multimode Process Monitoring via GMM for Semiconductor Manufacturing Process
▪ 8.5.1 Fault detection indices
▪ 8.5.2 Fault detection
• Chapter 9 Decision Trees & Ensemble Learning
o 9.1 Decision Trees: An Introduction
▪ 9.1.1 Mathematical background
o 9.2 Random Forests: An Introduction
▪ 9.2.1 Mathematical background
o 9.3 Soft Sensing via Random Forest in Concrete Construction Industry
▪ 9.3.1 Feature importances
o 9.4 Introduction to Ensemble Learning
▪ 9.4.1 Bagging
▪ 9.4.2 Boosting
o 9.5 Effluent Quality Prediction in Wastewater Treatment Plant via XGBoost
• Chapter 10 Other Useful Classical ML Techniques
o 10.1 KDE: An Introduction
▪ 10.1.1 Mathematical background
▪ 10.1.2 Deciding KDE hyperparameters
o 10.2 Determining Monitoring Metric Control Limit via KDE
o 10.3 kNN: An Introduction
▪ 10.3.1 Mathematical background
▪ 10.3.2 Deciding kNN hyperparameters
▪ 10.3.3 Applications of kNN for process systems
o 10.4 Process Fault Detection via kNN for Semiconductor Manufacturing Process
o 10.5 Combining ML Techniques
Part 3 Artificial Neural Networks & Deep Learning
• Chapter 11 Feedforward Neural Networks
o 11.1 ANN: An Introduction
▪ 11.1.1 Deep learning
▪ 11.1.2 TensorFlow
o 11.2 Process Modeling via FFNN for Combined Cycle Power Plant
o 11.3 Mathematical Background
▪ 11.3.1 Activation functions
▪ 11.3.2 Loss functions & cost functions
▪ 11.3.3 Gradient descent optimization
▪ 11.3.4 Epochs & batch-size
▪ 11.3.5 Backpropagation
▪ 11.3.6 Vanishing/Exploding gradients
o 11.4 Nonlinearity in Neural Nets (Width vs Depth)
o 11.5 Neural Net Hyperparameter Optimization
o 11.6 Strategies for Improved Network Training
▪ 11.6.1 Early stopping
▪ 11.6.2 Regularization
▪ 11.6.3 Initialization
▪ 11.6.4 Batch normalization
o 11.7 Soft Sensing via FFNN for Debutanizer Column in a Petroleum Refinery
o FFNN Modeling Guidelines
• Chapter 12 Recurrent Neural Networks
o 12.1 RNN: An Introduction
▪ 12.1.1 RNN outputs
▪ 12.1.2 LSTM networks
o 12.2 System Identification via LSTM RNN for SISO Heater System
o 12.3 Mathematical Background
o 12.4 Stacked/Deep RNNs
o 12.5 Fault Classification via LSTM for Tennessee Eastman Process
o 12.6 Predictive Maintenance using LSTM Networks
▪ 12.6.1 Failure prediction using LSTM
▪ 12.6.2 Remaining useful life (RUL) prediction using LSTM
• Chapter 13 Reinforcement Learning
o 13.1 Reinforcement Learning: An Introduction
▪ 13.1.1 RL for process control
o 13.2 RL Terminology & Mathematical Concepts
▪ 13.2.1 Environment and Markov decision process
▪ 13.2.2 Reward and return
▪ 13.2.3 Policy
▪ 13.2.4 Value function
▪ 13.2.5 Bellman equation
o 13.3 Fundamentals of Q-learning
o 13.4 Deep RL & Actor-Critic Framework
▪ 13.4.1 Deep Q-learning
▪ 13.4.2 Policy gradient methods
▪ 13.4.3 Actor-Critic framework
o 13.5 Deep Deterministic Policy Gradient (DDPG)
▪ 13.5.1 Replay memory
▪ 13.5.2 Target networks
▪ 13.5.3 OU process as exploration noise
o 13.6 DDPG RL Agent as Level Controller
Part 4 Deploying ML Solutions Over Web
• Chapter 14 Process Monitoring Web Application
o 14.1 Process Monitoring Web App: Introduction
o 14.2 A Simple ‘Hello World’ Web App
o 14.3 Embedding ML Models into Web Apps
o 14.4 Building Front-end User Interface
Appendix
Dataset Descriptions

📜 SIMILAR VOLUMES

📁 Machine Learning in Python for Dynamic Process Systems

✍ Ankur Kumar, Jesus Flores-Cerrillo 📂 Library 📅 2023 🏛 Leanpub 🌐 English

This book provides comprehensive coverage of Machine Learning (ML) methods that have proven useful in process industry for dynamic process modeling. Step-by-step instructions, supported with industry-relevant case studies, show (using Python) how to develop solutions for process modeling, process

📁 Data Processing and Reconciliation for Chemical Process Operations: Volume 2 (Process Systems Engineering)

✍ José A. Romagnoli; Mabel Cristina Sanchez 📂 Library 📅 1999 🌐 English

Computer techniques have made online measurements available at every sampling period in a chemical process. However, measurement errors are introduced that require suitable techniques for data reconciliation and improvements in accuracy. Reconciliation of process data and reliable monitoring are ess

📁 Data Processing and Reconciliation for Chemical Process Operations: Volume 2 (Process Systems Engineering)

✍ José Alberto Romagnoli, Mabel Cristina Sánchez 📂 Library 📅 2000 🏛 Academic Press 🌐 English

📁 Dirty Data Processing for Machine Learning

✍ Zhixin Qi, Hongzhi Wang, Zejiao Dong 📂 Library 📅 2023 🏛 Springer 🌐 English

In both the database and machine learning communities, data quality has become a serious issue which cannot be ignored. In this context, we refer to data with quality problems as “dirty data.” Clearly, for a given data mining or machine learning task, dirty data in both training and test da

📁 Scala for Machine Learning - Second Edition: Build systems for data processing, machine learning, and deep learning

✍ Patrick R. Nicolas 📂 Library 🏛 Packt Publishing 🌐 English

Key Features: Explore a broad variety of data processing, machine learning, and genetic algorithms through diagrams, mathematical formulation, and updated source code in Scala. Take your expertise in Scala programming to the ne
