Big Data in Omics and Imaging: Association Analysis addresses the recent development of association analysis and machine learning for both population and family genomic data in the sequencing era. It is unique in that it presents both hypothesis testing and a data mining approach to
Big data in omics and imaging: association analysis
✍ Scribed by Akey, Joshua M.; Xiong, Momiao
- Publisher
- Taylor & Francis;CRC Press
- Year
- 2018
- Language
- English
- Pages
- 701
- Series
- Chapman and Hall/CRC mathematical & computational biology series
- Category
- Library
✦ Table of Contents
Cover......Page 1
Half Title......Page 2
Series Page......Page 3
Title Page......Page 6
Copyright Page......Page 7
Dedication......Page 8
Table of Contents......Page 10
Preface......Page 26
Author......Page 32
1.1 Sparsity-Inducing Norms, Dual Norms, and Fenchel Conjugate......Page 34
1.1.1 “Entrywise” Norms......Page 37
1.1.2.1 l1/l2 Norm......Page 38
1.1.3 Overlapping Groups......Page 39
1.1.4 Dual Norm......Page 41
1.1.4.1 The Norm Dual to the Group Norm......Page 42
1.1.5 Fenchel Conjugate......Page 43
1.1.6 Fenchel Duality......Page 46
1.2 Subdifferential......Page 49
1.2.1 Definition of Subgradient......Page 50
1.2.3 Calculus of Subgradients......Page 51
1.2.3.4 Pointwise Maximum......Page 52
1.2.3.6 Expectation......Page 54
1.2.3.8 Subdifferential of the Norm......Page 55
1.2.3.9 Optimality Conditions: Unconstrained......Page 56
1.2.3.10 Application to Sparse Regularized Convex Optimization Problems......Page 57
1.3.1 Introduction......Page 59
1.3.2.1 Definition of Proximal Operator......Page 60
1.3.3.1 Separable Sum......Page 61
1.3.3.2 Moreau–Yosida Regularization......Page 66
1.3.4 Proximal Algorithms......Page 69
1.3.4.2 Proximal Gradient Method......Page 70
1.3.4.3 Accelerated Proximal Gradient Method......Page 71
1.3.4.4 Alternating Direction Method of Multipliers......Page 72
1.3.4.5 Linearized ADMM......Page 74
1.3.5.1 Generic Function......Page 75
1.3.5.2 Norms......Page 83
1.4.1 Derivative of a Function with Respect to a Vector......Page 88
1.4.2 Derivative of a Function with Respect to a Matrix......Page 89
1.4.3 Derivative of a Matrix with Respect to a Scalar......Page 90
1.4.4 Derivative of a Matrix with Respect to a Matrix or a Vector......Page 91
1.4.6.1 Vector Function of Vectors......Page 92
1.4.7.1 Determinants......Page 93
1.4.7.3 Trace......Page 94
1.5 Functional Principal Component Analysis (FPCA)......Page 96
1.5.1.1 Least Square Formulation of PCA......Page 97
1.5.1.2 Variance-Maximization Formulation of PCA......Page 98
1.5.2.1 Calculus of Variation......Page 101
1.5.2.2 Stochastic Calculus......Page 102
1.5.3 Unsmoothed Functional Principal Component Analysis......Page 104
1.5.4 Smoothed Principal Component Analysis......Page 106
1.5.5 Computations for the Principal Component Function and the Principal Component Score......Page 108
1.6.1 Mathematical Formulation of Canonical Correlation Analysis......Page 110
1.6.2 Correlation Maximization Techniques for Canonical Correlation Analysis......Page 111
1.6.3 Singular Value Decomposition for Canonical Correlation Analysis......Page 115
1.6.4 Test Statistics......Page 116
1.6.5 Functional Canonical Correlation Analysis......Page 120
Appendix 1A......Page 123
Exercises......Page 125
2.1 Concepts of Linkage Disequilibrium......Page 128
2.2.1 Linkage Disequilibrium Coefficient D......Page 129
2.2.3 Correlation Coefficient r......Page 130
2.2.4 Composite Measure of Linkage Disequilibrium......Page 134
2.2.5 Relationship Between the Measure of LD and Physical Distance......Page 135
2.3 Haplotype Reconstruction......Page 136
2.3.3 Bayesian and Coalescence-Based Methods......Page 137
2.4.1 Mutual Information Measure of LD......Page 138
2.4.2 Multi-Information and Multilocus Measure of LD......Page 140
2.4.3 Joint Mutual Information and a Measure of LD between a Marker and a Haplotype Block or between Two Haplotype Blocks......Page 142
2.4.4 Interaction Information......Page 145
2.4.5 Conditional Interaction Information......Page 147
2.4.7 Distribution of Estimated Mutual Information, Multi-Information, and Interaction Information......Page 148
2.5.1 Association Measure between Two Genomic Regions Based on CCA......Page 152
2.5.2 Relationship between Canonical Correlation and Joint Information......Page 155
Bibliographical Notes......Page 156
Appendix 2A......Page 157
Appendix 2B......Page 158
Appendix 2C......Page 159
Exercises......Page 161
3.1.1 Introduction......Page 164
3.1.2 The Hardy–Weinberg Equilibrium......Page 166
3.1.3 Genetic Models......Page 169
3.1.4 Odds Ratio......Page 172
3.1.5.1 Contingency Tables......Page 176
3.1.5.2 Fisher’s Exact Test......Page 179
3.1.5.3 The Traditional χ² Test Statistic......Page 180
3.1.6 Multimarker Association Analysis......Page 183
3.1.6.1 Generalized T² Test Statistic......Page 184
3.1.6.2 The Relationship between the Generalized T² Test and Fisher’s Discriminant Analysis......Page 185
3.2 Population-Based Multivariate Association Analysis for Next-Generation Sequencing......Page 187
3.2.1.1 Collapsing Method......Page 188
3.2.1.2 Combined Multivariate and Collapsing Method......Page 189
3.2.1.3 Weighted Sum Method......Page 190
3.2.2.1 Score Function......Page 191
3.2.2.2 Score Tests......Page 193
3.2.3.1 Weighted Function Method......Page 194
3.2.3.2 Sum Test and Adaptive Association Test......Page 197
3.2.3.3 The Sum Test......Page 198
3.2.4.1 Logistic Mixed Effects Models for Association Analysis......Page 200
3.2.4.2 Sequencing Kernel Association Test......Page 210
3.3 Population-Based Functional Association Analysis for Next-Generation Sequencing......Page 211
3.3.1 Introduction......Page 212
3.3.2.1 Model and Principal Component Functions......Page 213
3.3.2.2 Computations for the Principal Component Function and the Principal Component Score......Page 215
3.3.2.3 Test Statistic......Page 217
3.3.3 Smoothed Functional Principal Component Analysis for Association Test......Page 219
3.3.3.1 A General Framework for the Smoothed Functional Principal Component Analysis......Page 220
3.3.3.2 Computations for the Smoothed Principal Component Function......Page 221
3.3.3.4 Power Comparisons......Page 223
3.3.3.5 Application to Real Data Examples......Page 226
Appendix 3A: Fisher Information Matrix for γ......Page 229
Appendix 3B: Variance Function v(µ)......Page 231
Appendix 3C: Derivation of Score Function for Uτ......Page 232
Appendix 3D: Fisher Information Matrix of PQL......Page 233
Appendix 3E: Scoring Algorithm......Page 235
Appendix 3F: Equivalence between Iteratively Solving Linear Mixed Model and Iteratively Solving the Normal Equation......Page 236
Appendix 3G: Equation Reduction......Page 237
Exercises......Page 240
4.1.2.1 Variation Partition......Page 244
4.1.2.2 Genetic Additive and Dominance Effects......Page 246
4.1.2.3 Genetic Variance......Page 248
4.1.3 Linear Regression for a Quantitative Trait......Page 249
4.1.4 Multiple Linear Regression for a Quantitative Trait......Page 253
4.2.1.1 Model......Page 256
4.2.1.2 Parameter Estimation......Page 257
4.2.1.3 Test Statistics......Page 262
4.2.2.1 Multivariate Canonical Correlation Analysis......Page 264
4.3.1.1 Kernel and Nonlinear Feature Mapping......Page 266
4.3.1.2 The Reproducing Kernel Hilbert Space......Page 270
4.3.2.1 Hilbert–Schmidt Operator and Norm......Page 277
4.3.2.2 Tensor Product Space and Rank-One Operator......Page 279
4.3.2.3 Cross-Covariance Operator......Page 283
4.3.2.4 Dependence Measure and Covariance Operator......Page 287
4.3.2.5 Dependence Measure and Hilbert–Schmidt Norm of Covariance Operator......Page 288
4.3.2.6 Kernel-Based Association Tests......Page 290
4.4.1 Power Evaluation......Page 293
4.4.2 Application to Real Data Examples......Page 294
Software Package......Page 297
Appendix 4A: Convergence of the Least Square Estimator of the Regression Coefficients......Page 300
Appendix 4B: Convergence of Regression Coefficients in the Functional Linear Model......Page 305
Appendix 4D: Solution to the Constrained Nonlinear Covariance Optimization Problem and Dependence Measure......Page 308
Exercises......Page 311
5.1 Pleiotropic Additive and Dominance Effects......Page 314
5.2.1 Models......Page 316
5.2.2.1 Least Square Estimation......Page 317
5.2.2.2 Maximum Likelihood Estimator......Page 322
5.2.3.1 Classical Null Hypothesis......Page 327
5.2.3.2 The Multivariate General Linear Hypothesis......Page 328
5.2.3.3 Estimation of the Parameter Matrix under Constraints......Page 329
5.2.3.4 Multivariate Analysis of Variance (MANOVA)......Page 330
5.2.3.5 Other Multivariate Test Statistics......Page 331
5.3.1 Multivariate Multiple Linear Regression Models......Page 337
5.3.2 Multivariate Functional Linear Models for Gene-Based Genetic Analysis of Multiple Phenotypes......Page 339
5.3.2.1 Parameter Estimation......Page 340
5.3.2.2 Null Hypothesis and Test Statistics......Page 341
5.3.2.3 Other Multivariate Test Statistics......Page 342
5.3.2.5 F Approximation to the Distribution of Three Test Statistics......Page 343
5.4.1 Multivariate Canonical Correlation Analysis (CCA)......Page 344
5.4.2 Kernel CCA......Page 345
5.4.3 Functional CCA......Page 347
5.4.4 Quadratically Regularized Functional CCA......Page 350
5.5 Dependence Measure and Association Tests of Multiple Traits......Page 352
5.6.1 Principal Component Analysis......Page 354
5.6.2 Kernel Principal Component Analysis......Page 355
5.6.3 Quadratically Regularized PCA or Kernel PCA......Page 358
5.7.1 Sum of Squared Score Test......Page 359
5.7.2 Unified Score-Based Association Test (USAT)......Page 361
5.7.4 FPCA-Based Kernel Measure Test of Independence......Page 362
5.8 Connection between Statistics......Page 363
5.9.1 Type 1 Error Rate and Power Evaluation......Page 368
5.9.2 Application to Real Data Example......Page 369
Appendix 5A Optimization Formulation of Kernel CCA......Page 370
Appendix 5B Derivation of the Regression Coefficient Matrix in the Functional Linear Model, Sum of Squares due to Regression, and RFCCA Matrix......Page 372
Exercises......Page 373
Chapter 6: Family-Based Association Analysis......Page 376
6.1.1 Kinship Coefficients......Page 377
6.1.2 Identity Coefficients......Page 380
6.1.3 Relation between Identity Coefficients and Kinship Coefficients......Page 381
6.1.4.1 A General Framework for Identity by Descent......Page 383
6.1.4.2 Kinship Matrix or Genetic Relationship Matrix in the Homogeneous Population......Page 385
6.1.4.3 Kinship Matrix or Genetic Relationship Matrix in the General Population......Page 386
6.1.4.4 Coefficient of Fraternity......Page 390
6.2.1 Assumptions and Genetic Models......Page 391
6.2.2 Analysis for Genetic Covariance between Relatives......Page 392
6.3.1.1 Single Random Variable......Page 395
6.3.1.2 Multiple Genetic Random Effects......Page 398
6.3.2.1 Mixed Linear Model......Page 399
6.3.2.2 Estimating Fixed and Random Effects......Page 400
6.3.3.1 ML Estimation of Variance Components......Page 403
6.3.3.2 Restricted Maximum Likelihood Estimation......Page 406
6.3.3.3 Numerical Solutions to the ML/REML Equations......Page 407
6.3.3.4 Fisher Information Matrix for the ML Estimators......Page 410
6.3.3.5 Expectation/Maximization (EM) Algorithm for ML Estimation......Page 411
6.3.3.6 Expectation/Maximization (EM) Algorithm for REML Estimation......Page 415
6.3.4 Hypothesis Test in Mixed Linear Models......Page 416
6.3.5.1 Sequence Kernel Association Test (SKAT)......Page 420
6.4.1 Mixed Functional Linear Models (Type 1)......Page 423
6.4.2 Mixed Functional Linear Models (Type 2: Functional Variance Component Models)......Page 426
6.5.1 Multivariate Mixed Linear Model......Page 428
6.5.2 Maximum Likelihood Estimate of Variance Components......Page 431
6.5.3 REML Estimate of Variance Components......Page 432
6.6.1.1 Definition of Narrow-Sense Heritability......Page 433
6.6.1.2 Mixed Linear Model for Heritability Estimation......Page 434
6.6.2.1 Definition of Heritability Matrix for Multiple Traits......Page 437
6.6.2.2 Connection between Heritability Matrix and Multivariate Mixed Linear Models......Page 438
6.6.2.3 Another Interpretation of Heritability......Page 439
6.6.2.4 Maximizing Heritability......Page 441
6.7.1 The Generalized T² Test with Families and Additional Population Structures......Page 443
6.7.2 Collapsing Method......Page 447
6.7.3 CMC with Families......Page 449
6.7.4 The Functional Principal Component Analysis and Smoothed Functional Principal Component Analysis with Families......Page 451
Appendix 6A: Genetic Relationship Matrix......Page 453
Appendix 6B: Derivation of Equation 6.30......Page 456
Appendix 6C: Derivation of Equation 6.33......Page 459
Appendix 6D: ML Estimation of Variance Components......Page 461
Appendix 6E: Covariance Matrix of the ML Estimators......Page 462
Appendix 6F: Selection of the Matrix K in the REML......Page 464
Appendix 6G: Alternative Form of Log-Likelihood Function for the REML......Page 466
Appendix 6H: ML Estimate of Variance Components in the Multivariate Mixed Linear Models......Page 469
Appendix 6I: Covariance Matrix for Family-Based T² Statistic......Page 471
Appendix 6J: Family-Based Functional Principal Component Analysis......Page 473
Exercise......Page 476
Chapter 7: Interaction Analysis......Page 480
7.1.1.1 The Binary Measure of Gene–Gene Interaction for the Cohort Study Design......Page 481
7.1.1.2 The Binary Measure of Gene–Gene Interaction for the Case–Control Study Design......Page 485
7.1.2 Disequilibrium Measure of Gene–Gene and Gene–Environment Interactions......Page 486
7.1.3 Information Measure of Gene–Gene and Gene–Environment Interactions......Page 488
7.1.4.1 Multiplicative Measure of Interaction between a Gene and a Continuous Environment......Page 491
7.1.4.2 Disequilibrium Measure of Interaction between a Gene and a Continuous Environment......Page 492
7.1.4.3 Mutual Information Measure of Interaction between a Gene and a Continuous Environment......Page 493
7.2.1 Relative Risk and Odds-Ratio-Based Statistics for Testing Interaction between a Gene and a Discrete Environment......Page 495
7.2.2.1 Standard Disequilibrium Measure–Based Statistics......Page 497
7.2.2.2 Composite Measure of Linkage Disequilibrium for Testing Interaction between Unlinked Loci......Page 499
7.2.3 Information-Based Statistics for Testing Gene–Gene Interaction......Page 502
7.2.4 Haplotype Odds Ratio and Tests for Gene–Gene Interaction......Page 505
7.2.4.1 Genotype-Based Odds Ratio Multiplicative Interaction Measure......Page 506
7.2.4.2 Allele-Based Odds Ratio Multiplicative Interaction Measure......Page 507
7.2.4.3 Haplotype-Based Odds Ratio Multiplicative Interaction Measure......Page 509
7.2.4.4 Haplotype-Based Odds Ratio Multiplicative Interaction Measure–Based Test Statistics......Page 512
7.2.5 Multiplicative Measure-Based Statistics for Testing Interaction between a Gene and a Continuous Environment......Page 513
7.2.7 Real Example......Page 514
7.3 Statistics for Testing Gene–Gene and Gene–Environment Interaction for a Qualitative Trait with Next-Generation Sequencing Data......Page 519
7.3.1 Multiple Logistic Regression Model for Gene–Gene Interaction Analysis......Page 520
7.3.2 Functional Logistic Regression Model for Gene–Gene Interaction Analysis......Page 521
7.4 Statistics for Testing Gene–Gene and Gene–Environment Interaction for Quantitative Traits......Page 525
7.4.1 Genetic Models for Epistasis Effects of Quantitative Traits......Page 526
7.4.2 Regression Model for Interaction Analysis with Quantitative Traits......Page 531
7.4.3.1 Model......Page 532
7.4.3.2 Parameter Estimation......Page 533
7.4.3.3 Test Statistics......Page 536
7.4.3.4 Simulations and Applications to Real Example......Page 537
7.4.4.1 Model......Page 540
7.4.4.2 Parameter Estimation......Page 542
7.4.4.3 Test Statistics......Page 544
7.4.4.4 Simulations and Real Example Applications......Page 545
7.5 Multivariate and Functional Canonical Correlation as a Unified Framework for Testing for Gene–Gene and Gene–Environment Interaction for both Qualitative and Quantitative Traits......Page 549
7.5.1.1 Single Quantitative Trait......Page 550
7.5.1.3 A Qualitative Trait......Page 551
7.5.2 CCA and Functional CCA......Page 552
7.5.3 Kernel CCA......Page 554
Appendix 7A: Variance of Logarithm of Odds Ratio......Page 555
Appendix 7B: Haplotype Odds-Ratio Interaction Measure......Page 557
Appendix 7C: Parameter Estimation For Multivariate Functional Regression Model......Page 558
Exercise......Page 560
Chapter 8: Machine Learning, Low-Rank Models, and Their Application to Disease Risk Prediction and Precision Medicine......Page 564
8.1.1 Two-Class Logistic Regression......Page 565
8.1.2 Multiclass Logistic Regression......Page 567
8.1.3 Parameter Estimation......Page 569
8.1.4 Test Statistics......Page 575
8.1.5.1 Model......Page 576
8.1.5.2 Proximal Method for Parameter Estimation......Page 580
8.1.6.1 Model......Page 581
8.1.6.2 Proximal Method for Parameter Estimation in Multiclass Logistic Regression......Page 583
8.2.1 Fisher’s Linear Discriminant Analysis for Two Classes......Page 585
8.2.2 Multiclass Fisher’s Linear Discriminant Analysis......Page 589
8.2.3.1 Matrix Formulation of Linear Discriminant Analysis......Page 591
8.2.3.3 Connection between LDA and CCA......Page 594
8.3 Support Vector Machine......Page 595
8.3.2.1 Separable Case......Page 596
8.3.2.2 Nonseparable Case......Page 599
8.3.2.3 The Karush–Kuhn–Tucker (KKT) Conditions......Page 601
8.3.2.4 Sequential Minimal Optimization (SMO) Algorithm......Page 603
8.3.4 Penalized SVMs......Page 608
8.4.1.1 Formulation......Page 613
8.4.1.2 Interpretation......Page 615
8.4.2.2 Sparse PCA......Page 616
8.5.1 Quadratically Regularized Canonical Correlation Analysis......Page 618
8.5.2.1 Least Square Formulation of CCA......Page 619
8.5.2.2 CCA for Multiclass Classification......Page 628
8.5.3.1 Sparse Singular Value Decomposition via Penalized Matrix Decomposition......Page 629
8.5.3.2 Sparse CCA via Direct Regularization Formulation......Page 632
8.6.1 Sufficient Dimension Reduction (SDR) and Sliced Inverse Regression (SIR)......Page 634
8.6.2.1 Coordinate Hypothesis......Page 638
8.6.2.2 Reformulation of SIR for SDR as an Optimization Problem......Page 639
8.6.2.3 Solve Sparse SDR by Alternative Direction Method of Multipliers......Page 640
8.6.2.4 Application to Real Data Examples......Page 643
Software Package......Page 644
Appendix 8A: Proximal Method for Parameter Estimation in Network-Penalized Two-Class Logistic Regression......Page 648
Appendix 8B: Equivalence of Optimal Scoring and LDA......Page 654
Appendix 8C: A Distance from a Point to the Hyperplane......Page 655
Appendix 8D: Solving a Quadratically Regularized PCA Problem......Page 657
Appendix 8E: The Eckart–Young Theorem......Page 659
Appendix 8F Poincare Separation Theorem......Page 663
Appendix 8G: Regression for CCA......Page 665
Appendix 8H: Partition of Global SDR for a Whole Genome into a Number of Small Regions......Page 667
Appendix 8I: Optimal Scoring and Alternative Direction Methods of Multipliers (ADMM) Algorithms......Page 670
Exercises......Page 674
References......Page 678
Index......Page 688
✦ Subjects
Bi
📜 SIMILAR VOLUMES
Big Data in Omics and Imaging: Association Analysis addresses the recent development of association analysis and machine learning for both population and family genomic data in the sequencing era. It is unique in that it presents both hypothesis testing and a data mining approach to holi
"Big Data in Omics and Imaging: Integrated Analysis and Causal Inference addresses the recent development of integrated genomic, epigenomic and imaging data analysis and causal inference in big data era. Despite significant progress in dissecting the genetic architecture of complex diseases by genom
This book provides state-of-the-art coverage of deep learning applications in image analysis. The book demonstrates various deep learning algorithms that can offer practical solutions for various image-related problems; also how these algorithms are used by scientists and scholars in industry
The concept of ridges has appeared numerous times in the image processing literature. Sometimes the term is used in an intuitive sense. Other times a concrete definition is provided. In almost all cases the concept is used for very specific applications. When analyzing images or data sets, it