Principal Component Analysis and Randomness Test for Big Data Analysis: Practical Applications of RMT-Based Technique
✍ Scribed by Mieko Tanaka-Yamawaki, Yumihiko Ikura
- Publisher: Springer-JAFEE
- Year: 2023
- Language: English
- Pages: 153
- Series: Evolutionary Economics and Social Complexity Science, 25
- Category: Library
✦ Synopsis
This book presents a novel approach to analyzing large rectangular arrays of numerical data (so-called big data). The essence of this approach is to grasp the "meaning" of the data at a glance, without going into the details of individual data points. Unlike conventional approaches to principal component analysis, randomness tests, and visualization, the authors' approach offers universality and simplicity: it applies regardless of data type, structure, or field of science.
First, the mathematical preliminaries are described. Both the RMT-PCA and the RMT-test use the cross-correlation matrix of time series, $C = XX^{T}$, where $X$ is a rectangular matrix with $N$ rows and $L$ columns and $X^{T}$ is the transpose of $X$. Because $C$ is symmetric ($C = C^{T}$), it can be diagonalized by the similarity transformation $SCS^{-1} = SCS^{T}$ with an orthogonal matrix $S$; the diagonal entries of the result are the eigenvalues of $C$. When $N$ is sufficiently large, the histogram of these eigenvalues can be compared with the theoretical formula derived in random matrix theory (RMT).
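The comparison described above can be sketched in Python with NumPy. This is a minimal illustration under our own assumptions, not the book's program: it uses NumPy's symmetric eigensolver in place of the Jacobi rotation method listed in Appendix B, standardizes each row of $X$ to zero mean and unit variance, and normalizes $C$ as $XX^{T}/L$ so that its diagonal is 1. The eigenvalue histogram is then compared with the Marchenko-Pastur density, the standard RMT result for this setting: with $Q = L/N \ge 1$, $P(\lambda) = \frac{Q}{2\pi\lambda}\sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)}$ on $[\lambda_-, \lambda_+]$, where $\lambda_\pm = (1 \pm 1/\sqrt{Q})^2$. The function names are hypothetical.

```python
import numpy as np

def correlation_eigenvalues(X):
    """Standardize each row of X (N x L) and return the eigenvalues of C = X X^T / L."""
    X = np.asarray(X, dtype=float)
    L = X.shape[1]
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    C = Z @ Z.T / L  # N x N, symmetric (C = C^T), ones on the diagonal
    # eigvalsh exploits the symmetry of C and returns eigenvalues in ascending order
    return np.linalg.eigvalsh(C)

def marchenko_pastur_density(lam, N, L):
    """RMT (Marchenko-Pastur) eigenvalue density for Q = L / N >= 1."""
    Q = L / N
    lam_minus = (1 - 1 / np.sqrt(Q)) ** 2
    lam_plus = (1 + 1 / np.sqrt(Q)) ** 2
    lam = np.asarray(lam, dtype=float)
    density = np.zeros_like(lam)
    inside = (lam > lam_minus) & (lam < lam_plus)
    x = lam[inside]
    density[inside] = Q / (2 * np.pi * x) * np.sqrt((lam_plus - x) * (x - lam_minus))
    return density, lam_minus, lam_plus

# Pure noise as a stand-in for data: N = 200 rows, L = 1000 columns
rng = np.random.default_rng(0)
N, L = 200, 1000
eig = correlation_eigenvalues(rng.standard_normal((N, L)))

# Histogram of the eigenvalues versus the RMT curve evaluated at the bin centers
hist, edges = np.histogram(eig, bins=30, density=True)
centers = (edges[:-1] + edges[1:]) / 2
theory, lam_minus, lam_plus = marchenko_pastur_density(centers, N, L)
print(f"eigenvalues in [{eig.min():.3f}, {eig.max():.3f}]; "
      f"RMT band [{lam_minus:.3f}, {lam_plus:.3f}]")
```

For purely random input the eigenvalues should fill the band $[\lambda_-, \lambda_+]$ and `hist` should track `theory` bin by bin; eigenvalues well above $\lambda_+$ are the candidates for genuine structure in the data.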
The book then applies the RMT-PCA to high-frequency stock prices in the Japanese and American markets, where the approach proves effective at extracting the "trendy" business sectors of the financial market over a prescribed time scale. In this case, $X$ consists of the price time series of $N$ stocks, each of length $L$, and the correlation matrix $C$ is an $N \times N$ square matrix whose element in the $i$-th row and $j$-th column is the inner product of the length-$L$ price time series of the $i$-th stock and that of the $j$-th stock.
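A minimal sketch of the RMT-PCA step, under stated assumptions: prices are converted to log-returns (as the title of the book's Section 3.1 suggests), each row is standardized, and only the eigenpairs of $C$ whose eigenvalues exceed the RMT upper edge $\lambda_+ = (1 + \sqrt{N/L})^2$ are kept as signal. The price data are synthetic and hypothetical (one common "market" factor plus two co-moving groups of stocks standing in for business sectors), as are the function names; the book's own selection rules and data handling may differ.

```python
import numpy as np

def log_returns(prices):
    """Row-wise log-returns r(t) = ln p(t+1) - ln p(t) of an N x (L+1) price array."""
    return np.diff(np.log(prices), axis=1)

def rmt_pca(returns):
    """Keep the eigenpairs of C = Z Z^T / L whose eigenvalues exceed the RMT edge lambda_+."""
    N, L = returns.shape
    Z = (returns - returns.mean(axis=1, keepdims=True)) / returns.std(axis=1, keepdims=True)
    eigvals, eigvecs = np.linalg.eigh(Z @ Z.T / L)  # ascending order, unit eigenvectors
    lam_plus = (1 + np.sqrt(N / L)) ** 2
    keep = eigvals > lam_plus                        # eigenvalues inside the RMT band = noise
    # Reverse so that the largest eigenvalue comes first
    return eigvals[keep][::-1], eigvecs[:, keep][:, ::-1], lam_plus

# Hypothetical synthetic market: 50 stocks, 2000 returns, one market factor, two "sectors"
rng = np.random.default_rng(1)
N, L = 50, 2000
market = rng.standard_normal(L)
sector_a, sector_b = rng.standard_normal((2, L))
r = 0.3 * market + rng.standard_normal((N, L))
r[0:10] += 0.6 * sector_a   # stocks 0-9: strongly co-moving group
r[10:20] += 0.4 * sector_b  # stocks 10-19: weaker co-moving group
prices = 100 * np.exp(np.cumsum(0.01 * np.hstack([np.zeros((N, 1)), r]), axis=1))

vals, vecs, lam_plus = rmt_pca(log_returns(prices))
for k, (lam, v) in enumerate(zip(vals, vecs.T)):
    top = np.argsort(-np.abs(v))[:5]  # stocks with the largest eigenvector components
    print(f"component {k}: lambda = {lam:.2f} (lambda_+ = {lam_plus:.2f}), "
          f"top stocks {sorted(top.tolist())}")
```

In empirical work the largest eigenvalue is commonly read as a market-wide mode, while the stocks carrying the largest components of the next few eigenvectors indicate which groups of stocks moved together in that period.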
Next, the RMT-test is applied to measure the randomness of various random number generators, including both algorithmically generated and physically generated random numbers.
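The RMT-test described above can be sketched as follows: a sequence under test is cut into $N$ rows of length $L$, and the eigenvalues of the resulting correlation matrix are checked against RMT. For large $N$ and $L$, random input gives $\langle\lambda\rangle = 1$ and $\langle\lambda^2\rangle \approx 1 + N/L$, and the largest eigenvalue $\lambda_1$ should not exceed $\lambda_+ = (1 + \sqrt{N/L})^2$ by much, so $\lambda_1 - \lambda_+$ serves as a deviation measure. The row-cutting scheme, the choice of moments, and the deliberately flawed generator (a short-period linear congruential generator, `lcg`) are our own illustrative assumptions, not the book's Appendix C program.

```python
import numpy as np

def rmt_test(sequence, N, L):
    """Cut a 1-D sequence into N consecutive rows of length L and compare the
    eigenvalues of C = Z Z^T / L with the RMT (Marchenko-Pastur) predictions."""
    X = np.asarray(sequence, dtype=float)[: N * L].reshape(N, L)
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    eig = np.linalg.eigvalsh(Z @ Z.T / L)
    lam_plus = (1 + np.sqrt(N / L)) ** 2
    return {
        "m1": eig.mean(),               # always 1, since diag(C) = 1 by construction
        "m2": np.mean(eig ** 2),        # RMT: about 1 + N/L for random input
        "m2_rmt": 1 + N / L,
        "delta": eig.max() - lam_plus,  # lambda_1 - lambda_+; near 0 for random input
    }

def lcg(n, seed=1, a=25173, c=13849, m=2**16):
    """Hypothetical flawed generator: a full-period LCG whose period m = 65536
    is much shorter than the n values requested, so its output repeats."""
    out = np.empty(n)
    x = seed
    for k in range(n):
        x = (a * x + c) % m
        out[k] = x / m
    return out

N, L = 256, 1024
good = rmt_test(np.random.default_rng(2).random(N * L), N, L)
bad = rmt_test(lcg(N * L), N, L)  # rows repeat every 64 rows
for name, res in (("PCG64", good), ("short-period LCG", bad)):
    print(f"{name}: m2 = {res['m2']:.3f} (RMT {res['m2_rmt']:.3f}), "
          f"lambda_1 - lambda_+ = {res['delta']:+.3f}")
```

Because the first moment equals 1 for any input once each row is standardized, the second moment and $\lambda_1 - \lambda_+$ carry the information; the repeating LCG output inflates both far beyond their RMT values.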
The book concludes by demonstrating two applications of the RMT-test: (1) a comparison of hash functions, and (2) stock prediction by means of randomness, including a new index of off-randomness related to market decline.
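For the hash-function comparison, one plausible setup (our assumption; the synopsis does not describe the book's input strings or how digests are mapped to numbers) is to hash the counters 0, 1, 2, ... with each function, concatenate the digest bytes, cut them into an $N \times L$ matrix, and compare the second eigenvalue moment of the correlation matrix with $1 + N/L$ and the largest eigenvalue with $\lambda_+ = (1 + \sqrt{N/L})^2$. The sketch uses Python's standard `hashlib` module for MD5 and SHA-1; `hash_stream` and `rmt_moments` are hypothetical helper names.

```python
import hashlib
import numpy as np

def hash_stream(name, n_bytes):
    """Concatenate digests of the counters 0, 1, 2, ... until n_bytes are collected."""
    chunks, total, i = [], 0, 0
    while total < n_bytes:
        digest = hashlib.new(name, str(i).encode()).digest()
        chunks.append(digest)
        total += len(digest)
        i += 1
    return np.frombuffer(b"".join(chunks)[:n_bytes], dtype=np.uint8).astype(float)

def rmt_moments(values, N, L):
    """Return (m2, RMT value of m2, lambda_1 - lambda_+) for values cut into N rows of length L."""
    X = values[: N * L].reshape(N, L)
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    eig = np.linalg.eigvalsh(Z @ Z.T / L)
    lam_plus = (1 + np.sqrt(N / L)) ** 2
    return np.mean(eig ** 2), 1 + N / L, eig.max() - lam_plus

N, L = 256, 1024
results = {name: rmt_moments(hash_stream(name, N * L), N, L) for name in ("md5", "sha1")}
for name, (m2, m2_rmt, delta) in results.items():
    print(f"{name}: m2 = {m2:.4f} (RMT {m2_rmt:.4f}), lambda_1 - lambda_+ = {delta:+.4f}")
```

Both functions should land close to the RMT values here; a hash whose output gave $m_2$ well above $1 + N/L$ or a large positive $\lambda_1 - \lambda_+$ would be flagged as less random.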
✦ Table of Contents
Preface
Contents
1 Big Data Analysis with RMT
2 Formulation of RMT-PCA
2.1 From Data to Rectangular Matrix
2.2 Correlation Matrices and Their Properties
2.3 Eigenvalues of a Correlation Matrix
2.4 Eigenvalue Distribution and the RMT Formula
2.5 RMT-PCA: RMT-Oriented Principal Component Analysis
3 RMT-PCA for the Stock Markets
3.1 From Stock Prices to Log-Returns
3.2 The Methodology of the RMT-PCA
3.3 Annual Trends by Hourly Stock Price
3.4 Annual Trends of Major Sectors on NYSE
3.5 Quarterly Trends of Tokyo Market
4 The RMT-Tests
4.1 Motivation
4.2 Formulation: Basic Formulas
4.3 Qualitative Version
4.4 Quantitative Version with Moments
4.5 Highly Random Data
4.6 Less Random Data: Measuring the Randomness by $\lambda = \lambda_1 - \lambda_+$
4.7 Comparison to NIST
5 Applications of the RMT-Test
5.1 Hash Functions, MD-5 and SHA-1
5.2 Discovering Safe Investment Issues Based on Randomness
5.3 Randomness as a Market Indicator
6 Conclusion
A Introduction to Vector, Inner Product, Correlation Matrix
B Jacobi's Rotation Algorithm and Program for the RMT-PCA
C Program for the RMT-test
D RMT-test Applied on TOPIX Index Time Series in 2011.1–2012.5
E RMT-test Applied on TOPIXcore30 Index Time Series in 2014
Bibliography
📜 SIMILAR VOLUMES
This book is a timely collection of chapters that present the state of the art within the analysis and application of big data. Working within the broader context of big data, this text focuses on the hot topics of social network modelling and analysis such as online dating recommendations, hi
Systematic treatment of the commonly employed crossed and nested classification models used in analysis of variance designs with a detailed and thorough discussion of certain random effects models not commonly found in texts at the introductory or intermediate level. It also includes numerical examp
Introduction One-way Classification Two-way Crossed Classification without Interaction Two-way Crossed Classification with Interaction Three-way and Higher-Order Crossed Classifications Two-way Nested Classification Three-way and Higher-Order Nested Classifications General Balanced Random Ef
Analysis of variance (ANOVA) models have become widely used tools and play a fundamental role in much of the application of statistics today. In particular, ANOVA models involving random effects have found widespread application to experimental design in a variety of fields requiring measureme