Julia for Data Analysis

✍ Scribed by Bogumil Kaminski

Publisher: Manning
Year: 2023
Tongue: English
Leaves: 474
Edition: 1
Category: Library

✦ Synopsis

Master core data analysis skills using Julia. Interesting hands-on projects guide you through time series data, predictive models, popularity ranking, and more.

In Julia for Data Analysis you will learn how to:
• Read and write data in various formats
• Work with tabular data, including subsetting, grouping, and transforming
• Visualize your data
• Build predictive models
• Create data processing pipelines
• Create web services sharing results of data analysis
• Write readable and efficient Julia programs

Julia was designed for the unique needs of data scientists: it’s expressive and easy to use whilst also delivering super-fast code execution. Julia for Data Analysis shows you how to take full advantage of this amazing language to read, write, transform, analyze, and visualize data—everything you need for an effective data pipeline. It’s written by Bogumil Kaminski, one of the top contributors to Julia, #1 Julia answerer on Stack Overflow, and a lead developer of Julia’s core data package DataFrames.jl. Its engaging hands-on projects get you into the action quickly. Plus, you’ll even be able to turn your new Julia skills to general purpose programming!

Foreword by Viral Shah.

About the technology
Julia is a great language for data analysis. It’s easy to learn, fast, and it works well for everything from one-off calculations to full-on data processing pipelines. Whether you’re looking for a better way to crunch everyday business data or you’re just starting your data science journey, learning Julia will give you a valuable skill.

About the book
Julia for Data Analysis teaches you how to handle core data analysis tasks with the Julia programming language. You’ll start by reviewing language fundamentals as you practice techniques for data transformation, visualizations, and more. Then, you’ll master essential data analysis skills through engaging examples like examining currency exchange, interpreting time series data, and even exploring chess puzzles. Along the way, you’ll learn to easily transfer existing data pipelines to Julia.

What's inside
• Read and write data in various formats
• Work with tabular data, including subsetting, grouping, and transforming
• Create data processing pipelines
• Create web services sharing results of data analysis
• Write readable and efficient Julia programs

About the reader
For data scientists familiar with Python or R. No experience with Julia required.

About the author
Bogumil Kaminski is one of the lead developers of DataFrames.jl—the core package for data manipulation in the Julia ecosystem. He has over 20 years of experience delivering data science projects.

✦ Table of Contents

Julia for Data Analysis
brief contents
contents
foreword
preface
acknowledgments
about this book
Who should read this book
How this book is organized: A roadmap
About the code
liveBook discussion forum
Other online resources
about the author
about the cover illustration
1 Introduction
1.1 What is Julia and why is it useful?
1.2 Key features of Julia from a data scientist’s perspective
1.2.1 Julia is fast because it is a compiled language
1.2.2 Julia provides full support for interactive workflows
1.2.3 Julia programs are highly reusable and easy to compose together
1.2.4 Julia has a built-in state-of-the-art package manager
1.2.5 It is easy to integrate existing code with Julia
1.3 Usage scenarios of tools presented in the book
1.4 Julia’s drawbacks
1.5 What data analysis skills will you learn?
1.6 How can Julia be used for data analysis?
Summary
Part 1 Essential Julia skills
2 Getting started with Julia
2.1 Representing values
2.2 Defining variables
2.3 Using the most important control-flow constructs
2.3.1 Computations depending on a Boolean condition
2.3.2 Loops
2.3.3 Compound expressions
2.3.4 A first approach to calculating the winsorized mean
2.4 Defining functions
2.4.1 Defining functions using the function keyword
2.4.2 Positional and keyword arguments of functions
2.4.3 Rules for passing arguments to functions
2.4.4 Short syntax for defining simple functions
2.4.5 Anonymous functions
2.4.6 Do blocks
2.4.7 Function-naming convention in Julia
2.4.8 A simplified definition of a function computing the winsorized mean
2.5 Understanding variable scoping rules
Summary
3 Julia’s support for scaling projects
3.1 Understanding Julia’s type system
3.1.1 A single function in Julia may have multiple methods
3.1.2 Types in Julia are arranged in a hierarchy
3.1.3 Finding all supertypes of a type
3.1.4 Finding all subtypes of a type
3.1.5 Union of types
3.1.6 Deciding what type restrictions to put in method signature
3.2 Using multiple dispatch in Julia
3.2.1 Rules for defining methods of a function
3.2.2 Method ambiguity problem
3.2.3 Improved implementation of winsorized mean
3.3 Working with packages and modules
3.3.1 What is a module in Julia?
3.3.2 How can packages be used in Julia?
3.3.3 Using StatsBase.jl to compute the winsorized mean
3.4 Using macros
Summary
4 Working with collections in Julia
4.1 Working with arrays
4.1.1 Getting the data into a matrix
4.1.2 Computing basic statistics of the data stored in a matrix
4.1.3 Indexing into arrays
4.1.4 Performance considerations of copying vs. making a view
4.1.5 Calculating correlations between variables
4.1.6 Fitting a linear regression
4.1.7 Plotting the Anscombe’s quartet data
4.2 Mapping key-value pairs with dictionaries
4.3 Structuring your data by using named tuples
4.3.1 Defining named tuples and accessing their contents
4.3.2 Analyzing Anscombe’s quartet data stored in a named tuple
4.3.3 Understanding composite types and mutability of values in Julia
Summary
5 Advanced topics on handling collections
5.1 Vectorizing your code using broadcasting
5.1.1 Understanding syntax and meaning of broadcasting in Julia
5.1.2 Expanding length-1 dimensions in broadcasting
5.1.3 Protecting collections from being broadcasted over
5.1.4 Analyzing Anscombe’s quartet data using broadcasting
5.2 Defining methods with parametric types
5.2.1 Most collection types in Julia are parametric
5.2.2 Rules for subtyping of parametric types
5.2.3 Using subtyping rules to define the covariance function
5.3 Integrating with Python
5.3.1 Preparing data for dimensionality reduction using t-SNE
5.3.2 Calling Python from Julia
5.3.3 Visualizing the results of the t-SNE algorithm
Summary
6 Working with strings
6.1 Getting and inspecting the data
6.1.1 Downloading files from the web
6.1.2 Using common techniques of string construction
6.1.3 Reading the contents of a file
6.2 Splitting strings
6.3 Using regular expressions to work with strings
6.3.1 Working with regular expressions
6.3.2 Writing a parser of a single line of movies.dat file
6.4 Extracting a subset from a string with indexing
6.4.1 UTF-8 encoding of strings in Julia
6.4.2 Character vs. byte indexing of strings
6.4.3 ASCII strings
6.4.4 The Char type
6.5 Analyzing genre frequency in movies.dat
6.5.1 Finding common movie genres
6.5.2 Understanding genre popularity evolution over the years
6.6 Introducing symbols
6.6.1 Creating symbols
6.6.2 Using symbols
6.7 Using fixed-width string types to improve performance
6.7.1 Available fixed-width strings
6.7.2 Performance of fixed-width strings
6.8 Compressing vectors of strings with PooledArrays.jl
6.8.1 Creating a file containing flower names
6.8.2 Reading in the data to a vector and compressing it
6.8.3 Understanding the internal design of PooledArray
6.9 Choosing appropriate storage for collections of strings
Summary
7 Handling time-series data and missing values
7.1 Understanding the NBP Web API
7.1.1 Getting the data via a web browser
7.1.2 Getting the data by using Julia
7.1.3 Handling cases when an NBP Web API query fails
7.2 Working with missing data in Julia
7.2.1 Definition of the missing value
7.2.2 Working with missing values
7.3 Getting time-series data from the NBP Web API
7.3.1 Working with dates
7.3.2 Fetching data from the NBP Web API for a range of dates
7.4 Analyzing data fetched from the NBP Web API
7.4.1 Computing summary statistics
7.4.2 Finding which days of the week have the most missing values
7.4.3 Plotting the PLN/USD exchange rate
Summary
Part 2 Toolbox for data analysis
8 First steps with data frames
8.1 Fetching, unpacking, and inspecting the data
8.1.1 Downloading the file from the web
8.1.2 Working with bzip2 archives
8.1.3 Inspecting the CSV file
8.2 Loading the data to a data frame
8.2.1 Reading a CSV file into a data frame
8.2.2 Inspecting the contents of a data frame
8.2.3 Saving a data frame to a CSV file
8.3 Getting a column out of a data frame
8.3.1 Understanding the data frame’s storage model
8.3.2 Treating a data frame column as a property
8.3.3 Getting a column by using data frame indexing
8.3.4 Visualizing data stored in columns of a data frame
8.4 Reading and writing data frames using different formats
8.4.1 Apache Arrow
8.4.2 SQLite
Summary
9 Getting data from a data frame
9.1 Advanced data frame indexing
9.1.1 Getting a reduced puzzles data frame
9.1.2 Overview of allowed column selectors
9.1.3 Overview of allowed row-subsetting values
9.1.4 Making views of data frame objects
9.2 Analyzing the relationship between puzzle difficulty and popularity
9.2.1 Calculating mean puzzle popularity by its rating
9.2.2 Fitting LOESS regression
Summary
10 Creating data frame objects
10.1 Reviewing the most important ways to create a data frame
10.1.1 Creating a data frame from a matrix
10.1.2 Creating a data frame from vectors
10.1.3 Creating a data frame using a Tables.jl interface
10.1.4 Plotting a correlation matrix of data stored in a data frame
10.2 Creating data frames incrementally
10.2.1 Vertically concatenating data frames
10.2.2 Appending a table to a data frame
10.2.3 Adding a new row to an existing data frame
10.2.4 Storing simulation results in a data frame
Summary
11 Converting and grouping data frames
11.1 Converting a data frame to other value types
11.1.1 Conversion to a matrix
11.1.2 Conversion to a named tuple of vectors
11.1.3 Other common conversions
11.2 Grouping data frame objects
11.2.1 Preparing the source data frame
11.2.2 Grouping a data frame
11.2.3 Getting group keys of a grouped data frame
11.2.4 Indexing a grouped data frame with a single value
11.2.5 Comparing performance of indexing methods
11.2.6 Indexing a grouped data frame with multiple values
11.2.7 Iterating a grouped data frame
Summary
12 Mutating and transforming data frames
12.1 Getting and loading the GitHub developers data set
12.1.1 Understanding graphs
12.1.2 Fetching GitHub developer data from the web
12.1.3 Implementing a function that extracts data from a ZIP file
12.1.4 Reading the GitHub developer data into a data frame
12.2 Computing additional node features
12.2.1 Creating a SimpleGraph object
12.2.2 Computing features of nodes by using the Graphs.jl package
12.2.3 Counting a node’s web and machine learning neighbors
12.3 Using the split-apply-combine approach to predict the developer’s type
12.3.1 Computing summary statistics of web and machine learning developer features
12.3.2 Visualizing the relationship between the number of web and machine learning neighbors of a node
12.3.3 Fitting a logistic regression model predicting developer type
12.4 Reviewing data frame mutation operations
12.4.1 Performing low-level API operations
12.4.2 Using the insertcols! function to mutate a data frame
Summary
13 Advanced transformations of data frames
13.1 Getting and preprocessing the police stop data set
13.1.1 Loading all required packages
13.1.2 Introducing the @chain macro
13.1.3 Getting the police stop data set
13.1.4 Comparing functions that perform operations on columns
13.1.5 Using short forms of operation specification syntax
13.2 Investigating the violation column
13.2.1 Finding the most frequent violations
13.2.2 Vectorizing functions by using the ByRow wrapper
13.2.3 Flattening data frames
13.2.4 Using convenience syntax to get the number of rows of a data frame
13.2.5 Sorting data frames
13.2.6 Using advanced functionalities of DataFramesMeta.jl
13.3 Preparing data for making predictions
13.3.1 Performing initial transformation of the data
13.3.2 Working with categorical data
13.3.3 Joining data frames
13.3.4 Reshaping data frames
13.3.5 Dropping rows of a data frame that hold missing values
13.4 Building a predictive model of arrest probability
13.4.1 Splitting the data into train and test data sets
13.4.2 Fitting a logistic regression model
13.4.3 Evaluating the quality of a model’s predictions
13.5 Reviewing functionalities provided by DataFrames.jl
Summary
14 Creating web services for sharing data analysis results
14.1 Pricing financial options by using a Monte Carlo simulation
14.1.1 Calculating the payoff of an Asian option definition
14.1.2 Computing the value of an Asian option
14.1.3 Understanding GBM
14.1.4 Using a numerical approach to computing the Asian option value
14.2 Implementing the option pricing simulator
14.2.1 Starting Julia with multiple-thread support
14.2.2 Computing the option payoff for a single sample of stock prices
14.2.3 Computing the option value
14.3 Creating a web service serving the Asian option valuation
14.3.1 A general approach to building a web service
14.3.2 Creating a web service using Genie.jl
14.3.3 Running the web service
14.4 Using the Asian option pricing web service
14.4.1 Sending a single request to the web service
14.4.2 Collecting responses to multiple requests from a web service in a data frame
14.4.3 Unnesting a column of a data frame
14.4.4 Plotting the results of Asian option pricing
Summary
appendix A First steps with Julia
A.1 Installing and setting up Julia
A.2 Getting help in and about Julia
A.3 Managing packages in Julia
A.3.1 Project environments
A.3.2 Activating project environments
A.3.3 Potential issues with installing packages
A.3.4 Managing packages
A.3.5 Setting up integration with Python
A.3.6 Setting up integration with R
A.4 Reviewing standard ways to work with Julia
A.4.1 Using a terminal
A.4.2 Using Visual Studio Code
A.4.3 Using Jupyter Notebook
A.4.4 Using Pluto notebooks
appendix B Solutions to exercises
appendix C Julia packages for data science
C.1 Plotting ecosystems in Julia
C.2 Scaling computing with Julia
C.3 Working with databases and data storage formats
C.4 Using data science methods
Summary
index

✦ Subjects

Data Analysis; Programming; Packages; Time Series Analysis; Julia; Type Systems
