Graph Data Science with Neo4j: Learn how to use Neo4j 5 with Graph Data Science library 2.0 and its Python driver for your project
Scribed by Estelle Scifo
- Publisher: Packt Publishing
- Year: 2023
- Tongue: English
- Leaves: 289
- Edition: 1
- Category: Library
Synopsis
Supercharge your data with the limitless potential of Neo4j 5, the premier graph database for cutting-edge machine learning
Key Features:
• Extract meaningful information from graph data with Neo4j's latest version 5
• Integrate graph algorithms into a regular machine learning pipeline in Python
• Learn the core principles of the Graph Data Science Library to make predictions and create data science pipelines
Book Description:
Neo4j, along with its Graph Data Science (GDS) library, is a complete solution to store, query, and analyze graph data. As graph databases grow more popular among developers, data scientists are likely to encounter them in their careers, which makes working with graph algorithms an indispensable skill for extracting contextual information and improving overall model prediction performance.
Data scientists working with Python will be able to put their knowledge to work with this practical guide to Neo4j and the GDS library, which offers step-by-step explanations of essential concepts and practical instructions for implementing data science techniques on graph data using the latest Neo4j version 5 and its associated libraries. You'll start by querying Neo4j with Cypher and learn how to characterize graph datasets. As you get the hang of running graph algorithms on graph data stored in Neo4j, you'll understand the new and advanced capabilities of the GDS library that enable you to make predictions and write data science pipelines. Using the newly released GDS Python driver, you'll be able to integrate graph algorithms into your ML pipeline.
By the end of this book, you'll be able to take advantage of the relationships in your dataset to improve your current model and make other types of elaborate predictions.
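As a rough illustration of the idea in the description (turning the relationships in a dataset into features for a model), the sketch below derives two simple per-node features, degree and triangle count, from an edge list. It uses plain Python rather than Neo4j or the GDS library, and the toy edge list and function names are hypothetical, not taken from the book.

```python
from collections import defaultdict

# Hypothetical toy edge list (not from the book): (source, target) pairs.
EDGES = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]


def build_neighbors(edges):
    """Return {node: set of neighbors}, treating every edge as undirected."""
    neighbors = defaultdict(set)
    for source, target in edges:
        neighbors[source].add(target)
        neighbors[target].add(source)
    return neighbors


def feature_rows(edges):
    """Return {node: [degree, triangle_count]} for every node in the edge list."""
    neighbors = build_neighbors(edges)
    rows = {}
    for node in sorted(neighbors):
        nbrs = neighbors[node]
        # A triangle through `node` is a pair of its neighbors that are
        # themselves connected; `u < v` counts each pair only once.
        triangles = sum(
            1 for u in nbrs for v in nbrs if u < v and v in neighbors[u]
        )
        rows[node] = [len(nbrs), triangles]
    return rows


if __name__ == "__main__":
    for node, features in feature_rows(EDGES).items():
        print(node, features)
```

Rows like these could be joined to an existing feature table before training a model; with a graph database, comparable values would normally come from the database's own algorithms rather than hand-written loops.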
What You Will Learn:
• Use the Cypher query language to query graph databases such as Neo4j
• Build graph datasets from your own data and public knowledge graphs
• Make graph-specific predictions such as link prediction
• Explore the latest version of Neo4j to build a graph data science pipeline
• Run a scikit-learn prediction algorithm with graph data
• Train a predictive embedding algorithm in GDS and manage the model store
Who this book is for:
If you're a data scientist or data professional with a foundation in the basics of Neo4j and are now ready to understand how to build advanced analytics solutions, you'll find this graph data science book useful. Familiarity with the major components of a data science project in Python and Neo4j is necessary to follow the concepts covered in this book.
Table of Contents
Cover
Copyright
Contributors
Table of Contents
Preface
Part 1 - Creating Graph Data in Neo4j
Chapter 1: Introducing and Installing Neo4j
Technical requirements
What is a graph database?
Databases
Graph database
Finding or creating a graph database
A note about the graph dataset's format
Modeling your data as a graph
Neo4j in the graph databases landscape
Neo4j ecosystem
Setting up Neo4j
Downloading and starting Neo4j Desktop
Creating our first Neo4j database
Creating a database in the cloud - Neo4j Aura
Inserting data into Neo4j with Cypher, the Neo4j query language
Extracting data from Neo4j with Cypher pattern matching
Summary
Further reading
Exercises
Chapter 2: Importing Data into Neo4j to Build a Knowledge Graph
Technical requirements
Importing CSV data into Neo4j with Cypher
Discovering the Netflix dataset
Defining the graph schema
Importing data
Introducing the APOC library to deal with JSON data
Browsing the dataset
Getting to know and installing the APOC plugin
Loading data
Dealing with temporal data
Discovering the Wikidata public knowledge graph
Data format
Query language - SPARQL
Enriching our graph with Wikidata information
Loading data into Neo4j for one person
Importing data for all people
Dealing with spatial data in Neo4j
Importing data in the cloud
Summary
Further reading
Exercises
Part 2 - Exploring and Characterizing Graph Data with Neo4j
Chapter 3: Characterizing a Graph Dataset
Technical requirements
Characterizing a graph from its node and edge properties
Link direction
Link weight
Node type
Computing the graph degree distribution
Definition of a node's degree
Computing the node degree with Cypher
Visualizing the degree distribution with NeoDash
Installing and using the Neo4j Python driver
Counting node labels and relationship types in Python
Building the degree distribution of a graph
Improved degree distribution
Learning about other characterizing metrics
Triangle count
Clustering coefficient
Summary
Further reading
Exercises
Chapter 4: Using Graph Algorithms to Characterize a Graph Dataset
Technical requirements
Digging into the Neo4j GDS library
GDS content
Installing the GDS library with Neo4j Desktop
GDS project workflow
Projecting a graph for use by GDS
Native projections
Cypher projections
Computing a node's degree with GDS
stream mode
The YIELD keyword
write mode
mutate mode
Algorithm configuration
Other centrality metrics
Understanding a graph's structure by looking for communities
Number of components
Modularity and the Louvain algorithm
Summary
Further reading
Chapter 5: Visualizing Graph Data
Technical requirements
The complexity of graph data visualization
Physical networks
General case
Visualizing a small graph with networkx and matplotlib
Visualizing a graph with known coordinates
Visualizing a graph with unknown coordinates
Configuring object display
Discovering the Neo4j Bloom graph application
What is Bloom?
Bloom installation
Selecting data with Neo4j Bloom
Configuring the scene in Bloom
Visualizing large graphs with Gephi
Installing Gephi and its required plugin
Using APOC Extended to synchronize Neo4j and Gephi
Configuring the view in Gephi
Summary
Further reading
Exercises
Part 3 - Making Predictions on a Graph
Chapter 6: Building a Machine Learning Model with Graph Features
Technical requirements
Introducing the GDS Python client
GDS Python principles
Input and output types
Creating a projected graph from Python
Running GDS algorithms from Python and extracting data in a dataframe
write mode
stream mode
Dropping the projected graph
Using features from graph algorithms in a scikit-learn pipeline
Machine learning tasks with graphs
Our task
Computing features
Extracting and visualizing data
Building the model
Summary
Further reading
Exercise
Chapter 7: Automatically Extracting Features with Graph Embeddings for Machine Learning
Technical requirements
Introducing graph embedding algorithms
Defining embeddings
Graph embedding classification
Using a transductive graph embedding algorithm
Understanding the Node2Vec algorithm
Using Node2Vec with GDS
Training an inductive embedding algorithm
Understanding GraphSAGE
Introducing the GDS model catalog
Training GraphSAGE with GDS
Computing new node representations
Summary
Further reading
Exercises
Chapter 8: Building a GDS Pipeline for Node Classification Model Training
Technical requirements
The GDS pipelines
What is a pipeline?
Building and training a pipeline
Creating the pipeline and choosing the features
Setting the pipeline configuration
Training the pipeline
Making predictions
Computing the confusion matrix
Using embedding features
Choosing the graph embedding algorithm to use
Training using Node2Vec
Training using GraphSAGE
Summary
Further reading
Exercise
Chapter 9: Predicting Future Edges
Technical requirements
Introducing the LP problem
LP examples
LP with the Netflix dataset
Framing an LP problem
LP features
Topological features
Features based on node properties
Building an LP pipeline with the GDS
Creating and configuring the pipeline
Pipeline training and testing
Summary
Further reading
Chapter 10: Writing Your Custom Graph Algorithms with the Pregel API in Java
Technical requirements
Introducing the Pregel API
GDS's features
The Pregel API
Implementing the PageRank algorithm
The PageRank algorithm
Simple Python implementation
Pregel Java implementation
Implementing the tolerance-stopping criteria
Testing our code
Test for the PageRank class
Test for the PageRankTol class
Using our algorithm from Cypher
Adding annotations
Building the JAR file
Updating the Neo4j configuration
Testing our procedure
Summary
Further reading
Exercises
Index
Other Books You May Enjoy
Subjects
Machine Learning; Data Science; Python; Java; Data Visualization; Pipelines; Graph Data Model; Cypher; Neo4j; Gephi; Graph Algorithms; Feature Extraction; PageRank; Knowledge Graphs; Graph Data Science
SIMILAR VOLUMES
Graph Data Science with Python and Neo4j is your ultimate guide to unleashing the potential of graph data science by blending Python's robust capabilities with Neo4j's innovative graph database technology. From fundamental concepts to advanced analytics and machine learning techniques, you'll learn
Practical methods for analyzing your data with graphs, revealing hidden connections and new insights. Graphs are the natural way to represent and understand connected data. This book explores the most important algorithms and techniques for graphs in data science, with concrete advice on implemen
Graph Algorithms for Data Science teaches you how to construct graphs from both structured and unstructured data. You'll learn how the flexible Cypher query language can be used to easily manipulate graph structures, and extract amazing insights. Graph Algorithms for Data Science is a hands-on guide
Connectivity is the single most pervasive characteristic of today's networks and systems. From protein interactions to social networks, from communication systems to power grids, and from retail experiences to supply chains, networks with even a modest degree of complexity aren't random, which means