Hands-On Data Science with SQL Server 2017

✍ Scribed by Marek Chmel, Vladimir Muzny

Publisher: Packt Publishing
Year: 2018
Tongue: English
Leaves: 494
Category: Library

⬇ Acquire This Volume

No coin nor oath required. For personal study only.

✦ Synopsis

Find, explore, and extract big data to transform into actionable insights

Key Features

Perform end-to-end data analysis―from exploration to visualization
Real-world examples, tasks, and interview queries to be a proficient data scientist
Understand how SQL is used for big data processing using HiveQL and SparkSQL

Book Description

SQL Server is a relational database management system that enables you to cover end-to-end data science processes using various inbuilt services and features.

Hands-On Data Science with SQL Server 2017 starts with an overview of data science with SQL to understand the core tasks in data science. You will learn intermediate-to-advanced level concepts to perform analytical tasks on data using SQL Server. The book has a unique approach, covering best practices, tasks, and challenges to test your abilities at the end of each chapter. You will explore the ins and outs of performing various key tasks such as data collection, cleaning, manipulation, aggregations, and filtering techniques. As you make your way through the chapters, you will turn raw data into actionable insights by wrangling and extracting data from databases using T-SQL. You will get to grips with preparing and presenting data in a meaningful way, using Power BI to reveal hidden patterns. In the concluding chapters, you will work with SQL Server integration services to transform data into a useful format and delve into advanced examples covering machine learning concepts such as predictive analytics using real-world examples.

By the end of this book, you will be in a position to handle the growing amounts of data and perform everyday activities that a data science professional performs.

What you will learn

Understand what data science is and how SQL Server is used for big data processing
Analyze incoming data with SQL queries and visualizations
Create, train, and evaluate predictive models
Make predictions using trained models and establish regular retraining courses
Incorporate data source querying into SQL Server
Enhance built-in T-SQL capabilities using SQLCLR
Visualize data with Reporting Services, Power View, and Power BI
Transform data with R, Python, and Azure

Who this book is for

Hands-On Data Science with SQL Server 2017 is intended for data scientists, data analysts, and big data professionals who want to master their skills learning SQL and its applications. This book will be helpful even for beginners who want to build their career as data science professionals using the power of SQL Server 2017. Basic familiarity with SQL language will aid with understanding the concepts covered in this book.

Data Science Overview
SQL Server 2017 as a Data Science Platform
Data Sources for Analytics
Data Transforming and Cleaning with T-SQL
Data Exploration and Statistics with T-SQL
Custom Aggregations on SQL Server
Data Visualization
Data Transformations with Other Tools
Predictive Model Training and Evaluation
Making Predictions
Getting It All Together - A Real-World Example
Next Steps with Data Science and SQL

✦ Table of Contents

Cover
Title Page
Copyright and Credits
About Packt
Contributors
Table of Contents
Preface
Chapter 1: Data Science Overview
Introducing data science
Data science project life cycle
Business understanding
Getting data
Modelling and analysis
Deployment and visualization
Final acceptance
Data science domains
Math and statistics
Visualizing the types of data
Statistics 101
Central tendency
Skewness
Variability
Machine learning
SQL Server and machine learning
Choosing the right algorithm
Big data
SQL Server and big data
Summary
Chapter 2: SQL Server 2017 as a Data Science Platform
Technical requirements
SQL Server evolution
What's available in the pack?
History of SQL Server
SQL Server in the cloud
Azure SQL Database
Azure SQL Data Warehouse
SQL Server Services and their use with data science
SQL Server Integration Services
SQL Server Analysis Services
Tabular Mode
Multidimensional mode
PowerPivot Mode
Querying languages
Reporting Services
Development tools for Reporting Services
Power BI Report Server
Machine Learning Services
Summary
Chapter 3: Data Sources for Analytics
Technical requirements
Getting data from databases
Importing data from SQL Server
Importing data from other database systems
Importing flat files
Working with XML data
Working with JSON
Retrieve data as JSON
Processing stored JSON data
External data with PolyBase
Installing and configuring
Summary
Chapter 4: Data Transforming and Cleaning with T-SQL
Technical requirements
The need for data transformation
Database architectures for data transformations
Direct source for data analysis
Staging–target scenario
Landing–staging–target scenario
Tools eligible for data movement
Distributed queries
SQL Server Integration Services
Why should we use SSIS?
What is needed to develop an SSIS solution?
Where should SSIS be used?
Is there an alternative to SSIS?
Transforming data
Full data load
Incremental data load
The MERGE statement
CHECKSUM
Temporal tables
Denormalizing data
Relational normalization
First normal form
Second normal form
Third normal form
Need for denormalization
Ways of denormalization
Computed columns
Denormalization using joins
Using views and stored procedures
Database applications
Using views
Using stored procedures
Performance considerations
Writing correct code
Using indexes
B-tree indexes
COLUMNSTORE INDEX
Summary
Questions
Chapter 5: Data Exploration and Statistics with T-SQL
Technical requirements
T-SQL aggregate queries
Common properties of aggregate functions
Aggregate functions
COUNT, COUNT(*), and COUNT_BIG
MIN and MAX
SUM
AVG
VAR and VARP
STDEV and STDEVP
Using groups
Using the HAVING clause
Ranking, framing, and windowing
Ranking functions
ROW_NUMBER
RANK
DENSE_RANK
NTILE
Running aggregates
Using aggregate functions in running aggregates
Using aggregate functions
Using the LEAD and LAG functions
Calculating with percentiles
The PERCENT_RANK and CUME_DIST functions
The PERCENTILE_CONT and PERCENTILE_DISC functions
Summary
Questions
Chapter 6: Custom Aggregations on SQL Server
Technical requirements
Overview of SQLCLR
Use cases of using SQLCLR
How to work with SQLCLR
Instance and database configurations to use with SQLCLR
Creating CLR aggregations
Example goal and assignment
Skeleton of CLR aggregation
Implementing methods
Implementing custom serialization
Implementing the Init method
Implementing the Accumulate method
Implementing the Merge method
Implementing the terminate method
Deployment and testing
Limitations and performance considerations
Development issues and risks
Maintenance issues and risks
Performance issues and risks
Summary
Questions
Chapter 7: Data Visualization
Technical requirements
Data visualization – preparation phase
Power BI Report Server
Starting with Power BI Desktop
Defining the data source
Adding visualizations to the Report
Visual interactions
Publishing reports
SQL Server Reporting Services
Adding charts to Reports
Using SQL Server Data Tools
Summary
Chapter 8: Data Transformations with Other Tools
Technical requirements
Categorization, missing values, and normalization
Categorization
Missing values
Normalization
Z-score
Feature-scaling
Using Integration Services for data transformation
Setting up a SSIS project
Categorizing the products
Using R for data transformation
Preparing client R environment
R Syntax first steps
Working example of Z-score computed in R
Using Data Factory for data transformation
Creating Azure Data Factory
Creating simple copy data with ADF
Summary
Questions
Chapter 9: Predictive Model Training and Evaluation
Technical requirements
Preparing SQL Server
Setting up and configuring ML services
Preparing to install our own R packages
Creating data structures
The concept of machine learning in databases
Creating physical data structures
Creating common objects
Creating objects using filestreams
Creating objects using temporal tables
Deploying, training, and evaluating a predictive model
Saving our machine learning model to filestreams
Saving a machine learning model to temporal tables
Summary
Questions
Chapter 10: Making Predictions
Technical requirements
Reading models from a database
Reading the model from a common table
Reading the model from a temporal table
Submitting values to an external script
Submitting values into the external script
Deserializing a predictive model
Making the prediction
Using the PREDICT keyword
Making the predictive model self-training
Re-calculating a predictive model regularly
Re-calculating a predictive model asynchronously
Creating a message type
Creating a contract
Creating queues and services
Sending a request to train a new model
Consuming requests and sending responses
Testing the asynchronous solution
Summary
Questions
Chapter 11: Getting It All Together - A Real-World Example
Technical requirements
Assignment and preparation
SQL Server
Data description
Data exploration
Exploring data using T-SQL
Exploring data using the SSIS Data Profiling Task
Exploring the SourceData.Actions table
Exploring data using R
Data transformation
Training and using predictive models for estimations
Preparing the schema for the model
Training the model
Using the rxLinMod function and finishing the model
Using the model in predictions
Summary
Questions
Chapter 12: Next Steps with Data Science and SQL
Data science next steps
Next steps with SQL Server
Big data clusters
Machine learning
Machine learning services on Linux
Machine learning high availability
Data science in the cloud
Summary
Other Books You May Enjoy
Index

📜 SIMILAR VOLUMES

Hands-On Data Science with SQL Server 20

📁 Hands-On Data Science with SQL Server 2017

✍ Marek Chmel, Vladimir Muzny 📂 Library 📅 2018 🏛 Packt Publishing 🌐 English

<p><span>Find, explore, and extract big data to transform into actionable insights </span></p><h4><span>Key Features</span></h4><ul><li><span><span>Perform end-to-end data analysis―from exploration to visualization </span></span></li><li><span><span>Real-world examples, tasks, and interview queries

Hands-On Data Science with SQL Server 20

📁 Hands-On Data Science with SQL Server 2017

✍ Marek Chmel, Vladimir Muzny 📂 Library 📅 2018 🏛 Packt Publishing 🌐 English

Data Science with Microsoft SQL Server 2

📁 Data Science with Microsoft SQL Server 2016

✍ Buck Woody, Danielle Dean, Debraj GuhaThakurta, Gagan Bansal, Matt Conners and W 📂 Library 📅 2016 🏛 Microsoft Press 🌐 English

R is one of the most popular, powerful data analytics languages and environments in use by data scientists. Actionable business data is often stored in Relational Database Management Systems (RDBMS), and one of the most widely used RDBMS is Microsoft SQL Server. Much more than a database server, it’

Data Science with SQL Server Quick Start

📁 Data Science with SQL Server Quick Start Guide

✍ Dejan Sarka [Dejan Sarka] 📂 Library 📅 2018 🏛 Packt Publishing 🌐 English

<span><p><b>Get unique insights from your data by combining the power of SQL Server, R and Python</b></p><h4>Key Features</h4><ul><li>Use the features of SQL Server 2017 to implement the data science project life cycle </li><li>Leverage the power of R and Python to design and develop efficient data

Microsoft SQL Server 2012 with Hadoop: I

📁 Microsoft SQL Server 2012 with Hadoop: Integrate data between Apache Hadoop and SQL Server 2012 and provide business intelligence on the heterogeneous data

✍ Debarchan Sarkar 📂 Library 📅 2013 🏛 Packt Publishing 🌐 English

With the explosion of data, the open source Apache Hadoop ecosystem is gaining traction, thanks to its huge ecosystem that has arisen around the core functionalities of its distributed file system (HDFS) and Map Reduce. As of today, being able to have SQL Server talking to Hadoop has become increasi

Data mining with SQL Server 2005

📁 Data mining with SQL Server 2005

✍ Zhaohui Tang; Jamie MacLennan 📂 Library 📅 2005 🏛 Wiley 🌐 English