𝔖 Scriptorium
✦   LIBER   ✦

Unstructured Data Analysis: Entity Resolution and Regular Expressions in SAS

✍ Scribed by Windham, Matthew


Publisher: SAS Institute
Year: 2018
Tongue: English
Leaves: 166
Category: Library

✦ Synopsis


Unstructured data is the most voluminous form of data in the world, and several elements are critical for any advanced analytics practitioner leveraging SAS software to effectively address the challenge of deriving value from that data. This book covers the five critical elements of entity extraction, unstructured data, entity resolution, entity network mapping and analysis, and entity management. By following ...

✦ Table of Contents


Intro
Contents
About This Book
Software Used to Develop the Book's Content
Example Code and Data
SAS University Edition
Acknowledgments
Chapter 1: Getting Started with Regular Expressions
1.1.1 Defining Regular Expressions
1.1.2 Motivational Examples
1.1.3 RegEx Essentials
1.1.4 RegEx Test Code
1.3.1 Wildcard
1.3.2 Word
1.3.3 Non-word
1.3.4 Tab
1.3.5 Whitespace
1.3.6 Non-whitespace
1.3.7 Digit
1.3.8 Non-digit
1.3.9 Newline
1.3.10 Bell
1.3.11 Control Character
1.3.12 Octal
1.3.13 Hexadecimal
1.4.1 List
1.4.2 Not List
1.4.3 Range
1.5.1 Case Modifiers
1.5.2 Repetition Modifiers
1.6.1 Ignore Case
1.6.2 Single Line
1.6.3 Multiline
1.6.4 Compile Once
1.6.5 Substitution Operator
1.7.1 Start of Line
1.7.2 End of Line
1.7.3 Word Boundary
1.7.4 Non-word Boundary
1.7.5 String Start
Chapter 2: Using Regular Expressions in SAS
2.1.1 Capture Buffer
2.2.1 PRXPARSE
2.2.2 PRXMATCH
2.2.3 PRXCHANGE
2.2.4 PRXPOSN
2.2.5 PRXPAREN
2.3.1 CALL PRXCHANGE
2.3.2 CALL PRXPOSN
2.3.3 CALL PRXSUBSTR
2.3.4 CALL PRXNEXT
2.3.5 CALL PRXDEBUG
2.3.6 CALL PRXFREE
2.4.1 Data Cleansing and Standardization
2.4.2 Information Extraction
2.4.3 Search and Replacement
Chapter 3: Entity Resolution Analytics
3.3.1 Entity Extraction
3.3.2 Extract, Transform, and Load
3.3.3 Entity Resolution
3.3.4 Entity Network Mapping and Analysis
3.3.5 Entity Management
3.4.1 Establish Clear Goals
3.4.2 Verify Proper Data Inventory
3.4.3 Create SMART Objectives
Chapter 4: Entity Extraction
4.3.1 Webpage
4.3.2 File System
4.4.1 Social Security Number
4.4.2 Phone Number
4.4.3 Address
4.4.4 Website
4.4.5 Corporation Name
Chapter 5: Extract, Transform, Load
5.2.1 PROC CONTENTS
5.2.2 PROC FREQ
5.2.3 PROC MEANS
5.4.1 Hexadecimal to Decimal
5.4.2 Working with Dates
5.6.1 Quantile Binning
5.6.2 Bucket Binning
Chapter 6: Entity Resolution
6.1.1 Exact Matching
6.1.2 Fuzzy Matching
6.1.3 Error Handling
6.2.1 INDEX=
6.3.1 COMPGED and COMPLEV
6.3.2 SOUNDEX
6.3.3 Putting Things Together
Chapter 7: Entity Network Mapping and Analysis
7.2.1 Shared Entity Attributes
7.2.2 Entity Interactions
7.3.1 Articulation Points and Biconnected Components
7.3.2 Minimum Spanning Trees
7.3.3 Clique Detection
7.3.4 Minimum Cut
7.3.5 Shortest Paths
Chapter 8: Entity Management
Appendix A: Additional Resources
A.2.1 Non-Printing Characters
A.2.2 Printing Characters
A.4.1 Random PII Generator
A.4.2 Output


📜 SIMILAR VOLUMES


Unstructured Data Analysis: Entity Resol
✍ Matthew Windham 📂 Library 📅 2018 🏛 SAS Institute 🌐 English

Unstructured data is the most voluminous form of data in the world, and several elements are critical for any advanced analytics practitioner leveraging SAS software to effectively address the challenge of deriving value from that data. This book covers the five critical elements of entity extractio

Modeling Data Irregularities and Structu
✍ Zhu J., Cook W.D. 📂 Library 📅 2007 🌐 English

In a relatively short period of time, Data Envelopment Analysis (DEA) has grown into a powerful quantitative, analytical tool for measuring and evaluating performance. It has been successfully applied to a whole variety of problems in many different contexts worldwide. The analysis of an array of th

Modeling Data Irregularities and Structu
✍ Wade D. Cook, Joe Zhu (auth.), Joe Zhu, Wade D. Cook (eds.) 📂 Library 📅 2007 🏛 Springer US 🌐 English

In a relatively short period of time, Data Envelopment Analysis (DEA) has grown into a powerful quantitative, analytical tool for measuring and evaluating performance. It has been successfully applied to a whole variety of problems in many different contexts worldwide. The analysis of an array

Introduction to Regular Expressions in S
✍ Matthew Windham 📂 Library 📅 2014 🏛 SAS Institute 🌐 English

Unstructured data is the most voluminous form of data in the world, and analysts rarely receive it in perfect condition for processing. In other words, you often need to clean, transform, and enhance your source data before you can use and derive value from it―especially where textual data is concer

Introduction to Regular Expressions in S
✍ Windham K.M. 📂 Library 🌐 English

SAS Institute, 2014. 120 p. ISBN: 1612909043, 9781612909042

Unstructured data is the most voluminous form of data in the world, and analysts rarely receive it in perfect condition for processing. In other words, you often need to clean, transform, and enhance your source