𝔖 Scriptorium
✦   LIBER   ✦

📁

The Data Detective’s Toolkit: Cutting-Edge Techniques and SAS® Macros to Clean, Prepare, and Manage Data

✍ Scribed by Kim Chantala


Publisher
SAS Institute Inc.
Year
2020
Tongue
English
Leaves
216
Edition
1
Category
Library

⬇  Acquire This Volume

No coin nor oath required. For personal study only.

✦ Table of Contents


Contents
About This Book
What Does This Book Cover?
Is This Book for You?
What Are the Prerequisites for This Book?
What Should You Know about the Examples?
Software Used to Develop the Book's Content
Example Code and Data
Output and Graphics
We Want to Hear from You
About The Author
Acknowledgements
Chapter 1: Advantages of Using the Data Detective's Toolkit
Introduction
An Overview of the Data Detective’s Toolkit
Table 1-1: List of Macro Programs in the Data Detective’s Toolkit
%TK_codebook
%TK_inventory
%TK_xwalk
%TK_find_dups
%TK_harmony
%TK_skip_edit
%TK_max_length
Summary
Chapter 2: The Data Detective's Toolkit and SAS
Introduction
Preparing Your SAS Data Set
Types of Metadata
Using SAS to add Metadata to Your Data Set
Example 2-1: Adding Metadata to your SAS data set
Program 2-1: Program to Add Formats and Labels to a SAS Data Set
Specifications for Creating Formats and Labels
Create Formats (Step 1)
SAS Date Formats
Add Label to Data Set (Step 2)
Assign Formats (Step 3)
Variable Labels (Step 4)
Fundamental SAS Macro Concepts
What is the Macro Language?
Table 2-1: SAS Macro Date Variables
Table 2-2: SAS Macro Language Statements Used in This Book
Table 2-3: SAS Options for Macro Processing
Using the Data Detective’s Toolkit Macro Programs
Example 2-2: Fixing Truncation of Data Values When Combining Data Sets
Output 2-1: Output to Examine Variables in Web Data
Output 2-2: Output to Examine Variables in the Teleform Data Set
Output 2-3: Listing Showing Truncated Values of Variable State
Program 2-2: Using %TK_max_length to Prevent Truncating Data Values When Combining Data Sets
Macro Syntax
Required Keyword Parameters
Output 2-4: Listing Showing the Correct Values of Variable State
Inside the Toolkit: %TK_max_length Macro
Program 2-3: The %TK_max_length and %get_length Macro Program Source Code
The Output Delivery System
Table 2-4: Partial List of ODS Destinations
Summary
Chapter 3: Codebooks: A Roadmap to Your Data
Introduction
Understanding Codebooks
Codebook 3-1: Example Codebook
Using the %TK_codebook Macro
Syntax
Required Keywords
Optional Keywords
A Word of Caution When Using Excel to Create Your Codebook
Ordering Variables in Codebook
Output Data Set
Table 3-1: Variables in Output Data Set Used to Create Codebook
Example 3-1: Create a Codebook with Potential Problem Reports
Program 3-1: Create Formats for Data Set
Program 3-2: Prepare Data Set with Formats and Labels
Program 3-3: Create a Codebook with the %TK_codebook Macro
Interpreting the Codebook
Codebook 3-2: Codebook with Problems to Illustrate the Potential Problem Reports
Understanding the Potential Problem Reports
Incomplete Format Report
Report 3-1: Incomplete Format Report
Out of Range Value Report
Report 3-2: Out of Range Value Report
No Variation in Response Report
Report 3-3: No Variation in Response Report
Variables with No Assigned User Format
Report 3-4: Character Variables Not Assigned Report
Report 3-5: Numeric Variables Not Assigned A User-Defined Format Report
Undefined Variable Label Report
Report 3-6: Undefined Variable Label Report
Inside the Toolkit: %TK_codebook
Summary
Chapter 4: Customizing Codebooks
Introduction
Example 4-1: Embellishing Titles
Table 4-1: Options for Embellishing Titles and Footnotes
Program 4-1: Embellish Titles
Codebook 4-1: Codebook Created Using Options in TITLE and FOOTNOTE Statements
Example 4-2: Add a Logo to Your Codebook
Program 4-2: Add Logo to Codebook
Codebook 4-2: Codebook with Logo Inserted in Title
Example 4-3: Codebook Output Data Set and Default Design
Program 4-3: Using an Output Data Set to Create a Codebook
Table 4-2: Variables in the Output Data Set Obtained from %TK_codebook
Codebook 4-3: Codebook Created Using an Ouput Data Set from the %TK_codebook Macro
Understanding the Default Codebook Template
Program 4-4: Style template Used by %TK_codebook
Formatting Your Codebook with the Default Codebook Design
Program 4-5: SAS Statements to Create Codebook
Example 4-4: Create a Custom Design for Your Codebook
Program 4-6: Creating a Custom Codebook with an Output Data Set from %TK_codebook
Modifying the Default Codebook Template
Program 4-7: Template for New Design of Codebook
Updating the Design of Your Codebook
Program 4-8: SAS Statements for New Design of Codebook
Codebook 4-4: New Design for Codebook Created with Output Data Set from %TK_codebook.
Summary
Chapter 5: Catalog Your Data
Introduction
Using the %TK_inventory Macro
Syntax
Arguments
Output Data Set
Table 5-1: Variables in Output Data Set Used to Create Inventory of Data Sets
Example 5-1: Create an Inventory of Data Sets
Program 5-1: Create an Inventory of Data Sets
Inventory 5-1: Current Inventory of SAS Data Sets in a Folder
Inside the Toolkit: %TK_inventory
Using the %TK_xwalk Macro
Syntax
Arguments
Example 5-2: Creating a Crosswalk
Program 5-2: Create a Crosswalk for Multiple Data Sets
Crosswalk 5-2: Crosswalk Comparing Variables in Multiple Data Sets
Inside the Toolkit: %TK_xwalk
Summary
Chapter 6: Detecting and Correcting Data Errors
Introduction
Harmonizing Data Sets: Using the %TK_harmony Macro
Syntax
Required Arguments
Optional Arguments
Output Data Set
Table 6-1: Variables Included in Optional Output Data Set from %TK_Harmony
Example 6-1: Harmonizing Two Data Sets
Program 6-1: SAS Program to Find Differences in Attributes of Variables in Two Data Sets
Summary 6-1: Summary Showing Harmony of Data Sets for Study A, Collection 1 and 2
Details 6-1: Differences in Attributes of Data Sets for Study A, Collection 1 and 2
Program 6-2: SAS Program to Combine Data Sets for Study A, Collection 1 and 2
Inside the Toolkit: How %TK_harmony Works
Finding Duplicates: Using the %TK_find_dups Macro
Syntax
Required Arguments
Optional Arguments
Example 6-2: Identifying Duplicates Based on Multiple Variables
Program 6-3: Example Program to Identify Duplicates Based on a Subset of Variables
Summary 6-2: Findings from the %TK_find_dups Examination of the Data Set STUDY
Details 6-2: Details Showing Actual Values of CASEID*WAVE for Duplicate Observations
Duplicates 6-1: Duplicate Records Found in Data Set
Inside the Toolkit: How %TK_find_dups Works
Summary
Chapter 7: Inspect and Edit Flow through Skip Patterns
Introduction
Understanding Skip Patterns
Identifying Skip Patterns in a Survey
Survey 7-1a: Part 1: Demographics
Survey 7-1b: Part 2: Tobacco Use
Survey 7-1c: Part 3: Pregnancy History
Traditional Method of Auditing Skip Patterns
Program 7-1: Examine Skip Pattern
Crosstab 7-1: Visual Inspection Evaluating TOB5 Skip Pattern
Program 7-2: Edit Data Values Using Special Missing Values
Crosstab 7-2: Data after Editing to Correct Inconsistent Flow Through Skip Path for Variable TOB5
Example 7-1: Using the %TK_skip_edit Macro
Syntax
Required Arguments
Optional Arguments
Tally Results Data Set
Table 7-1: Variables in Output Data Set Created Using %LET TALLY_RESULTS = My_Results
Skip Formats
How Skip Path Logic Is Implemented by %TK_skip_edit
A Blueprint to Using %TK_skip_edit
Example 7-2: Automated Method of Checking Skip Patterns
Program 7-3: Check and Edit Data with Skip Patterns
Examining the Tally Report
Crosstab 7-3: Number of Edits for Tobacco Variables
Crosstab 7-4: Number of Edits for Pregnancy Variables
Examining the Edits Reported in the Crosstab Tables
TOB2 Results
Crosstab 7-5: TOB2 Edited with %TK_skip_edit
TOB3 Results
Crosstab 7-6: TOB3 Edited with %TK_skip_edit
TOB4 Results
Crosstab 7-7: TOB4 Edited with %TK_skip_edit
TOB5 Results
Crosstab 7-8: TOB5 Edited with %TK_skip_edit
PG1 Results
Crosstab 7-9: PG1 Edited with %TK_skip_edit
PG2 Results
Crosstab 7-10: PG2 Edited with %TK_skip_edit
PG3 Results
Crosstab 7-11: PG3 (Version 1) Edited with %TK_skip_edit
Crosstab 7-12: PG3 ((Version>1) Edited with %TK_skip_edit
PG4 Results
Crosstab 7-13: PG4 Edited with %TK_skip_edit
PG5 Results
Crosstab 7-14: PG5 Edited with %TK_skip_edit
Inside the Toolkit: How %TK_skip_edit Works
Program 7-4a: Default Values for Parameters and Status Formats
Program 7-4b: Create Macro Variables to Guide Editing Process
Program 7-4c: Initialize Variables to Capture Information to Inspect the Skip Pattern
Program 7-4d: Determine Flow through Survey from Values of Skip Variables
Program 7-4e: Determine Status of Flow through Skip Pattern for Variable Being Checked
Program 7-4f: Print Crosstab Tables Summarizing Status of Flow Through Skip Pattern
Program 7-4g: Add Summary Information to the Tally Results File
Program 7-4h: Drop Temporary Variables from Edited Data Set
Summary
Chapter 8: Create and Validate New Variables
Introduction
Coding Variables
Coding Missing Values
Using Formats to Recode Data Values
Example 8-1: Using Formats to Recode Data Values
Program 8-1: Using PROC FORMAT to Recode Variables.
Crosstab 8-1: Comparison of New and Original Values of Recoded Variables
Caution About Recoding Missing SAS Date Values
Program 8-2: Illustrating Unexpected Results Using Numeric Missing Value Codes for SAS Date Values
Listing 8-1: Numeric Missing Value Codes Interpreted as SAS Date Values
Easy Ways to Check Variable Construction
Table 8-1: SAS Procedures to Examine the Relationship Between Variables
Example 8-2: Checking Indicator Variables Created from Ordinal Variables
Program 8-3: Creating Indicator Variables for Variable HEALTH from Exact Specifications
Program 8-4: Checking How Variable HEALTH was Created
Crosstab 8-2: Indicator Variables Created from HEALTH Variable Using Exact Specifications
Program 8-5: Correctly Handle Missing Value for Variable HEALTH
Crosstab 8-3: Corrected Indicator Variables Calculated from Variable HEALTH
Example 8-3: Checking Categorical Variables Created from Continuous Variables
Program 8-6: Checking Categorical Variables Created from Continuous Variables
Crosstab 8-4: Check Variable TESTS_TAKEN = Total Number of Tests Taken (Score Present)
Means 8-1: Check Variable PASS_STRENGTH (Pass if Score 60 or Higher)
Means 8-2: Check Variable PASS_ENDURANCE (Pass if Score 60 or Higher)
Means 8-3: Check Variable PASS_CARDIOVASCULAR (Pass if Score 60 or Higher)
Crosstab 8-5: Check Variable TOTAL_PASSED for Participants Who Took All 3 Fitness Tests
Crosstab 8-6: Check HCFT_SCORE=. if 1+ Tests Failed, Has Value if a Member Passes All Three Tests
Regression 8-1: Check HCFT_SCORE = Sum of Tests if All Tests Passed
Summary
Appendix A: Your Part in the Data LIfe Cycle
Introduction
Understanding the Data Life Cycle
Figure A-1: Stages in the Data Life Cycle
Stage 1: Define Project
Stage 2: Plan Data Management
Stage 3: Acquire Data
Stage 4: Prepare Data
Stage 5: Analyze Data
Stage 6: Publish Results
Stage 7: Preserve Publication Data
Stage 8: Share Data
Stage 9: Archive Project
Summary
Appendix B: Skip Pattern Data Codebook
Introduction
SAS Program to Create Codebook
Program B-1: Create Codebook for the Skip Pattern Data Set Created in Chapter 7
Codebook B-1: Codebook for the Cleaned Skip Pattern Data Set from Chapter 7
Appendix C: Research Data Codebook
Introduction
SAS Program to Create Codebook
Program C-1: Codebook for Data Set Used in Chapter 8 Examples
Codebook C-1: Codebook for the Research Data Set Used in Chapter 8
Additional Resources
Index
A
B
C
D
E
F
G
H
I
J
K
L
M
N
O
P
Q
R
S
T
U
V
X


📜 SIMILAR VOLUMES


The Data WarehouseETL Toolkit: Practical
✍ Ralph Kimball, Joe Caserta 📂 Library 📅 2004 🏛 Wiley 🌐 English

<ul><li>Cowritten by Ralph Kimball, the world's leading data warehousing authority, whose previous books have sold more than 150,000 copies<li>Delivers real-world solutions for the most time- and labor-intensive portion of data warehousing-data staging, or the extract, transform, load (ETL) process<

The Data Warehouse ETL Toolkit : Practic
✍ Ralph Kimball 📂 Library 📅 2004 🌐 English

Cowritten by Ralph Kimball, the world's leading data warehousing authority, whose previous books have sold more than 150,000 copiesDelivers real-world solutions for the most time- and labor-intensive portion of data warehousing-data staging, or the extract, transform, load (ETL) processDelineates be

Cody’s Data Cleaning Techniques Using SA
✍ Ron Cody 📂 Library 📅 2017 🏛 SAS Institute 🌐 English

<p><b>Find errors and clean up data easily using SAS!</p></b> <br /><br /><p>Thoroughly updated, <i>Cody's Data Cleaning Techniques Using SAS, Third Edition</i>, addresses tasks that nearly every data analyst needs to do - that is, make sure that data errors are located and corrected. Written in Ron

Cody's Data Cleaning Techniques Using SA
✍ Ron Cody 📂 Library 📅 2017 🏛 SAS Institute 🌐 English

Written in Ron Cody's signature informal, tutorial style, this book develops and demonstrates data cleaning programs and macros that you can use as written or modify which will make your job of data cleaning easier, faster, and more efficient. --