Inference Control in Statistical Databases: From Theory to Practice (Lecture Notes in Computer Science, 2316)

✍ Scribed by Josep Domingo-Ferrer (editor)

Publisher: Springer
Year: 2002
Tongue: English
Leaves: 238
Edition: 2002
Category: Library

✦ Synopsis

Inference control in statistical databases, also known as statistical disclosure limitation or statistical confidentiality, is about finding tradeoffs that resolve the tension between the increasing societal need for accurate statistical data and the legal and ethical obligation to protect the privacy of the individuals and enterprises that are the source of data for producing statistics. Techniques used by intruders to make inferences compromising privacy increasingly draw on data mining, record linkage, knowledge discovery, and data analysis, and thus statistical inference control becomes an integral part of computer science.
This coherent state-of-the-art survey presents some of the most recent work in the field. The papers presented together with an introduction are organized in topical sections on tabular data protection, microdata protection, and software and user case studies.

✦ Table of Contents

00front-matter
Lecture Notes in Computer Science
Springer
Inference Control in Statistical Databases
Preface
Table of Contents
Tabular Data Protection
Microdata Protection
Software and User Case Studies
01
Advances in Inference Control in Statistical Databases: An Overview
Introduction
Tabular Data Protection
Microdata Protection
Software and User Case Studies
Related Literature and Information Sources
Acknowledgments
References
02
Cell Suppression: Experience and Theory
References
03
Bounds on Entries in 3-Dimensional Contingency Tables Subject to Given Marginal Totals
1 Introduction
2 The F-Bounds
3 The Bounding Methods of Fienberg and Buzzigoli-Giusti
3.1 The Procedure of Fienberg
3.2 The Shuttle Algorithm of Buzzigoli-Giusti
3.3 Comparative Analysis of Fienberg, Shuttle, and F-Bounding Methods
3.4 Limitations of All Three Procedures
4 The Roehrig and Chowdhury Network Models
5 Fractional Extremal Points
6 Discussion
Disclaimer
References
04
Extending Cell Suppression to Protect Tabular Data against Several Attackers
Introduction
Classical Cell Suppression Methodology
Multi-attacker Cell Suppression Methodology
Mathematical Models
An Algorithm for the Second ILP Model
An Example for the Second ILP Model
A Relaxation of the General Problem
Relation between the Three Methodologies
Conclusion
Acknowledgment
References
Appendix: Input Data Format for CSPLIB Instances
05
Network Flows Heuristics for Complementary Cell Suppression: An Empirical Evaluation and Extensions
Introduction
Formulation of CSP
Network Flows Heuristics
General Framework
First Heuristic
Second Heuristic
Computational Comparison
Extensions of Network Flows Models
Three-Dimensional Tables
Linked and Hierarchical Tables
Preliminary Computational Results
Conclusions
References
06
HiTaS: A Heuristic Approach to Cell Suppression in Hierarchical Tables
Introduction
Hierarchical Cell Suppression
Examples of Output
Remarks and Future Developments
References
Appendix A: Tables to Be Checked in Example of Section
07
Model Based Disclosure Protection
Introduction
A Unified Framework for Model Based Protection
Nonparametric Protection Methods
Semiparametric Protection Methods
Parametric Models
A Regression Model Approach for Protection of Business Microdata: Application to the CIS Survey
The Microdata
The Protection Model
Comments on the Protection Method
Concluding Remarks and Further Research
Acknowledgements
References
08
Microdata Protection through Noise Addition
Introduction
Description of Algorithms
Adding Noise
Masking by Adding Noise and Linear Transformations
Masking by Adding Noise and Nonlinear Transformation
Empirical Results
Level of Protection
Summary
References
09
Sensitive Micro Data Protection Using Latin Hypercube Sampling Technique
Introduction
LHS and the Restricted Pairing Algorithm
Iterative Refinement
Choice of Cumulative Distribution Function
Treatment of Identifiable Subpopulations
Size of the Synthetic Data Set
An Illustrative Example
Real Life Implementation of LHS-Based Technique
References
10
Integrating File and Record Level Disclosure Risk Assessment
Introduction
Data Intrusion Simulation (DIS)
The Special Method
DIS: The General Method
Extending DIS to the Record Level
Numerical Demonstration
The Special Uniques Method
Numerical Study
Integrating Levels of Risk Analysis
References
11
Disclosure Risk Assessment in Perturbative Microdata Protection
Introduction
Data Files
Domingo-Ferrer and Mateo-Sanz
Kim-Winkler
Methods
Rank Swapping
Additive Noise
Mixtures of Additive Noise
Re-identification
Information-Loss and Scoring Metric
Results
Domingo Data Statistics
Kim-Winkler Data Statistics
Discussion
Concluding Remarks
References
Appendix: Additive Mixture Noise Methodology
12
LHS-Based Hybrid Microdata vs Rank Swapping and Microaggregation for Numeric Microdata Protection
Introduction
Natural Masking Methods
Synthetic and Hybrid Masking Methods
LHS Synthetic Masking
Hybrid Masking Methods
Metrics for Method Comparison
Same Number of Original and Masked Records
Different Number of Original and Masked Records
Computational Results
Conclusions
References
13
Post-Masking Optimization of the Tradeoff between Information Loss and Disclosure Risk in Masked Microdata Sets
Introduction
A Score for Method Comparison
Post-Masking Optimization
The Model
A Heuristic Optimization Procedure
Computational Results
Conclusions and Extensions
References
14
The CASC Project
1 Introduction
2 CASC-Partners
3 ARGUS Software Development
3.1 Software Concepts

4 Methodology Research for Microdata
4.1 Introduction
4.2 Methodology for Business Microdata
4.3 Measurement of Risk and Information Loss
5 Methodology for Tabular Data
5.1 Introduction
5.2 Main Objectives in Tabular Data Research
6 Testing
7 Conclusion
15
Tools and Strategies to Protect Multiple Tables with the GHQUAR Cell Suppression Engine
1 Introduction
2 The GHQUAR Algorithm for Secondary Cell Suppression
2.1 Disclosure Control Aspects
2.2 Information Loss Aspects
2.3 Control Options
3 Applying GHQUAR to Overlapping Tables: The GHMITER Algorithm for Table-to-Table Protection
4 Co-ordination of Suppression Patterns: The Support Routine POOLAC
4.1 Facing the Problem of User Driven Table Production
4.2 Specialized Facilities to Support the Co-ordination of Suppression Patterns
5 Graphic User Interfaces for GHQUAR

5.2 Enhancing First User Experience: The Eurostat GUI
6 Final Remarks
Acknowledgements
References
Appendix
16
SDC in the 2000 U.S. Decennial Census
1 Introduction
2 Disclosure Limitation for the 1990 Census
2.1 Procedure for the 100% (Short Form) Data
2.2 Procedure for the Sample (Long Form) Data
3 Why Should the 1990 Procedures Be Changed?
3.1 Main Improvement: Targeting the Most "Risky" Records
3.2 Multiple Race Issues
3.3 American FactFinder (AFF)
4 The Procedure for the 100% Data
5 The Procedure for the Sample Data in Tabular Form
6 The Procedure for the Sample Data in Microdata Form
7 The Procedure for American FactFinder
7.1 The Query Filter
7.2 The Results Filter
8 Conclusion
References
17
Applications of Statistical Disclosure Control at Statistics Netherlands
1 Introduction

4 Working On-Site in a Secure Area within Statistics Netherlands
5 Examples of Official Statistics
6 Discussion
References
18
Empirical Evidences on Protecting Population Uniqueness at Idescat
Introduction
Approaches to the Macrodata and Microdata Statistical Control
Random Perturbations by Compensation in Macrodata on Population Census
Use of Privacy Homomorphisms for Macro/Microdata Statistical Confidentiality
Re-identification Model for the Release of Microdata (Sub)samples
Advances and Evidences in the Sure Processing of Microdata
Empirical Framework
Recoding of Variables
The Identification Level and Threshold
Empirical Evidences: The Distribution of the Information Loss
Number of Suppressions
Distribution of the Suppressions by Variables
Increase of Suppressions in Dimension and Threshold Values Changing
Disclosure Risk Decreasing
Homogeneity of the Variables Distribution
Global Assessments
References
back-matter
Author Index

📜 SIMILAR VOLUMES

Inference Control in Statistical Databas

📁 Inference Control in Statistical Databases: From Theory to Practice

✍ Josep Domingo-Ferrer (auth.), Josep Domingo-Ferrer (eds.) 📂 Library 📅 2002 🏛 Springer-Verlag Berlin Heidelberg 🌐 English

Inference control in statistical databases, also known as statistical disclosure limitation or statistical confidentiality, is about finding tradeoffs to the tension between the increasing societal need for accurate statistical data and the legal and ethical obligation to protect privacy of indiv

SOFSEM 2024: Theory and Practice of Comp

📁 SOFSEM 2024: Theory and Practice of Computer Science (Lecture Notes in Computer Science)

✍ Henning Fernau (editor), Serge Gaspers (editor), Ralf Klasing (editor) 📂 Library 📅 2024 🏛 Springer 🌐 English

This book constitutes the proceedings of the 49th International Conference on Current Trends in Theory and Practice of Computer Science, SOFSEM 2024, held in Cochem, Germany, in February 2024. The 33 full papers presented in this book were carefully reviewed and selected from 81 submission

[Lecture Notes in Computer Science] Data

📁 [Lecture Notes in Computer Science] Database Theory — ICDT 2001 Volume 1973 || Constraint-Based Clustering in Large Databases

✍ Van den Bussche, Jan; Vianu, Victor 📂 Library 📅 2001 🏛 Springer Berlin Heidelberg

SOFSEM 2020: Theory and Practice of Comp

📁 SOFSEM 2020: Theory and Practice of Computer Science (Lecture Notes in Computer Science, 12011)

✍ Alexander Chatzigeorgiou (editor), Riccardo Dondi (editor), Herodotos Herodotou (editor) 📂 Library 📅 2020 🏛 Springer 🌐 English

This book constitutes the refereed proceedings of the 46th International Conference on Current Trends in Theory and Practice of Informatics, SOFSEM 2020, held in Limassol, Cyprus, in January 2020. The 40 full papers presented together with 17 short papers and 3 invited papers were care

Lectures on Data Security: Modern Crypto

📁 Lectures on Data Security: Modern Cryptology in Theory and Practice (Lecture Notes in Computer Science, 1561)

✍ Ivan Damgard (editor) 📂 Library 📅 1999 🏛 Springer 🌐 English

In July 1998, a summer school in cryptology and data security was organized at the computer science department of Aarhus University, Denmark. This took place as a part of a series of summer schools organized by the European Educational Forum, an organization consisting of the research centers TUCS (Finla