sameer fakhoury
Data Anonymization Summary

Data anonymization - Personal Data

  1. Removing private or confidential information from raw data so that the results cannot be associated with any individual or company
  2. Protects identities and private activities, including financial information
  3. Workflow: your organization → collects raw data + applies an anonymization policy → anonymizes the data → stores or shares it internally, shares it with third parties, or releases it publicly (cloud storage, research, software)
  4. Personal or identifiable data → information that can lead to the identification of an individual or a group of individuals
    • Direct identifiers → surname, email address, phone number, ID card number
    • Indirect identifiers → date of birth, gender, ZIP code → in combination, these can uniquely identify roughly 87% of the US population
    • Pseudonymous or encrypted data → can still be used to re-identify a person (e.g. with the key) and therefore remains personal data
  5. Personal data → rendered anonymous → no longer identifiable → is no longer considered personal data
  6. Data anonymization → must be irreversible
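
Dropping direct identifiers is not enough on its own, because indirect identifiers can still single people out. The minimal Python sketch below uses hypothetical toy records and field names. It removes the direct identifiers, then measures what fraction of rows are still unique on the (date of birth, gender, ZIP code) combination. A unique row is one that a linkage attack could re-identify.

```python
from collections import Counter

# Hypothetical toy records; the field names and values are illustrative only.
records = [
    {"name": "Alice", "email": "alice@example.com", "dob": "1990-04-12", "gender": "F", "zip": "02139"},
    {"name": "Bob", "email": "bob@example.com", "dob": "1985-11-02", "gender": "M", "zip": "02139"},
    {"name": "Carol", "email": "carol@example.com", "dob": "1990-04-12", "gender": "F", "zip": "02139"},
    {"name": "Dan", "email": "dan@example.com", "dob": "1972-06-30", "gender": "M", "zip": "94105"},
]

DIRECT_IDENTIFIERS = {"name", "email"}
QUASI_IDENTIFIERS = ("dob", "gender", "zip")  # indirect identifiers


def drop_direct_identifiers(rows):
    """Remove direct identifiers; indirect identifiers are left in place."""
    return [{k: v for k, v in r.items() if k not in DIRECT_IDENTIFIERS} for r in rows]


def unique_fraction(rows, keys=QUASI_IDENTIFIERS):
    """Fraction of rows whose combination of indirect identifiers appears only once."""
    combos = Counter(tuple(r[k] for k in keys) for r in rows)
    return sum(1 for r in rows if combos[tuple(r[k] for k in keys)] == 1) / len(rows)


stripped = drop_direct_identifiers(records)
print(unique_fraction(stripped))  # 0.5: Bob and Dan are still singled out
```

Alice and Carol share the same combination, so each hides behind the other. Bob and Dan remain uniquely identifiable even though their names and emails are gone.
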

General Data Protection Regulation

  1. GDPR → General Data Protection Regulation
  2. A regulation in EU law on data protection and privacy in the European Union (EU) and the European Economic Area (EEA)
  3. Also addresses the transfer of personal data outside the EU and EEA
  4. GDPR sets out seven principles for the lawful processing of personal data
    1. Lawfulness, fairness and transparency → Processing must be lawful, fair, and transparent to the data subject.
    2. Purpose limitation → Process data only for the legitimate purposes specified to the data subject when the data was collected.
    3. Data minimization → Collect and process only as much data as is strictly necessary for the specified purposes.
    4. Accuracy → Keep personal data accurate and up to date.
    5. Storage limitation → Store personally identifying data only for as long as necessary for the specified purpose.
    6. Integrity and confidentiality → Processing must ensure appropriate security, integrity, and confidentiality (e.g. through encryption).
    7. Accountability → The data controller is responsible for being able to demonstrate GDPR compliance with all of these principles.
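
Two of these principles, data minimization and storage limitation, translate fairly directly into code. The sketch below is a minimal illustration and rests on two assumptions:

- a hypothetical purpose-to-fields mapping (`PURPOSE_FIELDS`)
- a hypothetical one-year retention period (`RETENTION`)

Neither value comes from the GDPR itself, which leaves such choices to the data controller.

```python
from datetime import date, timedelta

# Hypothetical mapping from a declared purpose to the fields it actually needs.
PURPOSE_FIELDS = {"shipping": {"customer_id", "address", "zip"}}
# Assumed retention period for illustration; GDPR does not fix a single number.
RETENTION = timedelta(days=365)


def enforce_retention(rows, today):
    """Storage limitation: keep only rows collected within the retention period."""
    return [r for r in rows if today - r["collected_on"] <= RETENTION]


def minimize(record, purpose):
    """Data minimization: keep only the fields needed for the stated purpose."""
    allowed = PURPOSE_FIELDS[purpose]
    return {k: v for k, v in record.items() if k in allowed}


rows = [
    {"customer_id": 1, "address": "1 Main St", "zip": "02139", "dob": "1990-04-12", "collected_on": date(2024, 1, 10)},
    {"customer_id": 2, "address": "2 Oak Ave", "zip": "94105", "dob": "1985-11-02", "collected_on": date(2022, 3, 1)},
]
kept = enforce_retention(rows, today=date(2024, 6, 1))
minimized = [minimize(r, "shipping") for r in kept]
print(minimized)  # [{'customer_id': 1, 'address': '1 Main St', 'zip': '02139'}]
```

The second record is dropped because it is older than the assumed retention period. The date of birth is dropped from the first record because shipping does not need it.
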

Sensitive Data - Structured vs unstructured data

  1. Sensitive personal data → can cause harm to the individual if exposed - fingerprints, biometric data
  2. Sensitive business information → Poses a risk to the company in question if discovered - trade secrets, acquisition plans
  3. Structured data → Stored in a structured way - Easily searchable - Relational databases, spreadsheets, JSON, XML, CSV
  4. Unstructured data → anything else - difficult to search - Text files, reports, emails
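
The difference in searchability can be shown with a small contrast:

- In structured data, an identifier is a named field, so a lookup finds it.
- In free text, identifiers have to be detected by pattern matching.

The Python sketch below uses deliberately naive regular expressions. They are assumptions for illustration and are not how any particular tool works.

```python
import re

# Structured data: identifiers sit in named columns, so a field lookup finds them.
rows = [{"name": "Alice", "email": "alice@example.com", "plan": "premium"}]
emails = [r["email"] for r in rows]

# Unstructured data: identifiers are buried in free text and must be detected.
# These patterns are deliberately naive assumptions and miss many real formats.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s-]{7,}\d")


def redact(text):
    """Replace email addresses and long phone-like digit runs with placeholder tags."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


report = "Call Alice at +1 617-555-0100 or write to alice@example.com."
print(redact(report))  # Call Alice at [PHONE] or write to [EMAIL].
```

The name "Alice" survives redaction because no simple pattern describes names. Catching names is why unstructured text usually needs named-entity recognition rather than regular expressions alone.
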

Anonymization Techniques and Tools

  1. Anonymization methods → suppression, masking, swapping, generalization
  2. Pseudonymization → reversible process using a key → pseudonymized data is still treated as personal data because it enables re-identification
  3. Measuring anonymization and risks → k-anonymity, differential privacy → focus on structured data
  4. Tools for structured data → ARX, Cornell Anonymization Toolkit
  5. Tools for unstructured data → MIST (MITRE Identification Scrubber Toolkit), natural language processing tools such as OpenNLP
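
The four anonymization methods, followed by a k-anonymity check, can be sketched on toy data. In k-anonymity, k is the size of the smallest group of records that share the same quasi-identifier values, so every record is indistinguishable from at least k - 1 others.

This is a minimal Python illustration with hypothetical columns. The following are all assumptions:

- decade-wide age bands
- 3-digit ZIP prefixes
- the choice of (age, ZIP) as quasi-identifiers

Dedicated tools such as ARX work with configurable generalization hierarchies instead of hard-coded rules.

```python
import random
from collections import Counter

# Toy records; column names and values are hypothetical.
records = [
    {"name": "Alice", "age": 34, "zip": "02139", "phone": "6175550100", "diagnosis": "flu"},
    {"name": "Bob", "age": 36, "zip": "02138", "phone": "6175550101", "diagnosis": "asthma"},
    {"name": "Carol", "age": 52, "zip": "94105", "phone": "4155550102", "diagnosis": "flu"},
    {"name": "Dan", "age": 57, "zip": "94107", "phone": "4155550103", "diagnosis": "diabetes"},
]
QUASI_IDENTIFIERS = ("age", "zip")


def suppress(r):
    """Suppression: drop the direct identifier column entirely."""
    return {k: v for k, v in r.items() if k != "name"}


def mask(r):
    """Masking: hide all but the last two digits of the phone number."""
    return {**r, "phone": "*" * (len(r["phone"]) - 2) + r["phone"][-2:]}


def generalize(r):
    """Generalization: exact age -> decade band, 5-digit ZIP -> 3-digit prefix."""
    low = r["age"] // 10 * 10
    return {**r, "age": f"{low}-{low + 9}", "zip": r["zip"][:3] + "**"}


def swap(rows, column, seed=0):
    """Swapping: shuffle one column's values across records (seeded for repeatability)."""
    values = [r[column] for r in rows]
    random.Random(seed).shuffle(values)
    return [{**r, column: v} for r, v in zip(rows, values)]


def k_anonymity(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values())


anonymized = swap([generalize(mask(suppress(r))) for r in records], "diagnosis")
print(k_anonymity(records, QUASI_IDENTIFIERS), k_anonymity(anonymized, QUASI_IDENTIFIERS))  # 1 2
```

Generalization is what raises k from 1 to 2 here. Swapping keeps the overall distribution of diagnoses intact but breaks the link between a diagnosis and its original row.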
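
Pseudonymization can be sketched with a token table that acts as the key. This is one possible approach, assumed for illustration; keyed encryption is another common choice. The class and method names are hypothetical. The point it shows is that whoever holds the table can reverse the mapping, so the output is still personal data.

```python
import secrets


class Pseudonymizer:
    """Replace identifiers with random tokens, keeping a lookup table as the 'key'."""

    def __init__(self):
        self._forward = {}  # identifier -> token
        self._reverse = {}  # token -> identifier; store separately and protect it

    def pseudonymize(self, identifier):
        """Return a stable random token for the identifier, creating one if needed."""
        if identifier not in self._forward:
            token = secrets.token_hex(8)
            while token in self._reverse:  # avoid the (unlikely) token collision
                token = secrets.token_hex(8)
            self._forward[identifier] = token
            self._reverse[token] = identifier
        return self._forward[identifier]

    def reidentify(self, token):
        """Reverse the mapping; only possible for whoever holds the table."""
        return self._reverse[token]


p = Pseudonymizer()
token = p.pseudonymize("alice@example.com")
print(p.reidentify(token))  # alice@example.com
```

The same input always maps to the same token, so pseudonymized records can still be joined across datasets. That property is useful for analysis, and it is also the reason the data stays in scope of data protection rules.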
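
Differential privacy is usually achieved by adding calibrated noise to query results rather than by editing records. The sketch below applies the Laplace mechanism to a counting query, which is one standard way to satisfy the definition. The function names, the toy dataset and the epsilon value are illustrative assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Draw from Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def private_count(rows, predicate, epsilon, rng=None):
    """Laplace mechanism for a counting query.

    Adding or removing one person changes a count by at most 1 (sensitivity 1),
    so adding Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for r in rows if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical dataset: 40 flu cases out of 100 patients.
patients = [{"diagnosis": "flu"}] * 40 + [{"diagnosis": "asthma"}] * 60
noisy = private_count(patients, lambda r: r["diagnosis"] == "flu", epsilon=0.5)
print(round(noisy, 1))  # varies per run; centered on 40
```

A smaller epsilon means more noise and stronger privacy. Naive floating-point Laplace sampling like this is known to leak information through low-order bits, so production systems should rely on a vetted differential privacy library.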

©sameer fakhoury
