Why a Privacy Glossary Matters

The privacy industry moves fast. New legislation, emerging threats, and evolving technology create a vocabulary that can feel impenetrable if you are not paying close attention. Understanding these terms is not academic — it directly affects your ability to protect yourself, exercise your legal rights, and make informed decisions about which services deserve your trust.

This glossary covers 50 essential terms organized into five categories. Whether you are filing your first data broker opt-out request or evaluating enterprise privacy tools, this is the reference you need.

---

Data Privacy Basics

1. Personally Identifiable Information (PII)

Any data that can identify a specific individual, either on its own or when combined with other information. Direct PII includes your full name, Social Security number, driver's license number, and passport number. Indirect PII includes your ZIP code, date of birth, or workplace: individually insufficient, but combined they can uniquely identify you. The distinction matters because data brokers build profiles by aggregating indirect PII from dozens of sources until they have a complete dossier. CCPA uses the broader term "personal information," which covers any information that "identifies, relates to, describes, is reasonably capable of being associated with, or could reasonably be linked" to you or your household.
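To see why indirect PII is dangerous, the sketch below counts how many records in a small, made-up dataset share each combination of ZIP code, date of birth, and sex. Any combination that matches exactly one record singles out one person even though no name appears; the smallest group size is the "k" in k-anonymity. The records and function names are hypothetical.

```python
from collections import Counter

# Hypothetical records containing only indirect PII (no names).
records = [
    {"zip": "02139", "dob": "1984-07-02", "sex": "F"},
    {"zip": "02139", "dob": "1984-07-02", "sex": "F"},
    {"zip": "02139", "dob": "1990-01-15", "sex": "M"},
    {"zip": "60614", "dob": "1975-11-30", "sex": "F"},
]

QUASI_IDENTIFIERS = ("zip", "dob", "sex")


def group_sizes(rows, keys=QUASI_IDENTIFIERS):
    """Count how many rows share each combination of quasi-identifier values."""
    return Counter(tuple(row[k] for k in keys) for row in rows)


def unique_combinations(rows, keys=QUASI_IDENTIFIERS):
    """Return the combinations that match exactly one row, i.e. one person."""
    return [combo for combo, n in group_sizes(rows, keys).items() if n == 1]


if __name__ == "__main__":
    print("k =", min(group_sizes(records).values()))
    for combo in unique_combinations(records):
        print("uniquely identifies one record:", combo)
```

Adding another field, such as workplace, can only split these groups further, never merge them, which is why every extra fragment a broker collects makes re-identification easier.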

2. Data Broker

A company that collects, aggregates, and sells personal information about individuals with whom it has no direct relationship. Data brokers source information from public records, commercial transactions, social media, loyalty programs, and other brokers. The industry generates an estimated $250 billion annually, and by common estimates more than 4,000 data brokers operate in the United States alone; GhostMyData scans and monitors 1,500+ of them. Brokers range from people-search sites like Spokeo and BeenVerified to enterprise data aggregators like Acxiom and Oracle Data Cloud.

3. Data Breach

An incident where protected or confidential information is accessed, disclosed, or stolen by an unauthorized party. Breaches can result from hacking, insider threats, accidental exposure, or physical theft. When a breach occurs, your exposed information often ends up on dark web marketplaces within hours. In 2025, there were over 3,200 publicly reported data breaches in the United States, exposing more than 1.5 billion records. For affected individuals, the costs include credit monitoring, identity restoration, and lost time.

4. Opt-Out

The process of requesting that a company stop collecting, sharing, or selling your personal information. Under CCPA, California residents have the right to opt out of the sale or sharing of their personal information, and other state privacy laws grant similar rights. Data brokers covered by these laws must honor opt-out requests from residents of those states, but the process varies widely: some require email verification, others require mailed forms or identity verification. GhostMyData automates opt-out submissions across broker sites so you do not have to navigate each one individually.

5. Right to Be Forgotten

A legal principle established in EU law, first recognized by the Court of Justice of the European Union in its 2014 Google Spain ruling and later codified in GDPR Article 17, that gives individuals the right to request the deletion of their personal data under certain conditions. In the United States, similar rights exist under CCPA (right to delete), Virginia's VCDPA, Colorado's CPA, and a growing list of state laws. The concept is straightforward: if a company has no legitimate reason to retain your data, you can demand they erase it. The challenge is enforcement, because requesting deletion from one broker does nothing about the dozens of others holding copies.

6. Digital Footprint

The trail of data you leave behind when using the internet. Your active digital footprint includes things you intentionally share — social media posts, forum comments, online reviews. Your passive footprint includes data collected without your direct input — browsing history, location data, purchase records, IP addresses. Reducing your digital footprint is one of the most effective ways to limit what data brokers can aggregate about you.

7. Data Minimization

A privacy principle stating that organizations should collect only the minimum amount of personal data necessary for a specific purpose and retain it only as long as needed. This is a core requirement under GDPR and is increasingly reflected in US state privacy laws. A fitness app that asks for your home address, or a flashlight app that requests access to your contacts, is collecting data its core function does not need, which runs against data minimization.
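In code, data minimization often comes down to an explicit allowlist: each processing purpose declares the fields it needs, and everything else is dropped before storage. The purposes, field names, and the `minimize` helper below are hypothetical.

```python
# Hypothetical mapping of processing purposes to the fields each one needs.
ALLOWED_FIELDS = {
    "step_counting": {"user_id", "steps", "date"},
    "shipping": {"user_id", "name", "street", "city", "postal_code"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Keep only the fields the stated purpose needs; unknown purposes get nothing."""
    allowed = ALLOWED_FIELDS.get(purpose, set())
    return {k: v for k, v in record.items() if k in allowed}


if __name__ == "__main__":
    signup = {
        "user_id": 42,
        "steps": 8500,
        "date": "2026-03-01",
        "street": "1 Main St",          # not needed to count steps
        "contacts": ["alice", "bob"],   # not needed either
    }
    print(minimize(signup, "step_counting"))
```

Failing closed, so that an unlisted purpose receives no fields at all, keeps a new feature from silently inheriting everything the app has ever collected.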

8. Privacy Policy

A legal document that discloses how an organization collects, uses, stores, shares, and protects personal information. Despite being legally required, privacy policies are notoriously long and difficult to read — the average policy takes 18 minutes to read. Key things to look for: what data is collected, who it is shared with, how long it is retained, and how to submit deletion requests. A vague or overly broad privacy policy is often a signal that the company monetizes user data.

9. Cookie

A small text file stored on your device by a website you visit. First-party cookies remember your preferences and login state. Third-party cookies, the ones that matter most for privacy, are placed by advertisers and tracking companies to follow your activity across different websites. Safari and Firefox restrict third-party cookies by default. Google spent years preparing to phase them out of Chrome but abandoned that plan, so Chrome still allows them by default. Tracking companies have meanwhile shifted toward fingerprinting and server-side tracking to maintain surveillance capabilities.
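The sketch below simulates how one third-party cookie links visits to unrelated sites. Each site embeds content from the same tracker domain, the simulated browser sends that domain's cookie back with every embedded request, and the tracker files each page under a single ID. The domains and classes are hypothetical, and the browser model deliberately ignores the blocking and partitioning that Safari and Firefox apply.

```python
import secrets
from collections import defaultdict


class Tracker:
    """A hypothetical third-party tracker that logs pages per cookie ID."""

    def __init__(self, domain):
        self.domain = domain
        self.history = defaultdict(list)  # cookie ID -> pages where it was seen

    def handle(self, cookie, referer):
        if cookie is None:
            cookie = secrets.token_hex(8)  # first sighting: assign a random ID
        self.history[cookie].append(referer)
        return cookie  # sent back to the browser as a Set-Cookie


class Browser:
    """Keeps one cookie per domain, as a browser without third-party blocking would."""

    def __init__(self):
        self.jar = {}  # domain -> cookie value

    def load_embed(self, tracker, page_url):
        # The embedded request goes to the tracker's domain, so the tracker's own
        # cookie is attached no matter which site the user is actually visiting.
        cookie = self.jar.get(tracker.domain)
        self.jar[tracker.domain] = tracker.handle(cookie, referer=page_url)


if __name__ == "__main__":
    tracker = Tracker("ads.tracker.example")
    browser = Browser()
    for page in ["https://news.example/article",
                 "https://shop.example/cart",
                 "https://health.example/symptoms"]:
        browser.load_embed(tracker, page)
    for uid, pages in tracker.history.items():
        print(uid, "->", pages)
```

All three visits end up under one ID, even though the news, shopping, and health sites never share data with each other directly.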

10. Tracking Pixel

A tiny, invisible image (typically 1x1 pixel) embedded in an email or webpage that reports back to a server when loaded. Tracking pixels reveal when you opened an email, what device you used, your IP address, and your approximate location. They are used extensively in marketing emails and on websites. To counter them, many email clients now block remote images by default or fetch them through a proxy that hides your IP address, but many users turn remote images back on for convenience.
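The mechanism is easiest to see from the server side. The sketch builds a unique image URL per recipient and shows what a tracking server might record when that image is fetched. The endpoint `mail-tracker.example`, the query parameters, and the function names are hypothetical; the GIF bytes are a commonly used 1x1 transparent image.

```python
import base64
from datetime import datetime, timezone
from urllib.parse import parse_qs, urlencode, urlsplit

# A widely used 1x1 transparent GIF.
PIXEL_GIF = base64.b64decode("R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7")

TRACKER_BASE = "https://mail-tracker.example/open.gif"  # hypothetical endpoint


def pixel_url(campaign: str, recipient_id: str) -> str:
    """Build a per-recipient image URL to embed as <img src=...> in an email."""
    return TRACKER_BASE + "?" + urlencode({"c": campaign, "r": recipient_id})


def handle_pixel_request(url: str, client_ip: str, user_agent: str):
    """What a tracking server does when the image loads: log the event, return the GIF."""
    params = parse_qs(urlsplit(url).query)
    event = {
        "campaign": params.get("c", [""])[0],
        "recipient": params.get("r", [""])[0],
        "opened_at": datetime.now(timezone.utc).isoformat(),
        "ip": client_ip,           # can be mapped to an approximate location
        "user_agent": user_agent,  # hints at the mail client and device
    }
    return event, PIXEL_GIF


if __name__ == "__main__":
    url = pixel_url("spring-sale", "user-8731")
    event, body = handle_pixel_request(url, "203.0.113.7", "ExampleMail/1.0 (iPhone)")
    print(url)
    print(event)
```

Because the recipient ID is baked into the URL, simply displaying the image is enough to tell the sender who opened the message and when; no click is required.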

---

Security Terms

11. VPN (Virtual Private Network)

A service that encrypts your internet traffic and routes it through a server in a different location, hiding your real IP address and making your online activity harder to monitor. A VPN protects against network-level surveillance — particularly on public Wi-Fi — but does not make you anonymous. Your VPN provider can still see your traffic, and websites can identify you through cookies, fingerprinting, and account logins. A VPN is one layer of privacy, not a complete solution.

12. Encryption

The process of converting readable data (plaintext) into an unreadable format (ciphertext) using a mathematical algorithm and a key. Only someone with the correct decryption key can convert it back. End-to-end encryption means that even the service provider cannot read your messages. AES-256 is the gold standard for data at rest; TLS 1.3 protects data in transit. Encryption is foundational to every secure system on the internet, from banking to messaging.

13. Two-Factor Authentication (2FA)

A security method that requires two different forms of identification before granting access to an account — typically something you know (password) and something you have (phone, hardware key). SMS-based 2FA is better than nothing but vulnerable to SIM swapping attacks. Authenticator apps (like Google Authenticator or Authy) are more secure. Hardware keys (like YubiKey) are the strongest option.
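Authenticator apps generally implement TOTP (RFC 6238): the app and the service share a secret key, and both derive a short code from that key and the current 30-second time window, so an intercepted code expires quickly. The standard-library sketch below follows the RFC's HMAC-SHA1 variant, using the secret from the RFC's test vectors; a production verifier would also accept one adjacent window for clock skew and rate-limit guesses.

```python
import hashlib
import hmac
import struct
import time


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HMAC-based one-time password (RFC 4226) with dynamic truncation."""
    mac = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F  # low 4 bits of the last byte pick a 4-byte window
    code = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, at=None, step: int = 30, digits: int = 6) -> str:
    """Time-based OTP (RFC 6238): HOTP with counter = floor(unix_time / step)."""
    now = time.time() if at is None else at
    return hotp(secret, int(now // step), digits)


def verify(secret: bytes, submitted: str, at=None) -> bool:
    """Constant-time comparison against the current time window only."""
    return hmac.compare_digest(totp(secret, at), submitted)


if __name__ == "__main__":
    # RFC 6238 test secret; at T = 59 seconds the RFC lists the 8-digit code 94287082.
    secret = b"12345678901234567890"
    print(totp(secret, at=59, digits=8))
```

Because both sides compute the code independently, nothing secret crosses the network at login except a short-lived number, unlike SMS codes, which travel through the phone network and can be redirected by a SIM swap.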

14. Phishing

A social engineering attack where an attacker impersonates a legitimate entity to trick you into revealing sensitive information, clicking a malicious link, or downloading malware. Phishing attacks have become increasingly sophisticated — AI-generated phishing emails are now nearly indistinguishable from legitimate communications. Spear phishing targets specific individuals using personal details often sourced from data brokers. The more a criminal knows about you, the more convincing the phishing attempt.

15. Social Engineering

The art of manipulating people into divulging confidential information or performing actions that compromise security. Social engineering exploits human psychology — trust, fear, urgency, authority — rather than technical vulnerabilities. Common forms include phishing, pretexting (creating a fabricated scenario), baiting (leaving infected USB drives), and tailgating (following authorized personnel into restricted areas). Data brokers provide the raw material that makes social engineering effective.

16. Ransomware

Malicious software that encrypts a victim's files or locks them out of their system, demanding payment (usually in cryptocurrency) for the decryption key. Ransomware attacks have shifted from targeting individuals to targeting hospitals, schools, and municipal governments where the cost of downtime exceeds the ransom. Double extortion — encrypting data and threatening to publish it — is now standard practice among ransomware groups.

17. Zero-Day Vulnerability

A software vulnerability that is unknown to the vendor and has no available patch. "Zero-day" refers to the fact that the vendor has had zero days to fix it. These vulnerabilities are extremely valuable on both legitimate (bug bounty) and black markets. State actors and sophisticated criminal organizations stockpile zero-days for targeted attacks. A 2017 RAND Corporation study estimated that zero-day exploits have an average life expectancy of about seven years before the underlying flaw is publicly discovered.

18. Firewall

A network security system that monitors and controls incoming and outgoing network traffic based on predetermined security rules. Firewalls can be hardware (a physical device between your network and the internet), software (a program running on your computer), or cloud-based. Modern firewalls inspect traffic at the application layer and can identify malicious patterns beyond simple port blocking. Your home router includes a basic firewall — make sure it is enabled and its firmware is updated.
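At its simplest, a packet-filtering firewall walks an ordered rule list and applies the first rule that matches, falling back to a default policy. The sketch below uses a default-deny policy for inbound traffic. The `Rule` format and `decide` function are hypothetical and far simpler than real tools such as nftables or pf, which also track connection state and can inspect traffic at higher layers.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network
from typing import Optional


@dataclass(frozen=True)
class Rule:
    action: str                 # "allow" or "deny"
    protocol: str               # "tcp", "udp", or "any"
    source: str                 # CIDR block, e.g. "192.168.1.0/24"
    port: Optional[int] = None  # destination port; None matches any port


def decide(rules, protocol: str, src_ip: str, dst_port: int, default: str = "deny") -> str:
    """Return the action of the first matching rule, or the default policy."""
    src = ip_address(src_ip)
    for rule in rules:
        if rule.protocol not in ("any", protocol):
            continue
        if src not in ip_network(rule.source):
            continue
        if rule.port is not None and rule.port != dst_port:
            continue
        return rule.action
    return default


INBOUND_RULES = [
    Rule("allow", "tcp", "192.168.1.0/24", 22),  # SSH only from the home LAN
    Rule("allow", "tcp", "0.0.0.0/0", 443),      # HTTPS from anywhere
    Rule("deny", "any", "0.0.0.0/0"),            # explicit catch-all
]

if __name__ == "__main__":
    print(decide(INBOUND_RULES, "tcp", "192.168.1.20", 22))  # allow
    print(decide(INBOUND_RULES, "tcp", "203.0.113.5", 22))   # deny
```

Rule order matters: moving the catch-all deny to the top would block everything, which is why firewall configurations are usually written from most specific to least specific.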

19. Malware

Any software intentionally designed to cause damage to a computer, server, or network. Malware includes viruses, worms, trojans, ransomware, spyware, adware, and rootkits. Distribution methods include phishing emails, infected websites, malicious advertisements, and compromised software updates. Much modern malware is polymorphic: it changes its code to evade detection by signature-based antivirus software.

20. Keylogger

A type of surveillance software (or hardware device) that records every keystroke you make. Keyloggers capture passwords, credit card numbers, messages, and anything else you type. Software keyloggers can be installed through malware or bundled with legitimate-looking applications. Hardware keyloggers are physical devices plugged between your keyboard and computer. Mobile keyloggers monitor touchscreen input patterns.

---

Legal and Regulatory Terms

21. CCPA (California Consumer Privacy Act)

California's landmark privacy law (effective January 2020, and expanded by the California Privacy Rights Act, or CPRA, whose amendments took effect in January 2023) that gives California residents the right to know what personal information businesses collect about them, the right to delete that information, the right to opt out of its sale or sharing, and the right to non-discrimination for exercising these rights. CCPA applies to for-profit businesses that meet certain revenue or data-volume thresholds. It is the most influential state privacy law in the US and the basis for data deletion requests that GhostMyData files on your behalf.

22. GDPR (General Data Protection Regulation)

The European Union's comprehensive data protection regulation (effective May 2018) that sets strict rules for how organizations collect, store, and process the personal data of people in the EU. GDPR made data protection by design, the right to erasure, data portability, and mandatory breach notification binding legal requirements. Fines can reach €20 million or 4% of global annual revenue, whichever is higher. While GDPR does not directly apply to US residents, its principles have influenced every major US state privacy law. For a detailed comparison, see our CCPA vs GDPR guide.

23. COPPA (Children's Online Privacy Protection Act)

A US federal law that imposes requirements on websites and online services directed at children under 13, or that knowingly collect personal information from children under 13. COPPA requires parental consent before collecting children's data, limits data collection to what is necessary, and mandates reasonable data security practices. The FTC enforces COPPA and has imposed significant fines on violators including TikTok ($5.7 million in 2019). For parents, understanding COPPA is essential — see our children's online privacy guide.

24. HIPAA (Health Insurance Portability and Accountability Act)

A US federal law that establishes national standards for the protection of health information. HIPAA's Privacy Rule covers "protected health information" (PHI) held by covered entities (healthcare providers, insurers, clearinghouses) and their business associates. HIPAA does not cover health data collected by fitness apps, genetic testing companies, or period-tracking apps unless they act on behalf of a covered entity (for example, as a business associate), a gap that leaves significant amounts of health data unprotected.

25. FERPA (Family Educational Rights and Privacy Act)

A US federal law that protects the privacy of student education records. FERPA gives parents rights over their children's education records until the child turns 18 or enters post-secondary education, at which point rights transfer to the student. Schools must have written permission to release student records (with exceptions for legitimate educational interests). FERPA's protections are increasingly important as schools adopt more educational technology that collects and processes student data.

26. Right to Erasure

The legal right to request that an organization delete your personal data. Under GDPR (Article 17), this right applies when the data is no longer necessary, you withdraw consent, the data was unlawfully processed, or there is no overriding legitimate interest. Under CCPA, California residents have a similar right to delete. The practical challenge is that exercising this right against every company holding your data requires contacting each one individually — or using a service like GhostMyData to do it at scale.

27. California DELETE Act (SB 362)

California's 2023 law that creates a centralized mechanism, the DROP (Delete Request and Opt-out Platform), through which residents can submit a single deletion request that is forwarded to all registered data brokers. The law requires data brokers to register with the state and, beginning August 2026, to check DROP at least once every 45 days and process the deletion requests they find there. The DROP platform began accepting consumer requests in 2026. While groundbreaking, DROP only covers brokers that have registered with California; GhostMyData covers both registered and unregistered brokers.

28. FTC Act Section 5

The Federal Trade Commission Act's prohibition against "unfair or deceptive acts or practices in or affecting commerce." This is the FTC's primary tool for privacy enforcement in the absence of a comprehensive federal privacy law. The FTC has used Section 5 to pursue companies that violate their own privacy policies, fail to secure consumer data, or engage in deceptive data collection practices. Section 5 actions have resulted in billions in penalties and consent decrees requiring companies to implement comprehensive privacy programs.

29. State Privacy Laws

A patchwork of state-level data privacy legislation that has emerged in the absence of a federal privacy law. As of 2026, 20+ states have enacted comprehensive privacy laws including California (CCPA/CPRA), Virginia (VCDPA), Colorado (CPA), Connecticut (CTDPA), Utah (UCPA), Iowa, Indiana, Tennessee, Montana, Texas, Oregon, Delaware, New Hampshire, New Jersey, Maryland, Minnesota, Nebraska, and Kentucky. Each law has different thresholds, rights, and enforcement mechanisms. GhostMyData's multi-state law selection automatically cites the strongest applicable law for each data deletion request.

30. Data Protection Officer (DPO)

A role required under GDPR (Article 37) for public authorities and for organizations whose core activities involve large-scale, regular and systematic monitoring of individuals or large-scale processing of sensitive categories of data. The DPO is responsible for overseeing data protection strategy, monitoring compliance with privacy regulations, advising on data protection impact assessments, and serving as the contact point for data subjects and supervisory authorities. In the US, the equivalent role is often called Chief Privacy Officer (CPO), though it is not legally mandated at the federal level.

---

Advanced Concepts

31. Shadow Profile

A hidden profile that a platform or data aggregator builds about you even if you have never created an account with that service. Facebook (Meta) is the most well-known example — it collects data about non-users through tracking pixels on third-party websites, contact uploads by existing users, and data purchases from brokers. Shadow profiles make it nearly impossible to avoid a platform's data collection simply by not using it. This is one reason why removing your data from brokers matters even if you have never used a particular service.

32. Data Aggregation

The process of collecting data from multiple sources and combining it into a comprehensive profile. A single data point, like your name, is not particularly useful. But when aggregated with your address, phone number, employment history, purchasing habits, political affiliation, and health conditions, it becomes a detailed portrait. Data brokers are professional aggregators, and aggregated data is worth far more than the sum of its parts. This is how scammers build comprehensive profiles for targeted fraud.
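A minimal sketch of aggregation: records from several sources are normalized, matched on a shared key, and merged into one profile. The source names and records are made up, and the match key (normalized name plus birth year) is a deliberately naive, hypothetical choice; real record-linkage systems use fuzzy and probabilistic matching.

```python
from collections import defaultdict

# Hypothetical fragments, each fairly harmless on its own.
SOURCES = {
    "voter_roll": [{"name": "Jane Q. Doe", "birth_year": 1984, "address": "12 Elm St"}],
    "loyalty_card": [{"name": "jane q doe", "birth_year": 1984, "purchases": ["prenatal vitamins"]}],
    "social_media": [{"name": "JANE Q DOE", "birth_year": 1984, "employer": "Acme Corp"}],
}


def match_key(record):
    """Naive linkage key: lowercase name without punctuation, plus birth year."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum() or ch == " ")
    return (" ".join(name.split()), record["birth_year"])


def aggregate(sources):
    """Merge every record that shares a match key into a single profile."""
    profiles = defaultdict(dict)
    for source_name, records in sources.items():
        for record in records:
            profile = profiles[match_key(record)]
            for field, value in record.items():
                if field not in ("name", "birth_year"):
                    profile[field] = value
            profile.setdefault("sources", []).append(source_name)
    return dict(profiles)


if __name__ == "__main__":
    for key, profile in aggregate(SOURCES).items():
        print(key, profile)
```

Three differently formatted spellings collapse into one key, and the merged profile now pairs a home address and employer with a purchase that hints at a pregnancy: a combination none of the original sources held.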

33. Device Fingerprinting

A tracking technique that identifies your device based on its unique combination of hardware and software characteristics — screen resolution, installed fonts, browser plugins, time zone, language settings, graphics card, and dozens of other attributes. Unlike cookies, fingerprints cannot be easily cleared because they are based on your device's configuration rather than stored files. Canvas fingerprinting, WebGL fingerprinting, and audio fingerprinting are the most common methods.
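The core idea can be sketched in a few lines: collect attributes the browser exposes anyway, serialize them in a fixed order, and hash the result. Nothing is stored on the device, so clearing cookies does not change the ID. The attribute names and values below are illustrative; real fingerprinting scripts read them through browser APIs and add canvas, WebGL, and audio signals.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a canonical (key-sorted) serialization of device attributes into an ID."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Illustrative values; a real script would collect these in the browser.
device = {
    "user_agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
    "screen": "2560x1440x24",
    "timezone": "America/Chicago",
    "languages": ["en-US", "en"],
    "fonts": ["Arial", "Calibri", "Segoe UI"],
    "gpu": "ANGLE (NVIDIA GeForce RTX 3060)",
}

if __name__ == "__main__":
    before = fingerprint(device)
    # "Clearing cookies" changes nothing here: none of these attributes come from storage.
    after = fingerprint(dict(device))
    print(before, before == after)
```

Changing any single attribute, such as the time zone, produces a different ID, which is why some privacy-focused browsers try to make these values identical across all users rather than hiding them.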

34. Facial Recognition

Biometric technology that identifies or verifies individuals by analyzing facial features from images or video. Facial recognition systems map facial geometry (distance between eyes, jawline shape, nose width) into a mathematical template. Major concerns include mass surveillance, racial bias in accuracy rates, and the difficulty of changing your face if the data is compromised. Several US cities have banned government use of facial recognition, but private sector use remains largely unregulated.
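Matching compares templates rather than photos: the system scores how similar two feature vectors are and declares a match only above a tuned threshold. The four-number vectors below are made-up stand-ins for the much longer embeddings a real model produces, and the 0.9 threshold is arbitrary; real systems tune it to trade false matches against missed matches. The sketch shows both 1:1 verification and 1:N identification.

```python
import math


def cosine_similarity(a, b):
    """Similarity of two templates: 1.0 means same direction, 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


THRESHOLD = 0.9  # arbitrary for illustration


def verify(probe, enrolled, threshold=THRESHOLD):
    """1:1 verification: does this face match the claimed person's template?"""
    return cosine_similarity(probe, enrolled) >= threshold


def identify(probe, gallery, threshold=THRESHOLD):
    """1:N identification: best-scoring name in the gallery, or None below threshold."""
    best_name, best_score = None, threshold
    for name, template in gallery.items():
        score = cosine_similarity(probe, template)
        if score >= best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical enrolled templates.
GALLERY = {"alice": [0.9, 0.1, 0.3, 0.2], "bob": [0.1, 0.8, 0.2, 0.5]}

if __name__ == "__main__":
    print(identify([0.88, 0.12, 0.31, 0.18], GALLERY))
    print(identify([0.1, 0.1, 0.9, 0.1], GALLERY))
```

Identification against a large gallery is what enables mass surveillance: every face in a crowd can be scored against every enrolled template.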

35. Biometric Data

Unique physical or behavioral characteristics used for identification — fingerprints, facial geometry, iris patterns, voiceprints, gait analysis, and typing patterns. Biometric data is particularly sensitive because, unlike a password, it cannot be changed if compromised. Illinois' BIPA (Biometric Information Privacy Act) is the strongest US biometric privacy law, requiring informed consent before collection and allowing private lawsuits for violations. Texas and Washington have similar but weaker laws.

36. Metadata

Data about data — information that describes the characteristics of a file, communication, or transaction without revealing its content. An email's metadata includes the sender, recipient, timestamp, subject line, and IP addresses — everything except the body text. A photo's metadata (EXIF data) can include GPS coordinates, camera model, and exact timestamp. Phone call metadata reveals who you called, when, and for how long. Intelligence agencies have acknowledged that metadata alone can reveal as much as content.
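Email headers make the point concrete. The sketch parses a made-up raw message with Python's standard `email` package and reads only header fields; the body is never examined, yet the result shows who wrote to whom, when, about what, and from which IP address. All names, addresses, and hosts are hypothetical.

```python
import re
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

RAW = """\
Received: from laptop.home.example ([198.51.100.23]) by mx.example.net; Tue, 03 Mar 2026 09:14:07 -0500
From: Alex Rivera <alex@home.example>
To: Sam Patel <intake@clinic.example>
Subject: Appointment follow-up
Date: Tue, 03 Mar 2026 09:14:02 -0500

(body not needed for this analysis)
"""


def header_metadata(raw: str) -> dict:
    """Extract sender, recipients, subject, send time, and relay IPs from headers only."""
    msg = message_from_string(raw)
    received = msg.get_all("Received", [])
    return {
        "from": getaddresses(msg.get_all("From", [])),
        "to": getaddresses(msg.get_all("To", [])),
        "subject": msg["Subject"],
        "sent": parsedate_to_datetime(msg["Date"]),
        "relay_ips": [ip for hop in received for ip in re.findall(r"\[([\d.]+)\]", hop)],
    }


if __name__ == "__main__":
    for field, value in header_metadata(RAW).items():
        print(f"{field}: {value}")
```

Without reading a word of the body, an observer learns that a specific person contacted a clinic's intake address on a specific morning from a specific home IP address.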

37. Geofencing

A technology that creates virtual geographic boundaries using GPS, Wi-Fi, cellular data, or RFID. When a device enters or exits a geofenced area, it triggers a programmed response — typically a targeted advertisement or data collection event. Geofencing is used by retailers, political campaigns, and data brokers to track foot traffic and target advertising. Some reproductive health clinics have been geofenced to identify visitors, raising serious concerns about health data privacy.
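A circular geofence reduces to a distance check: compute the great-circle distance between a device's reported coordinates and the fence center, and fire an event when the device crosses the radius. The sketch uses the haversine formula; the coordinates, the `GeoFence` class, and the event format are hypothetical, and real systems also account for GPS error and dwell time.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


@dataclass(frozen=True)
class GeoFence:
    name: str
    lat: float
    lon: float
    radius_m: float

    def contains(self, lat, lon):
        return haversine_m(self.lat, self.lon, lat, lon) <= self.radius_m


def fence_events(fence, pings):
    """Yield (device_id, 'enter' or 'exit', fence name) when a device crosses the boundary."""
    inside = {}  # device_id -> whether its last ping was inside the fence
    for device_id, lat, lon in pings:
        now_inside = fence.contains(lat, lon)
        if now_inside != inside.get(device_id, False):
            yield (device_id, "enter" if now_inside else "exit", fence.name)
        inside[device_id] = now_inside


if __name__ == "__main__":
    store = GeoFence("downtown-store", 41.8781, -87.6298, radius_m=150)
    pings = [
        ("device-1", 41.8900, -87.6298),  # roughly 1.3 km north: outside
        ("device-1", 41.8782, -87.6297),  # within about 20 m of the center: inside
        ("device-1", 41.8900, -87.6298),  # back outside
    ]
    for event in fence_events(store, pings):
        print(event)
```

Each "enter" event ties a persistent device ID to a visit at a specific place, which is exactly what makes geofencing around sensitive locations such as clinics so revealing.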

38. Dark Web

The portion of the internet that is not indexed by standard search engines and requires specialized software (typically the Tor browser) to access. The dark web hosts legitimate privacy-focused services alongside illegal marketplaces where stolen personal data, credentials, and financial information are bought and sold. After a data breach, your information often appears on dark web markets within hours. Monitoring dark web forums is essential for detecting compromised data early.

39. OSINT (Open Source Intelligence)

The practice of collecting and analyzing publicly available information to produce actionable intelligence. OSINT sources include social media profiles, public records, news articles, domain registrations, satellite imagery, and data broker sites. While OSINT has legitimate uses in journalism, law enforcement, and security research, it is also used by stalkers, doxxers, and social engineers. The existence of data brokers dramatically lowers the skill barrier for harmful OSINT — anyone can find your personal details for a few dollars.

40. Threat Modeling

A structured approach to identifying potential security threats and vulnerabilities, assessing their likelihood and impact, and determining appropriate countermeasures. For individuals, threat modeling means asking: what data do I need to protect, who might want it, how could they get it, and what would be the consequences? A journalist protecting sources has a different threat model than a parent protecting children — and both have different threat models than a domestic violence survivor protecting their location.

---

Emerging Topics

41. AI Training Data

The datasets used to train artificial intelligence and machine learning models. AI companies scrape vast quantities of text, images, and code from the internet — including personal information, social media posts, and content published without consent. The legal and ethical questions around AI training data are among the most contentious issues in technology today. Your data broker profile can be scraped and incorporated into an AI model's training set, which means your personal information may be embedded in systems you never consented to. Read our dedicated guide on opting out of AI training data.

42. Synthetic Identity Fraud

A sophisticated form of identity theft where criminals combine real and fabricated information to create entirely new identities. A synthetic identity might use a real Social Security number (often belonging to a child, elderly person, or deceased individual) with a fake name and address. These identities are then used to open credit accounts, make purchases, and commit financial fraud. Synthetic identity fraud is widely described as the fastest-growing type of financial crime in the US, with losses estimated at $20 billion annually. The more personal data available on broker sites, the easier it is for criminals to construct synthetic identities.

43. Deepfake

AI-generated synthetic media, typically video or audio, that realistically depicts a person saying or doing something they never actually said or did. Deepfakes use deep learning techniques, such as autoencoders, generative adversarial networks (GANs), and diffusion models, to swap faces, clone voices, and fabricate entire scenes. While initially associated with celebrity face-swapping, deepfakes are now used in romance scams, CEO fraud (voice cloning for wire transfer instructions), political disinformation, and non-consensual intimate imagery. Detection tools exist but are engaged in a constant arms race with generation technology.

44. Neural Data

Information generated by brain-computer interfaces, neurotechnology devices, and EEG-based wearables that measures or interprets brain activity. As consumer neurotechnology advances — from meditation headbands to brain-controlled gaming interfaces — neural data is becoming a new frontier in privacy. Colorado became the first state to add neural data to its definition of "sensitive data" under its privacy law. The concern is that neural data could reveal thoughts, emotions, cognitive states, and neurological conditions — the most intimate form of personal information imaginable.

45. Data Scraping

The automated extraction of data from websites, typically using bots or scripts that collect information at scale. Data brokers extensively use scraping to harvest personal information from social media profiles, public records databases, forum posts, and corporate directories. While scraping publicly accessible information occupies a legal gray area (the long-running hiQ Labs v. LinkedIn litigation, which the Supreme Court sent back to the Ninth Circuit in 2021 and which settled in 2022, left key questions unresolved), many companies prohibit scraping in their terms of service. Web scraping is a primary mechanism by which your personal information propagates across the data broker ecosystem.
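A scraper needs surprisingly little code. This sketch uses Python's built-in `html.parser` to pull name and phone fields out of a made-up directory page; the HTML structure and class names are hypothetical, and a real scraper would also fetch pages over the network and follow pagination.

```python
from html.parser import HTMLParser

# Hypothetical directory page (555-01xx numbers are reserved for fiction).
PAGE = """
<ul class="directory">
  <li class="person"><span class="name">Jane Doe</span> <span class="phone">555-0142</span></li>
  <li class="person"><span class="name">John Roe</span> <span class="phone">555-0199</span></li>
</ul>
"""


class DirectoryScraper(HTMLParser):
    """Collects text from <span class="name"> and <span class="phone"> inside each person entry."""

    FIELDS = {"name", "phone"}

    def __init__(self):
        super().__init__()
        self.rows = []      # one dict per <li class="person">
        self._field = None  # which field the current <span> holds, if any

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "li" and "person" in classes:
            self.rows.append({})
        elif tag == "span" and self.rows:
            matches = self.FIELDS.intersection(classes)
            self._field = matches.pop() if matches else None

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None

    def handle_data(self, data):
        if self._field and self.rows:
            self.rows[-1][self._field] = data.strip()


if __name__ == "__main__":
    scraper = DirectoryScraper()
    scraper.feed(PAGE)
    print(scraper.rows)
```

Run against thousands of pages, the same few lines produce exactly the kind of structured records that brokers merge into profiles.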

46. Consent Management

The systems and processes organizations use to collect, store, and manage user consent for data collection and processing. Consent management platforms (CMPs) display the cookie banners and preference centers you encounter on websites. Under GDPR, consent must be freely given, specific, informed, and unambiguous. The reality is that most consent management implementations use dark patterns — pre-checked boxes, confusing language, and asymmetric design — that undermine genuine informed consent.

47. Algorithmic Bias

Systematic errors in AI and machine learning systems that produce unfair outcomes for certain groups. Bias can enter algorithms through biased training data, flawed model design, or biased feature selection. In the privacy context, algorithmic bias affects credit scoring, insurance pricing, hiring decisions, criminal justice risk assessment, and content moderation. When data brokers sell inaccurate or outdated information to companies making automated decisions, the resulting bias compounds discrimination.
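One common first check for biased outcomes compares selection rates across groups. The sketch computes each group's approval rate and the ratio of the lowest rate to the highest; US employment guidance (the "four-fifths rule") treats a ratio below 0.8 as a sign of possible adverse impact. The decision data and function names are hypothetical, and a single ratio is a screening heuristic, not proof of bias or of fairness.

```python
from collections import defaultdict


def selection_rates(decisions):
    """decisions: iterable of (group, approved) pairs. Returns the approval rate per group."""
    totals, approved = defaultdict(int), defaultdict(int)
    for group, ok in decisions:
        totals[group] += 1
        approved[group] += int(ok)
    return {g: approved[g] / totals[g] for g in totals}


def impact_ratio(rates):
    """Lowest group rate divided by the highest; 1.0 means identical rates."""
    return min(rates.values()) / max(rates.values())


if __name__ == "__main__":
    # Hypothetical results from an automated screening system.
    decisions = ([("A", True)] * 60 + [("A", False)] * 40
                 + [("B", True)] * 42 + [("B", False)] * 58)
    rates = selection_rates(decisions)
    ratio = impact_ratio(rates)
    print(rates, round(ratio, 2), "review" if ratio < 0.8 else "ok")
```

Here group B is approved 42% of the time versus 60% for group A, a ratio of 0.7, which would warrant a closer look at the data and features driving the decisions.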

48. Privacy by Design

A framework developed by former Ontario Privacy Commissioner Ann Cavoukian that embeds privacy protections into the design and architecture of systems from the outset, rather than bolting them on as an afterthought. The seven foundational principles include proactive prevention, privacy as the default setting, privacy embedded into design, full functionality without trade-offs, end-to-end security, visibility and transparency, and respect for user privacy. GDPR codified privacy by design as a legal requirement (Article 25). It is the gold standard for how privacy-respecting technology should be built.

49. Zero-Knowledge Proof

A cryptographic method that allows one party to prove to another that a statement is true without revealing any information beyond the truth of the statement itself. For example, you could prove you are over 21 without revealing your date of birth, or prove you have sufficient funds without revealing your balance. Zero-knowledge proofs are a foundational technology for privacy-preserving authentication, anonymous credentials, and blockchain privacy. Their practical adoption is accelerating with implementations in cryptocurrency (Zcash), identity verification, and supply chain validation.
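Proving "over 21" requires a range proof, which is beyond a short example, but the core idea shows up in the classic Schnorr identification protocol: a prover convinces a verifier that it knows a secret x with y = g^x mod p without revealing x. The sketch uses deliberately tiny, insecure parameters (p = 2039, subgroup order q = 1019, generator g = 4) so the arithmetic is easy to follow. It is zero-knowledge only against an honest verifier and is an illustration, not a usable library.

```python
import secrets

# Toy parameters: p = 2q + 1 with q prime, and g = 4 generates the subgroup of order q.
# Far too small for real use.
P, Q, G = 2039, 1019, 4


def keygen():
    x = secrets.randbelow(Q - 1) + 1   # secret, 1 <= x < q
    return x, pow(G, x, P)             # (secret x, public y = g^x mod p)


def commit():
    r = secrets.randbelow(Q)           # fresh random nonce for every proof
    return r, pow(G, r, P)             # (nonce kept private, commitment t = g^r)


def respond(x, r, c):
    return (r + c * x) % Q             # s alone reveals nothing, since r is random and never sent


def check(y, t, c, s):
    # g^s == t * y^c (mod p) holds exactly when s = r + c*x (mod q).
    return pow(G, s, P) == (t * pow(y, c, P)) % P


def run_protocol(x, y):
    r, t = commit()                    # prover -> verifier: t
    c = secrets.randbelow(Q)           # verifier -> prover: random challenge c
    s = respond(x, r, c)               # prover -> verifier: s
    return check(y, t, c, s)


if __name__ == "__main__":
    x, y = keygen()
    print("public key:", y)
    print("all proofs accepted:", all(run_protocol(x, y) for _ in range(20)))
```

The verifier ends up convinced that the prover knows x, yet everything it saw (t, c, s) could have been simulated without x, which is the defining property of a zero-knowledge proof.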

50. Homomorphic Encryption

A form of encryption that allows computations to be performed on encrypted data without decrypting it first. The result, when decrypted, matches what would have been produced by performing the same operations on the unencrypted data. This means a cloud service could analyze your data without ever seeing it in plaintext. While still computationally expensive, homomorphic encryption is advancing rapidly and could fundamentally change how sensitive data is processed — enabling privacy-preserving cloud computing, secure multi-party computation, and confidential AI inference.
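Fully homomorphic schemes are complex, but the principle is visible in the Paillier cryptosystem, an older and simpler scheme that is only additively homomorphic: multiplying two ciphertexts yields an encryption of the sum of their plaintexts. In the sketch, a hypothetical server tallies encrypted yes/no votes without decrypting any individual vote. The primes are toy-sized and insecure; this illustrates the homomorphic property, not production cryptography.

```python
import math
import secrets

# Toy primes: real keys use primes of 1024 bits or more.
P, Q = 293, 433
N = P * Q                                          # public modulus
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(p-1, q-1), private
MU = pow(LAM, -1, N)                               # private; valid because g = N + 1


def encrypt(m: int) -> int:
    """Encrypt 0 <= m < N using fresh randomness r coprime to N."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    # With generator g = N + 1, g^m mod N^2 equals (1 + m*N) mod N^2.
    return ((1 + m * N) * pow(r, N, N2)) % N2


def decrypt(c: int) -> int:
    u = pow(c, LAM, N2)
    return ((u - 1) // N) * MU % N                 # L(u) = (u - 1) / N, then multiply by mu


def add_encrypted(c1: int, c2: int) -> int:
    """Homomorphic addition: E(a) * E(b) mod N^2 decrypts to (a + b) mod N."""
    return (c1 * c2) % N2


if __name__ == "__main__":
    votes = [1, 0, 1, 1, 0]        # in a real deployment each voter encrypts locally
    tally = encrypt(0)
    for ciphertext in (encrypt(v) for v in votes):
        tally = add_encrypted(tally, ciphertext)   # the server handles only ciphertexts
    print("decrypted tally:", decrypt(tally))
```

Because encryption is randomized, two encryptions of the same vote look different, so the server cannot tell individual votes apart even though it computes their sum.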

---

Automate Your Privacy with GhostMyData

Understanding these terms is the first step. Taking action is the second. GhostMyData automates the hard part — scanning data broker sites, filing opt-out requests, and monitoring for re-listings.

Scan to see which brokers have your data
Automated removal requests filed on your behalf
Continuous monitoring catches data that reappears
CCPA deletion requests sent to enterprise data brokers

Start your free privacy scan and see exactly where your data is exposed.

Frequently Asked Questions

What is the most important privacy term to understand?

PII (Personally Identifiable Information) is the foundation of all privacy concepts. Understanding what qualifies as PII — and how data brokers aggregate fragments of PII into complete profiles — is essential for making informed decisions about your privacy. Once you understand PII, every other concept in this glossary becomes more concrete.

Do I need to understand privacy laws to protect myself?

No. You do not need to be a legal expert to exercise your privacy rights. Services like GhostMyData handle the legal mechanics — citing the right law, sending properly formatted deletion requests, and following up on non-compliance. Understanding the basics (CCPA gives you the right to delete, brokers must comply) is enough.

What is the difference between a data broker and a data breach?

A data broker legally collects and sells your personal information from public records, commercial sources, and other data providers. A data breach is an unauthorized access event where protected data is stolen or exposed. Both result in your information being available to people you did not consent to share it with — but through very different mechanisms.

How does AI training data relate to data brokers?

AI companies scrape vast amounts of internet data to train their models. Data broker profiles — which contain your name, address, phone number, employment history, and more — are publicly accessible on the web and can be ingested into training datasets. Removing your information from data brokers reduces the amount of personal data available for AI training scraping.

What is the best single action I can take for my privacy?

Remove your information from data brokers. Data broker profiles are the hub that connects most privacy threats — they fuel phishing attacks, social engineering, identity theft, doxxing, stalking, and AI data scraping. Eliminating these profiles cuts off the primary source of personal data that powers these threats.