In 2023, OCR investigated a research institution that published a dataset it believed was fully de-identified, only to discover that zip codes and dates of service had been left intact. The result was a reportable breach affecting thousands of patients. The organization had attempted to use the Privacy Rule's Safe Harbor method but failed to strip all 18 required identifiers. This is not an uncommon mistake, and it carries real consequences.
What the Safe Harbor Method Under HIPAA Actually Requires
The HIPAA Privacy Rule at 45 CFR § 164.514(b) establishes two approved methods for de-identifying protected health information (PHI): the Expert Determination method and the Safe Harbor method. Most covered entities and business associates gravitate toward Safe Harbor because it does not require a qualified statistical expert.
Under the Safe Harbor method, your organization must remove 18 specific categories of identifiers, not only those of the individual but also those of the individual's relatives, employers, and household members. You must also have no actual knowledge that the remaining information could be used, alone or in combination with other information, to identify an individual. If either condition is unmet, the data is still considered PHI and remains subject to every Privacy Rule and Security Rule requirement.
The 18 Identifiers You Must Remove Under Safe Harbor
The list is codified at 45 CFR § 164.514(b)(2)(i) and is exhaustive. Every identifier must be removed; the only generalizations Safe Harbor permits are the three-digit zip code and year-only date exceptions noted in the list. Here is the complete list:
- Names
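- Geographic subdivisions smaller than a state, including street address, city, county, precinct, zip code, and equivalent geocodes (the first three digits of a zip code may be kept only if the area formed by all zip codes sharing those digits contains more than 20,000 people; otherwise those digits must be changed to 000)
- All elements of dates (except year) directly related to the individual, including birth date, admission date, discharge date, and date of death, plus all ages over 89 and any date elements (including year) indicating such an age, which may only be reported as a single "90 or older" category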
- Telephone numbers
- Fax numbers
- Email addresses
- Social Security numbers
- Medical record numbers
- Health plan beneficiary numbers
- Account numbers
- Certificate or license numbers
- Vehicle identifiers and serial numbers, including license plate numbers
- Device identifiers and serial numbers
- Web URLs
- IP addresses
- Biometric identifiers (fingerprints, voiceprints)
- Full-face photographs and comparable images
- Any other unique identifying number, characteristic, or code
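That final category, "any other unique identifying number, characteristic, or code," is the one that catches organizations off guard. Internal patient identifiers, proprietary tracking numbers, and research subject codes derived from patient information all qualify. A re-identification code is allowed only under the conditions in 45 CFR § 164.514(c): it must not be derived from or related to information about the individual, and the mechanism for re-identifying records must not be used for other purposes or disclosed.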
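One way to put this checklist into practice is to keep the 18 categories in code and require every column in a dataset to be mapped, by a person, either to a category or to "not an identifier." The sketch below assumes a hypothetical column-to-category mapping (`COLUMN_CATEGORIES`) and a hypothetical helper named `audit_columns`; neither comes from the regulation or any library. Columns nobody has reviewed are surfaced rather than assumed clean.

```python
from enum import Enum
from typing import Dict, List, Optional, Tuple


class SafeHarborIdentifier(Enum):
    """The 18 identifier categories listed at 45 CFR § 164.514(b)(2)(i)."""
    NAMES = "names"
    GEOGRAPHY = "geographic subdivisions smaller than a state"
    DATES = "date elements other than year; ages over 89"
    TELEPHONE = "telephone numbers"
    FAX = "fax numbers"
    EMAIL = "email addresses"
    SSN = "Social Security numbers"
    MEDICAL_RECORD = "medical record numbers"
    HEALTH_PLAN = "health plan beneficiary numbers"
    ACCOUNT = "account numbers"
    CERTIFICATE_LICENSE = "certificate or license numbers"
    VEHICLE = "vehicle identifiers and serial numbers"
    DEVICE = "device identifiers and serial numbers"
    URL = "web URLs"
    IP_ADDRESS = "IP addresses"
    BIOMETRIC = "biometric identifiers"
    FACE_PHOTO = "full-face photographs and comparable images"
    OTHER_UNIQUE = "any other unique identifying number, characteristic, or code"


# Hypothetical, human-reviewed mapping for one organization's data dictionary.
# None means "reviewed and judged not to be an identifier".
COLUMN_CATEGORIES: Dict[str, Optional[SafeHarborIdentifier]] = {
    "patient_name": SafeHarborIdentifier.NAMES,
    "zip": SafeHarborIdentifier.GEOGRAPHY,        # only a compliant 3-digit form may remain
    "admit_date": SafeHarborIdentifier.DATES,     # only the year may remain
    "mrn": SafeHarborIdentifier.MEDICAL_RECORD,
    "internal_patient_id": SafeHarborIdentifier.OTHER_UNIQUE,
    "diagnosis_code": None,
}


def audit_columns(
    columns: List[str],
    mapping: Dict[str, Optional[SafeHarborIdentifier]] = COLUMN_CATEGORIES,
) -> Tuple[Dict[str, SafeHarborIdentifier], List[str]]:
    """Split columns into identifier hits and columns nobody has reviewed yet."""
    flagged = {c: mapping[c] for c in columns if mapping.get(c) is not None}
    unreviewed = [c for c in columns if c not in mapping]
    return flagged, unreviewed
```

Calling `audit_columns(["patient_name", "diagnosis_code", "free_text_note"])` would flag `patient_name` as a name and return `free_text_note` as unreviewed, which is exactly the kind of column where identifiers tend to hide.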
Where Healthcare Organizations Consistently Get Safe Harbor Wrong
In my work with covered entities and their business associates, three mistakes appear repeatedly.
Zip code truncation errors. The Safe Harbor method permits retaining the first three digits of a zip code only if the geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 persons, according to current publicly available Census Bureau data. For every three-digit area at or below that threshold, the prefix must be changed to 000. Organizations that skip this verification step create re-identifiable datasets, especially in rural areas where three-digit zip code prefixes represent small populations.
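The zip code rule reduces to a small lookup. In the sketch below, `prefix_population` is an assumed input: a mapping from three-digit prefixes to populations that your organization would build from current Census Bureau data. The function name `generalize_zip` is illustrative, and returning 000 when a prefix is missing from the table (failing closed) is a design choice, not a regulatory requirement.

```python
from typing import Dict

SAFE_HARBOR_ZIP3_MIN_POPULATION = 20_000  # prefix kept only if population exceeds this


def generalize_zip(zip_code: str, prefix_population: Dict[str, int]) -> str:
    """Return the first three digits of a zip code, or "000" if that
    three-digit area has 20,000 or fewer people (or is not in the table)."""
    prefix = zip_code.strip()[:3]
    if len(prefix) != 3 or not prefix.isdigit():
        raise ValueError(f"unrecognized zip code: {zip_code!r}")
    population = prefix_population.get(prefix)
    if population is None:
        # No population figure on file: fail closed instead of guessing.
        return "000"
    return prefix if population > SAFE_HARBOR_ZIP3_MIN_POPULATION else "000"


# Made-up prefixes and populations for illustration only, not Census data.
example_populations = {"123": 45_000, "456": 8_500}
print(generalize_zip("12345", example_populations))       # 123
print(generalize_zip("45678-1234", example_populations))  # 000
```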
Date handling failures. Safe Harbor requires removal of all elements of dates (except year) directly related to an individual. That means admission month and day, birth month and day, and procedure dates must all be stripped. Simply converting dates to a different format does not satisfy the requirement.
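Date handling can be sketched the same way. The helpers below are hypothetical names; they keep only the year, collapse ages over 89 into one category, and suppress a birth year that could reveal such an age. They assume a `reference` date (for example, the date the extract is produced) and compare calendar years, which errs on the conservative side because a recipient who sees only years cannot tell whether a birthday has passed.

```python
from datetime import date

AGE_CATEGORY = "90 or older"


def generalize_date(d: date) -> int:
    """Keep only the year of a date directly related to the individual."""
    return d.year


def generalize_age(age: int) -> str:
    """Collapse every age over 89 into a single category."""
    return AGE_CATEGORY if age > 89 else str(age)


def generalize_birth_year(birth: date, reference: date) -> str:
    """Release the birth year only if it cannot indicate an age over 89.

    Uses the difference in calendar years, so someone who is 89 but turns 90
    later in the reference year is also collapsed (conservative by design).
    """
    return AGE_CATEGORY if reference.year - birth.year > 89 else str(birth.year)
```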
Ignoring the "no actual knowledge" standard. Even after removing all 18 identifiers, Safe Harbor is not satisfied if your organization has actual knowledge that the remaining information could identify a person. A dataset describing a rare disease in a small community, for example, might still allow re-identification even without any of the 18 identifiers present.
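Safe Harbor does not prescribe any statistical test for the actual knowledge standard, but a simple screening step can surface records that deserve a human look before sign-off. The sketch below swaps in a small-cell count (the idea behind k-anonymity): it counts how many records share each combination of chosen quasi-identifiers and reports combinations below a threshold. The function name, the choice of quasi-identifiers, and the threshold of 11 are all assumptions for illustration, and a clean result does not establish the absence of actual knowledge.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Sequence, Tuple


def small_cells(
    records: Iterable[Mapping[str, object]],
    quasi_identifiers: Sequence[str],
    threshold: int = 11,  # assumed cutoff; set your own policy value
) -> Dict[Tuple[object, ...], int]:
    """Return quasi-identifier combinations shared by fewer than `threshold` records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return {combo: n for combo, n in counts.items() if n < threshold}


# Illustrative rows only: one rare diagnosis in a small area stands out.
rows = [{"zip3": "000", "dx": "rare-condition"}] + [{"zip3": "123", "dx": "asthma"}] * 12
print(small_cells(rows, ["zip3", "dx"]))  # {('000', 'rare-condition'): 1}
```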
Safe Harbor vs. Expert Determination: Choosing the Right Path
Safe Harbor is procedural: remove the identifiers, confirm no actual knowledge, and the data is de-identified. Expert Determination under 45 CFR § 164.514(b)(1) requires a person with appropriate statistical or scientific knowledge to apply accepted methods, determine that the risk of identification by an anticipated recipient is "very small," and document the methods and results. Expert Determination is more flexible but more expensive and time-consuming.
For most healthcare organizations handling routine data requests, such as sharing information with researchers or business associates for analytics, Safe Harbor is the faster, more predictable option. If your data is complex, involves small populations, or includes genomic information, consult a statistician and consider Expert Determination instead.
How OCR Enforcement Treats De-Identification Failures
OCR does not maintain a separate penalty category for de-identification failures. Instead, an improperly de-identified dataset is treated as PHI. That means every downstream use or disclosure is evaluated under the Privacy Rule's standard provisions — including the minimum necessary standard, authorization requirements, and breach notification obligations under 45 CFR §§ 164.400–414.
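If your organization discloses a dataset it believed was de-identified but was not, that disclosure is an impermissible disclosure of PHI, and it is presumed to be a breach unless a documented risk assessment shows a low probability that the PHI has been compromised. Affected individuals must be notified regardless of the breach's size. If the breach affects 500 or more individuals, OCR must also be notified at the same time; smaller breaches are reported to OCR annually, and breaches involving more than 500 residents of a state or jurisdiction also require notice to prominent media outlets. Civil monetary penalties under the HITECH Act's tiered structure can reach $2,067,813 for all violations of an identical provision in a calendar year (as adjusted for inflation).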
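The breach notification rules at 45 CFR §§ 164.404 through 164.408 split into tiers by headcount, which can be sketched as a small routing function. This simplified sketch assumes a breach has already been confirmed and that you know how many affected individuals reside in each state or jurisdiction; `required_notices` and `BreachNotices` are hypothetical names, deadlines appear only as comments, and state breach laws are out of scope.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BreachNotices:
    individuals: bool = False          # § 164.404: any size, no later than 60 days after discovery
    hhs_contemporaneous: bool = False  # § 164.408(b): 500 or more, alongside individual notices
    hhs_annual_log: bool = False       # § 164.408(c): under 500, within 60 days after year end
    media_jurisdictions: List[str] = field(default_factory=list)  # § 164.406: > 500 residents


def required_notices(affected_by_jurisdiction: Dict[str, int]) -> BreachNotices:
    """Map affected-resident counts per state or jurisdiction to required notices."""
    total = sum(affected_by_jurisdiction.values())
    return BreachNotices(
        individuals=total > 0,
        hhs_contemporaneous=total >= 500,
        hhs_annual_log=0 < total < 500,
        media_jurisdictions=sorted(
            j for j, n in affected_by_jurisdiction.items() if n > 500
        ),
    )
```

For a breach affecting 620 Ohio residents and 40 Pennsylvania residents, this returns individual notice, contemporaneous HHS notice, and media notice in Ohio only.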
Build De-Identification Into Your Workforce Training Program
The Privacy Rule at 45 CFR § 164.530(b) requires covered entities to train all workforce members on policies and procedures relevant to their job functions. For any staff involved in data analytics, research support, IT, or health information management, de-identification methodology must be part of that training.
Your training should cover the 18 Safe Harbor identifiers, the zip code population threshold, date handling rules, and the actual knowledge standard. It should also clarify who in your organization has the authority to certify that a dataset meets de-identification requirements. Comprehensive HIPAA training and certification programs address these requirements in a structured, auditable format.
Documenting Your Safe Harbor Process
OCR expects documentation. Your organization should maintain written policies specifying which de-identification method you use, the steps your workforce follows, and how you verify compliance with each requirement. This documentation should be retained for six years under the Privacy Rule's record retention standard at 45 CFR § 164.530(j).
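A sign-off record makes the six-year clock concrete. Under § 164.530(j)(2), the period runs from the date a document was created or the date it was last in effect, whichever is later. The dataclass below is a hypothetical structure whose field names are not drawn from any standard, and it moves a February 29 start date to February 28 because that day cannot be carried six years forward.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

RETENTION_YEARS = 6  # 45 CFR § 164.530(j)(2)


def add_years(d: date, years: int) -> date:
    """Same month and day `years` later; February 29 becomes February 28."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:  # only raised for February 29 in a non-leap target year
        return d.replace(year=d.year + years, day=28)


@dataclass
class DeidentificationRecord:
    """Hypothetical sign-off record for one de-identified dataset or policy."""
    dataset_name: str
    method: str                 # e.g. "safe_harbor" or "expert_determination"
    approved_by: str            # privacy official who signed off
    created: date
    last_in_effect: Optional[date] = None

    def retain_until(self) -> date:
        """Six years from creation or last-in-effect date, whichever is later."""
        start = max(self.created, self.last_in_effect or self.created)
        return add_years(start, RETENTION_YEARS)
```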
If you rely on a business associate to de-identify data on your behalf, your business associate agreement must address this function. The BAA should specify the de-identification standard to be applied and require the business associate to certify compliance.
Practical Steps to Implement the Safe Harbor Method Today
Start with a risk analysis of your current data-sharing practices. Identify every dataset your organization treats as "de-identified" and audit it against the 18-identifier checklist. Verify zip code population thresholds using current Census data. Review your Notice of Privacy Practices to confirm it accurately describes your de-identification practices.
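Column-level review misses identifiers typed into free-text fields, so an audit often adds a pattern scan over cell values. The regular expressions below are simplified, US-centric assumptions that will miss many formats and flag some harmless values; `scan_values` is a hypothetical helper, and a hit is a prompt for manual review, not a compliance finding. Names, street addresses, and most other identifier types need different techniques entirely.

```python
import re

# Simplified, US-centric patterns (assumptions): they will miss many
# real-world formats and produce false positives.
PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"\(?\b\d{3}\)?[-. ]\d{3}[-.]\d{4}\b"),
    "ip_address": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "url": re.compile(r"\bhttps?://\S+", re.IGNORECASE),
    "full_date": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
}


def scan_values(rows):
    """Yield (row_index, column, pattern_name) for every suspicious string value."""
    for i, row in enumerate(rows):
        for column, value in row.items():
            if not isinstance(value, str):
                continue  # only free text is scanned here
            for name, pattern in PATTERNS.items():
                if pattern.search(value):
                    yield i, column, name
```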
Assign a compliance officer or privacy official to sign off on every de-identified dataset before it leaves your organization. Train your workforce — not once, but at regular intervals and whenever your de-identification procedures change. Platforms like HIPAA Certify help organizations build and document ongoing workforce HIPAA compliance programs that cover de-identification alongside other critical Privacy Rule and Security Rule requirements.
The safe harbor method HIPAA provides is straightforward on paper. In practice, it demands precision, documentation, and a workforce that understands exactly what qualifies as de-identified data — and what does not. Get it wrong, and you are not working with de-identified data. You are disclosing PHI without authorization.