Datasets

I used curated data resources and several data generators to obtain good-enough American, Canadian, and European-specific datasets.

It’s essential to support country-specific localisation (l10n) as an integral part of your policies to reduce false positives and false negatives. Internationalisation (i18n) provides the flexibility to adapt DLP policies to various languages and regions without engineering changes.
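As a rough illustration of why localisation matters, the sketch below keeps one pattern per country so that a detector only applies the format that is relevant to the document's region. The registry name `POSTAL_CODE_PATTERNS`, the function `find_postal_codes`, and the simplified postal-code formats are assumptions made for this example; they are not part of the datasets.

```python
import re

# Hypothetical per-country patterns, keyed by ISO 3166-1 alpha-2 code.
# The postal-code formats are simplified and only illustrative.
POSTAL_CODE_PATTERNS = {
    "US": re.compile(r"\b\d{5}(?:-\d{4})?\b"),         # ZIP or ZIP+4
    "CA": re.compile(r"\b[A-Z]\d[A-Z] ?\d[A-Z]\d\b"),  # e.g. K1A 0B1
    "NL": re.compile(r"\b\d{4} ?[A-Z]{2}\b"),          # e.g. 1012 AB
}


def find_postal_codes(text: str, country: str) -> list[str]:
    """Return postal-code-like matches using only the given country's pattern."""
    pattern = POSTAL_CODE_PATTERNS.get(country)
    if pattern is None:
        return []  # no localised pattern registered for this country
    return pattern.findall(text)
```

Applying only the pattern for the expected country avoids, for example, flagging a five-digit US ZIP code in a document that is known to be Canadian. Supporting a new region then means adding an entry to the registry rather than changing the detection code.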

The datasets are identified by their ISO country code, except for generic English documents.

Country                        Code
Canada                         CA
Federal Republic of Germany    DE
French Republic                FR
Hellenic Republic              GR
Kingdom of Belgium             BE
Kingdom of Norway              NO
Kingdom of Sweden              SE
Kingdom of the Netherlands     NL
Portugal                       PT
Republic of Finland            FI
Swiss Confederation            CH
United States of America       US

Datasets

Secrets

Items:

  • password files / shadow
  • common passwords
  • LDAP [1]: LDF [2] schema files that store content and the actions to perform, such as adding, modifying, removing, and renaming objects (e.g., users and groups)
  • base-64 encoded files
  • ICS/SCADA [3]
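The LDAP item above mentions actions such as adding, modifying, removing, and renaming objects. In LDIF change records these correspond to the `changetype` values `add`, `modify`, `delete`, and `modrdn`. The following is a minimal sketch of listing those actions from an LDIF file; the sample entries are made up for illustration, `list_changes` is a hypothetical helper, and the parser deliberately ignores line folding and base64-encoded values.

```python
# Made-up sample records; not taken from the datasets.
SAMPLE_LDIF = """\
dn: uid=jdoe,ou=people,dc=example,dc=com
changetype: add
objectClass: inetOrgPerson
uid: jdoe
cn: John Doe
sn: Doe

dn: uid=jdoe,ou=people,dc=example,dc=com
changetype: modify
replace: mail
mail: jdoe@example.com
-

dn: uid=old,ou=people,dc=example,dc=com
changetype: delete
"""


def list_changes(ldif_text: str) -> list[tuple[str, str]]:
    """Return (dn, changetype) pairs; records are separated by blank lines."""
    changes = []
    for record in ldif_text.strip().split("\n\n"):
        fields = {}
        for line in record.splitlines():
            # Skip comments, continuation lines, and separators such as "-".
            if ":" in line and not line.startswith((" ", "#")):
                key, _, value = line.partition(":")
                # Keep only the first value seen for each attribute name.
                fields.setdefault(key.strip().lower(), value.strip())
        if "dn" in fields:
            # Records without a changetype are plain content records.
            changes.append((fields["dn"], fields.get("changetype", "content")))
    return changes
```

Calling `list_changes(SAMPLE_LDIF)` yields one pair per record, which is enough to tell a content dump apart from a file that modifies or deletes directory objects.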

Compliance:

  • To be defined

Finance

Items:

  • Credit card number (CCN)
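Credit card numbers carry a Luhn check digit, so a detector can discard most random digit strings before reporting a match. The sketch below is one way to run that check on a candidate; `luhn_valid` is a name chosen for this example, and it validates the checksum only, not the issuer prefix or the length.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    # Allow the common separators found in formatted card numbers.
    cleaned = candidate.replace(" ", "").replace("-", "")
    if not (cleaned.isascii() and cleaned.isdigit()):
        return False  # empty input or non-digit characters

    total = 0
    # Walk from the rightmost digit; double every second digit.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

For instance, the widely used test number `4111 1111 1111 1111` passes the check, while changing its last digit to `2` makes it fail.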

Compliance:

  • PCI

Industries

Items:

Compliance:

  • To be defined

Information Technology (IT)

Items:

Compliance:

  • To be defined

International

Items:

  • Contract
  • NDA

Countries

United Nations country names

The United Nations Group of Experts on Geographical Names (UNGEGN) documents the short and formal country names in each country's official national languages and in the six UN official languages (English, French, Spanish, Russian, Chinese, and Arabic).

ISO 3166 country codes (2013 edition)

ISO 3166 is the International Standard for country codes, codes for subdivisions and formerly used codes (codes that were once used to describe countries but are no longer in use).

The country codes can be represented as a two-letter code (alpha-2), which is recommended as the general-purpose code; a three-letter code (alpha-3), which is more closely related to the country name; or a three-digit numeric code (numeric-3), which can be useful if you need to avoid using Latin script.
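To make the three representations concrete, here is a small sketch that normalises any of them to the alpha-2 code used to identify the datasets. The `CountryCode` type and the `to_alpha2` helper are assumptions made for this example, and the table covers only a subset of the countries listed above.

```python
from typing import NamedTuple


class CountryCode(NamedTuple):
    alpha2: str
    alpha3: str
    numeric: str  # kept as a string to preserve leading zeros, e.g. "056"


# Subset of ISO 3166-1 entries, keyed by alpha-2 code.
ISO_3166_1 = {
    "CA": CountryCode("CA", "CAN", "124"),
    "DE": CountryCode("DE", "DEU", "276"),
    "FR": CountryCode("FR", "FRA", "250"),
    "BE": CountryCode("BE", "BEL", "056"),
    "CH": CountryCode("CH", "CHE", "756"),
    "US": CountryCode("US", "USA", "840"),
}


def to_alpha2(code: str) -> str:
    """Normalise an alpha-2, alpha-3, or numeric code to alpha-2."""
    wanted = code.strip().upper()
    for entry in ISO_3166_1.values():
        # zfill restores leading zeros dropped from numeric codes ("56" -> "056").
        if wanted in (entry.alpha2, entry.alpha3) or wanted.zfill(3) == entry.numeric:
            return entry.alpha2
    raise KeyError(code)
```

Storing the numeric code as a string is a deliberate choice here: an integer would lose the leading zero in codes such as Belgium's `056`.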

Names and codes for subdivisions are usually taken from relevant official national information sources.

Compliance:

  • To be defined

Personal

Items:

  • PII
  • PHI

Compliance:

  • GDPR

Legal

Items:

  • Contract
  • NDA

Compliance:

  • To be defined

Footnotes

  1. Lightweight Directory Access Protocol (LDAP); OpenLDAP is an open-source implementation.

  2. The LDAP Data Interchange Format (LDIF) is stored as plain-text files with an LDF extension.

  3. Industrial Control System (ICS) / Supervisory Control and Data Acquisition (SCADA).