index.bs

<pre class='metadata'>
Title: Target Privacy Threat Model
Shortname: privacy-threat-model
Level: none
Group: ping
Status: ED
Repository: w3cping/privacy-threat-model
URL: https://w3cping.github.io/privacy-threat-model
Editor: Jeffrey Yasskin, Google Inc. https://google.com/, jyasskin@google.com, w3cid 72192
Editor: Tom Lowenthal, Brave https://brave.com/, tom@brave.com
Abstract: A privacy threat model we should migrate the web toward.

Warning: Not Ready
Markup Shorthands: markdown on
Complain About: accidental-2119 on, missing-example-ids on
Assume Explicit For: on
</pre>
<pre class='biblio'>
{
  "PSL-PROBLEMS": {
    "authors": [
      "Ryan Sleevi"
    ],
    "href": "https://github.com/sleevi/psl-problems",
    "title": "Public Suffix List Problems"
  },
  "WHAT-DOES-PRIVATE-BROWSING-DO": {
    "authors": ["Martin Shelton"],
    "href": "https://medium.com/@mshelton/what-does-private-browsing-mode-do-adfe5a70a8b1",
    "title": "What Does Private Browsing Mode Do?"
  }
}
</pre>
<pre class="anchors">
spec: RFC6265; urlPrefix: https://tools.ietf.org/html/rfc6265#
    type: http-header
        text: HttpOnly; url: section-4.1.2.6
spec: PSL; urlPrefix: https://publicsuffix.org/
    type: dfn
        text: registrable domain; url: list/#list-format
</pre>
<pre class='link-defaults'>
</pre>
<style>
@supports(transform: rotate(45deg)) {
/* Rotate threat model table headers to fit more columns on the screen.
 * Details adapted from
 * https://wabain.github.io/2019/10/13/css-rotated-table-header.html */
  .threatmodel tr.goals th {
    /* sin(45deg) == 0.707 */
    height: calc((25em + 3em) * 0.707);
    vertical-align: bottom;
    text-align: right;
    line-height: 1;
  }
  .threatmodel tr.goals th > div {
    width: 5em;
  }
  .threatmodel tr.goals th > div > div {
    width: 25em;
    transform-origin: bottom left;
    /* Read the transform steps bottom-up. */
    transform:
      /* Move the bottom corner from the left edge of the column to the center. */
      translateX(calc(5em/2))
      /* Rotate 45 degrees clockwise. */
      rotate(45deg)
      /* Move the bottom-right corner to the left edge of the column, which is
       * the transform-origin. */
      translateX(-100%);
  }
  .threatmodel tr.goals th a[href] {
    border-bottom: 0;
  }
}

.threatmodel td {
  text-align: center;
  min-width: 2em;
}
details[open] {
  position: absolute;
  background: white;
  border: thin black solid;
  text-align: left;
}
</style>

Advisement: This document is at a very early stage. Many things in it are wrong
and/or incomplete. Please take it as a rough shape for how we might document the
target threat model, rather than as definite statements about what should be in
the target threat model.

# Introduction # {#intro}

As a **threat model**, this specification describes attacker capabilities and
attacker goals, and says, for each possible set of capabilities an attacker
might have, which goals we expect the attacker can achieve.

As a **privacy** threat model, the attacker goals compromise the privacy of
users, rather than their security.

As a **target** threat model, it describes not the current state of the Web
including all current maybe-unwise APIs, but rather an end state that we hope to
migrate to, and that new APIs should be held to. This is meant to be a
*plausible* threat model: it doesn't expect to remove any APIs or browser
behavior that is deemed essential to the viability of the Web.

Since people are likely to disagree about which APIs are essential to the Web,
when saying that an attacker can achieve their goal, this document describes how
the attacker achieves it using particular "essential" APIs, and it provides an
index of those APIs so readers can point out ones that they don't consider
essential.

# Terminology # {#terminology}

[[HTML]] defines an [=origin=] as the tuple of a scheme, hostname, and port that
provides the main security boundary on the web.

A <dfn>site</dfn> is a set of [=origins=] that are all [=same site=] with each
other. Note that there are problems ([[PSL-PROBLEMS]]) with using [=registrable
domains=] as a logical boundary.

A <dfn>party</dfn> is defined by [[tracking-dnt]] as "a natural person, a legal
entity, or a set of legal entities that share common owner(s), common
controller(s), and a group identity that is easily discoverable by a user."

The <dfn>first party</dfn> for a user action is the [=party=] that controls the
[=origin=] of the [=top-level browsing context=] under which the action
happened. Intuitively, this is the owner of the domain in the browser's URL bar.
This differs from <a
href="https://wiki.mozilla.org/Security/Anti_tracking_policy#Tracking_Definition">Mozilla's
definition</a> in that Mozilla defines other parties as first parties if the
user can easily discover which party it is and intends to interact with that
party, for example to allow sign-in widgets to be first-party.

A <dfn>third party</dfn> for a user action is any party that isn't the [=first
party=] or the user (the second party).

A <dfn>user</dfn> is a human or program that controls a user agent.

A <dfn>user ID</dfn> is a pair of a [=site=] and a (potentially-large) integer
allocated by that site that is used to identify a [=user=] on that site. A
single user will generally have many user IDs that refer to them, and a single
site may or may not know that multiple user IDs refer to the same user.

A <dfn>global identifier</dfn> is a string that identifies a particular [=user=]
independent of which site they're visiting. Users generally have relatively few
global identifiers and can usually list and recognize them. A goal of
anti-tracking policy is to prevent [=user IDs=] from becoming [=global
identifiers=].

An <dfn>attacker</dfn> is any entity trying to get information that a [=user=]
might not want them to get. Attackers are often entities that a user intends to
interact with in other ways, as both [=first party|first=] and [=third parties=], and some users may
not mind their collection of this information.

This document uses the terms <dfn>publisher</dfn> and <dfn>tracker</dfn>
colloquially to refer to particular kinds of [=sites=] and the [=parties=] that
operate them. They are not rigorously defined.

# High-level threats # {#high-level-threats}

User agents should attempt to defend their users from a variety of high-level
threats or attacker goals, described in this section. [[#goals]] then describes
the low-level steps an attacker would use to achieve these high-level goals.

[[RFC6973]] describes the following high-level privacy threats, which the TAG
has adopted into [[security-privacy-questionnaire#threats]]:

: <dfn>Surveillance</dfn>

:: Surveillance is the observation or monitoring of an individual’s
    communications or activities. See [[RFC6973#section-5.1.1]].

: <dfn noexport>Stored Data Compromise</dfn>

:: End systems that do not take adequate measures to secure stored data from
    unauthorized or inappropriate access. See [[RFC6973#section-5.1.2]].

: <dfn>Intrusion</dfn>

:: Intrusion consists of invasive acts that disturb or interrupt one’s life or
    activities. See [[RFC6973#section-5.1.3]].

: <dfn>Misattribution</dfn>

:: Misattribution occurs when data or communications related to one individual
    are attributed to another. See [[RFC6973#section-5.1.4]].

: <dfn>Correlation</dfn>

:: Correlation is the combination of various pieces of information related to an
    individual or that obtain that characteristic when combined. See
    [[RFC6973#section-5.2.1]].

: <dfn>Identification</dfn>

:: Identification is the linking of information to a particular individual to
    infer an individual’s identity or to allow the inference of an individual’s
    identity. See [[RFC6973#section-5.2.2]].

: <dfn>Secondary Use</dfn>

:: Secondary use is the use of collected information about an individual without
    the individual’s consent for a purpose different from that for which the
    information was collected. See [[RFC6973#section-5.2.3]].

: <dfn>Disclosure</dfn>

:: Disclosure is the revelation of information about an individual that affects
    the way others judge the individual. See [[RFC6973#section-5.2.4]].

: <dfn noexport>Exclusion</dfn>

:: Exclusion is the failure to allow individuals to know about the data that
    others have about them and to participate in its handling and use. See
    [[RFC6973#section-5.2.5]].

These threats combine into the particular concrete threats we want web
specifications to defend against, described in subsections here:

## Unwanted same-site recognition ## {#hl-recognition-same-site}

Contributes to [=surveillance=], [=correlation=], and [=identification=].

Users of most instantiations of the web platform expect that if they visit a
site on one day, and then visit again the next day, the site will be able to
recognize that they're the same user. This allows sites to save the user's
preferences, shopping carts, etc. The web platform offers many mechanisms that
are either intended to accomplish this recognition or that can be trivially used
for it, including [[RFC6265|cookies]], {{WindowLocalStorage/localStorage}},
{{WindowOrWorkerGlobalScope/indexedDB}}, {{CacheStorage}}, and other forms of
storage.

A privacy harm only occurs if the user wants to break the association between
two visits, but the site can still determine with high probability that the two
visits came from the same user.

A user might expect that their two visits won't be associated if they:

* Use a browser that promises to avoid such correlation.
* Use their browser's private browsing mode. ([[WHAT-DOES-PRIVATE-BROWSING-DO]])
* Use two different browser profiles between the two visits.
* Explicitly clear the site's cookies or storage.

This recognition is generally accomplished by either "supercookies" or [=browser
fingerprinting=].

Supercookies occur when a browser stores data for a site but makes that data
more difficult to clear than other cookies or storage.
[[fingerprinting-guidance#clearing-all-local-state]] discusses how
specifications can help browsers avoid this mistake.

Fingerprinting consists of using attributes of the user's browser and platform
that are consistent between the two visits and probabilistically unique to the
user.

The attributes can be exposed as information about the user's device that is
otherwise benign (as opposed to [[#hl-sensitive-information]]). For example:

* What are the user's language and time zone?
* What size is the user's window?
* What system preferences has the user set? Dark mode, serif font, etc...
* ...

See [[fingerprinting-guidance]] for how to mitigate this threat.

## Unwanted cross-site recognition ## {#hl-recognition-cross-site}

Contributes to [=surveillance=], [=correlation=], and [=identification=],
usually more significantly than [[#hl-recognition-same-site]].

This occurs if a site can determine with high probability that a visit to that
site comes from the same user as another visit to a *different* site.  This
threat is discussed in [[#model-cross-site-recognition]].

## Sensitive information disclosure ## {#hl-sensitive-information}

Contributes to [=correlation=], [=identification=], [=secondary use=], and
[=disclosure=].

Many pieces of information about a user could cause privacy harms if disclosed.
For example:

* The user's location.
* Video or audio from the user's camera or microphone.
* The content of certain files on the user's filesystem.
* Financial data.
* Contacts.
* Calendar entries.
* [Whether the user is using assistive technology.](https://w3ctag.github.io/design-principles/#do-not-expose-use-of-assistive-tech)
* ...

A particular piece of information may have different sensitivity for different
users. Language preferences, for example, might typically seem innocent, but
also can be an indicator of belonging to an ethnic minority. Precise location
information can be extremely sensitive (because it's identifying, because it
allows for in-person intrusions, because it can reveal detailed information
about a person's life) but it might also be public and not sensitive at all, or
it might be low-enough granularity that it is much less sensitive for many
users.

When considering whether a class of information is likely to be sensitive to
users, consider at least these factors:

* whether it serves as a persistent identifier (see severity in Mitigating
    browser fingerprinting);
* whether it discloses substantial (including intimate details or inferences)
    information about the user or other users;
* whether it can be revoked (as in determining whether a permission is
    necessary);
* whether it enables other threats, like intrusion.

Issue(16): This description of what makes information sensitive still needs to
be refined.

## Intrusive behavior ## {#hl-intrusion}

See [=intrusion=].

Privacy harms don't always come from a site learning things. For example it is
intrusive for a site to

* Display messages or notifications,
* Play sounds,
* Occupy the full screen,
* etc.

if the user doesn't intend for it to do so.

## Powerful capabilities ## {#hl-capabilities}

Contributes to [=misattribution=].

For example, a site that sends SMS without the user's intent could cause them to
be blamed for things they didn't intend.

# Threat Model # {#model}

For each of the <a href="#high-level-threats">high-level threats</a>, we
describe a threat model: a description of what goals attackers with various
capabilities should or should not be able to achieve. For simple threats, a
model can be expressed in prose, while complex threat models use a grid to
express the web platform's target guarantees.

<span style="color:green">✘</span> indicates that the goal should be frustrated,
while <span style="color:red">✓</span> indicates that the attacker can achieve
their goal.

Issue: Should we mark goals attackers can currently achieve, which we want to
remove, differently from goals attackers already can't achieve?

<pre class="include">
path: xsite-tracking-model.bsinc
</pre>

## Sensitive-information ## {#model-sensitive-information}

Attackers should only be able to get access to sensitive information from a
[=user agent=] if the user expresses their intent that the attacker get access
to this information at the time the attacker gets access to it. User agents vary
in how they gather this expression of intent.

That a user intends an attacker to get a piece of information at one time, for
example their location or their contact book, may be, but is not necessarily
evidence that the user intends to give out the same piece of information at
a later time. There is not consensus about how long it's reasonable to infer
continued intent, but there is consensus that intent doesn't last for years
without interaction.

Issue(19): Add more information about local attackers and confused users and how
browsers can mitigate.

This threat model defines a kind of information as <dfn>restricted sensitive
information</dfn> if the web platform currently blocks access to it by default
or if we plan to evolve the web platform to block access to it by default
because of the potential privacy harms from disclosure of that kind of
information.

Other information is described as "not [=restricted sensitive information=]"
even if some users in some situations would find it sensitive. Information in
this category may have a lower risk of privacy harm to users or may not
currently be restricted because of incompatibility with functionality of the
Web. These categories are not static and it may become feasible to block access
by default to more kinds of information as the platform develops.

There is consensus that some kinds of information are [=restricted sensitive
information=]:

* Location
* Disability status
* Microphone input
* Etc.

There is consensus that some other kinds of information are not [=restricted
sensitive information=]:

* User agent
* Language
* A user's <a descriptor for="@media" lt="prefers-reduced-motion">preference for
    less motion</a>.
* Etc.

There is not consensus about the sensitivity of all kinds of information:

* TODO: examples?

<pre class="include">
path: capabilities.bsinc
</pre>

<pre class="include">
path: goals.bsinc
</pre>

<pre class="include">
path: essential.bsinc
</pre>

# Acknowledgements # {#acknowledgements}

Safari did the first work to prove that a more privacy-preserving web was
possible, by blocking third-party cookies by default and then shipping [ITP
1.0](https://webkit.org/blog/7675/intelligent-tracking-prevention/), without
breaking the world. They eventually published their [policy for Tracking
Prevention](https://webkit.org/tracking-prevention-policy/), which heavily
influenced this document.

Mozilla wrote the first concrete [anti-tracking
policy](https://wiki.mozilla.org/Security/Anti_tracking_policy), which inspired
Safari's policy.

Michael Kleber on the Chrome team proposed a [Privacy Model for the
Web](https://github.com/michaelkleber/privacy-model), which suggests blocking
the transfer of [=user IDs=] between top-level sites and suggests a few ways
that information could flow between sites without compromising user privacy.