From b10307d2129887bfbad59bcb4f6820dbca0b243c Mon Sep 17 00:00:00 2001
From: Amy Guy
Date: Wed, 12 Jul 2023 17:12:49 +0100
Subject: [PATCH] Editorial: Copy edit 2.1 Identity (#323)

Fixes 302. Moves Recognition definition and subsections into its own section in the Common Concepts appendix.
---
 index.html | 367 ++++++++++++++++++++++++++++-------------------------
 1 file changed, 191 insertions(+), 176 deletions(-)

diff --git a/index.html b/index.html
index 3c6c94dd..eb4439e4 100644
--- a/index.html
+++ b/index.html
@@ -1038,194 +1038,39 @@
 A [=user agent=] should help its user present the [=identity=] they want in each [=context=]
-they are in.
+they are in, and should prevent or support [=recognition=] as appropriate.
+
 A [=person=]'s identity is the set of characteristics that define them. Their identity *in a [=context=]* is the set of characteristics they
-present in that context. People frequently present different identities to
-different contexts, and also frequently share an identity among several
-contexts. People may also wish to present an ephemeral or anonymous identity,
-which is just a set of characteristics that is too small or unstable to be useful
-for following them through time.
-
-It is important to keep in mind that a person's [=identities=] may often be distinct
-from whatever legal identity or identities they may hold.
-
-Recognition is the act of realising that a given [=identity=]
-corresponds to the same [=person=] as another [=identity=] which may have been
-observed either in another [=context=] or in the same [=context=] but at a
-different time. A [=person=] can be [=recognized=] whether or not their legal identity or
-characteristics of their legal identity are included in the recognition.
-
-In order to uphold the above principle, sometimes a [=user agent=]
-needs to *prevent* [=recognition=], for instance so that one [=site=] can't
-learn anything about its [=user=]'s behavior on *another* site. Other times, the
-[=user agent=] needs to *support* [=recognition=],
-for instance to help its [=user=] *prove* to one [=site=] that they have a
-particular identity on another [=site=]. Similarly, a [=user agent=] can help its
-[=user=] to separate or communicate [=identity=] across *repeat* visits to the *same*
-[=site=].
-
-### Recognition Types {#recognition-types}
-
-There are several types of [=recognition=] that may take place. These rely on
-different methods and present different challenges.
-
-Cross-context recognition is [=recognition=] between different
-[=contexts=]. It contributes to [=surveillance=], [=correlation=],
-and [=identification=].
-
-[=Cross-context recognition=] is only [=appropriate=] when the person being [=recognized=]
-can reasonably expect that recognition to happen and can control whether it does.
-Note that a person can use a piece of identifying information in two different
-contexts (e.g. their email or phone number) without that implying that they're
-using the same identity in both contexts. Unless there's some other indication
-that they intended to use a single identity, it is [=inappropriate=] to
-[=recognize=] them using that information, or to seek extra identifying
-information to help with cross-context recognition.
-
-Systems which [=recognize=] people across [=contexts=] need
-to be careful not to apply the principles of one [=context=] in ways that
-violate the principles around use of information acquired in a different
-[=context=]. This is particularly true for [=vulnerable=] people, as
-recognising them in different [=contexts=] may force traits into the open
-that reveal their vulnerability. For example, if you meet your therapist at a
-party, you expect them to have different discussion topics with you than they
-usually would, and possibly even to pretend they don't know you.
-
-Cross-site recognition is when a [=site=] determines with high
-probability that a visit to the site comes from the same person as another
-visit to a *different* site. In the usual case that the sites are different [=contexts=],
-[=cross-site recognition=] is a privacy harm in the same cases as [=cross-context recognition=].
-
-Same-site recognition is when a single [=site=] discovers and uses the
-fact that two or more visits probably came from the same [=person=].
-
-A privacy harm occurs if a [=person=] reasonably expects that they'll be using
-a different [=identity=] for different visits to a single site, but the site
-[=recognizes=] them anyway. This harm can be accomplished through a
-variety of means detailed in [[[#recognition-methods]]].
-
-Note that these categories overlap: [=cross-site recognition=] is usually
-[=cross-context recognition=] (and always [=recognizes=] across [=partitions=]); and
-[=same-site recognition=] is sometimes [=cross-context recognition=] (and may or may not
-involve multiple [=partitions=]).
-
-### User agent awareness of recognition {#user-agent-recognition}
-
-A partition is the [=user agent=]'s attempt to match how its user
-would understand a [=context=]. [=User agents=] don't have a perfect
-understanding of how their users experience the sites they visit, so they
-often need to approximate the boundaries between [=contexts=] when building
-[=partitions=]. In the absence of better information, a [=partition=] can be
-defined as:
-
-* a set of [=environments=] (roughly same-site and cross-site [^iframe^]s,
-workers, and top-level pages)
-* whose [=environment/top-level origins=] are in the [=same site=] (but see
-[[PSL-Problems]])
-* being visited within the same user agent installation (and browser profile,
-container, or container tab for user agents that support those features)
-* between points in time that the person or user agent clears that [=site=]'s
-cookies and other storage (which is sometimes automatic at the end of each
-session).
-
-When a [=user agent=] knows that a site includes multiple contexts, it should
-adjust its [=partitions=] accordingly, for instance by partitioning identities
-per subdomain or site path. User agents should work to improve their ability to
-distinguish contexts within a site.
-
-Where possible, [=user agents=] should prevent people from being [=cross-context
-recognition|recognized=] across [=partitions=] unless they intend to be
-recognized. Note that:
-
-* Sites can do harm even if they can't be completely certain
-  that visits come from the same person, so [=user agents=] should also take steps
-  to prevent such probabilistic recognition. The [[[Privacy-Threat]]] discusses
-  the tradeoffs involved. ([[Privacy-Threat]])
-* If a [=user agent=] blocks [=recognition=] between different sites but doesn't
-  notice when a single site contains multiple [=contexts=], this can encourage
-  [=actors=] to bundle multiple contexts into a single site and privilege the
-  [=actors=] who can do so. Despite these harms, even if a [=user agent=] can't
-  always block [=cross-context recognition=] within a single site, it should
-  still prevent the [=cross-context recognition=] it can detect. By having
-  [=user agents=] solve the problems that technical measures can solve, we let
-  actors using other approaches focus on the problems they're best at.
+present under particular circumstances.

-If a [=user agent=] can tell that its user is using a particular identity on a
-website, for example because the user used an API like
-[[[credential-management-1]]] to log into the site, it should make that active
-identity clear to the user.
-
-### Recognition Methods {#recognition-methods}
-
-The web platform offers many ways for a website to recognize that a [=person=]
-is using the same [=identity=] over time, including [[RFC6265|cookies]],
-{{WindowLocalStorage/localStorage}}, {{WindowOrWorkerGlobalScope/indexedDB}},
-{{CacheStorage}}, and other forms of storage. This allows sites to save the
-[=person=]'s preferences, shopping carts, etc., and people have come to expect
-this behavior in some contexts.
-
-People are unlikely to expect the recognition and will find it difficult to
-mitigate when it is automated, which can happen in different ways:
-
-* through the use of cross-site cookies,
-* by having someone navigate to a link that has been decorated with an identifier
-  ([[?Nav-Tracking]]),
-* collecting the same piece of identifying information on both sites, or
-* by correlating the timestamps of an event that occurs nearly-simultaneously
-on both sites (this is an example of a timing
-attack).
-
-In addition to recognition methods that can operate automatically across
-contexts, recognition can also be made *persistent* such that it will
-defeat potential mitigations like [=partitions=] or clearing one's cookies.
-This constitutes unsanctioned tracking ([[?UNSANCTIONED-TRACKING]]) and can
-take multiple forms.
-
-Fingerprinting consists of using attributes of the [=person=]'s
-browser and platform that are consistent between two or more visits and
-probably unique to the person.
+People can present different identities to different contexts, and can
+also share a single identity across several different contexts.

-The attributes can be exposed as information about the [=person=]'s device
-that on their own are benign (as opposed to [[[#hl-sensitive-information]]]).
-Taken in the aggregate these attributes could uniquely identify the device, or
-contribute to possible [=cross-context recognition=].
-For example:
+
+People may wish to present an ephemeral or anonymous identity. This is
+a set of characteristics that is too small or unstable to be useful
+for following them through time.

-* language and time zone;
-* window size;
-* system preferences (such as dark mode, serif font, etc.).
+A person's [=identities=] may often be distinct from whatever legal identity
+or identities they hold.

-Preventing [=fingerprinting=] can be particularly challenging in cases that
-only affect a small group of people who use the web. For example, people who
-configure their systems in unique ways, such as by using a browser with a very
-small number of users.
-See [[fingerprinting-guidance]] for how to mitigate threats that result from
-[=fingerprinting=].
+In some circumstances, the best way for a [=user agent=] to uphold this
+principle is to prevent [=recognition=] (e.g. so that one [=site=] can't
+learn anything about its [=user=]'s behavior on *another* site).

-Supercookies occur when a user agent stores data for a site but makes that data more
-difficult to clear than other cookies or storage, typically because of a bug, of features
-relating to cache storage and network state (e.g. ETag, HSTS), or
-because the browser restores the browser vendor's cookies when local state is cleared.
-Fingerprinting
-Guidance § Clearing all local state discusses how specifications can help
-user agents avoid this mistake.
+In other circumstances, the best way for a [=user agent=] to uphold this
+principle is to *support* [=recognition=] (e.g. to help its [=user=] prove
+to one [=site=] that they have a particular identity on another [=site=]).

-Header enrichment happens when a network operator adds HTTP request headers
-to identify their customers to sites that they visit. It is unfortunately
-difficult for a [=user agent=] to mitigate against [=header enrichment=].
+Similarly, a [=user agent=] can help its [=user=] by preventing or supporting
+[=recognition=] across *repeat* visits to the *same* [=site=].

-Cross-device communication is communication
-between code on one device and code running on another device. For example, sounds or
-light emitted from one device could be detected by a microphone or light sensor on
-another device [[?SILVERPUSH]]. [=Cross-device communication=] enables cross-device
-tracking, a form of [=cross-context recognition=], but it can also be used for other
-inappropriate information flows.
+[=User agents=] should do their best to distinguish [=contexts=] within a site
+and adjust their [=partitions=] to prevent or support [=recognition=] across those intra-site [=contexts=]
+according to their [=users=]' wishes.

 ## Data Minimization {#data-minimization}

@@ -2257,6 +2102,176 @@
 monitoring service integrity, load balancing, capacity planning, or billing); and,
 * has a contract in place with the [=actor=] which is consistent with the above limitations.

+## Recognition {#recognition}
+
+Recognition is the act of realising that a given [=identity=]
+corresponds to the same [=person=] as another [=identity=] which may have been
+observed either in another [=context=], or in the same [=context=] but at a
+different time. [=Recognition=] can be probabilistic, if someone realises there's
+a high probability that two [=identities=] correspond to the same [=person=],
+even if they aren't certain.
+
+A [=person=] can be [=recognized=] whether or not their legal identity or
+characteristics of their legal identity are included in the recognition.
+
+### Recognition Types {#recognition-types}
+
+There are several types of [=recognition=] that may take place.
+
+Cross-context recognition is [=recognition=] between different
+[=contexts=].
+
+[=Cross-context recognition=] is only [=appropriate=] when the person being [=recognized=]
+can reasonably expect recognition to happen, and can control whether it does.
+
+If a person uses a piece of identifying information in two different
+contexts (e.g. their email or phone number), this *does not* automatically
+mean that they intend to use the same identity in both contexts. It is
+[=inappropriate=] to [=recognize=] them using that information, unless there's
+some other indication that they intended to use a single identity. It is also
+[=inappropriate=] to seek extra identifying information to help with
+cross-context recognition.
+
+Systems which [=recognize=] people across [=contexts=] need
+to be careful not to apply the principles of one [=context=] in ways that
+violate the principles around use of information acquired in a different
+[=context=]. This is particularly true for [=vulnerable=] people, as
+recognising them in different [=contexts=] may force traits into the open
+that reveal their vulnerability. For example, if you meet your therapist at a
+party, you expect them to have different discussion topics with you than they
+usually would, and possibly even to pretend they don't know you.
+
+Cross-site recognition is [=recognition=] when the identities
+are observed on different [=sites=]. In the usual case that the sites are
+different [=contexts=],
+[=cross-site recognition=] is a privacy harm in the same cases as [=cross-context recognition=].
+
+Same-site recognition is when a single [=site=] [=recognizes=] a
+[=person=] across two or more visits.
+
+A privacy harm occurs if a [=person=] reasonably expects that they'll be using
+a different [=identity=] for different visits to a single site, but the site
+[=recognizes=] them anyway. This harm can be accomplished through a
+variety of means detailed in [[[#recognition-methods]]].
+
+Note that these categories overlap: [=cross-site recognition=] is usually
+[=cross-context recognition=] (and always [=recognizes=] across [=partitions=]); and
+[=same-site recognition=] is sometimes [=cross-context recognition=] (and may or may not
+involve multiple [=partitions=]).
+
+### User agent awareness of recognition {#user-agent-recognition}
+
+A partition is the [=user agent=]'s attempt to match how its user
+would understand a [=context=]. [=User agents=] don't have a perfect
+understanding of how their users experience the sites they visit, so they
+often need to approximate the boundaries between [=contexts=] when building
+[=partitions=].
+
+In the absence of better information, a [=partition=] can be defined as:
+
+* a set of [=environments=] (roughly: same-site and cross-site [^iframe^]s,
+workers, and top-level pages)
+* whose [=environment/top-level origins=] are in the [=same site=] (note: see
+[[PSL-Problems]])
+* being visited within the same user agent installation (and browser profile,
+container, or container tab for user agents that support those features)
+* between points in time that the person or user agent clears that [=site=]'s
+cookies and other storage (which is sometimes automatic at the end of each
+session).
+
+It can be difficult for a [=user agent=] to detect when a single site contains
+multiple [=contexts=]. When a [=user agent=] can detect this, it should adjust
+its [=partitions=] accordingly, for instance by partitioning identities
+per subdomain or site path. User agents should work to improve their ability to
+distinguish contexts within a site.
+
+[=User agents=] should prevent people from being
+[=recognized=] across [=partitions=] unless they intend to be
+recognized.
+
+Note that sites can do harm even if they can't be completely certain
+that visits come from the same person, so [=user agents=] should also take steps
+to prevent such probabilistic recognition. The [[[Privacy-Threat]]] discusses
+the tradeoffs involved ([[Privacy-Threat]]).
+
+If a [=user agent=] can tell that its user is using a particular identity on a
+website, it should make that active identity clear to the user (e.g. if the
+user logged into the site via an API like [[[credential-management-1]]]).
+
+### Recognition Methods {#recognition-methods}
+
+The web platform offers many ways for a website to recognize that a [=person=]
+is using the same [=identity=] over time, including [[RFC6265|cookies]],
+{{WindowLocalStorage/localStorage}}, {{WindowOrWorkerGlobalScope/indexedDB}},
+{{CacheStorage}}, and other forms of storage.
+
+Sometimes sites use this to save the [=person=]'s preferences or ongoing activities
+(e.g. shopping carts), and people have come to expect this behavior in some
+[=contexts=].
+
+People are unlikely to expect [=recognition=] all the time, and can find it
+difficult to prevent or mitigate, especially if it is automated.
+
+[=Recognition=] can be automated in different ways:
+
+* through the use of cross-site cookies,
+* by having someone navigate to a link that has been decorated with an identifier
+  ([[?Nav-Tracking]]),
+* collecting the same piece of identifying information on both sites, or
+* by correlating the timestamps of an event that occurs nearly-simultaneously
+on both sites (this is an example of a timing
+attack).
+
+Recognition can also be made *persistent* such that it will
+defeat potential mitigations like [=partitions=] or clearing one's cookies.
+This is unsanctioned tracking ([[?UNSANCTIONED-TRACKING]]) and can
+take multiple forms:
+
+* [=Fingerprinting=].
+* [=Header enrichment=].
+* [=Cross-device communication=].
+
+Fingerprinting consists of using attributes of the [=person=]'s
+browser and platform that are consistent between two or more visits and
+probably unique to the person.
+
+The attributes can be exposed as information about the [=person=]'s device
+that on their own are benign (as opposed to [[[#hl-sensitive-information]]]).
+Taken in the aggregate these attributes could uniquely identify the device, or
+contribute to possible [=cross-context recognition=].
+For example:
+
+* language and time zone;
+* window size;
+* system preferences (such as dark mode, serif font, etc.).
+
+Preventing [=fingerprinting=] can be particularly challenging in cases that
+only affect a small group of people who use the web. For example, people who
+configure their systems in unique ways, such as by using a browser with a very
+small number of users.
+See [[fingerprinting-guidance]] for how to mitigate threats that result from
+[=fingerprinting=].
+
+Supercookies occur when a user agent stores data for a site but makes that data more
+difficult to clear than other cookies or storage, typically because of a bug, of features
+relating to cache storage and network state (e.g. ETag, HSTS), or
+because the browser restores the browser vendor's cookies when local state is cleared.
+Fingerprinting
+Guidance § Clearing all local state discusses how specifications can help
+user agents avoid this mistake.
+
+Header enrichment happens when a network operator adds HTTP request headers
+to identify their customers to sites that they visit. It is
+difficult for a [=user agent=] to mitigate against [=header enrichment=].
+
+Cross-device communication is communication
+between code on one device and code running on another device. For example, sounds or
+light emitted from one device could be detected by a microphone or light sensor on
+another device [[?SILVERPUSH]]. [=Cross-device communication=] enables cross-device
+tracking, a form of [=cross-context recognition=], and it can also be used for other
+[=inappropriate=] information flows.
+