Address diversity of privacy definitions and add missing threats #41

Open: wants to merge 5 commits into main
privacy_models.md (new file, 117 additions)
# Summary

This document outlines the main camps of thought regarding what privacy entails and identifies privacy threats to some or all of those privacy models. It can advise on what is accepted by the entire web community and the W3C as a privacy threat, and what is considered a threat only by some groups.

If a privacy model does not represent an editor's ideal model, and the change they want is not additive, they should consider adding a new model rather than editing the existing one.

This is meant to be a general guide rather than a proposal-by-proposal listing of privacy definitions.

## Privacy Models

### The No Tracking Theory of Privacy
This theory holds that user data should not be tracked at all and should be made impossible to track. No user should be identifiable in anything that touches the client.

All forms of individual tracking should be prevented entirely and made impossible, even at the cost of customization or functionality.

All forms of profiling should be prevented.

**Benefits:**

Guarantees that a user cannot be tracked.

**Issues:**

Potentially not feasible to lock out every tracking method, especially tracking by the browser or OS.

Potentially not feasible to lock out every profiling method, especially "form profiling".

Concerns that the above infeasibilities could lead to a profiling monopoly for companies whose tracking methods cannot be locked out, resulting in a worse privacy state and less competition on the web.

Breaks current technologies.

Concerns that users will not have transparency into, or control over, their data when they are tracked anyway, because the model assumes such tracking should not exist.

Leads to subscription and walled-garden web solutions.

Preventing fingerprinting requires entropy management that restricts the capabilities of the web.


### The Control Theory of Privacy via ID
**Reviewer comment (Member):**

I think this model is about the user using laws and regulation to control their online presence, while the other is about using technical means. The privacy threat model has, so far, been entirely about technical means. It's worth exploring both approaches, although I'm not sure what we can say about the legal landscape in a W3C document that needs to cover countries that haven't passed any useful laws on this.

I believe there's disagreement within the "regulation will help" camp between people who think "notice and consent" is enough and people who think it tends to just produce cookie banners. Are you the right person to explore that, or do we need to find more contributors?

**Author reply:**

The privacy threat model has actually differed between proposals so far. It's my hope we can update this document to reflect that reality. It's been very confusing! You'll find, for example, differences in what Google and Apple representatives consider tracking methodology, most evident in a few of the FLoC tickets, but this is not at all clear when discussing the proposals, since the word "privacy" is used without a definition of what it means in each case.

I would 100% like each group to pen its own definitions and be very clear about them. I hope to address guidelines for doing this in Feathered Serpent, which will demo in Web Advertising next week, but the definitions should be written by their supporters, and the issues by those who disagree; this way every voice is heard.

**Reviewer comment (Member):**

Different people do have different assumptions about what "privacy" means and about what "tracking" means (e.g. https://twitter.com/jyasskin/status/1387170000511799299). This comment was an attempt to refine the description of this model so it conveys its point to more people. If you're in the "notice and consent" camp, could you write that down, with a TODO or something to get a contribution from the other folks?

This theory holds that a user should have total control over their online data, and that profiling may exist as long as that control is maintained.

A user's profile must be linked to an ID stored on the device that can be passed in place of personal information.

Passing or selling personal information (PI) is restricted or forbidden.

The ID can be removed from the device by the user, invalidating all existing profiles.

Every tracking method and profiling method should be auditable.

A user should have the right to be forgotten, i.e. the ability to delete their profile.

A user should have the ability to opt in to or out of tracking (whichever is deemed more appropriate).

Legal arrangements are used to punish violators.
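The ID-indirection requirement above can be sketched in Python (class and method names are hypothetical): profiles are keyed on an opaque on-device ID rather than PI, so rotating or deleting that ID orphans every existing profile, assuming the operator never stores PI alongside it.

```python
import uuid

class Device:
    """Holds the on-device ID that is passed in place of personal information."""
    def __init__(self):
        self.reset_id()

    def reset_id(self):
        # Rotating the ID severs the link to any profile keyed on the old value.
        self.tracking_id = str(uuid.uuid4())

class Profiler:
    """An operator that keys profiles on the opaque ID, never on PI."""
    def __init__(self):
        self.profiles = {}

    def record(self, tracking_id, event):
        self.profiles.setdefault(tracking_id, []).append(event)

    def profile_for(self, device):
        # Returns only data linked to the device's *current* ID.
        return self.profiles.get(device.tracking_id, [])

device = Device()
profiler = Profiler()
profiler.record(device.tracking_id, "visited example.com")
assert profiler.profile_for(device) == ["visited example.com"]

device.reset_id()                          # user removes the ID from the device
assert profiler.profile_for(device) == []  # old profile is orphaned
```

Note the limitation this illustrates: the orphaned data still exists on the operator's side; revocation only breaks the link, which is why the model also needs deletion rights and audits.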

**Benefits:**

Does not break current technologies.

Keeps the web open; prevents walled-garden dominance.

Custom experiences can still exist.


**Issues:**

Reliance on audits to detect cheaters may be impractical.

Data sales and sharing may obscure what data is where.

When PI serves a legal business need (e.g. your name must be recorded in an ecommerce transaction), some data cannot be deleted.

When PI is associated with the ID, revoking the ID may not invalidate existing profiles under some implementations, or when an operator cheats.


## Potential* Threats

"Potential" reflects differences in privacy definitions: certain definitions may deem something a threat to be eliminated, others a threat to be watched, and others not a threat at all.

### Individual Identification

This refers to the ability to identify a user on a user agent/browser. It is a broad category; the subcategories are listed separately below. Some privacy models consider them threats to be prevented; others consider them acceptable but requiring safeguards and increased user control. At minimum, all of them should be considered by any privacy model for the purposes of safeguards.

Tracking methods with no tag can be implemented by any party.

Tracking methods only implementable with user agent cooperation are marked with a (B), for browser.
Tracking methods only implementable with operating system cooperation are marked with an (OS), for operating system.

**Sync Tracking** (B): Any type of cross-device syncing involving browser data (for example, Chrome Sync). This allows cross-domain data to be tracked by the browser vendor, and allows user profile data and customization to be built from that data.

**OS Snooping** (B) (OS): Examples include Safari-iOS or Google-Android integrations that let the operating system gain insight into user data in the browser when it runs as an app, or in some cases even in the classic PC browser. These browser-side integrations allow tracking-prevention safeguards to be bypassed by the browser and/or OS vendors, so a user's web data can be tracked, and users profiled, through a back door in the user agent.
**Reviewer comment (Member):**

What would OS integration allow that a browser couldn't do unilaterally? Or is this saying that an OS might provide APIs by which a browser passes data to it, and the OS vendor might then get data it shouldn't have? The current text is pretty handwavy, and it would be nice to make it a little more concrete.

**Author reply:**

You are not wrong that a browser could do this unilaterally, but OS-browser integrations were flagged by an FTC webinar as a specifically problematic pattern. Also, with OS integration, greater tracking becomes possible.

The OS could also technically snoop on the calls by itself without the browser's knowledge, or create an ID the browser is not aware of.

To give a more concrete example: on a Windows 10 computer, there is web tracking (and a great deal of other tracking) happening, by default I believe, at the OS layer, with Microsoft passing the data through to other partners for use with web advertising and much else. In fact, Microsoft's privacy statement, if you can find it, openly declares this process. For Apple's integration with Safari it's a device ID as well, and I believe it now requires consent but still performs user tracking through integration with the Safari app. Google sign-in does this on Android.

Let's say, for example, you sign in to Google on your Android phone but not in your browser. The browser-as-an-app integration can pass your ID through the web calls.


**Stack ID** (B) (OS): The operating system, or any program higher on the stack, assigns a device ID that can be correlated with the web user and passed along with some or all client calls. It is effectively the reverse of OS Snooping.

**Browser Login Tracking** (B): The ability for a user to log in to their browser by some means that exists across domains, enabling the browser or an affiliate to individually track them. Examples include the Gmail/Chrome login.
**Reviewer comment (Member):**

This would be about the user logging into their browser, and then the browser sending that login information to more than one domain, right? I think the Google-account/Chrome integration isn't an example of this, since it only sends the login to the one first-party site. Or, if you mean to include the browser giving the user special help to sign in to a particular first party, you might remove "some means that exists across domain".

**Author reply:**

Browser login is a spy on every web page you visit that can report all cross-domain activity back to its own server. If we are talking about preventing "passive tracking" threats, then this login can easily be used for that purpose; whether or not it is used that way now, it certainly can be. Therefore we should give it the same caution as any other threat of this magnitude. The Gmail/Chrome login is present on all domains (because it's embedded in every page, top right corner), can track cross-domain activity, and can tie it to a specific individual account. It is passive because the user is not even aware that they have logged into Google on EVERY site; they just think they logged into Google for their email, etc.

Also, because legacy AMP pages are, I believe, still on one of Google's domains, there is even first-party tracking in this threat. A news site on AMP is clearly a third party to Google, yet it would be first party there. Many big browser vendors own domains that have wide reach in unusual ways. But that's mostly a side note; the real issue is the third-party reach.

The browser can already do some of this, which I believe is a separate threat, but the last part, tying the activity directly to an individual account, makes it its own threat model, especially because the browser could use a third-party login system. Suppose Chrome partners with Logalog (a fake name, I hope) and Logalog is tasked with running Gmail for some reason and then executes this threat; Chrome wouldn't necessarily be aware.

Google is the easiest example to use here because of the very obvious cross-domain login, but this is a threat model for any browser.


**First and Third Party Cookie Tracking**: A cookie can be used to identify an individual web user. The cookie can be a cross-domain cookie or a synced first-party cookie; either performs the same function, despite the misconception that only third-party cookies can track.
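A minimal Python sketch, with hypothetical names, of the underlying mechanic: the server assigns an opaque ID in a cookie on first contact and recognizes the same user on later requests. The same logic applies whether the cookie is set in a first-party or third-party context.

```python
import uuid

# visitor_log maps cookie ID -> pages seen, simulating a tracker's
# server-side store. All names here are illustrative only.
visitor_log = {}

def handle_request(cookies, page):
    """Assign an ID cookie on first visit; recognize the user afterwards."""
    user_id = cookies.get("uid")
    if user_id is None:
        user_id = str(uuid.uuid4())
        cookies["uid"] = user_id          # a Set-Cookie header in a real response
    visitor_log.setdefault(user_id, []).append(page)
    return user_id

browser_cookies = {}                      # the browser's cookie jar
first = handle_request(browser_cookies, "/news")
second = handle_request(browser_cookies, "/shop")
assert first == second                    # same user recognized across visits
assert visitor_log[first] == ["/news", "/shop"]
```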

**Storage Tracking**: Tracking that uses a storage system other than cookies to store an ID that identifies a user.

**Browser Snooping** (B): The ability of a browser to send user data signals to a home server of its own accord. It also includes taking signals not meant to be related to user tracking, profiling, or data collection and using them for that purpose as part of a microservice architecture. For example, the browser sees the user on website X and phones home with this information to feed a profiling system, either without any user trigger or via some unrelated user trigger.

**Fingerprinting**: The use of multiple data points from a device's user agent to identify it as close to unique, thereby identifying a user. Fingerprinting can be performed by a third party, a first party, or even the browser client itself, which has access to the most entropy data.
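A minimal Python sketch of the mechanic (attribute names and values are illustrative): hashing the combined data points yields a stable identifier, and each additional attribute makes collisions between different users less likely, which is why entropy management matters.

```python
import hashlib

def fingerprint(attributes):
    """Combine user-agent data points into a single stable identifier."""
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode()).hexdigest()[:16]

device_a = {"ua": "Mozilla/5.0 ...", "screen": "2560x1440",
            "timezone": "UTC-5", "fonts": "Arial;Helvetica"}
device_b = dict(device_a, timezone="UTC+1")   # one attribute differs

# The same device always yields the same ID; one changed attribute
# yields a different ID, so each data point contributes entropy.
assert fingerprint(device_a) == fingerprint(device_a)
assert fingerprint(device_a) != fingerprint(device_b)
```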

### Profiling

This section details situations in which a profile of user data can be built, and problems relating to profiling.

**Form Profiling**: Building a user profile from form information filled out on a web page, often when signing up for a product. In some cases collecting this information is legally required, as in ecommerce. The threat also covers taking legally required information and using it for other purposes.

**Choice Profiling**: Creating a user profile based on choices a user makes, such as purchases on an ecommerce platform or shows watched on a streaming platform. The profile is used for customization, and the data can also be sold for numerous uses (product decisions, data science, advertising).

**Hidden Profiles**: A user being unable to access a web system's profile of their data, whether because access is impossible or merely impractical. This is of particular concern because, where it exists, a user can be misled about what data of theirs is being captured.

**Untouchable Profiles**: A user profile that the user cannot request deletion of. This includes cases where it is impractical for a user to verify, or have confidence, that their data has truly been deleted on request.

**Aggregate Link Profiling**: Linking an aggregate system (like FLoC) to an individual user through individual tracking, cooperation between parties, or machine-learning methods.
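To make the linkage risk concrete, here is a small, illustrative Python calculation (all population figures and bit counts are hypothetical): a cohort assignment alone leaves a user in a large crowd, but combining it with a few extra fingerprinting signals can shrink the anonymity set to roughly one person.

```python
import math

def anonymity_set(population, signals_bits):
    """Expected number of users sharing all observed signals,
    assuming the signals are independent."""
    return population / (2 ** sum(signals_bits))

population = 100_000_000                       # hypothetical user base

# A cohort assignment alone leaves a large crowd...
cohort_bits = math.log2(population / 30_000)   # cohort of ~30k users
print(round(anonymity_set(population, [cohort_bits])))   # ~30000

# ...but combining it with a few fingerprinting signals
# (illustrative bit counts) can single a user out.
extra = [cohort_bits, 10, 5]   # e.g. screen size ~10 bits, timezone ~5 bits
print(round(anonymity_set(population, extra)))           # ~1
```

The independence assumption overstates precision for correlated signals, but the direction of the effect is what matters: each linked signal halves the crowd per bit of entropy.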