Address diversity of privacy definitions and add missing threats #41

Open: wants to merge 5 commits into base: main
90 changes: 90 additions & 0 deletions privacy_models.md
@@ -0,0 +1,90 @@
# Summary

This document outlines the main camps of thought about what privacy entails, and identifies privacy threats to some or all of these privacy models. It can indicate what the entire web community and the W3C accept as a privacy threat, and what only some groups consider a threat.

If an existing privacy model does not represent an editor's preferred model, and the desired change is not purely additive, the editor may consider adding a new model rather than editing the existing one.

This is meant as a general guide rather than a proposal-by-proposal listing of privacy definitions.

## Privacy Models

### The No Tracking Theory of Privacy
This theory holds that user data should not be tracked at all, and that tracking should be made impossible. No user should be identifiable in anything that touches the client.

All forms of individual tracking should be prevented entirely and made impossible. It is acceptable if this comes at the cost of loss of customization or functionality.

All forms of profiling should be prevented.

Benefits:
A 100% guarantee that a user cannot be tracked.

Issues:
Potentially not feasible to lock out every tracking method, especially tracking by the browser or OS.
Potentially not feasible to lock out every profiling method, especially ‘form profiling’.
Concerns that the above infeasibilities could lead to a monopoly on profiling by companies whose tracking methods cannot be locked out, resulting in a worse privacy state and less competition on the web.
Breaks current technologies.
Concerns that users will have no transparency or control over their data when they are tracked, because the model assumes such tracking should not exist.
Leads to subscription-based and walled-garden web solutions.
Preventing fingerprinting requires entropy management that restricts the capabilities of the web.
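To make "entropy management" more concrete, here is a minimal sketch of a per-site entropy budget. It is similar in spirit to the "privacy budget" idea floated in Chrome's Privacy Sandbox. The surface names, their bit costs, and the `EntropyBudget` class are hypothetical illustrations, not measurements or a real browser API. The sketch shows the trade-off listed above: once a site has spent its budget, even legitimate requests for more surfaces are refused.

```python
from typing import Dict

# Hypothetical per-surface entropy estimates in bits (illustrative, not measured).
SURFACE_BITS: Dict[str, float] = {
    "user_agent": 10.0,
    "screen_resolution": 4.8,
    "timezone": 3.0,
    "installed_fonts": 13.9,
}


class EntropyBudget:
    """Caps how many identifying bits a single site may read (a sketch)."""

    def __init__(self, max_bits: float) -> None:
        self.max_bits = max_bits
        self.spent: Dict[str, float] = {}  # site -> bits already revealed

    def request(self, site: str, surface: str) -> bool:
        """Return True if `site` may read `surface`, charging its budget."""
        cost = SURFACE_BITS[surface]
        used = self.spent.get(site, 0.0)
        if used + cost > self.max_bits:
            # Over budget: a browser might block the call or return a generic value.
            return False
        self.spent[site] = used + cost
        return True
```

With `max_bits=16.0`, a site can read the user agent and the screen resolution (14.8 bits in total), but a further request for the timezone is refused.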


### The Control Theory of Privacy via ID
Member

I think this model is about the user using laws and regulation to control their online presence, while the other is about using technical means. The privacy threat model has, so far, been entirely about technical means. It's worth exploring those two approaches, although I'm not sure what we can say about the legal landscape in a W3C document that needs to include countries that haven't passed any useful laws about this.

I believe there's disagreement in the "regulation will help" camp between people who think "notice and consent" is enough, vs people who think it tends to just produce cookie banners. Are you the right person to explore that, or do we need to find more contributors?

Author

The privacy threat model has actually differed between proposals so far. It's my hope we can update this document to reflect that reality. It's been very confusing! For example, you'll find differences in what Google and Apple representatives consider tracking methodology, most evident in a few of the FLoC tickets. This is not at all clear when discussing the proposals, because the word 'privacy' is used instead of a definition of what it means in each case.

I 100% would like each group to write their own definitions and be very clear about them. I hope to address guidelines for doing this in Feathered Serpent, which will be demoed in web adv next week. The definitions should be written by their supporters, and the issues by those who disagree. This way every voice is heard.

Member

Different people do have different assumptions about what "privacy" means and about what "tracking" means (e.g. https://twitter.com/jyasskin/status/1387170000511799299). This comment was an attempt to refine the description of this model so it conveys its point to more people. If you're in the "notice and consent" camp, could you write that down, with a TODO or something to get a contribution from the other folks?

This theory holds that a user should have total control over their online data, and that profiling may exist as long as this is the case.

A user’s profile must be linked to an ID stored on the device that can be passed in place of Personal Information (PI).
Passing or selling PI is restricted or forbidden.
The user can remove the ID from the device to thwart all existing profiles.
Every tracking method and profiling method should be auditable.
A user should have the right to be forgotten, i.e. the ability to delete their profile.
A user should have the ability to opt in to or out of tracking (whichever default is deemed more appropriate).
Legal arrangements are used to punish violators.
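Below is a minimal sketch of the ID mechanism described above. It assumes a data holder that keys profiles only by an on-device ID and never by PI. The `Device` and `ProfileStore` classes are hypothetical and are not defined by any specific proposal. In this sketch, resetting the ID stops new activity from joining the old profile. The old record still persists until the user also exercises the right to be forgotten.

```python
import secrets
from typing import Dict, List


class Device:
    """Holds the user-resettable ID that stands in for Personal Information."""

    def __init__(self) -> None:
        self.profile_id = secrets.token_hex(16)

    def reset_id(self) -> None:
        # The user discards the old ID; new activity can no longer join the old profile.
        self.profile_id = secrets.token_hex(16)


class ProfileStore:
    """Hypothetical data holder that keys profiles only by the device ID."""

    def __init__(self) -> None:
        self.profiles: Dict[str, List[str]] = {}

    def record(self, profile_id: str, interest: str) -> None:
        self.profiles.setdefault(profile_id, []).append(interest)

    def lookup(self, profile_id: str) -> List[str]:
        return self.profiles.get(profile_id, [])

    def forget(self, profile_id: str) -> None:
        # Right to be forgotten: delete the profile stored under this ID.
        self.profiles.pop(profile_id, None)
```

After `reset_id()`, a lookup with the new ID returns an empty profile, while the old profile stays in the store until `forget()` is called with the old ID.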

Benefits:
Does not break current technologies
Keeps the web open and prevents walled-garden dominance
Custom experiences can still exist

Issues:
Reliance on audits to detect cheaters may be impractical
Data sales and sharing may obscure what data is where
When PI serves a legal business need (e.g. a buyer's name must be recorded in an ecommerce transaction), some data cannot be deleted
When PI is associated with a profile, revoking the ID may not thwart existing profiles held by a cheating operator, or under some implementations
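The last issue can be illustrated with a hypothetical operator that stores device IDs and also keeps a legally required email address, such as one collected for an order. `LinkingOperator` and the sample IDs are illustrative assumptions. In this sketch, resetting the ID works only until the user supplies the same PI again. At that point, every ID tied to that email merges back into one profile.

```python
from typing import Dict, List, Optional


class LinkingOperator:
    """Hypothetical data holder that keeps device-ID events plus PI (an email)."""

    def __init__(self) -> None:
        self.events: Dict[str, List[str]] = {}  # device ID -> observed events
        self.email_for_id: Dict[str, str] = {}  # device ID -> PI seen with it

    def observe(self, device_id: str, event: str, email: Optional[str] = None) -> None:
        self.events.setdefault(device_id, []).append(event)
        if email is not None:
            # e.g. an email legally required to complete a purchase
            self.email_for_id[device_id] = email

    def profile_for(self, device_id: str) -> List[str]:
        email = self.email_for_id.get(device_id)
        if email is None:
            return list(self.events.get(device_id, []))
        # Every ID ever seen with the same PI is merged back into one profile.
        linked = [i for i, e in self.email_for_id.items() if e == email]
        return [ev for i in linked for ev in self.events.get(i, [])]
```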


## Potential* Threats

"Potential" reflects differences in privacy definitions. Some definitions may deem something a threat to be eliminated, others a threat to be watched, and others not a threat at all.

### Individual Identification

This refers to the ability to identify a user on a user agent (browser). This is a broad category, and its subcategories are listed separately below. Some privacy models consider them threats to be prevented. Others consider them acceptable but requiring safeguards and increased user control. Every privacy model should consider all of them, at minimum for the purpose of safeguards.

Sync Tracking (II): Any type of cross-device syncing involving browser data, for example Chrome Sync. This allows the browser company to track cross-domain data, and to build user profile data and customization from it.

OS Snooping (II): Integrations such as Safari-iOS or Google-Android that enable the operating system to gain insight into user data in the browser when it appears as an app, or in some cases even in the classic PC browser. These browser-side integrations allow the browser and/or OS companies to bypass tracking prevention safeguards. This creates a back door through the user agent that lets a user’s web data be tracked and lets users be profiled on the web.

Stack ID (II): The operating system or any program higher on the stack assigns a device id that can be correlated to the web user, and passes it through with some or all client calls.

Browser Login Tracking (II): The ability for a user to log in to their browser by some means that persists across domains, enabling the browser or an affiliate to track them individually. Examples include Gmail/Chrome login.
Member

These 4 and the Browser Snooping threat below are a useful new category of threats to privacy, where users might worry about their browser learning too much about them. I think we should keep them separate from the other 3 items that cover the websites the user visits. Basically, group cases by the entity that's attacking the user.

I think the W3C hasn't historically worked much on the problem of a user choosing a browser whose behavior they didn't entirely want, and it's not clear to me that the W3C has any levers to use to change this sort of UA behavior, but it's worth discussing with the PING anyway.

Author

Ah, actually they aren't a new category, in fact current uses of these methods feed into the exact same user tracking ad technologies as the others. I'm amazed they have not been brought up before!

To exclude them or move them to a separate category would raise the question of why browser-based or browser-partnered adtech firms get special treatment, and would make proposals benefit one group over the other.

Since the same data about users is extracted for the same use cases, and I checked that this is user agent behavior that should therefore be in scope for us under the charter, we should probably leave them in the same list. Do you disagree?

Author

Or do you mean it's worth flagging that only browsers can do these things? That's a good point! I can tag them as such. Let me know if this update is in line with what you are thinking.


First and Third Party Cookie Tracking (II): A cookie can be used to identify an individual web user. The cookie can be cross-domain, or a synced first-party cookie. Either one performs the same function, contrary to the common misconception that only third-party cookies can track.
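Below is a minimal sketch of how a single ID cookie identifies a browser across sites, using Python's standard `http.cookies` module. The tracker endpoint, the `uid` cookie name, and the in-memory `VISITS` log are hypothetical. The browser may send the cookie because it is a third-party cookie. It may also send it because one site passed the ID to another site, which stored it as a first-party cookie (cookie syncing). In both cases, the server-side linkage shown here is the same.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Dict, List, Optional, Tuple

VISITS: Dict[str, List[str]] = {}  # uid -> sites where this browser was seen


def handle_request(cookie_header: Optional[str], site: str) -> Tuple[str, str]:
    """Return (uid, Set-Cookie value) for one request to a hypothetical tracker."""
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)
    # Reuse the ID the browser sent back, or mint a new one on first contact.
    uid = jar["uid"].value if "uid" in jar else secrets.token_hex(8)
    VISITS.setdefault(uid, []).append(site)

    out = SimpleCookie()
    out["uid"] = uid
    out["uid"]["max-age"] = 60 * 60 * 24 * 365  # persist for a year
    out["uid"]["path"] = "/"
    out["uid"]["samesite"] = "None"  # allow sending in cross-site contexts
    out["uid"]["secure"] = True
    return uid, out["uid"].OutputString()
```

If the browser returns `uid=<id>` when the tracker is embedded on a second site, both visits are logged under the same ID.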

Storage Tracking (II): Using a storage mechanism other than cookies (e.g. localStorage or IndexedDB) to store an ID that tracks a user.

Browser Snooping (II): This refers to the ability of a browser to send user data signals to a home server of its own accord. It also includes taking data from signals not meant for user tracking, profiling, or data collection, and using them for that purpose as part of a microservice architecture.

Fingerprinting (II): The use of multiple data points from a device’s user agent to identify it as close to unique, thereby identifying a user. Fingerprinting can be performed by a third party, a first party, or even the browser client itself, which has access to the most entropy.
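Below is a minimal sketch of the two ideas above. First, it combines attributes into a stable identifier. Second, it estimates how identifying those attributes are, in bits of surprisal (-log2 of the share of browsers with that value). The attribute values and population shares are made up for illustration. Summing bits assumes the attributes are independent, which usually overstates real-world entropy. For scale, about 33 bits are enough to single out one person among roughly 8 billion, since 2^33 is about 8.6 billion.

```python
import hashlib
import json
import math
from typing import Dict


def fingerprint(attributes: Dict[str, str]) -> str:
    """Hash a canonical (key-sorted) form of the attributes into a stable ID."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def surprisal_bits(share: float) -> float:
    """Identifying bits for a value seen in `share` (0 < share <= 1) of browsers."""
    return -math.log2(share)


def total_bits(shares: Dict[str, float]) -> float:
    # Assumes independent attributes; correlated attributes give fewer real bits.
    return sum(surprisal_bits(p) for p in shares.values())
```

Take hypothetical shares of 1/1024 for the user agent, 1/8 for the timezone, and 1/32 for the screen size. Together they give 18 bits, so under the independence assumption roughly one in 262,144 browsers would share this combination.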

### Profiling

This section details situations in which a profile of user data can be built, or problems relating to profiling.

Form Profiling: This refers to the practice of building a user profile based on form information filled out on a web page, often for the purpose of signing up for a product. In some cases this is legally required, as in ecommerce. The threat also refers to taking legally required information and then using it for other purposes.

Choice Profiling: This refers to creating a user profile based on choices a user makes, such as purchases on an ecommerce platform or shows watched on a streaming platform. This profile is used for customization, and the data can also be sold for numerous uses (product decisions, data science, advertising).
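Below is a minimal sketch of choice profiling: it counts the categories of a user's choices to derive their top interests. The `(item, category)` input format, the `build_choice_profile` helper, and the sample data are hypothetical. Real systems use far richer signals, but the principle is the same: individual choices are turned into a reusable interest profile.

```python
from collections import Counter
from typing import List, Tuple


def build_choice_profile(choices: List[Tuple[str, str]], top_n: int = 3) -> List[str]:
    """Return the user's most frequent categories, most frequent first."""
    counts = Counter(category for _item, category in choices)
    return [category for category, _count in counts.most_common(top_n)]
```

For example, three outdoor-gear purchases and two episodes of a drama would produce a profile led by `"outdoors"`, followed by the drama category.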

Hidden Profiles: Refers to a user being unable to access a web system’s profile of their data, because doing so is impossible or impractical. It is of particular concern because, where this threat exists, a user can be misled about what data of theirs is being captured.

Untouchable Profiles: Refers to a user profile that a user is unable to request deletion of. Includes cases where it is impractical for a user to verify or have confidence their data has truly been deleted upon request.
Member

These two cases belong in a different category from the first two: the first two talk about the way a profile is built, while these two talk about how little control a user has over the profile.

Author

That's a good point, but it's not always true that these methods are not tracking methods. Some profiles MUST be collected, e.g. in ecommerce. A user consents to buy something and, technically, to having their information collected. They may not want the latter but cannot separate the two. Hiding the profile tricks the user into thinking they have not been tracked, and this behavior allows the profile to be collected. This is just as unfair to the user. My intention here was to prevent these threats from being forgotten.

We could put them in a separate list, but I think this method is important and I would prefer it here. What do you think?

Member

I was thinking of subsections under "Profiling". When I say they're a different category, I don't mean to make claims about whether or not they're tracking (and, in fact it'd be good to avoid the term "tracking" entirely, or define it).


Aggregate Link Profiling: Linking an aggregate system (like FLoC) to an individual user through various individual tracking, coop, or machine learning methods.
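Below is a toy sketch of one linking method. A site that knows some users individually (e.g. through a login) records the aggregate cohort ID assigned to each user every week. Each cohort is shared by many users, but the sequence of cohorts over several weeks narrows quickly. An anonymous visitor's observed sequence can therefore be matched back to a known user. The cohort labels, the users, and the `link_cohort_history` helper are hypothetical and are not taken from any FLoC specification.

```python
from typing import Dict, List


def link_cohort_history(observed: List[str], known: Dict[str, List[str]]) -> List[str]:
    """Return known users whose week-by-week cohort history starts with `observed`."""
    n = len(observed)
    return [user for user, history in known.items() if history[:n] == observed]


# Hypothetical first-party records: user -> cohort ID in weeks 1, 2, 3.
KNOWN: Dict[str, List[str]] = {
    "alice": ["c12", "c40", "c07"],
    "bob": ["c12", "c40", "c99"],
    "carol": ["c12", "c33", "c07"],
    "dave": ["c55", "c40", "c07"],
}
```

One week of observation (`["c12"]`) matches three known users. Three weeks (`["c12", "c40", "c07"]`) single out `"alice"`.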