Open Bug 572650 (http-fingerprint) Opened 15 years ago Updated 3 months ago

[meta] Reduce the amount of data and entropy sent out in HTTP requests

Categories

(Core :: Networking: HTTP, enhancement, P5)

People

(Reporter: hsivonen, Unassigned)

References

(Depends on 4 open bugs, Blocks 1 open bug)

Details

(Keywords: meta, perf, privacy, Whiteboard: [fingerprinting][necko-would-take][fp-triaged])

Attachments

(2 files)

Steps to reproduce:
 1) Navigate to http://www.delorie.com:81/some/url.txt

Actual results:
Firefox exposes a lot of configuration-dependent data in every HTTP request.

Expected results:
Expected Firefox to send less data and, in particular, less data that exposes configuration entropy. Exposing less data and less entropy is good for three reasons:
 1) Exposing less entropy helps combat browser fingerprinting. Making the browser instance less fingerprintable is good for user privacy. See https://panopticlick.eff.org/
 2) Simply sending fewer bytes allows longer request URLs without having to split an HTTP GET request into multiple packets. Fewer packets (ideally just one) is good for performance.
 3) Making branded Firefox builds, unbranded builds, nightlies and even non-Firefox Gecko-based browsers indistinguishable makes them all less fingerprintable and reduces the possibilities for sites to sniff for Firefox release builds and break other Gecko-based browsers, including nightlies.
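Point 2 can be made concrete with a rough size estimate. The sketch below is illustrative only: the header values are made-up examples rather than captured browser output, and the 1460-byte budget is an assumption (a 1500-byte Ethernet MTU minus 20 bytes of IPv4 header and 20 bytes of TCP header, with no options).

```python
# Rough sketch: how many bytes of URL fit in one packet next to the headers?
# Assumption: 1460 bytes of TCP payload per packet (see lead-in).
# All header values below are illustrative, not captured traffic.

ASSUMED_MSS = 1460


def request_size(method: str, path: str, headers: dict[str, str]) -> int:
    """Return the size in bytes of a serialized HTTP/1.1 request head."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    raw = "\r\n".join(lines) + "\r\n\r\n"
    return len(raw.encode("ascii"))


def url_budget(headers: dict[str, str], mss: int = ASSUMED_MSS) -> int:
    """Return how many bytes remain for the request path in a single packet."""
    return mss - request_size("GET", "", headers)


verbose = {
    "Host": "www.example.com",
    "User-Agent": "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.8) "
                  "Gecko/20100722 Firefox/3.6.8",
    "Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.7",
    "Keep-Alive": "115",
    "Connection": "keep-alive",
}
trimmed = {
    "Host": "www.example.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 6.1; rv:2.0) Gecko/20100101 Firefox/4.0",
    "Connection": "keep-alive",
}

print("URL bytes available, verbose headers:", url_budget(verbose))
print("URL bytes available, trimmed headers:", url_budget(trimmed))
```

Cookies and other headers are ignored here, so the real budget would be smaller; the point is only that every header byte removed is a byte gained for the URL.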

(This will be a tracking bug.)
Alias: http-fingerprint
Depends on: 572652
Keywords: perf
Depends on: 572656
Depends on: 572659
Depends on: 572661
Depends on: 572665
Depends on: 572667
Depends on: 572668
As long as we are unable to have a per-site directed spoofing mechanism, and have that built in by default, I think this MUST be a WONTFIX.
I agree with 1,2 and 3.

This MUST be FIXED (soon .. they're already here ..)!
I must ask:  

How many Web sites will be broken if the UA string is pruned?  

How many users of Gecko-based browsers will migrate like lemmings to other browsers when the owners of the broken Web sites declare that they will not fix their Web sites for a problem created by Mozilla?
I can't see many sites breaking from the current proposed changes. The majority of sites just detect Mozilla or Firefox and, in some very rare cases, Gecko.
Depends on: 566434
For reference, here's the output for all browsers right now.
I see it should be en-US anyways.
Firefox is the only one using lower-case.
If we really want to tackle UA fingerprinting we're going to need to ban or ask to correct 3rd party junk added to the UA. (see bug 566434 for asking) AMO is going to start banning it for addons hosted there but it will still be a problem. In terms of the most fingerprintable things (after the :visited fix), UA litter ranks up around the top of the list. It's usually really specific and there are unfortunately plenty of things that muck with it these days. Things sent by Firefox from a completely clean install aren't nearly as important to the general population, and this is aside from the potential breakage that can be caused by these junky addons' UA changes.
(In reply to comment #9)
> In terms of the most fingerprintable things (after the :visited fix),
> UA litter ranks up around the top of the list.

From all I saw, the plugins listing ranges a few dimensions, not just a small factor, higher than anything else, including the relative commonality that UA strings represent.
Yes, with people with too many plugins installed that'll be a big part of their fingerprinting problem, though not HTTP request specific as with this particular bug. Again, this isn't something that has to do with a vanilla Firefox install. (plugins fingerprinting is bug 566423)

Also, we can't forget that a 100% unfingerprintable browser is a windmill quest. The goal is just to reduce things majorly to make fingerprinting no longer as lucrative and not worth it.

A higher level fingerprinting meta bug depending on this and others might be warranted if someone wants to organize the other known issues, by the way.
(In reply to comment #10)
> From all I saw, the plugins listing ranges a few dimensions, not just a small
> factor, higher than anything else, including the relative commonality that UA
> strings represent.

It's quite possible that we are getting towards a world where a sufficient Web experience requires only one plug-in (Flash Player) and even that one can be put behind a click-to-play trigger. (iOS is already selling pretty well with no plug-ins.)

Therefore, it's not futile to work on non-plug-in fingerprintability issues. Also, entropy in the HTTP requests enables fully server-side fingerprinting. Fingerprinting methods that require sending code to the client side expose evidence of the fingerprinting activity, meaning fingerprinters can be called out on their activity.
(In reply to comment #12)
> It's quite possible that we are getting towards a world where a sufficient Web
> experience requires only one plug-in (Flash Player)

"Getting towards"? I've had only the Flash plugin installed and no other plugins for years. The problem isn't the need for more than one plugin, it's having more than one plugin installed at all and plenty of people install way too much, frequently unknowingly.

It would be nice if all plugins were just click to play with the option to allow auto play only for some. If what's exposed through navigator.plugins was covered in this then that would pretty much fix most of the problem there.

> it's not futile to work on non-plug-in fingerprintability issues.

I agree, but the bigger issues here will be not plugin but extension related.

> Also, entropy in the HTTP requests enables fully server-side fingerprinting.

Good point.
(In reply to comment #13)
> I've had only the Flash plugin installed and no other plugins for years.

Then you are probably a singularity. Most people have a list of at least 4-5 plugins installed - but then, let's leave the discussion of details around plugins stuff to the other bug that is about those.
I just wanted to mention that navigator.plugins hands out a lot more fingerprinting information than any of our normal UA strings.
Besides Flash, common plug-ins include Java and Adobe Acrobat.  I also have Real Player, which is not unusual.  

The real source of UA litter comes from Gecko-specific add-ons.  If an add-on extends the UA string, it should also provide an option not to do so.  Mnenhy (at least older versions) used a preference variable to suppress the adding of its name and version to the UA string.  

Limiting the UA string to "Firefox" (per the Description and comment #4) would require non-Firefox browsers (e.g., SeaMonkey) to have a license to use that name even though the product is not Firefox.  Before that change is made, the Mozilla organization (foundation and corporation) should officially agree to such usage.
SeaMonkey should not claim to be Firefox. They're just going to have to live with an extra bit of fingerprintability there.

By the way, AMO is going to ban all non-critical UA modifications by all addons they host. This won't get those that aren't on AMO, but it's a start.

Oh and yes, I'm aware most people have more plugins than I do. Most people are on Windows which has all sorts of crap by default. The annoying thing is that *not* having a bunch of common junk plugins makes me highly fingerprintable. ;)
Maybe Gecko shouldn't provide the ability to append junk to the UA string in the first place?
(In reply to comment #17)
> Maybe Gecko shouldn't provide the ability to append junk to the UA string in
> the first place?

Yes, that's been suggested, at least for the HTTP request UA. IE9 will do this:
http://blogs.msdn.com/b/ie/archive/2010/03/23/introducing-ie9-s-user-agent-string.aspx

I don't know of a bug for that yet. It's a good idea that we should consider.

To summarize work here thus far, as the bugs are getting messy:
FIXED: bug 572668 & bug 572656
likely to be FIXED: bug 572652
hopefully to be FIXED: bug 566434
still up for debate: bug 572665 & bug 572661
probable WONTFIX: bug 572659
almost certain WONTFIX: bug 572667

Some of the discussions are trailing off in those bugs. Who owns things and can make an official WONTFIX decision in those that need it?
Depends on: 581008
Thanks for the list, that's a useful summary.

(In reply to comment #16)
> SeaMonkey should not claim to be Firefox. They're just going to have to live
> with an extra bit of fingerprintability there.

Unfortunately, many SeaMonkey users have to spoof their UA string to get certain sites working which are doing poor browser sniffing. Following bug 581008 (not on your list yet), removal of the general.useragent.extra.* will prevent such necessary spoofing. I agree with comment #1 that such mechanisms MUST NOT be removed until a way is found to prevent loss of functionality.
IMHO, any larger default UA change can only be reasonably done when we have a mechanism for site-specific spoofing (like e.g. Opera has) with a notification to the user, at least via page info, possibly more. As long as we don't have such a mechanism, any default UA change will break large sites on the web without us being able to even temporarily mitigate that.
While I can live with some changes (but not with others) proposed in these bugs, KaiRo exactly hits the nail on its head - browser sniffing *will* be broken, and users *will* blame it on the updated browser and not on the site
if it doesn't work. That's the one thing more than certain to happen.

As a reference, SeaMonkey bug 387416 suggests such a spoofer, which other browsers have already had for a considerable time (including Konqueror, in case you need a site on Linux which insists on using Safari). Driving these efforts forward before implementing more of these changes would be more prudent than modifying ancient string definitions first.
Depends on: 527886
(In reply to comment #21)
> While I can live with some changes (but not with others) proposed in these
> bugs, KaiRo exactly hits the nail on its head - browser sniffing *will* be
> broken, and users *will* blame it on the updated browser and not on the site
> if it doesn't work. That's the one thing more than certain to happen.

IE is changing their UA drastically, which will have the same effect for them. That's why this is such a unique opportunity; there isn't a competitive disadvantage if both browsers change UA at approximately the same time, and it will incent sites with broken sniffing to fix themselves.

Having simpler, shorter UA strings will also reduce the chance of future attempts at sniffing being broken.
(In reply to comment #22)
> IE is changing their UA drastically, which will have the same effect for them.

And I bet they will introduce a compatibility shim like Opera and Konqueror have had for a significant time and as I have suggested in bug 387416. I don't think any major browser can do a major UA string change without such a mechanism.
Mark, how do you support your claim that MSIE's string changes "drastically"? Here is the list of changes from the MSDN link in comment #18:

> IE9 will no longer send additions to the UA string made by other software
> installed on the machine such as .NET and many others.
> [...]
> There are four changes to IE8’s UA string that site developers need to be aware of:
> 
>  1. Application version is incremented from ‘Mozilla/4.0’ to ‘Mozilla/5.0’ to match other browsers [...]
>  2. Version token is incremented from ‘MSIE 8.0’ to ‘MSIE 9.0’.
>  3. Trident token is incremented from ‘Trident/4.0’ to ‘Trident/5.0’. 
> [...]
> Similar to IE8, IE9’s Compatibility View will map to IE7 Standards Mode,

These changes are much less substantial than what's proposed for the Gecko-2.0 user-agent string, mainly a change in numbers and omission of the additional tokens also proposed in bug 581008. Scripts should still work with an MSIE9 UA.
(In reply to comment #24)
> These changes are much less substantial than what's proposed for the Gecko-2.0
> user-agent string, mainly a change in numbers and omission of the additional
> tokens also proposed in bug 581008. Scripts should still work with an MSIE9 UA.

The variability of Firefox's UA string could be reduced without breaking existing sniffers:
 * Freezing the Gecko date to today's date wouldn't break sniffers any more than building Gecko today would.
 * Omitting the Gecko patch level wouldn't break sniffers any more than the first release from a given relbranch breaks sniffers.
 * Making the UA string of nightlies and un-branded relbranch builds say "Firefox" wouldn't break sniffers any more than releases builds do.
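The first two bullets can be sketched as a string rewrite. This is only an illustration of the idea, not how Gecko builds its UA string: the sample UA, the frozen date value, and the function name are assumptions chosen for the example.

```python
import re

# Hypothetical frozen value; the comment above only says "today's date".
FROZEN_GECKO_DATE = "20100101"

# Illustrative Firefox 3.6-era UA string, used only as sample input.
SAMPLE_UA = ("Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.8) "
             "Gecko/20100722 Firefox/3.6.8")


def reduce_ua_variability(ua: str) -> str:
    """Freeze the Gecko build date and drop the Gecko patch level.

    The token order and names stay the same, so a sniffer that parses
    "rv:" and "Gecko/<date>" still finds both tokens where it expects them.
    """
    # Replace the build date (8 or more digits) with one frozen value.
    ua = re.sub(r"Gecko/\d{8,}", f"Gecko/{FROZEN_GECKO_DATE}", ua)
    # Keep three version components after "rv:" and drop anything beyond.
    ua = re.sub(r"rv:(\d+\.\d+\.\d+)(?:\.\d+)+", r"rv:\1", ua)
    return ua


print(reduce_ua_variability(SAMPLE_UA))
```

After the rewrite, every build from the same release branch would send the same `rv:` and `Gecko/` tokens, which is what the first release from that branch already looks like to a sniffer.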
(In reply to comment #25)
>  * Freezing the Gecko date to today's date wouldn't break sniffers any more
> than building Gecko today would.

Either remove it or keep it. I don't think freezing to an artificial date is a good idea.
This discussion is in bug 572661 - both removing it and especially moving the Gecko/rv: specifier in its place are alterations which *would* break sites.
(In reply to comment #24)
> Mark, how do you support your claim that MSIE's string changes "drastically"?

I don't think that anything I suggested in bug 577994 is more drastic than what MS has done.
/sigh/ yes, it is. Removing or swapping n tokens in the UA string will break m sites expecting that token to appear in a specific context, and that site will either not behave correctly or refuse to talk to you at all. The user says
"damn new browser version, let's try something else", it's as simple as that.
As also pointed out, MSIE has had a compatibility UA setting since 7.0 at least; modifications from 8.0 to 9.0 affect numbers, not tokens or their location.

This is going in cycles, arguments are not getting any more or less valid by constantly repeating them.
Introduce a compat mode where Firefox sends the UA in the old format, and send the new format by default? (A bit like what IE did for IE8, except that affects rendering and this affects JavaScript.)
Depends on: 581783
(In reply to comment #30)
> Introduce a compat mode where Firefox sends the UA in old format and send the
> new format by default? (bit like what IE did for IE8, except that affects
> rendering and this affects javascript)

The problem is that the most interesting changes like sending Gecko/$version instead of rv:$version and Gecko/$builddate likely affect numerous sniffers including whole libraries used by numerous sites, so this isn't a matter of just whitelisting a few domains to get the old format.

Many less controversial changes like those already made won't need a compat mode anyway.
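To illustrate why such a format change is risky, here is a sketch of a position-dependent sniffer next to a more tolerant one. Both functions and the "proposed" UA string are hypothetical; they are not taken from any real sniffing library or from a final UA format.

```python
import re
from typing import Optional

# Illustrative old-format UA string.
OLD_UA = ("Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.8) "
          "Gecko/20100722 Firefox/3.6.8")
# Hypothetical new format: Gecko/$version, no rv: token, no build date.
PROPOSED_UA = "Mozilla/5.0 (Windows NT 6.1) Gecko/2.0 Firefox/4.0"


def brittle_gecko_version(ua: str) -> Optional[str]:
    """Expect 'rv:X) Gecko/YYYYMMDD' exactly as in the old format."""
    match = re.search(r"rv:(\d+(?:\.\d+)*)\) Gecko/\d{8}", ua)
    return match.group(1) if match else None


def tolerant_firefox_major(ua: str) -> Optional[int]:
    """Only look for the product token, wherever it appears."""
    match = re.search(r"Firefox/(\d+)", ua)
    return int(match.group(1)) if match else None


for ua in (OLD_UA, PROPOSED_UA):
    print(brittle_gecko_version(ua), tolerant_firefox_major(ua))
```

The brittle sniffer returns nothing for the proposed string, and a site built on it could then treat the browser as unsupported. If code like this lives in a shared library, whitelisting individual domains does not reach all the sites that embed it.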
(In reply to comment #31)
> The problem is that the most interesting changes like sending Gecko/$version
> instead of rv:$version and Gecko/$builddate likely affect numerous sniffers
> including whole libraries used by numerous sites, so this isn't a matter of
> just whitelisting a few domains to get the old format.

A good, maintained-with-crowdsourcing and updated-from-server (similar to safebrowsing) list for targeted spoofing could very well work, esp. if there is an add-on (or even builtin functionality) that allows interested users to easily take part in that crowdsourcing effort. I've been proposing that for a long time, currently it's blocked by core code not allowing a per-site override of navigator.userAgent, but it's possible and it can help making the (UA and sniffing) world better - it just needs people to actually work on it instead of arguing about how we are stuck in UA-getting-worse world.
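As a sketch of what the lookup side of such a list could look like: the table, the UA strings, and the function name below are all hypothetical, and fetching or updating the list from a server is left out entirely.

```python
from urllib.parse import urlsplit

# Placeholder default UA for the browser doing the spoofing.
DEFAULT_UA = "ExampleBrowser/2.1"

# Hypothetical crowdsourced table: site suffix -> UA string to send there.
OVERRIDES = {
    "example.com": "Mozilla/5.0 (placeholder) Firefox/3.6.8",
}


def ua_for(url: str, overrides: dict[str, str], default: str) -> str:
    """Pick the UA for a URL, matching the host or any parent domain.

    Bare top-level labels such as "com" are never tried, so one entry
    cannot accidentally apply to a whole TLD.
    """
    host = urlsplit(url).hostname
    if not host:
        return default
    labels = host.split(".")
    for start in range(max(1, len(labels) - 1)):
        candidate = ".".join(labels[start:])
        if candidate in overrides:
            return overrides[candidate]
    return default


print(ua_for("http://mail.example.com/inbox", OVERRIDES, DEFAULT_UA))
print(ua_for("http://example.org/", OVERRIDES, DEFAULT_UA))
```

This only covers the HTTP header; as noted above, a per-site override of navigator.userAgent would need support in core code.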
Here is a suggestion that may please everyone.

What if private browsing were the special case? Maybe send a very, very minimal UA string when in private browsing mode and a normal one otherwise? This should keep the people who think their privacy is at stake happy and let everyone else go on without breaking websites.

So how about making something like that an option when using private browsing?
(In reply to comment #33)
> What if private browsing were the special case? Maybe send a very, very minimal
> UA string when in private browsing mode and a normal one otherwise? This should
> keep the people who think their privacy is at stake happy and let everyone else
> go on without breaking websites.

There may be a few cases where this would make sense, but in general we don't want to break sites in private browsing, so compatibility concerns do apply. Also note that the desire to get rid of cruft like the gecko build date has almost nothing to do with privacy.
(In reply to comment #34)
> There may be a few cases where this would make sense, but in general we don't
> want to break sites in private browsing, so compatibility concerns do apply.
> Also note that the desire to get rid of cruft like the gecko build date has
> almost nothing to do with privacy.

People using private browsing are the ones most likely to be the ones that are concerned about themselves being tracked, does it not therefore make sense to minimalize the impact of such changes to those who want them?

I am aware that the removal of build dates and so on is also about reducing the information sent and so on, but those things are useful in a lot of cases and removing them is likely to cause more breakage.

Either way, I think some decisions need to be made here so everyone knows in which direction this is going. The sooner it is decided the more time it will give for sites to update their code to the upcoming changes - adding this at the last minute will cause breakages.
(In reply to comment #35)
> People using private browsing are the ones most likely to be the ones that are
> concerned about themselves being tracked, does it not therefore make sense to
> minimalize the impact of such changes to those who want them?

Private browsing does not prevent tracking from websites in any way.
We also need private browsing to actually work. Nothing is won if those who want to can't use private browsing because it breaks their favorite sites.
Right. And also, I don't think it's positive for private browsing if sites can detect that you use it by seeing you having a different UA string. ;-)
(In reply to comment #31)
> The problem is that the most interesting changes like sending Gecko/$version
> instead of rv:$version and Gecko/$builddate likely affect numerous sniffers
> including whole libraries used by numerous sites, so this isn't a matter of
> just whitelisting a few domains to get the old format.

The solution to that (as I argued earlier in the newsgroup) is making breaking changes in Gecko 2 betas and having the MoCo evangelism team evangelize the heck out of the world.  Tell everyone it's going to break, start breaking it, and tell them again it's been broken and they have to fix their sites/libraries or they break their site in Firefox 4.

Nothing's going to work unless changes get heavily evangelized with lead time, and if that doesn't happen, then all we've done is wasted a bunch of time and electrons with no benefits; users still lose, the web still loses.
sounds like we need a drumbeat effort.
If we went through with this, it would really be a great opportunity to try to standardize the user-agent and recommend a standard way to sniff it that shouldn't break in the future.
Yeah, and if all the UA changes were in FF4, that would be smarter, instead of across multiple releases.  If such efforts would be worthwhile, then maybe it might make sense to setup a bugzilla just for website evangelism/UA sniffing bugs if only to keep the efforts organized and/or separate from real bugs.  It would work by allowing users to report broken websites, but it would publish all reports to this other bugzilla.. that way we all benefit.
Depends on: 414057
Depends on: 582421
I think we're going to want to update this page now:
https://developer.mozilla.org/En/Browser_Detection_and_Cross_Browser_Support
(In reply to comment #43)
> I think we're going to want to update this page now:
> https://developer.mozilla.org/En/Browser_Detection_and_Cross_Browser_Support

I believe there's an existing bug for that somewhere; I've had some discussions in the past with MoCo people about how terribly outdated it is.
Wouldn't this be a terrific opportunity to standardize the UA strings? One common and useful way, with no redundant information.
With every big browser rapidly developing new versions lately and cooperation between all browsers very big, this could be a joint change. That way no browser will be left for another when a site doesn't work because of bad UA detection.
Depends on: 586165
Depends on: 584683
Depends on: 591537
Depends on: 591573
I second jerone's comment; we should take this opportunity to propose something that could be adopted by all current and future UA's. With the push for standards adoption lately, this could be a "now or never" moment.
Depends on: 630357
WebKit took on a similar effort - https://bugs.webkit.org/show_bug.cgi?id=54556
Firefox 4.0b12:

HTTP_ACCEPT Headers:
	
bits of identifying information: 8.65

one in x browsers have this value: 400.77

value: text/html, */* ISO-8859-1,utf-8;q=0.7,*;q=0.7 gzip, deflate en-us,en;q=0.5

--------------------

Chrome 9.0:

HTTP_ACCEPT Headers:
	
bits of identifying information: 4.05

one in x browsers have this value: 16.56

value: text/html, */* ISO-8859-1,utf-8;q=0.7,*;q=0.3 gzip,deflate,sdch en-US,en;q=0.8

----

Can someone explain in layman's terms why FF 4.0 needs (or has) a different value for HTTP_ACCEPT Headers?

Thanks.
Firefox 4 doesn't send the same accept headers as Firefox 3.6. See bug 576033, for instance. And since Firefox 4 isn't released yet, it's less widespread.
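For readers comparing the two Panopticlick figures quoted above: the "bits of identifying information" value appears to be the base-2 logarithm of the "one in x browsers" value, so a rarer header combination scores more bits. A minimal check of that relationship, using only the numbers quoted in the comment:

```python
import math


def bits_of_information(one_in_x: float) -> float:
    """Self-information in bits of a value shared by one in `one_in_x` browsers."""
    return math.log2(one_in_x)


# Figures quoted in the comment above.
for browser, one_in_x in [("Firefox 4.0b12", 400.77), ("Chrome 9.0", 16.56)]:
    print(f"{browser}: {bits_of_information(one_in_x):.2f} bits")
# Firefox 4.0b12: 8.65 bits
# Chrome 9.0: 4.05 bits
```

This is consistent with the explanation given: a pre-release build with new Accept headers is shared by few visitors, so it looks rare regardless of how short the headers are.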
Just thought to cc. regarding Bug 566423 - Consider standardizing/normalizing navigator.plugins (browser fingerprinting) 

The paper said:

"The obvious solution to this problem would be to make the version numbers
less precise. Why report Java 1.6.0 17 rather than just Java 1.6, or DivX Web
Player 1.4.0.233 rather than just DivX Web Player 1.4?"

Comment by Steffen Wilberg 2010-08-04 09:02:05 PDT 

"The plugins check will be incorporated into the Addons manager, see
https://wiki.mozilla.org/Firefox/Projects/Extension_Manager_Redesign/design#Plugin_Updates.
Once that happens, we don't need to expose full plugin version information
to web content just because of the plugins check."

Hopefully this gets pushed into Firefox 4.0
(In reply to comment #50)
> Hopefully this gets pushed into Firefox 4.0

I think you mean 5.0. Firefox 4.0 is essentially done, save for any major bugs they find during the RC cycle.
(In reply to comment #51)
> (In reply to comment #50)
> > Hopefully this gets pushed into Firefox 4.0
> 
> I think you mean 5.0. Firefox 4.0 is essentially done, save for any major bugs
> they find during the RC cycle.

Update on:  Bug 566423 - Consider standardizing/normalizing
navigator.plugins (browser fingerprinting) 

now depends upon

Bug 613305 - Integrate Plugin Check with installed plugins in the add-ons manager 

and 

Bug 602795 - Add-ons manager should find and install updates for plugins 

Then we won't need to we don't need need to expose full plugin version information to web content just because of the plugins check.

But from what I understand, Firefox 5.0 is also done :(
blah! I need to proof read before submitting:

*Then we won't need to expose full plugin version information to web content just because of the plugins check.*
Depends on: 643352
Depends on: 669814
Depends on: 697383
It appears that bug #718066 -- in its use of UUID -- conflicts with the philosophy of this bug.
Making explicit that both can be implemented independently. Sending a UUID just to Mozilla can be done while reducing general entropy (now, sending such a UUID to any website would be really worrying). Philosophically, you're completely right, and I think there are good reasons listed in bug 718066, but whether such UUIDs should be added is to be discussed there.
Depends on: 728582
Depends on: 728585
Depends on: 728831
Depends on: 728888
Depends on: 728894
Depends on: 728952
Depends on: 588909
Depends on: 729089
Depends on: 736373
Depends on: 757726
Depends on: 765048
Depends on: 588913
Depends on: 793978
Depends on: 799899
With Firefox 17 now released and bug 588909 out there, all of the main goals in the realm of the UA string and HTTP request have been more or less achieved. There's still some stuff to clean up like the "rv:" token, and probably always will be some lingering minor annoyances like the "Mozilla/5.0" nonsense (which is at least partially sensical for Mozilla), but things are much better than they used to be.

Here's an updated list of the different browsers' HTTP requests to compare against attachment 458163 from a couple years ago. This is all five major browsers' latest stable versions to the best that it can be done on Windows 7. Latest Safari no longer supports Windows so I've tested with the last Windows version, which is one major update back. I did not test Internet Explorer 10, which is just available as a preview on Windows 7 at the moment.
Attachment #458163 - Attachment description: data from current versions of 5 main browsers (Firefox, Safari, Chrome, Opera, IE) → data from 2010 versions of 5 main browsers (Firefox, Safari, Chrome, Opera, IE)
Here's the list of changes over the past couple years. Changes listed with a "*" bring all 5 browsers into the same de facto standard. All vendors have made some improvements and there is a lot more cross-browser consistency now.

Changes done by Firefox, Safari, Chrome, & Opera:
* "U;" dropped
* locale dropped from UA

Changes done by Firefox, Safari, & Chrome:
* redundant "Windows;" dropped

Changes done by Firefox & Internet Explorer:
* "Accept-Language" dialect code capitalized

Changes done by Firefox & Opera:
"Accept-Charset" line dropped (Chrome is the only one who still uses it)

Changes done by just Firefox:
minor versioning dropped (e.g. x.y instead of x.y.z)
* build date dropped
* "Keep-Alive" line dropped (everyone just lists "Connection: keep-alive", though Opera and IE capitalize it differently)

Changes done by Safari, Chrome, & Internet Explorer:
the "Accept" line has been simplified; Safari and Chrome now match Firefox exactly and IE is now much shorter (Opera is the only one who still lists image types here)

Changes done by just Opera:
"Accept-Encoding" line dropped the "x-gzip, identity, *;q=0" part; everyone now lists "gzip, deflate" with the exception of Chrome which also has "sdch" (and no spaces after the commas)
Using the panopticlick.eff.org test with the current 17.0 release build, I still see all my plugins and fonts listed, which seems to be the most unique data.

Isn't that a good place to start?
plugins is addressed in bug 757726
(In reply to bugzilla from comment #60)
> Using the panopticlick.eff.org test with the current 17.0 release build, I
> still see all my plugins and fonts listed

Fonts are the fault of Flash and Java. To help with that, enable click-to-play for all plugins. (set plugins.click_to_play to true in about:config) At this point you should also just disable the Java plugin entirely if you have it installed, as nobody should be using it on any websites anymore and it's more trouble than it's worth.

This meta-bug specifically covers HTTP requests, including the UA string. With respect to fingerprinting, this is for entropy for server-side fingerprinting, which is a more important issue than client-side fingerprinting as server-side fingerprinting can be done more easily and without being detectable. Plugins are the primary source of client-side entropy and work on that is in other bugs (though there's only so much that can be done for that).
To clarify, though, most of the different changes here have reduced the size and complexity of the header and increased cross-browser consistency. The changes that reduced the header entropy were the banning of addons mucking with the UA and the dropping of minor versioning (including the build date) and locale in the UA.
Depends on: 817450
Could perhaps a Terse mode be added?
Currently I have set: general.useragent.override = Firefox/17.0
With Chrome I did the same with the command line --user-agent="Chrome/23.0"
With Opera this is a lot easier as it has actual settings you can reach.
IE9 is a real pain as registry editing is needed and it still leaves Trident/5.0

My goal is to have the smallest possible user agent string but still allow important web browser stats to be collected.

Currently my browsers pass the following user agent string to web servers:
Firefox/17.0
Opera/9.28
Chrome/23.0
Internet Explorer/9.0 ( ; ; ; Trident/5.0)

IE being the bad boy here, but otherwise this is the minimal amount that stats sites need to track if someone is using Firefox 17.0 or 17.1 or 18.0 etc.
All HTML5 gen browsers have made the underlying OS and even the html rendering engine info redundant.

Chrome is by far the easiest to change when a new browser release/version is installed.
Ideally this would be automatic and I could just toggle a Terse(Privacy)/Normal/Verbose(Debug) option.

Normal should be as it is today. But if someone chooses "Terse" (or High privacy elsewhere in settings) then the minimalistic example I showed should be used.
namely: Client Name/version.revision
Verbose is intended for debugging, either by webauthors or browser devs (in house automated build testing etc.)

If the browser(s) has a incognito mode then the user agent string should not be sent at all.
(In reply to Roger Hågensen from comment #64)
I think you may be misunderstanding the goal here. You or any small group of people (aka not everyone) changing your UA to something minimalistic does not reduce your ability to be tracked; it increases it significantly. You'll be the only ones a site sees with the reduced UA making you perfectly trackable. If you want to reduce the ability to be tracked via the UA, you need to make it the same as the largest group of people possible. Your best bet is to use the UA for latest stable Firefox on Windows 7 as it will have the largest single population. Spoofing to latest stable IE on Windows 7 might sound like a good idea, but the rest of the HTTP header won't be spoofed to look like IE so it'll actually increase your fingerprintability as well.

We've already gotten rid of the vast majority of the problems in the UA, so it's pretty much as good as it can get on this front.

With respect to an incognito mode (called private browsing in Firefox), no, you don't want to hide the UA when it's on. Not only would it break things but it would essentially be announcing to the site that you're in private browsing mode.
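The argument about blending in can be sketched by counting how many visitors share each UA string. The visitor population below is invented for illustration, and the UA strings are examples only.

```python
from collections import Counter

# Illustrative UA strings.
COMMON_UA = "Mozilla/5.0 (Windows NT 6.1; rv:17.0) Gecko/20100101 Firefox/17.0"
MINIMAL_UA = "Firefox/17.0"

# Invented population: 97 visitors with the common UA, 3 with the minimal one.
visitors = [COMMON_UA] * 97 + [MINIMAL_UA] * 3


def anonymity_set_size(ua: str, observed: list[str]) -> int:
    """Return how many observed visitors sent exactly this UA string."""
    return Counter(observed)[ua]


print("common UA shared by:", anonymity_set_size(COMMON_UA, visitors))
print("minimal UA shared by:", anonymity_set_size(MINIMAL_UA, visitors))
```

The shorter string carries less data, yet it puts its sender in a group of 3 rather than 97, which is why a minimal UA used by only a few people makes them easier to single out.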
(In reply to Dave Garrett from comment #65)
> (In reply to Roger Hågensen from comment #64)
> I think you may be misunderstanding the goal here. You or any small group of
> people (aka not everyone) changing your UA to something minimalistic does
> not reduce your ability to be tracked; it increases it significantly. You'll
> be the only ones a site sees with the reduced UA making you perfectly trackable.

Actually, you seem to not understand what I wrote. The tracking was in context of no user agent in the http header at all. (aka, a "" blank browser).


> If you want to reduce the ability to be tracked via the UA, you
> need to make it the same as the largest group of people possible. Your best
> bet is to use the UA for latest stable Firefox on Windows 7 as it will have
> the largest single population. Spoofing to latest stable IE on Windows 7
> might sound like a good idea, but the rest of the HTTP header won't be
> spoofed to look like IE so it'll actually increase your fingerprintability
> as well.

Again you are misunderstanding (did you read any of it at all?)
Something like an option to show only: Firefox/17.0
Would be the minimum possible while still making it possible to track browser trends. (for all those stats trackers out there)

Another benefit I did not state, but which was stated by others previously in this thread is that it reduces the footprint (not fingerprint) of the user agent in the http header.
Even if the string is shorter now it's still longer than some of the cookies the browser sends in the http header.

> We've already gotten rid of the vast majority of the problems in the UA, so
> it's pretty much as good as it can get on this front.

Are you sure? You can't get more optimized than just showing Firefox/17.0

> With respect to an incognito mode (called private browsing in Firefox), no,
> you don't want to hide the UA when it's on. Not only would it break things
> but it would essentially be announcing to the site that you're in private
> browsing mode.

So now you are deciding how I should browse the net?

I don't understand your negatory stance on this.
A simple dropdown button with 4 choices:
[User Agent ID]
None
Terse
Full
Debug

And with Full being the default.
The dropdown button could easily be put in the new "Tracking" GUI group in the Privacy tab in the options window.
Alternatively tuck it away under the advanced tab somewhere.
Or even a "hidden" option in about:config with values of 0,1,2,3 with default value being 2.

Try it and see how users adopt it (and what they choose); if it bombs, then remove the option in Firefox 19, for example (assuming it is added for Firefox 18).

I've only run into a single site reacting to Firefox/17.0
Yahoo mail seems to choke and tells me (ironically) to get the latest version of firefox or latest of other browsers heh...

This just shows that Yahoo is not following the HTML(5)/CSS/JS "standard" of testing for features instead.

Allowing users to change the verbosity of the user agent id, or even turn it off fully will encourage web developers to develop for features instead.

> With respect to an incognito mode (called private browsing in Firefox), no,
> you don't want to hide the UA when it's on. Not only would it break things
> but it would essentially be announcing to the site that you're in private
> browsing mode.
Just as using a proxy does the same, proxies do have the ability to modify or strip the user agent from the http header. There are plugins that do the same. Or even Antivirus/Firewalls that do this.

And again, the incognito was just a sideremark. I simply want the UA string as short as possible. The name of the browser plus the version: Firefox/x.x
This is the shortest it can be while still adhering to the RFC standard on the user agent string for HTTP headers.

Compared with the current string, the terse one saves 60 bytes.
I don't know how many times I load pages/sites or how many requests my browser makes on average per day. But considering the string is sent each time you load a page, and for each CSS file, JS file, and image on that page, those 60 bytes will actually stack up.

As a web developer who runs a site I'd rather have at least the "option" of reducing my virtual carbon footprint as much as possible.

And as others have pointed out in this thread, an HTTP request to a server will more often (than currently) squeeze into a single TCP packet rather than two. (Cookies are still the beast here, but those are the website's fault and not the browser's, at least.)

****************
GET / HTTP/1.1
Host: www.example.com
User-Agent: Firefox/17.0
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: en-US,en;q=0.5
Accept-Encoding: gzip, deflate
Connection: keep-alive
Referer: http://www.example.com/
************************

Isn't that nice and pretty? The 60 bytes saved could allow many cookies to sneak under the single TCP packet limit. Any requests that are "just a few bytes over" would go from 2 packets to 1 packet per request, that's a saving of up to 50% (dependent on http header compression obviously)
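
To put rough numbers on the packet argument, here is a minimal Python sketch. It rebuilds the example request above with a full and a terse User-Agent and counts how many TCP segments each would need. The full UA string (a typical Firefox 17 string on 64-bit Windows 7), the 1460-byte MSS, and the helper names are assumptions for illustration, not measurements.

```python
# Sketch only: FULL_UA, MSS_BYTES and the helper names are assumptions for
# illustration; real segment sizes depend on the network path.

FULL_UA = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0"
TERSE_UA = "Firefox/17.0"
MSS_BYTES = 1460  # assumed payload of one TCP segment on plain Ethernet

# The example request from the comment, with the UA left as a placeholder.
REQUEST_LINES = [
    "GET / HTTP/1.1",
    "Host: www.example.com",
    "User-Agent: {ua}",
    "Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language: en-US,en;q=0.5",
    "Accept-Encoding: gzip, deflate",
    "Connection: keep-alive",
    "Referer: http://www.example.com/",
]


def request_size(ua, cookie=""):
    """Return the size in bytes of the request on the wire (CRLF line endings)."""
    lines = [line.format(ua=ua) for line in REQUEST_LINES]
    if cookie:
        lines.append("Cookie: " + cookie)
    raw = "\r\n".join(lines) + "\r\n\r\n"
    return len(raw.encode("ascii"))


def packets_needed(size, mss=MSS_BYTES):
    """Ceiling division: number of segments needed to carry `size` bytes."""
    return -(-size // mss)


if __name__ == "__main__":
    saved = request_size(FULL_UA) - request_size(TERSE_UA)
    print("bytes saved per request:", saved)
    print("packets (full UA, no cookie):", packets_needed(request_size(FULL_UA)))
```

The saving only changes the packet count for requests that sit within 60 bytes above a segment boundary, for example one carrying a large cookie; everything else costs the same number of packets either way.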

The user agent has only statistical interest to me.
If you ignore, for the sake of argument, that UA strings can be spoofed,
then the only important part of the UA is the brand and version.

The brand so that browser market share can be (gu)esstimated, the version so that version migration can be seen. (like with Chrome where a version seems to live just for 3 months until it dies out, Chrome users are thus rarely more than 3 months out of date on latest version)

The rest like OS and CPU is actually misleading as a virtual machine may be running Firefox in a Linux VM on a Windows 7 machine.
The site/stats tracker then thinks the OS of the user is Linux (while it's actually Windows) and unless the browser adds (VM Windows 7) or similar then that stat is worthless.

What is not worthless is the browser brand and version, after all that is actually what is being used to surf the site.

Ignoring spoofed UA's obviously, but spoofing is done for particular reasons (i.e. compatibility with anal choices *cough* yahoo mail UA string dependency *cough*). No browsers actually ship with a spoofed UA. (that would hand "market" to the competition after all)

The fact that "Mozilla/5.0" is still in the UA and that "(compatible;)" is also still used is just wasteful.

Brand and version for the Terse option; Full is what it is today; and Debug adds extra info (like extensions, 3rd-party appended UA stuff, etc.; all that old mess).
In the future maybe the default could be set to Terse instead of full.

My suggested idea could resolve this "bug", give more choice to users (and devs) and add more tools for devs (Debug UA), help reduce virtual footprint (not to be confused with browser fingerprinting)

The UA is per the RFC only ever intended for statistics or to handle agents known to have serious standards issues/bugs (on a HTTP level rather than HTML level).

But this is turning into a ranty essay so I'd better stop typing.
(In reply to Roger Hågensen from comment #66)
Wow; wall of text. I'm just not going to attempt to reply in full detail. None of that belongs in this meta-bug anyway.

Your confusion as to why I assumed that because you wanted the "smallest possible user agent string but still allow important web browser stats to be collected" meant you wanted to drop the ability to track the rest of it (namely the OS/platform) is because that's generally what these bugs here focused on. If you're going off on a tangent focusing just on raw header size, not entropy or bad sniffing surface, that's not what the vast majority of this has been about.

The simple truth is that bug 588909 proved that even minor housekeeping changes to the UA can "Break The Web" because so many sites use old bits of code that are never updated. Those 60 bytes are miles away from being worth that mess on a grand scale. Adding any option to make it slightly easier to do what you're already doing would be wasted effort and there's really no point.

If what you want is smaller headers, what you want is header compression in SPDY or better yet HTTP 2.0. My hope is that maybe the whole UA string could be axed for HTTP 2.0 and have a separate field for each bit of information, though who knows if that will even be tackled (though someone filed a bug to that effect somewhere).

Please, this is not a productive discussion and it doesn't really need to be here. Your suggestion belongs in its own bug anyway.
(In reply to Dave Garrett from comment #67)

Sorry about that WOT.

> Please, this is not a productive discussion and it doesn't really need to be
> here. Your suggestion belongs in its own bug anyway.

Oh? What about point 2 in the original post then? As far as I can see it is relevant. The last size reduction of the user agent shaved off what, 5-10 bytes?

Also, if I file a new report somebody will flag it as duplicate of this one due to point 1, 2 and 3 in the original post/report here right?

So I'll modify and simplify my suggestion to this then:
Add general.useragent.stringmode to about:config
supported values are 0,1,2
0=no user agent string
1=Default (how it currently is)
2=Terse (like Firefox/17.0 and nothing else than that)
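
A minimal sketch of how the three modes above might map onto the header value, written in Python for brevity. The pref general.useragent.stringmode is only a proposal in this comment and does not exist in Firefox; the function name and the full UA string used here are assumptions for illustration.

```python
# Hypothetical sketch of the proposed general.useragent.stringmode pref.
# Nothing here is real Firefox code; FULL_UA is an assumed Firefox 17 string.

FULL_UA = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0"


def user_agent_for_mode(mode, full_ua=FULL_UA):
    """Map the proposed pref value to a User-Agent value (None = send no header)."""
    if mode == 0:
        return None  # 0 = no user agent string
    if mode == 1:
        return full_ua  # 1 = default (how it currently is)
    if mode == 2:
        # 2 = terse: keep only the last product token, e.g. "Firefox/17.0"
        return full_ua.rsplit(" ", 1)[-1]
    raise ValueError("unsupported stringmode: %r" % (mode,))


def apply_user_agent(headers, mode):
    """Return a copy of `headers` with User-Agent set or removed per `mode`."""
    result = dict(headers)
    ua = user_agent_for_mode(mode)
    if ua is None:
        result.pop("User-Agent", None)
    else:
        result["User-Agent"] = ua
    return result


if __name__ == "__main__":
    for mode in (0, 1, 2):
        print(mode, apply_user_agent({"Host": "www.example.com"}, mode))
```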

My suggestion also fits with point 1 and 3 now in the original post as any OS fingerprinting/attack will be more difficult (only browser version fingerprinting). Maybe later a mode 3 can be added that is only the browser brand

This will allow advanced users like me and the original poster to choose themselves. (Ideally Mozilla should copy what Opera does, allowing per site user agent tweaks etc.)

No UI changes needed, just a few lines of code in the appropriate place. (where version is fetched for the UA string before sending for example)

Give me a good reason why the suggestion/idea in this post is bad (the time/lines of code needed to implement this does not qualify as a reason against the idea obviously, if that was the case then Firefox would not exist at all.)

Please note that this post's suggestion retains the wishes of the original poster while also retaining your wishes to not mess with the normal/default, it is not fully optional, and no UI changes needed.
(In reply to Roger Hågensen from comment #68)

> it is not fully optional, and no UI changes needed.

it is fully optional, and no UI changes needed.
I think you've misunderstood what fingerprinting is. Fingerprinting is enabled when the user's information is more unique. If a terse version was behind a pref, so few people would use it, it could be used to uniquely identify u, called finger printing. 
This bug is about reducing entropy (variation) to reduce finger printing, so all changes should reduce variations for all users, not create new ones. 
Your short version is a good idea, but it can't be used globally because of the vast number of sites that will fail to work (statistics were collected by the mobile team before, and it was not pretty).
(In reply to Roger Hågensen from comment #68)
> (In reply to Dave Garrett from comment #67)
 
> > Please, this is not a productive discussion and it doesn't really need to be
> > here. Your suggestion belongs in its own bug anyway.
> 
> Oh? What about point 2 in the original post then? As far as I can see it is
> relevant. The last size reduction of the user agent shaved off what, 5-10
> bytes?

This was reported in June 2010. Meanwhile we have seen here that it's very hard to do without breaking the web. Size reduction would be a very reasonable side effect but hardly worth the trouble.

> Also, if I file a new report somebody will flag it as duplicate of this one
> due to point 1, 2 and 3 in the original post/report here right?

No, this is a meta bug to track all of its dependent bugs. So your bug can't be a dup of this but might have a dependency.

> So I'll modify and simplify my suggestion to this then:
> Add general.useragent.stringmode to about:config
> supported values are 0,1,2
> 0=no user agent string
> 1=Default (how it currently is)
> 2=Terse (like Firefox/17.0 and nothing else than that)

...

> Give me a good reason why the suggestion/idea in this post is bad

File a new bug, make it concise, read
https://bugzilla.mozilla.org/page.cgi?id=etiquette.html
and everything is fine!
(In reply to Roger Hågensen from comment #68)
> My suggestion also fits with point 1 and 3 now in the original post as any
> OS fingerprinting/attack will be more difficult (only browser version
> fingerprinting).

Please re-read comment 65. As I already tried to explain, that's not how things work. Removing the OS/platform just for you or a small group of people drastically increases fingerprinting, not reduces.

This entire project's discussion has been hindered on more than one occasion because some people simply don't understand what entropy is. Wikipedia isn't really concise on this topic, so I'll try to explain it quickly here. Entropy is disorder. It's a property of thermodynamics and applied to computer science entropy is disorder in data, or more aptly described as uniqueness here. The more entropy/disorder/uniqueness in the data in the HTTP header, the more fingerprintable it is because it is more possible to look at the header and see that there are fewer people with that header, and thus more certainty that when you see it again it's the same person or small group of people.

Without trying to explain everything, the bottom line is that the only way to reduce entropy is to do it for everyone at the same time. It cannot be an option, because using that option itself introduces entropy. The most useful way to put it is that the only way to hide here is in a crowd.
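
The "hide in a crowd" point can be made concrete with a small Python sketch of the information-theoretic measure usually meant here: a header value shared by a fraction p of visitors reveals -log2(p) bits about who you are. The population shares in the example are made up for illustration and the function names are not from any real tool.

```python
import math


def surprisal_bits(share):
    """Bits of identifying information revealed by a value that a fraction
    `share` of all visitors send (0 < share <= 1)."""
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    return -math.log2(share)


def shannon_entropy_bits(shares):
    """Average bits revealed by a header whose values occur with these shares."""
    return sum(p * surprisal_bits(p) for p in shares if p > 0)


if __name__ == "__main__":
    # Made-up shares: the default UA is sent by 1 visitor in 4, while an
    # opt-in terse UA is sent by 1 visitor in a million.
    print("default UA:", surprisal_bits(1 / 4), "bits")
    print("opt-in terse UA:", round(surprisal_bits(1 / 1_000_000), 1), "bits")
```

Under these assumed shares the shorter string is the more identifying one, because so few people send it; shortening it for everyone at once would not have that effect.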

This topic sounds esoteric but it is quantifiable math that pops up in computer science from time to time. Another area where entropy is a big deal is encryption and also passwords.

This is a meta bug; frankly, no discussion belongs here, but this is often a lost cause. If you're going to continue to suggest things on this topic, of any kind, do so in a new bug. We don't need to keep emailing over 100 people with this discussion.
(In reply to henry.fai.hang.chan from comment #70)

I was going to respond to your patronizing reply, but as soon as I saw you use "u" instead of "you" I decided that it's not even worth the effort beyond this sentence!
(In reply to Dave Garrett from comment #72)
> (In reply to Roger Hågensen from comment #68)
> > My suggestion also fits with point 1 and 3 now in the original post as any
> > OS fingerprinting/attack will be more difficult (only browser version
> > fingerprinting).
> 
...
> This entire project's discussion has been hindered on more than one occasion
> because some people simply don't understand what entropy is.

Please stop being patronizing. I know damn well what entropy is; I've even written a random number generator, which involved gathering entropy from the underlying OS.

I did not go by the bug "title" (as those can be misleading), I went by the first post by Henri Sivonen, and what I suggested in comment 68 is compatible with his wishes (I hope, only he can judge if that is an acceptable solution for him or not).

But if the title is so important I'll post another bug, with different title but very similar to Mr Sivonen's points. I can't believe how messy people have made such a simple thing.


(In reply to j.j. (inactive in 2012) from comment #71)
> > Also, if I file a new report somebody will flag it as duplicate of this one
> > due to point 1, 2 and 3 in the original post/report here right?
> 
> No, this is a meta bug to track all of it depending bugs. So your bug can't
> be a dup of this but might have a dependency.

Only Mr. Garrett has mentioned that this is a "meta bug"; I see now there is a tiny "meta" in the keywords list. The primary post needs to be updated to reflect this. If "meta" bugs are so special then they should be much more recognizable and easily distinguishable from a "regular" bug.

When Mr. Garrett used the word "meta" I thought he meant entropy "sources" in an odd way; it's not until your comment that I realized that "meta" referred to this bug instead.
 
> > So I'll modify and simplify my suggestion to this then:
> > Add general.useragent.stringmode to about:config
> > supported values are 0,1,2
> > 0=no user agent string
> > 1=Default (how it currently is)
> > 2=Terse (like Firefox/17.0 and nothing else than that)
> 
> ...
> 
> > Give me a good reason why the suggestion/idea in this post is bad
> 
> File a new bug, make it concise, read
> https://bugzilla.mozilla.org/page.cgi?id=etiquette.html
> and everything is fine!

Although I'd rather have had a link to an explanation of what a meta bug is and how to handle one on Bugzilla than an etiquette link (if I've broken any of those rules, please point that out instead).

I can clearly not be the first nor the last to read the subject/title, then read the actual post and see the 3 points, comment about it and then get smacked in the face with a "BRRR! You're wrong!" buzzer just because it's a meta bug (something I still need to look up).
If one cannot make suggestions or comment/argue about meta bugs then shouldn't this bug be locked instead?

I'm not trying to tell the mods here how to run things, but maybe do some alteration to how meta bugs are handled, as the thread for this bug is now becoming unnecessarily long. This should have been "stopped" after the first dozen comments.

PS! Thanks to J.J. for treating me with at least some courtesy.
(In reply to Roger Hågensen from comment #74)
This is not a forum; this is a bug tracker. There is no way to edit comment 0 or any other comment, nor is there a way to lock bugs from new commenters (though I would like that ability). If you continue to rant in this bug, however, you could be banned. Please drop the reactionary language and stop posting large rants to this bug which are emailed to over 100 people each time.

You got me to sit down and take the time to explain things to you in comment 72 because of the portion of your comment which I quoted there. I try to be nice and explain things to avoid long tangents like this cluttering up bugs. It doesn't seem to work as often as it should, apparently. (I swear it does work sometimes)

You have been told multiple times by two different people that this does not belong here. Nobody blames you for not learning the logistics of meta bugs right away, but you are expected to move to a new bug when told to do so. Stop posting in this bug.
I must apologize, I am using mobile where typing u is much easier than you with my fat fingers. With all due respect, Firefox/17.0 is a good idea, but for the reasons above, please move to a new bug or post in the newsgroup where discussion belongs.
Re comment #68 and other comments suggesting "No UA string":

This is contrary to RFC 1945 (Section 10.15).  While the RFC does not mandate a UA string, it does indicate why it is desired.  

When I find that my Web site has been accessed by a UA with a blank UA string, I consider it abusive and potentially malicious.  I send a message to the owner of the source's IP address requesting action against the perpetrator.
Mr. Chan, please accept my apologies as well, I came across harsher than intended with my knee-jerk reaction to "SMS speak", this was not intended and was without any form of malice.

Note: I un-subscribed from this bug a few days ago, it is just by chance I came across this again. And please also note that it is highly unlikely I'll read this bug (and its comments) any time in the near future (as I am not subscribed to this bug).

(In reply to David E. Ross from comment #77)
> When I find that my Web site has been accessed by a UA with a blank UA
> string, I consider it abusive and potentially malicious.  I send a message
> to the owner of the source's IP address requesting action against the
> perpetrator.

I do not take kindly to threats Mr. Ross!
Whiteboard: [fingerprinting]
If reducing the UserAgent to "Firefox 23" would break many old websites, why not change the UserAgent to

Mozilla/5.0 (Windows NT 6.1; rv:23.0) Gecko/20100101 Firefox/23.0

irrespective of operating system.
> Mozilla/5.0 (Windows NT 6.1; rv:23.0) Gecko/20100101 Firefox/23.0
> irrespective of operating system.

Would you keep "NT 6.1" in all eternity then?
To avoid some pain, I for one would drop any new/upcoming OS from the UA string and just keep the existing ones.
Software download sites often sniff for the appropriate OS and version. They may serve an old version or for the wrong OS if the OS field remains static or goes away with newer OSs.
http://www.mozilla.org/ being an example of such a site. ;-)
(In reply to rsx11m from comment #82)
> http://www.mozilla.org/ being an example of such a site. ;-)

Yes, there are a number of software-download sites that need the OS in the UA string so that they can select the software version that works with the user's OS.  One of them is the Java site.  I know that Java is often derided for having vulnerabilities, but it is required for some applications.
The OP wrote about https://panopticlick.eff.org/, the website

https://ip-check.info/

with a description of it at https://ip-check.info/description.php is similar and highlights some interesting ways the browser can be tracked.
Depends on: 1054739
Those who wish to reduce fingerprinting might consider installing the Secret Agent extension from <https://www.dephormation.org.uk/index.php?page=81>.  Instead of eliminating clues to your specific browser configuration, Secret Agent keeps changing the clues.  This even confuses some geolocation capabilities.  

However, I have found that there are Web sites that get so confused by Secret Agent that they cannot be visited.  One reason I have for submitting this comment is to point out that some fingerprinting might be necessary for smooth Web surfing and that there might be a limit on implementing this RFE, beyond which users could be harmed.  (Yes, this is an RFE and not a Normal bug because there is no detectable software error addressed here.)
Severity: normal → enhancement
This meta bug is basically done and we might even consider closing it. The plugins issue is still unresolved, but the HTTP request issues are all done save for a few UA nitpicks that we're probably not going to worry about for a while.
Depends on: 1090433
Depends on: 583181
Whiteboard: [fingerprinting] → [fingerprinting][necko-would-take]
Browser Plugin Details and System Fonts are still identifiable.

Firefox 50 Nightly plans to add an option to block enumeration of plugins. https://www.bestvpn.com/firefox-beef-privacy-tor-settings/

Fonts should be blocked as well.

Things like specific operating system details in the User Agent could be replaced with one particular OS's details for desktop, and another for tablet (e.g. everyone is on 64-bit Windows 10, or Android).

Similarly, screen size could be either for desktop or tablet, instead of exact dimensions (e.g. everyone has the same size screen), etc.

That these things are still outstanding in this trivial to solve bug is ridiculous.

Thanks.
Depends on: 1313580
Might be related to this Tor patch: https://trac.torproject.org/projects/tor/ticket/5282
Priority: -- → P3
I'm going to propose we close this meta bug in favor of Bug 1329996.  We may want to open a separate bug about the randomized HTTP Pipeline.
See Also: → 1366202
(In reply to Tom Ritter [:tjr] from comment #90)
> I'm going to propose we close this meta bug in favor of Bug 1329996.  We may
> want to open a separate bug about the randomized HTTP Pipeline.

I filed bug 1366202.

But not sure if we can close this bug since there are still several open bugs/issues.
Bug 566434, 583181, 728952, 1090433
Bulk change to priority: https://bugzilla.mozilla.org/show_bug.cgi?id=1399258
Priority: P3 → P5
Depends on: 648186
Please, reopen bug #736373.
Whiteboard: [fingerprinting][necko-would-take] → [fingerprinting][necko-would-take][fp-triaged]
Summary: Reduce the amount of data and entropy sent out in HTTP requests → [meta] Reduce the amount of data and entropy sent out in HTTP requests
Depends on: 1556223
Severity: normal → S3
Depends on: 1861847
Depends on: 1873273
Depends on: 1609304