Bug 572656 - Remove the UI language from the UA string and navigator.appVersion
Status: RESOLVED FIXED
Whiteboard: [parity-IE]
Keywords: dev-doc-complete
Product: Core
Classification: Components
Component: Networking: HTTP
Version: Trunk
Platform: All All
Importance: -- normal
Target Milestone: mozilla2.0b2
Assigned To: Dão Gottwald [:dao]
Mentors:
URL: http://www.delorie.com:81/some/url.txt
Depends on: 602291
Blocks: 515171 http-fingerprint 55366 426517 586165
Reported: 2010-06-17 04:18 PDT by Henri Sivonen (:hsivonen)
Modified: 2010-11-07 05:04 PST


Attachments
patch (4.06 KB, patch)
2010-07-18 23:48 PDT, Dão Gottwald [:dao]
bzbarsky: review+

Description Henri Sivonen (:hsivonen) 2010-06-17 04:18:50 PDT
(Note: This bug is less general than bug 55366, which is why I'm filing a new bug fully knowing about the existence of bug 55366.)

Steps to reproduce:
 1) Navigate to http://www.delorie.com:81/some/url.txt

Actual results:
The User-Agent header includes the language code for the active UI localization, which means configuration-based entropy is exposed and can be used for fingerprinting. See https://panopticlick.eff.org/. The language code also bloats each HTTP request, typically by 7 bytes.

Furthermore, some language codes match regular expressions that sites use to match something else. See bug 515171 and bug 426517.

Expected results:
Expected the UI language not to be included in the UA string.

Additional information:
Internet Explorer (English IE8 and IE9 Platform Preview and Swedish IE8 tested) does not include the UI language of the browser in the UA string.
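
As a rough illustration of the byte cost mentioned in the description, the sketch below strips a language token from a UA string and measures the difference. The sample UA string, the regular expression, and the helper name `strip_language_token` are assumptions made for illustration; they are not taken from the patch.

```python
import re

# Hypothetical Firefox 3.6-style UA string; the exact value is an assumption.
UA_WITH_LANG = (
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2) "
    "Gecko/20100115 Firefox/3.6"
)

# Matches a "; xx" or "; xx-YY" token that is followed by ";" or ")".
LANG_TOKEN = re.compile(r"; [a-z]{2,3}(?:-[A-Za-z]{2})?(?=;|\))")


def strip_language_token(ua: str) -> str:
    """Return the UA string with the first language-like token removed."""
    return LANG_TOKEN.sub("", ua, count=1)


stripped = strip_language_token(UA_WITH_LANG)
# Bytes saved per request for this sample: the length of "; en-US".
saved = len(UA_WITH_LANG.encode("ascii")) - len(stripped.encode("ascii"))
print(stripped)
print(saved)
```

For this sample the removed token is "; en-US", which accounts for the 7 bytes cited above; shorter codes such as "; de" would save less.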
Comment 1 Ben Bucksch (:BenB) 2010-06-17 04:21:41 PDT
UA string is very much part of bug 55366.
Comment 2 Axel Hecht [:Pike] 2010-06-17 04:31:14 PDT
Any idea how to figure out if this breaks the web?
Comment 3 Henri Sivonen (:hsivonen) 2010-06-17 05:39:04 PDT
(In reply to comment #2)
> Any idea how to figure out if this breaks the web?

By doing it and seeing what happens. I don't know of another way. :-/
Comment 4 Ben Bucksch (:BenB) 2010-06-17 06:07:48 PDT
Note that I suggested in bug 55366 to replace the UI language in the UA string with the content language pref. That would not break anything, and would in fact let it work better.
Comment 5 Henri Sivonen (:hsivonen) 2010-06-17 23:35:38 PDT
(In reply to comment #4)
> Note that I suggested in bug 55366 to replace the UI language in the UA string
> with the content language pref. That would not break anything, and in fact let
> it work better.

If the content language pref stays, would Firefox really need to expose it in more places in the HTTP request than IE does? That is, if Accept-Language stays in some form, why wouldn't sites use Accept-Language (since it works for IE *and* Firefox) instead of grepping the UA string?

Even though putting the same string in two places doesn't expose more entropy than putting it in one place, putting a language code in the UA string would continue to make the UA string prone to problems like bug 515171 and bug 426517.
Comment 6 Axel Hecht [:Pike] 2010-06-18 01:38:29 PDT
PS: I think most sites use navigator.language of DOM0 fame, https://developer.mozilla.org/en/Navigator.language, for language selection. Which is the same UI pref we send with the UA string right now.
Comment 7 Ben Bucksch (:BenB) 2010-06-18 01:41:38 PDT
What do you mean by "if"? The content language pref and Accept-Language: header are not going to go away.

> why wouldn't sites use Accept-Language (since it works for IE *and*
> Firefox) instead of grepping the UA string?

Why wouldn't they use standards? Because they are stupid! Or because they were written before Accept-Language was implemented.

Or because the site localizes in JS and the UA string language value is automatically reflected into the JavaScript |navigator.language| property *, and the HTTP headers are not accessible from JS (to my knowledge or the web page author's).

* That's also, BTW, why this is a clear dup of bug 55366.

I have no data on how many sites use this. If anybody does, please post it.
Comment 8 Henri Sivonen (:hsivonen) 2010-06-18 02:09:19 PDT
(In reply to comment #7)
> What do you mean by "if"? The content language pref and Accept-Language:
> header are not going to go away.

That depends on how Mozilla ends up balancing privacy vs. browser-driven site localization.

> Or because the site localizes in JS and the UA string language value is
> automatically reflected into the JavaScript |navigator.language| property *,
> and the HTTP headers are not accessible from JS (to my knowledge or the web
> page author's).

In that case, navigator.language should reflect the content language pref.

Bug 515171 and bug 426517 are still valid reasons to zap any language from the UA string instead of putting the content language pref there.
Comment 9 Ben Bucksch (:BenB) 2010-06-18 02:24:30 PDT
Bug 515171 comment 7 is just an Apache bug, a plain bug, an overzealous security script tripping over "rm" anywhere. That could have happened with any value, not just language.

The purpose here cannot be to remove all automatic site localization. That is a highly useful feature, and as I said is not going to go away.

What's wrong is to leak the browser UI language or to base the site localization on the browser UI language instead of the explicit content language pref, which is made for exactly this purpose.
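
To make the failure mode concrete, here is a minimal sketch of how a language token can trip a server-side filter. The rule shown is hypothetical (the actual script from bug 515171 is not quoted in this bug), and the UA strings are assumed sample values.

```python
import re

# Hypothetical UA strings; "rm" is the language code for Romansh.
UA_RM = "Mozilla/5.0 (X11; U; Linux i686; rm; rv:1.9.2) Gecko/20100115 Firefox/3.6"
UA_NO_LANG = "Mozilla/5.0 (X11; U; Linux i686; rv:1.9.2) Gecko/20100115 Firefox/3.6"

# Hypothetical overzealous rule: flag any request containing the word "rm",
# presumably intended to catch shell commands injected into a request.
NAIVE_RULE = re.compile(r"\brm\b")


def is_blocked(ua: str) -> bool:
    """Return True if the hypothetical filter would reject this UA string."""
    return NAIVE_RULE.search(ua) is not None


print(is_blocked(UA_RM))       # the language token alone triggers the rule
print(is_blocked(UA_NO_LANG))  # without the token there is nothing to match
```

Any free-form token could collide with such a rule, which is the point made above; dropping the language token simply removes one source of collisions.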
Comment 10 Axel Hecht [:Pike] 2010-06-18 02:35:39 PDT
accept-lang is a list of languages, not one. Yes, it'd be great to have that exposed to js as such, but it's not as easy to reflect that into a single locale.

Also, there are other use-cases in the newsgroup which explicitly want the locale of the browser, not the user's preferred language list.

On second thought, it might be a nice thing to expose an API for bcp47, where the site passes in the list of locales it can offer and gets back the preferred one. That'd be both a nice API, and would expose the least personal data. (Not that you couldn't massively test that API to reverse engineer standard settings.)

Completely different bug, though.
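
A minimal sketch of what such a negotiation call could look like, assuming a simplified version of the "lookup" scheme from RFC 4647 (progressively dropping trailing subtags). The function name `negotiate_locale` and its signature are hypothetical; no such API is defined in this bug.

```python
from typing import Optional, Sequence


def negotiate_locale(
    preferred: Sequence[str], available: Sequence[str]
) -> Optional[str]:
    """Return the first offered locale that matches the ordered preferences.

    `preferred` is the user's language list, most preferred first.
    `available` is the list of locales the site can serve.
    """
    # Case-insensitive index of what the site offers.
    offered = {tag.lower(): tag for tag in available}
    for wanted in preferred:
        subtags = wanted.lower().split("-")
        # Progressively drop trailing subtags: "de-AT" falls back to "de".
        while subtags:
            candidate = "-".join(subtags)
            if candidate in offered:
                return offered[candidate]
            subtags.pop()
    return None


print(negotiate_locale(["de-AT", "en"], ["en", "de", "fr"]))
```

The site only learns which of its own locales won, not the full preference list, although, as noted above, repeated calls with different lists could still reconstruct it.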
Comment 11 Ben Bucksch (:BenB) 2010-06-18 02:51:22 PDT
> accept-lang is a list of languages, not one.
> it's not as easy to reflect that into a single locale.

It is easy: accept-lang has a preference order. You just pick the first.

The first accept-lang, on a default Firefox install, is the same as the UI language, at least in the locale builds I know. So, it would continue to work as now.
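
A minimal sketch of "pick the first", assuming the standard Accept-Language syntax with optional q-values. The function names are hypothetical, and treating a malformed weight as lowest priority is a choice made here, not something specified in this bug.

```python
from typing import List, Optional, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Language value into (tag, q) pairs, highest q first."""
    entries = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # the weight defaults to 1 when omitted
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # assumption: malformed weights sort last
        entries.append((tag.strip(), q))
    # sorted() is stable, so entries with equal weight keep header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def top_language(header: str) -> Optional[str]:
    """Return the most preferred language tag, or None for an empty header."""
    parsed = parse_accept_language(header)
    return parsed[0][0] if parsed else None


print(top_language("de-DE,de;q=0.8,en-US;q=0.5,en;q=0.3"))
```

Sorting by weight rather than taking the literal first entry covers headers whose entries are not already in descending order.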
Comment 12 Ben Bucksch (:BenB) 2010-06-18 02:53:08 PDT
> it might be a nice thing to expose an API for bcp47, where
> the site passes in the list of locales it can offer and gets back the preferred
> one. That'd be both a nice API

Fully agreed. Want to file a bug on it or shall I?

> and would expose the least personal data.

Yes, that's great. Unfortunately, if we want to support the Accept-Language header (in the HTTP spec), we already expose the data.
Comment 13 Axel Hecht [:Pike] 2010-06-18 03:05:54 PDT
(In reply to comment #11)
> The first accept-lang, on a default Firefox install, is the same as UI
> language, at least in the locale builds I know. So, it would continue to work as
> now.

Not true, http://mxr.mozilla.org/l10n-mozilla1.9.2/search?string=intl.accept_languages&find=global/intl.properties shows a series of counter examples.

(In reply to comment #12)
> Fully agreed. Want to file a bug on it or shall I?

Go ahead, mind making it depend on bug 525494?
Comment 14 Henri Sivonen (:hsivonen) 2010-06-18 04:18:07 PDT
(In reply to comment #13)
> Not true,
> http://mxr.mozilla.org/l10n-mozilla1.9.2/search?string=intl.accept_languages&find=global/intl.properties
> shows a series of counter examples.

Should I be noticing counter-examples where the first Accept-Language tag isn't en-US and the language part of the locale name doesn't match the language part of the first language tag?
Comment 15 Axel Hecht [:Pike] 2010-06-18 04:23:34 PDT
I have no idea what en-US has to do with it. Any locale where the first code is not the locale code would change. Whether that's good or bad is a completely different question, though.
Comment 16 Robert Kaiser (not working on stability any more) 2010-06-18 05:02:59 PDT
(In reply to comment #6)
> PS: I think most sites use navigator.language of DOM0 fame,
> https://developer.mozilla.org/en/Navigator.language, for language selection.
> Which is the same UI pref we send with the UA string right now.

Which they shouldn't. They really should use Accept-Language, the UI language should have no influence on websites - but that's all theory.

IMHO, we can't really change anything in the UA unless we support targeted spoofing.
Comment 17 Ben Bucksch (:BenB) 2010-06-18 07:05:37 PDT
> Whether that's good or bad is a completely different question

Yes, some of the locales above look buggy to me. It makes very little sense to say "en-gb, en, si-lk".
Anyways, it's true for all the important locales (en-US, en-GB, de-DE, fr-FR, pt-PT, es-*, ...).

I didn't mean to say that absolutely nothing will change. I meant to say that it will not break too much. And where it does change, it will often change for the better. Esp. that we'd honor the user pref. Currently, the Content Language preference does not work on any site that looks at the UA string or JS navigator.language. Arguably, that's a bug.
Comment 18 Dão Gottwald [:dao] 2010-07-18 23:48:20 PDT
Created attachment 458264
patch

This removes the UI language from the UA string and from navigator.appVersion. Both are consistent with IE. It leaves the UI language for navigator.language, which should eventually be replaced by the top accept-lang.
Comment 19 Ben Bucksch (:BenB) 2010-07-19 01:03:48 PDT
Comment on attachment 458264
patch

This is violating the Mozilla User Agent spec.
<http://www-archive.mozilla.org/build/user-agent-strings.html>
<https://developer.mozilla.org/en/User_Agent_Strings_Reference>
I don't know how severe that is and how many sites it breaks, though.

Also, the navigator.language JS property is kept. What would it return now? If it still returns the UI language, nothing is won.

I think bug 55366 comment 67 would be a much better approach.

r-
Comment 20 Dão Gottwald [:dao] 2010-07-19 01:38:58 PDT
(In reply to comment #19)
> Comment on attachment 458264
> patch
> 
> This is violating the Mozilla User Agent spec.
> <http://www-archive.mozilla.org/build/user-agent-strings.html>
> <https://developer.mozilla.org/en/User_Agent_Strings_Reference>
> I don't know how severe that is and how many sites it breaks, though.

We "violated" it for the security flag already. It mostly describes the status quo, though, rather than being prescriptive. It can change.

> Also, the navigator.language JS property is kept. What would it return now? If
> it still returns the UI language, nothing is won.

See comment 18.
Comment 21 Ben Bucksch (:BenB) 2010-07-19 01:42:51 PDT
> We "violated" it for the security flag already. It mostly describes the
> status quo though rather than being prescriptive. It can change.

It's a spec. It is used by websites worldwide. We can change the spec, if we understand the concrete implications (or have a pressing need, e.g. security bug). I don't see evidence of that.
Comment 22 Dão Gottwald [:dao] 2010-07-19 01:49:04 PDT
We have changed it already for this release cycle. Sites relying on the exact token count to parse the string are already broken (bug 579161). Sites relying on the language token to actually determine the user's language need to use Accept-Language instead, just like for IE.
Comment 23 Ben Bucksch (:BenB) 2010-07-19 01:55:38 PDT
> bug 579161

Typo?

> Both are consistent with IE.

Please note that "like IE" is no argument, because the UA string of Moz and MSIE was never even similar.

I understand that sites *should* not use this, and there's no good reason to use this, and in fact I'd be happy to have it gone. I just think that before changing APIs or protocols affecting other parties, you'd better have a concrete idea of what you break and how it breaks instead of just making the change blindly.
Comment 24 Dão Gottwald [:dao] 2010-07-19 01:57:08 PDT
(In reply to comment #23)
> > bug 579161
> 
> Typo?

no
Comment 25 Ben Bucksch (:BenB) 2010-07-19 02:20:30 PDT
So, you're quoting a bug where a UA string change *did* cause totally unexpected side effects, at a well-known site, and using that as an argument to change the UA string further, without any investigation of the effect.
Well, whatever. Not my decision, I leave it up to others to accept/deny.
As I said, if we *can* remove this feature from UA string, I'd be happy.
Comment 26 Dão Gottwald [:dao] 2010-07-19 02:25:36 PDT
(In reply to comment #25)
> So, you're quoting a bug where a UA string change *did* cause totally
> unexpected side effects, at a well-known site, and using that as an argument to
> change the UA string further

Yes, as the side effects would be similar, i.e. UA string parsers which the language token removal would break are likely already broken due to the security token removal.
Comment 27 Boris Zbarsky [:bz] 2010-07-19 10:57:52 PDT
Comment on attachment 458264
patch

This looks fine, but please file a followup for navigator.language and a followup to update the UA string spec to match reality?
Comment 28 Dão Gottwald [:dao] 2010-07-19 13:46:06 PDT
http://hg.mozilla.org/mozilla-central/rev/7a07bba40e14

I updated <https://developer.mozilla.org/en/User_Agent_Strings_Reference> with the same remark that was there for the security token and added notes to <https://developer.mozilla.org/en/Firefox_4_for_developers>. Tentatively marking dev-doc-complete...
Comment 29 Dão Gottwald [:dao] 2010-07-19 14:01:16 PDT
filed bug 580032
Comment 30 Dão Gottwald [:dao] 2010-07-20 00:06:19 PDT
(In reply to comment #22)
> Sites relying on the exact
> token count to parse the string are already broken (bug 579161).

Also note that we insert the WOW64 token occasionally, so we already demanded some flexibility from parsers.
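
To illustrate the kind of flexibility being asked of parsers, the sketch below contrasts a positional lookup with a prefix scan. Both UA strings are assumed sample values and all function names are hypothetical.

```python
import re
from typing import List

# Hypothetical sample values: an older-style string with security and
# language tokens, and a newer-style string without them but with WOW64.
OLD_UA = "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6"
NEW_UA = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b2) Gecko/20100720 Firefox/4.0b2"


def comment_tokens(ua: str) -> List[str]:
    """Return the tokens of the first parenthesized comment in a UA string."""
    match = re.search(r"\(([^)]*)\)", ua)
    return [t.strip() for t in match.group(1).split(";")] if match else []


def brittle_gecko_rv(ua: str) -> str:
    """Positional lookup: assumes the 'rv:' token is always the fifth one."""
    return comment_tokens(ua)[4]


def tolerant_gecko_rv(ua: str) -> str:
    """Find the 'rv:' token by prefix, wherever it sits; '' if absent."""
    for token in comment_tokens(ua):
        if token.startswith("rv:"):
            return token[3:]
    return ""


print(tolerant_gecko_rv(OLD_UA))
print(tolerant_gecko_rv(NEW_UA))
```

The positional version raises IndexError on the shorter string, while the prefix scan keeps working whether tokens are added (WOW64) or removed (security flag, language).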
Comment 31 Fred Wenzel [:wenzel] 2010-07-28 00:31:47 PDT
FWIW, this also successfully broke our own beta feedback app: bug 582351.
Comment 32 skierpage 2010-07-29 12:15:26 PDT
(In reply to comment #28)
> I updated <https://developer.mozilla.org/en/User_Agent_Strings_Reference> with
> the same remark that was there for the security token and added notes to
> <https://developer.mozilla.org/en/Firefox_4_for_developers>.
I expanded the latter. But there are lots of other devmo references that need fixing, such as <https://developer.mozilla.org/en/DOM/window.navigator.language> ("this property also shows up as part of the navigator.userAgent string") and <https://developer.mozilla.org/en/Web_Localizability/Creating_localizable_web_applications> etc. I dunno if it's better to remove them altogether or put a similar "removed in geckoRelease 2.0" red box around each statement.
Comment 33 Wolfgang Rosenauer [:wolfiR] 2010-09-02 01:38:50 PDT
What is the purpose of the pref general.useragent.locale nowadays?
Comment 34 Dão Gottwald [:dao] 2010-09-02 03:26:15 PDT
(In reply to comment #33)
> What is the purpose of the pref general.useragent.locale nowadays?

It's used for navigator.language.
Comment 35 Ben Bucksch (:BenB) 2010-09-02 03:36:37 PDT
Not much longer: bug 55366.

I think the chrome still switches the actual UI language based on it (yes, the name now makes no sense anymore).
Comment 36 Robert Kaiser (not working on stability any more) 2010-09-02 05:50:14 PDT
(In reply to comment #35)
> I think the chrome still switches the actual UI language based on it (yes, the
> name now makes no sense anymore).

Yes, that's its primary purpose now, and it probably will need to stay there for that reason.
