Closed Bug 572656 Opened 14 years ago Closed 14 years ago

Remove the UI language from the UA string and navigator.appVersion

Categories

(Core :: Networking: HTTP, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla2.0b2

People

(Reporter: hsivonen, Assigned: dao)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-complete, Whiteboard: [parity-IE])

Attachments

(1 file)

(Note: This bug is less general than bug 55366, which is why I'm filing a new bug fully knowing about the existence of bug 55366.)

Steps to reproduce:
 1) Navigate to http://www.delorie.com:81/some/url.txt

Actual results:
The User-Agent header includes the language code for the active UI localization, which means configuration-based entropy is exposed and
can be used for fingerprinting. See https://panopticlick.eff.org/. The language code also bloats each HTTP request, typically by 7 bytes.

Furthermore, some language codes match regular expressions that sites use to match something else. See bug 515171 and bug 426517.
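
To make the 7-byte figure concrete, here is a minimal sketch that compares the size of the header line with and without the language token. The UA strings are illustrative assumptions in the Firefox 3.6 style, not values taken from this bug, and `header_bytes` is a hypothetical helper.

```python
# Illustrative UA strings (assumed, not taken from this bug). The only
# difference between them is the "en-US; " language token.
UA_WITH_LANG = (
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2) "
    "Gecko/20100115 Firefox/3.6"
)
UA_WITHOUT_LANG = UA_WITH_LANG.replace("en-US; ", "")


def header_bytes(user_agent: str) -> int:
    """Size in bytes of the User-Agent header line as sent on the wire."""
    return len(f"User-Agent: {user_agent}\r\n".encode("ascii"))


# Extra bytes paid on every request for the language token.
overhead = header_bytes(UA_WITH_LANG) - header_bytes(UA_WITHOUT_LANG)
print(overhead)
```

Longer or shorter locale codes change the number slightly, which is presumably why the report says "typically".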

Expected results:
Expected the UI language not to be included in the UA string.

Additional information:
Internet Explorer (English IE8 and IE9 Platform Preview and Swedish IE8 tested) does not include the UI language of the browser in the UA string.
UA string is very much part of bug 55366.
Any idea how to figure out if this breaks the web?
(In reply to comment #2)
> Any idea how to figure out if this breaks the web?

By doing it and seeing what happens. I don't know of another way. :-/
Note that I suggested in bug 55366 to replace the UI language in the UA string with the content language pref. That would not break anything, and in fact let it work better.
(In reply to comment #4)
> Note that I suggested in bug 55366 to replace the UI language in the UA string
> with the content language pref. That would not break anything, and in fact let
> it work better.

If the content language pref stays, would Firefox really need to expose it in more places in the HTTP request than IE? That is, if Accept-Language stays in some form, why wouldn't sites use Accept-Language (since it works for IE *and* Firefox) and grep the UA string instead?

Even though putting the same string in two places doesn't expose more entropy than putting it in one place, putting a language code in the UA string would continue to make the UA string prone to problems like bug 515171 and bug 426517.
PS: I think most sites use navigator.language of DOM0 fame, https://developer.mozilla.org/en/Navigator.language, for language selection. Which is the same UI pref we send with the UA string right now.
What do you mean with "if"? The content language pref and Accept-Language: header are not going to go away.

> why wouldn't sites use Accept-Language (since it works for IE *and*
> Firefox) and grep the UA string instead?

Why wouldn't they use standards? Because they are stupid! Or because they were written before Accept-Language was implemented.

Or because the site localizes in JS and the UA string language value is automatically reflected into the JavaScript |navigator.language| property *, and the HTTP headers are not accessible from JS (to my knowledge or the web page author's).

* That's also, BTW, why this is a clear up of bug 55366.

I have no data on how many sites use this. If anybody does, please post it.
(In reply to comment #7)
> What do you mean with "if"? The content language pref and Accept-Language:
> header are not going to go away.

That depends on how Mozilla ends up balancing privacy vs. browser-driven site localization.

> Or because the site localizes in JS and the UA string language value is
> automatically reflected into the JavaScript |navigator.language| property *,
> and the HTTP headers are not accessible from JS (to my knowledge or the web
> page author's).

In that case, navigator.language should reflect the content language pref.

Bug 515171 and bug 426517 are still valid reasons to zap any language from the UA string instead of putting the content language pref there.
Bug 515171 comment 7 is just an Apache bug, a plain bug, an overzealous security script tripping over "rm" anywhere. That could have happened with any value, not just language.
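
A rough sketch of the kind of false positive described here, assuming the token in question is the Romansh locale code `rm`. The filter pattern, the function name `is_blocked`, and both UA strings are hypothetical stand-ins, not the actual script or strings from bug 515171.

```python
import re

# Hypothetical stand-in for an overzealous server-side filter that flags
# any request mentioning the shell command "rm" as a standalone word.
SUSPICIOUS = re.compile(r"\brm\b")


def is_blocked(user_agent: str) -> bool:
    """Return True if the naive filter would reject this User-Agent."""
    return SUSPICIOUS.search(user_agent) is not None


# Illustrative UA strings (assumed): one carries "rm" as its language
# token, the other has no language token at all.
UA_ROMANSH = (
    "Mozilla/5.0 (X11; U; Linux i686; rm; rv:1.9.2) "
    "Gecko/20100115 Firefox/3.6"
)
UA_NO_LANG = "Mozilla/5.0 (X11; Linux i686; rv:2.0) Gecko/20100101 Firefox/4.0"

print(is_blocked(UA_ROMANSH))  # the language token trips the filter
print(is_blocked(UA_NO_LANG))  # nothing to trip over
```

As the comment says, any free-form value could trigger such a filter; the language token is just one more place where it can happen.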

The purpose here cannot be to remove all automatic site localization. That is a highly useful feature, and as I said is not going to go away.

What's wrong is to leak the browser UI language or to base the site localization on the browser UI language instead of the explicit content language pref, which is made for exactly this purpose.
accept-lang is a list of languages, not one. Yes, it'd be great to have that exposed to js as such, but it's not as easy to reflect that into a single locale.

Also, there are other use-cases in the newsgroup which explicitly want the locale of the browser, not the user's preferred language list.

On second thought, it might be a nice thing to expose an API for bcp47, where the site passes in the list of locales it can offer and gets back the preferred one. That'd be both a nice API, and would expose the least personal data. (Not that you couldn't massively test that API to reverse engineer standard settings.)

Completely different bug, though.
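
For illustration only, the negotiation idea could look roughly like the sketch below: the browser holds the user's ordered preference list, the site passes in what it offers, and only the single best match comes back. The name `pick_locale` and its signature are hypothetical, and the prefix fallback is a simplification in the spirit of RFC 4647 lookup, not a proposed spec.

```python
from typing import Optional, Sequence


def pick_locale(preferred: Sequence[str], offered: Sequence[str]) -> Optional[str]:
    """Return the offered locale that best matches the ordered preferences."""
    # Language tags compare case-insensitively; keep the site's spelling
    # for the value that is returned.
    offered_by_key = {tag.lower(): tag for tag in offered}
    for tag in preferred:
        parts = tag.lower().split("-")
        # Try the full tag first, then progressively shorter prefixes
        # (for example "de-at" and then "de").
        while parts:
            candidate = "-".join(parts)
            if candidate in offered_by_key:
                return offered_by_key[candidate]
            parts.pop()
    return None


print(pick_locale(["de-AT", "en"], ["en", "de", "fr"]))
```

A page that calls this once learns one locale rather than the whole list, although repeated calls with different offers could still reconstruct the preferences, as noted above.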
> accept-lang is a list of languages, not one.
> it's not as easy to reflect that into a single locale.

It is easy: accept-lang has a preference order. You just pick the first.

The first accept-lang, on a default Firefox install, is the same as UI language, at least in the locale builds I know. So, it would continue to work as now.
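
A minimal sketch of "pick the first", assuming the usual comma-separated Accept-Language syntax with optional `q=` weights. The function names are hypothetical, and the parser is deliberately simplified (it ignores wildcards and unusual parameter spacing).

```python
from typing import List, Optional, Tuple


def parse_accept_language(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept-Language value into (tag, q) pairs, best first."""
    entries = []
    for item in header.split(","):
        item = item.strip()
        if not item:
            continue
        tag, _, params = item.partition(";")
        quality = 1.0  # a missing q parameter means full weight
        params = params.strip()
        if params.startswith("q="):
            try:
                quality = float(params[2:])
            except ValueError:
                quality = 0.0  # treat a malformed weight as lowest priority
        entries.append((tag.strip(), quality))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)


def top_language(header: str) -> Optional[str]:
    """Return the single most preferred language tag, or None if empty."""
    entries = parse_accept_language(header)
    return entries[0][0] if entries else None


print(top_language("de-DE,de;q=0.8,en-US;q=0.5,en;q=0.3"))
```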
> it might be a nice thing to expose an API for bcp47, where
> the site passes in the list of locales it can offer and gets back the preferred
> one. That'd be both a nice API

Fully agreed. Want to file a bug on it or shall I?

> and would expose the least personal data.

Yes, that's great. Unfortunately, if we want to support Accept-Language header (in HTTP spec), we already expose the data.
(In reply to comment #11)
> The first accept-lang, on a default Firefox install, is the same as UI
> language, at least the locale builds I know. So, it would continue to work as
> now.

Not true, http://mxr.mozilla.org/l10n-mozilla1.9.2/search?string=intl.accept_languages&find=global/intl.properties shows a series of counter examples.

(In reply to comment #12)
> Fully agreed. Want to file a bug on it or shall I?

Go ahead, mind making it depend on bug 525494?
(In reply to comment #13)
> Not true,
> http://mxr.mozilla.org/l10n-mozilla1.9.2/search?string=intl.accept_languages&find=global/intl.properties
> shows a series of counter examples.

Should I be noticing counter-examples where the first Accept-Language tag isn't en-US and the language part of the locale name doesn't match the language part of the first language tag?
I have no idea what en-US has to do with it. Any locale where the first code is not the locale code would change. Whether that's good or bad is a completely different question, though.
(In reply to comment #6)
> PS: I think most sites use navigator.language of DOM0 fame,
> https://developer.mozilla.org/en/Navigator.language, for language selection.
> Which is the same UI pref we send with the UA string right now.

Which they shouldn't. They really should use Accept-Language, the UI language should have no influence on websites - but that's all theory.

IMHO, we can't really change anything in the UA unless we support targeted spoofing.
> Whether that's good or bad is a completely different question

Yes, some of the locales above look buggy to me. It makes very little sense to say "en-gb, en, si-lk".
Anyways, it's true for all the important locales (en-US, en-GB, de-DE, fr-FR, pt-PT, es-*, ...).

I didn't mean to say that absolutely nothing will change. I meant to say that it will not break too much. And where it does change, often it will change to the better. Esp. that we'd honor the user pref. Currently, the Content Language preference does not work on any site that looks at UA string or JS navigator.language. Arguably, that's a bug.
Whiteboard: [parity-IE]
Attached patch: patch
This removes the UI language from the UA string and from navigator.appVersion. Both are consistent with IE. It leaves the UI language for navigator.language, which should eventually be replaced by the top accept-lang.
Attachment #458264 - Flags: review?(bzbarsky)
Comment on attachment 458264 [details] [diff] [review]
patch

This is violating the Mozilla User Agent spec.
<http://www-archive.mozilla.org/build/user-agent-strings.html>
<https://developer.mozilla.org/en/User_Agent_Strings_Reference>
I don't know how severe that is and how many sites it breaks, though.

Also, the navigator.language JS property is kept. What would it return now? If it still returns the UI language, nothing is won.

I think bug 55366 comment 67 would be a much better approach.

r-
(In reply to comment #19)
> Comment on attachment 458264 [details] [diff] [review]
> patch
> 
> This is violating the Mozilla User Agent spec.
> <http://www-archive.mozilla.org/build/user-agent-strings.html>
> <https://developer.mozilla.org/en/User_Agent_Strings_Reference>
> I don't know how severe that is and how many sites it breaks, though.

We "violated" it for the security flag already. It mostly describes the status quo though rather than being prescriptive. It can change.

> Also, the navigator.language JS property is kept. What would it return now? If
> it still returns the UI language, nothing is won.

See comment 18.
> We "violated" it for the security flag already. It mostly describes the
> status quo though rather than being prescriptive. It can change.

It's a spec. It is used by websites worldwide. We can change the spec, if we understand the concrete implications (or have a pressing need, e.g. security bug). I don't see evidence of that.
We have changed it already for this release cycle. Sites relying on the exact token count to parse the string are already broken (bug 579161). Sites relying on the language token to actually determine the user's language need to use Accept-Language instead, just like for IE.
> bug 579161

Typo?

> Both are consistent with IE.

Please note that "like IE" is no argument, because the UA string of Moz and MSIE was never even similar.

I understand that sites *should* not use this, and there's no good reason to use this, and in fact I'd be happy to have it gone. I just think that before changing APIs or protocols affecting other parties, you better have a concrete idea what you break and how it breaks instead of just making the change blindly.
(In reply to comment #23)
> > bug 579161
> 
> Typo?

no
So, you're quoting a bug where a UA string change *did* cause totally unexpected side effects, at a well-known site, and use that as argument to change the UA string further, without any investigation of effect.
Well, whatever. Not my decision, I leave it up to others to accept/deny.
As I said, if we *can* remove this feature from UA string, I'd be happy.
(In reply to comment #25)
> So, you're quoting a bug where a UA string change *did* cause totally
> unexpected side effects, at a well-known site, and use that as argument to
> change the UA string further

Yes, as the side effects would be similar, i.e. UA string parsers which the language token removal would break are likely already broken due to the security token removal.
Comment on attachment 458264 [details] [diff] [review]
patch

This looks fine, but please file a followup for navigator.language and a followup to update the UA string spec to match reality?
Attachment #458264 - Flags: review?(bzbarsky) → review+
http://hg.mozilla.org/mozilla-central/rev/7a07bba40e14

I updated <https://developer.mozilla.org/en/User_Agent_Strings_Reference> with the same remark that was there for the security token and added notes to <https://developer.mozilla.org/en/Firefox_4_for_developers>. Tentatively marking dev-doc-complete...
Assignee: nobody → dao
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Summary: Remove the UI language from the UA string → Remove the UI language from the UA string and navigator.appVersion
Target Milestone: --- → mozilla2.0b2
filed bug 580032
(In reply to comment #22)
> Sites relying on the exact
> token count to parse the string are already broken (bug 579161).

Also note that we insert the WOW64 token occasionally, so we already demanded some flexibility from parsers.
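
To illustrate the kind of flexibility meant here, the sketch below contrasts a parser that depends on token position with one that looks for a token by name. Both UA strings are illustrative assumptions (the second drops the security and language tokens and adds WOW64), and the function names are hypothetical.

```python
import re
from typing import Optional

# Illustrative UA strings (assumed, not taken from this bug).
OLD_UA = (
    "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2) "
    "Gecko/20100115 Firefox/3.6"
)
NEW_UA = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0) Gecko/20100101 Firefox/4.0"


def gecko_version_by_position(user_agent: str) -> str:
    """Brittle: assumes rv: is always the fifth token inside the parentheses."""
    inside = user_agent[user_agent.index("(") + 1 : user_agent.index(")")]
    tokens = [token.strip() for token in inside.split(";")]
    return tokens[4][len("rv:"):]  # IndexError once tokens are removed


RV_TOKEN = re.compile(r"\brv:([0-9][0-9a-z.]*)")


def gecko_version_by_name(user_agent: str) -> Optional[str]:
    """Tolerant: finds the rv: token wherever it appears."""
    match = RV_TOKEN.search(user_agent)
    return match.group(1) if match else None


print(gecko_version_by_position(OLD_UA))  # works only for the old layout
print(gecko_version_by_name(OLD_UA))
print(gecko_version_by_name(NEW_UA))
```

The positional version raises IndexError on the second string, which is roughly the failure mode a token-count-dependent parser would hit after the security or language token disappears.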
FWIW, this also successfully broke our own beta feedback app: bug 582351.
(In reply to comment #28)
> I updated <https://developer.mozilla.org/en/User_Agent_Strings_Reference> with
> the same remark that was there for the security token and added notes to
> <https://developer.mozilla.org/en/Firefox_4_for_developers>.
I expanded the latter.  But there are lots of other devmo references that need fixing such as <https://developer.mozilla.org/en/DOM/window.navigator.language> ("this property also shows up as part of the navigator.userAgent string") and  https://developer.mozilla.org/en/Web_Localizability/Creating_localizable_web_applications etc.  I dunno if it's better to remove them altogether or put a similar "removed in geckoRelease 2.0" red box around each statement.
Blocks: 586165
What is the purpose of the pref general.useragent.locale nowadays?
(In reply to comment #33)
> What is the purpose of the pref general.useragent.locale nowadays?

It's used for navigator.language.
Not much longer: bug 55366.

I think the chrome still switches the actual UI language based on it (yes, the name now makes no sense anymore).
(In reply to comment #35)
> I think the chrome still switches the actual UI language based on it (yes, the
> name now makes no sense anymore).

Yes, that's its primary purpose now, and it probably will need to stay there for that reason.
Depends on: 602291