Closed Bug 1415146 Opened 7 years ago Closed 5 years ago

navigator.language is empty string "" if the user cleared the preferred languages list

Categories

(Core :: DOM: Core & HTML, defect, P3)

58 Branch

Tracking

RESOLVED FIXED
mozilla77
Tracking Status
firefox77 --- fixed

People

(Reporter: dennis.lissov, Assigned: mkaply)

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0
Build ID: 20171106100122

Steps to reproduce:

Open "preferred languages" in settings and delete all language lines, then open a console on any site and check the values of `navigator.language` and `navigator.languages`.

Actual results:

> navigator.language
""
> navigator.languages
Array [ ]

Expected results:

According to the W3C HTML5 spec and the WHATWG HTML living standard:

navigator.language: "Must return a valid BCP 47 language tag representing either a plausible language or the user's preferred language."

navigator.languages: "Must return a read only array of valid BCP 47 language tags representing either *one or more* plausible languages, or the user’s preferred languages, ordered by preference with the most preferred language first."

Neither of these properties can be empty.

https://www.w3.org/TR/2017/PR-html52-20171102/webappapis.html#language-preferences
https://html.spec.whatwg.org/multipage/system-state.html#language-preferences
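
The expectation in the report can be written down as a small check. This is only a sketch: `checkLanguageInvariant` is a hypothetical helper, and it takes a navigator-like object as a parameter so it can be run outside a browser.

```javascript
// Hypothetical helper: checks the expectation quoted from the spec in this report.
// `nav` is any object shaped like `navigator`.
function checkLanguageInvariant(nav) {
  const problems = [];
  if (typeof nav.language !== "string" || nav.language === "") {
    problems.push("navigator.language is empty");
  }
  if (!Array.isArray(nav.languages) || nav.languages.length === 0) {
    problems.push("navigator.languages is empty");
  }
  return problems;
}

// The values reported above after clearing the preferred languages list.
const clearedList = { language: "", languages: [] };
console.log(checkLanguageInvariant(clearedList));
```

In a browser console, `checkLanguageInvariant(navigator)` would run the same check against the real object.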
Component: Untriaged → DOM
Product: Firefox → Core
Priority: -- → P3
Component: DOM → DOM: Core & HTML

Can confirm that this problem still exists in FF 68.0.2 (Linux).
Also, I have found that occasionally it does make some websites break silently (i.e. not only confused about languages, but not working at all without giving an indication of what is wrong). Current example: https://maps.luftdaten.info (should show a map, with the languages list empty it remains stuck on "loading data...". The admin of that website is informed of the problem so it might get fixed on their side at some point).

Component: DOM: Core & HTML → Internationalization

Pushing it back to DOM because I think it's a DOM spec conformance issue.

The code is in https://searchfox.org/mozilla-central/source/dom/base/Navigator.cpp#309 and it just doesn't handle the case where the platform has no value.

Component: Internationalization → DOM: Core & HTML

Why are we using accept languages at all? Per the documentation, this should be the user agent language, not the accept language:

The NavigatorLanguage.language read-only property returns a string representing the preferred language of the user, usually the language of the browser UI.

Then we would never be in a situation where we didn't know the language.

We have a significant number of cases where the user wants their UI to be in a different locale than their web content.

This bug breaks qwant.com when navigator.language is the empty string "" and navigator.userLanguage is undefined. The qwant.com home page is blank in Firefox if you remove all the languages from the "Choose your preferred language for displaying pages" UI. Chrome's settings UI does not allow the user to remove all the languages, so navigator.language is always at least "en".
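
One way a page can end up blank in this situation is sketched below. This is hypothetical page code, not taken from qwant.com or any other site; the function names and the `siteDefault` parameter are assumptions made for illustration.

```javascript
// Hypothetical idiom that assumes at least one of the two properties
// is a non-empty string.
function detectLanguageUnsafe(nav) {
  // "" || undefined evaluates to undefined, so .toLowerCase() throws a TypeError.
  return (nav.language || nav.userLanguage).toLowerCase();
}

// Defensive variant: fall back to an assumed site default.
function detectLanguageSafe(nav, siteDefault = "en") {
  return (nav.language || nav.userLanguage || siteDefault).toLowerCase();
}

// The combination described in the comment above.
const emptyNavigator = { language: "", userLanguage: undefined };

try {
  detectLanguageUnsafe(emptyNavigator);
} catch (err) {
  console.log(err.name); // TypeError
}
console.log(detectLanguageSafe(emptyNavigator)); // en
```

If a TypeError like this is thrown during page start-up and never caught, the rest of the script does not run, which would be consistent with a page that renders nothing.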

I'm surprised that Firefox's navigator.language doesn't include the browser build's language and/or the OS language, or at least as a fallback when the user has emptied their list of preferred languages.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Summary: navigator.language can be empty if the user cleared the preferred languages list → navigator.language is empty string "" if the user cleared the preferred languages list

I'm wondering if we should simply not allow the list to be cleared in preferences (don't allow deletion of the last item).

The code specifically references RFC 2616, Section 15.1.4 "Privacy Issues Connected to Accept Headers" as to why you shouldn't send the user's language:

https://tools.ietf.org/html/rfc2616#section-15.1.4

Obviously this wouldn't help folks that are already in this situation.

Assignee: nobody → mozilla
Status: NEW → ASSIGNED

Arthur, can you offer an opinion on this patch please?

Flags: needinfo?(arthur)

(In reply to Mike Kaply [:mkaply] from comment #6)

> I'm wondering if we should simply not allow the list to be cleared in preferences (don't allow deletion of the last item).

It's (probably?) fine if people want to clear Accept-Languages, and I'm not sure how you'll enforce that a pref can't be an empty string given the various ways they can be set. But in that case (privacy conscious user?) we should default to en-US as the biggest set for those people to try to hide in, not their UI language which could be quite a small set.

(In reply to Daniel Veditz [:dveditz] from comment #9)

> It's (probably?) fine if people want to clear Accept-Languages, and I'm not sure how you'll enforce that a pref can't be an empty string given the various ways they can be set. But in that case (privacy conscious user?) we should default to en-US as the biggest set for those people to try to hide in, not their UI language which could be quite a small set.

While we can't prevent a user from clearing their intl.accept_languages pref in about:config (and thus Gecko needs to handle that case), Firefox's language settings UI (search for "Webpage language settings" in about:preferences) allows the user to remove all languages from the UI list. Perhaps a follow-up frontend bug could change the language settings UI to always require at least one language entry. (Chrome's language settings UI prevents this.)

On Wed, Feb 19, 2020 at 5:53 PM Mike Kaply <mkaply@mozilla.com> wrote:

On Tue, Feb 18, 2020 at 4:28 PM Tom Ritter <tom@mozilla.com> wrote:

Using GetWebExposedLocales is good because it respects Tor Browser's
preference for spoofing locale.

The question seems to be "should we leak the user's ui locale or just
hardcode 'en'?"  We sometimes leak the UI locale in some
odd/inconsistent ways presently, but I don't think that's an argument
to leak it by default like this.

Those two statements seem to contradict each other. If GetWebExposedLocales is good, why can't we then use it and just accept the value it gives (which is en-US when nothing is specified)?

Sorry you're right - GetWebExposedLocales handles the Tor/Resist Fingerprinting case but it doesn't address the fact that we're now leaking the user's UI locale in a new way.

If the only purpose is to not break websites by having some value
there, I think it would be better to hardcode 'es' - and if we get
compatibility issues, investigate and see if it's fixed by using the
UI locale.  (Why 'es'? Because if a site actually does have
compatibility problems, we're going to see them more obviously than if
we hardcoded 'en'.  The users who were answering <nothing> are no more
fingerprintable than they were before.)

If we do that, we'll just get websites in the wrong language for users that remove all the languages. They'll have to explicitly add a language for things to work...

I'm sorry, I had thought the intention here was to avoid a javascript error where a website expected a value to be returned and none was. Hence my suggestion to a) hardcode a value and b) choose a non-english value to show us (mostly-english-localized-developers) if a website was actually making a localization choice based on that arbitrary value we chose to return.

It seems like you're saying that the situation is actually that a user who has mucked with their language preferences is now getting incorrectly localized content from the website. And the way to fix that is to ignore that the user mucked with their language preferences and go get a locale from the UI. This seems against the spirit of what the user intended. (Assuming of course, that we think the mucking that was done was intentional....)

Flags: needinfo?(arthur)

> It seems like you're saying that the situation is actually that a user who has mucked with their language preferences is now getting incorrectly localized content from the website. And the way to fix that is to ignore that the user mucked with their language preferences and go get a locale from the UI. This seems against the spirit of what the user intended. (Assuming of course, that we think the mucking that was done was intentional....)

It's worse than that. There are websites that don't work at all in this case (qwant.com is blank). Websites assume navigator.language will have a value because, per the spec, it's supposed to always have a value.

(In reply to Mike Kaply [:mkaply] from comment #12)

> > It seems like you're saying that the situation is actually that a user who has mucked with their language preferences is now getting incorrectly localized content from the website. And the way to fix that is to ignore that the user mucked with their language preferences and go get a locale from the UI. This seems against the spirit of what the user intended. (Assuming of course, that we think the mucking that was done was intentional....)

> It's worse than that. There are websites that don't work at all in this case (qwant.com is blank). Websites assume navigator.language will have a value because per the spec, it's supposed to always have a value

Actually that example is my initial understanding: There's a javascript error because it expects a value to be there that isn't. I reproduced it by clearing my language list, seeing the error in the web console, then fixed it by adding back Spanish. Despite being present in the website's list of languages (visible in the Js var constants.AvailableLanguages) the website is not localized in any way I can see - it all looks English (not Spanish) to me.

So while other examples may actually localize based on the reported value, it looks like we could hardcode a value (like 'es') and fix some websites. And that choice (a single hardcoded value instead of the UI locale) would be consistent with the assumption that in clearing the language list the user wanted to make themselves less unique.

That assumption may not be correct of course. And it may be the case that other high profile websites do localize based on this value, and that the best-guessed way to fix them is to use UI locale. (Which may not actually be the correct choice from the user's perspective, but it's certainly a better guess than hardcoding.)

> would be consistent with the assumption that in clearing the language list the user wanted to make themselves less unique.

I would not assume that. The vast majority of cases where I've encountered that outcome were caused by people making an unconscious mistake and then being confused by the result (blank website? I didn't do anything to cause it!).

Maybe we could return und language in such a case? It's a BCP47 compatible Unicode Language Identifier for an unknown locale.

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #14)

> Maybe we could return und language in such a case? It's a BCP47 compatible Unicode Language Identifier for an unknown locale.

While technically correct given an empty language pref or UI settings list, returning und might confuse poorly-coded websites or produce less useful (to the user) content than returning Firefox's UI locale or hardcoded en.

For users clearing their language list in an attempt to make themselves less unique, returning und would make them more unique than returning something like en.

(In reply to Chris Peterson [:cpeterson] from comment #15)

> For users clearing their language list in an attempt to make themselves less unique, returning und would make them more unique than returning something like en.

While yes, hardcoding 'en' could enable them to go into the anonymity set of 'en' and it would benefit them; returning 'und' is no worse than what we currently do which is return nothing.

> While technically correct given an empty language pref or UI settings list, returning und might confuse poorly-coded websites or produce less useful (to the user) content than returning Firefox's UI locale or hardcoded en.

I don't understand that claim. Why is "en" more useful than "und"? If the website uses accepted languages header it does some negotiation and it'll negotiate down to something. There's no difference between und and any other locale that the website doesn't have data for.
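
To make the negotiation argument concrete, here is a minimal sketch of what a site might do. The `negotiate` function, the `available` list, and the `siteDefault` parameter are all hypothetical; real sites and libraries negotiate in many different ways.

```javascript
// Hypothetical site-side negotiation: pick the first requested language the
// site supports, trying the full tag first and then its primary subtag.
function negotiate(requested, available, siteDefault) {
  for (const tag of requested) {
    const lower = tag.toLowerCase();
    if (available.includes(lower)) return lower;
    const primary = lower.split("-")[0];
    if (available.includes(primary)) return primary;
  }
  // Nothing matched: the site falls back to its own default.
  return siteDefault;
}

const available = ["en", "de", "fr"];
console.log(negotiate(["und"], available, "fr"));   // fr (no match, site default)
console.log(negotiate(["pl"], available, "fr"));    // fr (no match, site default)
console.log(negotiate(["de-AT"], available, "fr")); // de (primary subtag match)
console.log(negotiate(["en"], available, "fr"));    // en (exact match)
```

Under a scheme like this, `und` falls through to the site default exactly as any unsupported tag would, while a hardcoded `en` selects English whenever the site offers it.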

> For users clearing their language list in an attempt to make themselves less unique, returning und would make them more unique than returning something like en.

That's a bit more complicated. en would make them less unique in the English-speaking world. In countries where people don't use English, en and und are equally unlikely, except that en falsely claims that the user wants en when they don't.

In general, I'm not sure why having accepted langs match Firefox UI locale is considered worsening the situation.
As far as I understand the flow is:

  1. User installs Firefox in locale X
  2. Their accept langs matches X
  3. They remove (accidentally) their accept langs
  4. Why not show X still?
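
The flow above amounts to a simple fallback rule, sketched here. This is not Gecko code: `effectiveLanguages` is a hypothetical function, `acceptLanguages` stands in for the comma-separated intl.accept_languages pref value, and `uiLocale` stands in for the browser UI locale.

```javascript
// Sketch only, not Gecko code. Returns the list a page would see:
// the user's accept languages if there are any, otherwise the UI locale.
function effectiveLanguages(acceptLanguages, uiLocale) {
  const list = acceptLanguages
    .split(",")
    .map((tag) => tag.trim())
    .filter((tag) => tag.length > 0);
  return list.length > 0 ? list : [uiLocale];
}

console.log(effectiveLanguages("en-US, en", "pl")); // [ 'en-US', 'en' ]
console.log(effectiveLanguages("", "pl"));          // [ 'pl' ]
```

The privacy objection raised earlier in this bug applies to the second case: the fallback exposes the UI locale to pages that previously saw nothing.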

I need a recommendation here. It feels like we're just going around in circles at this point.

Another broken website: Firefox Monitor users' dashboard pages show a cryptic error message if the user has no accept languages.

https://monitor.firefox.com/user/dashboard

https://github.com/mozilla/blurts-server/issues/1580

Another site broken when navigator.language is the empty string: clicking Lando's "Preview Landing" button doesn't do anything.

Here is one of the errors logged to the web console:

RangeError: invalid language tag: "" main.min.js:18:1484
    CanonicalizeLocaleList self-hosted:5417
    InitializeDateTimeFormat self-hosted:5858
    toLocaleString self-hosted:756
    formatTime https://lando.services.mozilla.com/static/build/main.min.js?42f5d3c0:18
    each https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:2
    each https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:2
    formatTime https://lando.services.mozilla.com/static/build/main.min.js?42f5d3c0:18
    timeline https://lando.services.mozilla.com/static/build/main.min.js?42f5d3c0:18
    each https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:2
    each https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:2
    timeline https://lando.services.mozilla.com/static/build/main.min.js?42f5d3c0:18
    <anonymous> https://lando.services.mozilla.com/static/build/main.min.js?42f5d3c0:18
    j https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:2
    k https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:2
    (Async: setTimeout handler)
    g https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:2
    i https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:2
    fireWith https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:2
    fire https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:2
    i https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:2
    fireWith https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:2
    ready https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:2
    S https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:3
    (Async: EventListener.handleEvent)
    <anonymous> https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:3
    <anonymous> https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:2
    <anonymous> https://lando.services.mozilla.com/static/build/vendor.min.js?c9f5aeec:2
See Also: → 1427604

So looking at https://tools.ietf.org/html/rfc7231#section-9.7 has me wondering, are we sending an empty accept language header or no accept language header?

Would these sites work if there was no accept language at all?

> So looking at https://tools.ietf.org/html/rfc7231#section-9.7 has me wondering, are we sending an empty accept language header or no accept language header?
>
> Would these sites work if there was no accept language at all?

We are sending no accept language header at all. That's the core problem. These sites expect there to be an accept language header, regardless of the value. I'm going to do some more testing to see if sending an empty language accept header works.

(Independent of the navigator.language problem. I think there are two issues here, navigator.language and accept header)

Attachment #9120173 - Attachment description: Bug 1415146 - Fallback to UI locale for navigator.language(s) if there are no accept languages. r?zbraniecki → Bug 1415146 - Use getExposedWebLocales for navigator.language(s) if there are no accept languages. r?zbraniecki

Hopefully someone can review this and we can just call this done.

We're using the API that was designed for this - getWebExposedLocales.

I've added in more detail about what we're doing in the comments.

We should either take this fix or close as WONTFIX.

And ignore my accept header stuff. I was just confused.

I like this patch from the intl use perspective. But I don't think we should fiddle with that API without having someone related to privacy approve it as well.

(In reply to Zibi Braniecki [:zbraniecki][:gandalf] from comment #24)

> I like this patch from the intl use perspective. But I don't think we should fiddle with that API without having someone related to privacy approve it as well.

Technically I'm not a peer of anything; but I think this is okay.

Pushed by mozilla@kaply.com: https://hg.mozilla.org/integration/autoland/rev/7b46a8bfc6af Use getExposedWebLocales for navigator.language(s) if there are no accept languages. r=smaug

So oddly I can't recreate this failure. I'm wondering if it's only happening on macOS 10.14.

Either way, I'm going to update the test to explicitly query getWebExposedLocales, and that should fix it.

Flags: needinfo?(mozilla)
Pushed by mozilla@kaply.com: https://hg.mozilla.org/integration/autoland/rev/c7c5e3f98126 Use getExposedWebLocales for navigator.language(s) if there are no accept languages. r=smaug
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla77