Bug 55366 - Don't reveal UI language to site/page -- Change navigator.language to use Accept-Language instead of the UI language
Status: RESOLVED FIXED
Whiteboard: [defective-privacy]
Keywords: dev-doc-complete, privacy
Product: Core
Classification: Components
Component: DOM
Version: Trunk
Platform: All All
Importance: P4 normal with 6 votes
Target Milestone: mozilla5
Assigned To: Ben Bucksch (:BenB)
URL: http://gemal.dk/browserspy/language.html
Duplicates: 464414 525637 580032
Depends on: 572656 post2.0
Blocks: 418485 646428
Reported: 2000-10-05 12:17 PDT by Ben Bucksch (:BenB)
Modified: 2012-03-02 15:52 PST
CC List: 60 users


Attachments
Proposed Fix for UA-string. Also contains Win fix for bug 57555. (4.30 KB, patch)
2001-07-27 19:36 PDT, Ben Bucksch (:BenB)
no flags
Patch 3: Change navigator.language to use Accept-Language (1.73 KB, patch)
2010-07-25 15:00 PDT, Ben Bucksch (:BenB)
bzbarsky: review-
Patch 4: Change navigator.language to use Accept-Language (2.32 KB, patch)
2010-07-31 05:16 PDT, Ben Bucksch (:BenB)
no flags
Patch 5: Change navigator.language to use Accept-Language (3.81 KB, patch)
2010-08-02 16:24 PDT, Ben Bucksch (:BenB)
bzbarsky: review-
Patch 6: Change navigator.language to use Accept-Language, using tokenizer for uppercasing (4.22 KB, patch)
2010-08-03 19:52 PDT, Ben Bucksch (:BenB)
bzbarsky: review+
Patch 7: Change navigator.language to use Accept-Language (4.22 KB, patch)
2010-08-03 21:03 PDT, Ben Bucksch (:BenB)
ben.bucksch: review+
jst: superreview+
benjamin: approval2.0-

Description Ben Bucksch (:BenB) 2000-10-05 12:17:47 PDT
Reproduce:
1. Visit <http://gemal.dk/browserspy/language.html>.

Actual result:
Your UI language and locale (e.g. en-US) is displayed.

Expected result:
Neither UI nor OS language or locale are revealed to page or site.

Additional Comments:
Compare the HTTP 1.1 spec:
<quote src="http://www.ietf.org/rfc/rfc2616.txt">
15.1.4 Privacy Issues Connected to Accept Headers

   Accept request-headers can reveal information about the user to all
   servers which are accessed. The Accept-Language header in particular
   can reveal information the user would consider to be of a private
   nature, because the understanding of particular languages is often
   strongly correlated to the membership of a particular ethnic group.
   User agents which offer the option to configure the contents of an
   Accept-Language header to be sent in every request are strongly
   encouraged to let the configuration process include a message which
   makes the user aware of the loss of privacy involved.

   An approach that limits the loss of privacy would be for a user agent
   to omit the sending of Accept-Language headers by default, and to ask
   the user whether or not to start sending Accept-Language headers to a
   server if it detects, by looking for any Vary response-header fields
   generated by the server, that such sending could improve the quality
   of service.

   Elaborate user-customized accept header fields sent in every request,
   in particular if these include quality values, can be used by servers
   as relatively reliable and long-lived user identifiers. Such user
   identifiers would allow content providers to do click-trail tracking,
   and would allow collaborating content providers to match cross-server
   click-trails or form submissions of individual users. Note that for
   many users not behind a proxy, the network address of the host
   running the user agent will also serve as a long-lived user
   identifier. In environments where proxies are used to enhance
   privacy, user agents ought to be conservative in offering accept
   header configuration options to end users. As an extreme privacy
   measure, proxies could filter the accept headers in relayed requests.
   General purpose user agents which provide a high degree of header
   configurability SHOULD warn users about the loss of privacy which can
   be involved.
</quote>

While we don't send this info as HTTP header, we offer it as JS property. See
the source of the testcase mentioned above for details.
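
For illustration, a minimal sketch of how any page script can read the property this bug is about. The helper name `revealedLanguage` is hypothetical, and `nav` stands in for `window.navigator` so that the sketch also runs outside a browser:

```javascript
// Hypothetical helper: returns the language a script can read from a
// navigator-like object. In a page, pass window.navigator as `nav`.
function revealedLanguage(nav) {
  // navigator.language is a string such as "en-US". At the time of this
  // report it mirrored the UI language and was not set by the user.
  return typeof nav.language === "string" ? nav.language : "(not exposed)";
}

// Usage with a stand-in object instead of the real navigator.
console.log(revealedLanguage({ language: "en-US" }));
```

No user action or opt-in is involved: reading the property is enough.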
Comment 1 David Krause 2000-10-09 01:39:35 PDT
Isn't the language info also in the useragent string?  According to the browser
sniffer at http://www.ufaq.org/ it is.

User Agent: Mozilla/5.0 (X11; U; Linux 2.2.16-3 i686; en-US; m18) Gecko/20001006
Application Name: Netscape
Application Version: 5.0 (X11; en-US)

While this could be considered a privacy thing, it could be really neat if sites
tailored what language their content was according to your locale and language.
Comment 2 David Krause 2000-10-09 01:42:30 PDT
Hmm, on second thought maybe this isn't so good.  I recommend disabling it or
at least pref-disabling it.
Comment 3 Ben Bucksch (:BenB) 2000-10-09 04:53:34 PDT
David, you are right and everything is as you suggested it. See the pref UI pane
under Navigator - it configures the HTTP header to send.
This bug is about the JS property which is
- much less useful (do you want to send all available language versions and then
select on the client side?)
- not opt-in
- currently not (independently) changeable by the user, but identical to the UI
language
Comment 4 timeless 2000-10-10 07:13:41 PDT
The useragent should not include this information. But the point of accept is 
to say that the user WANTS the content in that language.  Some RFCs need 
comments like ~this is stupid and counterproductive~.

Verah I think you work on that privacy document, add a paragraph w/ link to 
that rfc:
According to <a>RFC</a> we do hereby warn you that asking for content in a 
language you prefer might divulge information about you (including <span 
style="-moz-type-timeless:shocking">the language you prefer to read</span>, 
which may imply <span style="-moz-type-timeless:shocking">your 
ethnicity</span>).

I presume Mozilla does send preferred language headers; if not, we need a bug for
that (mozilla1.0)
Comment 5 Ben Bucksch (:BenB) 2000-10-10 08:34:15 PDT
timeless, I don't understand your last comment. Also, it has nothing to do with
privacy *links* (maybe the *document*). Removing dependency.

Please note the difference between the HTTP header "Accept-Language" and the JS
property. The implementation of the former is OK in Mozilla (I think). This bug
is about the latter.

David Krause, thanks for noting the UA string (anyhow, I missed your comment).
Will investigate.
Comment 6 Mitchell Stoltz (not reading bugmail) 2000-10-18 12:13:10 PDT
Future.
Comment 7 Gervase Markham [:gerv] 2000-10-25 04:27:15 PDT
As there's nothing a user can do about this JS privacy leak, is it worth 
relnoting?

The blurb:
It seems unclear to me whether this bug requires either of a "developer" or 
"user" release note for Netscape 6 RTM. If anyone feels it does, can they please 
draft one and then nominate with the relnote-user or relnote-devel strings in 
the Status Whiteboard.

Thanks :-)

Gerv
Comment 8 Ben Bucksch (:BenB) 2000-10-25 04:33:13 PDT
> As there's nothing a user can do about this JS privacy leak, is it worth
> relnoting?

Sure, at least he has to know. He can do something: use another UA or use
English chrome.
Comment 9 Mitchell Stoltz (not reading bugmail) 2000-10-25 14:56:15 PDT
Hasn't Netscape always revealed the UI language somehow? If this behavior is
present in 4.x, then I don't see why we need a relnote. 
Comment 10 Ben Bucksch (:BenB) 2000-10-25 15:47:46 PDT
> Hasn't Netscape always revealed the UI language somehow?

Yes, I think so. I am not sure we need a relnote.
Comment 11 Ben Bucksch (:BenB) 2000-10-26 23:43:21 PDT
We *already* have something better than a relnote: Tasks|Privacy|Understanding
<chrome://communicator/locale/wallet/privacy.html>. removing relnoteRTM based on
that.
Comment 12 John Unruh 2000-11-14 10:19:05 PST
Mass changing QA to ckritzer.
Comment 13 danielmc 2001-05-22 06:49:04 PDT
We are past the UI freeze for Commercial Beta. We need a UI freeze to be able
to ship localised products simultaneously with the US. Would it be possible to
check in any changes to chrome://communicator/locale/wallet/privacy.html after we
have branched in the commercial tree on 6/29?
Comment 14 danielmc 2001-05-22 06:57:54 PDT
By the way here is a handy JS for revealing UA info...

<PRE>
<SCRIPT>
with (document) {
    writeln("navigator.userAgent is ", navigator.userAgent);
    writeln("navigator.appCodeName is ", navigator.appCodeName);
    writeln("navigator.appVersion is ", navigator.appVersion);
    writeln("navigator.appName is ", navigator.appName);
}
</SCRIPT>
</PRE>
Comment 15 Mitchell Stoltz (not reading bugmail) 2001-07-26 16:36:30 PDT
Target is now 0.9.5, Priority P1.
Comment 16 Ben Bucksch (:BenB) 2001-07-26 17:32:21 PDT
Removing this info from the UA string is trivial. I did this for Beonex
Communicator, I can attach a patch.
Comment 17 Mitchell Stoltz (not reading bugmail) 2001-07-27 10:35:30 PDT
Ben,
   By all means, attach your patch. I don't think we will make it the default,
but it would be nice to have as part of "high-privacy mode," which is something
I'm working on.
Comment 18 Ben Bucksch (:BenB) 2001-07-27 19:23:25 PDT
Mitch, why not make it the default? The HTTP spec explicitly recommends against
doing what we do atm. Sites won't break either, if we just always send "en-US".
(But possibly with an Accept-Language header, which is customized by the user.)
Comment 19 Ben Bucksch (:BenB) 2001-07-27 19:36:28 PDT
Created attachment 43864
Proposed Fix for UA-string. Also contains Win fix for bug 57555.
Comment 20 Ben Bucksch (:BenB) 2001-07-27 20:03:31 PDT
> (But possibly with an Accept-Language header, which is customized by the user.)

s/with/, we send/

Comment 21 Mitchell Stoltz (not reading bugmail) 2001-07-30 13:39:49 PDT
It's not my decision to make. I will ask around here and find out if changing
the default UA is OK. There may be some resistance to changing it at all,
especially if we've always provided the language information in the UA.
Comment 22 timeless 2001-07-30 17:32:59 PDT
this was discussed in a newsgroup that i read recently (well the discussion was
1-4 years old but...) cc.
Comment 23 Jeremy M. Dolan 2001-07-30 20:44:58 PDT
Ben, does your patch remove the region/language from navigator.appVersion as well?
Comment 24 Ben Bucksch (:BenB) 2001-07-30 23:59:35 PDT
Yes, it seems so. The test page now shows "en" (the hardcoded dummy value)
instead of "en-US". Looks like the Javascript function pulls its value out of
the UA-string, which is nice. So, looks like I fixed this bug. I'll install a
German langpack when I get a chance to be sure.

mstoltz, would you mind, if I took the bug? Who has to be asked about checking
this in, apart from dbaron? (There's no pref to turn this on or off, since I see
no value in the current behaviour - see comment in patch and my earlier comments
here for reasons.)

dbaron, what do you think?
Comment 25 Ben Bucksch (:BenB) 2001-07-31 00:18:53 PDT
Posted proposal to .netlib: <news://news.mozilla.org/3B665A87.8050807@beonex.com>.
Comment 26 Katsuhiko Momoi 2001-07-31 01:09:11 PDT
> It's not my decision to make. I will ask around here and find 
> out if changing the default UA is OK. There may be some 
> resistance to changing it at all, especially if we've always
> provided the language information in the UA.

Netscape commercial builds cannot remove the lang info from
UA string. They serve as important tracking tools. 

If Mozilla wants to make that an option, that is fine 
but that option should not be the default. 
If there is a proposed UI for it, commercial builds might
consider removing the UI.
Comment 27 Ben Bucksch (:BenB) 2001-07-31 01:30:42 PDT
> If Mozilla wants to make that an option, that is fine 
> but that option should not be the default.

Why? If it's a pref, Netscape can alter it trivially.
Comment 28 Katsuhiko Momoi 2001-07-31 01:44:56 PDT
>> If Mozilla wants to make that an option, that is fine 
>> but that option should not be the default.

> Why? If it's a pref, Netscape can alter it trivially.

That is true. As long as it is Netscape's default not 
to turn off lang info, that would be fine.

By the way I would like to raise this issue about
the privacy clause in HTTP 1.1. It is wrong-headed to
single out lang info as the only thing compromising
security. What about the fact that you're using Mozilla,
Gecko, or Netscape, Win NT5, etc.? Why, someone could
discriminate against Netscape users or IE users or
whatever. That is also a privacy issue if lang info
is a privacy issue. The fact that something is mentioned
in an RFC document does not mean we need to be implementing
everything that is in it.
In this day and age, the fact that someone might be using
an en-US build means virtually nothing other than the fact that
someone might be able to read English. We have users
all over the world using an en-US build.
The fact that someone is using a ja-JP build does not
mean that that person is Japanese. It simply means that someone
possibly reads Japanese but may be a Canadian, etc.
The whole argument about the lang info being a compromising
factor is moot in my opinion. Whoever wrote the HTTP 1.1 section
on lang info and security should examine issues more broadly
and fairly.

Our proud Mozilla localizers around the world would probably
like to see their L10n work reflected accurately in the
UA string.

Comment 29 Ben Bucksch (:BenB) 2001-07-31 02:02:22 PDT
> wrong to single out lang info as the only thing compromising security.

Right. There are other bugs about other issues, e.g. bug 57555.

> en-US build means virtually nothing

Right, but if I speak Hebrew, it does mean something.

You can argue about the severity of this bug. But I do think that it should be
fixed. I care less about the default in Mozilla, but I would prefer that Mozilla
followed the advice of the spec.

I will attach a new patch which makes it dependent on a pref
(browser.reveal-ui-lang or similar) when I have time.
Comment 30 timeless 2001-07-31 09:38:56 PDT
ben: you're german no? I know a bunch of people who contribute to mozilla.org 
who can read hebrew and I wonder if _any_ of them have this concern (I know 
it's really odd ..)
Comment 31 tao 2001-07-31 11:32:52 PDT
Folks:

Some websites sniff the U-A string to redirect users to appropriate pages for
downloading localized versions of their software/patch. When locale info is not
present, "en-US" is often used as the default.

Following spec is a good thing when it does not break existing websites. I agree 
that making it a preference and default to 'on' seems to be a good compromise.
Comment 32 Jeremy M. Dolan 2001-07-31 12:57:07 PDT
> Some websites sniff U-A string to redirect users to appropriate pages for 
> downloading localized version of their software/patch.

IE 5.0, 5.01, 5.5, and presumably earlier versions don't put language in UA.
Opera doesn't put language in UA. Konqueror doesn't put language in UA. This is
a wholly inappropriate and nonportable place for that information. That's the
whole purpose of the Accept-Language header, to specify what language you want
information in.

If you want to ignore the RFC and default Accept-Language to on, that's fine
enough by me (and is a separate bug anyway). But there's no purpose to reveal
the *UI* language, if not for privacy, for correctness. Accept-Language is the
language(s) the user wants to receive information in.
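
As a rough sketch of what deriving a single language from Accept-Language could look like: the function below takes the first entry of the header value and upper-cases a two-letter region subtag. It is a hypothetical illustration, not the code from the patches attached to this bug, and it assumes the list is already ordered by preference:

```javascript
// Hypothetical helper, not the code from the patches attached to this bug.
// Returns the first language tag of an Accept-Language value, with any
// two-letter subtag after the first upper-cased ("en-us,en;q=0.5" gives "en-US").
function firstAcceptLanguage(acceptLanguages) {
  // Entries are comma-separated; an optional ";q=..." weight follows the tag.
  const first = acceptLanguages.split(",")[0].split(";")[0].trim();
  if (first === "") return "";
  const [language, ...rest] = first.split("-");
  // Assumption: two-letter subtags after the primary one are region codes.
  const tail = rest.map((part) => (part.length === 2 ? part.toUpperCase() : part));
  return [language.toLowerCase(), ...tail].join("-");
}

console.log(firstAcceptLanguage("de-at, de;q=0.8, en;q=0.5")); // prints "de-AT"
```

A value obtained this way reflects the content language the user chose, which may differ from the UI language of the build.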
Comment 33 tao 2001-07-31 13:11:48 PDT
I don't think I've ever said that it is proper to use the U-A string for content
negotiation; as you pointed out, accept-lang in the HTTP header serves that
purpose. All I said is that there are indeed websites that misuse the U-A string...

Glad to hear a standards advocate, though :-)
Comment 34 Katsuhiko Momoi 2001-07-31 14:02:28 PDT
> But there's no purpose to reveal the *UI* language, if not 
> for privacy, for correctness. Accept-Language is the
> language(s) the user wants to receive information in.

I think you should read the definition of User-agent:

"14.43 User-Agent

   The User-Agent request-header field contains information about the
   user agent originating the request. This is for statistical purposes,
   the tracing of protocol violations, and automated recognition of user
   agents for the sake of tailoring responses to avoid particular user
   agent limitations. User agents SHOULD include this field with
   requests. The field can contain multiple product tokens (section 3.8)
   and comments identifying the agent and any subproducts which form a
   significant part of the user agent. By convention, the product tokens
   are listed in order of their significance for identifying the
   application.

       User-Agent     = "User-Agent" ":" 1*( product | comment )

   Example:

       User-Agent: CERN-LineMode/2.15 libwww/2"

Further, product tokens are defined as:

" Product tokens are used to allow communicating applications to
   identify themselves by software name and version. Most fields using
   product tokens also allow sub-products which form a significant part
   of the application to be listed, separated by white space. By
   convention, the products are listed in order of their significance
   for identifying the application. .... etc."

UI language differs if localization files are different.
It is clearly a significant part of the application. And though
this is not common, localization itself might reveal a bug that
was not caught before the product was shipped. This latter type
of case does actually happen. For tracking purposes, it is in my opinion
significant info.

I don't believe that we should use considerations raised
for Accept-Language for user-agent issues. I just want to point
out that there are arguments for both sides of this issue and also
that the HTTP 1.1 says nothing about not revealing the UI language in
the User-agent header. If MS or Opera wants not to include that
info, that is fine but let that not bind what we should do here.
Comment 35 Jeremy M. Dolan 2001-07-31 14:27:51 PDT
Keep in mind, that where the RFC says "automated recognition of user agents for
the sake of tailoring responses", I think this would more refer to protocol
tailoring, not content. For example, Apache's default config contains some magic
to disable Keep-Alive for some broken versions of IE that claim to support it.

HTML includes its own means of "avoid[ing] particular user agent limitations",
such as CSS, and other ways of making content still accessible to older browsers.

The only browser revealing language in U-A I know of is Netscape 4.*, which,
last I heard, had 8% market share. Any page basing content off this field isn't
a whole hell of a lot effective right now. If future versions of Mozilla and
Netscape 6 remove it from U-A, more web designers won't be tricked into
mistaking U-A for A-L (see also: Microsoft J++).

OK, I'll shut up now, sorry for all the spam, folks.
Comment 36 Daniel Veditz [:dveditz] 2001-07-31 15:43:24 PDT
The user agent language is used for distribution tracking, right? Personally if
I were creating a language pack I'd be gratified to have a clue how far it had
spread, especially for a minority/endangered language (Navaho? Hawaiian?
Gaelic?). If I were using such a language I'd probably want to signify my
presence (ethnic pride).

This kind of thing should obviously be a pref. (and now we can commence arguing
over the default setting in Mozilla.)  If the UA were easier to change (i.e. via
the pref UI rather than hacking prefs.js) this wouldn't be so much of an issue.

As the person mostly responsible for foisting "navigator.language" on people in
4.x I think we could safely nuke it. Eh, I guess we should return a string so we
don't break pages accidentally on an undefined property, maybe "" or "unknown".
Comment 37 tao 2001-07-31 15:53:42 PDT
Hi, Kat:

We should probably inform webmasters of whatever change we make in the final so
they can adapt accordingly.
Comment 38 Mitchell Stoltz (not reading bugmail) 2001-07-31 17:06:05 PDT
I agree that we should have a pref. I don't know what the default setting should
be, but I would lean towards leaving the language in there by default. We should
have a pref checkbox for "paranoid mode" that will turn off the language part of
the UA as well as other small privacy violations which are the norm.
Comment 39 Jacek Piskozub 2001-07-31 17:10:22 PDT
I believe that as most potential Mozilla users do not live in zones of war or
ethnic cleansing, the default setting should be the present behavior.
Comment 40 Ben Bucksch (:BenB) 2001-07-31 20:48:32 PDT
> as most potential Mozilla users do not live in zones of war or ethnic
> cleansing, the default setting should be the present behavior.

The problem is that users might not know that we spread this info. And it's not
worth a UI pref IMO.

Default in Mozilla: So far, I count 1+4 (non-Netscape+Netscape) votes for on,
4+0 for off/dummy.
Comment 41 Daniel Veditz [:dveditz] 2001-07-31 22:39:29 PDT
Any votes gathered here are going to be meaningless because mostly only people
who agree will find this bug. People who are happy with the way things are have
no clue others want to change things--though I will grant that most people
probably don't care one way or another.

The navigator.language issue should be dealt with separately. IMHO axe that,
leave the UA the way it is, and make it a hell of a lot easier for people to
spoof the UA (e.g. pref UI with radio-buttons for common options and then a text
box for custom text). Paranoid nuts who are worried about language in the UA
also don't like giving out OS info and nearly everything else in the UA -- we
shouldn't address UA privacy issues item by item.

So is this bug about navigator.language, or is it about the UA? If the latter it
should be invalid, in favor of some UA uber-bug
Comment 42 Daniel Glazman (:glazou) 2001-08-01 00:34:01 PDT
Just to show that I am the kind of person who cares about languages : I use a
browser with an english UI. I configured it so it accepts the following
languages in this order : French, English, Swedish, Spanish, Yiddish, Xhosa.
And the person who Cc:ed me (timeless ?)on this bug also knows that privacy is
on top of list of my concerns.

I think this bug is a total non-issue and a waste of time and neurons. We have
had language strings crossing the web in all directions since 1996 and nobody
ever complained about it.

My 0.02€ only...
Comment 43 Ben Bucksch (:BenB) 2001-08-01 01:21:14 PDT
> People who are happy with the way things are have no clue others want to
> change things

Right.

OK, you talked me into making it default to on for Mozilla.

> The navigator.language issue should be dealt with separately. IMHO axe that,

I care more about the UA-string, since we are really spreading this all over the
world and placing it in many server-logs. Implementation is coupled.

> and make it a hell of a lot easier for people to spoof the UA (e.g. pref UI
> with radio-buttons for common options and then a text box for custom text)

There is a bug about it, but it has its own problems, like side-effects
unintended by the user. I am not a fan of having UI for setting the UA-string
to arbitrary values.
> So is this bug about navigator.language, or is it about the UA? If the latter
> it should be invalid, in favor of some UA uber-bug

It's about both ("don't reveal" includes all ways). But it is INVALID in no
case, because it's a legal request. It is also disabled in Beonex Communicator
(by default), so I would appreciate it if I didn't have to carry around source
patches.

If I attach a patch to make it depending on a pref, default on, everybody is
happy, no?
Comment 44 Jeremy M. Dolan 2001-08-01 02:19:58 PDT
I was originally arguing to remove the language from U-A, but enough NSCPies
want to leave it for various reasons, so I say just leave it. I didn't notice
4.7 had been doing it all this time, so it's nothing urgent. If it becomes an
issue, we'll address it post 1.0. But for god sakes, don't make it a pref. And
certainly no UI. 

If anything, a generic U-A pref (also with no UI, or a plain textbox under
Debug... none of this multi pulldown nonsense) could be used to remove it, or, I
have a bug open on disabling U-A altogether. But separate prefs to tweak each
part of the U-A string would be nuts.
Comment 45 Daniel Veditz [:dveditz] 2001-08-01 09:59:35 PDT
Ben, implementation could be trivially uncoupled if we wanted to deal with
navigator.language separately.

>> is it about the UA? If the latter it should be invalid, in favor
>> of some UA uber-bug
>
> It's about both ("don't reveal" includes all ways). But it is INVALID in no
> case, because it's a legal request.
It's a valid concern, but wrong to consider in isolation from the other
user-agent privacy concerns. Do you really want a bunch of "Hide UA Language",
"Hide UA platform", "Hide UA OS", "Hide UA OS version", etc. prefs? The UI would
be ugly which means they'd be hidden prefs, and that means most people who might
benefit would have no clue they were there.
Comment 46 Mike Shaver (:shaver -- probably not reading bugmail closely) 2001-08-01 10:38:58 PDT
If IE 5.0 and 5.5 don't send the language in the UA string, how many sites can
we really be breaking?  I think the compatibility stance so far, in DOM and
other key areas, has been that if IE5 and NS4 do different things, we should ape
IE5 because it has so many more users.

(I'm sure that localization folks would love to know how many people are using
their work, and I have no problem with that desire, but I don't want to turn the
UA string into about:credits.)
The argument that we've been doing this since 1996 doesn't sway me: browsers had
privacy-hostile cookie and image handling for years too, but I'm pretty sure
nobody's going to stand up and say that fixing it doesn't matter.

I'd support pulling UI language out of the UA because
 - people should be using Accept-Language to tailor content

 - I can't believe that we're functionally breaking that many sites if IE
doesn't do this, and the browser-number-tracking stuff really doesn't sway me,
because, again, IE is the majority of the browser population, and you can't do
this kind of tracking on it

 - there's too much crap in the UA anyway
Comment 47 tao 2001-08-01 11:21:34 PDT
>If IE 5.0 and 5.5 don't send the language in the UA string, how many sites can
>we really be breaking?  

Some websites have logic like this:

  if (Netscape) {
    // assuming locale info presents
    do some locale specific things...
  }
  else if (MSIE) {
    do nothing...
  } else {
    do nothing..
  }

The problem is more that Netscape used to include locale info in the U-A and
some websites use it to do something for international users. Unless
webmasters are advised of any upcoming change, their websites are doomed to
break. I won't be surprised to see them start advising people that their websites
work better with other browsers.
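
A hedged sketch of what such sniffing code might look like if it tolerated a missing locale. The function name and the regular expression are hypothetical and not taken from any real site. The "en-US" fallback follows the default mentioned earlier in this thread:

```javascript
// Hypothetical sniffing helper; not taken from any real site.
// Looks for a locale token such as "en-US" among the parenthesised,
// semicolon-separated UA comments and falls back when none is found.
function localeFromUserAgent(userAgent, fallback = "en-US") {
  // Lower-case language, optional upper-case region, delimited by ";" or ")".
  const match = /[;(]\s*([a-z]{2,3}(?:-[A-Z]{2})?)\s*(?=[;)])/.exec(userAgent);
  return match ? match[1] : fallback;
}

// Netscape-style UA with a locale token.
console.log(localeFromUserAgent(
  "Mozilla/5.0 (X11; U; Linux 2.2.16-3 i686; ja-JP; m18) Gecko/20001006"));
// UA without a locale token: the fallback is used instead of breaking.
console.log(localeFromUserAgent("Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)"));
```

Code written like this keeps working when the token disappears, whereas code that assumes the token is always present does not.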


Comment 48 Mitchell Stoltz (not reading bugmail) 2001-08-01 14:33:33 PDT
I agree with dveditz. People who care whether or not their UI language is
revealed in the UA string probably also don't want their browser and OS versions
revealed. We already have a (hidden) pref for overriding the UA string; why
don't we just encourage people to use that?
Comment 49 Jacek Piskozub 2001-08-01 14:45:46 PDT
Mitchell: One of the reasons is bug 83376. It seems Sun Java uses the UA to
check if the browser is Netscape/Mozilla. It is either free choice of UA or
working Java :-(
Comment 50 timeless 2001-08-01 15:11:27 PDT
please IGNORE the sun jvm problem, that's a bug which someone working on oji or 
the sun jre will fix, it shouldn't ask us about our spoofable useragent.  All 
things considered a simple pref could probably be exposed:

[x] Include system and locale information in useragent. checked by default.

unchecking it would strip out all information except very basic stuff.
Comment 51 Jacek Piskozub 2001-08-01 15:30:05 PDT
Timeless: I'm actually for ignoring the Java bug. The thing that bothers me is
the reply from Sun edburns@acm.org posted on 7/17 as a comment to bug 83376:

> Java Plug-in depends on user-agent string for version information, no fix
> will be made.
>
> zhengyu.gu@sun.com

I believe this needs applying some pressure on Sun.
Comment 52 Tim Powell 2001-09-07 15:17:53 PDT
Although IE5 and IE5.5 do not include the language in the user agent string, they do
expose it in JavaScript through navigator.userLanguage and
navigator.systemLanguage. I believe navigator.language should continue to report
useful and accurate information in Mozilla. Perhaps the language returned can be
tailored to the accept language stuff under edit->prefs->navigator->language.
Perhaps report the preferred language. It seems that this is the only way the DOM
can configure to the language, which could be useful if not serving special
pages for each language.

I don't see any reason to remove this from UA by default, especially since we
allow changing the UA completely. I've always thought that this was one of the
nicer features of N4.
Comment 53 bobj 2001-09-07 17:15:13 PDT
> Perhaps the language returned can be
> tailored to the accept language stuff under edit->prefs->navigator->language.
> Perhaps report the preferred language.
These are different.  The string in the u-a indicates the browser localization
(i.e. the browser UI and some default settings).  The pref indicates the
user's preferred language(s) for the content.  E.g., a user may run a Japanese
browser, but prefer content in Arabic. Since the browser is enabled for many
more languages than it is currently localized for, this is not unusual.
Comment 54 Jaime Rodriguez, Jr. 2001-09-08 09:00:36 PDT
Removing ME ---> barrowma (acting browser PM)
Comment 55 Mitchell Stoltz (not reading bugmail) 2001-10-04 15:12:38 PDT
time marches on...retargeting to 0.9.6
Comment 56 Mitchell Stoltz (not reading bugmail) 2001-10-31 15:07:42 PST
Moving to Moz1.0 as part of "paranoid mode" feature set
Comment 57 Asa Dotzler [:asa] 2001-12-03 11:16:36 PST
Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1 
(you can query for this string to delete spam or retrieve the list of bugs I've 
moved)
Comment 58 Mitchell Stoltz (not reading bugmail) 2002-08-02 18:19:01 PDT
Futuring.
Comment 59 Tony Mechelynck [:tonymec] 2008-03-30 11:26:50 PDT
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9pre) Gecko/2008033001 SeaMonkey/2.0a1pre

Workaround: Set the pref general.useragent.locale (in about:config) to the empty string (or even to any language you want to spoof as using). If set to the empty string, the semicolon and one surrounding space are removed too.

According to http://kb.mozillazine.org/General.useragent.locale , that pref was created on 2000-02-07.
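
To make the described effect concrete, here is a small sketch of the resulting string transformation. `stripLocaleFromUserAgent` is a hypothetical helper for illustration only. The browser builds its UA string internally and this is not its code:

```javascript
// Hypothetical illustration of the effect of an empty locale pref.
function stripLocaleFromUserAgent(userAgent, locale) {
  // Remove the locale together with the semicolon and the space that
  // precede it, so no empty "; ;" slot is left behind.
  return userAgent.replace("; " + locale, "");
}

const before =
  "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9pre) Gecko/2008033001 SeaMonkey/2.0a1pre";
console.log(stripLocaleFromUserAgent(before, "en-US"));
```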
Comment 60 Ben Bucksch (:BenB) 2008-03-30 18:07:13 PDT
Per HTTP spec, "message which makes the user aware of the loss of privacy involved", i.e. not only user-configurable, but even an alert.
(This applies all the more if this is set by default based on UI or OS language, like Firefox does.)
Comment 61 Daniel Veditz [:dveditz] 2008-11-11 23:33:45 PST
*** Bug 464414 has been marked as a duplicate of this bug. ***
Comment 62 XtC4UaLL [:xtc4uall] 2009-10-31 07:55:12 PDT
*** Bug 525637 has been marked as a duplicate of this bug. ***
Comment 63 messi 2009-10-31 08:23:16 PDT
While I believe that my request (bug 525637) is a superset of this very old bug and not a duplicate I'd like to reactivate the topic here. Though it seems Mozilla is avoiding the topic.

Also, setting the locale to blank is not a workaround because it will make you stand out even more. Please remove "has workaround" keyword.
Comment 64 Matthias Versen [:Matti] 2010-01-30 07:50:00 PST
*** Bug 543202 has been marked as a duplicate of this bug. ***
Comment 65 Henri Sivonen (:hsivonen) 2010-04-15 07:15:44 PDT
Bug 543202 is also a superset and not an exact duplicate. For example, the Gecko build date is absolutely useless for any non-b.m.o sniffing but it makes UA strings have more fingerprintable parts.
Comment 66 Henri Sivonen (:hsivonen) 2010-05-20 01:51:40 PDT
Note that CSS now exposes the directionality of the UI language. The HTML5 parser (via <isindex> prompt) exposes the UI language but not necessarily which regional variant when the string does not happen to vary by region.
Comment 67 Ben Bucksch (:BenB) 2010-05-20 03:02:08 PDT
Both should instead use the *content* language (Prefs|Content|Language|Choose..., which in a default Firefox install is the same as the UI language). The content language is fine to expose to the website, that's what it's for, and it's changeable independently. It's also more correct to use that, because the site should adjust based on that, and it would have wrong effects to have an English-UI Firefox nightly set to Arabic report left-to-right and "en" in some places and Arabic in Accept-Language.
Comment 68 Gervase Markham [:gerv] 2010-05-24 02:24:18 PDT
Are there _any_ known cases of a website using Accept-Language to infer someone's ethnicity and taking action against them, either electronic or physical?

The only way to avoid that possibility would be to remove all language-identifying features from what the browser sends - Accept, JS, everything. However, these features are used today to provide serious and measurable benefits to web users, 99.999% of whom don't care if a site knows what languages they speak.

Gerv
Comment 69 Daniel Glazman (:glazou) 2010-05-24 02:58:44 PDT
(In reply to comment #68)

> The only way to avoid that possibility would be to remove all
> language-identifying features from what the browser sends - Accept, JS,
> everything. However, these features are used today to provide serious and
> measurable benefits to web users, 99.999% of whom don't care if a site knows
> what languages they speak.

I don't understand some of the comments above Gerv's... OK, the spec says something, but pragmatism sometimes helps if the spec is counter-productive.

There are many web sites out there that are tailored to serve me better, and that includes serving me in my own language if and when they support it. This is a hugely positive factor for all users around the world. Geolocating the IP address is not enough, and many web sites have elaborate detection based on everything that's available today, including the user-agent string. Change that and you'll break their behaviour, and I won't call that "for the greatest benefit of all".

I am myself strongly in favor of a wontfix for this bug unless the solution implemented is a "never reveal my language" pref in an advanced preference panel with default "on". And I'm not sure it's worth the bloat, honestly.
Comment 70 Ben Bucksch (:BenB) 2010-05-24 04:57:00 PDT
Gerv, I am not opposed to Accept-Language at all, which is user-configurable and defaulted to the UI language. I think I said as much in my last message.

I am opposed to the browser sending the *UI* language, where it's different from Accept-Language. Wherever we send the language, locale or country to the site, e.g. UserAgent string, it should be the user-configurable value from Prefs|Content|Language|Choose... (which is already used for Accept-Language), not the UI language value.

There is no loss for the user here. On the contrary, if anything it's going to work better.
Comment 71 Ben Bucksch (:BenB) 2010-05-24 05:00:09 PDT
Please note that the summary of this bug specifically says "UI language", i.e. browser locale/ package, not "content language" = Accept-Language = "Prefs|Content|Language|Choose...".
Comment 72 Henri Sivonen (:hsivonen) 2010-05-25 05:50:05 PDT
FWIW, I'm not really concerned about using language per se for nefarious purposes. I'm more concerned about UI language being yet another piece of configuration entropy that can be used for fingerprinting. See https://panopticlick.eff.org/
Comment 73 Ben Bucksch (:BenB) 2010-05-25 06:44:08 PDT
I'll update patch here as part of bug 57555, once I get to it.
Comment 74 Gervase Markham [:gerv] 2010-05-26 02:46:04 PDT
Ben: so you are not arguing for this switch from a privacy point of view, but a functionality one? If so, that does make sense to me.

Gerv
Comment 75 Ben Bucksch (:BenB) 2010-05-26 03:07:46 PDT
I argue from both perspectives. If the user is in control, there is no privacy issue, or at least no critical one. Also, reducing the number of permutations is good for functionality (consistent) and privacy (fingerprinting), in this case.
Comment 76 Johnny Stenback (:jst, jst@mozilla.com) 2010-05-26 15:44:34 PDT
Not holding the 1.9.3 release for this bug.
Comment 77 gionnico 2010-06-24 18:34:52 PDT
I don't understand why all locales but the English one should have something like

it-it
it;q=0.8
en-us;q=0.5
en;q=0.3

If I download Italian, why is English there? And why two strings with different priorities (it-it and it)?

This, by the way, gives more information for fingerprinting than a plain "it" (this is in general more common than the double it-it) set as default for every Italian build: official or unofficial and for every platform.
Comment 78 Ben Bucksch (:BenB) 2010-06-24 18:38:31 PDT
gionnico, you're in the wrong bug. You talk about the Accept-Language header, which is not the subject of this bug.
Comment 79 Ben Bucksch (:BenB) 2010-07-25 15:00:03 PDT
Created attachment 460173 [details] [diff] [review]
Patch 3: Change navigator.language to use Accept-Language

After bug 572656 has fixed the UA string by removing the language part, this also fixes navigator.language. The property is retained (JS has no access to the Accept-Language header, to my knowledge), and retains the formal format, but uses the value from Accept-Language (which the user can freely configure in the pref window) instead of the UI language.

Asking biesi to review.
Comment 81 Ben Bucksch (:BenB) 2010-07-25 15:15:57 PDT
Comment on attachment 460173 [details] [diff] [review]
Patch 3: Change navigator.language to use Accept-Language

Actually, bz reviewed bug 572656
Comment 82 Axel Hecht [:Pike] 2010-07-28 06:43:27 PDT
Comment on attachment 460173 [details] [diff] [review]
Patch 3: Change navigator.language to use Accept-Language

Would this break on 

en-gb;q=0.8, en;q=0.7

Didn't dig into whitespace handling.

Not that I'm in sync with the rationale of these bugs, for the record.
Comment 83 Boris Zbarsky [:bz] 2010-07-28 10:20:04 PDT
Comment on attachment 460173 [details] [diff] [review]
Patch 3: Change navigator.language to use Accept-Language

Yeah, Axel's right.  This needs to handle q values.  And things like spaces around the ',' chars, etc.  Probably best to just use nsCharSeparatedTokenizer and then deal with the q value thing.
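A minimal JavaScript sketch of the handling asked for here (the actual patch is C++ and uses nsCharSeparatedTokenizer): split on ",", drop any ";q=..." parameter, and trim whitespace. The name `firstAcceptLanguage` is hypothetical and not part of any Mozilla API.

```javascript
// Hypothetical helper: return the first language tag of an
// Accept-Language-style list, ignoring q values and whitespace.
function firstAcceptLanguage(list) {
  for (const entry of list.split(",")) {
    // Drop parameters such as ";q=0.8", then surrounding spaces.
    const tag = entry.split(";")[0].trim();
    if (tag !== "") {
      return tag;
    }
  }
  return ""; // nothing usable in the list
}

console.log(firstAcceptLanguage("en-gb;q=0.8, en;q=0.7")); // "en-gb"
console.log(firstAcceptLanguage(" de-de , en-us,en"));     // "de-de"
```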
Comment 84 Ben Bucksch (:BenB) 2010-07-28 16:31:37 PDT
> This needs to handle q values.

You mean I need to look for ";" in addition to ","? Yes, sure, sorry for the oversight. Will attach new patch.

(I do not care to support *manually* hacked prefs that have a lesser preferred language as the first entry, if that's what you meant.)

> nsCharSeparatedTokenizer

Will take a look, how big the code will be with that and with FindInReadable().
Comment 85 Ben Bucksch (:BenB) 2010-07-31 05:11:33 PDT
Actually, the pref "intl.accept_languages" does not contain ;q= . The UI doesn't write q= in there, and if a user does, it's ignored and not sent in the HTTP header; instead, the HTTP code re-calculates the q= values.

Also, while nsCharSeparatedTokenizer is useful here, it seems to skip the first token. Either I don't know how to use its API, or the implementation is broken.
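To illustrate the recalculation mentioned above, the following sketch derives q values purely from list position. The formula (q = 1 - i/n, rounded to one decimal, with a floor of 0.1) is an assumption chosen because it reproduces the values quoted in comment 77; it is not copied from the Gecko source, and `buildAcceptLanguage` is a hypothetical name.

```javascript
// Hypothetical sketch: build an Accept-Language header value from a
// comma-separated pref, ignoring any q values the user typed in.
function buildAcceptLanguage(pref) {
  const langs = pref
    .split(",")
    .map((entry) => entry.split(";")[0].trim()) // drop user-supplied ";q=..."
    .filter((lang) => lang !== "");
  const n = langs.length;
  return langs
    .map((lang, i) => {
      if (i === 0) {
        return lang; // first entry carries the implicit q=1
      }
      // Assumed scheme: evenly spaced weights, one decimal, never 0.
      const q = Math.max(0.1, Math.round((1 - i / n) * 10) / 10);
      return lang + ";q=" + q;
    })
    .join(",");
}

console.log(buildAcceptLanguage("it-it,it,en-us,en"));
// "it-it,it;q=0.8,en-us;q=0.5,en;q=0.3"
```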
Comment 86 Ben Bucksch (:BenB) 2010-07-31 05:13:39 PDT
> nsCharSeparatedTokenizer ... seems to skip the first token

Nevermind, I was stupid.
Comment 87 Ben Bucksch (:BenB) 2010-07-31 05:16:55 PDT
Created attachment 461774 [details] [diff] [review]
Patch 4: Change navigator.language to use Accept-Language
Comment 88 Reşat SABIQ (Reshat) 2010-07-31 16:53:01 PDT
I feel kinda bad for apparently slowing the approval of the patch down a bit, but I'd like to draw attention to a few things:
1. 3-char lang codes (need to be handled).
Based on url mentioned in another bug, there are 2 existing examples of them: http://mxr.mozilla.org/l10n-mozilla1.9.2/search?string=intl.accept_languages&find=global/intl.properties
(The same could be asked for 3-char country codes, but there are no examples of those, and they might not be probable (haven't looked into it).)
2. Default for messy accept-lang pref: should this be en-US rather than en for consistency w/ default en-US locale? I tend to think "yes".
3. Update urls: %LOCALE% in chrome prefs is resolved by navigator.language as of now, AFAIK. I'm 99.9% sure that the current UI language shouldn't be changed by an update based on preferred content language. If that's the case, then there needs to be an additional patch (in this bug or another), that accounts for necessary additional logic that is now required to substitute %LOCALE% in chrome app.update.url pref w/ UI lang, rather than navigator.language value.
(Plus, I'm not 100% sure that an update in language B for an installation whose UI is in language A works smoothly every time: i'm not saying it doesn't either, but it would probably be something worth verifying if the updates were going to be based on the current preferred language.)
4. empty pref special case (messed up, or user wishes to not specify accept-language in HTTP): 
4.1. should this result in navigator.language being empty, for consistency with HTTP Accept-Language? I tend to think "yes".
4.2. i'm also throwing this note in based on Java StringTokenizer's throwing an exception if nextElement() is called w/o hasMoreTokens() having returned true prior to that. Quick look at the source appears to suggest that nsCharSeparatedTokenizer works differently and (at least currently) returns an empty string instead. Forgive me for not having time to verify that, but i hope those who are already building FX 4 can take a quick look to make sure current empty pref handling wouldn't crash the app, for instance.

Accounting for all of this should be easy, except maybe item 3., which necessitates a patch for another class.

P.S. For the record, I can't wait for this fix.
Comment 89 Reşat SABIQ (Reshat) 2010-07-31 17:20:38 PDT
Just noticed this as well:
5. Currently (FX <=3.6.x), country code in navigator.language is upper-case, whereas accept-language pref's country code is lower-case. 
Should navigator.language value be:
5.a. backwards compatible
(this doesn't seem feasible, because some locales have country code in accept-lang, and no country code in navigator.language; either that's something sites will need to adjust to, or there'd need to be a map of first-accept-lang=ui-lang pairs based on FX 3.6.x used for 100% backwards-compatibility)
or
5.b. upper-case
or
5.c. lower-case

IMHO, ideally 5.a., but 5.b. would be easier to implement and more consistent, at the expense of additional country code for some locales and/or case change for some locales (sorry, i haven't analyzed other locales for letter-case). If sites can adjust to UA format change for ALL locales, they can also adjust to navigator.language values changing slightly for SOME locales (but it might warrant a list of before-after values for affected locales).
Comment 90 Reşat SABIQ (Reshat) 2010-07-31 17:45:17 PDT
(In reply to comment #89)
> 5.b. upper-case
> or
> 5.c. lower-case
Clarification, I meant:
5.b. using upper-case country code
or
5.c. using lower-case country code
Comment 91 [not reading bugmail] 2010-07-31 17:56:11 PDT
That can be addressed in another bug if it needs to be changed.
Comment 92 Reşat SABIQ (Reshat) 2010-07-31 18:03:51 PDT
(In reply to comment #88)
> 3. Update urls: %LOCALE% in chrome prefs is resolved by navigator.language as
> of now, AFAIK. I'm 99.9% sure that the current UI language shouldn't be changed
> by an update based on preferred content language. If that's the case, then
> there needs to be an additional patch (in this bug or another), that accounts
> for necessary additional logic that is now required to substitute %LOCALE% in
> chrome app.update.url pref w/ UI lang, rather than navigator.language value.

Please ignore item 3.: a couple of my memory wires got crossed, and in fact no changes need to be made to app.update.url handling, because %LOCALE% there appears to be replaced based on the locale in update.locale file in app directory, and NOT based on navigator.language value. Unlike elsewhere, I acknowledge having wastefully posted 10 lines in c88 for item 3, and 6 lines here to clear that up.

Also, to clarify, an obvious example for item 5 that i had in mind but didn't mention is en-US vs. en-us.
Comment 93 Ben Bucksch (:BenB) 2010-07-31 22:36:59 PDT
Thanks for the comments, Reshat.
1. I was assuming only 2-char ISO lang codes were valid, but the HTTP spec explicitly gives examples with other codes, so I'll be more lax. I'll still check that the user didn't use the wrong Windows syntax of en_GB instead of the correct Internet syntax en-GB.

3. If we broke %LOCALE% in the updater, that'd be really bad. I'll double-check that this is not the case, as well as whether there are other stupid uses of navigator.language.

5. Yes, I was already thinking of casing, thanks for pointing out that the current navigator.language is en-US. I'll maybe just fix the casing, but only in cases of a 5-letter code (lowercase 2-letter, dash, uppercase 2-letter).
Whether this is important also depends on other browsers. If they use different casing, likely sites will be tolerant (using .toLowerCase()).

Specs:
<http://asg.web.cmu.edu/rfc/rfc2616.html#sec-14.4>
<https://developer.mozilla.org/en/Navigator.language>
Comment 94 Ben Bucksch (:BenB) 2010-07-31 22:37:54 PDT
4. The fallback could be either "en" or "". I chose the former, but it's trivial to do the latter. Up to reviewer. *Informed* opinions welcome.
Comment 95 Ben Bucksch (:BenB) 2010-07-31 22:53:33 PDT
Responding to self, I think it would indeed be better to use "" as fallback, esp. in light of "i-cherokee" as first accept-lang, and of sites in other countries which want to use the local language as fallback.

When using "" as fallback, we leave the fallback to the site. When using "en", the site cannot differentiate whether it's a fallback or the user really meant English. So, I'll use "" as fallback.
Comment 96 Axel Hecht [:Pike] 2010-08-01 06:41:30 PDT
There's a good chance that we'll have script tags in language codes at some point, and maybe even x- stuff.
Comment 97 Reşat SABIQ (Reshat) 2010-08-01 11:27:02 PDT
(In reply to comment #88)
> 2. Default for messy accept-lang pref: should this be en-US rather than en for
> consistency w/ default en-US locale? I tend to think "yes".
...
> 4. empty pref special case (messed up, or user wishes to not specify
> accept-language in HTTP): 
> 4.1. should this result in navigator.language being empty, for consistency with
> HTTP Accept-Language? I tend to think "yes".

IMHO, "" navigator.language for "" accept-language is good.
With regards to "" or afore-mentioned "en-US" navigator.language for messy/unrecognized accept-language, i'm not so sure, because accept-language header is not "" or "en-US" in this case as of now. I tend to think navigator.language should at least match the language in accept-language (allowing the (remote) possibility of not having a country code as it is the case in 3.6.x). So for "messed-up" intl.accept_languages, i'd either:
2.1. provide "messed-UP" as navigator.language, and keep HTTP accept-language as is (i.e., "messed-up")
or
2.2. provide "" for both navigator.language and HTTP accept-language
2.3. provide "" for both navigator.language, and log a bug to do the same for HTTP accept-language

Taking into account Axel's comment as well, 2.1 might be the best way to go in the context of this bug. If 2.2, or 2.3 were found desirable, IMHO we need a separate bug for that (especially, because exceptional manually set values are the subject of discussion here).
Comment 98 Ben Bucksch (:BenB) 2010-08-02 16:24:40 PDT
Created attachment 462244 [details] [diff] [review]
Patch 5: Change navigator.language to use Accept-Language

- removed the check, so now also allowing i-cherokee
- return "", if the accept-lang pref is empty (or otherwise invalid)
- replace _ with - (only first one for now)
- return uppercase for en-US
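The four points above can be sketched in JavaScript roughly as follows. This is an illustrative reconstruction from the list, not the C++ in the attachment; `languageFromPref` is a hypothetical name, only the empty case is treated as invalid, and the casing step handles only the plain two-letter-dash-two-letter form described in comment 93.

```javascript
// Hypothetical sketch of deriving navigator.language from the
// intl.accept_languages pref value.
function languageFromPref(acceptLanguages) {
  // First entry of the comma-separated pref, without surrounding spaces.
  let lang = acceptLanguages.split(",")[0].trim();
  if (lang === "") {
    return ""; // empty pref: leave the fallback to the site
  }
  // A string pattern replaces only the first "_", e.g. en_gb -> en-gb.
  lang = lang.replace("_", "-");
  // Uppercase the country part of a plain "xx-yy" code.
  if (/^[A-Za-z]{2}-[A-Za-z]{2}$/.test(lang)) {
    lang = lang.slice(0, 3) + lang.slice(3).toUpperCase();
  }
  return lang;
}

console.log(languageFromPref("en-us,en"));   // "en-US"
console.log(languageFromPref("i-cherokee")); // "i-cherokee"
console.log(languageFromPref(""));           // ""
```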
Comment 99 Ben Bucksch (:BenB) 2010-08-02 16:31:31 PDT
Searching for "navigator.language" in source returns (only):
./extensions/reporter/resources/content/reporter/reportWizard.js:
  const gParamLanguage = window.navigator.language;
./testing/extensions/community/chrome/content/litmus.js:
  this.locale = navigator.language;
./testing/sisyphus/tests/mozilla.org/download-page/userhook.js:
  data['005 navigator.language']      = navigator.language;
I should fix these, too, but I will probably use bug 580032 as tracker for that.
Comment 100 Boris Zbarsky [:bz] 2010-08-02 22:07:50 PDT
Hmm.  Why do we want to do the bits about converting '_' to '-' (how would the '_' get there?) and uppercasing stuff?
Comment 101 Reşat SABIQ (Reshat) 2010-08-02 23:33:13 PDT
IMHO, these points are worth another look:
4.    IMHO, no assert is needed for empty accept-language, since RFC says:
"If no Accept-Language header is present in the request, the server
    SHOULD assume that all languages are equally acceptable."
Only affects debug builds, but still...
5. country code upper-casing:
The following cases should be handled as well:
abc-XY
abc-XY-dialect
https://wiki.mozilla.org/L10n:Locale_Codes
https://wiki.mozilla.org/L10n:Teams
One way of pseudocoding would be: if first '-' (at index 2 or 3), if any, is followed by no more than 2 alpha chars in a row, uppercase those 2 chars.
That said, the whole item goes away if we are going to make country code in navigator.language lower-case, though that wouldn't be backwards-compatible.
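The pseudocode rule in item 5 could be written as a single regular expression. The sketch below is one reading of that rule ("no more than 2 alpha chars" is taken to mean exactly two letters not followed by another letter); `uppercaseCountry` is a hypothetical name.

```javascript
// Hypothetical sketch: uppercase a 2-letter country code that directly
// follows a 2- or 3-letter language code; leave everything else alone.
function uppercaseCountry(tag) {
  return tag.replace(
    /^([a-z]{2,3})-([a-z]{2})(?![a-z])/i,
    (match, language, country) => language + "-" + country.toUpperCase()
  );
}

console.log(uppercaseCountry("abc-xy"));         // "abc-XY"
console.log(uppercaseCountry("abc-xy-dialect")); // "abc-XY-dialect"
console.log(uppercaseCountry("zh-hant"));        // "zh-hant" (4 letters, untouched)
```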
Comment 102 Axel Hecht [:Pike] 2010-08-02 23:41:23 PDT
Please, don't use our wiki as a standard. The real document is http://tools.ietf.org/html/bcp47, which refers to http://tools.ietf.org/html/rfc4647 for the matching stuff.

Here, http://tools.ietf.org/html/bcp47#section-2.1.1 rules:

 At all times, language tags and their subtags, including private use
   and extensions, are to be treated as case insensitive: there exist
   conventions for the capitalization of some of the subtags, but these
   MUST NOT be taken to carry meaning.
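For a site that reads navigator.language, the practical consequence of the BCP 47 rule quoted above is to normalize case before comparing. A minimal sketch, with `sameLanguageTag` as a hypothetical helper name:

```javascript
// Hypothetical helper: compare two language tags without regard to case.
function sameLanguageTag(a, b) {
  // toLowerCase() is locale-independent, unlike toLocaleLowerCase().
  return a.toLowerCase() === b.toLowerCase();
}

console.log(sameLanguageTag("en-US", "en-us")); // true
console.log(sameLanguageTag("de-DE", "de"));    // false
```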
Comment 103 Reşat SABIQ (Reshat) 2010-08-02 23:50:41 PDT
Thanks, Axel. Well, i assumed we wanted consistency. If we are fine w/ en-US on one hand, and ast-es on the other, that's a different matter of course. If that is considered and decided to be OK, then that's the way it's gonna be. Still worth drawing attention to it, IMHO.
Comment 104 Ben Bucksch (:BenB) 2010-08-03 03:31:49 PDT
> Hmm.  Why do we want to do the bits about converting '_' to '-'
> (how would the '_' get there?) and uppercasing stuff?

1. "-" vs. "_". The RFCs and ISO say that the separator is "-", e.g. "en-US". However, POSIX uses LANG="de_DE" in env vars, and Windows also uses the "en_US" notation. The latter is therefore common, and I've seen many people use "en_US" although the spec/protocol explicitly said "en-US", so it's a common error. How would it get there? By somebody editing the pref manually in <about:config>. The normal pref dialog does not allow to specify arbitrary tags. To avoid confusion and problems on the site's end, I want to prevent this, even if it's unlikely. I tried to make sure it's not a perf problem (just 2 int comparisons) nor a lot of code (just 2 lines).

2. uppercase: our pref contains e.g. "de-de,en-us,en", i.e. lower case. However, our locale codes are "de-DE", i.e. country part in upper case. BCP47 (mentioned by Axel above) also uses this notation in the examples. It's the convention. We used to return that as well, e.g. "en-US". If a site doesn't use navigator.language.toLowerCase(), the comparison will fail if we return the lower-case form. If our own code (comment 99) is any indication, this error is common, so if we don't adjust the casing to the convention, we may break stuff.

Now, my parsing here is primitive. It works fine for the 2-letter-dash-2-letter codes that we use for locales, and IIRC that's all that the pref dialog allows currently, so it should be sufficient for now. Only "failure" is indeed "ast-es" and similar codes with 3 letters as first part. BCP47 also gives examples: zh-Hant, de-Latn-DE, de-DE-x-goethe. For these, my code would return zh-HANT, de-LATN-DE, de-DE-X-GOETHE, which is not the conventional casing. If you think I should improve this, I would use nsCharSeparatedTokenizer here as well, and then uppercase every 2-letter part, apart from the first part.
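The difference between the two casing approaches can be sketched in JavaScript. `primitiveCasing` is a guess at the simple behaviour that is consistent with the outputs listed above (uppercase everything after a dash at index 2); `perPartCasing` follows the proposed refinement. Both names are hypothetical, input is assumed to be lower case as stored in the pref, and neither function title-cases script subtags such as "Hant".

```javascript
// Assumed reading of the primitive approach: 2-letter first part,
// then a dash, then uppercase the whole remainder.
function primitiveCasing(tag) {
  if (tag.length > 3 && tag[2] === "-") {
    return tag.slice(0, 3) + tag.slice(3).toUpperCase();
  }
  return tag;
}

// Proposed refinement: uppercase every 2-letter part except the first.
function perPartCasing(tag) {
  return tag
    .split("-")
    .map((part, i) => (i > 0 && part.length === 2 ? part.toUpperCase() : part))
    .join("-");
}

for (const tag of ["de-de", "ast-es", "zh-hant", "de-latn-de", "de-de-x-goethe"]) {
  console.log(tag, "->", primitiveCasing(tag), "vs", perPartCasing(tag));
}
```

With the per-part version, "ast-es" becomes "ast-ES" and "de-de-x-goethe" becomes "de-DE-x-goethe", while "zh-hant" is left unchanged.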
Comment 105 Boris Zbarsky [:bz] 2010-08-03 09:14:44 PDT
I guess my question is how far we're willing to go in terms of trying to canonicalize random input.  I think the only two sources of this pref are:

1)  What our prefs dialog generates.
2)  What our localizers set up as the default for their locale.
3)  about:config.

I claim we don't care about #3, can impose any reasonable rules we want on #2, and fully control #1.  So if it simplifies our code, we should just assume whatever we want and impose corresponding rules on #2.  Axel, thoughts?
Comment 106 Axel Hecht [:Pike] 2010-08-03 09:47:50 PDT
rightly so
Comment 107 Boris Zbarsky [:bz] 2010-08-03 11:15:17 PDT
s/two sources/sources/, clearly.  ;)
Comment 108 Reşat SABIQ (Reshat) 2010-08-03 18:50:27 PDT
Concise:
IMHO, Ben's suggestion of uppercasing the first 2-letter part, after the first part, if any, might be ideal (though a bit harder to implement), but restricting such uppercasing to just the assumption that this 2-letter part is the second subtag (following 2- or 3-char first subtag), as i understand Boris and Axel appear to be inclined to do, will probably be sufficient for a long time (and might save x ms on each access?). If these are the 2 operational choices to proceed with, then choosing between them is almost a coin-toss situation IMHO.

Verbose additional info:
One could also say that we have 1 model, which is the intl.accept_languages chrome pref, whose value is presented by several views (in MVC pattern lingo).

Paraphrasing Boris, the input can come from:
i. pre-shipped intl.accept_languages pref values based on intl.properties (1) and 2))
ii. random about:config entries by the user (3))

FYI, if i enter "messed1-up,messed2-up" as random input, both prefs dialog and about:config reflect the same value (prefs dialog just shows them as codes in [] without displaying a recognized lang name in front of each value, although some pre-shipped values also don't have recognized lang names).
Comment 109 Ben Bucksch (:BenB) 2010-08-03 19:52:14 PDT
Created attachment 462638 [details] [diff] [review]
Patch 6: Change navigator.language to use Accept-Language, using tokerizer for uppercasing

Here's another patch with better uppercasing code. It uses the tokenizer for this as well, and uppercases all 2-letter parts, apart from the first one.
Comment 110 Ben Bucksch (:BenB) 2010-08-03 19:57:18 PDT
> I claim we don't care about [<about:config>], can impose any reasonable rules
> we want on [our prefs UI], and fully control [the defaults of the
> localized builds].  So if it simplifies our code, we should just assume
> whatever we want and impose corresponding rules on [the prefs UI].

OK, great, works for me.

Only catch is: the intl.accept_languages pref may have existing values. So, even if we change the prefs UI, the old values are still there. And unfortunately, they are in "de-de" notation. So, unless you want to migrate, we have to work with that.

I attached another patch with better lang part parsing. Make your pick. I'd take patch 6.
Comment 111 Boris Zbarsky [:bz] 2010-08-03 20:49:30 PDT
Comment on attachment 462638 [details] [diff] [review]
Patch 6: Change navigator.language to use Accept-Language, using tokerizer for uppercasing

I still think the uppercasing is silly, but r=me if you s/PRBool/bool/ for that thing you assign bools into.
Comment 112 Ben Bucksch (:BenB) 2010-08-03 20:56:27 PDT
Comment on attachment 462638 [details] [diff] [review]
Patch 6: Change navigator.language to use Accept-Language, using tokerizer for uppercasing

what happens with sr?
Comment 113 Ben Bucksch (:BenB) 2010-08-03 21:03:01 PDT
Created attachment 462663 [details] [diff] [review]
Patch 7: Change navigator.language to use Accept-Language

Playing Boules with bool and PRBool.
Comment 114 Boris Zbarsky [:bz] 2010-08-03 21:51:22 PDT
Comment on attachment 462663 [details] [diff] [review]
Patch 7: Change navigator.language to use Accept-Language

Let's have jst do that.
Comment 115 Johnny Stenback (:jst, jst@mozilla.com) 2010-08-05 15:19:37 PDT
Comment on attachment 462663 [details] [diff] [review]
Patch 7: Change navigator.language to use Accept-Language

+    while (localeTokenizer.hasMoreTokens())
+    {
+      const nsSubstring &code = localeTokenizer.nextToken();
+      if (code.Length() == 2 && !first)
+      {
+        nsAutoString upper(code);
+        ::ToUpperCase(upper);
+        aLanguage.Replace(pos, code.Length(), upper);
+      }
+      pos += code.Length() + 1; // 1 is the separator
+      if (first)
+        first = false;

Might as well lose the if check there, the result will be the same w/o it and with less branching and less code.

sr=jst
Comment 116 Robert Kaiser (not working on stability any more) 2010-09-02 05:52:31 PDT
That bug basically means decreased usability for people on our website, as we can't offer the correct locale download matching the browser version they are using right now any more. Thanks for breaking us.
Comment 117 Ben Bucksch (:BenB) 2010-09-02 05:57:41 PDT
KaiRo, wrong. Just use Accept-Language. That's the standard and what you should have used anyway.
Comment 118 Robert Kaiser (not working on stability any more) 2010-09-02 06:03:14 PDT
(In reply to comment #117)
> KaiRo, wrong. Just use Accept-Language. That's the standard and what you should
> have used anyway.

Completely wrong. I don't give a damn about the preferred language of *web sites* for that user (and that's what Accept-Language is), I only care about the *UI language* he is using. If I can't match that, I probably should not even think about trying to give the user any specific preferred download, but instead send him through a hurdle run of clicks to select it himself. Very user friendly, but thank you for giving me no other choice.
Comment 119 Daniel Glazman (:glazou) 2010-09-02 06:28:08 PDT
(In reply to comment #118)

> Completely wrong. I don't give a damn about the preferred language of *web
> sites* for that user (and that's what Accept-Language is), I only care about
> the *UI language* he is using. If I can't match that, I probably should not
> even think about trying to give the user any specific preferred download, but
> instead send him through a hurdle run of clicks to select it himself. Very
> user friendly, but thank you for giving me no other choice.

Guys, you are _both_ right. Ben wants to follow an IETF recommendation on
a privacy issue and KaiRo wants to match the UI language because that's the
only way he can correctly serve someone like me, i.e. someone browsing the web
in French when it's available but using only en-US software.

That said, KaiRo, I think the vast majority of internet users use a browser
UI locale matching the accept-language's topmost language, and only a small
minority of geeks have a different configuration.

Let me ask a naive question here: if I download a given localized version of
Firefox, is the Accept-Language set by default to match that language? If yes,
then at least KaiRo can rely on that for, again, the vast majority of users.
The minority of übergeeks will be annoyed a bit but hey we're always annoyed
by everything aren't we?

FWIW, I still think the war-on-privacy-issues goes too far here. Anyway...
Comment 120 Dão Gottwald [:dao] 2010-09-02 06:35:03 PDT
(In reply to comment #119)
> Let me ask a naive question here: if I download a given localized version of
> Firefox, is the Accept-Language set by default to match that language?

Yes.

FWIW, this isn't just fixing a privacy issue. It's also about making navigator.language more useful when it's used on the client side very much like Accept-Language would be used on the server side.
Comment 121 Ben Bucksch (:BenB) 2010-09-02 06:36:01 PDT
> if I download a given localized version of
> Firefox, is the Accept-Language set by default to match that language?

For the big locales, yes. For some small locales, there may be differences.

Yes, the download button detecting system and language is only for a good default, to make it easier for the majority of users.

I think a link/page "Other languages and systems", like Firefox has, should solve this for the most part.
Comment 122 Robert Kaiser (not working on stability any more) 2010-09-02 06:40:36 PDT
(In reply to comment #119)
> Let me ask a naive question here: if I download a given localized version of
> Firefox, is the Accept-Language set by default to match that language?

Fully depends on the localizers, but usually, the UA language is *among* the value in Accept-Language, even if some carefully chosen variant of it might be the primary language listed there. Also, the user might change the Accept-Language at will, making it way more fingerprintable than the UI locale, and e.g. possibly making "ger-saxon" their primary Accept-Language header when using a German (de) build, i.e. having an Accept-Language header including all of ger-saxon, de-DE, de, and possibly even en-US and/or en.

> The minority of übergeeks will be annoyed a bit but hey we're always annoyed
> by everything aren't we?

At least by all the paranoia going on with some things like the so-called "privacy" or "fingerprinting" threats introduced by UA strings.

And I'm very sensitive about privacy matters usually, but there are clearly things where we are overdoing it while at the same time not working on stuff that ought to be much higher priority but may have higher impact overall, like the exposure of the plugin or installed font lists to the web, which are way more fingerprintable than ridiculous things like the UI languages or using a nightly.
Comment 123 Axel Hecht [:Pike] 2010-09-02 06:42:14 PDT
In most cases, the first accept locale differs from the chosen locale. Maybe just in that it's de-de instead of de, or upper vs lower case things that folks reading locale-code specs should already deal with, but it's not exactly the same thing.

http://mxr.mozilla.org/l10n-mozilla1.9.2/search?string=intl.accept_languages&find=global/intl.properties for data.
Comment 124 Robert Kaiser (not working on stability any more) 2010-09-02 06:42:51 PDT
In any case, as I can't be bothered to write a heuristic parser for that new stupid property, I'll just offer en-US builds when that locale doesn't give an exact match with one of the locales we can offer; everyone else needs to use the "other platforms and languages" link. Who needs usability anyhow.
Comment 125 Ben Bucksch (:BenB) 2010-09-02 07:11:33 PDT
> usually, the UA language is *among* the value in Accept-Language

So, it's solvable. You can detect which UI locale the build is, and offer the right download, in most cases (98%? of users) at least. The others can fall back to "other languages".
Comment 126 Robert Kaiser (not working on stability any more) 2010-09-02 07:20:28 PDT
(In reply to comment #125)
> > usually, the UA language is *among* the value in Accept-Language
> 
> So, it's solvable. You can detect which UI locale the build is, and offer the
> right download, in most cases (98%? of users) at least.

Only if I build a parser for the Accept-Languages list, which is quite some work for a simple download box...
Comment 127 Ben Bucksch (:BenB) 2010-09-02 07:31:12 PDT
fairly simple:
// acceptLang, mirror, currentVersion, platformSpec, platformExtension and
// showOtherLangsInBiggerFont() are assumed to be defined elsewhere on the page.
var localeMapping = {
  "de-de" : "de",
  "fo-ba" : "ba",
  // ...
};
var useLocale = "unknown";
for (var entry of acceptLang.split(",")) { // separate langs
  // strip q-value and all whitespace
  var lang = entry.replace(/;.*/, "").replace(/\s+/g, "").toLowerCase();
  if (Object.prototype.hasOwnProperty.call(localeMapping, lang)) {
    useLocale = localeMapping[lang];
    break;
  }
}
if (useLocale == "unknown") {
  showOtherLangsInBiggerFont(); // or directly on page
  useLocale = "en-US";
}
var downloadURL = mirror + "seamonkey-" + currentVersion + "-" + platformSpec + "-" + useLocale + platformExtension;

That's 12 lines of JS code, plus the mapping (which is fairly static). It won't be much more in PHP or whatever you use on the website, if you want to do it there instead.
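
A slightly fuller version of the loop above could look like the following. This is only a sketch under stated assumptions: the helper names `parseAcceptLanguage` and `pickLocale` and the list of available build locales are hypothetical and not taken from any Mozilla website code. Unlike a plain left-to-right scan, it orders the tags by q-value and falls back from a regional tag such as de-de to its primary language subtag, so no hand-written mapping table is needed for that case.

```javascript
// Hypothetical helper: parse an Accept-Language value into lowercased
// language tags, ordered by descending q-value (a missing q counts as 1).
function parseAcceptLanguage(header) {
  return header
    .split(",")
    .map((entry, index) => {
      const [tag, ...params] = entry.trim().split(";");
      const qParam = params.map((p) => p.trim()).find((p) => p.startsWith("q="));
      const q = qParam ? parseFloat(qParam.slice(2)) : 1;
      // An unparsable q-value is treated as 0, so the entry is dropped below.
      return { tag: tag.trim().toLowerCase(), q: Number.isNaN(q) ? 0 : q, index };
    })
    .filter((item) => item.tag !== "" && item.tag !== "*" && item.q > 0)
    // Highest q first; entries with equal q keep their header order.
    .sort((a, b) => b.q - a.q || a.index - b.index)
    .map((item) => item.tag);
}

// Hypothetical helper: return the first available build locale that matches
// the header, trying the exact tag first and then its primary language subtag.
function pickLocale(header, available, fallback) {
  const byLowerCase = new Map(available.map((loc) => [loc.toLowerCase(), loc]));
  for (const tag of parseAcceptLanguage(header)) {
    const primary = tag.split("-")[0];
    const match = byLowerCase.get(tag) || byLowerCase.get(primary);
    if (match) {
      return match;
    }
  }
  return fallback;
}

// Example call with an assumed list of builds on offer.
const chosen = pickLocale("de-DE,de;q=0.8,en-US;q=0.5,en;q=0.3", ["de", "en-US", "fr"], "en-US");
console.log(chosen);
```

When nothing matches, the caller gets the fallback back and can still decide to show the "other languages" link more prominently.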
Comment 128 Pascal Chevrel:pascalc 2010-09-02 07:47:36 PDT
kairo, for PHP you can use my locale detection class:
http://granary.stage.mozilla.com/libs/l10n-demos/localeDetectionDemo.php
Comment 129 Axel Hecht [:Pike] 2010-09-02 08:37:39 PDT
FTR, both ignore script tags, which we'll apparently not get for 4.0, but that are totally fine to use (not that we have UI for those).
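
As an illustration of what honoring script subtags would involve, here is a minimal sketch. It assumes a simplified reading of language tags (a language, an optional four-letter script, an optional region) and ignores extended language subtags, variants and extensions. The helper name `splitLanguageTag` is hypothetical.

```javascript
// Hypothetical helper: split a language tag into language, script and region.
// Simplified: extended language subtags, variants and extensions are ignored.
function splitLanguageTag(tag) {
  const [language, ...rest] = tag.trim().split("-");
  const result = { language: language.toLowerCase(), script: null, region: null };
  let i = 0;
  // A script subtag is four letters, conventionally written in title case.
  if (i < rest.length && /^[A-Za-z]{4}$/.test(rest[i])) {
    result.script = rest[i][0].toUpperCase() + rest[i].slice(1).toLowerCase();
    i++;
  }
  // A region subtag is two letters or three digits, conventionally upper case.
  if (i < rest.length && /^([A-Za-z]{2}|[0-9]{3})$/.test(rest[i])) {
    result.region = rest[i].toUpperCase();
  }
  return result;
}

console.log(splitLanguageTag("zh-hant-tw"));
```

A matcher could then compare language and script first and treat the region as optional.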
Comment 130 dwitte@gmail.com 2010-09-03 13:27:18 PDT
Should we get a followup bug for removing the 'general.useragent.locale' pref from all.js and nuking the corresponding code/API in nsHttpHandler?
Comment 131 Axel Hecht [:Pike] 2010-09-03 14:44:38 PDT
We don't have to have some pref to select the chrome locale, and I don't see a good argument for dropping this one.
Comment 132 dwitte@gmail.com 2010-09-03 14:59:54 PDT
Sorry, I don't follow -- you're saying it's unnecessary but you want to keep it? What useful information does it provide?
Comment 133 Dão Gottwald [:dao] 2010-09-04 01:11:17 PDT
(In reply to comment #130)
> Should we get a followup bug for [...] nuking the corresponding code/API in nsHttpHandler?
Comment 134 Reşat SABIQ (Reshat) 2010-09-04 12:32:20 PDT
(In reply to comment #116)
> That bug basically means decreased usability for people on our website, as we
> can't offer the correct locale download matching the browser version they are
> using right now any more. 

It would also be more convenient if people had their name and their preferred language written on their foreheads. Yet, somehow, I don't think I would want to be one of those people, and I think the majority of people wouldn't either. It's not all about convenience and usability. There are trade-offs and cost-benefit analyses involved.

(In reply to comment #130)
> Should we get a followup bug for removing the 'general.useragent.locale' pref
> from all.js and nuking the corresponding code/API in nsHttpHandler?

I think the pref might be used by some add-ons that provide UI for switching between more than 2 langpacks. Not sure if this is still workable in FF 4... That said, HTTP should no longer have anything to do with this pref. If it stays, it should only be for manual or add-on-based UI locale manipulation.
Comment 135 Daniel Cater 2010-09-05 09:03:43 PDT
I don't understand comment 131 or comment 133.
Comment 136 Axel Hecht [:Pike] 2010-09-05 09:06:09 PDT
(In reply to comment #131)
> We don't have to have some pref to select the chrome locale, and I don't see a
> good argument for dropping this one.

Can't type.

We do have to have some pref to select the chrome locale, and I don't see a good argument for dropping g.u.locale.
Comment 137 dwitte@gmail.com 2010-09-05 13:01:09 PDT
It's now misnamed, that's all. We can leave the name as-is but we should probably remove the API for reading it on nsHttpHandler, because that doesn't belong there anymore.
Comment 138 Benjamin Smedberg [:bsmedberg] 2010-10-29 14:06:42 PDT
Please wait until after we branch.
Comment 139 :Ehsan Akhgari (busy, don't ask for review please) 2011-03-28 14:03:24 PDT
I do not feel comfortable taking this on cedar. Please land this on mozilla-central when it's ready.
Comment 140 Dão Gottwald [:dao] 2011-03-30 04:37:30 PDT
http://hg.mozilla.org/mozilla-central/rev/ead683169ef2
Comment 141 Dão Gottwald [:dao] 2011-03-30 04:38:44 PDT
*** Bug 580032 has been marked as a duplicate of this bug. ***
Comment 142 Ben Bucksch (:BenB) 2011-03-30 05:06:47 PDT
Thanks, Dao, for committing!
Comment 143 Ben Bucksch (:BenB) 2011-03-30 05:26:58 PDT
(hihi. Almost 10 years after my first patch here. Do I get a prize? :) )
Comment 144 shawn.sumin 2011-03-30 10:05:14 PDT
@Ben: You get a Cookie (Brand name: Privacy Cookies). Congratz.
Comment 145 Ben Bucksch (:BenB) 2011-03-30 10:19:10 PDT
nomnomnom
Comment 146 Jorge Villalobos [:jorgev] 2011-04-20 18:11:15 PDT
sheppy, I think this might be important for add-on developers and worthwhile documenting. Thanks!
Comment 147 Eric Shepherd [:sheppy] 2011-04-21 08:44:30 PDT
Documentation updated:

https://developer.mozilla.org/en/DOM/window.navigator.language

Also mentioned on Firefox 5 for developers.
Comment 148 Asa Dotzler [:asa] 2011-05-11 11:50:29 PDT
not interested for 5.
Comment 149 Eric Shepherd [:sheppy] 2011-05-11 11:51:46 PDT
I thought this landed already on Aurora. Did it not?
Comment 150 Benjamin Smedberg [:bsmedberg] 2011-05-11 11:57:37 PDT
Looks like it did, but release drivers still have no reason to track it in particular at this point.
