Closed
Bug 1054739
Opened 10 years ago
Closed 9 years ago
Reduce HTTP Accept-Language Entropy
Categories
(Core :: Networking: HTTP, defect)
Core
Networking: HTTP
Tracking
()
RESOLVED
FIXED
mozilla36
People
(Reporter: karl156, Assigned: awake)
References
(Blocks 2 open bugs)
Details
(Keywords: dev-doc-complete, privacy, site-compat, Whiteboard: [fingerprinting])
Attachments
(1 file, 2 obsolete files)
4.09 KB,
patch
|
mcmanus
:
review+
|
Details | Diff | Splinter Review |
When switching language via UI the Accept-Language Header is altered. For example the default German Accept-Language String is this: "Accept-Language: de,en-US;q=0.7,en;q=0.3" When playing with it in the UI, even after resetting, I get this: "Accept-Language: de,en-us;q=0.7,en;q=0.3" Note the switch from uppercase to lowercase country-code. To reset this I had to reset "intl.accept_languages" in about:config This unnecessarily increases fingerprintability. Steps to reproduce: 1. Note original Accept-Language (uppercase country-code) 2. Go to Preferences - Languages 3. Hit "up" and "down" 4. See altered Accept-Language (lowercase country-code)
Updated•10 years ago
|
Version: 31 Branch → Trunk
Assignee | ||
Comment 1•10 years ago
|
||
I'd like to give this bug a shot. Could somebody assign it to me?
Updated•10 years ago
|
Assignee: nobody → bugzillafx
Assignee | ||
Comment 2•10 years ago
|
||
The Accept-Language header is created from the intl.accept_languages string in about:config. As such, the capitalization of that string is reflected in the request header. Changing the preferred languages in the UI always sets intl.accept_languages to lowercase. Also, different localizations define intl.accept_languages differently: some have uppercase regions, while others do not. See https://mxr.mozilla.org/l10n-central/search?string=intl.accept_languages&find=intl.properties. A quick check of Internet Explorer and Chrome show that they use uppercase regions. Possible solutions: 1. Make intl.accept_languages consistent among all locales. Modify the behavior of setting the preferred languages in the UI. 2. Change the header to be consistent prior to making the request, so that the case does not matter. The first solution looks to be the better, but longer, approach. Does anyone have any thoughts?
Assignee | ||
Comment 3•10 years ago
|
||
This is a preliminary patch that fixes the problem of inconsistent capitalization of the Accept-Language header for locales with uppercase regions. Locales with a default lowercase language code were not affected since preference changes through the UI automatically sets the new accepted languages to lowercase. Their default intl.accept_languages string will need to be modified to match the new capitalization. That would mean patching roughly half or more of the 104 locales in l10n-central.
Comment 4•10 years ago
|
||
I think normalizing the string automatically in the netwerk code actually setting the HTTP header would be largely preferable.
Assignee | ||
Comment 5•9 years ago
|
||
This patch normalizes the Accept-Language header by making the language code lowercase and the region code uppercase (everything after the hyphen).
Attachment #8526803 -
Attachment is obsolete: true
Attachment #8527305 -
Flags: review?(dao)
I think the patch works slightly different to "navigator.language" and "navigator.languages": intl.accept_languages;DE-DE, EN-US, EN navigator.language - DE-DE navigator.languages - DE-DE,EN-US,EN intl.accept_languages;de-de, en-us, en navigator.language - de-DE navigator.languages - de-DE,en-US,en These tests are somewhat artificial, but as you can see the language token remains untouched while the county token is converted to uppercase. This is not a suggestion to change the patch. I just wanted to bring it to your attention.
Updated•9 years ago
|
Attachment #8527305 -
Flags: review?(dao) → review?(mcmanus)
Assignee | ||
Comment 7•9 years ago
|
||
Thanks for bringing that up. The navigator.languages code makes uppercase subtags after the first that are two letters long. All other tags are left unmodified. With the latest patch, the formatting of the language tags for special cases is different in navigator.languages and in the Accept-Language request header, as pointed out. If the preferred languages were set from the UI, the result should be the same. (The list of available languages for selection in the UI is from https://mxr.mozilla.org/mozilla-central/source/intl/locale/language.properties and all the subtags there are two letters.) https://tools.ietf.org/html/rfc5646 describes conventions for formatting the language tags: > o [ISO639-1] recommends that language codes be written in lowercase > ('mn' Mongolian). > > o [ISO15924] recommends that script codes use lowercase with the > initial letter capitalized ('Cyrl' Cyrillic). > > o [ISO3166-1] recommends that country codes be capitalized ('MN' > Mongolia). > > An implementation can reproduce this format without accessing the > registry as follows. All subtags, including extension and private > use subtags, use lowercase letters with two exceptions: two-letter > and four-letter subtags that neither appear at the start of the tag > nor occur after singletons. Such two-letter subtags are all > uppercase (as in the tags "en-CA-x-ca" or "sgn-BE-FR") and four- > letter subtags are titlecase (as in the tag "az-Latn-x-latn"). Perhaps this formatting convention can be used in navigator.languages and the Accept-Language request header, or when setting the intl.accept_languages string. Either one of these would be an improvement.
Comment 8•9 years ago
|
||
Comment on attachment 8527305 [details] [diff] [review] Normalize language codes in the netwerk code Review of attachment 8527305 [details] [diff] [review]: ----------------------------------------------------------------- This changes the content of an accept header - so its really gerv's module that needs to ok it. I'm fine with the c++ changes in it, thanks for letting me see it. ::: netwerk/protocol/http/nsHttpHandler.cpp @@ +1527,5 @@ > + { > + if (*code == '-') { > + is_region = true; > + } else { > + if (is_region) brace the if/else bits please
Attachment #8527305 -
Flags: review?(mcmanus) → review?(gerv)
Comment 9•9 years ago
|
||
Comment on attachment 8527305 [details] [diff] [review] Normalize language codes in the netwerk code Review of attachment 8527305 [details] [diff] [review]: ----------------------------------------------------------------- This Wikipedia article: http://www.wikiwand.com/en/IETF_language_tag explains all the different parts of the language code which may be present, and the appropriate case for each. If we have to cope with arbitrary language codes, then this code will need enhancing somewhat. It doesn't deal with e.g. "nan-Hant-TW" correctly. If we only have to cope with codes which are defined locales, that is a smaller set. But AIUI, people can enter their own codes in the UI? Gerv
Attachment #8527305 -
Flags: review?(gerv) → review-
Comment 10•9 years ago
|
||
(In reply to Gervase Markham [:gerv] from comment #9) > If we only have to cope with codes which are defined locales, that is a > smaller set. But AIUI, people can enter their own codes in the UI? No, they can selected languages from a list we provide.
Comment 11•9 years ago
|
||
Comment on attachment 8527305 [details] [diff] [review] Normalize language codes in the netwerk code Review of attachment 8527305 [details] [diff] [review]: ----------------------------------------------------------------- OK. If we are sure all the possible inputs to this function are of the simple form that it copes with, then it's OK with me to canonicalise Accept-Language to xx-YY. Gerv
Attachment #8527305 -
Flags: review- → review+
Assignee | ||
Comment 12•9 years ago
|
||
Added braces. Renamed function from NormalizeLanguageCode to NormalizeLanguageTag.
Attachment #8527305 -
Attachment is obsolete: true
Attachment #8528297 -
Flags: review?(gerv)
Comment 13•9 years ago
|
||
Comment on attachment 8528297 [details] [diff] [review] Normalize language tags when setting the Accept-Language header. v2 Review of attachment 8528297 [details] [diff] [review]: ----------------------------------------------------------------- I'm not the right person to review this from a code perspective. It's fine otherwise. Gerv
Attachment #8528297 -
Flags: review?(gerv)
Updated•9 years ago
|
Attachment #8528297 -
Flags: review?(mcmanus)
Comment 14•9 years ago
|
||
Comment on attachment 8528297 [details] [diff] [review] Normalize language tags when setting the Accept-Language header. v2 Review of attachment 8528297 [details] [diff] [review]: ----------------------------------------------------------------- lgtm
Attachment #8528297 -
Flags: review?(mcmanus) → review+
Comment 16•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/a426024d5d15
Status: UNCONFIRMED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Comment 17•9 years ago
|
||
This is an Internationalization change that I and others should have been made aware of and consulted on. Here is a list of sample language tags in their canonical form to be used for testing: https://github.com/GPHemsley/BCP47/blob/master/test_tags.txt
Blocks: bcp47
Comment 18•9 years ago
|
||
Can we back this out? The code logic violates the standard, and doesn't match our code behavior, notably, if a sr would set their default to sr-Latn, sr, en for example, the upper-casing of the thing after the - doesn't work.
Comment 19•9 years ago
|
||
we should reuse http://ecma-international.org/ecma-402/1.0/#sec-6.2.3
Comment 20•9 years ago
|
||
dao - based on 17-19 do you think this should come out of 36 and 37?
Flags: needinfo?(dao)
Comment 21•9 years ago
|
||
My understanding is that comment 18 is referring to hypothetical cases that were already discussed earlier in this bug. It's probably a good idea to take care of them to be future-proof, but unless some locales are affected by this today (e.g. can sr-Latn be selected from the UI?) I don't see the immediate need to back this out.
Flags: needinfo?(dao)
Assignee | ||
Comment 22•9 years ago
|
||
I've created a patch to address the canonicalization of the language tag in the HTTP request header. Should it be posted here or in a separate bug?
Comment 23•9 years ago
|
||
(In reply to awake [:awake] from comment #22) > I've created a patch to address the canonicalization of the language tag in > the HTTP request header. Should it be posted here or in a separate bug? New bug, please. Thanks!
Assignee | ||
Comment 24•9 years ago
|
||
The new bug is at https://bugzilla.mozilla.org/show_bug.cgi?id=1108183.
Comment 25•9 years ago
|
||
(In reply to Dão Gottwald [:dao] from comment #21) > My understanding is that comment 18 is referring to hypothetical cases that > were already discussed earlier in this bug. It's probably a good idea to > take care of them to be future-proof, but unless some locales are affected > by this today (e.g. can sr-Latn be selected from the UI?) I don't see the > immediate need to back this out. Locales set their default Accept-Language header explicitly, outside of the UI. In addition, anyone can set it via about:config, which is also outside of the UI. This is Bad Code™. We already have enough of this faulty algorithm in our codebase. New code should not be duplicating it, especially when there is already a standardized algorithm available to do these sorts of things. This should be backed out so it doesn't accidentally get left in the codebase forever.
Comment 26•9 years ago
|
||
(In reply to Gordon P. Hemsley [:GPHemsley] from comment #25) > (In reply to Dão Gottwald [:dao] from comment #21) > > My understanding is that comment 18 is referring to hypothetical cases that > > were already discussed earlier in this bug. It's probably a good idea to > > take care of them to be future-proof, but unless some locales are affected > > by this today (e.g. can sr-Latn be selected from the UI?) I don't see the > > immediate need to back this out. > > Locales set their default Accept-Language header explicitly, outside of the > UI. Sure, what I said and the question I asked don't contradict this. > In addition, anyone can set it via about:config, which is also outside > of the UI. Things can break when messing with about:config. about:config isn't for end users, so no reason to set off the alarm. > This should be backed out so it doesn't accidentally get left in the > codebase forever. We file bugs on things so we don't forget about them.
Comment 27•9 years ago
|
||
Here's a way to 100% reliably break standards without interacting with prefs or even UI: Start Firefox OS 2.2, Simulator will work, and select either version of Serbian. The first value in accept-languages will either be sr-Latn or sr-Cyrl, which should be normalized exactly like this.
Comment 28•9 years ago
|
||
(In reply to Dão Gottwald [:dao] from comment #26) > (In reply to Gordon P. Hemsley [:GPHemsley] from comment #25) > > (In reply to Dão Gottwald [:dao] from comment #21) > > > My understanding is that comment 18 is referring to hypothetical cases that > > > were already discussed earlier in this bug. It's probably a good idea to > > > take care of them to be future-proof, but unless some locales are affected > > > by this today (e.g. can sr-Latn be selected from the UI?) I don't see the > > > immediate need to back this out. > > > > Locales set their default Accept-Language header explicitly, outside of the > > UI. > > Sure, what I said and the question I asked don't contradict this. My point is that it's not hypothetical. Locales can already do this now, and I expect some already have. And when they do, it shows up in the UI. > > This should be backed out so it doesn't accidentally get left in the > > codebase forever. > > We file bugs on things so we don't forget about them. Right, because it's not like there are any bugs on file from 14 years ago. This is not an urgent issue. There is no reason to commit broken code to fix it.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 29•9 years ago
|
||
I agree that if this code is known-broken, it should be backed out. The aim of reducing entropy in Accept-Language is not worth us sending badly-case-folded language tags. I rescind my approval of this patch. Gerv
Comment 30•9 years ago
|
||
To my knowledge, this still has not been backed out.
Flags: needinfo?(philringnalda)
Flags: needinfo?(dao)
Comment 31•9 years ago
|
||
Your knowledge of that appears to me to be correct, though it seems a lot easier to just look at https://hg.mozilla.org/integration/fx-team/log/default/netwerk/protocol/http/nsHttpHandler.cpp to see whether or not there's a push which backed it out that wasn't mentioned in the bug, rather than asking me.
Flags: needinfo?(philringnalda)
Comment 32•9 years ago
|
||
(In reply to Phil Ringnalda (:philor) from comment #31) > Your knowledge of that appears to me to be correct, though it seems a lot > easier to just look at > https://hg.mozilla.org/integration/fx-team/log/default/netwerk/protocol/http/ > nsHttpHandler.cpp to see whether or not there's a push which backed it out > that wasn't mentioned in the bug, rather than asking me. The needinfo wasn't about whether it was backed out; it was more a request to back it out, since you pushed it to m-c.
Comment 33•9 years ago
|
||
Oh, you *did* need info that I have! "Would it be appropriate for either the volunteer sheriff who happened to be the one to merge properly reviewed code which was not failing any tests to mozilla-central or for one of the paid sheriffs to get in the middle of a situation where a patch which wanders across the weird ownership boundary between necko and content-headers has one owner deferring his backout decision to someone else, and the other owner saying it should be backed out, but not actually doing it?" "No, that's not the sort of backouts that sheriffs do. We back out patches which fail tests. Adjudicating the boundary between ownerships is not at all our department." Either mcmanus or dao can decide that it should be backed out, and either one is perfectly capable of doing the backout themselves. Or gerv can decide that it's entirely his decision and back it out himself, or if he feels he needs help backing it out, he can ask either whoever has the nick ending in |sheriffduty or any developer for assistance.
Comment 34•9 years ago
|
||
Given the amount of time that has passed, it would be better if the patch author (Dao) backed it out as they would know best how to fix up any conflicts. However, if that is not done, I will back it out. Gerv
Comment 35•9 years ago
|
||
I'm not the patch author. A patch is waiting in bug 1108183, pending a response from Pike who raised his concern here but is now unresponsive there; apparently the concern isn't so critical after all.
Flags: needinfo?(dao)
Comment 36•9 years ago
|
||
And I'm resolving this bug again to reflect its actual state (landed, not backed out). Please stop messing with the bug status as a means to force your way.
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → FIXED
Updated•9 years ago
|
Keywords: dev-doc-needed,
site-compat
Comment 37•9 years ago
|
||
Posted the site compatibility doc for the record: https://www.fxsitecompat.com/en-US/docs/2015/language-names-in-accept-language-header-are-now-canonicalized/
Keywords: dev-doc-needed → dev-doc-complete
You need to log in
before you can comment on or make changes to this bug.
Description
•