Closed
Bug 1108183
Opened 10 years ago
Closed 9 years ago
Regularize case of Accept-Language HTTP header
Categories
(Core :: Networking: HTTP, defect)
Core
Networking: HTTP
Tracking
()
RESOLVED
FIXED
mozilla39
Tracking | Status | |
---|---|---|
firefox39 | --- | fixed |
People
(Reporter: awake, Assigned: awake)
References
Details
Attachments
(1 file)
4.17 KB,
patch
|
gerv
:
review+
|
Details | Diff | Splinter Review |
Subtags of language tags in the Accept-Language HTTP header request do not match the case of subtags defined in the Language Subtag Registry (https://www.iana.org/assignments/language-subtag-registry/language-subtag-registry). The mismatch occurs when the user sets a language tag that is not of the form xx-yy in the 'intl.accept_languages' about:config option, for example, sr-Latn. To reproduce: 1. In about:config, modify intl.accept_languages to sr-Latn. 2. Check the Accept-Language request header when accessing any site. See original discussion at https://bugzilla.mozilla.org/show_bug.cgi?id=1054739.
Assignee | ||
Comment 1•10 years ago
|
||
Comment 2•10 years ago
|
||
Can we make sure that it matches https://mxr.mozilla.org/mozilla-central/source/js/src/builtin/Intl.js#302 (or reuse it if possible?)
Assignee | ||
Comment 3•10 years ago
|
||
I don't know how or if it's possible to reuse the js function here. The C++ function only regularizes letter cases and does not substitute deprecated subtags or sort extension tags, unlike the js function. The letter cases should be identical. The substitutions or reordering of subtags when canonicalizing language tags are only required where the user has set their own language tags in the about:config option. In these cases, it should be safe to assume that the provided input is valid. Thus, that additional functionality should not be necessary here.
Comment 4•10 years ago
|
||
Comment on attachment 8532763 [details] [diff] [review] Regularize case of subtags Review of attachment 8532763 [details] [diff] [review]: ----------------------------------------------------------------- I'm going to delegate this review
Attachment #8532763 -
Flags: review?(mcmanus) → review?(gandalf)
Updated•10 years ago
|
Comment 5•10 years ago
|
||
Comment on attachment 8532763 [details] [diff] [review] Regularize case of subtags This code doesn't deal well with ext-lang subtabs from http://tools.ietf.org/html/bcp47#section-2.1. That it doesn't uppercase 3-digit region codes obviously doesn't matter much ;-). I haven't found anything utterly constructive in ICU, fwiw, it would be nice if we could pick up code from there to help the general case, and just do something "pretty good" as fallback as long as we don't have ICU across all platforms.
Comment 6•10 years ago
|
||
As I mentioned in bug 1054739 comment 17, a smattering of language tags in their canonical form are available for testing here: https://github.com/GPHemsley/BCP47/blob/master/test_tags.txt
Comment 7•10 years ago
|
||
(In reply to Axel Hecht [:Pike] from comment #5) > Comment on attachment 8532763 [details] [diff] [review] > Regularize case of subtags > > This code doesn't deal well with ext-lang subtabs from > http://tools.ietf.org/html/bcp47#section-2.1. > > That it doesn't uppercase 3-digit region codes obviously doesn't matter much > ;-). > > I haven't found anything utterly constructive in ICU, fwiw, it would be nice > if we could pick up code from there to help the general case, and just do > something "pretty good" as fallback as long as we don't have ICU across all > platforms. When ICU is available, I think uloc_canonicalize() could provide what we want.
Comment 8•10 years ago
|
||
(In reply to Jonathan Kew (:jfkthame) from comment #7) > > When ICU is available, I think uloc_canonicalize() could provide what we > want. Might return '_' seperated instead of '-', and it might have @something stuff. IIRC, the result matches the Unicode locale identifiers, which is a superset of BCP 47 (more data, and _ instead of - is OK)
Comment 9•10 years ago
|
||
(In reply to Axel Hecht [:Pike] from comment #8) > (In reply to Jonathan Kew (:jfkthame) from comment #7) > > > > When ICU is available, I think uloc_canonicalize() could provide what we > > want. > > Might return '_' seperated instead of '-', and it might have @something > stuff. OK, so we can use uloc_getBaseName (to drop any @keyword stuff). And then do s/_/-/ on the result if necessary.
Comment 10•10 years ago
|
||
This sounds slightly overengineered; what's the point if we need a fallback for when ICU isn't available anyway?
Assignee | ||
Comment 11•10 years ago
|
||
(In reply to Axel Hecht [:Pike] from comment #5) > Comment on attachment 8532763 [details] [diff] [review] > Regularize case of subtags > > This code doesn't deal well with ext-lang subtabs from > http://tools.ietf.org/html/bcp47#section-2.1. > > That it doesn't uppercase 3-digit region codes obviously doesn't matter much > ;-). > > I haven't found anything utterly constructive in ICU, fwiw, it would be nice > if we could pick up code from there to help the general case, and just do > something "pretty good" as fallback as long as we don't have ICU across all > platforms. Forgive me as I my understanding of localization is not great. I've been learning more about it since working on this issue. Could you elaborate on which ext-lang subtags are not dealt with correctly? The algorithm used is from http://tools.ietf.org/html/bcp47#section-2.1.1. > An implementation can reproduce this format without accessing the > registry as follows. All subtags, including extension and private > use subtags, use lowercase letters with two exceptions: two-letter > and four-letter subtags that neither appear at the start of the tag > nor occur after singletons. Such two-letter subtags are all > uppercase (as in the tags "en-CA-x-ca" or "sgn-BE-FR") and four- > letter subtags are titlecase (as in the tag "az-Latn-x-latn").
Updated•10 years ago
|
Flags: needinfo?(l10n)
Comment 12•10 years ago
|
||
You can create codes like zh-nan-Hant, but I think I misread the code there. One thing that rereading seems to not work is that x-I-can-do-what-I-want is ok, privateuse and grandfathered productions from the grammar in http://tools.ietf.org/html/bcp47#section-2.1. And AFAICT, the want and what should stay lowercase. Upper-case I would get lowercased, too. Should we add most of the example codes in http://tools.ietf.org/html/bcp47#appendix-A to the test?
Flags: needinfo?(l10n)
Comment 13•10 years ago
|
||
(In reply to Axel Hecht [:Pike] from comment #12) > One thing that rereading seems to not work is that x-I-can-do-what-I-want is > ok, privateuse and grandfathered productions from the grammar in > http://tools.ietf.org/html/bcp47#section-2.1. And AFAICT, the want and what > should stay lowercase. Upper-case I would get lowercased, too. What of this stuff is really relevant here? We're not dealing with arbitrary input and thus don't need to support the whole grammar. We only need to care about language tags that locales might want to set by default and that users can select from the UI.
Updated•10 years ago
|
Flags: needinfo?(l10n)
Comment 14•10 years ago
|
||
Comment on attachment 8532763 [details] [diff] [review] Regularize case of subtags Review of attachment 8532763 [details] [diff] [review]: ----------------------------------------------------------------- removing review request until we get a consensus on where we're going with this.
Attachment #8532763 -
Flags: review?(gandalf)
Comment 15•9 years ago
|
||
Comment on attachment 8532763 [details] [diff] [review] Regularize case of subtags I'm interpreting the silence as "don't care"; gandalf, can you please review this?
Attachment #8532763 -
Flags: review?(gandalf)
Comment 16•9 years ago
|
||
(In reply to Dão Gottwald [:dao] from comment #13) > (In reply to Axel Hecht [:Pike] from comment #12) > > One thing that rereading seems to not work is that x-I-can-do-what-I-want is > > ok, privateuse and grandfathered productions from the grammar in > > http://tools.ietf.org/html/bcp47#section-2.1. And AFAICT, the want and what > > should stay lowercase. Upper-case I would get lowercased, too. > > What of this stuff is really relevant here? We're not dealing with arbitrary > input and thus don't need to support the whole grammar. We only need to care > about language tags that locales might want to set by default and that users > can select from the UI. That's practically wrong. Our metrics data shows that people put random stuff in those prefs. Note, bullying people around isn't going to get you speedy responses, I just didn't feel like going into another round of you just ignoring the real world out there on the web.
Flags: needinfo?(l10n)
Comment 17•9 years ago
|
||
People messing with hidden prefs (which we explicitly warn about) isn't exactly "the real world out there on the web". So the question is how far we want to go with supporting such edge cases. Would the code become much more complicated? (I really have no idea.) Can users of other browsers put random stuff in the Accept-Language header?
Comment 18•9 years ago
|
||
Looks like we're not getting anywhere here. I would suggest that we take the proposed patch, assuming it's correct for what it intends to do, and file the case of users putting random stuff in the hidden pref as a new bug, knowing that it might end up as WONTFIX depending on how much we care about that case.
Comment 19•9 years ago
|
||
Comment on attachment 8532763 [details] [diff] [review] Regularize case of subtags Review of attachment 8532763 [details] [diff] [review]: ----------------------------------------------------------------- I feel really bad about this bug. I struggle to understand anymore if bug 1054739 + the patch that I'm looking at, together with the benefit of bug 1054739 are enough to out-weight the regressions that bug 1054739 introduced. On top of that, I'm worried about subtle ICU vs. non-ICU differences that will bite us back. I've tried to review this patch multiple times, but I start to feel that while I can technically review the code, the review here is more of a sr= nature and I'm not the right person to make the call. Should I still review it?
Comment 20•9 years ago
|
||
(In reply to Zibi Braniecki [:gandalf] from comment #19) > Comment on attachment 8532763 [details] [diff] [review] > Regularize case of subtags > > Review of attachment 8532763 [details] [diff] [review]: > ----------------------------------------------------------------- > > I've tried to review this patch multiple times, but I start to feel that > while I can technically review the code, the review here is more of a sr= > nature and I'm not the right person to make the call. > > Should I still review it? FWIW, I think this code seems reasonable. Regularizing the case of Accept-Language is a Good Thing™, IMO.
Comment 21•9 years ago
|
||
(In reply to Axel Hecht [:Pike] from comment #12) > One thing that rereading seems to not work is that x-I-can-do-what-I-want is > ok, privateuse and grandfathered productions from the grammar in > http://tools.ietf.org/html/bcp47#section-2.1. And AFAICT, the want and what > should stay lowercase. Upper-case I would get lowercased, too. Oh, to follow up on this point. I think I agree with Axel: Rather than doing nothing, the seenSingleton code should be lowercasing all the way. And grandfathered tags should mostly be OK, I think. Most come after singletons, and the rest follow capitalization rules for their length. (Which is not to say that they shouldn't be hard-coded instead; the BCP implies that they should be.) It might also be worth noting that there has been discussion recently about deprecating all remaining grandfathered tags in favor of standard forms.
Comment 22•9 years ago
|
||
Comment on attachment 8532763 [details] [diff] [review] Regularize case of subtags back to necko peer/owner; gandalf doesn't feel comfortable reviewing this
Attachment #8532763 -
Flags: review?(gandalf) → review?(mcmanus)
Comment 23•9 years ago
|
||
As this is more of a content issue, I'm going to ask gerv to find the right person to ok it.
Comment 24•9 years ago
|
||
Comment on attachment 8532763 [details] [diff] [review] Regularize case of subtags Review of attachment 8532763 [details] [diff] [review]: ----------------------------------------------------------------- normally I don't adjudicate the contents of non-protocol elements..
Attachment #8532763 -
Flags: review?(mcmanus) → review?(gerv)
Updated•9 years ago
|
Attachment #8532763 -
Flags: review?(gerv) → review+
Comment 25•9 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=efc206efbee4
Keywords: checkin-needed
Comment 26•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/a653443ce794
Keywords: checkin-needed
Comment 27•9 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/a653443ce794
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox39:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
You need to log in
before you can comment on or make changes to this bug.
Description
•