Closed
Bug 54093
Opened 25 years ago
Closed 25 years ago
Language Preference limited to 5 characters
Categories
(Core :: Internationalization, defect, P3)
Tracking
()
VERIFIED
FIXED
Future
People
(Reporter: bobj, Assigned: shanjian)
Details
(Keywords: intl)
Attachments
(2 files)
In Edit|Preferences...Languages, if I hit the Add button and try to type
a custom language ID, such as "foo-bar", the input field will not let me
type more than 5 characters (e.g., "foo-b").
In 4.x, there is no limit.
Comment 2•25 years ago
Low priority. Mark it as future. Reassign to shanjian
Assignee: ftang → shanjian
Target Milestone: --- → Future
Assignee
Comment 3•25 years ago
The fix is very simple and low risk. Just let me know when I should check in
the fix. (This should probably be done on the trunk.)
Index: pref-languages-add.xul
===================================================================
RCS file: /cvsroot/mozilla/xpfe/components/prefwindow/resources/content/pref-languages-add.xul,v
retrieving revision 1.7
diff -c -r1.7 pref-languages-add.xul
*** pref-languages-add.xul 2000/07/29 01:17:58 1.7
--- pref-languages-add.xul 2000/10/06 21:42:44
***************
*** 52,58 ****
<box autostretch="never">
<text class="label" value="&languages.customize.others.label;"
for="languages.other"/>
! <textfield id="languages.other" size="7" maxlength="5"/>
<text class="label" value="&languages.customize.others.examples;"
for="languages.other"/>
</box>
--- 52,58 ----
<box autostretch="never">
<text class="label" value="&languages.customize.others.label;"
for="languages.other"/>
! <textfield id="languages.other" size="12" maxlength="16"/>
<text class="label" value="&languages.customize.others.examples;"
for="languages.other"/>
</box>
Status: NEW → ASSIGNED
Comment 4•25 years ago
Shanjian,
Does your fix accommodate "q" values to be inserted
manually? It should be possible to accept "Q" values
such as follows manually:
zh;q=0.85
for each manual entry.
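For illustration, here is a minimal sketch (not code from any patch on this bug) of how a manually typed entry such as zh;q=0.85 breaks down into a language tag and a q value. The function name parseLanguageEntry is hypothetical; the default of q=1 for an entry without a q parameter follows HTTP/1.1.

```javascript
// Hypothetical helper: split one Accept-Language entry into its tag and q value.
function parseLanguageEntry(entry) {
  const parts = entry.split(";").map(function (s) { return s.trim(); });
  const tag = parts[0];
  let q = 1; // HTTP/1.1 treats a missing q parameter as q=1
  for (const param of parts.slice(1)) {
    const m = /^q=(.+)$/i.exec(param);
    if (m) q = Number(m[1]);
  }
  return { tag: tag, q: q };
}

// parseLanguageEntry("zh;q=0.85") returns { tag: "zh", q: 0.85 }
```

Note that this sketch does no validation at all, which is the behavior of the 4.x field being discussed.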
Comment 5•25 years ago
It was possible to input a "Q" value under 4.x this
way and we should definitely allow this flexibility.
Comment 6•25 years ago
CC'ed adrian who is working on "Q" value generation for
the entries.
Comment 7•25 years ago
I think this is a "it's not a bug, it's a feature" type report. In other words,
Navigator's "other language" box does absolutely NO error checking and,
depending on the platform, will let you put ANYTHING in there, including
punctuation, kanji, etc., and send this to the HTTP server, regardless of
whether it's correct.
I don't think the Netscape Navigator/Communicator documentation mentions
anywhere that you can enter a "q" value in that box: this is a clever kludge
that someone used knowing that it doesn't perform proper error checking and
knowing how HTTP works. Note that the user can also enter an illegal q value
such as a number greater than one or a float with more than 4 significant
digits. I mean, if we allow the user to set HTTP header Q values through
preferences, we should probably allow the user to tweak other parts of the
protocol, such as wrapping of HTTP headers, extra headers, etc. Which may be
useful for a very small segment of the developer population. But then again, if
they really need to tweak the HTTP request, they now have the option to modify
the source directly. :)
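To make the "illegal q value" point concrete, here is a small sketch of a check against the qvalue grammar in RFC 2616 section 3.9, which permits 0 to 1 with at most three digits after the decimal point. The function name isValidQValue is hypothetical and is not part of the attached patch.

```javascript
// Hypothetical check of a q value string against the RFC 2616 qvalue grammar:
//   qvalue = ( "0" [ "." 0*3DIGIT ] ) | ( "1" [ "." 0*3("0") ] )
function isValidQValue(text) {
  return /^(0(\.[0-9]{0,3})?|1(\.0{0,3})?)$/.test(text);
}

// isValidQValue("0.85")    returns true
// isValidQValue("1.5")     returns false (greater than one)
// isValidQValue("0.12345") returns false (too many digits)
```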
If Accept-Language is modified so that it attaches Q values automatically like
in bug 58034, I don't see why we need to expose this low level protocol
functionality to the end-user if they can get the same desired effect by using
the arrow-buttons in the Language Preference dialog.
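As a rough illustration of attaching q values automatically from the order of the list, consider the sketch below. It assumes a simple evenly spaced scheme and a hypothetical function name buildAcceptLanguage; it is not the algorithm from bug 58034, which is not shown in this report.

```javascript
// Hypothetical sketch: derive q values from list order (first = most preferred).
// The evenly spaced scheme is an assumption made for illustration only.
function buildAcceptLanguage(languages) {
  const n = languages.length;
  // Work in thousandths so no q value has more than three decimal digits.
  const step = Math.max(1, Math.floor(1000 / n));
  return languages.map(function (lang, i) {
    if (i === 0) return lang; // no q parameter means q=1
    const thousandths = Math.max(1, 1000 - i * step);
    const digits = String(thousandths).padStart(3, "0").replace(/0+$/, "");
    return lang + ";q=0." + digits;
  }).join(",");
}

// buildAcceptLanguage(["ja", "en-US", "en", "fr"])
// returns "ja,en-US;q=0.75,en;q=0.5,fr;q=0.25"
```

With something like this in place, reordering entries with the arrow buttons changes the q values without the user typing them.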
As Mozilla really should do error checking on this field, I'm attaching a patch
that will make sure the language conforms to RFC 2616, HTTP 1.1
<URL:http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.10>, with the
following "real-world" restrictions.
1. The Other Language field will be limited to 5 or 6 characters, as 99% of the
entries that will go in will be in the form "aa-BB".
2. The Other Language field will allow a maximum of 19 characters, which will
allow for a language tag like "x-mylangok-noextras".
3. Only a maximum of two dash/hyphens will be allowed, even though RFC 1766 sets
no limit.
The input field checker will allow the following combinations:
ja-JP-kansai
x-klingon-tng
i-ianalang
en
en-JP
In other words, the labels must be alphabetical, and unless the prefix is "x-"
or "i-", the first two tags must be exactly two letters long.
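The rules listed in this comment can be sketched roughly as follows. This is a reconstruction from the description above, not the attached patch; the name isValidOtherLanguage is hypothetical, and requiring at least one subtag after "x" or "i" is an assumption.

```javascript
// Returns true if the string consists only of ASCII letters.
function isAlpha(s) {
  return /^[A-Za-z]+$/.test(s);
}

// Hypothetical reconstruction of the checks described in this comment.
function isValidOtherLanguage(value) {
  if (value.length === 0 || value.length > 19) return false; // max 19 characters
  const tags = value.split("-");
  if (tags.length > 3) return false;         // at most two hyphens
  if (!tags.every(isAlpha)) return false;    // labels must be alphabetical
  const prefix = tags[0].toLowerCase();
  // Assumption: "x" and "i" need at least one subtag after them.
  if (prefix === "x" || prefix === "i") return tags.length > 1;
  if (tags[0].length !== 2) return false;    // primary tag: exactly two letters
  if (tags.length > 1 && tags[1].length !== 2) return false; // first subtag too
  return true;
}

// Accepted: "ja-JP-kansai", "x-klingon-tng", "i-ianalang", "en", "en-JP"
// Rejected: "foo-bar" (three-letter primary tag)
```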
Comment 8•25 years ago
Comment 9•25 years ago
Adding blizzard so he can comment on the patch
Comment 10•25 years ago
It looks reasonable to me, except for all the whitespace changes. :)
Comment 11•25 years ago
Regarding the patch, you don't need to update/modify files under "mozilla/l10n";
they are not part of the build.
Comment 12•25 years ago
I can understand the desire not to allow Q value setting
if they can be set algorithmically. So, I will concede this
point -- personally I would have liked to be able to
set values myself.
I have some additional questions/comments on
adrian's comments:
havill@redhat.com said:
> 1. The Other Language field will be limited to 5 or
> 6 characters, as 99% of the entries that will go in will
> be in the form "aa-BB"
Can you clarify what this means? I thought up to 8 characters
are allowed for either the primary or sub-tag. Is this what
you're referring to?
>In other words, the labels must be alphabetical,
> and unless the prefix is "x-" or "i-", the first
> two tags must be exactly two letters long.
"Exactly 2 letters long" part is too limiting.
ISO-639-2 allows 3-letter code as well. Since
ISO-639-1 is not likely to be sufficient, we should
allow for the 3-letter primary lang code.
Note also that a standard track revision of RFC 1766 is
likely to be completed soon and that revision also
allows ISO-639-2 three-letter code in the primary tag.
http://www.ietf.org/internet-drafts/draft-alvestrand-lang-tag-v2-05.txt
>3. Only a maximum of two dash/hyphens will be allowed,
> even though RFC 1766 sets no limit.
This is too limiting and arbitrary. We should anticipate
at least 3 to 4 hyphens. For example, in the above
revision document, the author gives a subtag like the following:
Region identification, such as sgn-US-MA (Martha's Vineyard Sign
Language, which is found in the state of Massachusetts, US)
This is just a subtag, so the total number of hyphens will surely
exceed 3.
We should review the revision document at least before
deciding on the details of what we should be allowing
as input. The 2-letter lang code limitation for primary
tag and hyphen limitation for the entire string need
to be reconsidered.
Comment 13•25 years ago
Katsuhiko Momoi wrote:
> Can you clarify what ["field will be limited to 5 or 6
> characters"] means? I thought up to 8 characters
> are allowed for either the primary or sub-tag. Is this what
> you're referring to?
The EBNF definition at
<URL:http://andrew2.andrew.cmu.edu/rfc/rfc1766.html#sec-2.> does indeed imply
that parsers should read up to 8 characters but then in the text explanation it
says what each of these values may be and says that: "Other values cannot be
assigned except by updating this standard."
> "Exactly 2 letters long" part is too limiting.
> ISO-639-2 allows 3-letter code as well.
ISO-639-2 may allow it, but RFC 2616 (HTTP 1.1 std) which goes by RFC 1766 (and
specifically summarizes it mentioning TWO letter language and country codes)
does not.
> Since ISO-639-1 is not likely to be sufficient, we should
> allow for the 3-letter primary lang code.
If you did, not only would this go against HTTP/1.1, it would break most current
HTTP servers (they would not understand "jpn" to be a synonym for "ja").
> Note also that a standard track revision of RFC 1766 is
> likely to be completed soon and that revision also
> allows ISO-639-2 three-letter code in the primary tag.
> <URL:http://www.ietf.org/internet-drafts/draft-alvestrand-lang-tag-v2-05.txt>
It is listed as a "Best Current Practice" and not standards track, even though it
claims it will obsolete 1766. I do see the future need for three letter lang
codes, but I worry that people who really have a need to do this need to be
careful and understand how "current practice" HTTP servers work so they don't
enter "eng" for English and wonder why the server won't give them the "en"
English document.
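The concern about "eng" versus "en" follows from how HTTP/1.1 matches language ranges: a range matches a tag only if it equals the tag or is a prefix of it followed by "-". A minimal sketch of that rule, with a hypothetical function name rangeMatchesTag:

```javascript
// Sketch of the RFC 2616 (section 14.4) language-range matching rule.
// Tags are compared case-insensitively.
function rangeMatchesTag(range, tag) {
  const r = range.toLowerCase();
  const t = tag.toLowerCase();
  return r === "*" || t === r || t.startsWith(r + "-");
}

// rangeMatchesTag("en", "en-US") returns true
// rangeMatchesTag("eng", "en")   returns false
// rangeMatchesTag("jpn", "ja")   returns false
```

A server that follows this rule literally has no way to know that "jpn" and "ja" name the same language.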
> [Only allowing up to two dash/hyphens] is too limiting and arbitrary.
> We should anticipate at least 3 to 4 hyphens. For example, in the above
> revision document, the author gives a subtag like the following:
> Region identification, such as sgn-US-MA (Martha's Vineyard Sign
> Language, which is found in the state of Massachusetts, US)
> This is just a subtag, so the total number of hyphens will surely
> exceed 3.
In the real world, not just hypothetical? Can you think of a real example, no
matter how rare the language is, where more than two would be needed? (a
sub-dialect of Martha's Vineyard Sign Language {most definitely an MPEG or MPG
server resource}), especially since the third tag is free form and can be as
specific as possible. (e.g. en-US-texas, ja-JP-kyoto, i-klingon-tos {turns out
klingon is not "x-", as my example above mentions, but registered with IANA})
But then again, I remember someone once said that 640K was all the memory a
computer would ever need, so the restriction perhaps should go, especially since
in its current form, which permits only the primary tag to be "x-" or two
letters, the above example would have to be entered as "x-sgn-US-MA", which is
indeed 3 dash/hyphens. And since this field is probably currently for academics
and others doing "rare" language work, we probably shouldn't crimp their style.
===== PATCH FIX =====
Simply change the integer literal constant in line 265 to the max dashes you
want to allow, and add 9 to maxlength (9 == "-mysubtag") in line 18. Or to be
unlimited, comment out line 265 and remove the maxlength attribute from line 18.
Also, I do notice that IANA has registered things
<URL:http://www.egt.ie/standards/iso639/iana-lang-assignments.html> like:
zh-yue (Cantonese)
zh-min (Min, Fuzhou, Hokkien, Amoy, Taiwanese)
zh-guoyu (Mandarin)
which are very real and popular languages, and IMO a better solution than the
current method (zh-TW, zh-CN, zh-HK, etc.).
So the two-letter restriction on the first subtag, which I added because I
thought that no real-world languages were registered with IANA without the "i-"
prefix, was incorrect.
Change the following patched code in
xpfe/components/prefwindow/resources/content/pref-languages.js:
+ /* the first subtag can be either a 2 letter ISO 3166 country code,
+ a ISO 3166 user assined code (AA, QM-QZ, XA-XZ and ZZ),
+ or an IANA registered tag from 3 to 8 characters. I don't
+ think their are any IANA registered 3 to 8 letter extensions,
+ so if someone wants a custom variation, they'll have to use
+ the second subtag or use the "x" primary tag as we'll only
+ allow 2 letters here is a ISO 639 language code is the primary
+ tag.
+ */
+ if (tags.length > 1) {
+ if (tags[1].length != 2) return false;
+ if (!isAlpha(tags[1])) return false;
+ checkedTags++;
+ }
to
+ /* the first subtag can be either a 2 letter ISO 3166 country code,
+ a ISO 3166 user assigned code (AA, QM-QZ, XA-XZ and ZZ),
+ or an IANA registered tag from 3 to 8 characters.
+ */
+ if (tags.length > 1) {
+ if (tags[1].length < 2) return false;
+ if (!isAlpha(tags[1])) return false;
+ checkedTags++;
+ }
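To show the effect of this change in isolation, the sketch below wraps the old and new first-subtag checks in two hypothetical functions. The isAlpha helper here is an assumed stand-in for the one used in the patch, whose implementation is not shown in this report.

```javascript
// Assumed stand-in for the patch's isAlpha helper: ASCII letters only.
function isAlpha(s) {
  return /^[A-Za-z]+$/.test(s);
}

// Old check: the first subtag must be exactly two letters.
function firstSubtagOkOld(value) {
  const tags = value.split("-");
  if (tags.length > 1) {
    if (tags[1].length != 2) return false;
    if (!isAlpha(tags[1])) return false;
  }
  return true;
}

// New check: the first subtag must be at least two letters.
function firstSubtagOkNew(value) {
  const tags = value.split("-");
  if (tags.length > 1) {
    if (tags[1].length < 2) return false;
    if (!isAlpha(tags[1])) return false;
  }
  return true;
}

// firstSubtagOkOld("zh-yue") returns false
// firstSubtagOkNew("zh-yue") returns true
```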
Comment 14•25 years ago
Adrian, thank you first of all for additional
clarification and discussion. I think you're
coming close to what I would like to see now.
I would like to make additional comments, however,
on several points you raised.
First of all, the need to update RFC 1766 has been
around for a while and this is a known fact. The most
pressing need is for languages which do not have
2-letter representation since ISO639-1 has been closed
for further updates.
People who will attempt to enter 3-letter codes are
advanced users who need that form of representation.
There really is no need to replace the current 2-letter
code with a corresponding 3-letter one. I think the practice
will settle on using the 3-letter variety only when the
2-letter one is not available for that language.
Ruling out the 3-letter code for fear that some advanced
users may abuse it is not a good reason for not accommodating
Mozilla users who will have this need. Mozilla should be
friendly to international users' needs.
>Is listed as a "Best Current Practice" and not standards
> track, even though it claims it will obsolete 1766.
The author's intent is clear. There has been sufficient
discussion for the need to allow the 3-letter lang code
and the same author who had written 1766 has undertaken to
update it because 1766 says it must be updated to allow
other lang code representation. That is part of the
intent of the update to RFC 1766 which will obsolete it.
Further, HTTP 1.1 does not explicitly rule out 3-letter codes.
All that it says is that "if there is a 2-letter
code in the primary tag", then it must be from ISO-639. It does
not say "if the code is from ISO639", then it must be 2-letter code.
This latter part is left to RFC 1766. Now that it is certain that
RFC 1766 will be obsoleted by the proposal of the
original author of RFC 1766, we should anticipate it
and allow for the 3-letter code. Remember, only advanced users
will be using this manual fill-in feature. A large majority
of users will be content with what is in the list -- which if it
uses 3-letter code will be restricted to only those which
will fill a void not covered by the 2-letter codes.
At least, my current plan to update the built-in
Accept Language list will not use 3-letter codes unless
a 2-letter code cannot be found for that language. I think
we can control its use this way.
> codes, but I worry that people who really have a need
> to do this need to be careful and understand how
> "current practice" HTTP servers work so they don't
> enter "eng" for English and wonder why the server
> won't give them the "en" English document.
I understand your worry but I think the best use will settle on
using 3-letter code only if there is no 2-letter equivalent.
I think that is also what Mr. Alvestrand should include in this
revision. I'm willing to write to him, Martin Durst, M. Everson
and others to revise the wording on the revision of the new
RFC to recommend 3-letter only when 2-letter variety is not
available.
My sense is that if a server does not understand a 3-letter
code, it should ignore it. If someone implemented a
parsing code which only accepts 2-letter code, then that code
is not very good considering that RFC 1766 lists 1*8ALPHA
as part of the official syntax no matter what any additional
comments say.
> In the real world, not just hypothetical? Can you think
> of a real example, no matter how rare the language is,
> where more than two would be needed?
As a linguist, I can tell you that it is very easy to come up
with more than 2 hyphens. For example, I can think of a need
to cite the following for a Middle Japanese Kansai dialect
document.
ja-middle-jpn-kansai-jp
Note that whitespace is not allowed in the lang string
and so any time you have words like East Hebrides,
Old Japanese, we need to hyphenate it. It is also easy to
find examples in Amerindian languages which must use hyphens
more than twice. For example,
In the Na-Dene family, you find a language like:
Tanana-Upper-Kuskokwim
I could then have something like
Tanana-Upper-Kuskok-CA-US
omitting "wim" from Kuskokwim due to 8-letter limitation.
My recommendation: Don't constrain the hyphenation narrowly.
There are too many real languages whose names need hyphens
in more than 2 places in the subtag.
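For reference, the RFC 1766 syntax mentioned above (a primary tag of 1*8ALPHA followed by any number of "-" 1*8ALPHA subtags) can be checked with a single regular expression. The names RFC1766_TAG and isRfc1766Syntax are hypothetical; this is a sketch of the grammar only, not of the checked-in code.

```javascript
// RFC 1766: Language-Tag = Primary-tag *( "-" Subtag ), each 1 to 8 letters.
const RFC1766_TAG = /^[A-Za-z]{1,8}(-[A-Za-z]{1,8})*$/;

function isRfc1766Syntax(tag) {
  return RFC1766_TAG.test(tag);
}

// isRfc1766Syntax("ja-middle-jpn-kansai-jp")   returns true
// isRfc1766Syntax("Tanana-Upper-Kuskok-CA-US") returns true
// isRfc1766Syntax("Tanana-Upper-Kuskokwim")    returns false ("Kuskokwim" has 9 letters)
```

This places no limit on the number of hyphens, only on the length and content of each subtag.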
Comment 15•25 years ago
Assignee
Comment 16•25 years ago
I am ok with the latest patch. I would suggest enlarging the text field, since we now allow
more text in it, but that's no big deal.
adrian, do you have check-in privileges? If so, I can reassign the bug to you. Otherwise let me know
and I will take care of it.
Comment 17•25 years ago
adrian, shanjian, I now assume that other than the
limitation not to allow "Q" values, there are no
other limitations and that the HTTP 1.1/RFC1766 (& revision)
syntax is accommodated. If so, we should go ahead and
check this in.
Assignee
Comment 18•25 years ago
pref-languages.js crashes on my computer. I spent almost a whole day tracing this, and I
just found out that it has nothing to do with the changes made in this bug. But we have to
hold off on the fix until the original problem is fixed.
Comment 19•25 years ago
Shanjian Li wrote:
> pref-languages.js crashes on my computer.
Can you be more specific (like a bug report)? Perhaps others can help. This
patch was working on the last nightly build when it was submitted.
Comment 20•25 years ago
Changed QA contact to ylong@netscape.com.
Keywords: intl
QA Contact: teruko → ylong
Assignee
Comment 21•25 years ago
fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Comment 22•25 years ago
Nutz. I noticed one minor typo (not in the code, in a comment) in the checked in
source at
<URL:http://lxr.mozilla.org/seamonkey/source/xpfe/components/prefwindow/resources/content/pref-languages.js#348>
it says "ISO 639 country code"... when it should be "ISO 639 language code".
Same goes for line 349 below it (actually, it doesn't check if the country OR
language code exists)
Can the person who checked it in change this one word so code readers don't get
confused?