Closed Bug 54093 Opened 25 years ago Closed 25 years ago

Language Preference limited to 5 characters

Categories

(Core :: Internationalization, defect, P3)

All
Other
defect

Tracking

()

VERIFIED FIXED
Future

People

(Reporter: bobj, Assigned: shanjian)

Details

(Keywords: intl)

Attachments

(2 files)

In Edit|Preferences...Languages, if I hit the Add button and try to type a custom language ID, such as "foo-bar", the input field will not let me type more than 5 characters (e.g., "foo-b"). In 4.x, there is no limit.
Reassign to ftang, cc to tao.
Assignee: nhotta → ftang
Low priority. Mark it as future. Reassign to shanjian
Assignee: ftang → shanjian
Target Milestone: --- → Future
Fix is very simple and low risk. Just let me know when I should check in the fix. (Probably this should be done on the trunk.)

Index: pref-languages-add.xul
===================================================================
RCS file: /cvsroot/mozilla/xpfe/components/prefwindow/resources/content/pref-languages-add.xul,v
retrieving revision 1.7
diff -c -r1.7 pref-languages-add.xul
*** pref-languages-add.xul 2000/07/29 01:17:58 1.7
--- pref-languages-add.xul 2000/10/06 21:42:44
***************
*** 52,58 ****
  <box autostretch="never">
  <text class="label" value="&languages.customize.others.label;" for="languages.other"/>
! <textfield id="languages.other" size="7" maxlength="5"/>
  <text class="label" value="&languages.customize.others.examples;" for="languages.other"/>
  </box>
--- 52,58 ----
  <box autostretch="never">
  <text class="label" value="&languages.customize.others.label;" for="languages.other"/>
! <textfield id="languages.other" size="12" maxlength="16"/>
  <text class="label" value="&languages.customize.others.examples;" for="languages.other"/>
  </box>
Status: NEW → ASSIGNED
Shanjian, Does your fix accommodate "q" values to be inserted manually? It should be possible to accept "Q" values such as follows manually: zh;q=0.85 for each manual entry.
It was possible to input a "Q" value this way under 4.x, and we should definitely allow this flexibility.
CC'ed adrian who is working on "Q" value generation for the entries.
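For reference, a manual entry such as "zh;q=0.85" could be checked against the qvalue grammar in RFC 2616 section 3.9 (a "0" with up to three decimal digits, or a "1" with up to three zeros). The following is only a minimal sketch, not code from the Mozilla tree; parseLanguageEntry and QVALUE are hypothetical names, and the language tag itself is not validated here.

```javascript
// qvalue grammar from RFC 2616 section 3.9:
//   "0" with up to three decimal digits, or "1" with up to three zeros.
const QVALUE = /^(?:0(?:\.\d{0,3})?|1(?:\.0{0,3})?)$/;

// Hypothetical helper: split an entry like "zh;q=0.85" into { tag, q }.
// Returns null when the q parameter is malformed or out of range.
// The tag part is passed through unchecked.
function parseLanguageEntry(entry) {
  const parts = entry.split(";").map(p => p.trim());
  if (parts.length > 2 || parts[0] === "") return null;
  // No q parameter: HTTP treats the quality as 1.
  if (parts.length === 1) return { tag: parts[0], q: 1 };
  const match = /^q\s*=\s*(.+)$/i.exec(parts[1]);
  if (!match || !QVALUE.test(match[1])) return null;
  return { tag: parts[0], q: Number(match[1]) };
}
```

With this sketch, "zh;q=0.85" is accepted, while "zh;q=1.5" and "zh;q=0.12345" are rejected.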
I think this is an "it's not a bug, it's a feature" type of report. In other words, Navigator's "other language" box does absolutely NO error checking and, depending on the platform, will let you put ANYTHING in there, including punctuation, kanji, etc., and send this to the HTTP server, regardless of whether it's correct.

I don't think the Netscape Navigator/Communicator documentation mentions anywhere that you can enter a "q" value in that box: this is a clever kludge that someone used knowing that it doesn't perform proper error checking and knowing how HTTP works. Note that the user can also enter an illegal q value such as a number greater than one or a float with more than 4 significant digits.

I mean, if we allow the user to set HTTP header Q values through preferences, we should probably allow the user to tweak other parts of the protocol, such as wrapping of HTTP headers, extra headers, etc., which may be useful for a very small segment of the developer population. But then again, if they really need to tweak the HTTP request, they now have the option to modify the source directly. :)

If Accept-Language is modified so that it attaches Q values automatically like in bug 58034, I don't see why we need to expose this low-level protocol functionality to the end user if they can get the same desired effect by using the arrow buttons in the Language Preference dialog.

As Mozilla really should do error checking on this field, I'm attaching a patch that will make sure the language conforms to RFC 2616, HTTP 1.1 <URL:http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.10>, with the following "real-world" restrictions.

1. The Other Language field will be limited to 5 or 6 characters, as 99% of the entries that will go in will be in the form "aa-BB".
2. The Other Language field will allow a maximum of 19 characters, which will allow for a language tag like "x-mylangok-noextras".
3. Only a maximum of two dash/hyphens will be allowed, even though RFC 1766 sets no limit.

The input field checker will allow the following combinations:

ja-JP-kansai
x-klingon-tng
i-ianalang
en
en-JP

In other words, the labels must be alphabetical, and unless the prefix is "x-" or "i-", the first two tags must be exactly two letters long.
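The restrictions described above could be sketched roughly as follows. This is not the attached patch, only an illustration of the stated rules; isAcceptableLanguageTag and isAlpha are hypothetical names, and the 8-letter cap per label and the requirement that "x"/"i" be followed by at least one subtag are assumptions taken from the RFC 1766 syntax rather than from the comment.

```javascript
// Hypothetical helper: true when s is one or more ASCII letters.
function isAlpha(s) {
  return /^[A-Za-z]+$/.test(s);
}

// Sketch of the described checker: at most 19 characters, at most two
// hyphens, alphabetic labels, and two-letter first and second tags
// unless the tag starts with "x-" or "i-".
function isAcceptableLanguageTag(input) {
  if (input.length === 0 || input.length > 19) return false;
  const tags = input.split("-");
  if (tags.length > 3) return false;               // more than two hyphens
  if (!tags.every(t => isAlpha(t))) return false;  // also rejects empty labels
  if (tags.some(t => t.length > 8)) return false;  // assumed RFC 1766 cap
  const prefix = tags[0].toLowerCase();
  if (prefix === "x" || prefix === "i") {
    return tags.length >= 2;                       // assumed: needs a subtag
  }
  // Otherwise the first two tags must be exactly two letters long.
  return tags.slice(0, 2).every(t => t.length === 2);
}
```

Under these assumptions the five combinations listed above pass, and the 19-character "x-mylangok-noextras" passes as well, while the reporter's original "foo-bar" is rejected because its primary tag has three letters.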
Adding blizzard so he can comment on the patch
It looks reasonable to me, except for all the whitespace changes. :)
Regarding the patch, you don't need to update/modify files under "mozilla/l10n"; they are not part of the build.
I can understand the desire not to allow Q value setting if the values can be set algorithmically. So, I will concede this point -- personally I would have liked to be able to set values myself. I have some additional questions/comments on adrian's comments:

havill@redhat.com said:
> 1. The Other Language field will be limited to 5 or
> 6 characters, as 99% of the entries that will go in will
> be in the form "aa-BB"

Can you clarify what this means? I thought up to 8 characters are allowed for either the primary or sub-tag. Is this what you're referring to?

> In other words, the labels must be alphabetical,
> and unless the prefix is "x-" or "i-", the first
> two tags must be exactly two letters long.

The "exactly 2 letters long" part is too limiting. ISO-639-2 allows 3-letter codes as well. Since ISO-639-1 is not likely to be sufficient, we should allow for the 3-letter primary lang code. Note also that a standards-track revision of RFC 1766 is likely to be completed soon, and that revision also allows ISO-639-2 three-letter codes in the primary tag.
http://www.ietf.org/internet-drafts/draft-alvestrand-lang-tag-v2-05.txt

> 3. Only a maximum of two dash/hyphens will be allowed,
> even though RFC 1766 sets no limit.

This is too limiting and arbitrary. We should anticipate at least 3 to 4 hyphens. For example, in the above revision document, the author gives a subtag like the following:

Region identification, such as sgn-US-MA (Martha's Vineyard Sign Language, which is found in the state of Massachusetts, US)

This is just a subtag, so the total number of hyphens will surely exceed 3. We should at least review the revision document before deciding on the details of what we should be allowing as input. The 2-letter lang code limitation for the primary tag and the hyphen limitation for the entire string need to be reconsidered.
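On the point about Q values being set algorithmically from the order of the list: one possible scheme is to give the first language the implied quality of 1 and step the rest down evenly. This is only a sketch under that assumption and is not necessarily what bug 58034 implements; buildAcceptLanguage is a hypothetical name.

```javascript
// Sketch: derive an Accept-Language value from an ordered list of tags.
// Assumed scheme: the first entry carries no q parameter (q=1 is implied
// by HTTP), and later entries get evenly decreasing values rounded to two
// decimals, never dropping below 0.01.
function buildAcceptLanguage(languages) {
  const n = languages.length;
  return languages
    .map((tag, i) => {
      if (i === 0) return tag;
      const q = Math.max(0.01, 1 - i / n);
      // parseFloat drops trailing zeros, e.g. "0.50" becomes 0.5.
      return tag + ";q=" + parseFloat(q.toFixed(2));
    })
    .join(",");
}
```

For the list ja, en-US, en this produces "ja,en-US;q=0.67,en;q=0.33", so reordering entries with the arrow buttons changes the relative weights without the user typing a q value.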
Katsuhiko Momoi wrote:
> Can you clarify what ["field will be limited to 5 or 6
> characters"] means? I thought up to 8 characters
> are allowed for either the primary or sub-tag. Is this what
> you're referring to?

The EBNF definition at <URL:http://andrew2.andrew.cmu.edu/rfc/rfc1766.html#sec-2.> does indeed imply that parsers should read up to 8 characters, but the text explanation then says what each of these values may be and says that: "Other values cannot be assigned except by updating this standard."

> "Exactly 2 letters long" part is too limiting.
> ISO-639-2 allows 3-letter code as well.

ISO-639-2 may allow it, but RFC 2616 (HTTP 1.1 std), which goes by RFC 1766 (and specifically summarizes it, mentioning TWO-letter language and country codes), does not.

> Since ISO-639-1 is not likely to be sufficient, we should
> allow for the 3-letter primary lang code.

If you did, not only would this go against HTTP/1.1, it would break most current HTTP servers (they would not understand "jpn" to be a synonym for "ja").

> Note also that a standard track revision of RFC 1766 is
> likely to be completed soon and that revision also
> allows ISO-639-2 three-letter code in the primary tag.
> <URL:http://www.ietf.org/internet-drafts/draft-alvestrand-lang-tag-v2-05.txt>

Is listed as a "Best Current Practice" and not standards track, even though it claims it will obsolete 1766. I do see the future need for three-letter lang codes, but I worry that people who really have a need to do this need to be careful and understand how "current practice" HTTP servers work so they don't enter "eng" for English and wonder why the server won't give them the "en" English document.

> [Only allowing up to two dash/hyphens] is too limiting and arbitrary.
> We should anticipate at least 3 to 4 hyphens. For example, in the above
> revision document, the author gives a subtag like the following:
> Region identification, such as sgn-US-MA (Martha's Vineyard Sign
> Language, which is found in the state of Massachusetts, US)
> This is just a subtag, so the total number of hyphens will surely
> exceed 3.

In the real world, not just hypothetical? Can you think of a real example, no matter how rare the language is, where more than two would be needed (a sub-dialect of Martha's Vineyard Sign Language {most definitely a MPEG or MPG server resource}), especially since the third tag is free form and can be as specific as possible? (e.g. en-US-texas, ja-JP-kyoto, i-klingon-tos {turns out klingon is not "x-", as my example above mentions, but registered with IANA})

But then again, I remember someone once said that 640K was all the memory a computer would ever need, so the restriction perhaps should go, especially since in its current form, which permits only the primary tag to be "x-" or two letters, the above example would have to be entered as "x-sgn-US-MA", which is indeed 3 dash/hyphens. And since this field is probably currently for academics and others doing "rare" language work, we probably shouldn't crimp their style.

===== PATCH FIX =====

Simply change the integer literal constant in line 265 to the max dashes you want to allow, and add 9 to maxlength (9 == "-mysubtag") in line 18. Or to be unlimited, comment out line 265 and remove the maxlength attribute from line 18.

Also, I do notice that IANA has registered things <URL:http://www.egt.ie/standards/iso639/iana-lang-assignments.html> like:

zh-yue (Cantonese)
zh-min (Min, Fuzhou, Hokkien, Amoy, Taiwanese)
zh-guoyu (Mandarin)

which are very real and popular languages and are a better solution than the current method IMO (zh-TW, zh-CN, zh-HK, etc.). So the two-letter restriction for the first subtag, which I added because I thought that no real-world languages were registered with IANA without the "i-" prefix, was incorrect.

Change the following patched code in xpfe/components/prefwindow/resources/content/pref-languages.js:

+ /* the first subtag can be either a 2 letter ISO 3166 country code,
+    a ISO 3166 user assined code (AA, QM-QZ, XA-XZ and ZZ),
+    or an IANA registered tag from 3 to 8 characters. I don't
+    think their are any IANA registered 3 to 8 letter extensions,
+    so if someone wants a custom variation, they'll have to use
+    the second subtag or use the "x" primary tag as we'll only
+    allow 2 letters here is a ISO 639 language code is the primary
+    tag.
+ */
+ if (tags.length > 1) {
+   if (tags[1].length != 2) return false;
+   if (!isAlpha(tags[1])) return false;
+   checkedTags++;
+ }

to

+ /* the first subtag can be either a 2 letter ISO 3166 country code,
+    a ISO 3166 user assigned code (AA, QM-QZ, XA-XZ and ZZ),
+    or an IANA registered tag from 3 to 8 characters.
+ */
+ if (tags.length > 1) {
+   if (tags[1].length < 2) return false;
+   if (!isAlpha(tags[1])) return false;
+   checkedTags++;
+ }
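To see what the relaxed condition changes, the two versions of the first-subtag check can be compared side by side. This is a standalone sketch, not the checked-in code: isAlpha here is a stand-in for the helper the patch calls, the wrapper function names are hypothetical, and neither version enforces the 8-character upper bound mentioned in the comment, since the quoted snippet does not.

```javascript
// Stand-in for the isAlpha() helper that the patch calls.
function isAlpha(s) {
  return /^[A-Za-z]+$/.test(s);
}

// Original condition: the first subtag must be exactly two letters.
function firstSubtagOkStrict(tag) {
  const tags = tag.split("-");
  if (tags.length > 1) {
    if (tags[1].length != 2) return false;
    if (!isAlpha(tags[1])) return false;
  }
  return true;
}

// Relaxed condition: the first subtag needs at least two letters,
// which admits IANA-registered subtags such as "yue" or "guoyu".
function firstSubtagOkRelaxed(tag) {
  const tags = tag.split("-");
  if (tags.length > 1) {
    if (tags[1].length < 2) return false;
    if (!isAlpha(tags[1])) return false;
  }
  return true;
}
```

With the strict version "zh-TW" passes but "zh-yue" and "zh-guoyu" are rejected; with the relaxed version all three pass, and a one-letter subtag such as "zh-a" is still rejected.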
Adrian, thank you first of all for the additional clarification and discussion. I think you're coming close to what I would like to see now. I would like to make additional comments, however, on several points you raised.

First of all, the need to update RFC 1766 has been around for a while and this is a known fact. The most pressing need is for languages which do not have a 2-letter representation, since ISO639-1 has been closed for further updates. People who will attempt to enter 3-letter codes are advanced users who need that form of representation. There really is no need to replace the current 2-letter code with a corresponding 3-letter one. I think the practice will settle on using the 3-letter variety only when the 2-letter one is not available for that language. Ruling out the 3-letter code for fear that some advanced users may abuse it is not a good reason for not accommodating Mozilla users who will have this need. Mozilla should be friendly to international users' needs.

> Is listed as a "Best Current Practice" and not standards
> track, even though it claims it will obsolete 1766.

The author's intent is clear. There has been sufficient discussion of the need to allow the 3-letter lang code, and the same author who wrote 1766 has undertaken to update it, because 1766 says it must be updated to allow other lang code representations. That is part of the intent of the update to RFC 1766 which will obsolete it.

Further, HTTP 1.1 does not explicitly rule out the 3-letter code. All that it says is that "if there is a 2-letter code in the primary tag", then it must be from ISO-639. It does not say "if the code is from ISO639", then it must be a 2-letter code. This latter part is left to RFC 1766. Now that it is certain that RFC 1766 will be obsoleted by the proposal of its original author, we should anticipate it and allow for the 3-letter code.

Remember, only advanced users will be using this manual fill-in feature. A large majority of users will be content with what is in the list -- which, if it uses 3-letter codes, will be restricted to only those which fill a void not covered by the 2-letter codes. At least, my current plan to update the built-in Accept Language list will not use 3-letter codes unless a 2-letter code cannot be found for that language. I think we can control its use this way.

> codes, but I worry that people who really have a need
> to do this need to be careful and understand how
> "current practice" HTTP servers work so they don't
> enter "eng" for English and wonder why the server
> won't give them the "en" English document.

I understand your worry, but I think the best use will settle on using a 3-letter code only if there is no 2-letter equivalent. I think that is also what Mr. Alvestrand should include in this revision. I'm willing to write to him, Martin Durst, M. Everson and others to revise the wording of the new RFC to recommend 3-letter codes only when the 2-letter variety is not available. My sense is that if a server does not understand a 3-letter code, it should ignore it. If someone implemented parsing code which only accepts 2-letter codes, then that code is not very good, considering that RFC 1766 lists 1*8ALPHA as part of the official syntax no matter what any additional comments say.

> In the real world, not just hypothetical? Can you think
> of a real example, no matter how rare the language is,
> where more than two would be needed?

As a linguist, I can tell you that it is very easy to come up with more than 2 hyphens. For example, I can think of a need to cite the following for a Middle Japanese Kansai dialect document:

ja-middle-jpn-kansai-jp

Note that whitespace is not allowed in the lang string, so any time you have words like East Hebrides or Old Japanese, we need to hyphenate them. It is also easy to find examples in Amerindian languages which must use hyphens more than twice. For example, in the Na-Dene family, you find a language like:

Tanana-Upper-Kuskokwim

I could then have something like

Tanana-Upper-Kuskok-CA-US

omitting "wim" from Kuskokwim due to the 8-letter limitation.

My recommendation: Don't constrain the hyphenation narrowly. There are too many real languages whose names need hyphens in more than 2 places in the subtag.
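A checker that follows only the RFC 1766 syntax (a primary tag of 1 to 8 letters followed by any number of hyphen-separated subtags of 1 to 8 letters) places no limit on the number of hyphens. The sketch below illustrates that reading of the grammar; matchesRfc1766Syntax is a hypothetical name, and the function checks syntax only, not whether any code is registered.

```javascript
// RFC 1766 syntax: Language-Tag = Primary-tag *( "-" Subtag ),
// where each tag is 1*8ALPHA. There is no cap on the number of subtags.
const RFC1766_TAG = /^[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*$/;

// Hypothetical helper: syntax check only. It does not look up whether
// the language or country codes actually exist.
function matchesRfc1766Syntax(tag) {
  return RFC1766_TAG.test(tag);
}
```

Under this check "ja-middle-jpn-kansai-jp", "sgn-US-MA" and "Tanana-Upper-Kuskok-CA-US" are all accepted, while "Tanana-Upper-Kuskokwim" is rejected because "Kuskokwim" has nine letters.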
I am ok with the latest patch. I would suggest enlarging the text field size because we allow more text in it, but that's no big deal. adrian, do you have checkin privileges? If so, I can reassign the bug to you. Otherwise let me know and I will take care of it.
adrian, shanjian, I now assume that other than the limitation not to allow "Q" values, there are no other limitations and that the HTTP 1.1/RFC1766 (& revision) syntax is accommodated. If so, we should go ahead and check this in.
pref-languages.js crashes on my computer. I spent almost a whole day tracing this, and I just found out that it has nothing to do with the changes made in this bug. But we have to hold off on the fix until the original problem gets fixed.
Shanjian Li wrote:
> pref-languages.js crashes on my computer.

Can you be more specific (like a bug report)? Perhaps others can help. This patch was working on the last nightly build when it was submitted.
Changed QA contact to ylong@netscape.com.
Keywords: intl
QA Contact: teruko → ylong
fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Nutz. I noticed one minor typo (not in the code, in a comment) in the checked-in source at <URL:http://lxr.mozilla.org/seamonkey/source/xpfe/components/prefwindow/resources/content/pref-languages.js#348>: it says "ISO 639 country code" when it should be "ISO 639 language code". The same goes for line 349 below it. (Actually, the code doesn't check whether the country OR language code exists.) Can the person who checked it in change this one word so code readers don't get confused?
Not reproducible in 12-29 Mtrunk build.
Status: RESOLVED → VERIFIED