Closed Bug 55300 Opened 24 years ago Closed 21 years ago

UTF-XXX support issues.

Categories

(Core :: Internationalization, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
mozilla1.7beta

People

(Reporter: keka, Assigned: smontagu)

References

Details

(Keywords: intl)

Attachments

(1 file, 2 obsolete files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; m18) Gecko/20000929 Netscape6/6.0b3
BuildID: 2000092908

Reproducible: Always
Steps to Reproduce:
Actual Results:
Expected Results:
UTF-7 should not be presented as an option for web pages; no web page should ever be encoded in UTF-7. However, UTF-16 (and its byte order variants) should be presented as explicit options, since these are viable and recommended encodings for web pages, especially for XML/XHTML. "UTF-7 (deprecated)" could be kept for e-mail and "news". It should explicitly say "deprecated".
*** Bug 55301 has been marked as a duplicate of this bug. ***
Reassign to ftang, cc to cata.
Assignee: nhotta → ftang
UTF-7 is needed for viewing the UTF-7 RFC itself. UTF-16 does not need to be listed explicitly, because the BOM at the beginning will tell us the page is UTF-16.
Status: UNCONFIRMED → RESOLVED
Closed: 24 years ago
Resolution: --- → WONTFIX
Verified as WONTFIX.
Status: RESOLVED → VERIFIED
"Viewing the UTF-7 RFC itself" hardly counts as an argument. UTF-7 (which is a misnomer: it's not a UTF, it's a TES, Transfer Encoding Syntax) is and remains outdated, and was only ever intended for e-mail, never for general use. Like Quoted-Printable (another TES), UTF-7 should be confined to e-mail. Further, there is no absolute requirement to use a "BOM" with UTF-16, even though the XML specification says so. It may or may not be present in plain text, and even the XML specification allows for initial detection of "UTF-16" when there is no "BOM". Finally, UTF-32 should also be supported: UTF-32BE, UTF-32, and UTF-32LE, now that UTF-32 will be a standard annex to the Unicode standard.
Status: VERIFIED → UNCONFIRMED
Resolution: WONTFIX → ---
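A minimal Python sketch of the TES point made above, using only the standard library. The sample string is the "Hi Mom" example from RFC 2152; the helper name `is_seven_bit` is made up for this illustration. It shows that UTF-7 turns text into pure 7-bit ASCII much as Quoted-Printable layered over UTF-8 does, and that ordinary ASCII bytes containing `+` change meaning once they are interpreted as UTF-7.

```python
import quopri

text = "Hi Mom -\u263a-!"  # example string from RFC 2152 (contains U+263A)

# UTF-7: non-ASCII characters become "+<modified base64 of UTF-16>-".
utf7_bytes = text.encode("utf-7")

# For comparison: a CES (UTF-8) with a separate transfer encoding (QP) on top.
utf8_qp_bytes = quopri.encodestring(text.encode("utf-8"))


def is_seven_bit(data: bytes) -> bool:
    """Hypothetical helper: True if every byte fits in 7 bits."""
    return all(b < 0x80 for b in data)


# Both forms survive a 7-bit-only channel and round-trip to the same text.
print(is_seven_bit(utf7_bytes), is_seven_bit(utf8_qp_bytes))
print(utf7_bytes.decode("utf-7") == text)
print(quopri.decodestring(utf8_qp_bytes).decode("utf-8") == text)

# The same ASCII bytes read differently under ASCII and UTF-7, because
# "+" is the UTF-7 shift character.
ambiguous = b"a+AGE-b"
print(ambiguous.decode("ascii"))  # the bytes taken literally
print(ambiguous.decode("utf-7"))  # "+AGE-" is decoded as one character
```

The last two lines are why treating arbitrary web content as UTF-7 is risky: text that was never meant as UTF-7 can still decode "successfully" into something else.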
Can we get a status update on this one? (not that I like to push ;-)
setting bug status to New.
Status: UNCONFIRMED → NEW
Ever confirmed: true
What is "wrong" with letting users view UTF-7 pages? We do support "UTF-16", "UTF-16BE", "UTF-16LE", "UTF-32", "UTF-32BE" and "UTF-32LE". What makes you say we do not support them now?
Status: NEW → ASSIGNED
Summary: character encoding support → UTF-XXX supprot issues.
a. Not all "charsets" are created equal. Netscape already recognises that by providing support for only a subset of the IANA-registered "charsets".
b. Some IETF-registered "charsets" are what UTR 17 now calls CESes, character encoding schemes.
c. However, some IETF-registered "charsets" are not CESes. Since the IETF decided to have only a few, not to be extended, "content-transfer-encoding"s (7bit, 8bit, Quoted-Printable, and Base64), all other encodings that were really content-transfer-encodings of text had to be registered as "charsets" rather than extending the set of "content-transfer-encodings". One of these, specifically aimed at 7-bit-only e-mail channels, was UTF-7. UTF-7 is misnamed, since it does not play in the same ballpark as the other UTFs. UTF-7 is, just like Latin-1+Quoted-Printable, a transfer encoding. UTR 17 calls them TESes, Transfer Encoding Syntaxes. For formal reasons, most other character encodings to which a TES has been applied are NOT supported by the browser, but only for e-mail.
d. The rôle of UTF-7 has been superseded by the increasing support for 8bit content transfer for e-mail, or even by the use of Base64 or QP together with the UTF-8 CES.
e. UTF-7 is not a part of Unicode 3.0; it has been withdrawn as a (TES) encoding that conforms to Unicode. It was never specified by ISO. UTF-7 should only be interpreted for incoming e-mail, never for outgoing e-mail, and definitely never for web pages, whether external or being edited.
f. The Unicode Consortium recommends NOT creating any more (e-mail) data in UTF-7. It was NEVER intended for any other kind of data, like web pages or plain text.
g. The character encoding menus in Netscape 6 list UTF-8 (good) and UTF-7 (bad, remove). But they do not list UTF-16(BE/LE), nor UTF-32(BE/LE). So one cannot set the browser to use, e.g., UTF-16 for a given page, nor can one save an edited page (via Netscape's editor) in, e.g., UTF-16BE.
h. The encoding called HZ is similarly a TES for 7-bit-only channels, and should not be let out of the SMTP cage (for which it was designed). HTTP is always an 8-bit channel.
i. Side remark: IE has "autodetect" for UTF-7, and it often *wrongly* concludes that a page is in UTF-7 whenever the page contains text like U+nnnn (where nnnn are hexadecimal digits).
j. If you wish to support more Unicode encodings (CES level), SCSU and UTF-EBCDIC are more worthy candidates than the TES UTF-7. (IE 5.x under Windows 2000 supports a number of EBCDIC encodings...) UTF-7 is best left forgotten.
k. XML "requires" a BOM for UTF-16 (clause 4.3.3), but then in annex F gives examples of UTF-16 XML without a BOM. Plain text in UTF-16 does not need a BOM, of course.
l. Note that the UTC recently changed the requirements on UTF-8 so that "illegal" UTF-8 sequences not only must not be emitted, but are not to be accepted either. For XML, in addition, the "irregular" UTF-8 sequences are not to be accepted (they never were allowed anywhere, specifically not by XML).
Summary: UTF-XXX supprot issues. → UTF-XXX support issues.
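As an illustration of item k, here is a rough Python sketch of the kind of first-bytes detection described in the non-normative annex F of the XML specification: check for a BOM first, then fall back to the byte pattern that `<?xml` produces in each encoding family. The function name `sniff_xml_encoding` and the table layout are assumptions made for this example, not Mozilla code, and the EBCDIC and unusual UCS-4 byte orders from the annex are left out.

```python
from typing import Optional

# BOM signatures. The 4-byte UTF-32 marks come first so that
# FF FE 00 00 is not mistaken for a UTF-16LE BOM.
_BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]

# Byte patterns of a leading "<" / "<?" / "<?xm" when there is no BOM.
_XML_PATTERNS = [
    (b"\x00\x00\x00\x3c", "utf-32-be"),
    (b"\x3c\x00\x00\x00", "utf-32-le"),
    (b"\x00\x3c\x00\x3f", "utf-16-be"),
    (b"\x3c\x00\x3f\x00", "utf-16-le"),
    # ASCII-compatible family: the encoding declaration still has to be
    # read to know the real encoding; "utf-8" is only the default here.
    (b"\x3c\x3f\x78\x6d", "utf-8"),
]


def sniff_xml_encoding(data: bytes) -> Optional[str]:
    """Guess an encoding family from the first bytes of an XML document."""
    for signature, name in _BOMS + _XML_PATTERNS:
        if data.startswith(signature):
            return name
    return None  # no BOM and no recognisable "<?xml" pattern


# A BOM-less UTF-16LE XML document is still recognised.
print(sniff_xml_encoding("<?xml version='1.0'?><a/>".encode("utf-16-le")))
```

This only works because an XML document is expected to start with `<?xml`; plain text offers no such anchor, which is where an explicit menu choice becomes useful.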
mark as future
Target Milestone: --- → Future
*** Bug 155184 has been marked as a duplicate of this bug. ***
In particular I was not able to view a UTF-16LE plain text document without a BOM without first viewing a plain text document with a valid UTF-16LE BOM.
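For BOM-less plain text like this, a browser can at best guess. The following Python sketch shows one common heuristic, counting zero bytes at even and odd offsets; it is an assumption for illustration only (the function name and threshold are invented here) and is not a description of what Mozilla does. It works for mostly-Latin text and fails for scripts such as CJK where neither byte of a code unit is usually zero, which is one argument for letting the user force the encoding.

```python
from typing import Optional


def guess_utf16_byte_order(data: bytes, threshold: float = 0.4) -> Optional[str]:
    """Guess the byte order of BOM-less UTF-16 text that is mostly Latin."""
    pairs = len(data) // 2
    if pairs == 0:
        return None
    even_zeros = sum(1 for i in range(0, pairs * 2, 2) if data[i] == 0)
    odd_zeros = sum(1 for i in range(1, pairs * 2, 2) if data[i] == 0)
    # Little-endian stores the low byte first, so for Latin text the
    # zero high byte sits at odd offsets; big-endian is the mirror image.
    if odd_zeros / pairs >= threshold and even_zeros / pairs < threshold / 4:
        return "utf-16-le"
    if even_zeros / pairs >= threshold and odd_zeros / pairs < threshold / 4:
        return "utf-16-be"
    return None  # not enough evidence either way


sample = "Hello, world".encode("utf-16-le")  # no BOM is written by this codec
order = guess_utf16_byte_order(sample)
print(order)
if order is not None:
    print(sample.decode(order))
```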
The list of open bugs related to UTF-16 shows there are quite a few special situations where it would be really convenient to have the option to force recognition of UTF-16XX. I wish this would be taken into consideration, even if the option is initially hidden so that you can only make it appear with the customize option of the View | Character Coding menu. This is something that was suggested in bug 42893 comment #26 (saving composer pages in UTF-16), so this solution could cover both problems.
Taking myself. http://www.topjobs.ie is an example of a site in UTF-16LE without BOM.
Assignee: ftang → smontagu
Status: ASSIGNED → NEW
I agree with comment #13. It should be possible to force UTF-16/32(LE|BE).
Keywords: intl
Making dependent on bug 42893, since Jungshik's patch there includes browser as well as composer.
Depends on: 42893
Attached patch: patch (obsolete)
This is an update to my patch attached to bug 42893. I'm posting it here instead of in bug 42893 because I haven't yet turned on UTF-16/32 for composer in this patch, but it does everything (except for blocking UTF-7) mentioned here. Enabling UTF-16/32 in 'SaveAsCharset' needs some more work and I'm gonna do it in bug 42893. In the meantime, I think it's better to make it possible to choose 'UTF-16/32' for web pages. Neil, this patch works perfectly well for Mozilla, but for some reason I couldn't make the 'Unicode' menu show up in Firebird (View | Character (En)coding | More). Can you take a look at what I'm missing?
I'm now reversing the relationship between this bug and bug 42893.
Blocks: 42893
No longer depends on: 42893
(In reply to comment #17)
> Neil, this patch works perfectly well for Mozilla, but for some reason
> I couldn't make 'Unicode' menu show up in Firebird (View | Character
> (En)coding | More). Can you take a look what I'm missing?

http://lxr.mozilla.org/mozilla/source/browser/base/content/browser-menubar.inc
At the end of attachment 140846 is a patch against browser/base/content/browser-menubar.inc. Even with that, it didn't work. I'll try again. BTW, I guess I should not add the 'Unicode' menu to mailview and mailedit because only UTF-8 makes sense there. Other UTFs will be listed for mailview, nonetheless.
Oops, sorry, I can't have woken up when I wrote that...
I haven't yet figured out why it doesn't work in firebird. Anyway, I'll try to get this in before 1.7beta.
Target Milestone: Future → mozilla1.7beta
Attached patch: updated patch (obsolete)
Attachment 140846 worked after I clobbered and rebuilt firefox. Anyway, I got rid of some 'pollution' (from other patches). In addition, I made the mailedit and mailview windows NOT have 'Unicode' in View | Character Coding | More. Because UTF-8 is added to the 'static' list, not having 'Unicode' in mailedit and mailview shouldn't matter. The only problem here is that there's no way to force UTF-7 in the mailview window. However, it should matter very little in practice because no one in his right mind would send out emails in UTF-7 in 2004. If it's really necessary, I can deal with it in another bug, with more fine-grained distinctions between various character encodings. (Currently, we have 'notForBrowser' and 'notForOutGoing'. I can implement 'notForMailView' and 'notForMailEdit' as originally planned back in 1999?) UTF-16/32* are hidden by default for mailedit but can be exposed by 'customizing' the list (by a die-hard user). This problem can also be handled by adding more categories as mentioned above.
Attachment #140846 - Attachment is obsolete: true
Attachment #142678 - Attachment is obsolete: true
Comment on attachment 142680: update (with a patch from another bug removed)
Asking for r/sr.
Attachment #142680 - Flags: superreview?(blizzard)
Attachment #142680 - Flags: review?(neil.parkwaycc.co.uk)
Attachment #142680 - Flags: superreview?(blizzard) → superreview+
Comment on attachment 142680: update (with a patch from another bug removed)

>+utf-8.LangGroup = x-unicode
> utf-16be.LangGroup = x-unicode
> utf-16le.LangGroup = x-unicode
> utf-32be.LangGroup = x-unicode
> utf-32le.LangGroup = x-unicode
> utf-7.LangGroup = x-unicode
>-utf-8.LangGroup = x-unicode

I must admit I don't see the point of this :-)

> <!ENTITY charsetMenuMore2.accesskey "E">
> <!ENTITY charsetMenuMore3.label "East Asian">
> <!ENTITY charsetMenuMore3.accesskey "A">
> <!ENTITY charsetMenuMore4.label "SE &amp; SW Asian">
> <!ENTITY charsetMenuMore4.accesskey "S">
> <!ENTITY charsetMenuMore5.label "Middle Eastern">
> <!ENTITY charsetMenuMore5.accesskey "m">
>+<!ENTITY charsetMenuUnicode.label "Unicode">
>+<!ENTITY charsetMenuUnicode.accesskey "u">

Should be an uppercase U, to match the U of Unicode... feel free to fix the "m" too... Also, speaking of accesskeys, you might want to fix a missing accesskey in charsetOverlay.xul; these three lines should be identical, but the middle one is missing its accesskey:

/xpfe/global/resources/content/charsetOverlay.xul, line 38 -- <menu label="&charsetMenuMore.label;" accesskey="&charsetMenuMore.accesskey;" datasources="rdf:charset-menu" ref="NC:BrowserMoreCharsetMenuRoot">
/xpfe/global/resources/content/charsetOverlay.xul, line 137 -- <menu label="&charsetMenuMore.label;" datasources="rdf:charset-menu" ref="NC:BrowserMoreCharsetMenuRoot">
/xpfe/global/resources/content/charsetOverlay.xul, line 253 -- <menu label="&charsetMenuMore.label;" accesskey="&charsetMenuMore.accesskey;" datasources="rdf:charset-menu" ref="NC:BrowserMoreCharsetMenuRoot">

Don't forget to get ff moa.
Attachment #142680 - Flags: review?(neil.parkwaycc.co.uk) → review+
a=pch
thanks all. patch checked in.
Status: NEW → RESOLVED
Closed: 24 years ago → 21 years ago
Resolution: --- → FIXED