Bugzilla

Comment 3

•

25 years ago

UTF-7 is needed for viewing the UTF-7 RFC itself. UTF-16 is not needed explicitly because the BOM at the beginning will tell us it is UTF-16.
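
For illustration, the BOM check this comment relies on could look roughly like the Python sketch below. It is not Mozilla's detector: `sniff_bom` and `_BOMS` are hypothetical names, and only the BOM byte values (taken from Python's `codecs` module) are standard. The byte sequence FF FE 00 00 is ambiguous in principle (it could be a UTF-16LE BOM followed by U+0000); the sketch assumes UTF-32LE.

```python
import codecs

# Ordered so that UTF-32 is tested before UTF-16: the UTF-32LE BOM
# (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) if no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0
```

A caller would strip `bom_length` bytes and decode the rest with the returned encoding. When the result is `(None, 0)`, the BOM alone cannot identify the encoding.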

Status: UNCONFIRMED → RESOLVED

Closed: 25 years ago

Resolution: --- → WONTFIX

Teruko Kobayashi

Comment 4

•

25 years ago

Verified as Wontfix.

Status: RESOLVED → VERIFIED

keka

Reporter

Comment 5

•

25 years ago

"Viewing the UTF-7 RFC itself" hardly counts as an argument. UTF-7 (which is a misnomer, it's not a UTF, it's a TES, Transfer Encoding Syntax) is and remain outdated, and was ever intended only for e-mail, never for general use. Like Quoted-Printable (another TES), UTF-7 should be confined to e-mail. Further, there is no complete requirement to use a "BOM" with UTF-16, even though the XML specification says so. It may or may not be present with plain text, and even the XML specification allows for "UTF-16" initial detection when there is no "BOM". Finally, UTF-32 should also be supported: UTF-32BE, UTF-32, and UTF-32LE, now that UTF-32 will be a standard annex to the Unicode standard.

Status: VERIFIED → UNCONFIRMED

Resolution: WONTFIX → ---

Fabian Guisset

Comment 6

•

25 years ago

Can we get a status update on this one? (not that I like to push ;-)

Asa Dotzler [:asa]

Comment 7

•

25 years ago

Setting bug status to New.

Status: UNCONFIRMED → NEW

Ever confirmed: true

Comment 8

•

25 years ago

What is "wrong" with letting users view UTF-7 pages? We do support "UTF-16", "UTF-16BE", "UTF-16LE", "UTF-32", "UTF-32BE" and "UTF-32LE". What makes you say we do not support them now?

Updated

•

25 years ago

Status: NEW → ASSIGNED

Summary: character encoding support → UTF-XXX supprot issues.

keka

Reporter

Comment 9

•

25 years ago

a. Not all "charsets" are created equal. Netscape already recognises that by providing support for only a subset of the IANA-registered "charsets".
b. Some IETF-registered "charsets" are what UTR 17 now calls CESes, character encoding schemes.
c. However, some IETF-registered "charsets" are not CESes. Since the IETF decided to have only a few, not to be extended, "content-transfer-encoding"s (7bit, 8bit, Quoted-Printable, and Base64), all other encodings that were really content-transfer-encodings of text had to be registered as "charsets" rather than extending the set of "content-transfer-encodings". One of these, specifically aimed at 7-bit-only channel e-mails, was UTF-7. UTF-7 is misnamed, since it does not play in the same ballpark as the other UTFs. UTF-7 is, just like Latin-1+Quoted-Printable, a transfer encoding. UTR 17 calls them TESes, Transfer Encoding Syntaxes. For formal reasons, most other character encodings to which a TES has been applied are NOT supported by the browser, but only for e-mail.
d. The rôle of UTF-7 has been made obsolete by the increasing support for 8bit content-transfer for e-mail, or even the use of Base64 or QP together with the UTF-8 CES.
e. UTF-7 is not a part of Unicode 3.0; it has been withdrawn as a (TES) encoding that conforms to Unicode. It was never specified by ISO. UTF-7 should only be interpreted for incoming e-mail, never for outgoing e-mail, and definitely never for web pages, whether external or being edited.
f. The Unicode Consortium recommends NOT creating any more (e-mail) data in UTF-7. It was NEVER intended for any other kind of data, like web pages or plain text.
g. The character encoding menus in Netscape 6 list UTF-8 (good) and UTF-7 (bad, remove). But they do not list UTF-16(BE/LE), nor UTF-32(BE/LE). So one cannot set the browser to use, e.g., UTF-16 for a given page, nor can one save an edited page (via Netscape's editor) in, e.g., UTF-16BE.
h. The encoding called HZ is similarly a TES for 7-bit-only channels, and should not be let out of the SMTP cage (for which it was designed). HTTP is always an 8-bit channel.
i. Side remark: IE has "autodetect" for UTF-7, and it often *wrongly* concludes that a page is in UTF-7 whenever it contains text like U+nnnn (where nnnn are hexadecimal digits).
j. If you wish to support more Unicode encodings (CES level), SCSU and UTF-EBCDIC are more worthy candidates than the TES UTF-7. (IE 5.x under Windows 2000 supports a number of EBCDIC encodings...) UTF-7 is best left forgotten.
k. XML "requires" a BOM for UTF-16 (clause 4.3.3), but then in annex F gives examples of UTF-16 XML without BOM. Plain text in UTF-16 does not need a BOM, of course.
l. Note that the UTC recently changed the requirements on UTF-8 so that "illegal" UTF-8 sequences not only must not be emitted, but are not to be accepted either. For XML, in addition, the "irregular" UTF-8 sequences are not to be accepted (they never were allowed anywhere, specifically not by XML).
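
Two of the points above (UTF-7 as a 7-bit transfer syntax, and strict rejection of malformed UTF-8) can be tried out with a short Python sketch. It is an illustration only, not Mozilla code, and the function names are hypothetical. It also assumes that the "illegal" and "irregular" sequences of item l correspond to overlong (non-shortest-form) encodings and separately encoded surrogates, both of which Python's strict UTF-8 decoder refuses.

```python
def utf7_wire_form(text: str) -> bytes:
    """Encode text as UTF-7; every byte of the result is 7-bit ASCII."""
    return text.encode("utf-7")


def is_strict_utf8(data: bytes) -> bool:
    """Return True only if data decodes as well-formed UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# '+' opens a base64-coded run of UTF-16 code units and '-' closes it,
# which is why ordinary text containing '+' can confuse a UTF-7 autodetector.
sample = b"+AOk-".decode("utf-7")  # U+00E9, LATIN SMALL LETTER E WITH ACUTE
```

For example, `is_strict_utf8(b"\xc0\xaf")` is false because C0 AF is an overlong encoding of `/`, while the same check on `"é".encode("utf-8")` is true.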

Summary: UTF-XXX supprot issues. → UTF-XXX support issues.

Comment 10

•

25 years ago

mark as future

Target Milestone: --- → Future

Comment 11

•

23 years ago

*** Bug 155184 has been marked as a duplicate of this bug. ***

Comment 12

•

23 years ago

In particular, I was not able to view a BOM-less UTF-16LE plain text document without first viewing a plain text document with a valid UTF-16LE BOM.
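
When there is no BOM and no declaration, a detector can only guess. One common heuristic, sketched below in Python, counts zero bytes at even and odd offsets, since mostly-ASCII text in UTF-16 has a zero high byte in nearly every code unit. This is an assumption-laden illustration, not what Mozilla does: the function name and the 0.4 threshold are made up, and the guess fails for text dominated by non-Latin scripts, which is one reason a manual override in the menu is useful.

```python
def guess_utf16_endianness(data: bytes, threshold: float = 0.4):
    """Guess 'utf-16-le' or 'utf-16-be' for BOM-less data, else None."""
    pairs = len(data) // 2
    if pairs == 0:
        return None
    even_zeros = data[0:pairs * 2:2].count(0)  # high bytes if big-endian
    odd_zeros = data[1:pairs * 2:2].count(0)   # high bytes if little-endian
    even_ratio = even_zeros / pairs
    odd_ratio = odd_zeros / pairs
    if odd_ratio >= threshold and even_ratio < threshold:
        return "utf-16-le"
    if even_ratio >= threshold and odd_ratio < threshold:
        return "utf-16-be"
    return None  # not enough evidence either way
```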

Jean-Marc Desperrier

Comment 13

•

23 years ago

The list of open bugs related to UTF-16 shows there are quite a few special situations where it would be really convenient to have the option to force recognition of UTF-16XX. I wish this would be taken into consideration, even if the option is initially hidden so that you can only make it appear with the customize option of the view/character coding menu. This is something that was suggested in bug 42893 comment #26 (saving composer pages in UTF-16), so this solution could cover both problems.

Simon Montagu :smontagu

Assignee

Comment 14

•

22 years ago

Taking myself. http://www.topjobs.ie is an example of a site in UTF-16LE without BOM.

Assignee: ftang → smontagu

Status: ASSIGNED → NEW

Comment 15

•

22 years ago

I agree with comment #13. It should be possible to force UTF-16/32(LE|BE).

Keywords: intl

Simon Montagu :smontagu

Assignee

Comment 16

•

22 years ago

Making dependent on bug 42893, since Jungshik's patch there includes browser as well as composer.

Depends on: 42893

Comment 17

•

22 years ago

Attached patch: patch (obsolete)

This is an update to my patch attached to bug 42893. I'm posting it here instead of bug 42893 because I haven't yet turned on UTF-16/32 for composer in this patch, but it does everything else mentioned here (except for blocking UTF-7). Enabling UTF-16/32 in 'SaveAsCharset' needs some more work and I'm gonna do it in bug 42893. In the meantime, I think it's better to make it possible to choose 'UTF-16/32' for web pages. Neil, this patch works perfectly well for Mozilla, but for some reason I couldn't make 'Unicode' menu show up in Firebird (View | Character (En)coding | More). Can you take a look what I'm missing?

Comment 18

•

22 years ago

I'm now reversing the relationship between this bug and bug 42893.

Blocks: 42893

No longer depends on: 42893

Comment 19

•

22 years ago

(In reply to comment #17)
> Neil, this patch works perfectly well for Mozilla, but for some reason
> I couldn't make 'Unicode' menu show up in Firebird (View | Character
> (En)coding | More). Can you take a look what I'm missing?
>
http://lxr.mozilla.org/mozilla/source/browser/base/content/browser-menubar.inc

Comment 20

•

22 years ago

At the end of attachment 140846 [details] [diff] [review] is a patch against browser/base/content/browser-menubar.inc. Even with that, it didn't work. I'll try again. BTW, I guess I should not add 'Unicode' menu to mailview and mailedit because only UTF-8 makes sense there. Other UTFs will be listed for mailview, nonetheless.

Comment 21

•

22 years ago

Oops, sorry, I can't have woken up when I wrote that...

Comment 22

•

22 years ago

I haven't yet figured out why it doesn't work in firebird. Anyway, I'll try to get this in before 1.7beta.

Target Milestone: Future → mozilla1.7beta

Comment 23

•

22 years ago

Attached patch: updated patch (obsolete)

Attachment 140846 worked after I clobbered and rebuilt firefox. Anyway, I got rid of some 'pollution' (from other patches). In addition, I made the mailedit and mailview windows NOT have 'Unicode' in View | Character Coding | More. Because UTF-8 is added to the 'static' list, not having 'Unicode' in mailedit and mailview shouldn't matter. The only problem here is that there's no way to force UTF-7 in the mailview window. However, it should matter very little in practice because no one in his right mind would send out emails in UTF-7 in 2004. If it's really necessary, I can deal with it in another bug, with more fine-grained distinctions between various character encodings. (Currently, we have 'notForBrowser' and 'notForOutGoing'. I can implement 'notForMailView' and 'notForMailEdit' as originally planned back in 1999?) UTF-16/32* are hidden by default for mailedit but can be exposed by 'customizing' the list (by a die-hard user). This problem can also be handled by adding more categories as mentioned above.

Updated

•

22 years ago

Attachment #140846 - Attachment is obsolete: true

Comment 24

•

22 years ago

Attached patch: update (with a patch from another bug removed)

Attachment #142678 - Attachment is obsolete: true

Christopher Blizzard (:blizzard)

Comment 25

•

22 years ago

Comment on attachment 142680 [details] [diff] [review] update (with a patch from another bug removed)

Asking for r/sr.

Attachment #142680 - Flags: superreview?(blizzard)

Attachment #142680 - Flags: review?(neil.parkwaycc.co.uk)

Updated

•

22 years ago

Attachment #142680 - Flags: superreview?(blizzard) → superreview+

Comment 26

•

22 years ago

Comment on attachment 142680 [details] [diff] [review] update (with a patch from another bug removed)

>+utf-8.LangGroup = x-unicode
> utf-16be.LangGroup = x-unicode
> utf-16le.LangGroup = x-unicode
> utf-32be.LangGroup = x-unicode
> utf-32le.LangGroup = x-unicode
> utf-7.LangGroup = x-unicode
>-utf-8.LangGroup = x-unicode

I must admit I don't see the point of this :-)

> <!ENTITY charsetMenuMore2.accesskey "E">
> <!ENTITY charsetMenuMore3.label "East Asian">
> <!ENTITY charsetMenuMore3.accesskey "A">
> <!ENTITY charsetMenuMore4.label "SE & SW Asian">
> <!ENTITY charsetMenuMore4.accesskey "S">
> <!ENTITY charsetMenuMore5.label "Middle Eastern">
> <!ENTITY charsetMenuMore5.accesskey "m">
>+<!ENTITY charsetMenuUnicode.label "Unicode">
>+<!ENTITY charsetMenuUnicode.accesskey "u">

Should be an uppercase U, to match the U of Unicode... feel free to fix the "m" too... Also, speaking of accesskeys, you might want to fix a missing accesskey in charsetOverlay.xul; these three lines should be identical, but the middle one is missing its accesskey:

/xpfe/global/resources/content/charsetOverlay.xul, line 38 -- <menu label="&charsetMenuMore.label;" accesskey="&charsetMenuMore.accesskey;" datasources="rdf:charset-menu" ref="NC:BrowserMoreCharsetMenuRoot">
/xpfe/global/resources/content/charsetOverlay.xul, line 137 -- <menu label="&charsetMenuMore.label;" datasources="rdf:charset-menu" ref="NC:BrowserMoreCharsetMenuRoot">
/xpfe/global/resources/content/charsetOverlay.xul, line 253 -- <menu label="&charsetMenuMore.label;" accesskey="&charsetMenuMore.accesskey;" datasources="rdf:charset-menu" ref="NC:BrowserMoreCharsetMenuRoot">

Don't forget to get ff moa.
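
A missing accesskey like the one in charsetOverlay.xul can be caught mechanically. The Python sketch below is a hypothetical lint helper, not part of the Mozilla tree: it scans XUL source text for `<menu ...>` opening tags that carry a `label` but no `accesskey`. It uses a regular expression rather than an XML parser because the unresolved `&...;` entities would make a plain XML parse fail.

```python
import re

# Matches <menu ...> opening tags only; the word boundary keeps it from
# matching <menupopup> or <menuitem>.
_MENU_TAG = re.compile(r"<menu\b[^>]*>")


def menus_missing_accesskey(xul_source: str):
    """Return the <menu> opening tags that have a label but no accesskey."""
    missing = []
    for tag in _MENU_TAG.findall(xul_source):
        if "label=" in tag and "accesskey=" not in tag:
            missing.append(tag)
    return missing
```

Run against the three lines quoted in the review, this would report only the one from line 137.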

Attachment #142680 - Flags: review?(neil.parkwaycc.co.uk) → review+

Pierre Chanial

Comment 27

•

22 years ago

a=pch