Closed Bug 55300 Opened 24 years ago Closed 20 years ago

UTF-XXX support issues.


(Core :: Internationalization, enhancement, P3)






(Reporter: keka, Assigned: smontagu)



(Keywords: intl)


(1 file, 2 obsolete files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; m18) Gecko/20000929
BuildID:    2000092908


Reproducible: Always
Steps to Reproduce:

Actual Results:  							

Expected Results:  							

UTF-7 should not be presented as an option for web pages; no web page should ever
be encoded in UTF-7.  However, UTF-16 (and its byte order variants) should be
presented as explicit options, since they are viable and recommended
encodings for web pages, esp. for XML/XHTML.

"UTF-7 (deprecated)" could be kept for e-mail and "news".  It should
explicitly say "deprecated".
*** Bug 55301 has been marked as a duplicate of this bug. ***
Reassign to ftang, cc to cata.
Assignee: nhotta → ftang
UTF-7 is needed for viewing UTF-7 RFC itself
UTF-16 is not needed explicitly because the BOM in the beginning will tell us it
is UTF-16.
Closed: 24 years ago
Resolution: --- → WONTFIX
Verified as WONTFIX.
"Viewing the UTF-7 RFC itself" hardly counts as an argument.
UTF-7 (which is a misnomer, it's not a UTF, it's a TES, Transfer Encoding
Syntax) is and remains outdated, and was only ever intended for e-mail,
never for general use.  Like Quoted-Printable (another TES), UTF-7 should be
confined to e-mail.
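
As an illustration of the TES point above, here is a minimal Python sketch (not part of any patch on this bug). It compares UTF-7, which folds the 7-bit transfer step into the "charset", with the layered alternative of UTF-8 followed by Quoted-Printable. The sample string and the helper name `is_seven_bit` are assumptions made up for the example.

```python
import quopri

text = "Grüße"  # sample text with non-ASCII characters (illustrative only)

# UTF-7: character encoding and 7-bit transfer encoding in one step.
utf7_bytes = text.encode("utf-7")

# Layered approach: UTF-8 as the encoding scheme, then
# Quoted-Printable applied separately as the transfer encoding.
utf8_bytes = text.encode("utf-8")
qp_bytes = quopri.encodestring(utf8_bytes)


def is_seven_bit(data: bytes) -> bool:
    """Return True if every byte fits in 7 bits (hypothetical helper)."""
    return all(b < 0x80 for b in data)


print(utf7_bytes, is_seven_bit(utf7_bytes))
print(qp_bytes, is_seven_bit(qp_bytes))
print(utf8_bytes, is_seven_bit(utf8_bytes))
```

Both 7-bit forms survive a 7-bit-only channel, but only the layered form keeps the transfer step separate from the character encoding, which is the distinction the comment draws.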

Further, there is no absolute requirement to use a "BOM" with UTF-16,
even though the XML specification says so.  It may or may not be present
with plain text, and even the XML specification allows for "UTF-16" initial
detection when there is no "BOM".

Finally, UTF-32 should also be supported: UTF-32BE, UTF-32, and UTF-32LE,
now that UTF-32 will be a standard annex to the Unicode standard.
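
To make the BOM discussion concrete, here is a minimal Python sketch of BOM sniffing. It is not Mozilla's detection code; the function name `sniff_bom` and the table `_BOMS` are hypothetical. It shows that a BOM identifies the encoding when present, and that nothing can be concluded when it is absent.

```python
import codecs

# Check longer BOMs first: the UTF-32LE BOM (FF FE 00 00) begins with
# the same two bytes as the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return (encoding_name, bom_length), or (None, 0) if no BOM is found."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


with_bom = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
without_bom = "hi".encode("utf-16-le")
print(sniff_bom(with_bom))
print(sniff_bom(without_bom))
```

One caveat of this sketch: UTF-16LE text that starts with a BOM followed by U+0000 is byte-identical to a UTF-32LE BOM, so the ordering above is a heuristic, not a proof.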
Resolution: WONTFIX → ---
Can we get a status update on this one? (not that I like to push ;-)
setting bug status to New.  
Ever confirmed: true
What is "wrong" with letting users view UTF-7 pages?
We do support "UTF-16", "UTF-16BE", "UTF-16LE", "UTF-32", "UTF-32BE" and
"UTF-32LE". What makes you say we do not support them now?
Summary: character encoding support → UTF-XXX supprot issues.
a. Not all "charsets" are created equal.  Netscape recognises
   that already by just providing support for a subset of the
   IANA registered "charsets".

b. Some IETF registered "charsets" are what UTR 17 now calls
   CESes, character encoding schemes.

c. However, some IETF registered "charsets" are not CESes.  Since
   IETF decided to have only a few, not to be extended,
   "content-transfer-encoding"s (7bit, 8bit, Quoted-Printable, and
   Base64), all other encodings that were really
   content-transfer-encodings of text had to be registered as
   "charsets" rather than extending the set of
   "content-transfer-encodings".  One of these, specifically aimed
   at 7-bit-only channel e-mails, was UTF-7.  UTF-7 is misnamed,
   since it does not play in the same ballpark as the other UTFs.
   UTF-7 is, just like Latin-1+Quoted-Printable, a transfer
   encoding.  UTR 17 calls them TESes, Transfer Encoding Syntaxes.
   For formal reasons most other character encodings where a TES
   has been applied are NOT supported by the browser, but only
   for e-mail.

d. The rôle of UTF-7 has been outdated by the increasing
   support for 8bit content-transfer for e-mail, or even
   the use of Base64 or QP together with the UTF-8 CES.

e. UTF-7 is not a part of Unicode 3.0; it has been withdrawn
   as a (TES) encoding that conforms to Unicode.  It was
   never specified by ISO.  UTF-7 should only be interpreted
   for incoming e-mail, never for outgoing e-mail, definitely
   never for web pages, whether external or being edited.

f. Unicode consortium recommends to NOT create any more
   (e-mail) data in UTF-7.  It NEVER was intended for
   any other kind of data, like web pages or plain text.

g. The character encoding menus in Netscape 6 list
   UTF-8 (good), and UTF-7 (bad, remove).  But they do not
   list UTF-16(BE/LE), nor UTF-32(BE/LE).  So one cannot
   set the browser to use, e.g., UTF-16 for a given page,
   nor can one save an edited page (via Netscape's editor)
   in, e.g., UTF-16BE.

h. The encoding called HZ is similarly a TES for 7-bit only
   channels, and should not be let out of the SMTP cage
   (for which they were designed). HTTP is always an 8-bit

i. Side remark: IE has "autodetect" for UTF-7, and it often
   *wrongly* concludes that a page is in UTF-7 whenever texts
   like U+nnnn (where nnnn are hexadecimal digits) appear.

j. If you wish to support more Unicode encodings (CES level)
   SCSU and UTF-EBCDIC are more worthy candidates than the
   TES UTF-7.  (IE 5.x under Windows 2000 supports a number
   of EBCDIC encodings...)  UTF-7 is best left forgotten.

k. XML "requires" a BOM for UTF-16 (clause 4.3.3), but then
   in annex F gives examples of UTF-16 XML without BOM.  Plain
   text in UTF-16 does not need a BOM, of course.

l. Note that the UTC recently changed requirements on UTF-8
   so that "illegal" UTF-8 sequences not only must not be
   emitted, but are not to be accepted either.  For XML, in
   addition, the "irregular" UTF-8 sequences are not to be
   accepted (they never were allowed anywhere, specifically
   not by XML).
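
As a rough illustration of item l, the following Python sketch checks byte sequences with Python's built-in strict UTF-8 decoder. That decoder stands in for a conforming decoder here; it is not the Mozilla converter. The helper name `is_strict_utf8` is hypothetical.

```python
def is_strict_utf8(data: bytes) -> bool:
    """Return True only if data is well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Overlong ("illegal") two-byte form of "/" (U+002F).
overlong_slash = b"\xc0\xaf"

# A supplementary character written as two three-byte surrogate
# sequences instead of one four-byte sequence.
surrogate_pair = b"\xed\xa0\x80\xed\xb0\x80"

# The well-formed four-byte encoding of U+10000.
well_formed = "\U00010000".encode("utf-8")

print(is_strict_utf8(overlong_slash))
print(is_strict_utf8(surrogate_pair))
print(is_strict_utf8(well_formed))
```

Rejecting the first two inputs rather than silently mapping them to characters is what keeps overlong forms from bypassing filters that only look for the shortest encoding.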

Summary: UTF-XXX supprot issues. → UTF-XXX support issues.
mark as future
Target Milestone: --- → Future
*** Bug 155184 has been marked as a duplicate of this bug. ***
In particular I was not able to view a UTF-16LE plain text document without a
BOM without first viewing a plain text document with a valid UTF-16LE BOM.
The list of open bugs related to UTF-16 shows there are quite a few special
situations where it would be really convenient to have the option to force
recognition of UTF-16XX.
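
A minimal Python sketch of why a forced override helps for BOM-less UTF-16 follows. The function `decode_text`, its `forced` parameter and the `latin-1` fallback are assumptions made up for this example, and UTF-32 handling is left out for brevity; none of it reflects the browser's actual code.

```python
import codecs
from typing import Optional


def decode_text(data: bytes, forced: Optional[str] = None,
                fallback: str = "latin-1") -> str:
    """Decode bytes, letting a user-forced encoding win (hypothetical helper)."""
    if forced is not None:
        # The user picked an encoding from the menu: trust it.
        return data.decode(forced)
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic "utf-16" codec reads and strips the BOM itself.
        return data.decode("utf-16")
    # No BOM and no override: guess, possibly wrongly.
    return data.decode(fallback)


raw = "héllo".encode("utf-16-le")          # UTF-16LE without a BOM
print(repr(decode_text(raw)))              # mis-decoded via the fallback
print(decode_text(raw, forced="utf-16-le"))
print(decode_text(codecs.BOM_UTF16_LE + raw))
```

Without the `forced` argument the BOM-less document has no way to be read correctly, which matches the situation described in the comment above.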

I'd wish that this would be taken into consideration, even if the option is
initially hidden so that you can only make it appear with the customize option
of the view/character coding menu.

This is something that was suggested in bug 42893 comment #26 (saving composer
pages in UTF-16), so this solution could cover both problems.
Taking myself. is an example of a site in UTF-16LE without
Assignee: ftang → smontagu
I agree with comment #13. It should be possible to force UTF-16/32(LE|BE).
Keywords: intl
Making dependent on bug 42893, since Jungshik's patch there includes browser as
well as composer.
Depends on: 42893
Attached patch patch (obsolete) — Splinter Review
This is an update to my patch attached to bug 42893. I'm posting it here
instead of bug 42893 because I haven't yet turned on UTF-16/32 for composer in
this patch, but it does everything (except for blocking UTF-7) mentioned here.
Enabling UTF-16/32 in 'SaveAsCharset' needs some more work and I'm gonna do it
in bug 42893. In the meantime, I think it's better to make it possible to
choose 'UTF-16/32' for web pages.

Neil, this patch works perfectly well for Mozilla, but for some reason I
couldn't make 'Unicode' menu show up in Firebird (View | Character (En)coding |
More). Can you take a look what I'm missing?
I'm now reversing the relationship between this bug and bug 42893.
Blocks: 42893
No longer depends on: 42893
(In reply to comment #17)
> Neil, this patch works perfectly well for Mozilla, but for some reason
> I couldn't make 'Unicode' menu show up in Firebird (View | Character
> (En)coding | More). Can you take a look what I'm missing? 
At the end of attachment 140846 [details] [diff] [review] is a patch against
Even with that, it didn't work. I'll try again.

BTW, I guess I should not add 'Unicode' menu to mailview and mailedit because
only UTF-8 makes sense there. Other UTF's  will be listed for mailview,
Oops, sorry, I can't have woken up when I wrote that...
I haven't yet figured out why it doesn't work in firebird. Anyway, I'll try to
get this in before 1.7beta. 
Target Milestone: Future → mozilla1.7beta
Attached patch updated patch (obsolete) — Splinter Review
attachment 140846 worked after I clobbered and rebuilt firefox. Anyway, I got
rid of some 'pollution' (from other patches).

In addition, I made mailedit and mailview window NOT have 'Unicode' in View |
Character Coding | More. Because UTF-8 is added to the 'static' list, not
having 'Unicode' in mailedit and mailview shouldn't matter. The only problem
here is that there's no way to force UTF-7 in mailview window. However, it
should matter very little in practice because no one in his sane mind would
send out emails in UTF-7 in 2004. If it's really necessary, I can deal with it
in another bug (with more fine-grained distinctions between various character
encodings. Currently, we have 'notForBrowser' and 'notForOutGoing'. I can
implement 'notForMailView' and 'notForMailEdit' as originally planned back in

UTF-16/32* are hidden by default for mailedit but can be exposed by
'customizing' the list (by a die-hard user). This problem can also be handled
by adding more categories as mentioned above.
Attachment #140846 - Attachment is obsolete: true
Attachment #142678 - Attachment is obsolete: true
Comment on attachment 142680 [details] [diff] [review]
update (with a patch from another bug removed)

asking for r/sr.
Attachment #142680 - Flags: superreview?(blizzard)
Attachment #142680 - Flags: review?(
Attachment #142680 - Flags: superreview?(blizzard) → superreview+
Comment on attachment 142680 [details] [diff] [review]
update (with a patch from another bug removed)

>+utf-8.LangGroup                    = x-unicode
> utf-16be.LangGroup                 = x-unicode
> utf-16le.LangGroup                 = x-unicode
> utf-32be.LangGroup                 = x-unicode
> utf-32le.LangGroup                 = x-unicode
> utf-7.LangGroup                    = x-unicode
>-utf-8.LangGroup                    = x-unicode
I must admit I don't see the point of this :-)

> <!ENTITY charsetMenuMore2.accesskey       "E">
> <!ENTITY charsetMenuMore3.label       "East Asian">
> <!ENTITY charsetMenuMore3.accesskey       "A">
> <!ENTITY charsetMenuMore4.label       "SE &amp; SW Asian">
> <!ENTITY charsetMenuMore4.accesskey       "S">
> <!ENTITY charsetMenuMore5.label       "Middle Eastern">
> <!ENTITY charsetMenuMore5.accesskey       "m">
>+<!ENTITY charsetMenuUnicode.label       "Unicode">
>+<!ENTITY charsetMenuUnicode.accesskey       "u">
Should be an uppercase U, to match the U of Unicode... feel free to fix the "m"
too... also, speaking of accesskeys, you might want to fix a missing accesskey
in charsetOverlay.xul; these three lines should be identical, but the middle
one is missing its accesskey:

/xpfe/global/resources/content/charsetOverlay.xul, line 38 -- <menu
label="&charsetMenuMore.label;" accesskey="&charsetMenuMore.accesskey;"
datasources="rdf:charset-menu" ref="NC:BrowserMoreCharsetMenuRoot">
/xpfe/global/resources/content/charsetOverlay.xul, line 137 -- <menu
label="&charsetMenuMore.label;" datasources="rdf:charset-menu"
/xpfe/global/resources/content/charsetOverlay.xul, line 253 -- <menu
label="&charsetMenuMore.label;" accesskey="&charsetMenuMore.accesskey;"
datasources="rdf:charset-menu" ref="NC:BrowserMoreCharsetMenuRoot">

Don't forget to get ff moa.
Attachment #142680 - Flags: review?( → review+
thanks all. patch checked in.
Closed: 24 years ago → 20 years ago
Resolution: --- → FIXED