Bug 55300: UTF-XXX support issues.
Status: RESOLVED FIXED (opened 24 years ago, closed 21 years ago)
Categories: Core :: Internationalization, enhancement, P3
Target Milestone: mozilla1.7beta
People: Reporter: keka, Assigned: smontagu
Keywords: intl
Attachments: 1 file, 2 obsolete files
32.53 KB patch (neil: review+, blizzard: superreview+)
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; WinNT4.0; en-US; m18) Gecko/20000929
Netscape6/6.0b3
BuildID: 2000092908
Reproducible: Always
Steps to Reproduce:
Actual Results:
Expected Results:
UTF-7 should not be presented as an option for web pages; no web page should ever
be encoded in UTF-7. However, UTF-16 (and byte order variants) should be
presented as explicit options, since those are viable and recommended
encodings for web pages, esp. for XML/XHTML.
"UTF-7 (deprecated)" could be kept for e-mail and "news". It should
explicitly say "deprecated".
Comment 3•24 years ago
UTF-7 is needed for viewing the UTF-7 RFC itself.
UTF-16 is not needed explicitly because the BOM at the beginning will tell us it
is UTF-16.
Status: UNCONFIRMED → RESOLVED
Closed: 24 years ago
Resolution: --- → WONTFIX
"Viewing the UTF-7 RFC itself" hardly counts as an argument.
UTF-7 (which is a misnomer, it's not a UTF, it's a TES, Transfer Encoding
Syntax) is and remains outdated, and was only ever intended for e-mail,
never for general use. Like Quoted-Printable (another TES), UTF-7 should be
confined to e-mail.
Further, there is no absolute requirement to use a "BOM" with UTF-16,
even though the XML specification says so. It may or may not be present
with plain text, and even the XML specification allows for "UTF-16" initial
detection when there is no "BOM".
Finally, UTF-32 should also be supported: UTF-32BE, UTF-32, and UTF-32LE,
now that UTF-32 will be a standard annex to the Unicode standard.
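To make the BOM discussion concrete, here is a minimal Python sketch of BOM sniffing with a user-forced fallback for BOM-less data. It is an illustration only, not Mozilla's detection code; the names `sniff_bom` and `decode_text` and the default fallback are assumptions made for this example.

```python
import codecs

# Order matters: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE
# BOM (FF FE), so the longer signatures have to be tested first.
_BOMS = [
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length), or (None, 0) when no BOM is present."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_text(data: bytes, forced: str = "utf-16-le") -> str:
    """Decode using the BOM if there is one, else the encoding the user forced."""
    encoding, skip = sniff_bom(data)
    return data[skip:].decode(encoding or forced)
```

Without a BOM the sniffer has nothing to go on, which is why an explicit menu choice (the `forced` argument here) is still needed for BOM-less UTF-16 and UTF-32.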
Status: VERIFIED → UNCONFIRMED
Resolution: WONTFIX → ---
Comment 6•24 years ago
Can we get a status update on this one? (not that I like to push ;-)
Comment 8•24 years ago
What is "wrong" with letting users view UTF-7 pages?
We do support "UTF-16", "UTF-16BE", "UTF-16LE", "UTF-32", "UTF-32BE" and
"UTF-32LE". What makes you say we do not support them now?
Updated•24 years ago
Status: NEW → ASSIGNED
Summary: character encoding support → UTF-XXX supprot issues.
a. Not all "charsets" are created equal. Netscape recognises
that already by just providing support for a subset of the
IANA registered "charsets".
b. Some IETF registered "charsets" are what UTR 17 now calls
CESes, character encoding schemes.
c. However, some IETF registered "charsets" are not CESes. Since
IETF decided to have only a few, not to be extended,
"content-transfer-encoding"s (7bit, 8bit, Quoted-Printable, and Base64),
all other encodings that were really content-transfer-encodings
of text had to be registered as "charsets" rather than extending
the set of "content-transfer-encodings". One of these,
specifically aimed at 7-bit-only channel e-mails, was UTF-7.
UTF-7 is misnamed, since it does not play in the same
ballpark as the other UTFs. UTF-7 is, just like
Latin-1+Quoted-Printable, a transfer encoding. UTR 17 calls them TESes,
Transfer Encoding Syntaxes. For formal reasons most other
character encodings where a TES has been applied are NOT
supported by the browser, but only for e-mail.
d. The rôle of UTF-7 has been outdated by the increasing
support for 8bit content-transfer for e-mail, or even
the use of Base64 or QP together with the UTF-8 CES.
e. UTF-7 is not a part of Unicode 3.0; it has been withdrawn
as a (TES) encoding that conforms to Unicode. It was
never specified by ISO. UTF-7 should only be interpreted
for incoming e-mail, never for outgoing e-mail, definitely
never for web pages, whether external or being edited.
f. The Unicode Consortium recommends NOT creating any more
(e-mail) data in UTF-7. It was NEVER intended for
any other kind of data, like web pages or plain text.
g. The character encoding menus in Netscape 6 list
UTF-8 (good), and UTF-7 (bad, remove). But they do not
list UTF-16(BE/LE), nor UTF-32(BE/LE). So one cannot
set the browser to use, e.g., UTF-16 for a given page,
nor can one save an edited page (via Netscape's editor)
in, e.g., UTF-16BE.
h. The encoding called HZ is similarly a TES for 7-bit only
channels, and should not be let out of the SMTP cage
(for which it was designed). HTTP is always an 8-bit
channel.
i. Side remark: IE has "autodetect" for UTF-7, and it often
*wrongly* concludes that a page is in UTF-7 whenever text
like U+nnnn (where nnnn are hexadecimal digits) appears.
j. If you wish to support more Unicode encodings (CES level)
SCSU and UTF-EBCDIC are more worthy candidates than the
TES UTF-7. (IE 5.x under Windows 2000 supports a number
of EBCDIC encodings...) UTF-7 is best left forgotten.
k. XML "requires" a BOM for UTF-16 (clause 4.3.3), but then
in annex F gives examples of UTF-16 XML without BOM. Plain
text in UTF-16 does not need a BOM, of course.
l. Note that the UTC recently changed requirements on UTF-8
so that "illegal" UTF-8 sequences not only must not be
emitted, but are not to be accepted either. For XML, in
addition, the "irregular" UTF-8 sequences are not to be
accepted (they never were allowed anywhere, specifically
not by XML).
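As an illustration of point l, the following Python sketch shows a strict decoder refusing ill-formed UTF-8 on input. It relies on Python 3's built-in `utf-8` codec, not on Mozilla's converters; the helper name `is_well_formed_utf8` is made up for this example, and treating an encoded surrogate as the kind of sequence meant by "irregular" is an assumption.

```python
def is_well_formed_utf8(data: bytes) -> bool:
    """Return True only if data decodes under Python's strict UTF-8 rules."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Non-shortest ("overlong") two-byte form of "/" (U+002F): must be rejected.
OVERLONG_SLASH = b"\xc0\xaf"

# UTF-8-style encoding of the lone surrogate U+D800: also rejected.
ENCODED_SURROGATE = b"\xed\xa0\x80"

for sample in (OVERLONG_SLASH, ENCODED_SURROGATE, "rôle".encode("utf-8")):
    print(sample, is_well_formed_utf8(sample))
```

Rejecting such input, instead of silently mapping the overlong form to "/", is what keeps byte-level filters and the decoder in agreement about what the text contains.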
Summary: UTF-XXX supprot issues. → UTF-XXX support issues.
Comment 11•22 years ago
*** Bug 155184 has been marked as a duplicate of this bug. ***
Comment 12•22 years ago
In particular I was not able to view a UTF-16LE plain text document without a
BOM without first viewing a plain text document with a valid UTF-16LE BOM.
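A rough Python sketch of why BOM-less UTF-16 needs either a forced choice or a heuristic: the same bytes decode to different text under the two byte orders. The zero-byte counting heuristic and the name `guess_utf16_byte_order` are assumptions for illustration, not what Mozilla does, and the heuristic only works for text dominated by characters below U+0100.

```python
def guess_utf16_byte_order(data: bytes):
    """Guess 'utf-16-le' or 'utf-16-be' for BOM-less data, or None if unclear."""
    even_zeros = data[0::2].count(0)  # these are the high bytes if big-endian
    odd_zeros = data[1::2].count(0)   # these are the high bytes if little-endian
    if odd_zeros > even_zeros:
        return "utf-16-le"
    if even_zeros > odd_zeros:
        return "utf-16-be"
    return None


sample = "Top jobs".encode("utf-16-le")  # no BOM is written by this codec
encoding = guess_utf16_byte_order(sample)
print(encoding, repr(sample.decode(encoding)))
```

Decoding the same bytes with the wrong byte order yields unrelated characters rather than an error, so a wrong guess is not detected by the decoder itself.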
Comment 13•22 years ago
The list of bugs opened related to UTF-16 shows there are quite a few special
situations where it would be really convenient to have the option to force
recognition of UTF-16XX.
I wish this would be taken into consideration, even if the option is
initially hidden so that you can only make it appear with the customize option
of the view/character coding menu.
This is something that was suggested in bug 42893 comment #26 (saving composer
pages in UTF-16), so this solution could cover both problems.
Comment 14•21 years ago (Assignee)
Taking myself. http://www.topjobs.ie is an example of a site in UTF-16LE without
BOM.
Assignee: ftang → smontagu
Status: ASSIGNED → NEW
Comment 15•21 years ago
I agree with comment #13. It should be possible to force UTF-16/32(LE|BE).
Keywords: intl
Comment 16•21 years ago (Assignee)
Making dependent on bug 42893, since Jungshik's patch there includes browser as
well as composer.
Depends on: 42893
Comment 17•21 years ago
This is an update to my patch attached to bug 42893. I'm posting it here
instead of bug 42893 because I haven't yet turned on UTF-16/32 for composer in
this patch, but it does everything (except for blocking UTF-7) mentioned here.
Enabling UTF-16/32 in 'SaveAsCharset' needs some more work and I'm gonna do it
in bug 42893. In the meantime, I think it's better to make it possible to
choose 'UTF-16/32' for web pages.
Neil, this patch works perfectly well for Mozilla, but for some reason I
couldn't make 'Unicode' menu show up in Firebird (View | Character (En)coding |
More). Can you take a look what I'm missing?
Comment 18•21 years ago
I'm now reversing the relationship between this bug and bug 42893.
Comment 19•21 years ago
(In reply to comment #17)
> Neil, this patch works perfectly well for Mozilla, but for some reason
> I couldn't make 'Unicode' menu show up in Firebird (View | Character
> (En)coding | More). Can you take a look what I'm missing?
>
http://lxr.mozilla.org/mozilla/source/browser/base/content/browser-menubar.inc
Comment 20•21 years ago
At the end of attachment 140846 is a patch against
browser/base/content/browser-menubar.inc
Even with that, it didn't work. I'll try again.
BTW, I guess I should not add 'Unicode' menu to mailview and mailedit because
only UTF-8 makes sense there. Other UTF's will be listed for mailview,
nonetheless.
Comment 21•21 years ago
Oops, sorry, I can't have woken up when I wrote that...
Comment 22•21 years ago
I haven't yet figured out why it doesn't work in firebird. Anyway, I'll try to
get this in before 1.7beta.
Target Milestone: Future → mozilla1.7beta
Comment 23•21 years ago
Attachment 140846 worked after I clobbered and rebuilt firefox. Anyway, I got
rid of some 'pollution' (from other patches).
In addition, I made mailedit and mailview windows NOT have 'Unicode' in View |
Character Coding | More. Because UTF-8 is added to the 'static' list, not
having 'Unicode' in mailedit and mailview shouldn't matter. The only problem
here is that there's no way to force UTF-7 in mailview window. However, it
should matter very little in practice because no one in his sane mind would
send out emails in UTF-7 in 2004. If it's really necessary, I can deal with it
in another bug (with more fine-grained distinctions between various character
encodings). Currently, we have 'notForBrowser' and 'notForOutGoing'. I can
implement 'notForMailView' and 'notForMailEdit' as originally planned back in
1999(?).
UTF-16/32* are hidden by default for mailedit but can be exposed by
'customizing' the list (by a die-hard user). This problem can also be handled
by adding more categories as mentioned above.
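The per-window categories described above can be sketched as a simple flag filter. This is a hypothetical Python model, not the charset menu code: the four flag names come from the comment, but the table contents, the window names, and the function `menu_for` are invented for illustration.

```python
# Hypothetical flag table. The flag names appear in the comment above; which
# charset carries which flag is made up here for the sake of the example.
CHARSET_FLAGS = {
    "UTF-8": set(),
    "UTF-16LE": {"notForMailEdit"},
    "UTF-32BE": {"notForMailEdit"},
    "UTF-7": {"notForBrowser", "notForOutGoing"},
}

# Hypothetical mapping from window type to the flags that hide a charset there.
WINDOW_EXCLUDES = {
    "browser": {"notForBrowser"},
    "mailview": {"notForMailView"},
    "mailedit": {"notForMailEdit", "notForOutGoing"},
}


def menu_for(window: str) -> list:
    """Return the charsets offered in the given window, in table order."""
    excluded = WINDOW_EXCLUDES[window]
    return [name for name, flags in CHARSET_FLAGS.items()
            if not (flags & excluded)]


for window in WINDOW_EXCLUDES:
    print(window, menu_for(window))
```

With finer-grained flags, each window's menu becomes a data change in the table instead of a special case in the menu-building code.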
Updated•21 years ago
Attachment #140846 - Attachment is obsolete: true
Comment 24•21 years ago
Attachment #142678 - Attachment is obsolete: true
Comment 25•21 years ago
Comment on attachment 142680
update (with a patch from another bug removed)
asking for r/sr.
Attachment #142680 - Flags: superreview?(blizzard)
Attachment #142680 - Flags: review?(neil.parkwaycc.co.uk)
Updated•21 years ago
Attachment #142680 - Flags: superreview?(blizzard) → superreview+
Comment 26•21 years ago
Comment on attachment 142680
update (with a patch from another bug removed)
>+utf-8.LangGroup = x-unicode
> utf-16be.LangGroup = x-unicode
> utf-16le.LangGroup = x-unicode
> utf-32be.LangGroup = x-unicode
> utf-32le.LangGroup = x-unicode
> utf-7.LangGroup = x-unicode
>-utf-8.LangGroup = x-unicode
I must admit I don't see the point of this :-)
> <!ENTITY charsetMenuMore2.accesskey "E">
> <!ENTITY charsetMenuMore3.label "East Asian">
> <!ENTITY charsetMenuMore3.accesskey "A">
> <!ENTITY charsetMenuMore4.label "SE & SW Asian">
> <!ENTITY charsetMenuMore4.accesskey "S">
> <!ENTITY charsetMenuMore5.label "Middle Eastern">
> <!ENTITY charsetMenuMore5.accesskey "m">
>+<!ENTITY charsetMenuUnicode.label "Unicode">
>+<!ENTITY charsetMenuUnicode.accesskey "u">
Should be an uppercase U, to match the U of Unicode... feel free to fix the "m"
too... also, speaking of accesskeys, you might want to fix a missing accesskey
in charsetOverlay.xul; these three lines should be identical, but the middle
one is missing its accesskey:
/xpfe/global/resources/content/charsetOverlay.xul, line 38 -- <menu
label="&charsetMenuMore.label;" accesskey="&charsetMenuMore.accesskey;"
datasources="rdf:charset-menu" ref="NC:BrowserMoreCharsetMenuRoot">
/xpfe/global/resources/content/charsetOverlay.xul, line 137 -- <menu
label="&charsetMenuMore.label;" datasources="rdf:charset-menu"
ref="NC:BrowserMoreCharsetMenuRoot">
/xpfe/global/resources/content/charsetOverlay.xul, line 253 -- <menu
label="&charsetMenuMore.label;" accesskey="&charsetMenuMore.accesskey;"
datasources="rdf:charset-menu" ref="NC:BrowserMoreCharsetMenuRoot">
Don't forget to get ff moa.
Attachment #142680 - Flags: review?(neil.parkwaycc.co.uk) → review+
Comment 27•21 years ago
a=pch
Comment 28•21 years ago
thanks all. patch checked in.
Status: NEW → RESOLVED
Closed: 24 years ago → 21 years ago
Resolution: --- → FIXED