Closed
Bug 97054
Opened 23 years ago
Closed 23 years ago
UTF-16 character coding support
Categories
(Core :: Internationalization, defect, P4)
Core
Internationalization
Tracking
()
VERIFIED
WORKSFORME
mozilla0.9.9
People
(Reporter: ilya.konstantinov+future, Assigned: shanjian)
Details
(Keywords: intl, meta)
Attachments
(8 files)
As many applications today can use UTF-16 for editing text files, I think we
should add support for this character set (in both LE and BE forms).
It's also necessary to autodetect UCS-2 encoded files, whose byte order can be
derived from the initial FF FE (for LE order) or FE FF (for BE order). Apparently,
that's what Internet Explorer does (keep in mind that a <META> tag specifying the
encoding *inside* the HTML can't be read until the encoding is already known, and
not many authors would know how to change the HTTP server's headers).
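A minimal sketch of the BOM check described above, in Python for illustration only. This is not Mozilla's code; the function names and the ISO-8859-1 fallback are assumptions.

```python
def detect_utf16_bom(data: bytes):
    """Return (codec, bom_length) if data starts with a UTF-16 byte order mark."""
    if data[:2] == b"\xff\xfe":
        return ("utf-16-le", 2)  # FF FE: little-endian
    if data[:2] == b"\xfe\xff":
        return ("utf-16-be", 2)  # FE FF: big-endian
    return None  # no UTF-16 BOM; other detection would have to take over


def decode_html_bytes(data: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode using the BOM if present; the fallback charset is an assumption."""
    detected = detect_utf16_bom(data)
    if detected is None:
        return data.decode(fallback)
    codec, bom_length = detected
    # Strip the BOM so it does not show up as U+FEFF at the start of the text.
    return data[bom_length:].decode(codec)


page = b"\xff\xfe" + "<html>".encode("utf-16-le")
print(decode_html_bytes(page))  # <html>
```

Because the BOM is checked before any markup is parsed, this works even though the page's own <META> tag is unreadable until the encoding is known.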
Updated•23 years ago
Comment 2•23 years ago
Should this be assigned to Shanjian?
Comment 3•23 years ago
Sorry Brian. I would like to assign this to ftang since it is a
new feature and we need to put this on our development schedule.
== assigning to ftang and changing to All platform/All OS.
Assignee: bstell → ftang
OS: Linux → All
Hardware: PC → All
Comment 4•23 years ago
mark it as m0.9.7
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.7
Comment 6•23 years ago
Giving this bug to shanjian to drive the feature. I believe this is a meta bug; we
need to identify the other real bugs that must be fixed to support UTF-16. shanjian:
mid-priority project.
Assignee: ftang → shanjian
Status: ASSIGNED → NEW
Keywords: meta
Summary: UCS-2 character coding support → UTF-16 character coding support
Comment 7•23 years ago
I think bug 42893 has some implications for this bug. (Hmm,
Bugzilla may need a 'related-to' relation in addition to the 'blocks' and
'depends on' relations.)
Last night, I stumbled upon a UTF-16LE encoded web page
(a Hanja, i.e. Chinese character, dictionary in Korea).
They wrote that their pages are in Unicode and I assumed
that they were in UTF-8, but Mozilla can't render them while
MS IE can. It was not until I saved the source HTML file and examined
it that I realized it's in UTF-16LE with a BOM. I was about
to write to the webmaster of the site that using UTF-16 is a
violation of HTML and that (s)he has to convert the pages to UTF-8. Before
actually writing that, I thought I might as well check
the standard just in case, and it turned out that UTF-16 is a valid MIME charset
for HTML. That's how I found this bug along with bug 42893.
As jbetak wrote in his comment on bug 42893, and as I've just confirmed
myself, it's trivial to add UTF-16LE and UTF-16BE to the View|Character Encoding
menu and to make them work with actual UTF-16LE/UTF-16BE encoded
web pages (no change in actual code, just a few changes in *.properties files)
because the necessary infrastructure is already in place, with
the possible exception of automatic detection of endianness.
Would it be a bad idea to turn on UTF-16LE/UTF-16BE *now*
(because it comes almost free and some web sites are actually
encoded in UTF-16) and
to work on the automatic detection of endianness (with BOM)
and perhaps various transformation formats of Unicode (so that
Mozilla can have Unicode(Auto) for automatic detection
of UTF-8, UTF-16LE, UTF-16BE, UTF-32LE, UTF-32BE) *later*?
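The Unicode(Auto) idea can be sketched as a BOM table lookup. This is a hypothetical Python illustration, not an existing Mozilla detector. Order matters: the UTF-32LE BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE), so the longer marks have to be checked first.

```python
# (BOM bytes, Python codec name), longest first so UTF-32LE wins over UTF-16LE.
UNICODE_BOMS = [
    (b"\x00\x00\xfe\xff", "utf-32-be"),
    (b"\xff\xfe\x00\x00", "utf-32-le"),
    (b"\xef\xbb\xbf", "utf-8"),
    (b"\xfe\xff", "utf-16-be"),
    (b"\xff\xfe", "utf-16-le"),
]


def sniff_unicode_bom(data: bytes):
    """Return (codec, bom_length) for the first matching BOM, or None."""
    for bom, codec in UNICODE_BOMS:
        if data.startswith(bom):
            return codec, len(bom)
    return None


for sample in (b"\xff\xfe\x00\x00a\x00\x00\x00", b"\xff\xfea\x00", b"\xef\xbb\xbfa"):
    print(sniff_unicode_bom(sample))
# ('utf-32-le', 4)
# ('utf-16-le', 2)
# ('utf-8', 3)
```

One known ambiguity: UTF-16LE text whose first character after the BOM is U+0000 looks exactly like a UTF-32LE BOM, so a real detector would need a policy for that rare case.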
Comment 8•23 years ago (Assignee)
Jungshik Shin,
supporting UTF-16 without surrogate support should be an easy thing to do. Can you
post a website that uses UTF-16 encoding?
Status: NEW → ASSIGNED
Comment 9•23 years ago
Yes, it's trivial to enable UTF-16 without surrogate support (I've
turned it on in my build).
As for the UTF-16LE web pages I mentioned, the site seems to be down
at the moment. UTF-16LE part (Chinese Character Dictionary) is
somewhere under http://ikc.korea.ac.kr/~cnsc, but I can't give you
the exact URL because the site is down.
If you just wanna test UTF-16 support on a simple page, I put up
two test pages in UTF-16LE and UTF-16BE at
http://jshin.net/moztest/css2.utf16le.html and
http://jshin.net/moztest/css2.utf16be.html
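For context on "UTF-16 without surrogate support": such a decoder treats every 16-bit unit as one character (effectively UCS-2), while full UTF-16 combines a high surrogate (D800 to DBFF) and a following low surrogate (DC00 to DFFF) into one supplementary code point. A small Python sketch of that combining step, for illustration only; passing unpaired surrogates through unchanged is an assumption of this sketch, not a spec requirement.

```python
def decode_utf16_units(units):
    """Decode a list of 16-bit code units into code points, combining surrogate pairs."""
    code_points = []
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            # High + low surrogate: one code point in U+10000..U+10FFFF.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            code_points.append(cp)
            i += 2
        else:
            # BMP character, or an unpaired surrogate passed through as-is.
            code_points.append(u)
            i += 1
    return code_points


# U+6F22 is a BMP ideograph; D840 DC00 encodes U+20000 (CJK Extension B).
print([hex(cp) for cp in decode_utf16_units([0x6F22, 0xD840, 0xDC00])])
# ['0x6f22', '0x20000']
```

A UCS-2-only decoder would instead yield three separate units here, which matters for rare Hanja outside the BMP.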
Comment 10•23 years ago
The web pages I found in UTF-16LE are at
http://ikc.korea.ac.kr/~cnsc/hidb/intro.htm
In the top frame, you may select the middle menu
(which is radical+stroke count index). Then,
in the middle frame, you'll find a list of
radicals. Click on any of the radicals and you'll
get the page encoded in UTF-16LE in the bottom
frame. If you click on any of Chinese characters
in the bottom frame, the right frame will show
you some information about the character (Korean
pronunciation, Unicode/ISO 10646 code point,
KS X, GB, CNS, JIS, VN code points, etc)
There doesn't seem to be much need for using any
form of Unicode because they're using 96x96(??)
GIF images to represent Chinese characters
other than radicals. Well, there must be some radicals
not representable in legacy encodings.
Comment 11•23 years ago
I was just about to enter this bug for Mozilla 0.9.5. It is very important
that UTF-16 be supported, as UTF-16 is part of the XML 1.0 specification and
must be supported. Also, Unicode must be supported as part of the HTML 4.01
specification, but there is no mention of the actual encoding scheme that needs
to be supported or of the default encoding scheme; at least I couldn't find it...
I tried Netscape 6.1, and it exhibits the same behavior under Windows, but
somehow works under Mac OS.
This issue should be a show stopper for version 1.0. Content whose characters take
2 bytes each in UTF-16 (e.g. CJK text) takes 3 bytes per character in UTF-8 and can
get very large, so UTF-16 is desperately needed...
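To make the size argument concrete: characters in U+0800 to U+FFFF (which covers most CJK ideographs) take 3 bytes in UTF-8 but 2 in UTF-16, while ASCII takes 1 byte in UTF-8 and 2 in UTF-16. A quick Python check; the sample strings are arbitrary.

```python
def encoded_sizes(text: str) -> dict:
    """Byte length of text in UTF-8 and in UTF-16 (little-endian, no BOM)."""
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-le")),  # -le codec adds no BOM
    }


print(encoded_sizes("漢字辭典"))       # {'utf-8': 12, 'utf-16': 8}
print(encoded_sizes("<p>hello</p>"))  # {'utf-8': 12, 'utf-16': 24}
```

So the saving depends on the mix: mostly-CJK text shrinks in UTF-16, while markup-heavy pages grow.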
Comment 12•23 years ago
Please mark this as a BLOCKER.
I would like to run tests for UTF-16, but I am blocked from running them.
I found a bug under Netscape 6.1 on the Macintosh, where UTF-16 is somehow
turned on in the build. I tried UTF-16 for XML, HTML, CSS-2, and text files.
It all seems to work well. However, it seems that CSS files in UTF-16 are
ignored, while UTF-8 ones work. I cannot test this CSS "feature" in Mozilla
because UTF-16 is not available.
Comment 13•23 years ago (Assignee)
Moving it to 0.9.8, but I will try to resolve it in 0.9.7.
Target Milestone: mozilla0.9.7 → mozilla0.9.8
Comment 14•23 years ago
Thanks. Just note that all testing for me is BLOCKED. :'-(
I cannot proceed. I have pending bugs that I cannot verify.
Comment 15•23 years ago (Assignee)
Jungshik, Joaquin,
I need a test case to go on. As far as I can recall, the HTML parser does
detect the BOM. Even though UTF-16LE and UTF-16BE are not available in the charset
menu, we do support them. The reason for not putting them in the charset menu is
that we don't need/want the user's intervention here. Since a BOM is mandatory for
UTF-16XX, we can always identify such files when the BOM is present. Using
"http://ikc.korea.ac.kr/~cnsc/hidb/under.htm" as an example, UTF-16LE is
identified and correctly marked in the charset menu.
Joaquin seems to be suggesting that CSS in UTF-16XX encoding is ignored. Is UTF-16XX
allowed for encoding CSS files? If so, could you attach your test case here? We
probably do not detect UTF-16XX in some of the parsers or parser paths. If that
is the case, I will fix it.
Could you also please provide UTF-16 testcases for XML?
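For the XML testcases, it may help to recall that XML 1.0 (Appendix F, non-normative) lets a processor guess the encoding family from the first bytes even without a BOM, because a document that starts with `<?xml` produces a recognizable byte pattern. A partial Python sketch; the UTF-32 and EBCDIC rows of the appendix are omitted, and the function name and return labels are assumptions.

```python
def guess_xml_encoding(data: bytes):
    """Guess the encoding family of an XML entity from its first bytes.

    Simplified: only UTF-8 and UTF-16 are handled, so a UTF-32LE BOM
    (FF FE 00 00) would be misread as UTF-16LE here.
    """
    if data.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if data.startswith(b"\xfe\xff"):
        return "utf-16-be"
    if data.startswith(b"\xff\xfe"):
        return "utf-16-le"
    head = data[:4]
    if head == b"\x00<\x00?":
        return "utf-16-be"  # '<?' as big-endian 16-bit units, no BOM
    if head == b"<\x00?\x00":
        return "utf-16-le"  # '<?' as little-endian 16-bit units, no BOM
    if head == b"<?xm":
        return "ascii-compatible"  # read the encoding declaration next
    return None  # no match: XML's default without BOM or declaration is UTF-8


print(guess_xml_encoding('<?xml version="1.0"?>'.encode("utf-16-be")))  # utf-16-be
```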
Comment 17•23 years ago
Comment 18•23 years ago
This requires the needed CSS file. It can be opened in Word2k, Notepad under
WinNT/2k, or any other program that supports Unicode UTF-16.
Comment 19•23 years ago
Other XML and HTML testcases might depend on this. It can be opened in Word2k,
Notepad under WinNT/2k, or any other program that supports Unicode UTF-16.
Comment 20•23 years ago
This can test whether CSS works smoothly with UTF-16 documents.
Comment 21•23 years ago
Some test cases depend on this document.
Comment 22•23 years ago
Comment 23•23 years ago
Comment 24•23 years ago
Updated•23 years ago (Assignee)
Target Milestone: mozilla0.9.8 → mozilla0.9.9
Comment 25•23 years ago (Assignee)
I tried all 6 testcases with a recent trunk build on Windows, and everything
works perfectly. So could someone tell me what the problem is?
Comment 26•23 years ago (Reporter)
Works for me on Linux too (Mozilla 0.9.8).
Comment 27•23 years ago (Assignee)
Resolving it as WFM.
Reopen the bug if somebody still experiences the problem.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME