I separated the default-encoding-to-UTF-8 issue from bug 4431.
changing severity to enhancement, assigning to danm
Peter, this is not an enhancement, check this reference.
As part of M4 we have a goal of pseudo-localizing XUL into Japanese, and this is
a blocking issue. Please modify the priority and set the TFV to M4.
The fix should be in M4 and severity is major.
Okay. reassigning to hyatt for m4
I have no idea what this means. Isn't nsString unicode? What is the problem?
nsString is UTF-16 Unicode which is a 16-bit value for all characters.
But the XML is likely to be UTF-8.
If an XML file does not contain:
then by default the encoding is either UTF-16 or UTF-8. For most interesting
cases, each UTF-16 character is a fixed 2-byte quantity, but these can be
affected by endian-ness. So you need to check if the file starts with a
Byte Order Mark (BOM). It will either be FEFF or FFFE, depending on the byte order.
If there's a BOM, the data is UTF-16. Otherwise it is UTF-8 which is a
byte-stream and unaffected by endian-ness. ASCII is a proper subset of
UTF-8 and each character is represented as 1 byte. Other character sets
are encoded as multiple bytes per character (e.g., accented characters).
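
A minimal sketch of the BOM check described above, in case it helps whoever
picks this up. The names DetectedEncoding and DetectDefaultEncoding are
hypothetical and are not part of nsParser; the sketch only covers the
"no encoding declaration" default case and assumes the first bytes of the
file are already in a buffer.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical names for illustration only; not the actual parser API.
enum class DetectedEncoding { Utf8, Utf16BigEndian, Utf16LittleEndian };

// Look at the first two bytes of the buffer for a UTF-16 Byte Order Mark.
// FE FF means big-endian UTF-16, FF FE means little-endian UTF-16.
// With no BOM, fall back to UTF-8, which is a plain byte stream.
inline DetectedEncoding DetectDefaultEncoding(const std::uint8_t* data,
                                              std::size_t length) {
  if (length >= 2) {
    if (data[0] == 0xFE && data[1] == 0xFF) {
      return DetectedEncoding::Utf16BigEndian;
    }
    if (data[0] == 0xFF && data[1] == 0xFE) {
      return DetectedEncoding::Utf16LittleEndian;
    }
  }
  return DetectedEncoding::Utf8;
}
```

A caller would run this once on the start of the stream and then pick the
matching converter before handing data to the tokenizer.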
We plan to create all of the 5.0 XUL files in UTF-8, so we need to convert
from UTF-8 to UTF-16.
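
For reference, a rough sketch of what the UTF-8 to UTF-16 step involves. This
is a standalone illustration using std::string and std::u16string, not the
Mozilla converter; Utf8ToUtf16 is a hypothetical name, and replacing malformed
input with U+FFFD is an assumption about the desired error handling.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// Malformed sequences are replaced with U+FFFD (an assumed policy).
inline std::u16string Utf8ToUtf16(const std::string& in) {
  // Smallest code point that legitimately needs 1, 2, 3 or 4 bytes.
  static const std::uint32_t kMinForLength[] = {0x0, 0x80, 0x800, 0x10000};
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size()) {
    const std::uint8_t lead = static_cast<std::uint8_t>(in[i]);
    std::uint32_t cp = 0;
    std::size_t extra = 0;  // number of continuation bytes expected
    if (lead < 0x80)                { cp = lead;        extra = 0; }
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
    else { out.push_back(0xFFFD); ++i; continue; }  // invalid lead byte

    bool ok = (i + extra < in.size());  // enough bytes left?
    for (std::size_t k = 1; ok && k <= extra; ++k) {
      const std::uint8_t c = static_cast<std::uint8_t>(in[i + k]);
      if ((c & 0xC0) != 0x80) ok = false;  // not a continuation byte
      else cp = (cp << 6) | (c & 0x3F);
    }
    if (!ok) { out.push_back(0xFFFD); ++i; continue; }
    i += extra + 1;

    // Reject overlong forms, surrogate code points and out-of-range values.
    if (cp < kMinForLength[extra] || cp > 0x10FFFF ||
        (cp >= 0xD800 && cp <= 0xDFFF)) {
      out.push_back(0xFFFD);
      continue;
    }
    if (cp < 0x10000) {
      out.push_back(static_cast<char16_t>(cp));
    } else {
      cp -= 0x10000;  // encode as a surrogate pair
      out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
  }
  return out;
}
```

ASCII passes through one byte per code unit, while Japanese text such as the
pseudo-localized XUL strings takes three UTF-8 bytes per UTF-16 code unit.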
For M4 we need UTF-8 default supported. Later we will need UTF-16 and the
supported. But that is another bug: 4431
Is this only XUL's problem? Are XML and HTML supporting UTF-8?
I'm trying to figure out if this problem is with the XML parser or if it's with
the XUL content sink (or both).
I think it's the XML parser. It needs to parse:
as spec'd in http://www.w3.org/TR/1998/REC-xml-19980210#charencoding.
(As it did in pre-5.0 browsers, HTML needs to support UTF-8 if specified,
but HTML defaults to ISO-8859-1 not UTF-8. Our implementation is more
complicated because we take into account user settings too.)
Another thing that affects both XML and HTML is the HTTP Content-Type, e.g.,
Content-Type: text/xml; charset=UTF-8
Content-Type: text/html; charset=UTF-8
This needs to be parsed by netlib for both HTML and XML. ftang is working
w/gagan and rickg on this.
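
To illustrate the header parsing mentioned above, here is a small sketch that
pulls the charset parameter out of a Content-Type value. It is not the netlib
code; CharsetFromContentType is a hypothetical name, and the sketch assumes
parameters are separated by ';' with no ';' inside a quoted value.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: return the charset parameter of a Content-Type
// header value, or an empty string when there is none.
inline std::string CharsetFromContentType(const std::string& value) {
  const std::string name = "charset=";
  std::size_t pos = value.find(';');
  while (pos != std::string::npos) {
    std::size_t start = pos + 1;
    std::size_t end = value.find(';', start);
    if (end == std::string::npos) end = value.size();

    // Skip whitespace between ';' and the parameter name.
    while (start < end &&
           std::isspace(static_cast<unsigned char>(value[start]))) {
      ++start;
    }

    // Compare the parameter name case-insensitively.
    bool match = (end - start >= name.size());
    for (std::size_t k = 0; match && k < name.size(); ++k) {
      const unsigned char c = static_cast<unsigned char>(value[start + k]);
      if (std::tolower(c) != name[k]) match = false;
    }

    if (match) {
      std::size_t vs = start + name.size();
      std::size_t ve = end;
      // Trim trailing whitespace, then optional surrounding quotes.
      while (ve > vs &&
             std::isspace(static_cast<unsigned char>(value[ve - 1]))) {
        --ve;
      }
      if (ve - vs >= 2 && value[vs] == '"' && value[ve - 1] == '"') {
        ++vs;
        --ve;
      }
      return value.substr(vs, ve - vs);
    }
    pos = (end == value.size()) ? std::string::npos : end;
  }
  return std::string();
}
```

An empty result would mean "no charset in the header", at which point the
per-type default applies (UTF-8 for XML, ISO-8859-1 plus user settings for
HTML, as described above).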
For M4, we need the default XML behavior. For M4, we don't need the above HTTP
header and the XML <?xml encoding ...> parsing.
Be advised that we are supposed to switch to XPAT by M4. One thing to try is
to recompile the client with xpat turned on.
Add Nisheeth to CC list for comments.
As an additional fact, UTF-8 HTML display is working on the
current (4/1) build as long as the Meta-Equiv Content-Type
header includes "charset = utf-8".
Hyatt, I hope you don't mind that I reassigned this to myself. I have a fix in
nsParser (approved by rickg) that uses UTF-8 as the default charset for RDF, XML, or
XUL. Checked in as mozilla/htmlparser/src/nsParser.cpp 3.81
I verified this w/ my pseudo l10n file. The buttons show up correctly. However,
the menu still displays garbage, but this is a separate issue. Let's put the menu
display problem into a separate bug.
I re-opened the bug because there is something wrong with the newer builds. Here
are the results with the UTF-8 JA pseudo navigator.xul on two OS machines:
JA- NT : menu, buttons, status bar display fine.
US- NT with J fonts: buttons, status bar display fine but not menu (it shows
????). The result is similar to MAC.
Build 04-08-10 04-09-12 and 04-09-16
JA- NT : Only display frame window, no menu, buttons, status bar.
US- NT with J fonts: Only display frame window, no menu, buttons, status bar.
Moving to M5, we need to investigate more and it shouldn't be a show stopper
ToNewCString in nsExpatTokenizer::ConsumeToken damages the data. It damages the Unicode data which has already been converted (by assuming
Nisheeth, please fix it ASAP. None of our XUL/XML/RDF works without this. This is a blocker for L10N and pseudo L10N.
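
To show the kind of damage meant here, a small sketch follows. It assumes the
problem is that each 16-bit unit gets narrowed to a single byte on the way to
a C string; NarrowLossy is a hypothetical stand-in written for this comment,
not the real ToNewCString implementation.

```cpp
#include <string>

// Hypothetical stand-in for a lossy 16-bit to 8-bit conversion:
// only the low byte of each UTF-16 code unit survives.
inline std::string NarrowLossy(const std::u16string& in) {
  std::string out;
  out.reserve(in.size());
  for (char16_t unit : in) {
    out.push_back(static_cast<char>(unit & 0xFF));
  }
  return out;
}
```

Under that assumption ASCII text survives unchanged, but a Japanese character
such as U+3042 collapses to the single byte 0x42, so the tokenizer sees
unrelated data instead of the original string.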
This is also blocking viewing message headers (2671).
This has been reopened because of the switch to expat.
Clearing resolution FIXED.
Accepting bug. Setting component to XML...
The fix is checked in. Expat now accepts unicode buffers.
*** Bug 4431 has been marked as a duplicate of this bug. ***
*** Bug 5262 has been marked as a duplicate of this bug. ***
Verified on Japanese NT4.0.