Default encoding for XUL/XML/RDF should use UTF-8

VERIFIED FIXED in M5

Status

Core
XML
P1
critical
VERIFIED FIXED
19 years ago
19 years ago

People

(Reporter: rchen, Assigned: Nisheeth Ranjan)

Tracking

Trunk
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

19 years ago
I separated the default-encoding-to-UTF-8 issue out of bug 4431.

Updated

19 years ago
Assignee: trudelle → danm
Severity: normal → enhancement

Comment 1

19 years ago
changing severity to enhancement, assigning to danm

Comment 2

19 years ago
Peter, this is not an enhancement; see this reference:
http://www.w3.org/TR/1998/REC-xml-19980210#charencoding

As part of M4 we have a goal of pseudo-localizing XUL into Japanese, and this is
a blocking issue. Please modify the priority and set the TFV to M4.

Updated

19 years ago
QA Contact: 4015 → 4338
(Reporter)

Comment 3

19 years ago
Peter,

The fix should be in M4, and the severity is major.

Thank you.

Updated

19 years ago
Assignee: danm → hyatt
Severity: enhancement → major
Target Milestone: M4

Comment 4

19 years ago
Okay. Reassigning to hyatt for M4.

Comment 5

19 years ago
I have no idea what this means.  Isn't nsString unicode?  What is the problem?

Comment 6

19 years ago
nsString is UTF-16 Unicode, which stores text as 16-bit code units.
But the XML is likely to be UTF-8.

If an XML file does not contain:
   <?xml encoding='IANA-charset-name'?>
then by default the encoding is either UTF-16 or UTF-8.  For most interesting
cases, each UTF-16 character is a fixed 2-byte quantity, but these can be
affected by endian-ness.  So you need to check if the file starts with a
Byte Order Mark (BOM).  It will either be FEFF or FFFE, depending on the
endian-ness.

If there's a BOM, the data is UTF-16.  Otherwise it is UTF-8, which is a
byte stream and unaffected by endian-ness.  ASCII is a proper subset of
UTF-8, and each ASCII character is represented as 1 byte.  Other characters
(e.g., accented characters) are encoded as multiple bytes per character.

We plan to create all of the 5.0 XUL files in UTF-8, so we need to convert
from UTF-8 to UTF-16.

For M4 we need UTF-8 default supported.  Later we will need UTF-16 and the
  <?xml encoding='IANA-charset-name'?>
supported.  But that is another bug: 4431
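
To make the default rule above concrete, here is a minimal sketch in Python.
It is not the parser code; the function names are hypothetical, and it only
covers the case where the file has no <?xml encoding=...?> declaration: a
leading FE FF or FF FE means UTF-16, anything else is treated as UTF-8.

```python
import codecs


def sniff_default_xml_encoding(data: bytes) -> str:
    """Guess the encoding of XML bytes that carry no encoding declaration.

    Hypothetical helper: a UTF-16 BOM selects UTF-16, otherwise UTF-8.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # bytes FE FF
        return "utf-16-be"
    if data.startswith(codecs.BOM_UTF16_LE):  # bytes FF FE
        return "utf-16-le"
    return "utf-8"


def decode_xml_bytes(data: bytes) -> str:
    """Decode the raw bytes into text using the sniffed default encoding."""
    encoding = sniff_default_xml_encoding(data)
    if encoding != "utf-8":
        data = data[2:]  # drop the 2-byte BOM before decoding
    return data.decode(encoding)
```

A pure-ASCII XUL file and a UTF-8 file with Japanese text both go through the
last branch, which is the behavior needed for M4.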

Comment 7

19 years ago
Is this only XUL's problem?  Do XML and HTML support UTF-8?

Comment 8

19 years ago
I'm trying to figure out if this problem is with the XML parser or if it's with
the XUL content sink (or both).

Comment 9

19 years ago
I think it's the XML parser.  It needs to parse:
   <?xml encoding='IANA-charset-name'?>
as spec'd in http://www.w3.org/TR/1998/REC-xml-19980210#charencoding.

(As it did in pre-5.0 browsers, HTML needs to support UTF-8 if specified,
but HTML defaults to ISO-8859-1 not UTF-8.  Our implementation is more
complicated because we take into account user settings too.)

Another thing that affects both XML and HTML is the HTTP Content-Type, e.g.,
     Content-Type: text/xml; charset=UTF-8
or
     Content-Type: text/html; charset=UTF-8

This needs to be parsed by netlib for both HTML and XML.  ftang is working
w/gagan and rickg on this.

For M4, we need the default XML behavior.  For M4, we don't need the above HTTP
header and the XML <?xml encoding ...> parsing.
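
For reference, a rough Python sketch of the two lookups that are deferred
past M4, plus the UTF-8 fallback. This is illustrative only, not netlib or
parser code. The helper names are hypothetical, and the priority order (HTTP
charset first, then the XML declaration, then UTF-8) is an assumption rather
than something decided in this bug. BOM sniffing is left out for brevity.

```python
import re
from typing import Optional

# charset parameter of a Content-Type header, e.g. "text/xml; charset=UTF-8"
_CHARSET_RE = re.compile(r'charset\s*=\s*"?([A-Za-z0-9._:-]+)"?', re.IGNORECASE)

# encoding pseudo-attribute of an XML declaration at the very start of the data
# (only works when the declaration itself is in an ASCII-compatible encoding)
_XML_DECL_RE = re.compile(
    rb"^<\?xml[^>]*?encoding\s*=\s*['\"]([A-Za-z][A-Za-z0-9._-]*)['\"]"
)


def charset_from_content_type(header_value: str) -> Optional[str]:
    """Return the charset named in a Content-Type header value, if any."""
    match = _CHARSET_RE.search(header_value)
    return match.group(1) if match else None


def encoding_from_xml_declaration(data: bytes) -> Optional[str]:
    """Return the encoding named in a leading XML declaration, if any."""
    match = _XML_DECL_RE.match(data)
    return match.group(1).decode("ascii") if match else None


def pick_encoding(content_type: Optional[str], data: bytes) -> str:
    """Assumed order: HTTP charset, then XML declaration, then UTF-8."""
    return (
        charset_from_content_type(content_type or "")
        or encoding_from_xml_declaration(data)
        or "UTF-8"
    )
```

With no header and no declaration, pick_encoding falls through to "UTF-8",
which is the only case that has to work for M4.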

Comment 10

19 years ago
Be advised that we are supposed to switch to XPAT by M4. One thing to try is
to recompile the client with XPAT turned on.

Adding Nisheeth to the CC list for comments.

Comment 12

19 years ago
As an additional fact, UTF-8 HTML display is working on the
current (4/1) build as long as the meta http-equiv Content-Type
header includes "charset = utf-8".

Updated

19 years ago
Assignee: hyatt → ftang

Comment 13

19 years ago
Hyatt, I hope you don't mind that I reassigned this to myself. I have a fix in
nsParser (approved by rickg) that uses UTF-8 as the default charset for RDF,
XML, and XUL. Checked in as mozilla/htmlparser/src/nsParser.cpp 3.81.

I verified this with my pseudo-l10n file. The buttons show up correctly.
However, the menu still displays garbage, but this is a separate issue. Let's
put the menu display problem into a separate bug.

Updated

19 years ago
Status: NEW → RESOLVED
Last Resolved: 19 years ago
Resolution: --- → FIXED
(Reporter)

Updated

19 years ago
Status: RESOLVED → REOPENED
(Reporter)

Comment 14

19 years ago
I am reopening the bug because something is wrong with the newer builds. Here
are the results with the UTF-8 JA pseudo navigator.xul on two OS configurations:

Build 04-07-11
  JA-NT: menu, buttons, and status bar display fine.
  US-NT with J fonts: buttons and status bar display fine, but not the menu
  (it shows ????). The result is similar on Mac.

Builds 04-08-10, 04-09-12 and 04-09-16
  JA-NT: only the frame window is displayed; no menu, buttons, or status bar.
  US-NT with J fonts: only the frame window is displayed; no menu, buttons, or
  status bar.

Updated

19 years ago
Target Milestone: M4 → M5

Comment 15

19 years ago
Moving to M5. We need to investigate more, and it shouldn't be a showstopper.

Updated

19 years ago
Assignee: ftang → nisheeth
Status: ASSIGNED → NEW

Comment 16

19 years ago
ToNewCString in nsExpatTokenizer::ConsumeToken damages the data. It corrupts
the Unicode data that has already been converted (on the assumption that the
input is UTF-8).

Nisheeth, please fix it ASAP. None of our XUL/XML/RDF works without this. This
is a blocker for L10N and pseudo-L10N.
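
To illustrate the kind of damage involved, here is a small Python sketch of a
wide-to-narrow string conversion. It assumes the conversion keeps only the
low byte of each 16-bit unit, which may not be exactly what ToNewCString
does. The helper names are hypothetical and this is not the tokenizer code.

```python
def narrow_to_bytes(text: str) -> bytes:
    """Keep only the low 8 bits of each UTF-16 code unit (lossy on purpose)."""
    units = text.encode("utf-16-le")
    # In little-endian UTF-16 the low byte of each unit sits at even offsets.
    return units[0::2]


def widen_from_bytes(raw: bytes) -> str:
    """Widen each byte back to one character, as a naive inverse."""
    return raw.decode("latin-1")


ascii_label = "File"
japanese_label = "日本語"

# ASCII survives the round trip because every code unit fits in one byte.
ascii_round_trip = widen_from_bytes(narrow_to_bytes(ascii_label))

# Japanese text does not: the high byte of every code unit is thrown away.
japanese_round_trip = widen_from_bytes(narrow_to_bytes(japanese_label))
```

Under this assumption ASCII-only XUL would look fine while localized strings
come out as garbage, which is why passing the 16-bit data through unchanged
matters here.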

Updated

19 years ago
QA Contact: 4338 → 4475

Comment 17

19 years ago
This is also blocking viewing message headers (2671).

Updated

19 years ago
Resolution: FIXED → ---

Comment 18

19 years ago
This has been reopened because of the switch to expat.
Clearing resolution FIXED.
(Assignee)

Updated

19 years ago
Status: NEW → ASSIGNED
Component: XUL → XML
(Assignee)

Comment 19

19 years ago
Accepting bug.  Setting component to XML...
(Assignee)

Updated

19 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 19 years ago
Resolution: --- → FIXED
(Assignee)

Comment 20

19 years ago
The fix is checked in.  Expat now accepts Unicode buffers.
(Assignee)

Comment 21

19 years ago
*** Bug 4431 has been marked as a duplicate of this bug. ***
(Assignee)

Comment 22

19 years ago
*** Bug 5262 has been marked as a duplicate of this bug. ***
(Reporter)

Updated

19 years ago
Status: RESOLVED → VERIFIED
(Reporter)

Comment 23

19 years ago
Verified on Japanese NT4.0.