Bug 4431 - XML must handle <?xml encoding=...?>
Product: Core
Classification: Components
Component: XML
Version: Trunk
Hardware: x86 Windows NT
Importance: P1 major
Target Milestone: M10
Assigned To: Frank Tang
: rchen
: Andrew Overholt [:overholt]
Depends on:
Blocks: 7228
Reported: 1999-03-30 17:41 PST by rchen
Modified: 1999-09-07 13:42 PDT
CC List: 7 users


Description rchen 1999-03-30 17:41:43 PST
I tried to pseudo-localize the menu items in navigator.xul into Japanese by
declaring the encoding as "UTF-8", "ISO-10646-UCS-2", or "Shift_JIS" in the text
declaration and writing the menu items as Japanese text in that encoding. It failed:
the menu items displayed as garbage with the 3.18.99 M3 Windows build on Japanese NT 4.0.
Comment 1 Peter Trudelle 1999-03-30 17:50:59 PST
Reassigned to saari as P3 for M5.
Comment 2 bobj 1999-03-31 10:54:59 PST
How did you do this?  Did you use <META Content-Type ... charset= ...>?

By default XUL should use UTF-8.  My understanding is that all our .xul files
used in 5.0 will be UTF-8 encoded.  But you are correct, a XUL content developer
should be able to use other charset encodings.
Comment 3 bobj 1999-03-31 16:30:59 PST
Ignore my previous comments.  Let me start again.
Ray (not Gerardo),
How did you do this?  Did you use (XML syntax, not HTML syntax):
   <?xml encoding='IANA-charset-name'?>
[Ray told me he did.]

Two things are required:
(1) By default, XML assumes the charset encoding is either UTF-16 or UTF-8,
    depending on whether a Byte-Order-Mark (BOM) is present.  If the data
    is in UTF-8, we need to convert it to UCS-2.  (The UTF-8 to UCS-2 converter
    is part of M4.  Pre-5.0 Communicator has code to check for the BOM.)
(2) For other charset encodings, we must be able to parse
          <?xml encoding='IANA-charset-name'?>
    and call the appropriate Unicode converter.
Part (1) is needed for M4 and part (2) could wait for M5.
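Part (1) might look roughly like this in Python. It is a sketch under stated assumptions, not the Mozilla code: the function name `decode_default` is hypothetical, and it produces a Python `str` rather than a UCS-2 buffer.

```python
import codecs


def decode_default(data: bytes) -> str:
    """Decode an XML entity that carries no encoding declaration.

    Hypothetical helper: a UTF-16 byte-order mark selects UTF-16
    (big- or little-endian); otherwise the data is treated as UTF-8.
    A UTF-8 BOM (EF BB BF), if present, is skipped.
    """
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return data[2:].decode("utf-16-be")
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF8):  # EF BB BF
        return data[3:].decode("utf-8")
    return data.decode("utf-8")
```

The BOM is stripped before decoding so it does not show up as a U+FEFF character in the parsed content.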
See reference:
Comment 4 rchen 1999-03-31 16:46:59 PST
I have <?xml encoding='UTF-8'?> in the first line of my XUL file. I also have
Japanese text encoded in UTF-8 in menu items.
Comment 5 Frank Tang 1999-04-19 11:54:59 PDT
Adding nisheeth to the CC list since he owns the expat integration. Saari, I have no idea why this bug was assigned to you. If you think you should not own this bug,
reassign it to nisheeth.
Comment 6 saari (gone) 1999-04-20 16:34:59 PDT
Reassigning to nisheeth
Comment 7 Nisheeth Ranjan 1999-04-24 03:12:59 PDT
*** This bug has been marked as a duplicate of 4463 ***
Comment 8 rchen 1999-04-26 14:42:59 PDT
Bug 4431 is not a duplicate of bug 4463.
Bug 4463 covers the default encoding, while 4431 covers arbitrary declared encodings.
Currently, if I declare Shift_JIS encoding and use Shift_JIS text, it crashes.
I tested it with the 4/26 build on Japanese NT 4.0.
Comment 9 bobj 1999-04-26 19:26:59 PDT
Summarizing the new status of this bug:
The expat XML parser needs to parse

   <?xml encoding=...?>

and then call the appropriate charset converter.
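A minimal Python sketch of that two-step process, assuming the declaration itself is written in ASCII-compatible bytes (as it is for UTF-8, Shift_JIS, EUC-JP, and ISO-8859-x). The names `declared_encoding` and `decode_declared` are hypothetical, and Python's codec registry stands in for Mozilla's charset converters.

```python
import re
from typing import Optional

# Matches the start of an XML/text declaration and captures the EncName
# (XML 1.0: [A-Za-z] ([A-Za-z0-9._] | '-')*) from its encoding attribute.
_XML_DECL_ENCODING = re.compile(
    rb"""<\?xml\s[^?]*?\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1"""
)


def declared_encoding(data: bytes) -> Optional[str]:
    """Return the charset named by a leading <?xml ... ?> PI, or None.

    Hypothetical helper; only works when the declaration bytes are
    ASCII-compatible (not for UTF-16 input).
    """
    end = data.find(b"?>")
    if end == -1:
        return None
    match = _XML_DECL_ENCODING.match(data[: end + 2])
    return match.group(2).decode("ascii") if match else None


def decode_declared(data: bytes, default: str = "utf-8") -> str:
    """Decode data using the declared charset, or the default if none."""
    name = declared_encoding(data) or default
    # Python's codec registry stands in for "the appropriate charset
    # converter"; an unsupported name raises LookupError.
    return data.decode(name)
```

For example, a XUL file beginning with `<?xml encoding='Shift_JIS'?>` would be decoded with the Shift_JIS codec instead of being misread as UTF-8.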
Comment 10 Nisheeth Ranjan 1999-05-07 10:59:59 PDT
Setting component to XML and milestone to M6...
Comment 11 Nisheeth Ranjan 1999-05-12 14:57:59 PDT
Moving non-crasher XML bugs to M7...
Comment 12 Nisheeth Ranjan 1999-06-15 15:50:59 PDT
I've spoken to Harish and Frank about this.  We will implement this using the
observer mechanism we already have in place for META tags.  Harish implemented
the META tag mechanism and is ready to extend that to include observation of XML
PIs.  I'm re-assigning the bug to him and setting the milestone to M8...

Specifically, an observer will register an interest in the <?xml ?> PI and will
get notified by the parser when that PI is encountered.  The observer can then
check for the encoding attribute and tell the webshell to reload the document
with a new charset, if necessary.
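The observer scheme described above can be sketched in Python as follows. All class and method names here (`Parser`, `register`, `feed`, `CharsetObserver`, `on_xml_pi`) are hypothetical and are not Mozilla's actual observer interfaces; the webshell reload is modelled as a list of requested charsets rather than a real call.

```python
import re
from typing import Callable, Dict, List


class Parser:
    """Hypothetical parser that notifies observers about processing instructions."""

    def __init__(self) -> None:
        self._observers: Dict[str, List[Callable[[str], None]]] = {}

    def register(self, target: str, callback: Callable[[str], None]) -> None:
        # An observer registers interest in PIs with a given target, e.g. "xml".
        self._observers.setdefault(target, []).append(callback)

    def feed(self, text: str) -> None:
        # Only PIs are recognised in this sketch; other markup is ignored.
        for m in re.finditer(r"<\?([^\s?]+)(.*?)\?>", text, re.S):
            for callback in self._observers.get(m.group(1), []):
                callback(m.group(2))


class CharsetObserver:
    """Hypothetical observer that requests a reload when the charset differs."""

    def __init__(self, current_charset: str) -> None:
        self.current_charset = current_charset
        self.reload_requests: List[str] = []

    def on_xml_pi(self, data: str) -> None:
        m = re.search(r"""\bencoding\s*=\s*(["'])([A-Za-z][A-Za-z0-9._-]*)\1""", data)
        if m and m.group(2).lower() != self.current_charset.lower():
            # Stands in for telling the webshell to reload with the new charset.
            self.reload_requests.append(m.group(2))
```

Because observers register by PI target, a `<?xml-stylesheet ...?>` PI does not reach an observer registered for `xml`.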
Comment 13 harishd 1999-07-07 14:05:59 PDT
Have a fix but will not be checking in until M9.  Need time for verification.

Setting to M9.
Comment 14 bobj 1999-07-07 14:41:59 PDT
Do you need some test cases?  IQA may be able to assist.
Comment 15 harishd 1999-07-19 14:12:59 PDT
Fix is in.  Observers can now register for the <?xml ?> PI and will
get notified by the parser when that PI is encountered.

Marking the bug fixed.
Comment 16 Katsuhiko Momoi 1999-08-23 22:56:59 PDT
Ray, can you verify that this got fixed for M9?
Comment 17 Frank Tang 1999-08-24 01:24:59 PDT
harishd, I thought you didn't fix this and that we had decided to handle it another way.
Should we reopen this bug and assign it to me (ftang)?
Comment 18 Frank Tang 1999-08-25 12:39:59 PDT
Reopening this bug and assigning it to ftang.
Comment 19 Frank Tang 1999-08-25 12:40:59 PDT
Marking this M10.
Comment 20 Frank Tang 1999-08-30 17:05:59 PDT
Added code to nsParser.cpp to detect the BOM and to implement Appendix F
(Autodetection of Character Encodings) of XML 1.0.
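A rough Python sketch of the Appendix F lookup, for illustration only; it is not the nsParser.cpp code. The function name `autodetect_family` is hypothetical, and it returns a descriptive label rather than selecting a converter. In the ASCII-compatible and EBCDIC cases the caller still has to read the encoding declaration to learn the exact charset.

```python
def autodetect_family(data: bytes) -> str:
    """Guess the encoding family from the first four bytes (XML 1.0 Appendix F)."""
    head = data[:4]
    # With a byte-order mark. The 4-byte UCS-4 marks are tested before the
    # 2-byte UTF-16 marks because FF FE 00 00 also starts with FF FE.
    bom_table = [
        (b"\x00\x00\xfe\xff", "UCS-4 big-endian"),
        (b"\xff\xfe\x00\x00", "UCS-4 little-endian"),
        (b"\x00\x00\xff\xfe", "UCS-4 (2143 order)"),
        (b"\xfe\xff\x00\x00", "UCS-4 (3412 order)"),
        (b"\xfe\xff", "UTF-16 big-endian"),
        (b"\xff\xfe", "UTF-16 little-endian"),
        (b"\xef\xbb\xbf", "UTF-8"),
    ]
    for prefix, label in bom_table:
        if head.startswith(prefix):
            return label
    # Without a byte-order mark: look at how "<?xm" is laid out in bytes.
    no_bom_table = {
        b"\x00\x00\x00\x3c": "UCS-4 or other 32-bit (1234 order)",
        b"\x3c\x00\x00\x00": "UCS-4 or other 32-bit (4321 order)",
        b"\x00\x00\x3c\x00": "UCS-4 or other 32-bit (2143 order)",
        b"\x00\x3c\x00\x00": "UCS-4 or other 32-bit (3412 order)",
        b"\x00\x3c\x00\x3f": "UTF-16 big-endian (no BOM)",
        b"\x3c\x00\x3f\x00": "UTF-16 little-endian (no BOM)",
        b"\x3c\x3f\x78\x6d": "ASCII-compatible; read the encoding declaration",
        b"\x4c\x6f\xa7\x94": "EBCDIC; read the encoding declaration",
    }
    # Anything else: UTF-8 without a declaration (or mislabeled/corrupt data).
    return no_bom_table.get(head, "UTF-8 (no declaration)")
```

A Shift_JIS XUL file starting with `<?xml` lands in the ASCII-compatible row, which is exactly the case where the encoding declaration has to be parsed.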
Comment 21 Allan Masri 1999-09-07 13:42:59 PDT
Verified this fix using the files at babel/automation/xmlencoding/.
