Bug 134963 - XML element name in gb18030 surrogate character doesn't work
Status: VERIFIED INVALID
Keywords: intl
Product: Core
Classification: Components
Component: Internationalization
Version: Trunk
Platform: x86 Windows XP
Importance: -- normal
Target Milestone: ---
Assigned To: Shanjian Li
: Yuying Long
: Makoto Kato [:m_kato]
Reported: 2002-04-02 13:55 PST by Yuying Long
Modified: 2002-04-03 14:32 PST
CC List: 5 users


Attachments
doesn't work test page (480 bytes, application/vnd.mozilla.xul+xml)
2002-04-02 13:57 PST, Yuying Long
a worked test case (1.03 KB, application/vnd.mozilla.xul+xml)
2002-04-02 14:02 PST, Yuying Long
Another not working test page (656 bytes, application/vnd.mozilla.xul+xml)
2002-04-02 14:05 PST, Yuying Long

Description Yuying Long 2002-04-02 13:55:19 PST
Build: 04-02 trunk build on WinXP-SimpChinese

XML element names written in gb18030 surrogate characters don't work, although
the same characters work as the content of elements that have regular names.
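In this report, "gb18030 surrogate characters" presumably means characters outside the Basic Multilingual Plane, such as those in CJK Unified Ideographs Extension B: GB18030 encodes each of them as a four-byte sequence, and UTF-16 represents each as a surrogate pair. The Python sketch below assumes U+20000 as a representative character (the characters actually used in the attachments are not shown here) and prints both forms.

```python
# Sketch: one supplementary-plane character (U+20000, an assumed sample)
# encoded in GB18030, and split into its UTF-16 surrogate pair.
from typing import Tuple

CH = "\U00020000"  # first character of CJK Unified Ideographs Extension B


def utf16_surrogate_pair(code_point: int) -> Tuple[int, int]:
    """Split a supplementary code point into UTF-16 high/low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    offset = code_point - 0x10000       # 20-bit value
    high = 0xD800 + (offset >> 10)      # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits
    return high, low


gb_bytes = CH.encode("gb18030")         # four-byte GB18030 sequence
high, low = utf16_surrogate_pair(ord(CH))

print(gb_bytes.hex(" "))                # 95 32 82 36
print(f"U+{high:04X} U+{low:04X}")      # U+D840 U+DC00
```

The surrogate split is done by hand only to show the arithmetic; `CH.encode("utf-16-be")` yields the same two code units.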
Comment 1 Yuying Long 2002-04-02 13:57:15 PST
Created attachment 77286 [details]
doesn't work test page
Comment 2 Yuying Long 2002-04-02 14:02:26 PST
Created attachment 77287 [details]
a worked test case

This test page works with gb18030 surrogate characters as element content, but
not in the element declaration.
Comment 3 Roy Yokoyama 2002-04-02 14:04:51 PST
->shanjian
Comment 4 Yuying Long 2002-04-02 14:05:06 PST
Created attachment 77288 [details]
Another not working test page
Comment 5 Frank Tang 2002-04-03 14:06:50 PST
Can you create test cases with the surrogate characters encoded in UTF-8?
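A test case along those lines might carry the same document in both encodings, so that a failure can be attributed to the XML name rules rather than to the GB18030 decoder. A minimal Python sketch; the markup, the helper `make_test_doc`, and the choice of U+20000 as the element name are hypothetical, not taken from the attachments:

```python
# Hypothetical test-case generator: the same XML document, whose element
# name is one supplementary-plane character, in UTF-8 and in GB18030.

NAME = "\U00020000"  # assumed sample: first CJK Extension B ideograph


def make_test_doc(encoding: str) -> bytes:
    """Return a tiny XML document that declares `encoding` and uses it."""
    text = (
        f'<?xml version="1.0" encoding="{encoding}"?>\n'
        f"<{NAME}>text</{NAME}>\n"
    )
    return text.encode(encoding)


utf8_doc = make_test_doc("UTF-8")
gb_doc = make_test_doc("GB18030")

# In UTF-8 the name is one four-byte sequence; no surrogates appear.
print(NAME.encode("utf-8").hex(" "))  # f0 a0 80 80
```

Each `bytes` value can be written to a file opened in `"wb"` mode and loaded as a test page.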
Comment 6 Frank Tang 2002-04-03 14:20:16 PST
OK, here is what happens.
In XML 1.0 (see http://www.w3.org/TR/2000/REC-xml-20001006),
a well-formed XML document is defined as follows:
http://www.w3.org/TR/2000/REC-xml-20001006#sec-well-formed
>[1]    document    ::=    prolog element Misc*

If you look at the definition of element:
http://www.w3.org/TR/2000/REC-xml-20001006#NT-element
>[39]    element    ::=    EmptyElemTag | STag content ETag
and at the definition of STag:

http://www.w3.org/TR/2000/REC-xml-20001006#NT-STag
>[40]    STag    ::=    '<' Name (S Attribute)* S? '>' [WFC: Unique Att Spec]
and at the definition of Name:
http://www.w3.org/TR/2000/REC-xml-20001006#NT-Name

>[5]    Name    ::=    (Letter | '_' | ':') (NameChar)*
and at NameChar:
http://www.w3.org/TR/2000/REC-xml-20001006#NT-NameChar
>[4]    NameChar    ::=    Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender

and at Letter:
http://www.w3.org/TR/2000/REC-xml-20001006#NT-Letter
>[84]    Letter    ::=    BaseChar | Ideographic


you will see that neither CJK Unified Ideographs Extension A (U+3400-U+4DBF) nor
Extension B (U+20000-U+2A6DF, represented by surrogate pairs in UTF-16) is listed
under BaseChar or Ideographic. Therefore, those characters cannot be used in a
Name in XML 1.0.
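The key step can be checked mechanically. Production [86] of the Second Edition linked above is `Ideographic ::= [#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]`; the Python sketch below tests a few characters against it. It relies on one reading of the spec, stated here as an assumption: BaseChar [85] ends with Hiragana, Katakana, Bopomofo and Hangul ranges (the last being [#xAC00-#xD7A3]), so no BaseChar range covers Extension A or any supplementary character, and for these characters Letter reduces to Ideographic. The function name `is_xml10_ideographic` is hypothetical.

```python
# Sketch of XML 1.0 (Second Edition) production [86]:
#   Ideographic ::= [#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]
# Assumption: no BaseChar [85] range covers U+3400-U+4DBF or anything
# above U+D7A3, so for these samples Letter [84] reduces to this test.

IDEOGRAPHIC_RANGES = (
    (0x4E00, 0x9FA5),
    (0x3007, 0x3007),
    (0x3021, 0x3029),
)


def is_xml10_ideographic(ch: str) -> bool:
    """True if the single character `ch` matches production [86]."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in IDEOGRAPHIC_RANGES)


samples = {
    "U+4E00 (CJK Unified Ideographs)": "\u4e00",
    "U+3400 (Extension A)": "\u3400",
    "U+20000 (Extension B)": "\U00020000",
}
for label, ch in samples.items():
    print(label, is_xml10_ideographic(ch))
# U+4E00 (CJK Unified Ideographs) True
# U+3400 (Extension A) False
# U+20000 (Extension B) False
```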

We should raise this issue with the authors of the XML specification; maybe they
will change it in a later version of XML.

But until then, this is an invalid bug.



Comment 7 Frank Tang 2002-04-03 14:30:51 PST
cc tbray@textuality.com
Comment 8 Yuying Long 2002-04-03 14:32:27 PST
Marking as verified: not a Mozilla problem, per Frank's comment.
