XML element name in gb18030 surrogate characters doesn't work

VERIFIED INVALID

Status

Product: Core
Component: Internationalization
Opened: 16 years ago
Closed: 16 years ago

People

Reporter: Yuying Long
Assignee: Shanjian Li

Tracking

Keywords: intl
Version: Trunk
Platform: x86, Windows XP
Points: ---

Firefox Tracking Flags: (Not tracked)

Attachments (3)

480 bytes, application/vnd.mozilla.xul+xml
1.03 KB, application/vnd.mozilla.xul+xml
656 bytes, application/vnd.mozilla.xul+xml
(Reporter)

Description

16 years ago
Build: 04-02 trunk build on WinXP-SimpChinese

XML element names written in gb18030 surrogate characters don't work, although
the same surrogate characters do work as the content of elements with regular names.
(Reporter)

Comment 1

16 years ago
Created attachment 77286 [details]
doesn't work test page
(Reporter)

Comment 2

16 years ago
Created attachment 77287 [details]
a worked test case

This test page works with gb18030 surrogate characters as element content, but
not in the element declaration.

Comment 3

16 years ago
->shanjian
Assignee: yokoyama → shanjian
Summary: XML elment name in gb18030 surrogate character doesn't work → XML elment name in gb18030 surrogate character doesn't work
(Reporter)

Comment 4

16 years ago
Created attachment 77288 [details]
Another not working test page
(Reporter)

Updated

16 years ago
Keywords: intl
QA Contact: ruixu → ylong

Comment 5

16 years ago
Can you create test cases with surrogate characters in UTF-8?
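One way to produce such test cases is sketched below, assuming Python 3 and its built-in `UTF-8` and `GB18030` codecs. The helper name `make_test_doc` and the markup are hypothetical and are not the attached test pages; each call places U+20000 (the first CJK Extension B ideograph, a surrogate pair in UTF-16) either in the element name or in the element content, so the same document can be compared across both encodings.

```python
# Hypothetical generator for paired XML test documents; the markup is
# illustrative and is not the content of the attachments on this bug.
EXT_B_CHAR = "\U00020000"  # first CJK Extension B ideograph (outside the BMP)


def make_test_doc(encoding: str, char_in_name: bool) -> bytes:
    """Return a small XML document serialized in `encoding`.

    char_in_name=True puts the character in the element name (the failing
    case in this bug); False puts it in the element content (the working case).
    """
    if char_in_name:
        body = f"<{EXT_B_CHAR}>text</{EXT_B_CHAR}>"
    else:
        body = f"<root>{EXT_B_CHAR}</root>"
    doc = f'<?xml version="1.0" encoding="{encoding}"?>\n{body}\n'
    # Python codec names are case-insensitive, so "UTF-8"/"GB18030" work here.
    return doc.encode(encoding)


if __name__ == "__main__":
    for enc in ("UTF-8", "GB18030"):
        for in_name in (True, False):
            data = make_test_doc(enc, in_name)
            print(enc, "name" if in_name else "content", data)
```

In the UTF-8 output, U+20000 appears as the four bytes F0 A0 80 80; in GB18030 it is also encoded as a four-byte sequence.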

Comment 6

16 years ago
OK, here is what happens.
In XML 1.0 (see http://www.w3.org/TR/2000/REC-xml-20001006),
a well-formed XML document is defined as follows:
http://www.w3.org/TR/2000/REC-xml-20001006#sec-well-formed
>[1]    document    ::=    prolog element Misc*

If you look at the definition of element:
http://www.w3.org/TR/2000/REC-xml-20001006#NT-element
>[39]    element    ::=    EmptyElemTag | STag content ETag
and at the definition of STag:
http://www.w3.org/TR/2000/REC-xml-20001006#NT-STag
>[40]    STag    ::=    '<' Name (S Attribute)* S? '>' [WFC: Unique Att Spec]
and at the definition of Name:
http://www.w3.org/TR/2000/REC-xml-20001006#NT-Name
>[5]    Name    ::=    (Letter | '_' | ':') (NameChar)*
and at NameChar:
http://www.w3.org/TR/2000/REC-xml-20001006#NT-NameChar
>[4]    NameChar    ::=    Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender

and finally at Letter:
http://www.w3.org/TR/2000/REC-xml-20001006#NT-Letter
>[84]    Letter    ::=    BaseChar | Ideographic


You will see that neither CJK Unified Ideographs Extension A (U+3400-U+4DBF) nor
Extension B (U+20000-U+2A6DF, encoded as surrogate pairs in UTF-16) is listed
under BaseChar or Ideographic. Therefore, those characters cannot be used in a
Name in XML 1.0.
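The grammar walk above can be checked mechanically. The following is a minimal sketch in Python that models only production [86] Ideographic from the XML 1.0 Second Edition ([#x4E00-#x9FA5] | #x3007 | [#x3021-#x3029]); `is_xml10_ideographic` and `IDEOGRAPHIC_RANGES` are hypothetical names, and BaseChar, Digit, CombiningChar and Extender are deliberately left out, so it is not a full Name validator.

```python
# Sketch: XML 1.0 (REC-xml-20001006) production [86] Ideographic, used to
# show why Extension A/B ideographs fail as name characters.
# Only Ideographic is modelled; BaseChar [85], Digit, CombiningChar and
# Extender are omitted, so this is NOT a complete Name/NameChar check.

IDEOGRAPHIC_RANGES = (
    (0x4E00, 0x9FA5),  # CJK Unified Ideographs as enumerated in [86]
    (0x3007, 0x3007),  # IDEOGRAPHIC NUMBER ZERO
    (0x3021, 0x3029),  # HANGZHOU NUMERAL ONE..NINE
)


def is_xml10_ideographic(ch: str) -> bool:
    """Return True if the single character `ch` matches production [86]."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in IDEOGRAPHIC_RANGES)


if __name__ == "__main__":
    samples = {
        "U+4E2D (CJK Unified Ideographs)": "\u4e2d",
        "U+3400 (Extension A)": "\u3400",
        "U+20000 (Extension B)": "\U00020000",
    }
    for label, ch in samples.items():
        size = len(ch.encode("gb18030"))
        verdict = "Ideographic" if is_xml10_ideographic(ch) else "not Ideographic"
        print(f"{label}: {size} bytes in gb18030, {verdict}")
```

U+4E2D falls inside the listed range, while U+3400 (Extension A) and U+20000 (Extension B) fall outside every Ideographic range; U+20000 also needs a four-byte sequence in gb18030.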

We should raise this issue with the XML authors; maybe they will change it in a
later version of XML.

But until then, this is an invalid bug.



Status: NEW → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → INVALID

Comment 7

16 years ago
cc tbray@textuality.com
(Reporter)

Comment 8

16 years ago
Marking as verified: not a Mozilla problem, per Frank's comment.
Status: RESOLVED → VERIFIED