isXMLName() should be properly implemented

RESOLVED FIXED

14 years ago
13 years ago

People

(Reporter: martin.honnen, Assigned: igor)

Tracking

(Blocks: 1 bug)

1.6R1
x86
Windows XP

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

14 years ago
When I try
  isXMLName(String.fromCharCode(8364) + '1')
with Rhino 1.6 release 1 2004 11 30 it yields true while I think the character
with Unicode 8364 (it is the Euro symbol '€') is not allowed as the first
character in an XML name, not even allowed in there at all.
(Reporter)

Comment 1

14 years ago
The problem with isXMLName doesn't seem to be restricted to that particular
case; here are some tests where Rhino yields true every time while I think the
result should be false:

Rhino 1.6 release 1 2004 11 30
js> isXMLName(String.fromCharCode(8364) + '1')
true
js> isXMLName('-el')
true
js> isXMLName('1el')
true

So changing summary.

Hmm, I have just looked at the source and indeed the implementation currently is

    public boolean isXMLName(Context cx, Object name)
    {
        // TODO: Check if qname.localName() matches NCName

        return true;
    }

so obviously this is a known issue.
Summary: isXMLName(String.fromCharCode(8364) + '1') should give false → isXMLName() should give false for arguments String.fromCharCode(8364) + '1', '-el', isXMLName('1el'), needs to be implemented
(Assignee)

Comment 2

14 years ago
Changing the title to reflect the real nature of the bug: currently isXMLName()
in Rhino simply returns true.

Summary: isXMLName() should give false for arguments String.fromCharCode(8364) + '1', '-el', isXMLName('1el'), needs to be implemented → isXMLName() should be properly implemented
(Assignee)

Updated

14 years ago
Blocks: 270779
(Assignee)

Comment 3

14 years ago
Created attachment 171078 [details] [diff] [review]
Fix: just follow E4X 13.1.2.1 and http://w3.org/TR/xml-names11/#NT-NCName
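For illustration only, here is a minimal sketch of an NCName check that follows the NameStartChar and NameChar ranges of XML 1.1 with the colon excluded, as the NCName production linked above requires. It is not the attached patch: the class and method names (XmlName11Sketch, isNCName) are hypothetical, and it takes a plain string rather than doing the QName conversion that E4X 13.1.2.1 describes.

```java
// Hypothetical sketch, not the Rhino patch: NCName check per XML 1.1 ranges.
final class XmlName11Sketch {
    // NameStartChar of XML 1.1 without ':' (inclusive lo,hi pairs).
    private static final int[] START_RANGES = {
        'A', 'Z', '_', '_', 'a', 'z',
        0xC0, 0xD6, 0xD8, 0xF6, 0xF8, 0x2FF,
        0x370, 0x37D, 0x37F, 0x1FFF, 0x200C, 0x200D,
        0x2070, 0x218F, 0x2C00, 0x2FEF, 0x3001, 0xD7FF,
        0xF900, 0xFDCF, 0xFDF0, 0xFFFD, 0x10000, 0xEFFFF
    };

    // Characters that NameChar allows in addition to NameStartChar.
    private static final int[] EXTRA_RANGES = {
        '-', '-', '.', '.', '0', '9',
        0xB7, 0xB7, 0x300, 0x36F, 0x203F, 0x2040
    };

    private static boolean inRanges(int cp, int[] ranges) {
        for (int i = 0; i < ranges.length; i += 2) {
            if (cp >= ranges[i] && cp <= ranges[i + 1]) {
                return true;
            }
        }
        return false;
    }

    static boolean isNCNameStart(int cp) {
        return inRanges(cp, START_RANGES);
    }

    static boolean isNCNameChar(int cp) {
        return inRanges(cp, START_RANGES) || inRanges(cp, EXTRA_RANGES);
    }

    // Returns true if s is a non-empty NCName under the XML 1.1 ranges.
    static boolean isNCName(String s) {
        if (s == null || s.length() == 0) {
            return false;
        }
        int cp = s.codePointAt(0);
        if (!isNCNameStart(cp)) {
            return false;
        }
        // Walk by code point so that supplementary characters are handled.
        for (int i = Character.charCount(cp); i < s.length(); i += Character.charCount(cp)) {
            cp = s.codePointAt(i);
            if (!isNCNameChar(cp)) {
                return false;
            }
        }
        return true;
    }
}
```

With this sketch, '-el' and '1el' are rejected because '-' and digits are only allowed after the first character. Note that these are the XML 1.1 ranges, so a leading '€' (U+20AC) falls inside [#x2070-#x218F] and is accepted.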
(Assignee)

Comment 4

14 years ago
Created attachment 171174 [details] [diff] [review]
Patch change to work around jikes compiler bug
(Assignee)

Updated

14 years ago
Attachment #171078 - Attachment is obsolete: true
(Assignee)

Comment 5

14 years ago
I committed the fix
Status: NEW → RESOLVED
Last Resolved: 14 years ago
Resolution: --- → FIXED
(Assignee)

Comment 6

14 years ago
(In reply to comment #0)
> When I try
>   isXMLName(String.fromCharCode(8364) + '1')
> with Rhino 1.6 release 1 2004 11 30 it yields true while I think the character
> with Unicode 8364 (it is the Euro symbol '€') is not allowed as the first
> character in an XML name, not even allowed in there at all.

BTW, according to http://www.w3.org/TR/xml11#NT-NameStartChar the characters
within [#x2070-#x218F] are allowed as first XML name character so € (8364 or
0x20ac) is allowed
(Reporter)

Comment 7

14 years ago
(In reply to comment #6)
 
> according to http://www.w3.org/TR/xml11#NT-NameStartChar the characters
> within [#x2070-#x218F] are allowed as first XML name character so € (8364 or
> 0x20ac) is allowed

Except that E4X edition 1 (ECMA-357) refers only to XML 1.0 and Namespaces in
XML, not to XML 1.1, and in XML 1.0 the Euro character '€' is not allowed
inside names.

Have you now implemented isXMLName following the rules of the XML 1.1
specification? That will break compatibility between Spidermonkey E4X and Rhino
E4X then as I think Spidermonkey attempts to implement XML 1.0 rules.
(Assignee)

Comment 8

14 years ago
(In reply to comment #7)
> 
> Have you now implemented isXMLName following the rules of the XML 1.1
> specification? That will break compatibility between Spidermonkey E4X and Rhino
> E4X then as I think Spidermonkey attempts to implement XML 1.0 rules.

You are right, I just followed XML 1.1 rules while E4X refers to XML 1.0. Now
the rules in XML 1.0, http://w3.org/TR/2004/REC-xml-20040204/#NT-Name , are much
more complex than in XML 1.1 and implementing them directly would lead to huge
bloat.

Note that it is not possible AFAICS to use java.lang.Character methods directly
since in JDK 1.4 they refer to Unicode 3.0, in JDK 1.5 they refer to Unicode 4.0
while XML 1.0 uses Unicode 2.0. In a sense following XML 1.1 is much simpler but
not E4X-compliant.
Hi.

While the XML 1.0 name characters span more distinct ranges than the XML 1.1 name characters, a lookup table in which each bit represents one character in plane 0 shouldn't add too much bloat.  For example, see the arrays:

http://svn.apache.org/viewcvs.cgi/xmlgraphics/batik/trunk/sources/org/apache/batik/xml/XMLCharacters.java?rev=216064&view=markup

and the methods to look up the arrays:

http://svn.apache.org/viewcvs.cgi/xmlgraphics/batik/trunk/sources/org/apache/batik/xml/XMLUtilities.java?rev=216064&view=markup

that are used in Batik.  You're welcome to use the arrays for Rhino (barring any licence complications).
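
To make the lookup-table idea concrete, here is a rough sketch of a one-bit-per-character table for plane 0. It is not copied from Batik and does not reflect its array layout; the class name (NameStartTableSketch) is hypothetical, and SAMPLE_RANGES is a deliberately incomplete subset of the XML 1.0 Letter ranges plus '_', chosen only to show the technique. A real table would need every range from the XML 1.0 productions.

```java
// Hypothetical sketch of a bit-per-character lookup table for plane 0.
final class NameStartTableSketch {
    // 0x10000 characters, 32 bits per int: 2048 ints.
    private static final int[] BITS = new int[0x10000 >>> 5];

    // ASSUMPTION: illustrative, incomplete subset of XML 1.0 name start
    // ranges (inclusive lo,hi pairs); the full list is much longer.
    private static final int[] SAMPLE_RANGES = {
        'A', 'Z', '_', '_', 'a', 'z',
        0xC0, 0xD6, 0xD8, 0xF6,
        0x4E00, 0x9FA5
    };

    static {
        // Set one bit for every character covered by a range.
        for (int i = 0; i < SAMPLE_RANGES.length; i += 2) {
            for (int c = SAMPLE_RANGES[i]; c <= SAMPLE_RANGES[i + 1]; c++) {
                BITS[c >>> 5] |= 1 << (c & 31);
            }
        }
    }

    // Lookup: one array index, one shift, one mask.
    static boolean isNameStart(char c) {
        return (BITS[c >>> 5] & (1 << (c & 31))) != 0;
    }
}
```

A table like this takes 65536 bits, i.e. 8 KB, per character class regardless of how many ranges it encodes, and it could also be written out as a precomputed array literal instead of being filled in a static initializer.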