Closed
Bug 355674
Opened 18 years ago
Closed 12 years ago
Japanese character in E4X literal is incorrectly escaped
Categories
(Core :: JavaScript Engine, defect)
RESOLVED
WONTFIX
People
(Reporter: jruderman, Unassigned)
Details
(Keywords: testcase)
Attachments
(3 files)
1.19 KB, patch
1.19 KB, text/plain
2.12 KB, text/html
js> f = eval("function() { return <\u3056/> }")
function () {
return <\u3056/>;
}
js> eval("" + f)
typein:9: SyntaxError: illegal XML character:
typein:9: return <\u3056/>;
typein:9: ............^
Comment 1•18 years ago (Reporter)
See also bug 349814 (fixed) and bug 352285.
Jeff, do you want this?
Comment 2•18 years ago
Monday and Tuesday next week are anti-suicide holidays at MIT (they call it Columbus Day, but we know better), so I should be able to look at this sometime over the "weekend".
Assignee: general → jwalden+bmo
Comment 3•18 years ago
There's only one option here: the string must contain the actual Unicode character. \ does not start an escape sequence in an XML initializer per 8.3, so we can't use \u3056. We could use &#x3056;, which is the specified replacement for \ escapes per section 8.3, but entities aren't valid XML as element names:
data:application/xml,<&#x3056;/>
(I assume the spec doesn't intend to make that actually valid as an initializer, but I could be wrong. I hope I'm not.) Consequently, we're left with the literal character itself as the only way to decompile the element name.
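To make the three candidate spellings concrete, here is a minimal sketch in plain JavaScript. The helper `spellCodeUnit` and its context names are hypothetical, invented for illustration; this is not SpiderMonkey's decompiler code. It only encodes the reasoning above: a `\u` escape belongs in a JS string literal, a character reference belongs in XML content or attribute values, and an element name can only carry the character itself.

```javascript
// Hypothetical helper (not SpiderMonkey code): spell one UTF-16 code unit
// for three different output contexts.
function spellCodeUnit(codeUnit, context) {
  const hex = codeUnit.toString(16).toUpperCase().padStart(4, "0");
  switch (context) {
    case "js-string":
      // Inside a JS string literal, \uXXXX is a valid escape.
      return "\\u" + hex;
    case "xml-content":
      // In XML text or attribute values, a character reference is valid.
      return "&#x" + hex + ";";
    case "xml-name":
      // In an element name neither form is allowed, so emit the character.
      return String.fromCharCode(codeUnit);
    default:
      throw new RangeError("unknown context: " + context);
  }
}

console.log(spellCodeUnit(0x3056, "js-string"));   // backslash, then u3056
console.log(spellCodeUnit(0x3056, "xml-content")); // ampersand form
console.log(spellCodeUnit(0x3056, "xml-name"));    // the literal character
```

Only the last form would be usable for decompiling the element name in the testcase.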
I don't know whether "%hc"/"%c"/something else is the right format here. %hc and %c work for me (the former partially because !JS_C_STRINGS_ARE_UTF8 means behavior fallthrough to %c) in that .charCodeAt(index) returns the right number, but I don't know if that's architecture-dependent behavior based on the size of the char type or something similar.
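The `.charCodeAt(index)` check mentioned above can be sketched roughly as follows. The string `decompiled` is an assumed stand-in for what the decompiler would ideally produce, and `findCodeUnit` is a hypothetical helper; neither comes from the patch.

```javascript
// Hypothetical check: does the decompiled text contain the full 16-bit
// code unit, rather than a truncated or escaped form of it?
function findCodeUnit(text, codeUnit) {
  const index = text.indexOf(String.fromCharCode(codeUnit));
  // Confirm via charCodeAt that the unit at that position is the one wanted.
  return index !== -1 && text.charCodeAt(index) === codeUnit ? index : -1;
}

// Assumed ideal decompiler output; "\u3056" here is a JS string escape,
// so the string holds the actual character.
const decompiled = "function () {\n    return <\u3056/>;\n}";

console.log(findCodeUnit(decompiled, 0x3056) !== -1);       // true
console.log(findCodeUnit("return <V/>;", 0x3056) !== -1);   // false
```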
Attachment #242302 -
Flags: review?(brendan)
Comment 4•18 years ago
Hmm, so clearly there's something whacked with the fix, because this test fails (failure HTML to be posted momentarily).
Comment 5•18 years ago
Comment 6•18 years ago
Comment on attachment 242302 [details] [diff] [review]
Patch
Debugging with a breakpoint in str_indexOf shows that when we compare \u3056 in the decompiled version and in the string to be found, the decompiled version only includes the low 8 bits and not the high ones. I tried various things to work around this -- "%hs" with a jschar buf[2] = {c, 0}, "%c%c" with (c&0xFF, c>>8) and vice versa, and if I remember correctly "%lc" as well, but all failed for one reason or another, mostly due to not emitting the high-order bits.
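The truncation described here can be modeled in plain JavaScript. This is only an arithmetic illustration under the assumption that each output slot holds 8 bits; the function names are hypothetical and the code does not call or reproduce dosprintf.

```javascript
// Model of storing a 16-bit code unit into an 8-bit char:
// only the low byte survives.
function narrowToChar(codeUnit) {
  return String.fromCharCode(codeUnit & 0xFF);
}

// Model of the "%c%c" attempt with (c & 0xFF, c >> 8):
// the two bytes come out as two separate characters.
function splitIntoTwoChars(codeUnit) {
  return String.fromCharCode(codeUnit & 0xFF, codeUnit >> 8);
}

const expected = "\u3056";

console.log(narrowToChar(0x3056));               // "V"  (0x56)
console.log(splitIntoTwoChars(0x3056));          // "V0" (0x56, 0x30)
console.log(narrowToChar(0x3056) === expected);  // false
```

Neither result compares equal to the single code unit U+3056, which matches the mismatch observed in the string comparison.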
There may be a way to do this, but I've scanned dosprintf enough and think I've tried all the plausible-looking ways to do this, so I don't think I'm going to be able to fix this without a suggestion on how to do it (other than the seemingly-overkill task of changing Sprinter to use jschar* instead of char*).
Attachment #242302 -
Flags: review?(brendan)
Updated•14 years ago
Assignee: jwalden+bmo → general
Comment 7•13 years ago
Jeff, the testcase you posted is the same as the patch you posted.
Comment 8•12 years ago
E4X will be removed again from SpiderMonkey (bug 788293).
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX