Closed
Bug 153325
Opened 22 years ago
Closed 17 years ago
javascript href broken when a variable contains percent escaped umlaut
Categories
(Core :: DOM: Core & HTML, defect)
RESOLVED
WORKSFORME
People
(Reporter: s.a.moeller, Unassigned)
Details
(Keywords: intl)
Attachments
(1 file, 2 obsolete files)
752 bytes,
text/html
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1a) Gecko/20020610
BuildID: 2002061108

A javascript: href is broken when a function's variable contains a percent-escaped umlaut (e.g. %c4).

Reproducible: Always

Steps to Reproduce:
1. Use anything like <a href="javascript:alert('%c4')">js link</a>
2. Click the link

Actual Results: nothing happens

Expected Results: an alert box opens (with either %c4, or Ä, or whatever displayed in it)
Reporter | ||
Comment 1•22 years ago
Comment 2•22 years ago
Confirmed on 2002 061908 Win2k. Sending to parser since that's where the similar bug 51355 was.
Assignee: rogerl → harishd
Component: JavaScript Engine → Parser
OS: Linux → All
QA Contact: pschwartau → moied
Hardware: PC → All
Updated•22 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
Parser does not process attribute values. I get an assertion ("not a UTF8 string") when I click on the problem link.

ConvertUTF8toUCS2::write(const char * 0x0012f5f8, unsigned int 10) line 660 + 20 bytes
nsCharSinkTraits<ConvertUTF8toUCS2>::write(ConvertUTF8toUCS2 & {...}, const char * 0x0012f5f8, unsigned int 10) line 571
copy_string(nsReadingIterator<char> & {"????????????????"}, const nsReadingIterator<char> & {"???????????"}, ConvertUTF8toUCS2 & {...}) line 90 + 39 bytes
NS_ConvertUTF8toUCS2::Init(const nsACString & {...}) line 1350 + 35 bytes
NS_ConvertUTF8toUCS2::NS_ConvertUTF8toUCS2(const nsACString & {...}) line 558
nsJSThunk::EvaluateScript() line 284
nsJSChannel::AsyncOpen(nsJSChannel * const 0x03b6e458, nsIStreamListener * 0x03990b98, nsISupports * 0x00000000) line 619 + 11 bytes
nsDocumentOpenInfo::Open(nsIChannel * 0x03b6e458, int 1, nsISupports * 0x03a21270) line 170 + 18 bytes
nsURILoader::OpenURIVia(nsURILoader * const 0x01a4c208, nsIChannel * 0x03b6e458, int 1, nsISupports * 0x03a21270, unsigned int 0) line 538 + 20 bytes
nsURILoader::OpenURI(nsURILoader * const 0x01a4c208, nsIChannel * 0x03b6e458, int 1, nsISupports * 0x03a21270) line 500
nsDocShell::DoChannelLoad(nsIChannel * 0x03b6e458, nsIURILoader * 0x01a4c208) line 5184 + 39 bytes
nsDocShell::DoURILoad(nsIURI * 0x03cb17c8, nsIURI * 0x03c38e48, nsISupports * 0x031c5ce8, nsIInputStream * 0x00000000, nsIInputStream * 0x00000000, int 1, nsIDocShell * * 0x00000000, nsIRequest * * 0x00000000) line 4959 + 38 bytes
nsDocShell::InternalLoad(nsDocShell * const 0x03a21270, nsIURI * 0x03cb17c8, nsIURI * 0x03c38e48, nsISupports * 0x00000000, int 1, const unsigned short * 0x0012fc0c, nsIInputStream * 0x00000000, nsIInputStream * 0x00000000, unsigned int 2097153, nsISHEntry * 0x00000000, int 1, nsIDocShell * * 0x00000000, nsIRequest * * 0x00000000) line 4752 + 51 bytes
nsWebShell::OnLinkClickSync(nsWebShell * const 0x03a213b4, nsIContent * 0x034ce0a0, nsLinkVerb eLinkVerb_Replace, const unsigned short * 0x03d1b9a8, const unsigned short * 0x100e5b00 gCommonEmptyBuffer, nsIInputStream * 0x00000000, nsIInputStream * 0x00000000, nsIDocShell * * 0x00000000, nsIRequest * * 0x00000000) line 619 + 91 bytes
OnLinkClickEvent::HandleEvent() line 462
HandlePLEvent(OnLinkClickEvent * 0x03c8ea80) line 476
PL_HandleEvent(PLEvent * 0x03c8ea80) line 596 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x01497b18) line 526 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x000702f8, unsigned int 49272, unsigned int 0, long 21592856) line 1077 + 9 bytes
Assignee: harishd → rogerl
Component: Parser → JavaScript Engine
QA Contact: moied → pschwartau
Comment 4•22 years ago
Stefan: nice testcase! Reassigning to DOM Level 0.

Stefan's testcase shows that certain %XX sequences work, but that others don't. That reminds me of bug 144429, "URL encoding in window.open regressed, URL parsing problem?"

------- Additional Comment_ #11 From Henrik Rundqvist 2002-05-16 05:11 -------
Another interesting thing about all this is why "%7E" (tilde) slips through but "%E4" (a Swedish character) does not. Study the attached testcase and you'll see what I mean. Response if you click on "window.open": "The requested URL /~Gle was not found on this server." Notice that the tilde is there. So only some %XX codes get affected?

------- Additional Comment_ #12 From Johnny Stenback 2002-05-16 09:18 -------
The reason for tilde going through but Scandinavian characters not going through is that we assume that the string is a UTF8 string once we've unescaped it. We try to convert the UTF8 string to a unicode string and that can't be done for non-UTF8 encoded non-ASCII characters such as %E4. It's a bug in nsJSProtocolHandler.cpp...
Assignee: rogerl → jst
Component: JavaScript Engine → DOM Level 0
QA Contact: pschwartau → desale
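The failure mode described in the quoted comment #12 can be sketched as follows. This is only an illustration, not the nsJSProtocolHandler.cpp code: the helper names `percentUnescapeToBytes` and `decodeAsStrictUtf8` are hypothetical, and the standard `TextDecoder` with `fatal: true` stands in for the UTF-8 to UCS-2 conversion that asserts in the stack trace above.

```javascript
// Hypothetical helper: turn %XX sequences into raw bytes.
function percentUnescapeToBytes(s) {
  const bytes = [];
  for (let i = 0; i < s.length; i++) {
    const hex = s.slice(i + 1, i + 3);
    if (s[i] === '%' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      // Assumption: everything outside %XX sequences is plain ASCII.
      bytes.push(s.charCodeAt(i) & 0xff);
    }
  }
  return Uint8Array.from(bytes);
}

// Hypothetical helper: decode the unescaped bytes as strict UTF-8,
// returning null where the real code would hit its assertion.
function decodeAsStrictUtf8(s) {
  try {
    return new TextDecoder('utf-8', { fatal: true }).decode(percentUnescapeToBytes(s));
  } catch (e) {
    return null;
  }
}

console.log(decodeAsStrictUtf8("alert('%7E')"));    // alert('~')
console.log(decodeAsStrictUtf8("alert('%E4')"));    // null
console.log(decodeAsStrictUtf8("alert('%C3%A4')")); // alert('ä')
```

%7E survives because 0x7E is ASCII and therefore valid UTF-8 on its own, while a lone 0xE4 is the start of a multi-byte sequence that never completes.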
Comment 6•21 years ago
After the fix for bug 44272, testcase #4 gives me A-umlaut in the alert box if the character encoding is ISO-8859-1/15. But there's something more in this bug, so I'm keeping it open. It seems wrong to interpret '%C4' as U+00C4 in a JS string literal. Shouldn't it be considered the literal '%C4' (three characters)? I have to look up the ECMAScript standard. BTW, if we want to be purists, this bug would be 'WONTFIX': the 'javascript:' url-scheme is not allowed in href.
Keywords: intl
Comment 7•21 years ago
jshin: what's the minimal testcase showing
> It seems wrong to interpret '%C4' as U+00C4 in JS
> string literal. Shouldn't it be considered literal '%C4' (three characters).
?
/be
note that javascript: is supported by many browsers including IE (which doesn't support data:). wontfixing just because the scheme isn't standardized isn't acceptable.
Comment 9•21 years ago
timeless: yes, javascript: is a de-facto standard we must and will support. But who suggested otherwise? /be
Comment 10•21 years ago
jshin, tail of comment 6, presumably not particularly seriously.
Comment 11•21 years ago
Of course, I was not serious. Note that it's qualified by 'if we .... a purist'. Anyway, it's a little surprising that you're the first to pick it up and write that 'wontfix' is not acceptable. I thought you'd be more likely to be on the other side :-p. Also note that this bug was 'almost' fixed thanks to the fix for bug 44272.

re: comment #6

brendan, what is the following JS code supposed to produce?

document.write('%C6')

Three characters, <U+0025 U+0043 U+0036>, or a single character, <U+00C6>? That was my question. The testcase presents an interesting problem if the answer is the former, because a 'JS string literal' is used in a URL.

<a href="javascript:alert('%B0%A1')">Alert</a>

What should show up in the alert box if the above is in a non-ISO-8859-1/non-ISO-8859-15 page? Currently, the URL-unescaping is done before the JS part is handed over to the JS engine, so the result depends on the current character encoding. How about this?

<a href="javascript:alert(unescape('%B0%A1'))">Alert</a>

Compare it with the following, whose result is charset-independent.

<script type="text/javascript">
document.write('%B0%A1');
document.write(unescape('%B0%A1'));
</script>
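Outside a URL, the two cases in this comment can be checked directly. The sketch below assumes it is run as an ordinary script (for example under Node.js or in a browser console), not from a javascript: href, so no URL-unescaping layer is involved; `codeUnits` is a hypothetical helper introduced only for this illustration.

```javascript
// Hypothetical helper: list the UTF-16 code units of a string as U+XXXX labels.
function codeUnits(s) {
  return Array.from({ length: s.length }, (_, i) =>
    'U+' + s.charCodeAt(i).toString(16).toUpperCase().padStart(4, '0'));
}

// A plain string literal: '%' has no special meaning to the JS engine.
console.log(codeUnits('%C6'));               // [ 'U+0025', 'U+0043', 'U+0036' ]

// The global unescape() maps each %XX directly to the code unit U+00XX,
// independent of any document charset.
console.log(codeUnits(unescape('%C6')));     // [ 'U+00C6' ]
console.log(codeUnits(unescape('%B0%A1')));  // [ 'U+00B0', 'U+00A1' ]
```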
Comment 12•21 years ago
i'm rarely a standards purist. what i care about are things i use. and i use data:, javascript:, view-source, and about: urls heavily.
Comment 13•21 years ago
> document.write('%C6')?
>
> Three characters, <U+0025 U+0043 U+0036> or a single character <U+00C6>? That
> was my question.
If that document.write is in a .js file, then the answer is obvious: three
characters: '%C6' (or the U+0025 U+0043 U+0036 sequence, if you prefer that
spelling). JS string literals are well-specified by ECMA-262 and there is
nothing about %-escaping in that spec.
If the string literal is embedded in an href= attribute value, then the JS
engine may not see the verbatim source string -- there may be a layer of
interpretation when the attribute is parsed, and one when the href url is loaded
(when the link is clicked). The attribute value interpretation should handle
&amp; and other such entities, but not mess with %, right?
But the link url loading step will unescape (in nsJSProtocolHandler.cpp),
because it expects that the url was escaped (by whom? by the page author?).
Why is that unescaping dependent on the document charset?
/be
Comment 14•21 years ago
Thanks for bearing with my 'laziness' and confirming that '%XX' is not subject to any special processing.

> the link url loading step will unescape (in nsJSProtocolHandler.cpp),
> because it expects that the url was escaped (by whom? by the page author?).

This is the conflict between JS's notion of 'escaping' and url-escaping. Why is it escaped? Because in a URL, every non-ASCII character (plus some ASCII chars) has to be url-escaped, and some authors apparently escape non-ASCII characters in javascript: urls. Unfortunately, url-escaping was not well-defined when it comes to which charset its byte sequence (when unescaped) is in. Most of the time, it has to be interpreted as being in the document charset. That is, which file 'http://www.example.com/%b0%91.png' refers to depends on the document charset, although an increasing number of sites (still very few, though) have begun to support url-escaped UTF-8 URLs.

http://lxr.mozilla.org/seamonkey/source/dom/src/jsurl/nsJSProtocolHandler.cpp#195
http://lxr.mozilla.org/seamonkey/source/dom/src/jsurl/nsJSProtocolHandler.cpp#717
http://lxr.mozilla.org/seamonkey/source/dom/src/jsurl/nsJSProtocolHandler.cpp#774

> Why is that unescaping dependent on the document charset?

See above. This bug is kind of a tech-evangelism bug. The '\uHHHH' notation has to be used in javascript: urls if necessary.
Summary: javascript href broken when a variable is containing escaped umlaut → javascript href broken when a variable contains percent escaped umlaut
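The '\uHHHH' advice in comment 14 could be applied along these lines. This is a minimal sketch under stated assumptions: `toUnicodeEscapes` and `buildAlertHref` are hypothetical names, and the choice to also escape %, quotes and backslashes is an addition of this example so that neither the URL-unescaping step nor the string literal can be disturbed by them.

```javascript
// Hypothetical helper: rewrite a message so it contains only printable ASCII,
// using JS \uHHHH escapes for everything else (and for %, quotes, backslash).
function toUnicodeEscapes(s) {
  return s.replace(/[^\x20-\x7E]|[%'"\\]/g, (ch) =>
    '\\u' + ch.charCodeAt(0).toString(16).toUpperCase().padStart(4, '0'));
}

// Hypothetical helper: build a javascript: URL whose string literal is
// charset-independent because it carries no %XX sequences at all.
function buildAlertHref(message) {
  return "javascript:alert('" + toUnicodeEscapes(message) + "')";
}

console.log(buildAlertHref('\u00C4')); // javascript:alert('\u00C4')
```

Because the resulting href is pure ASCII with no percent signs, the URL-unescaping layer has nothing to reinterpret, and the JS engine resolves the \uHHHH escapes itself.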
Comment 15•18 years ago
Stefan, does this now work for you? WFM last item in testcase, FF & SM
Reporter | ||
Comment 16•18 years ago
Yes, WFM now. Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.0.7) Gecko/20060910 SeaMonkey/1.0.5
Comment 17•18 years ago
->WFM then
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → WORKSFORME
Reporter | ||
Comment 18•17 years ago
This bug is still present, if the charset is UTF-8. See new test case.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Reporter | ||
Comment 19•17 years ago
Reporter | ||
Updated•17 years ago
Attachment #88625 -
Attachment is obsolete: true
Comment 20•17 years ago
Do things right! \u00C4 (Latin capital letter A with diaeresis) is a Unicode value; in a URL it must be encoded in UTF-8, which takes two bytes: { 0xC3, 0x84 }. <a href="javascript:alert('%C3%84')"> will work as expected.
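The byte values quoted here can be confirmed with standard JavaScript functions. The sketch below is only a check of the encoding arithmetic, not of browser behaviour; `utf8Bytes` and `isUtf8PercentDecodable` are hypothetical helper names.

```javascript
// Hypothetical helper: the UTF-8 bytes of a string, as hex labels.
function utf8Bytes(s) {
  return Array.from(new TextEncoder().encode(s), (b) =>
    '0x' + b.toString(16).toUpperCase().padStart(2, '0'));
}

// Hypothetical helper: does the percent-escaped text decode as UTF-8?
// decodeURIComponent throws a URIError on malformed sequences.
function isUtf8PercentDecodable(s) {
  try {
    decodeURIComponent(s);
    return true;
  } catch (e) {
    return false;
  }
}

console.log(utf8Bytes('\u00C4'));               // [ '0xC3', '0x84' ]
console.log(encodeURIComponent('\u00C4'));      // %C3%84
console.log(isUtf8PercentDecodable('%C3%84'));  // true
console.log(isUtf8PercentDecodable('%C4'));     // false
```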
Comment 21•17 years ago
Roman is right. In a UTF-8 document <a href="javascript:alert('%C4')"> is just malformed.
Status: REOPENED → RESOLVED
Closed: 18 years ago → 17 years ago
Resolution: --- → WORKSFORME
Reporter | ||
Comment 22•17 years ago
Why is it malformed? '%C4' is a valid string expression, isn't it? I have no interest in the particular letter it may or may not represent. It is just any string. And if I call alert(), or any other function, with a valid string as its parameter, that function should be executed. That's the problem: I see no alert! By the way: <a href="javascript:alert('%C3%84')"> does not work either.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Reporter | ||
Comment 23•17 years ago
Attachment #271523 -
Attachment is obsolete: true
Comment 24•17 years ago
I'm not going to get into a closing/reopening war here, but '%C4' is *not* a valid string expression in an href in a UTF-8 document. RFC 2396 defines that it has to be interpreted as the octet 0xC4, which is invalid UTF-8 on its own.
Also, there is a typo in the last line of attachment 292585: you updated the text but not the actual href attribute.
Reporter | ||
Comment 25•17 years ago
Oops. Sorry. My conclusion was based on that testcase. Indeed, it does not work because of my typo. You're right. So I'm convinced. Resolving as WORKSFORME, again.
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → WORKSFORME