Closed
Bug 51355
Opened 24 years ago
Closed 23 years ago
moz escapes javascript: urls containing non-ascii chars
Categories
(Core :: DOM: HTML Parser, defect, P2)
Core
DOM: HTML Parser
Tracking
()
VERIFIED
FIXED
mozilla0.9.2
People
(Reporter: mikko.rantalainen, Assigned: nhottanscp)
References
Details
(Keywords: highrisk, intl, Whiteboard: [PDT+]wait for tree open to check in)
Attachments
(11 files)
220 bytes,
text/html
|
Details | |
229 bytes,
text/html
|
Details | |
1.13 KB,
text/html
|
Details | |
4.68 KB,
patch
|
Details | Diff | Splinter Review | |
2.27 KB,
patch
|
Details | Diff | Splinter Review | |
4.50 KB,
patch
|
Details | Diff | Splinter Review | |
4.55 KB,
patch
|
Details | Diff | Splinter Review | |
4.52 KB,
patch
|
Details | Diff | Splinter Review | |
1.65 KB,
patch
|
Details | Diff | Splinter Review | |
1.64 KB,
patch
|
Details | Diff | Splinter Review | |
1.69 KB,
patch
|
Details | Diff | Splinter Review |
Any "special" character in href causes javascript to fail because browser escapes all "special" characters. Following code works in Netscape Communicator 4.73 but two last fail under mozilla (build ID:2000090421 - quite probably others builds too). <html> <a href='javascript:alert("Hello World!")'>Hello World!</a><br> <a href='javascript:alert("Hello Wörld!")'>Hello Wörld!</a><br> <a href='javascript:alert("Hello Wörld!")'>Hello Wörld!</a><br> </html> As seen in the status bar mozilla translates second and third href as 'javascript:alert(%22Hello%20W%F6rld!%22)' which doesn't work. I'm not sure whether the second href should work or not but IMHO third one should because I have escaped ö character. (Escaping " as %22 doesn't make any difference either). IMO correct solution would be to use string between " or ' characters in whole with no other translation but *unescaping* before scripting.
Comment 1•24 years ago
|
||
Comment 2•24 years ago
|
||
Even NN4.73, if you type this into the URL bar: javascript:alert("Hello Wörld!"); it doesn't work. There seems to be browser code that escapes the characters if they are part of HTML. Therefore I'm sending this bug over to the Parser component for further triage, as it doesn't seem to be a JS Engine issue -
Assignee: rogerl → rickg
Status: UNCONFIRMED → NEW
Component: Javascript Engine → Parser
Ever confirmed: true
QA Contact: pschwartau → janc
marking wont fix, since I don't think we're supporting js attributes.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → WONTFIX
Comment 5•24 years ago
|
||
javascript: in a href is not the same as JavaScript entities. We're not supporting the latter. The former should and does work. The entities and Unicode characters in the attributes in the examples shown should be correctly preserved.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Vidur's right, my mistake. Note that the parsing engine is passing the href's along unchanged. I suspect the culprit is either the link handling code or the sink. I'll dig a bit more.
Status: REOPENED → ASSIGNED
I've now confirmed that the javascript attributes are being parsed and stored as written. I now suspect that the anchor handling code is converting these before showing them.
Reassigning to trudele for triage. I think the content model is well formed, and that the problem is in the link handling code.
Thanks to billlaw, I've tracked this down to nsIOService::Escape(). The valid nsString is being converted to a whacky C-String (hence the %20's). Please take a look.
Assignee: trudelle → warren
Comment 10•24 years ago
|
||
Note: this may also be the solution to bug 40469 (-?)
Comment 11•24 years ago
|
||
The problem specifically is the use of Escape here: http://lxr.mozilla.org/seamonkey/source/layout/html/content/src/nsHTMLUtils.cpp# 150 We are converting all questionable characters in the string to their escaped values. Probably what we should do is construct the appropriate kind of url for the protocol in question (here, javascript:), and let the protocol's url parser deal with the escaping (if necessary). For javascript: we'll want to escape the things inside the quotes (I think), but not the entire url string. For "standard URLs" (most of the other protocols) we'll escape according to their rules.
Comment 12•24 years ago
|
||
Not working on any platform: 2000-10-05-09-MN6 : Windows 2000-10-05-13-MN6 : Mac 2000-10-05-09-MN6 : Linux Changing platform/os to all/all Nominating for rtm
Comment 13•24 years ago
|
||
Warren, is there a trivial fix for this? This bug has languished almost a month in the rtm nomination state. If we don't need this for rtm, please mark it rtm-.
Whiteboard: [need info]
Comment 14•24 years ago
|
||
rtm-, no activity. please update the bug if you're actually working on it.
Whiteboard: [need info] → [rtm-]
Comment 15•24 years ago
|
||
Even a classic non breakable space html entity gives same wrong behavior. Try this (attached next) : <html><head><title>non breakable space</title></head><body> <a href='javascript:alert("Hello World!")'>Hello World!</a><br> or<br> <a href="javascript:alert('Hello World!')">Hello World!</a> </body></html>
Comment 16•24 years ago
|
||
Comment 18•23 years ago
|
||
*** Bug 75542 has been marked as a duplicate of this bug. ***
The escapes are part of the URL syntax, so it should be just as valid if the page contained the escaped characters. Isn't the problem really that something needs to unescape them?
Comment 20•23 years ago
|
||
Reassigning to component's default owner. Warren, hope you don't mind.
Assignee: warren → harishd
Comment 21•23 years ago
|
||
The problem is either in the JS engine or in layout. Reassigning bug to waterson. ccing attinasi & rogerl
Assignee: harishd → waterson
Comment 22•23 years ago
|
||
*** Bug 77043 has been marked as a duplicate of this bug. ***
Comment 23•23 years ago
|
||
*** Bug 60576 has been marked as a duplicate of this bug. ***
Comment 24•23 years ago
|
||
Comment 25•23 years ago
|
||
*** Bug 55468 has been marked as a duplicate of this bug. ***
Comment 26•23 years ago
|
||
*** Bug 63314 has been marked as a duplicate of this bug. ***
Comment 27•23 years ago
|
||
*** Bug 66055 has been marked as a duplicate of this bug. ***
Comment 28•23 years ago
|
||
cc jst@netscape.com,ftang@netscape.com since I'm copying their comments from duplicate bugs. (pschwartau@netscape.com and jag@tty.nl are already cc'ed here.) From bug 55468: ------- Additional Comments From Johnny Stenback 2000-11-21 02:01 ------- Mozilla's javascript: URL handling is a mess, we escape the non-ASCII characters in their original charset but we never unescape them - also, the escaped URL has no charset info in it so even if we did unescape the JS URL before executing it there's no way to know what charset the URL is encoded as... oh my From bug 66055: ------- Additional Comments From Frank Tang 2001-01-30 11:32 ------- We probably don't care if this only a issue in alert. But I think the alert here is just use as a simplified test case. The real problem is these JavaScript is inside the HREF. We escape them since other URL need to be escaped but we didn't consider the case for JavaScript. What we should do is find out the code between the layout engine and the javaScript code and unescape it back by using the document charset. I think this bug is in the border line of fixing or not fixing. Let's keep this as future untill we find breakage in the real world case. From bug 63314: ------- Additional Comments From Peter ``jag'' Annema 2000-12-19 12:29 ------- Actually, this should be escaped (it's supposed to be an URI[0]), so I guess it should be unescaped before handing it to the JS engine. I strongly suspect this to be a dupe of an existing bug. [0] http://www.w3.org/TR/html4/struct/links.html#adef-href ------- Additional Comments From Phil Schwartau 2001-04-26 11:06 ------- This WW3 link may be of interest: http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars
Comment 29•23 years ago
|
||
cc jst@netscape.com,ftang@netscape.com since I just copied their comments from duplicate bugs.
Comment 30•23 years ago
|
||
My attached testcase doesn't cover: - %hh in javascript: URLs that contain non-ascii characters - %hh in javascript: URLs that do not contain non-ascii characters - unescaping with the correct charset (?) (from bug 55468)
Updated•23 years ago
|
Keywords: mozilla1.0
Comment 31•23 years ago
|
||
*** Bug 69043 has been marked as a duplicate of this bug. ***
Comment 32•23 years ago
|
||
*** Bug 79602 has been marked as a duplicate of this bug. ***
Comment 34•23 years ago
|
||
The problem seems to be around line: 150 in nsHTMLUtils.cpp ( what rickg had pointed out ). http://lxr.mozilla.org/seamonkey/source/content/html/content/src/nsHTMLUtils.cpp #150 Back to waterson :-)
Assignee: harishd → waterson
Assignee | ||
Comment 37•23 years ago
|
||
I tried additional cases. <a href='javascript:alert("Hello W\u00F6rld!")'>Hello Wörld!</a><br> this works. <a href='javascript:alert("Hello W%F6rld!")'>Hello Wörld!</a><br> this works but does not unescape "%F6". Not sure which spec the string should follow, URI, HTML, JS or all? Is there a spec for the case, href=javascript? Only \uxxxx seems to be working now.
Comment 38•23 years ago
|
||
nhotta: try this <a href="Hello Wörld">Hello Wörld</a>
Assignee | ||
Comment 39•23 years ago
|
||
I tried that (it was in the original report) and it did not work. I assume "ö" is decoded to unicode and if that is escaped to "%F6", it still should works even if that is not unescaped (as my second example). I am going to look at the actual value in nsHTMLUtils.cpp. But I would like to know what is really allowed in that string (%xx, \uxxxx, &#xxx;).
Assignee | ||
Comment 40•23 years ago
|
||
I looked at the value, that was "%22Hello%20W%F6rld!%22" as mentioned in the original report. So it has to be unescaped before passed to JavaScript. The problem is when unescape and convert to unicode, you don't know about the charset. There is a proposal of adding a hint_charset field to nsIURI. It also proposes to use UTF-8, so the current conversion and escape code may not be needed, you just need to set the hint_charset and call NS_ConvertUCS2toUTF8. It was posted on 5/11/2001 to n.p.m.netlib, cc to people from that post. Is there a bug for that proposal?
Assignee | ||
Comment 41•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.2
Assignee | ||
Comment 42•23 years ago
|
||
Filed bug 84032 for adding "hint_charset" to nsIURI.
Assignee | ||
Comment 43•23 years ago
|
||
Asked jst and rpotts for reviews.
Comment 44•23 years ago
|
||
Should charset conversion apply to all new protocols by default? CF bug 81717 (converting \ to / in links should happen for only http,https,ftp rather than everything except javascript).
Comment 45•23 years ago
|
||
I had a look at the attached patch and there's a bunch of issues with this patch: In nsGenericHTMLElement.cpp: nsAutoString href; - href.AssignWithConversion(hrefCStr); + href = NS_ConvertUTF8toUCS2(hrefCStr); If you'd make this: - nsAutoString href; - href.AssignWithConversion(hrefCStr); + NS_ConvertUTF8toUCS2 href(hrefCStr); you'd save a string copy, no? The changes in nsHTMLUtils.cpp seem like major overkill to me, we convert a unicode string into a utf8 cstring (and this conversion is very likely to end up doing an malloc since URL's are often longer than what fits in nsAutoString's internal buffer) and then create a new URI object and parse the whole URI string only to find out if the URI string starts with "javascript:"? Wouldn't simple string manipulation be more efficient (and less code) here? Oh, is the attached patch a cvs diff -w? If not you better fix the indentation in nsHTMLUtils.cpp: + if (bIsJavascript) { + spec = utf8Spec; + } + else { // Otherwise, we'll need to use aDocument to cough up a character // set converter, and re-encode the relative portion of the URL as ... ^ |____ indent! nsInputStreamChannel.cpp, can this get more inefficient? :-) nsString name; - name.AppendWithConversion(urlStr); + name.Append(NS_ConvertUTF8toUCS2(urlStr)); *result = name.ToNewUnicode(); How about: - nsString name; - name.AppendWithConversion(urlStr); - *result = name.ToNewUnicode(); + *result = ToNewUnicode(NS_ConvertUTF8toUCS2(urlStr)); in nsSimpleURI::SetSpec(): nsAutoString spec; - spec.AssignWithConversion(aSpec); + spec = NS_ConvertUTF8toUCS2(aSpec); Any reason to not make 'spec' be of type NS_ConvertUTF8toUCS2 in stead of being of type nsAutoString? nsJSProtocolHandler.cpp: nsAutoString scriptString; - scriptString.AssignWithConversion(script); + scriptString.Assign(NS_ConvertUTF8toUCS2(script)); rv = scriptContext->EvaluateString(scriptString, Why not: - nsAutoString scriptString; - scriptString.AssignWithConversion(script); + NS_ConvertUTF8toUCS2 scriptString(script); rv = scriptContext->EvaluateString(scriptString, Fix those issues and I'll have one more look...
Assignee | ||
Comment 46•23 years ago
|
||
Jesse, I am not clear about your question. Charset conversion does not convert \ to /, that unrelated.
Assignee | ||
Comment 47•23 years ago
|
||
Assignee | ||
Comment 48•23 years ago
|
||
Comment 49•23 years ago
|
||
Much better! One more thing, in nsHTMLUtils.cpp: + PRInt32 pos = aSpec.Find(":"); + nsAutoString scheme; + if ((pos != -1) && + (aSpec.Left(scheme, pos) != -1) && + scheme.EqualsIgnoreCase("javascript")) { + spec.Assign(NS_ConvertUCS2toUTF8(aSpec.GetUnicode())); + } This code will end up doing aSpec.Left() and scheme.EqualsIgnoreCase() for every URL when we could limit the cases to where the ':' is found at the right place, i.e. if ((pos == nsCRT::strlen("javascript")) && ...), or something like that. Oh, and use aSpec.FindChar(':') when looking for a single character. Make those changes and you'll have sr=jst
Assignee | ||
Comment 50•23 years ago
|
||
That part only executed if the URI contains non ASCII characters. Anyway, faster is better, I will update the patch.
Assignee | ||
Comment 51•23 years ago
|
||
Comment 52•23 years ago
|
||
I don't see a reason for the pos != -1 check here, if you're worried about the performance hit from nsCRT::strlen("javascript") in the cases where a ':' isn't found in the URI then you could make the check be simply: if (pos == 10 /* strlen("javascript") */ && ... which would be even faster and even less code, and you could actually optimize this even further by never searching for the ':' character, in stead you could simply check if the 10th character is a ':' (after making sure the string is longer than 10 characters) and then do the check if the data before the ':' was "javascript" or something else, but since this is not performance critical or anything, I'm ok with this as is, except for the "pos != -1" check. sr=jst if you remove that check...
Assignee | ||
Comment 53•23 years ago
|
||
Comment 54•23 years ago
|
||
sr on the nsSimpleURI and nsInputStreamChannel string changes. I wonder if there are other places in necko that we should be doing this?.
Assignee | ||
Comment 55•23 years ago
|
||
Under mozilla/netwerk, there are 59 of ToNewCstring, 43 of AssignWithConversion, 11 of NS_ConvertASCIItoUCS2 Changing them to ToNewUTF8String, NS_ConvertUTF8toUCS2 would not break ASCII (UTF-8 is a superset of ASCII), though I am not sure about performance difference.
Comment 57•23 years ago
|
||
The current patch makes it so javascript: is the only protocol where the URL isn't subject to charset conversion. Is this what we want to do, or do we want to list the protocols that *do* get converted? (I'm bringing this up because in bug 81717, another URL conversion, converting \ to /, is being applied to everything except javascript: and causing data: URLs to break.)
Comment 58•23 years ago
|
||
>I wonder if
there are other places in necko that we should be doing this?.
Let's sperate that into different bug. Let's deal with known issue for now and
unknown issue later.
Whiteboard: r/sr=jst+dougt, need a=
Assignee | ||
Comment 59•23 years ago
|
||
Jesse, I think it's fine to add other protocols if they also want UTF-8 string. If there is a specific example, please file a separate bug. Note that protocol handlers may also need change in order to accept UTF-8 URI.
Assignee | ||
Comment 60•23 years ago
|
||
Testing information: I ran http://www.mozilla.org/quality/precheckin-tests.html which include http and ftp. Tested links in 4.01 spec http://www.w3.org/TR/html4/. Also tried non ASCII cases http://www.fcenter.ru/wn/hn02082000.htm to make sure non javascript case is not affected.
Assignee | ||
Comment 61•23 years ago
|
||
Filed separately as bug 85310 about the remaining ToNewCString in necko.
Comment 62•23 years ago
|
||
pdt+ base on 6/11 pdt meeting.
Updated•23 years ago
|
Whiteboard: r/sr=jst+dougt, need a= → [PDT+]r/sr=jst+dougt, need a=
Comment 63•23 years ago
|
||
I review the 06/08/01 10:02 patch, I feel the change is right but have high risk except the nsHTMLUtils.cpp part. I afraid this may cause some unknown regression since some changs is in the low leverl code. In this late cycle, I suggest we hold this patch and ask naoki to work on a work around changes which only in nsHTMLUtils.cpp. In the case of JavaScript protocol, instead of convert from Unicode to UTF8, we could convert to the javascript escape \u for non ASCII data. In this way, we can gunarantee it only impact JavaScript protocol but not other code. We should definitely consider the rest of the 06/08/01 10:02 patch after we branch off from moz.0.9.3. It is safer to test that kind of change in the early development cycle.
Comment 64•23 years ago
|
||
Adding highrisk and patch keywords based on ftang's comments.
Assignee | ||
Comment 65•23 years ago
|
||
Assignee | ||
Comment 66•23 years ago
|
||
Frank reviewed the patch. His comments, * use nsCRT::IsAscii * check ECMA script spec for \uxxxx format
Assignee | ||
Comment 67•23 years ago
|
||
From the ECMA script spec, http://www.ecma.ch/ecma1/stand/ecma-262.htm > UnicodeEscapeSequence :: See 7.6 > \u HexDigit HexDigit HexDigit HexDigit So it looks like it expects four digits although I didn't find a place which says it had to be four digits. I found that the current mozilla does not accept non four digits like \uF6, it has to be \u00F6.
Assignee | ||
Comment 68•23 years ago
|
||
Comment 69•23 years ago
|
||
r=ftang
Comment 70•23 years ago
|
||
I would've put the \u as part of the format string to PR_snprintf() (and made sure there's room for that too in buf) in stead of first appending \u to the string and then doing the printf and then appending the result of printf to the string, that would've been less code, but both ways work. sr=jst
Comment 71•23 years ago
|
||
+ nsAutoString scheme; + if ((pos == (PRInt32) nsCRT::strlen("javascript")) && + (aSpec.Left(scheme, pos) != -1) && + scheme.EqualsIgnoreCase("javascript")) { FYI, here's a way to consolidate the string constant and avoid the strlen of that constant, without wiring in 10 (although 10 is well-known to be the length of "javascript"!): + static const char kJavaScript[] = "javascript"; + nsAutoString scheme; + if ((pos == (PRInt32)(sizeof kJavaScript - 1)) && + (aSpec.Left(scheme, pos) != -1) && + scheme.EqualsIgnoreCase(kJavaScript")) { I agree with what jst said. Instead of: + char buf[5]; + spec.Truncate(0); + for (const PRUnichar* uch = aSpec.GetUnicode(); *uch; ++uch) { + if (!nsCRT::IsAscii(*uch)) { + spec.Append("\\u"); + PR_snprintf(buf, sizeof(buf), "%.4x", *uch); + spec.Append(buf); Why not + char buf[6+1]; // space for \uXXXX plus a NUL at the end + spec.Truncate(0); + for (const PRUnichar* uch = aSpec.GetUnicode(); *uch; ++uch) { + if (!nsCRT::IsAscii(*uch)) { + PR_snprintf(buf, sizeof(buf), "\\u%.4x", *uch); + spec.Append(buf); /be
Assignee | ||
Comment 72•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Whiteboard: [PDT+]r/sr=jst+dougt, need a= → [PDT+]need sr=
Comment 73•23 years ago
|
||
*** Bug 86092 has been marked as a duplicate of this bug. ***
Comment 74•23 years ago
|
||
sr=jst
Assignee | ||
Updated•23 years ago
|
Whiteboard: [PDT+]need sr= → [PDT+]need a=
Comment 75•23 years ago
|
||
a= asa@mozilla.org for checkin to the trunk. (on behalf of drivers)
Updated•23 years ago
|
Whiteboard: [PDT+]need a= → [PDT+]wait for tree open to check in
Assignee | ||
Comment 76•23 years ago
|
||
checked in
Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 23 years ago
Resolution: --- → FIXED
Comment 77•23 years ago
|
||
I checked now. It work! Thanx! from Japan
Comment 78•23 years ago
|
||
verified win2k 2001061905 macos 2001061908 linux 2001061914
Status: RESOLVED → VERIFIED
Comment 79•23 years ago
|
||
*** Bug 88220 has been marked as a duplicate of this bug. ***
You need to log in
before you can comment on or make changes to this bug.
Description
•