Closed
Bug 51355
Opened 24 years ago
Closed 23 years ago
moz escapes javascript: urls containing non-ascii chars
Categories
(Core :: DOM: HTML Parser, defect, P2)
Core
DOM: HTML Parser
Tracking
()
VERIFIED
FIXED
mozilla0.9.2
People
(Reporter: mikko.rantalainen, Assigned: nhottanscp)
References
Details
(Keywords: highrisk, intl, Whiteboard: [PDT+]wait for tree open to check in)
Attachments
(11 files)
220 bytes,
text/html
|
Details | |
229 bytes,
text/html
|
Details | |
1.13 KB,
text/html
|
Details | |
4.68 KB,
patch
|
Details | Diff | Splinter Review | |
2.27 KB,
patch
|
Details | Diff | Splinter Review | |
4.50 KB,
patch
|
Details | Diff | Splinter Review | |
4.55 KB,
patch
|
Details | Diff | Splinter Review | |
4.52 KB,
patch
|
Details | Diff | Splinter Review | |
1.65 KB,
patch
|
Details | Diff | Splinter Review | |
1.64 KB,
patch
|
Details | Diff | Splinter Review | |
1.69 KB,
patch
|
Details | Diff | Splinter Review |
Any "special" character in href causes javascript to fail because browser
escapes all "special" characters. Following code works in Netscape Communicator
4.73 but two last fail under mozilla (build ID:2000090421 - quite probably
others builds too).
<html>
<a href='javascript:alert("Hello World!")'>Hello World!</a><br>
<a href='javascript:alert("Hello Wörld!")'>Hello Wörld!</a><br>
<a href='javascript:alert("Hello Wörld!")'>Hello Wörld!</a><br> </html>
As seen in the status bar mozilla translates second and third href as
'javascript:alert(%22Hello%20W%F6rld!%22)' which doesn't work. I'm not sure
whether the second href should work or not but IMHO third one should because I
have escaped ö character. (Escaping " as %22 doesn't make any difference
either).
IMO correct solution would be to use string between " or ' characters in whole
with no other translation but *unescaping* before scripting.
Comment 1•24 years ago
|
||
Comment 2•24 years ago
|
||
Even NN4.73, if you type this into the URL bar:
javascript:alert("Hello Wörld!");
it doesn't work. There seems to be browser code that escapes the
characters if they are part of HTML. Therefore I'm sending this bug
over to the Parser component for further triage, as it doesn't seem
to be a JS Engine issue -
Assignee: rogerl → rickg
Status: UNCONFIRMED → NEW
Component: Javascript Engine → Parser
Ever confirmed: true
QA Contact: pschwartau → janc
marking wont fix, since I don't think we're supporting js attributes.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → WONTFIX
Comment 5•24 years ago
|
||
javascript: in a href is not the same as JavaScript entities. We're not
supporting the latter. The former should and does work. The entities and Unicode
characters in the attributes in the examples shown should be correctly
preserved.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Vidur's right, my mistake. Note that the parsing engine is passing the href's
along unchanged. I suspect the culprit is either the link handling code or the
sink. I'll dig a bit more.
Status: REOPENED → ASSIGNED
I've now confirmed that the javascript attributes are being parsed and stored as
written. I now suspect that the anchor handling code is converting these before
showing them.
Reassigning to trudele for triage. I think the content model is well formed, and
that the problem is in the link handling code.
Thanks to billlaw, I've tracked this down to nsIOService::Escape(). The valid
nsString is being converted to a whacky C-String (hence the %20's). Please take
a look.
Assignee: trudelle → warren
Comment 10•24 years ago
|
||
Note: this may also be the solution to bug 40469 (-?)
Comment 11•24 years ago
|
||
The problem specifically is the use of Escape here:
http://lxr.mozilla.org/seamonkey/source/layout/html/content/src/nsHTMLUtils.cpp#
150
We are converting all questionable characters in the string to their escaped
values. Probably what we should do is construct the appropriate kind of url for
the protocol in question (here, javascript:), and let the protocol's url parser
deal with the escaping (if necessary). For javascript: we'll want to escape the
things inside the quotes (I think), but not the entire url string. For "standard
URLs" (most of the other protocols) we'll escape according to their rules.
Comment 12•24 years ago
|
||
Not working on any platform:
2000-10-05-09-MN6 : Windows
2000-10-05-13-MN6 : Mac
2000-10-05-09-MN6 : Linux
Changing platform/os to all/all
Nominating for rtm
Comment 13•24 years ago
|
||
Warren, is there a trivial fix for this? This bug has languished almost a month
in the rtm nomination state. If we don't need this for rtm, please mark it rtm-.
Whiteboard: [need info]
Comment 14•24 years ago
|
||
rtm-, no activity. please update the bug if you're actually working on it.
Whiteboard: [need info] → [rtm-]
Comment 15•24 years ago
|
||
Even a classic non breakable space html entity gives same wrong behavior.
Try this (attached next) :
<html><head><title>non breakable space</title></head><body>
<a href='javascript:alert("Hello World!")'>Hello World!</a><br>
or<br>
<a href="javascript:alert('Hello World!')">Hello World!</a>
</body></html>
Comment 16•24 years ago
|
||
Comment 18•24 years ago
|
||
*** Bug 75542 has been marked as a duplicate of this bug. ***
The escapes are part of the URL syntax, so it should be just as valid if the
page contained the escaped characters. Isn't the problem really that
something needs to unescape them?
Comment 20•24 years ago
|
||
Reassigning to component's default owner. Warren, hope you don't mind.
Assignee: warren → harishd
Comment 21•24 years ago
|
||
The problem is either in the JS engine or in layout.
Reassigning bug to waterson. ccing attinasi & rogerl
Assignee: harishd → waterson
Comment 22•24 years ago
|
||
*** Bug 77043 has been marked as a duplicate of this bug. ***
Comment 23•24 years ago
|
||
*** Bug 60576 has been marked as a duplicate of this bug. ***
Comment 24•24 years ago
|
||
Comment 25•24 years ago
|
||
*** Bug 55468 has been marked as a duplicate of this bug. ***
Comment 26•24 years ago
|
||
*** Bug 63314 has been marked as a duplicate of this bug. ***
Comment 27•24 years ago
|
||
*** Bug 66055 has been marked as a duplicate of this bug. ***
Comment 28•24 years ago
|
||
cc jst@netscape.com,ftang@netscape.com since I'm copying their comments from
duplicate bugs. (pschwartau@netscape.com and jag@tty.nl are already cc'ed
here.)
From bug 55468:
------- Additional Comments From Johnny Stenback 2000-11-21 02:01 -------
Mozilla's javascript: URL handling is a mess, we escape the non-ASCII characters
in their original charset but we never unescape them - also, the escaped URL has
no charset info in it so even if we did unescape the JS URL before executing it
there's no way to know what charset the URL is encoded as... oh my
From bug 66055:
------- Additional Comments From Frank Tang 2001-01-30 11:32 -------
We probably don't care if this only a issue in alert. But I think the alert here
is just use as a simplified test case. The real problem is these JavaScript is
inside the HREF. We escape them since other URL need to be escaped but we didn't
consider the case for JavaScript. What we should do is find out the code between
the layout engine and the javaScript code and unescape it back by using the
document charset.
I think this bug is in the border line of fixing or not fixing. Let's keep this
as future untill we find breakage in the real world case.
From bug 63314:
------- Additional Comments From Peter ``jag'' Annema 2000-12-19 12:29 -------
Actually, this should be escaped (it's supposed to be an URI[0]), so I guess it
should be unescaped before handing it to the JS engine. I strongly suspect this
to be a dupe of an existing bug.
[0] http://www.w3.org/TR/html4/struct/links.html#adef-href
------- Additional Comments From Phil Schwartau 2001-04-26 11:06 -------
This WW3 link may be of interest:
http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars
Comment 29•24 years ago
|
||
cc jst@netscape.com,ftang@netscape.com since I just copied their comments from
duplicate bugs.
Comment 30•24 years ago
|
||
My attached testcase doesn't cover:
- %hh in javascript: URLs that contain non-ascii characters
- %hh in javascript: URLs that do not contain non-ascii characters
- unescaping with the correct charset (?) (from bug 55468)
Updated•24 years ago
|
Keywords: mozilla1.0
Comment 31•24 years ago
|
||
*** Bug 69043 has been marked as a duplicate of this bug. ***
Comment 32•24 years ago
|
||
*** Bug 79602 has been marked as a duplicate of this bug. ***
Comment 34•23 years ago
|
||
The problem seems to be around line: 150 in nsHTMLUtils.cpp ( what rickg had
pointed out ).
http://lxr.mozilla.org/seamonkey/source/content/html/content/src/nsHTMLUtils.cpp
#150
Back to waterson :-)
Assignee: harishd → waterson
Assignee | ||
Comment 37•23 years ago
|
||
I tried additional cases.
<a href='javascript:alert("Hello W\u00F6rld!")'>Hello Wörld!</a><br>
this works.
<a href='javascript:alert("Hello W%F6rld!")'>Hello Wörld!</a><br>
this works but does not unescape "%F6".
Not sure which spec the string should follow, URI, HTML, JS or all?
Is there a spec for the case, href=javascript?
Only \uxxxx seems to be working now.
Comment 38•23 years ago
|
||
nhotta: try this
<a href="Hello Wörld">Hello Wörld</a>
Assignee | ||
Comment 39•23 years ago
|
||
I tried that (it was in the original report) and it did not work.
I assume "ö" is decoded to unicode and if that is escaped to "%F6", it
still should works even if that is not unescaped (as my second example).
I am going to look at the actual value in nsHTMLUtils.cpp.
But I would like to know what is really allowed in that string (%xx, \uxxxx,
&#xxx;).
Assignee | ||
Comment 40•23 years ago
|
||
I looked at the value, that was "%22Hello%20W%F6rld!%22" as mentioned in the
original report.
So it has to be unescaped before passed to JavaScript. The problem is when
unescape and convert to unicode, you don't know about the charset.
There is a proposal of adding a hint_charset field to nsIURI. It also proposes
to use UTF-8, so the current conversion and escape code may not be needed, you
just need to set the hint_charset and call NS_ConvertUCS2toUTF8.
It was posted on 5/11/2001 to n.p.m.netlib, cc to people from that post. Is
there a bug for that proposal?
Assignee | ||
Comment 41•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla0.9.2
Assignee | ||
Comment 42•23 years ago
|
||
Filed bug 84032 for adding "hint_charset" to nsIURI.
Assignee | ||
Comment 43•23 years ago
|
||
Asked jst and rpotts for reviews.
Comment 44•23 years ago
|
||
Should charset conversion apply to all new protocols by default? CF bug 81717
(converting \ to / in links should happen for only http,https,ftp rather than
everything except javascript).
Comment 45•23 years ago
|
||
I had a look at the attached patch and there's a bunch of issues with this patch:
In nsGenericHTMLElement.cpp:
nsAutoString href;
- href.AssignWithConversion(hrefCStr);
+ href = NS_ConvertUTF8toUCS2(hrefCStr);
If you'd make this:
- nsAutoString href;
- href.AssignWithConversion(hrefCStr);
+ NS_ConvertUTF8toUCS2 href(hrefCStr);
you'd save a string copy, no?
The changes in nsHTMLUtils.cpp seem like major overkill to me, we convert a
unicode string into a utf8 cstring (and this conversion is very likely to end up
doing an malloc since URL's are often longer than what fits in nsAutoString's
internal buffer) and then create a new URI object and parse the whole URI string
only to find out if the URI string starts with "javascript:"? Wouldn't simple
string manipulation be more efficient (and less code) here?
Oh, is the attached patch a cvs diff -w? If not you better fix the indentation
in nsHTMLUtils.cpp:
+ if (bIsJavascript) {
+ spec = utf8Spec;
+ }
+ else {
// Otherwise, we'll need to use aDocument to cough up a character
// set converter, and re-encode the relative portion of the URL as
...
^
|____ indent!
nsInputStreamChannel.cpp, can this get more inefficient? :-)
nsString name;
- name.AppendWithConversion(urlStr);
+ name.Append(NS_ConvertUTF8toUCS2(urlStr));
*result = name.ToNewUnicode();
How about:
- nsString name;
- name.AppendWithConversion(urlStr);
- *result = name.ToNewUnicode();
+ *result = ToNewUnicode(NS_ConvertUTF8toUCS2(urlStr));
in nsSimpleURI::SetSpec():
nsAutoString spec;
- spec.AssignWithConversion(aSpec);
+ spec = NS_ConvertUTF8toUCS2(aSpec);
Any reason to not make 'spec' be of type NS_ConvertUTF8toUCS2 in stead of being
of type nsAutoString?
nsJSProtocolHandler.cpp:
nsAutoString scriptString;
- scriptString.AssignWithConversion(script);
+ scriptString.Assign(NS_ConvertUTF8toUCS2(script));
rv = scriptContext->EvaluateString(scriptString,
Why not:
- nsAutoString scriptString;
- scriptString.AssignWithConversion(script);
+ NS_ConvertUTF8toUCS2 scriptString(script);
rv = scriptContext->EvaluateString(scriptString,
Fix those issues and I'll have one more look...
Assignee | ||
Comment 46•23 years ago
|
||
Jesse, I am not clear about your question. Charset conversion does not convert \
to /, that unrelated.
Assignee | ||
Comment 47•23 years ago
|
||
Assignee | ||
Comment 48•23 years ago
|
||
Comment 49•23 years ago
|
||
Much better! One more thing, in nsHTMLUtils.cpp:
+ PRInt32 pos = aSpec.Find(":");
+ nsAutoString scheme;
+ if ((pos != -1) &&
+ (aSpec.Left(scheme, pos) != -1) &&
+ scheme.EqualsIgnoreCase("javascript")) {
+ spec.Assign(NS_ConvertUCS2toUTF8(aSpec.GetUnicode()));
+ }
This code will end up doing aSpec.Left() and scheme.EqualsIgnoreCase() for every
URL when we could limit the cases to where the ':' is found at the right place,
i.e. if ((pos == nsCRT::strlen("javascript")) && ...), or something like that.
Oh, and use aSpec.FindChar(':') when looking for a single character.
Make those changes and you'll have sr=jst
Assignee | ||
Comment 50•23 years ago
|
||
That part only executed if the URI contains non ASCII characters. Anyway, faster
is better, I will update the patch.
Assignee | ||
Comment 51•23 years ago
|
||
Comment 52•23 years ago
|
||
I don't see a reason for the pos != -1 check here, if you're worried about the
performance hit from nsCRT::strlen("javascript") in the cases where a ':' isn't
found in the URI then you could make the check be simply:
if (pos == 10 /* strlen("javascript") */ &&
...
which would be even faster and even less code, and you could actually optimize
this even further by never searching for the ':' character, in stead you could
simply check if the 10th character is a ':' (after making sure the string is
longer than 10 characters) and then do the check if the data before the ':' was
"javascript" or something else, but since this is not performance critical or
anything, I'm ok with this as is, except for the "pos != -1" check. sr=jst if
you remove that check...
Assignee | ||
Comment 53•23 years ago
|
||
Comment 54•23 years ago
|
||
sr on the nsSimpleURI and nsInputStreamChannel string changes. I wonder if
there are other places in necko that we should be doing this?.
Assignee | ||
Comment 55•23 years ago
|
||
Under mozilla/netwerk, there are
59 of ToNewCstring,
43 of AssignWithConversion,
11 of NS_ConvertASCIItoUCS2
Changing them to ToNewUTF8String, NS_ConvertUTF8toUCS2 would not break ASCII
(UTF-8 is a superset of ASCII), though I am not sure about performance difference.
Comment 57•23 years ago
|
||
The current patch makes it so javascript: is the only protocol where the URL
isn't subject to charset conversion. Is this what we want to do, or do we want
to list the protocols that *do* get converted? (I'm bringing this up because in
bug 81717, another URL conversion, converting \ to /, is being applied to
everything except javascript: and causing data: URLs to break.)
Comment 58•23 years ago
|
||
>I wonder if
there are other places in necko that we should be doing this?.
Let's sperate that into different bug. Let's deal with known issue for now and
unknown issue later.
Whiteboard: r/sr=jst+dougt, need a=
Assignee | ||
Comment 59•23 years ago
|
||
Jesse, I think it's fine to add other protocols if they also want UTF-8 string.
If there is a specific example, please file a separate bug. Note that protocol
handlers may also need change in order to accept UTF-8 URI.
Assignee | ||
Comment 60•23 years ago
|
||
Testing information:
I ran http://www.mozilla.org/quality/precheckin-tests.html which include http
and ftp. Tested links in 4.01 spec http://www.w3.org/TR/html4/.
Also tried non ASCII cases http://www.fcenter.ru/wn/hn02082000.htm to make sure
non javascript case is not affected.
Assignee | ||
Comment 61•23 years ago
|
||
Filed separately as bug 85310 about the remaining ToNewCString in necko.
Comment 62•23 years ago
|
||
pdt+ base on 6/11 pdt meeting.
Updated•23 years ago
|
Whiteboard: r/sr=jst+dougt, need a= → [PDT+]r/sr=jst+dougt, need a=
Comment 63•23 years ago
|
||
I review the 06/08/01 10:02 patch, I feel the change is right but have high risk
except the nsHTMLUtils.cpp part. I afraid this may cause some unknown regression
since some changs is in the low leverl code.
In this late cycle, I suggest we hold this patch and ask naoki to work on a work
around changes which only in nsHTMLUtils.cpp. In the case of JavaScript protocol,
instead of convert from Unicode to UTF8, we could convert to the javascript
escape \u for non ASCII data. In this way, we can gunarantee it only impact
JavaScript protocol but not other code.
We should definitely consider the rest of the 06/08/01 10:02 patch after we
branch off from moz.0.9.3. It is safer to test that kind of change in the early
development cycle.
Comment 64•23 years ago
|
||
Adding highrisk and patch keywords based on ftang's comments.
Assignee | ||
Comment 65•23 years ago
|
||
Assignee | ||
Comment 66•23 years ago
|
||
Frank reviewed the patch. His comments,
* use nsCRT::IsAscii
* check ECMA script spec for \uxxxx format
Assignee | ||
Comment 67•23 years ago
|
||
From the ECMA script spec,
http://www.ecma.ch/ecma1/stand/ecma-262.htm
> UnicodeEscapeSequence :: See 7.6
> \u HexDigit HexDigit HexDigit HexDigit
So it looks like it expects four digits although I didn't find a place which
says it had to be four digits.
I found that the current mozilla does not accept non four digits like \uF6, it
has to be \u00F6.
Assignee | ||
Comment 68•23 years ago
|
||
Comment 69•23 years ago
|
||
r=ftang
Comment 70•23 years ago
|
||
I would've put the \u as part of the format string to PR_snprintf() (and made
sure there's room for that too in buf) in stead of first appending \u to the
string and then doing the printf and then appending the result of printf to the
string, that would've been less code, but both ways work.
sr=jst
Comment 71•23 years ago
|
||
+ nsAutoString scheme;
+ if ((pos == (PRInt32) nsCRT::strlen("javascript")) &&
+ (aSpec.Left(scheme, pos) != -1) &&
+ scheme.EqualsIgnoreCase("javascript")) {
FYI, here's a way to consolidate the string constant and avoid the strlen of
that constant, without wiring in 10 (although 10 is well-known to be the length
of "javascript"!):
+ static const char kJavaScript[] = "javascript";
+ nsAutoString scheme;
+ if ((pos == (PRInt32)(sizeof kJavaScript - 1)) &&
+ (aSpec.Left(scheme, pos) != -1) &&
+ scheme.EqualsIgnoreCase(kJavaScript")) {
I agree with what jst said. Instead of:
+ char buf[5];
+ spec.Truncate(0);
+ for (const PRUnichar* uch = aSpec.GetUnicode(); *uch; ++uch) {
+ if (!nsCRT::IsAscii(*uch)) {
+ spec.Append("\\u");
+ PR_snprintf(buf, sizeof(buf), "%.4x", *uch);
+ spec.Append(buf);
Why not
+ char buf[6+1]; // space for \uXXXX plus a NUL at the end
+ spec.Truncate(0);
+ for (const PRUnichar* uch = aSpec.GetUnicode(); *uch; ++uch) {
+ if (!nsCRT::IsAscii(*uch)) {
+ PR_snprintf(buf, sizeof(buf), "\\u%.4x", *uch);
+ spec.Append(buf);
/be
Assignee | ||
Comment 72•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Whiteboard: [PDT+]r/sr=jst+dougt, need a= → [PDT+]need sr=
Comment 73•23 years ago
|
||
*** Bug 86092 has been marked as a duplicate of this bug. ***
Comment 74•23 years ago
|
||
sr=jst
Assignee | ||
Updated•23 years ago
|
Whiteboard: [PDT+]need sr= → [PDT+]need a=
Comment 75•23 years ago
|
||
a= asa@mozilla.org for checkin to the trunk.
(on behalf of drivers)
Updated•23 years ago
|
Whiteboard: [PDT+]need a= → [PDT+]wait for tree open to check in
Assignee | ||
Comment 76•23 years ago
|
||
checked in
Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 23 years ago
Resolution: --- → FIXED
Comment 77•23 years ago
|
||
I checked now. It work! Thanx! from Japan
Comment 78•23 years ago
|
||
verified win2k 2001061905
macos 2001061908
linux 2001061914
Status: RESOLVED → VERIFIED
Comment 79•23 years ago
|
||
*** Bug 88220 has been marked as a duplicate of this bug. ***
You need to log in
before you can comment on or make changes to this bug.
Description
•