153325 - javascript href broken when a variable contains percent escaped umlaut

Reporter

Description

22 years ago

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.1a) Gecko/20020610
BuildID:    2002061108

A javascript: href is broken when a function argument contains a percent-escaped
umlaut (e.g. %c4).

Reproducible: Always
Steps to Reproduce:
1. Use something like <a href="javascript:alert('%c4')">js link</a>
2. click the link

Actual Results:  nothing happens

Expected Results:  An alert box opens (showing %c4, Ä, or whatever the string
resolves to).

Stefan A. Möller

Reporter

Comment 1

22 years ago

Attached file testcase (obsolete)

Jesse Ruderman

Comment 2

22 years ago

Confirmed on build 2002061908, Win2k. Sending to Parser since that's where the
similar bug 51355 was.

Assignee: rogerl → harishd

Component: JavaScript Engine → Parser

OS: Linux → All

QA Contact: pschwartau → moied

Hardware: PC → All

Phil Schwartau

Updated

22 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

harishd

Comment 3

22 years ago

Parser does not process attribute values.

I get an assertion ("not a UTF8 string") when I click on the problem link.

ConvertUTF8toUCS2::write(const char * 0x0012f5f8, unsigned int 10) line 660 + 20
bytes
nsCharSinkTraits<ConvertUTF8toUCS2>::write(ConvertUTF8toUCS2 & {...}, const char
* 0x0012f5f8, unsigned int 10) line 571
copy_string(nsReadingIterator<char> & {"????????????????"}, const
nsReadingIterator<char> & {"???????????"}, ConvertUTF8toUCS2 & {...}) line
90 + 39 bytes
NS_ConvertUTF8toUCS2::Init(const nsACString & {...}) line 1350 + 35 bytes
NS_ConvertUTF8toUCS2::NS_ConvertUTF8toUCS2(const nsACString & {...}) line 558
nsJSThunk::EvaluateScript() line 284
nsJSChannel::AsyncOpen(nsJSChannel * const 0x03b6e458, nsIStreamListener *
0x03990b98, nsISupports * 0x00000000) line 619 + 11 bytes
nsDocumentOpenInfo::Open(nsIChannel * 0x03b6e458, int 1, nsISupports *
0x03a21270) line 170 + 18 bytes
nsURILoader::OpenURIVia(nsURILoader * const 0x01a4c208, nsIChannel * 0x03b6e458,
int 1, nsISupports * 0x03a21270, unsigned int 0) line 538 + 20 bytes
nsURILoader::OpenURI(nsURILoader * const 0x01a4c208, nsIChannel * 0x03b6e458,
int 1, nsISupports * 0x03a21270) line 500
nsDocShell::DoChannelLoad(nsIChannel * 0x03b6e458, nsIURILoader * 0x01a4c208)
line 5184 + 39 bytes
nsDocShell::DoURILoad(nsIURI * 0x03cb17c8, nsIURI * 0x03c38e48, nsISupports *
0x031c5ce8, nsIInputStream * 0x00000000, nsIInputStream * 0x00000000, int 1,
nsIDocShell * * 0x00000000, nsIRequest * * 0x00000000) line 4959 + 38 bytes
nsDocShell::InternalLoad(nsDocShell * const 0x03a21270, nsIURI * 0x03cb17c8,
nsIURI * 0x03c38e48, nsISupports * 0x00000000, int 1, const unsigned short *
0x0012fc0c, nsIInputStream * 0x00000000, nsIInputStream * 0x00000000, unsigned
int 2097153, nsISHEntry * 0x00000000, int 1, nsIDocShell * * 0x00000000,
nsIRequest * * 0x00000000) line 4752 + 51 bytes
nsWebShell::OnLinkClickSync(nsWebShell * const 0x03a213b4, nsIContent *
0x034ce0a0, nsLinkVerb eLinkVerb_Replace, const unsigned short * 0x03d1b9a8,
const unsigned short * 0x100e5b00 gCommonEmptyBuffer, nsIInputStream *
0x00000000, nsIInputStream * 0x00000000, nsIDocShell * * 0x00000000, nsIRequest
* * 0x00000000) line 619 + 91 bytes
OnLinkClickEvent::HandleEvent() line 462
HandlePLEvent(OnLinkClickEvent * 0x03c8ea80) line 476
PL_HandleEvent(PLEvent * 0x03c8ea80) line 596 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x01497b18) line 526 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x000702f8, unsigned int 49272, unsigned int 0,
long 21592856) line 1077 + 9 bytes

Assignee: harishd → rogerl

Component: Parser → JavaScript Engine

QA Contact: moied → pschwartau

Phil Schwartau

Comment 4

22 years ago

Stefan: nice testcase! Reassigning to DOM Level 0.

Stefan's testcase shows that certain %XX sequences work, but 
that others don't. That reminds me of bug 144429, 
"URL encoding in window.open regressed, URL parsing problem?"


------- Additional Comment #11 From Henrik Rundqvist 2002-05-16 05:11 -------
Another interesting thing about all this is why "%7E" (tilde)
slips through but "%E4" (a Swedish character) does not.

Study the attached testcase and you'll see what I mean.
Response if you click on "window.open":

  "The requested URL /~Gle was not found on this server."

Notice that the tilde is there. So only some %XX codes get affected?


------- Additional Comment #12 From Johnny Stenback 2002-05-16 09:18 -------
The reason the tilde goes through but Scandinavian characters do not is that
we assume the string is a UTF-8 string once we've unescaped it. We try to
convert that UTF-8 string to a Unicode string, and that can't be done for
non-ASCII characters that are not UTF-8 encoded, such as %E4.
It's a bug in nsJSProtocolHandler.cpp...
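The behaviour Johnny describes can be reproduced in any modern JS engine with the standard `decodeURIComponent`, which likewise percent-decodes to bytes and then requires those bytes to form valid UTF-8. This is only an analogy and assumes nothing about the actual nsJSProtocolHandler.cpp code; `tryDecode` is a hypothetical wrapper.

```javascript
// Hypothetical wrapper: percent-decode as UTF-8, returning null when the
// unescaped bytes are not valid UTF-8 (decodeURIComponent throws URIError then).
function tryDecode(s) {
  try {
    return decodeURIComponent(s);
  } catch (e) {
    if (e instanceof URIError) return null;
    throw e;
  }
}

console.log(tryDecode('%7E'));    // prints ~    (ASCII: one byte, valid UTF-8)
console.log(tryDecode('%E4'));    // prints null (a lone 0xE4 is not valid UTF-8)
console.log(tryDecode('%C3%84')); // prints Ä    (the UTF-8 encoding of U+00C4)
```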

Assignee: rogerl → jst

Component: JavaScript Engine → DOM Level 0

QA Contact: pschwartau → desale

Johnny Stenback (:jst)

Comment 5

21 years ago

Mass-reassigning bugs to dom_bugs@netscape.com

Assignee: jst → dom_bugs

Jungshik Shin

Comment 6

21 years ago

The fix for bug 44272 made testcase #4 give me an A-umlaut in the alert box if the
character encoding is ISO-8859-1/15. But there's something more in this bug, so
I'm keeping it open. It seems wrong to interpret '%C4' as U+00C4 in JS
string literal. Shouldn't it be considered literal '%C4' (three characters). I
have to look up the ECMAScript standard.

BTW, if we want to be purists, this bug would be 'WONT FIX'. 'javascript:'
url-scheme is not allowed in href.

Keywords: intl

Brendan Eich [:brendan]

Comment 7

21 years ago

jshin: what's the minimal testcase showing

> It seems wrong to interpret '%C4' as U+00C4 in JS
> string literal. Shouldn't it be considered literal '%C4' (three characters).

?

/be

timeless

Comment 8

21 years ago

note that javascript: is supported by many browsers including IE (which doesn't
support data:). wontfixing just because the scheme isn't standardized isn't
acceptable.

Brendan Eich [:brendan]

Comment 9

21 years ago

timeless: yes, javascript: is a de-facto standard we must and will support.  But
who suggested otherwise?

/be

timeless

Comment 10

21 years ago

jshin, tail of comment 6, presumably not particularly seriously.

Jungshik Shin

Comment 11

21 years ago

Of course, I was not serious. Note that it's qualified by 'if we .... a purist'.
Anyway, it's a little bit surprising that you're the first to pick it up and
write that 'wontfix' is not acceptable. I thought you'd be more likely to be on
the other side :-p. Also note that this bug was 'almost' fixed thanks to the fix
for bug 44272.

re: comment #6

brendan, what is the following JS code supposed to produce?

document.write('%C6')? 

Three characters, <U+0025 U+0043 U+0036> or a single character <U+00C6>? That
was my question. 

The testcase presents an interesting problem if the answer to the above is the
former because 'JS string literal' is used in a URL.

<a href="javascript:alert('%B0%A1')">Alert</a> 

What should show up inside the alert box if the above is in
non-ISO-8859-1/non-ISO-8859-15 page? Currently, the URL-unescaping is done
before the JS part is handed over to the JS engine. So the result depends on
the current character encoding. How about these?

<a href="javascript:alert(unescape('%B0%A1'))">Alert</a>

Compare it with the following, the result of running which is charset-independent.

<script type="text/javascript">
document.write('%B0%A1');
document.write(unescape('%B0%A1'));
</script>
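Both halves of jshin's question can be checked in a plain JS shell, outside any page, so no URL unescaping or document charset is involved. This is a sketch; `unescape` is the legacy global function from ECMA-262 Annex B.

```javascript
// A string literal gives '%' no special meaning: '%C6' is three code units.
const literal = '%C6';
console.log(literal.length); // prints 3
console.log(Array.from(literal, c => c.charCodeAt(0).toString(16)).join(' ')); // prints 25 43 36

// unescape() maps each %XX to the code unit U+00XX, whatever the page charset,
// so '%B0%A1' always becomes U+00B0 U+00A1.
const unescaped = unescape('%B0%A1');
console.log(unescaped.length); // prints 2
console.log(unescaped === '\u00B0\u00A1'); // prints true
```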

timeless

Comment 12

21 years ago

i'm rarely a standards purist. what i care about are things i use. and i use
data:, javascript:, view-source, and about: urls heavily.

Brendan Eich [:brendan]

Comment 13

21 years ago

> document.write('%C6')? 
> 
> Three characters, <U+0025 U+0043 U+0036> or a single character <U+00C6>? That
> was my question. 

If that document.write is in a .js file, then the answer is obvious: three
characters: '%C6' (or the U+0025 U+0043 U+0036 sequence, if you prefer that
spelling).  JS string literals are well-specified by ECMA-262 and there is
nothing about %-escaping in that spec.

If the string literal is embedded in an href= attribute value, then the JS
engine may not see the verbatim source string -- there may be a layer of
interpretation when the attribute is parsed, and one when the href url is loaded
(when the link is clicked).  The attribute value interpretation should handle
&amp; and other such entities, but not mess with %, right?

But the link url loading step will unescape (in nsJSProtocolHandler.cpp),
because it expects that the url was escaped (by whom?  by the page author?). 
Why is that unescaping dependent on the document charset?

/be
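The layers Brendan lists can be modelled with a small sketch. Every function here is hypothetical (not Gecko code), entity decoding covers only a handful of references, and only two charsets are modelled, with ISO-8859-1 treated as a plain byte-to-U+00XX mapping. It shows why the same href can yield different scripts depending on the document charset.

```javascript
// Layer 1 (attribute parsing): decode a few character references; '%' is untouched.
function decodeEntities(s) {
  return s
    .replace(/&#(\d+);/g, (_, n) => String.fromCodePoint(Number(n)))
    .replace(/&lt;/g, '<')
    .replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"')
    .replace(/&amp;/g, '&'); // last, so "&amp;lt;" becomes "&lt;", not "<"
}

// Layer 2 (loading the javascript: URL): turn %XX escapes into raw octets.
function percentDecodeToBytes(s) {
  const bytes = [];
  for (let i = 0; i < s.length; i++) {
    const hex = s.slice(i + 1, i + 3);
    if (s[i] === '%' && /^[0-9A-Fa-f]{2}$/.test(hex)) {
      bytes.push(parseInt(hex, 16));
      i += 2;
    } else {
      bytes.push(s.charCodeAt(i) & 0xff); // sketch assumes the rest is ASCII
    }
  }
  return bytes;
}

// Layer 3: interpret the octets in the document charset.
function decodeBytes(bytes, charset) {
  if (charset === 'iso-8859-1') return String.fromCharCode(...bytes);
  if (charset === 'utf-8') {
    // fatal: true throws on invalid UTF-8 instead of inserting U+FFFD
    return new TextDecoder('utf-8', { fatal: true }).decode(Uint8Array.from(bytes));
  }
  throw new Error(`charset not modelled in this sketch: ${charset}`);
}

function hrefToScript(attrValue, charset) {
  const url = decodeEntities(attrValue);
  const scheme = 'javascript:';
  if (!url.toLowerCase().startsWith(scheme)) throw new Error('not a javascript: URL');
  return decodeBytes(percentDecodeToBytes(url.slice(scheme.length)), charset);
}

console.log(hrefToScript("javascript:alert('a &amp; b')", 'utf-8')); // prints alert('a & b')
console.log(hrefToScript("javascript:alert('%C4')", 'iso-8859-1'));  // prints alert('Ä')
// hrefToScript("javascript:alert('%C4')", 'utf-8') throws: 0xC4 alone is not UTF-8
```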

Jungshik Shin

Comment 14

21 years ago

Thanks for bearing with my 'laziness' and confirming that '%XX' is not subject to
any special processing. 

> the link url loading step will unescape (in nsJSProtocolHandler.cpp),
> because it expects that the url was escaped (by whom?  by the page author?). 

This is the conflict between JS's notion of 'escaping' and URL escaping.
Why is it escaped? Because in a URL, every non-ASCII character (plus some
ASCII characters) has to be URL-escaped, and some authors apparently escape
non-ASCII characters in javascript: URLs. Unfortunately, URL escaping was never
well-defined as to which charset the byte sequence (once unescaped) is in.
Most of the time, it has to be interpreted in the document charset. That is,
which file 'http://www.example.com/%b0%91.png' refers to depends on the
document charset, although an increasing number of sites (still very few)
have begun to support URL-escaped UTF-8 URLs.
 
http://lxr.mozilla.org/seamonkey/source/dom/src/jsurl/nsJSProtocolHandler.cpp#195
http://lxr.mozilla.org/seamonkey/source/dom/src/jsurl/nsJSProtocolHandler.cpp#717
http://lxr.mozilla.org/seamonkey/source/dom/src/jsurl/nsJSProtocolHandler.cpp#774

> Why is that unescaping dependent on the document charset?

See above. This bug is kind of a tech-evangelism bug: '\uHHHH' notation should be
used in javascript: URLs where necessary.
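A sketch of jshin's suggestion, assuming the only transformations applied to a javascript: URL are %XX unescaping and charset decoding. The hypothetical helper `toAsciiJsLiteral` writes every non-ASCII character, plus '%', quotes and HTML-special characters, as '\uHHHH' escapes. The resulting literal is pure ASCII and contains no '%', so neither step can change it, and the JS engine expands the escapes itself.

```javascript
// Hypothetical helper: render a string as a single-quoted JS literal that is
// pure ASCII and contains no '%', so URL unescaping and the document charset
// cannot alter it. Iterates UTF-16 code units, so astral characters become
// surrogate-pair escapes, which JS string literals accept.
function toAsciiJsLiteral(str) {
  const unsafe = "'\\%\"<>&"; // would break the literal, the URL, or the HTML attribute
  let out = "'";
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    const ch = str[i];
    const printableAscii = code >= 0x20 && code < 0x7f;
    out += printableAscii && !unsafe.includes(ch)
      ? ch
      : '\\u' + code.toString(16).toUpperCase().padStart(4, '0');
  }
  return out + "'";
}

console.log(`javascript:alert(${toAsciiJsLiteral('\u00C4')})`); // prints javascript:alert('\u00C4')
console.log(toAsciiJsLiteral('%C4')); // prints '\u0025C4', which the JS engine reads as %C4
```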

timeless

Updated

21 years ago

Summary: javascript href broken when a variable is containing escaped umlaut → javascript href broken when a variable contains percent escaped umlaut

Wayne Mery (:wsmwk)

Comment 15

18 years ago

Stefan, does this now work for you?
WFM last item in testcase, FF & SM

Stefan A. Möller

Reporter

Comment 16

18 years ago

Yes, WFM now.
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8.0.7) Gecko/20060910 SeaMonkey/1.0.5

Wayne Mery (:wsmwk)

Comment 17

18 years ago

->WFM then

Status: NEW → RESOLVED

Closed: 18 years ago

Resolution: --- → WORKSFORME

Stefan A. Möller

Reporter

Comment 18

17 years ago

This bug is still present, if the charset is UTF-8. See new test case.

Status: RESOLVED → REOPENED

Resolution: WORKSFORME → ---

Stefan A. Möller

Reporter

Comment 19

17 years ago

Attached file UTF-8 test case (obsolete)

Stefan A. Möller

Reporter

Updated

17 years ago

Attachment #88625 - Attachment is obsolete: true

Roman Dawydkin

Comment 20

17 years ago

Do things right!
\u00C4 (Latin capital letter A with diaeresis) is a Unicode code point; in a URL it must be encoded in UTF-8, which gives exactly two bytes: { 0xC3, 0x84 }.
<a href="javascript:alert('%C3%84')"> - will work as expected.
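Roman's byte values can be confirmed with standard APIs available in browsers and Node.js: `TextEncoder` always produces UTF-8, and `encodeURIComponent` percent-escapes those same bytes. A sketch; the variable names are illustrative.

```javascript
// U+00C4 (Ä) encoded as UTF-8: two bytes.
const utf8 = new TextEncoder().encode('\u00C4');
console.log(Array.from(utf8, b => b.toString(16).toUpperCase()).join(' ')); // prints C3 84

// encodeURIComponent() percent-escapes the UTF-8 bytes, giving the href form.
const escaped = encodeURIComponent('\u00C4');
console.log(escaped); // prints %C3%84
console.log(`javascript:alert('${escaped}')`); // prints javascript:alert('%C3%84')
```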

Simon Montagu :smontagu

Comment 21

17 years ago

Roman is right. In a UTF-8 document <a href="javascript:alert('%C4')"> is just malformed.

Status: REOPENED → RESOLVED

Closed: 18 years ago → 17 years ago

Resolution: --- → WORKSFORME

Stefan A. Möller

Reporter

Comment 22

17 years ago

Why is it malformed? '%C4' is a valid string expression, isn't it? I have no interest in the particular letter it may or may not represent. It is just any string. And if I call alert(), or any other function, with a valid string as its parameter, that function should be executed. That's the problem: I see no alert!

By the way: <a href="javascript:alert('%C3%84')"> does not work either.

Status: RESOLVED → REOPENED

Resolution: WORKSFORME → ---

Stefan A. Möller

Reporter

Comment 23

17 years ago

Attached file For the sake of completeness: Test case supplemented by adding '%C3%84'

Attachment #271523 - Attachment is obsolete: true

Simon Montagu :smontagu

Comment 24

17 years ago

I'm not going to get into a closing/reopening war here, but '%C4' is *not* a valid string expression in an href in a UTF-8 document. RFC 2396 defines that it has to be interpreted as the octet 0xC4, which is invalid UTF-8 on its own.

Also, there is a typo in the last line of attachment 292585: you updated the text but not the actual href attribute.
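Why the single octet 0xC4 is rejected follows from UTF-8's byte structure: 0xC4 (binary 110xxxxx) announces a two-byte sequence and must be followed by a continuation byte (10xxxxxx). The hypothetical checker below sketches this; it deliberately skips the extra second-byte limits that a full validator applies after E0, ED, F0 and F4.

```javascript
// Length of the UTF-8 sequence introduced by a lead byte, or 0 if the byte
// can never start a sequence. (Simplified: the extra second-byte limits after
// E0, ED, F0 and F4 are not checked here.)
function utf8SeqLength(b) {
  if (b <= 0x7f) return 1;              // ASCII, e.g. 0x7E '~'
  if (b >= 0xc2 && b <= 0xdf) return 2; // e.g. 0xC3, 0xC4
  if (b >= 0xe0 && b <= 0xef) return 3; // e.g. 0xE4
  if (b >= 0xf0 && b <= 0xf4) return 4;
  return 0;                             // continuation byte or never-valid byte
}

// Hypothetical simplified validator built on utf8SeqLength().
function looksLikeUtf8(bytes) {
  for (let i = 0; i < bytes.length; ) {
    const n = utf8SeqLength(bytes[i]);
    if (n === 0 || i + n > bytes.length) return false;
    for (let k = 1; k < n; k++) {
      if ((bytes[i + k] & 0xc0) !== 0x80) return false; // must be 10xxxxxx
    }
    i += n;
  }
  return true;
}

console.log(looksLikeUtf8([0xc4]));       // prints false: lead byte with no continuation
console.log(looksLikeUtf8([0xc3, 0x84])); // prints true: U+00C4
console.log(looksLikeUtf8([0x7e]));       // prints true: '~'
```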

Stefan A. Möller

Reporter

Comment 25

17 years ago

Oops. Sorry. My conclusion was based on that testcase. Indeed, it does not work because of my typo. You're right. So I'm convinced. Resolving as WORKSFORME, again.

Status: REOPENED → RESOLVED

Closed: 17 years ago → 17 years ago

Resolution: --- → WORKSFORME

testcase, 22 years ago, Stefan A. Möller, 614 bytes, text/html
UTF-8 test case, 17 years ago, Stefan A. Möller, 664 bytes, text/html
For the sake of completeness: Test case supplemented by adding '%C3%84', 17 years ago, Stefan A. Möller, 752 bytes, text/html