Closed Bug 51355 Opened 24 years ago Closed 23 years ago

moz escapes javascript: urls containing non-ascii chars

Tracking

Status:

VERIFIED FIXED

Milestone:

mozilla0.9.2

People

(Reporter: mikko.rantalainen, Assigned: nhottanscp)

References

Details

(Keywords: highrisk, intl, Whiteboard: [PDT+]wait for tree open to check in)

Attachments

(11 files)

Attaching the reporter's HTML test case here - 24 years ago Phil Schwartau 220 bytes, text/html		Details
nbsp html entity in url 24 years ago Martin Jacobs 229 bytes, text/html		Details
extended testcase 24 years ago Jesse Ruderman 1.13 KB, text/html		Details
Patch, use UTF-8 if the URI scheme is 'javascript', changed files where ASCII is assumed for URI to UTF-8. 23 years ago nhottanscp 4.68 KB, patch		Details \| Diff \| Splinter Review
Patch, updated with jst's comment (diff -uw). 23 years ago nhottanscp 2.27 KB, patch		Details \| Diff \| Splinter Review
Patch, updated with jst's comment (diff -uw), ignore the last patch, I put a wrong one. 23 years ago nhottanscp 4.50 KB, patch		Details \| Diff \| Splinter Review
Patch, updated with the 2nd jst's comment (diff -uw). 23 years ago nhottanscp 4.55 KB, patch		Details \| Diff \| Splinter Review
Patch, removed "pos != -1" check. 23 years ago nhottanscp 4.52 KB, patch		Details \| Diff \| Splinter Review
Patch, localized change to only modify nsHTMLUtils.cpp. 23 years ago nhottanscp 1.65 KB, patch		Details \| Diff \| Splinter Review
Patch, updated with ftang's suggestion. 23 years ago nhottanscp 1.64 KB, patch		Details \| Diff \| Splinter Review
Patch, updated with the suggestions (diff -uw) 23 years ago nhottanscp 1.69 KB, patch		Details \| Diff \| Splinter Review

Mikko Rantalainen

Reporter

Description

•

24 years ago

Any "special" character in an href causes JavaScript to fail because the browser escapes all "special" characters. The following code works in Netscape Communicator 4.73, but the last two links fail under Mozilla (build ID 2000090421, and quite probably other builds too).

<html>
<a href='javascript:alert("Hello World!")'>Hello World!</a>
<a href='javascript:alert("Hello Wörld!")'>Hello Wörld!</a>
<a href='javascript:alert("Hello W&ouml;rld!")'>Hello W&ouml;rld!</a>
</html>

As seen in the status bar, Mozilla translates the second and third href to 'javascript:alert(%22Hello%20W%F6rld!%22)', which doesn't work. I'm not sure whether the second href should work or not, but IMHO the third one should because I have escaped the ö character. (Escaping " as %22 doesn't make any difference either.) IMO the correct solution would be to use the whole string between the " or ' characters with no translation other than *unescaping* before scripting.
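The escaping described above can be reproduced outside the browser. The following is only a sketch in Python, not Mozilla's code: it assumes the escaper works on ISO-8859-1 bytes and leaves ":", "(", ")" and "!" untouched, which is what the status-bar string suggests.

```python
from urllib.parse import quote, unquote

# The second link from the report, as written in the page source.
href = 'javascript:alert("Hello Wörld!")'

# Assumption: the escaper works on ISO-8859-1 bytes and leaves ":", "(", ")"
# and "!" alone, which is what the status-bar string suggests.
escaped = quote(href, safe=":()!", encoding="latin-1")
print(escaped)

# Unescaping with the same charset restores the original script text.
restored = unquote(escaped, encoding="latin-1")
print(restored == href)
```

Under these assumptions the escaped form matches the string quoted from the status bar, and a single unescape step with the same charset gets the original script back.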

Phil Schwartau

Comment 1

•

24 years ago

Attached file Attaching the reporter's HTML test case here - — Details

Phil Schwartau

Comment 2

•

24 years ago

Even in NN4.73, if you type this into the URL bar: javascript:alert("Hello Wörld!"); it doesn't work. There seems to be browser code that escapes the characters if they are part of HTML. Therefore I'm sending this bug over to the Parser component for further triage, as it doesn't seem to be a JS Engine issue.

Assignee: rogerl → rickg

Status: UNCONFIRMED → NEW

Component: Javascript Engine → Parser

Ever confirmed: true

QA Contact: pschwartau → janc

rickg

Comment 3

•

24 years ago

I don't believe we plan to support javascript in attributes. Vidur?

rickg

Comment 4

•

24 years ago

marking wont fix, since I don't think we're supporting js attributes.

Status: NEW → RESOLVED

Closed: 24 years ago

Resolution: --- → WONTFIX

vidur (gone)

Comment 5

•

24 years ago

javascript: in a href is not the same as JavaScript entities. We're not supporting the latter. The former should and does work. The entities and Unicode characters in the attributes in the examples shown should be correctly preserved.

Status: RESOLVED → REOPENED

Resolution: WONTFIX → ---

rickg

Comment 6

•

24 years ago

Vidur's right, my mistake. Note that the parsing engine is passing the href's along unchanged. I suspect the culprit is either the link handling code or the sink. I'll dig a bit more.

Status: REOPENED → ASSIGNED

rickg

Comment 7

•

24 years ago

I've now confirmed that the javascript attributes are being parsed and stored as written. I now suspect that the anchor handling code is converting these before showing them.

rickg

Comment 8

•

24 years ago

Reassigning to trudelle for triage. I think the content model is well formed, and that the problem is in the link handling code.

Assignee: rickg → trudelle

Status: ASSIGNED → NEW

Keywords: nsbeta3

Priority: P3 → P2

rickg

Comment 9

•

24 years ago

Thanks to billlaw, I've tracked this down to nsIOService::Escape(). The valid nsString is being converted to a whacky C-String (hence the %20's). Please take a look.

Assignee: trudelle → warren

Phil Schwartau

Comment 10

•

24 years ago

Note: this may also be the solution to bug 40469 (-?)

Warren Harris

Comment 11

•

24 years ago

The problem specifically is the use of Escape here: http://lxr.mozilla.org/seamonkey/source/layout/html/content/src/nsHTMLUtils.cpp#150 We are converting all questionable characters in the string to their escaped values. Probably what we should do is construct the appropriate kind of url for the protocol in question (here, javascript:), and let the protocol's url parser deal with the escaping (if necessary). For javascript: we'll want to escape the things inside the quotes (I think), but not the entire url string. For "standard URLs" (most of the other protocols) we'll escape according to their rules.
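As a rough illustration of the per-protocol idea, here is a sketch in Python rather than the actual C++ involved. All names (`escape_href`, `ESCAPERS`, `escape_standard`, `escape_javascript`) are hypothetical, and the javascript: handler is simplified to pass the string through untouched instead of escaping only the quoted parts.

```python
from urllib.parse import quote


def escape_standard(url: str) -> str:
    """Percent-escape a 'standard' URL, keeping reserved delimiters and existing escapes."""
    return quote(url, safe=":/?#[]@!$&'()*+,;=%", encoding="utf-8")


def escape_javascript(url: str) -> str:
    """Simplification for this sketch: leave javascript: URLs exactly as written."""
    return url


# Hypothetical registry mapping a scheme to its escaping rule.
ESCAPERS = {"javascript": escape_javascript}


def escape_href(href: str) -> str:
    """Pick the escaping rule from the scheme instead of escaping everything."""
    scheme, sep, _ = href.partition(":")
    if not sep:
        # No scheme present, so fall back to the standard rules.
        return escape_standard(href)
    return ESCAPERS.get(scheme.lower(), escape_standard)(href)


print(escape_href('javascript:alert("Hello Wörld!")'))
print(escape_href("http://example.com/a b"))
```

The point of the dispatch table is that the generic link code no longer needs to know any protocol's escaping rules; whether javascript: should pass through unchanged or escape only parts of the string is left open here, as it is in the comment above.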

Jan Carpenter

Comment 12

•

24 years ago

Not working on any platform:
2000-10-05-09-MN6 : Windows
2000-10-05-13-MN6 : Mac
2000-10-05-09-MN6 : Linux
Changing platform/os to all/all.
Nominating for rtm.

Keywords: rtm

OS: Linux → All

Hardware: PC → All

selmer (gone)

Comment 13

•

24 years ago

Warren, is there a trivial fix for this? This bug has languished almost a month in the rtm nomination state. If we don't need this for rtm, please mark it rtm-.

Whiteboard: [need info]

selmer (gone)

Comment 14

•

24 years ago

rtm-, no activity. please update the bug if you're actually working on it.

Whiteboard: [need info] → [rtm-]

Martin Jacobs

Comment 15

•

24 years ago

Even a classic non-breaking space HTML entity gives the same wrong behavior. Try this (attached next):

<html><head><title>non breakable space</title></head><body>
<a href='javascript:alert("Hello&nbsp;World!")'>Hello&nbsp;World!</a>
or
<a href="javascript:alert('Hello&nbsp;World!')">Hello&nbsp;World!</a>
</body></html>

Martin Jacobs

Comment 16

•

24 years ago

Attached file nbsp html entity in url — Details

bsharma

Comment 17

•

24 years ago

updated qa contact.

QA Contact: janc → bsharma

Keyser Sose

Comment 18

•

24 years ago

*** Bug 75542 has been marked as a duplicate of this bug. ***

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 19

•

24 years ago

The escapes are part of the URL syntax, so it should be just as valid if the page contained the escaped characters. Isn't the problem really that something needs to unescape them?

Peter "jag" Annema

Comment 20

•

24 years ago

Reassigning to component's default owner. Warren, hope you don't mind.

Assignee: warren → harishd

harishd

Comment 21

•

24 years ago

The problem is either in the JS engine or in layout. Reassigning bug to waterson. ccing attinasi & rogerl

Assignee: harishd → waterson

harishd

Comment 22

•

24 years ago

*** Bug 77043 has been marked as a duplicate of this bug. ***

Martin Honnen

Comment 23

•

24 years ago

*** Bug 60576 has been marked as a duplicate of this bug. ***

Jesse Ruderman

Comment 24

•

24 years ago

Attached file extended testcase — Details

Jesse Ruderman

Comment 25

•

24 years ago

*** Bug 55468 has been marked as a duplicate of this bug. ***

Jesse Ruderman

Comment 26

•

24 years ago

*** Bug 63314 has been marked as a duplicate of this bug. ***

Jesse Ruderman

Comment 27

•

24 years ago

*** Bug 66055 has been marked as a duplicate of this bug. ***

Jesse Ruderman

Comment 28

•

24 years ago

cc jst@netscape.com, ftang@netscape.com since I'm copying their comments from duplicate bugs. (pschwartau@netscape.com and jag@tty.nl are already cc'ed here.)

From bug 55468:
------- Additional Comments From Johnny Stenback 2000-11-21 02:01 -------
Mozilla's javascript: URL handling is a mess, we escape the non-ASCII characters in their original charset but we never unescape them - also, the escaped URL has no charset info in it so even if we did unescape the JS URL before executing it there's no way to know what charset the URL is encoded as... oh my

From bug 66055:
------- Additional Comments From Frank Tang 2001-01-30 11:32 -------
We probably don't care if this is only an issue in alert. But I think the alert here is just used as a simplified test case. The real problem is that this JavaScript is inside the HREF. We escape it since other URLs need to be escaped, but we didn't consider the case for JavaScript. What we should do is find the code between the layout engine and the JavaScript code and unescape it back by using the document charset. I think this bug is on the borderline of fixing or not fixing. Let's keep this as future until we find breakage in a real world case.

From bug 63314:
------- Additional Comments From Peter ``jag'' Annema 2000-12-19 12:29 -------
Actually, this should be escaped (it's supposed to be an URI[0]), so I guess it should be unescaped before handing it to the JS engine. I strongly suspect this to be a dupe of an existing bug.
[0] http://www.w3.org/TR/html4/struct/links.html#adef-href

------- Additional Comments From Phil Schwartau 2001-04-26 11:06 -------
This W3C link may be of interest: http://www.w3.org/TR/html4/appendix/notes.html#non-ascii-chars
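The charset problem raised in the copied comments can be shown with a small Python sketch. It is illustrative only, not taken from any patch, and it assumes a page in ISO-8859-1 and a consumer that guesses UTF-8.

```python
from urllib.parse import quote, unquote

# The same character escapes to different byte sequences per charset.
as_latin1 = quote("ö", encoding="latin-1")
as_utf8 = quote("ö", encoding="utf-8")
print(as_latin1, as_utf8)

# The escaped form carries no charset label, so the unescaper has to guess.
wrong_guess = unquote(as_latin1, encoding="utf-8")     # invalid UTF-8 byte
right_guess = unquote(as_latin1, encoding="latin-1")   # the document charset
print(repr(wrong_guess), repr(right_guess))
```

Only the unescape that uses the charset the page was escaped in recovers the character, which is why the suggestion above is to unescape with the document charset before the script reaches the JS engine.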

Keywords: 4xp, intl

Summary: "javascript:code" doesn't work in hrefs properly → moz escapes javascript: urls containing non-ascii chars

Jesse Ruderman

Comment 29

•

24 years ago

cc jst@netscape.com,ftang@netscape.com since I just copied their comments from duplicate bugs.

Jesse Ruderman

Comment 30

•

24 years ago

My attached testcase doesn't cover:
- %hh in javascript: URLs that contain non-ascii characters
- %hh in javascript: URLs that do not contain non-ascii characters
- unescaping with the correct charset (?) (from bug 55468)

Brendan Eich [:brendan]

Updated

•

24 years ago

Keywords: mozilla1.0

Jesse Ruderman

Comment 31

•

24 years ago

*** Bug 69043 has been marked as a duplicate of this bug. ***

Jesse Ruderman

Updated

•

24 years ago

Keywords: mostfreq

Phil Schwartau

Comment 32

•

24 years ago

*** Bug 79602 has been marked as a duplicate of this bug. ***

Chris Waterson

Comment 33

•

23 years ago

How did I get this bug?

Assignee: waterson → harishd

harishd

Comment 34

•

23 years ago

The problem seems to be around line 150 in nsHTMLUtils.cpp (what rickg had pointed out). http://lxr.mozilla.org/seamonkey/source/content/html/content/src/nsHTMLUtils.cpp#150 Back to waterson :-)

Assignee: harishd → waterson

harishd

Comment 35

•

23 years ago

This should probably go to ftang.

Assignee: waterson → ftang

Frank Tang

Comment 36

•

23 years ago

nhotta - can you help with this one?

Assignee: ftang → nhotta