From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.1 i686; en-US; 0.8.1) Gecko/20010309
BuildID: 2001030905

%20 is everywhere a space should be used. This means attachments to mail are named incorrectly. It also causes crashes or lockups in some cases (see the URL example above) since the file name is not correct.

Reproducible: Always

Steps to Reproduce:
Go to http://www.compaq.com.
Click on ipaq devices.
Click on blackberry wireless email.
Click on product specs.
Click on either of the 2 PDF file links.
Both of these have %20 in the name and ftp crashes (older versions like 0.8) or hangs (this version).

Actual Results: CPU maxed out and it hung.

Expected Results: Downloaded the correct file name without the %20 inserted everywhere a space should be.

%20 is everywhere in mozilla ... please can't this be fixed? It's very annoying for mail attachments, and it breaks everything where a space in a file name exists. This has been in there since at least 0.8.
The problem is that the href attribute contains a ™, which seems to trouble Mozilla: the URL of the link stops at that character. Every character after the ™, up to the href's closing quote, is left out of the link.
There are two separate problems here.

1) The page uses the following code:
<a href="ftp://ftp.compaq.com/pub/products/handhelds/Compaq iPAQ BlackBerry™ Bundle Brief.pdf">Product Brief</a>
™ is the entity for "TM". We are not picking up the part of the URL after the ™ so we try to fetch "ftp://ftp.compaq.com/pub/products/handhelds/Compaq iPAQ BlackBerry" and fail. The crash/hang is a separate bug about trying to fetch nonexistent files over FTP.

2) The file is named "Compaq iPAQ BlackBerry Bundle Brief.pdf" on the server, with no "TM" character in it. So the link is broken in any case (trying to fetch it with NS 4.x, for example, fails) and needs to be fixed.

In either case, the %20s are _supposed_ to be there -- that's how spaces are encoded in URLs.

Changing summary to correspond to the one real bug here. Should this be FTP or parser?
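As an illustration of the point that %20 is simply how a space is encoded in a URL, here is a minimal Python sketch. It is not Mozilla code; it uses the standard library's `urllib.parse.quote`, and the UTF-8 encoding it applies to the "TM" character is Python's default, assumed here only for illustration (it says nothing about what the server or a 2001 browser would send).

```python
from urllib.parse import quote

# File name as it exists on the FTP server, per the comment above.
filename = "Compaq iPAQ BlackBerry Bundle Brief.pdf"

# quote() percent-encodes characters that may not appear literally in a URL
# path segment; each space becomes %20.
encoded = quote(filename)
print(encoded)  # Compaq%20iPAQ%20BlackBerry%20Bundle%20Brief.pdf

# A non-ASCII character such as the trademark sign is encoded byte by byte.
# Assumption: UTF-8, which is Python's default for quote().
print(quote("BlackBerry™"))  # BlackBerry%E2%84%A2
```

The encoded form is what travels on the wire; whether a user ever sees it is a presentation question, separate from URL parsing.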
updating url to point to page with broken link
One comment: %20s are not supposed to be there in mail file attachment names, etc. Please don't ignore this problem. When I get a mail file attachment whose name is supposed to have spaces in it, the file is saved with a file name having %20s where spaces should be.
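The distinction drawn here (escapes belong in the URL, not in the saved file name) can be sketched as follows. This is a hypothetical Python helper, not the mailnews code; `suggested_filename` is a name invented for the example, and it assumes the file name is derived from the last path segment of a URL.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def suggested_filename(url: str) -> str:
    """Return a human-readable file name taken from the last path segment of url.

    Hypothetical helper for illustration only.
    """
    path = urlsplit(url).path
    # Take the last segment first, then decode, so an escaped slash (%2F)
    # cannot change which segment is chosen.
    return unquote(posixpath.basename(path))


url = "ftp://ftp.compaq.com/pub/products/handhelds/Compaq%20iPAQ%20BlackBerry%20Bundle%20Brief.pdf"
print(suggested_filename(url))  # Compaq iPAQ BlackBerry Bundle Brief.pdf
```

A real implementation would also have to reject or replace characters that are unsafe in local file names after decoding; the sketch leaves that out.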
bug 71735 filed on mailnews attachments.
Adding a simple test case which illustrates the bustage. The bug is in the URL parsing, not the html parser.
Assignee: dougt → gagan
Target Milestone: --- → Future
I don't think this is a problem with the URL parser. http://www.netscape.com/company/stuff™/index.html The above URL is parsed and everything after the # is treated as fragment/href. &, # and ; are special characters inside URLs. If you use entities inside URLs, make sure you escape them properly before giving them to the URL parser to prevent this from happening. This can be done directly in the document, or maybe by the HTML parser when it detects entities in a string that looks like a URL.
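The behaviour described in this comment can be reproduced with a short Python sketch. It is not the necko URL parser; it uses the standard library's `urllib.parse` and `html` modules. One assumption is labeled loudly: the thread shows the rendered ™, and the comment's mention of # implies the source used a numeric character reference, so the example spells it `&#8482;`. The exact reference used on the page is not known.

```python
import html
from urllib.parse import quote, urlsplit

# Assumption: the raw attribute value contained a numeric character
# reference for the trademark sign, spelled here as &#8482;.
raw_href = "http://www.netscape.com/company/stuff&#8482;/index.html"

# Handing the raw attribute value straight to a URL parser: the '#' starts
# the fragment, so the path is cut off right after the '&'.
naive = urlsplit(raw_href)
print(naive.path)      # /company/stuff&
print(naive.fragment)  # 8482;/index.html

# Decoding character references first, then percent-encoding the path,
# keeps the whole path intact. UTF-8 is Python's default for quote().
decoded = html.unescape(raw_href)
parts = urlsplit(decoded)
fixed = parts._replace(path=quote(parts.path)).geturl()
print(fixed)  # http://www.netscape.com/company/stuff%E2%84%A2/index.html
```

The order matters: character references are an HTML-level encoding and have to be resolved before the string is interpreted as a URL, which is the division of labour this comment argues for.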
As andreas says, dougt's test case is invalid. Spaces in URLs work fine, and the testcase is OK for me. However, the TM symbols have been removed from the page, and that symbol isn't in ASCII anyway, so it's invalid for FTP, and without a testcase I can't see how other programs deal with it. We display the PDF rather than opening it, but I haven't got MIME helpers for PDF set up, so that's OK, I guess. WFM.
Status: NEW → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → WORKSFORME
-> networking, + testcase. This issue sounds like it is generic to more than ftp URL parsing.
Component: Networking: FTP → Networking
This is not networking ... if an entity is not properly escaped that is not a necko/urlparser problem ... the testcase is invalid ... tech evangelism it should be ...