Part of url after ™ not picked up

RESOLVED WORKSFORME

Status

Core
Networking
--
critical
RESOLVED WORKSFORME
17 years ago
16 years ago

People

(Reporter: apr, Assigned: Gagan)

Tracking

({testcase})

Trunk
Future
x86
Linux
testcase
Details

(URL)

Attachments

(1 attachment)

(Reporter)

Description

17 years ago
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.1 i686; en-US; 0.8.1) Gecko/20010309
BuildID:    2001030905

%20 appears everywhere a space should be used. This means
attachments to mail are named incorrectly. It also causes
crashes or lockups in some cases (see the URL example above)
since the file name is not correct.

Reproducible: Always
Steps to Reproduce:
Go to http://www.compaq.com. 
Click on ipaq devices.
Click on blackberry wireless email. 
Click on product specs.
Click on either of the 2 PDF file links. Both of these have %20 in the name and
ftp crashes (older versions like 0.8) or hangs (this version).

Actual Results:  CPU maxed out and it hung.

Expected Results:  Downloaded the correct file name without the %20 inserted
everywhere a space should be

%20 is everywhere in mozilla ... please can't this be fixed?
It's very annoying for mail attachments, and it breaks everything where a space
in a file name exists. This has been in there since at least 0.8.

Comment 1

17 years ago
The problem is that the href attribute contains a ™, which seems to trouble
Mozilla: the URL of the link stops at that character.
Every character after the ™ up to the href's closing quote is dropped from the link.

There are two separate problems here.

1)  The page uses the following code:

<a href="ftp://ftp.compaq.com/pub/products/handhelds/Compaq iPAQ
BlackBerry&#153; Bundle Brief.pdf">Product Brief</a>

&#153; is the entity for "TM".  We are not picking up the part of the URL after
the &#153; so we try to fetch
"ftp://ftp.compaq.com/pub/products/handhelds/Compaq iPAQ BlackBerry" and fail. 
The crash/hang is a separate bug about trying to fetch nonexistent files over FTP.

2)  The file is named "Compaq iPAQ BlackBerry Bundle Brief.pdf" on the server,
with no "TM" character in it.  So the link is broken in any case (trying to
fetch it with NS 4.x, for example, fails) and needs to be fixed.

In either case, the %20s are _supposed_ to be there -- that's how spaces are
encoded in URLs.  Changing summary to correspond to the one real bug here.
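
To illustrate the point about %20, here is a minimal sketch using Python's standard library rather than Mozilla's own code. It shows a space being percent-encoded when a file name goes into a URL and decoded again when the name is recovered, which is the step a client would need before saving a file or attachment under that name. The variable names are illustrative only.

```python
from urllib.parse import quote, unquote

# File name as it exists on the server, with literal spaces.
name = "Compaq iPAQ BlackBerry Bundle Brief.pdf"

# Inside a URL each space must be percent-encoded as %20.
encoded = quote(name)

# Before using the name for a local file, decode it back.
decoded = unquote(encoded)

print(encoded)
print(decoded)
```

The encoded form belongs on the wire; the decoded form is what a user would expect to see as a saved file name.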

Should this be FTP or parser?
Assignee: asa → dougt
Component: Browser-General → Networking: FTP
QA Contact: doronr → tever
Summary: %20 is inserted in URLs and mail file names where spaces should be → Part of url after &#153; not picked up
(Reporter)

Comment 4

17 years ago
One comment:

%20s are not supposed to be there in mail file attachment names, etc.
Please don't ignore this problem. When I get a mail file attachment
whose name is supposed to have spaces in it, the file is saved
with a file name having %20s where spaces should be.
bug 71735 filed on mailnews attachments.

Updated

17 years ago
Status: UNCONFIRMED → NEW
Ever confirmed: true

Comment 6

17 years ago
Adding a simple test case which illustrates the bustage.  The bug is in the URL
parsing, not the html parser.

Comment 7

17 years ago
Created attachment 27702 [details]
Test case which used the [tm] symbol

Comment 8

17 years ago
-> gagan.
Assignee: dougt → gagan
(Assignee)

Comment 9

17 years ago
cc'ing andreas
Target Milestone: --- → Future

Comment 10

17 years ago
I don't think this is a problem with the url parser.

http://www.netscape.com/company/stuff&#153;/index.html

The above URL is parsed and everything after the # is treated as fragment/href. &, #
and ; are special characters inside URLs. If you use entities inside URLs, make
sure you escape them properly before giving them to the URL parser to prevent this
from happening. This can be done directly in the document, or maybe by the
HTML parser when it detects entities in a string that looks like a URL.
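
The behaviour described here can be reproduced with any generic URL splitter. The sketch below uses Python's `urllib.parse` and `html` modules as a stand-in; it is not Necko's parser. It first splits the raw href, where the # of the entity starts the fragment, and then decodes the entity and percent-encodes the result before splitting again. Encoding the decoded character as UTF-8 is an assumption of this sketch; a given server may expect a different charset.

```python
from html import unescape
from urllib.parse import quote, urlsplit

raw_href = "http://www.netscape.com/company/stuff&#153;/index.html"

# Splitting the raw string: the '#' inside the entity begins the fragment,
# so the path is cut short.
naive = urlsplit(raw_href)

# Resolve the entity first (HTML parsing rules map &#153; to U+2122),
# then percent-encode the non-ASCII character (UTF-8 assumed here).
decoded = unescape(raw_href)
escaped = quote(decoded, safe=":/")
fixed = urlsplit(escaped)

print(naive.path, "|", naive.fragment)
print(fixed.path, "|", fixed.fragment)
```

With the entity resolved and escaped up front, the whole path reaches the URL parser and the fragment is empty.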
                                                                
As andreas says, dougt's test case is invalid. Spaces in URLs work fine, and the
testcase is ok for me.

However, the TM symbols have been removed from the page, and that symbol isn't
in ASCII anyway, so it's invalid for FTP, and without a testcase I can't see how
other programs deal with it.

We display the PDF rather than opening it, but I haven't got MIME helpers for
PDF set up, so that's ok, I guess.

WFM.
Status: NEW → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → WORKSFORME

Updated

17 years ago
QA Contact: tever → benc

Comment 12

16 years ago
-> networking, + testcase.
This issue sounds like it is generic to more than ftp URL parsing.
Component: Networking: FTP → Networking
Keywords: testcase

Comment 13

16 years ago
This is not networking ... if an entity is not properly escaped that is not a
necko/urlparser problem ... the testcase is invalid ... tech evangelism it
should be ...