<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Comment 7

•

24 years ago

Currently, attributes are processed in the parser or the content sink, not layout. I think it is important that we follow the advise of the RFC on URIs, specifically the part that says: ... The whitespace should be ignored when extracting the URI. My guess is that this is a parser problem. If it turns out to be a problem in the ContentSink's handling of the attributes, then by all means send it back to me to look at (I have little clue what happens in the parser, but I believe that the attributes are read up into parser nodes or something that are sent to the sink...). Harish, can you check what can be done at the parser level? Since the CR is being stripped, stripping the tabs should be pretty easy. If necessary, the content sink can post-process the attribute, but that is messy since it will not know if the whitespace characters came from line breaks or if they were intentional...

Assignee: attinasi → harishd

harishd

Comment 8

•

24 years ago

Mark: For quotted attributes parser just consumes content withing the quotes. It does not make any decision about what needs to be stripped off ( including CRs ). I don't think even the content sink can make this call. IMO, the attribute handling code, where ever it is, should be responsible in stripping off unwanted tabs, spaces, etc.

Comment 9

•

24 years ago

Yea, you are right Harish. The more I think about this, the more it seems like this is another case where our URI parsing code should take care of it (like bug 79929). Over to dougt to look at: my read is that all whitespace from the URI should be stripped, and this is in accordance with the RFC on URIs (http://www.ietf.org/rfc/rfc2396.txt)

Assignee: harishd → dougt

Component: Parser → Networking

Updated

•

24 years ago

Priority: -- → P1

Target Milestone: --- → mozilla0.9.3

Comment 10

•

24 years ago

Attached patch Strips whitespace during url resolution. — Details — Splinter Review

Comment 11

•

24 years ago

Although, I could not reproduce the problem with the above URL, I could with the test case provided. I attached a simple patch with strips out the whitespace from the url string. I am not totally convinced that this is kosher in all cases even with the RFC telling us that it is.. :-/. andreas, could you review the patch?

Status: NEW → ASSIGNED

Comment 12

•

24 years ago

Pollman has a patch to do the same thing in bug 79929 - we might want to coordinate these two fixes...

Comment 13

•

24 years ago

I opened bug 87298 to handle the general case of whitespace trimming on URLs... The patch presented here by dougt could apply to that bug.

Comment 14

•

24 years ago

I don't think that pollman's fix will be needed with the one I attached.

Eric Pollmann

Comment 15

•

24 years ago

From a quick look, it seems that the patch on this bug also strips internal whitespace, no? If so, will it be able to handle pages like this: http://eric.302easyst.net/mozilla/bugs/gotofunny.html Note that this page works in Nav 4.x, IE 5.5, and Netscape 6... Note that this testcase only demonstrates 2 of the 25 or so cases we need to verify backwards compatability with Nav and IE, the complete list is on bug 79929.

Eric Pollmann

Comment 16

•

24 years ago

Sorry, had not looked at the testcase on *this* bug. :) It appears that IE treats spaces differently than tabs and linebreaks when stripping whitespace from urls? We really should build up so more extensive testcases before going with any particular solution...

Comment 17

•

24 years ago

I agree with Eric, and hopefully we can hash it all out in bug 87298 instead of in miscellaneous bugs reported on specific symptoms... Parsing URLs correctly seems pretty important, from both a Standards perspective and a compatability perspective.

Eric Pollmann

Comment 18

•

24 years ago

Did you mean Bug 87894?

Comment 19

•

24 years ago

Yes, bug 87894. I guess I did that a couple of times... sorry.

Comment 20

•

24 years ago

tom, is there a group of testcases against URI handling correctness?

Comment 21

•

24 years ago

Always stripping whitespaces in the urlparser is really wrong because the parser does not know the context (especially in Resolve). Currently in SetSpec we strip leading and trailing whitespaces and control characters. We also strip linebreaks in the middle. Whitespaces and other control characters left after that are getting escaped. If we apply the patch we would kill all access to files with whitespace in the directory names, at least when accessed with Resolve. Not very uncommon ... and the rfc is not very helpful here. I guess we could fix this case with also removing the tab-char from the url in SetSpec as we do with the linebreaks. It is very likely that this char was not used as part of the url but as attempt to format it. The real deal would be to have code in htmlparser/contentSink/layout/... that guesses from the context if it should remove or mask spaces/linebreaks/tabs from an url. What gets through to the urlparser then gets stripped or escaped. See bug 23485 for another lengthy discussion on this problem. We do not strip anything when using Resolve because sometimes leading or trailing whitespaces in relative urls are valid. After resolving the url SetSpec does the rest ...

Comment 22

•

24 years ago

This bug sounds like a dup of 23485. Over to vishy.

Assignee: dougt → vishy

Status: ASSIGNED → NEW

Viswanath Ramachandran

Assignee

Updated

•

24 years ago

Keywords: nsbeta1

Viswanath Ramachandran

Assignee

Comment 23

•

24 years ago

nav triage: moving to m1.0, not a stopper at this time.

Keywords: nsbeta1 → nsbeta1+

Priority: P1 → P3

Target Milestone: mozilla0.9.3 → mozilla1.0

dcary

Comment 24

•

24 years ago

I agree with Marc Attinasi that, to be standards compliant, ``all'' whitespace from the URI should be stripped. But Andreas Otte brings up the good point that people need to access files that have whitespace in the directory names. I think we can do both, if we are very careful about how we handle URIs. When I look at the original file created by the content author, I can easily tell the difference between (a) whitespace added to break a long file name into reasonable-length lines, that should be stripped out before using it. For example, href="more_ details/really_long_file_name" obviously refers to a file in a directory named ``more_details''. (b) whitespace in the actual directory name that is necessary to retrieve the file, written by a content author that is careful to comply to the standards. For example, href="more %20details/really_long_file_name" obviously refers to a file in a directory named ``more details'' . (c) whitespace in the actual directory name that is necessary to retrieve the file, written by a less careful content author. For example, href="more details/really_long_file_name" obviously refers to a file in a directory named ``more details'', although to be pedantic the author really should have encoded that space with the percent escape %20. Does discriminating between (a) and (b) sound do-able ? Is (c) really an option, or must we strip out even that whitespace to maintain standards compliance ?

Comment 25

•

24 years ago

I would vote for not only removing \n from the middle of urls but also tabs. I think this would fix this case. I would not remove whitespace from the middle of an uri. That would kill tons of pages.

Comment 26

•

24 years ago

Attached patch patch strips \t from urls in SetSpec, testcase and bugurl working fine — Details — Splinter Review

Darin Fisher

Comment 27

•

24 years ago

sr=darin

Comment 28

•

24 years ago

fix checked in

Status: NEW → RESOLVED

Closed: 24 years ago

Resolution: --- → FIXED

Comment 29

•

24 years ago

Would someone like to update the subject so it is more descriptive of the current problem? -TIA.

Updated

•

24 years ago

Summary: page context not displayed → tab character inside an absolute url is not getting removed (was: page context not displayed)

Comment 30

•

23 years ago

I'll verify this as I get through some URL testcase work...

Keywords: testcase

Summary: tab character inside an absolute url is not getting removed (was: page context not displayed) → URL: tabs inside URLs

Updated

•

23 years ago

QA Contact: bsharma → benc

Comment 31

•

23 years ago

VERIFIED: Mozilla 1.2.1, Win98

Whiteboard: checkmac checklinux