Closed
Bug 87298
Opened 23 years ago
Closed 23 years ago
URL: tabs inside URLs
Categories
(Core :: Networking, defect, P3)
Tracking
()
VERIFIED
FIXED
mozilla1.0
People
(Reporter: caruso, Assigned: vishy)
References
()
Details
(Keywords: testcase)
Attachments
(3 files)
110 bytes,
text/html
|
Details | |
1.15 KB,
patch
|
Details | Diff | Splinter Review | |
714 bytes,
patch
|
Details | Diff | Splinter Review |
From Bugzilla Helper: User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.2 i686; en-US; rv:0.9.1) Gecko/20010607 BuildID: 2001060713 The context of the article are not displayed at all. Almost blank page is the result of any selection on article contex. Reproducible: Always Steps to Reproduce: 1.Go to htpp://www.corriere.it/ 2. Follow the link "sezioni del giornale" on the right 3. Follow one of the link on the right (ex. "Economia") 4. Choice an article. Actual Results: No article displayed Expected Results: Display the context of the article URL: http://www.corriere.it/edicola/index.jsp?path=Economia and try to read some article.
Comment 1•23 years ago
|
||
The problem here is the url generated on the 'Ecomonia' page. If you click on the 'Accordo raggiunto a Bruxelles: sì all?ingresso della Cina nel Wto' link in Mozilla you go to the following url: http://www.corriere.it/edicola/index.jsp?path=ECONOMIA%09%09&doc=BOX However in IE you go to: http://www.corriere.it/edicola/index.jsp?path=ECONOMIA&doc=BOX The %09%09 are two tab characters caused by the definition of the link: <A CLASS="titolo" HREF="/edicola/index.jsp?path=ECONOMIA &doc=BOX" CLASS="nav"> Is it valid to have a link defined on two lines? mozilla seems to be stripping out the Carriage Return.
Don't know if this applies, but rfc 2396 ( http://www.ietf.org/rfc/rfc2396.txt ) says: In some cases, extra whitespace (spaces, linebreaks, tabs, etc.) may need to be added to break long URI across lines. The whitespace should be ignored when extracting the URI.
Component: Browser-General → Parser
reassign & confirm, I thought something like that might exist. I'll let parser qa consider..
Assignee: asa → harishd
Status: UNCONFIRMED → NEW
Ever confirmed: true
QA Contact: doronr → bsharma
The decision, about stripping off \n,\t, etc, should probably happen in layout. -->attinasi
Assignee: harishd → attinasi
Comment 7•23 years ago
|
||
Currently, attributes are processed in the parser or the content sink, not layout. I think it is important that we follow the advise of the RFC on URIs, specifically the part that says: ... The whitespace should be ignored when extracting the URI. My guess is that this is a parser problem. If it turns out to be a problem in the ContentSink's handling of the attributes, then by all means send it back to me to look at (I have little clue what happens in the parser, but I believe that the attributes are read up into parser nodes or something that are sent to the sink...). Harish, can you check what can be done at the parser level? Since the CR is being stripped, stripping the tabs should be pretty easy. If necessary, the content sink can post-process the attribute, but that is messy since it will not know if the whitespace characters came from line breaks or if they were intentional...
Assignee: attinasi → harishd
Mark: For quotted attributes parser just consumes content withing the quotes. It does not make any decision about what needs to be stripped off ( including CRs ). I don't think even the content sink can make this call. IMO, the attribute handling code, where ever it is, should be responsible in stripping off unwanted tabs, spaces, etc.
Comment 9•23 years ago
|
||
Yea, you are right Harish. The more I think about this, the more it seems like this is another case where our URI parsing code should take care of it (like bug 79929). Over to dougt to look at: my read is that all whitespace from the URI should be stripped, and this is in accordance with the RFC on URIs (http://www.ietf.org/rfc/rfc2396.txt)
Assignee: harishd → dougt
Component: Parser → Networking
Updated•23 years ago
|
Priority: -- → P1
Target Milestone: --- → mozilla0.9.3
Comment 10•23 years ago
|
||
Comment 11•23 years ago
|
||
Although, I could not reproduce the problem with the above URL, I could with the test case provided. I attached a simple patch with strips out the whitespace from the url string. I am not totally convinced that this is kosher in all cases even with the RFC telling us that it is.. :-/. andreas, could you review the patch?
Status: NEW → ASSIGNED
Comment 12•23 years ago
|
||
Pollman has a patch to do the same thing in bug 79929 - we might want to coordinate these two fixes...
Comment 13•23 years ago
|
||
I opened bug 87298 to handle the general case of whitespace trimming on URLs... The patch presented here by dougt could apply to that bug.
Comment 14•23 years ago
|
||
I don't think that pollman's fix will be needed with the one I attached.
Comment 15•23 years ago
|
||
From a quick look, it seems that the patch on this bug also strips internal whitespace, no? If so, will it be able to handle pages like this: http://eric.302easyst.net/mozilla/bugs/gotofunny.html Note that this page works in Nav 4.x, IE 5.5, and Netscape 6... Note that this testcase only demonstrates 2 of the 25 or so cases we need to verify backwards compatability with Nav and IE, the complete list is on bug 79929.
Comment 16•23 years ago
|
||
Sorry, had not looked at the testcase on *this* bug. :) It appears that IE treats spaces differently than tabs and linebreaks when stripping whitespace from urls? We really should build up so more extensive testcases before going with any particular solution...
Comment 17•23 years ago
|
||
I agree with Eric, and hopefully we can hash it all out in bug 87298 instead of in miscellaneous bugs reported on specific symptoms... Parsing URLs correctly seems pretty important, from both a Standards perspective and a compatability perspective.
Comment 18•23 years ago
|
||
Did you mean Bug 87894?
Comment 19•23 years ago
|
||
Yes, bug 87894. I guess I did that a couple of times... sorry.
Comment 20•23 years ago
|
||
tom, is there a group of testcases against URI handling correctness?
Comment 21•23 years ago
|
||
Always stripping whitespaces in the urlparser is really wrong because the parser does not know the context (especially in Resolve). Currently in SetSpec we strip leading and trailing whitespaces and control characters. We also strip linebreaks in the middle. Whitespaces and other control characters left after that are getting escaped. If we apply the patch we would kill all access to files with whitespace in the directory names, at least when accessed with Resolve. Not very uncommon ... and the rfc is not very helpful here. I guess we could fix this case with also removing the tab-char from the url in SetSpec as we do with the linebreaks. It is very likely that this char was not used as part of the url but as attempt to format it. The real deal would be to have code in htmlparser/contentSink/layout/... that guesses from the context if it should remove or mask spaces/linebreaks/tabs from an url. What gets through to the urlparser then gets stripped or escaped. See bug 23485 for another lengthy discussion on this problem. We do not strip anything when using Resolve because sometimes leading or trailing whitespaces in relative urls are valid. After resolving the url SetSpec does the rest ...
Comment 22•23 years ago
|
||
This bug sounds like a dup of 23485. Over to vishy.
Assignee: dougt → vishy
Status: ASSIGNED → NEW
Assignee | ||
Comment 23•23 years ago
|
||
nav triage: moving to m1.0, not a stopper at this time.
Comment 24•23 years ago
|
||
I agree with Marc Attinasi that, to be standards compliant, ``all'' whitespace from the URI should be stripped. But Andreas Otte brings up the good point that people need to access files that have whitespace in the directory names. I think we can do both, if we are very careful about how we handle URIs. When I look at the original file created by the content author, I can easily tell the difference between (a) whitespace added to break a long file name into reasonable-length lines, that should be stripped out before using it. For example, href="more_ details/really_long_file_name" obviously refers to a file in a directory named ``more_details''. (b) whitespace in the actual directory name that is necessary to retrieve the file, written by a content author that is careful to comply to the standards. For example, href="more %20details/really_long_file_name" obviously refers to a file in a directory named ``more details'' . (c) whitespace in the actual directory name that is necessary to retrieve the file, written by a less careful content author. For example, href="more details/really_long_file_name" obviously refers to a file in a directory named ``more details'', although to be pedantic the author really should have encoded that space with the percent escape %20. Does discriminating between (a) and (b) sound do-able ? Is (c) really an option, or must we strip out even that whitespace to maintain standards compliance ?
Comment 25•23 years ago
|
||
I would vote for not only removing \n from the middle of urls but also tabs. I think this would fix this case. I would not remove whitespace from the middle of an uri. That would kill tons of pages.
Comment 26•23 years ago
|
||
Comment 27•23 years ago
|
||
sr=darin
Comment 28•23 years ago
|
||
fix checked in
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 29•23 years ago
|
||
Would someone like to update the subject so it is more descriptive of the current problem? -TIA.
Updated•23 years ago
|
Summary: page context not displayed → tab character inside an absolute url is not getting removed (was: page context not displayed)
Comment 30•22 years ago
|
||
I'll verify this as I get through some URL testcase work...
Keywords: testcase
Summary: tab character inside an absolute url is not getting removed (was: page context not displayed) → URL: tabs inside URLs
Comment 32•22 years ago
|
||
VERIFIED: Mozilla 1.3a, Linux, Mac OS X 10.2.3.
Status: RESOLVED → VERIFIED
Whiteboard: checkmac checklinux
You need to log in
before you can comment on or make changes to this bug.
Description
•