160 bytes, text/html
1001 bytes, patch
|Details | Diff | Splinter Review|
1.14 KB, patch
|Details | Diff | Splinter Review|
Patch to content sink: strips whitespace for all attributes that are not in a special list (back to old behavior without regressing the option value fix)
1.43 KB, patch
|Details | Diff | Splinter Review|
1.88 KB, patch
|Details | Diff | Splinter Review|
At the above URL, there are two IMG tags that have a space in the path to the image e.g. SRC=" /someimage.gif". This causes Mozilla not to load the images in question. Removing those spaces with the editor and looking in the preview shows the images load and display correctly.
I'm seeing this too in Build 2001050921 in Linux. Changing OS to All. Only happens when the url is not absolute. I don't know what the Html standards say on this.
*** Bug 84313 has been marked as a duplicate of this bug. ***
wonder how much content really looks like this...
I believe we should remove trailing and leading whitespace (but not ws in between). Clayton is not the correct owner, HTML Element component should be removed from Bugzilla. Reassigning to layout.
marc, can you have a look?
I'll look deeper, but removing the leading (and possibly trailing) spaces shoul dbe pretty easy, although I believe that this should probably be avoided for standard mode (I need to check the HTML specs to see if the attribute is somehow special). Just curious: why is this critical for 0.9.2?
See also bug 24346 - same problem, resolved FIXED.
Changing QA contact
CC'ing Harish - ideas?
This space is no longer stripped out as of 21-April, rev 3.339 of that file. If I remember correctly, that was to fix a problem with: <select> <option value=" foo"> In that case we *do* need to retain the space. Perhaps the space should be stripped out in the image frame, or in the url parser?
I think it is dangerous to change this at the Content sink since it has no clue about attribute semantics, or even the URLParser level where it cannot understand the calling-context. If I strip out the leading whitespace (and trailing I guess) in the ImageFrame, then the DOM is not messed up. Of course, it is not entirely clear that we are better off having a DOM value for the attribute and using something else to actually load the image, but at least the DOM is consistent with the source. My proposed fix, which I am coding now and testing, is to do the trimming of the src and lowsrc attribute values in the nsImageFrame::LoadImage method, for QuirksMode only. I really think that we should NOT trim the whitespace in standard mode, but I have yet to find anything in the specs to back me up...
should this go in nsImageFrame::LoadImage so that it catches the case when someone changes the source dynamically?
Should a similar patch go in for: <form action=> <body background=> <blockquote/q/del/ins cite=> <object classid=> <object/applit codebase=> <object data=> <a/area/link/base href=> <img/frame/iframe longdesc=> <head profile=> <script/input/frame/iframe/img src=> <img/input/object usemap=> (There are the attributes with type URI at: http://www.w3.org/TR/html4/index/attributes.html) Is it ever legal to have whitespace at the beginning of a URI? Could all this be done in one place?
According to RFC 2396, the space character is forbidden everywhere within any URI. So Moz should just do whatever it is Moz usually does with a bogus attribute value. Stripping the whitespace out is probably the wrong thing, as it may hide the problem from a page author using Mozilla to test his work. Most validating tools won't catch this, because they don't include URI parsers. Regarding the original problem page: when did <image> get in the HTML spec? Mozilla seems to understand <image> = <img>, but I don't know why it should.
Specs aside, there are existing pages that expect this behavior. All other browsers on the market (including Mozilla up to 21-April) do strip leading whitespace. Let's not make this bug a repeat of bug 57968 and ship a browser, then run around telling webmasters why their images are mysteriously not showing up. (If we must do this for strict mode, fine...) Also, this has been our behavior up until 21-April-2001, so there is good support for the theory that we won't encounter too many problems if we do strip the whitespace for all URLs...
But as I said, do not strip whitespace in the middle, unless we were doing that previously. We can (or at least used to) handle spaces in URIs. There are websites that actually use spaces in directory and file names... (in the middle, not in the beginning or end that I know of). Even Microsoft's BizTalk website had them when it originally appeared, but they changed it fast :)
First, I strongly oppose putting this change *outside* of Quirks mode. Thewhitespace is illegal, and the only reason we want to handle it at all is because legacy browsers did, hence QuirksMode. In Standard mode, we want to adhere to the specifications as closely as possibly, regardless of legacy behavior (God save the web!). Furthermore, I can see no reason to muck with whitespace inside the URL at all, just leading and trailing. Re: pavlov's comment: Yea, maybe it should be done in LoadImage for the reasons you stated and because it is actually easier there (less code). I'll redo tha patch.
Created attachment 39521 [details] [diff] [review] New Patch that handles dynamic changes to src too
I've attached a patch to trim the leading and trailing whitespace in the LoadImage method (in QuirksMode only). This will handle dynamic changes to the src attribute as well, but there is the additional cost of a string-copy of the src attribute values now. Please review patchID 39521 and lemme know what y'all think. Also, I still have not heard why this is critical to 0.9.2 and RTM+ - it mystifies me since it is marked severity:minor and there are no top50 URLs noted. Anyway, it is easy enough, so marking target milestone 0.9.2
There is a typo in the comment, and you do not need the quirksmode variable since you only test once. Otherwise looks ok to me.
Thanks Heikki, I fixed the typo and removed the quirksMode variable. Also, I moved the whitespace static array into the block where it is used (could not see any reason to remove it though).
Who is the sr= person for this bug?
Good question - trying shaver (for no good reason)
To reiterate, the checkin on 21-April that broke this one case (<img src=>) also broke up to 20 other cases, listed above - every other case where a URL is specified as an attribute. This patch only addresses one of them. Should we open up a separate bug for the others? I've only tested a few, but, our solution seems to be the least consistent of any browser out there. Some strip whitespace, some do not. This makes our behavior probably all the more confusing/frustrating/misleading for web designers. <form action=" foo.cgi"> IE, Nav 4.x, and Mozilla now all handle this differently Netscape 6 handles it like IE. <body background=" foo.gif"> IE, Nav 4.x, and Netscape 6 all show this image, we do not. <script/input/img src=> IE, Nav 4.x, and Netscape 6 all handle the script src=, input type=image src=, and img src= cases, we only handle the img src= case For example, take a look at this page. Nav 4.x and IE render it one way (arguably incorrectly), and we go a different route: http://eric.302easyst.net/mozilla/bugs/attrws.html My feeling is that if we fix img src= we should fix the other cases too...
Eric, I think your reasoning is sound. I'd like to look at the problem at the sink level and see what I can come up with. Of course, this bug is marked PDT+ critical for some reason, so maybe I'll need to get this in before adressing the more fundamental problem. In that case, this hack should be removed if/when the more basic issue is resolved. Regarding the April change that regressed this: if the sink was stripping whitespace when it set attributes, then is it true that the dynamic case was not handled? Is that OK? Also, how much semantic knowledge do we want the sink to have about attribute values?
True, well... as this is PDT+, I think the patch above is definitely better than not taking any action at all, as it brings us one step closer to compatability!
I agree that whitespace should not be stripped except in quirks mode. The danger here is that whitepace might be signifigant in the case of an image that is actually generated by a cgi. sr=blizzard
After prodding from Mike. Can you guys make sure that you open bugs to deal with and investigate the other isses that Eric has brought up here? Making sure that we pick something to be consistent with and making a better long term solution? THanks.
Chofmann didn't say it when he added PDT+, but the reason to want this badly is that we have so many pages out there where we don't "just work" as other browsers do (or at least as well as 4.x ;-) I assume this class of quirks (including the spinoff bugs suggested by Eric) is a significant contributor to the problem. If this patch isn't going to introduce a bunch of risk, then I agree it's good to get what we have and work on doing better. If the patch might cause a bunch of regressions, I'd rather wait (unless current behavior is *worse* than what we shipped in N6.) What's the prognosis on fixing the problem in the general case to maximize compatibility with IE and N4.x across the full range of URIs with whitespace?
Fixing the general problem is going to be more risky. I think the risk can be mitigated, however, by getting enough eyes on it. The change I am proposing is pretty simple, but it adds some semantic knowledge to the ContentSink that I don't think it should have. That said, there are many of examples of the Sink already doing this kind of element-specific / attribute-specific processing, mostly for quirks. I'll put a patch together that takes care of stripping leading and trailing whitespace from attributes while not regression the option values (the reason this was changed and regressed in the first place). Once the patch is there, people can look at it and decide if the risk is reasonable and the benefits desirable...
Created attachment 39742 [details] [diff] [review] Patch to content sink: strips whitespace for all attributes that are not in a special list (back to old behavior without regressing the option value fix)
I attached a patch to the content sink that makes it strip out whitespace for attributes EXCEPT those that are in a special table of non-strippable attributes. Currently, the table is only built up of attribute names, though in theory it might need tag names too (since attributes in different contexts might have different legacy behavior). Currently, the table only has the 'value' attribute so it does not regress the option value bug (gets us back to where we were when bug 24346 was fixed, more or less). The benefit is that all of the cases that Eric Pollman enumerated are handled. The drawbacks to this approach are: cannot distinguish between Quirks and Standard mode, and a slight performance hit as each attribute has to be looked up in a small table. Reviews? Thoughts? Concerns? Come on, lemme have it!
Sounds backwards. We know what attributes have a URI type, so why not strip whitespace for only those attributes. Actually it always seemed to me that whitespace stripping should be done by whatever code is responsible for initiating the HTTP request, or resolving the relative URI. That code knows it is working on a URI, and that neither leading nor trailing whitespace belong there. That would definitely have less of a performance impact as well. Consulting a lookup table for every attribute can add up on pages like LXR, or W3C recs, others, but stripping whitespace need only happen when navigating to another page or loading style sheets, scripts, objects, and images.
Jeffrey, I agree with you, in fact the whole idea of stripping whitespace on attributes stinks if you ask me. I was trying to minimize the impact, and since we used to strip spaces for ALL attributes, and then changed it for the 'value' attribute bug, I coded up a patch that is closest to what we had been doing for some time. Clearly we were not discriminating between URL attributes and non-URL attributes, so why should we now? It all seems so wrong, really, but then what do I know? Bear in mind that we are EVEN NOW stripping whitespace on all attributes, just not the 'space' character. The current code has the Trim in there but no space char in the set. So, the impact of the patch is just to add a check against the set of attributes...
I would suggest that we'd move the whitespace stripping code to HTMLContentSink::AddAttributes() where we can do the name compares by simply comparing atom pointers and not having to do a ASCII and Unicode string comparison for all attributes, we already have the atom in HTMLContentSink::AddAttributes() so we might as well use it. PS. Since we're changing this code, we might as well replace Truncate() and Append() with one call to Assign() in HTMLContentSink::GetAttributeValueAt() too.
Please do not penalize standards mode performance to be compatible with illegal html. Lowest common denominator is not what standards mode is all about. Leave compatibility for quirks and Netscape.
Bug 83603 is caused by spaces in a script src tag... So if this one is fixed generically, close that one as well.
This is a 0.9.2 bug. Can someone summarize what's going on with this bug? Thanks.
Summary: we used to strip leading whitespace from all attributes in the ContentSink. We still do this, but we do not include the 'space' character in the set of whitespace chars to strip. So, now if a URL has a leading space, it is not handled properly. If it is a src attribute for an image, the image is not displayed. If it is a src attribute for a script tag, the script is not loaded. My initial stab at fixing this was to handle images only, and in quirks mode only. Then it was pointed out that lots of attributes need the same processing, so I made a hack into the content sink to basically revert it back to what it was doing back in April (i.e. always stripping whitespace, including spaces, except I excluded the 'value' attribute since that is the one that it was changed for). The big problem is, this is non-standard behavior - whitespace in the URL is illegal, and by stripping it we are not conforming to spec. In the content sink it is not really possible (I think) to distinguish between Standard and Quirks mode. There is also a small performance penalty in stripping whitespace from every attribute... Another proposed solution is to handle it in the URL parser - I was looking at stripping the whitespace in the NS_NewURI call, but I am not very familiar with that code, and not sure if it can be condition on Std. vs. Quirks mode. There is a patch attached that handles <IMG> only. The question is about the general case...
Adding dougt since he can give us advice on the URL parsing bits.
I was looking at the NS_NewURI methods in nsNetUtil and it looks like we could do the stripping there, but I cannot tell if the better place would be in the IOService itself. The NS_NewURI method is inline, which seems quite strange since it is called so often: if it is really being inlined then it seems like a cause of potential bloat, if it is not being inlined then it is probably screwing up code locality (anybody know if it is inlined on the various platforms?). I can take a stab at it, but I keep thinking about johnhall's plea to NOT penalize standards mode performance by stripping illegal whitespace - and I agree with him. Stripping at the URI parsing would cause this penalty.
But stripping at URI parsing is going to happen a lot less frequently than stripping in the attribute parsing.
PRessure is mounting: should I just checkin the hack in patch 39521 to fix the image problem, and then we can work on the general URL-stripping problem for 0.9.3? Please advise - I have r and sr but need approval from drivers...
Created attachment 40030 [details] [diff] [review] patch in netlib [one more fly in the ointment... :) ]
This patch also fixes the problem, and might address some of the performance concerns. I copied the code that was already in nsStdUrl::SetSpec to strip leading whitespace from absolute url's, to nsStdUrl::Resolve. It may, in fact, only need to be in one of the two places. ??? Unfortunately, I don't really know this code all that well... As for performance: We are already doing the exact same logic for every URL that gets created in SetSpec(). In the normal case (no whitespace), the logic reduces to a few instructions. As for how often this code is called, in loading the testcase I posted above: http://eric.302easyst.net/mozilla/bugs/attrws.html SetSpec is called 109 times (no change here) Resove is called 16 times (this patch would add a few extra instructions per call) I am not advocating this as the correct solution, because I really, Really don't know what the ramifications would be, but I'm posting it here as yet another solution to consider, if needed.
at this late stage in the game I'd rather take the lower risk hack (for images only) on the 0.9.2 branch and put the real fix on the trunk. a= firstname.lastname@example.org (on behalf of drivers) for images only fix on the 0.9.2 branch.
Patch for images only checked into trunk. The remaining, more general, issue is being dealt with both here and in bug 87298 - I'm going to open a new bug to cover the URL problem specifically, and reference it in both bugs (this and 87298).
Bug 87894 opened to cover remaining, general issue with whitespace in URLs. Marking this one FIXED
Re-opening, can't see the spinning logo on https://webmail.jadedirect.com, or that cute little baby in the test case
Terri, did you use a BRANCH build? This is BRANCH only...
Sorry, my comment on Additional Comments From Marc Attinasi 2001-06-26 14:47 was wrong, it should have said BRANCH, not trunk. This hacked fix is for the BRANCH only.
Alrighty, FIXED ON BRANCH ;-)
Verified fixed w2k branch build 2001062803 Verified fixed linux branch build 2001062706 Verified fixed mac branch build 2001062809
Did this regress? See bug 133536 http://bugzilla.mozilla.org/show_bug.cgi?id=133536 and testcase in http://bugzilla.mozilla.org/showattachment.cgi?attach_id=76204
This has only been fixed on the 0.9.2 branch, so nothing regressed, the issue is still there.