Closed Bug 79929 Opened 23 years ago Closed 23 years ago

IMG tags with a space in the src= path are not loaded

Categories

(Core :: Layout, defect, P3)

x86
All
defect

Tracking

()

VERIFIED FIXED
mozilla0.9.2

People

(Reporter: mgregan, Assigned: attinasi)

References

()

Details

(Whiteboard: PDT+, critical for 0.9.2, have patch, r and sr; a=drivers for image only fix on 0.9.2 branch)

Attachments

(5 files)

At the above URL, there are two IMG tags that have a space in the path to the
image e.g. SRC=" /someimage.gif". This causes Mozilla not to load the images in
question. Removing those spaces with the editor and looking in the preview shows
the images load and display correctly.
I'm seeing this too in Build 2001050921 in Linux.
Changing OS to All.
Only happens when the url is not absolute.
I don't know what the Html standards say on this.
OS: Windows 2000 → All
updating component
Assignee: asa → clayton
Status: UNCONFIRMED → NEW
Component: Browser-General → HTML Element
Ever confirmed: true
QA Contact: doronr → bsharma
*** Bug 84313 has been marked as a duplicate of this bug. ***
Whiteboard: want for mozilla 0.9.2
Whiteboard: want for mozilla 0.9.2 → critical for 0.9.2
wonder how much content really looks like this...
I believe we should remove trailing and leading whitespace (but not ws in between).

Clayton is not the correct owner, HTML Element component should be removed from
Bugzilla. Reassigning to layout.
Assignee: clayton → karnaze
Component: HTML Element → Layout
QA Contact: bsharma → petersen
marc, can you have a look?
Assignee: karnaze → attinasi
Whiteboard: critical for 0.9.2 → PDT+, critical for 0.9.2
I'll look deeper, but removing the leading (and possibly trailing) spaces shoul
dbe pretty easy, although I believe that this should probably be avoided for
standard mode (I need to check the HTML specs to see if the attribute is somehow
special).

Just curious: why is this critical for 0.9.2?
Status: NEW → ASSIGNED
See also bug 24346 - same problem, resolved FIXED.
Changing QA contact
QA Contact: petersen → tpreston
CC'ing Harish - ideas?
This space is no longer stripped out as of 21-April, rev 3.339 of that file.  If
I remember correctly, that was to fix a problem with:

<select>
 <option value=" foo">

In that case we *do* need to retain the space.  Perhaps the space should be
stripped out in the image frame, or in the url parser?
I think it is dangerous to change this at the Content sink since it has no clue
about attribute semantics, or even the URLParser level where it cannot
understand the calling-context. If I strip out the leading whitespace (and
trailing I guess) in the ImageFrame, then the DOM is not messed up. Of course,
it is not entirely clear that we are better off having a DOM value for the
attribute and using something else to actually load the image, but at least the
DOM is consistent with the source.

My proposed fix, which I am coding now and testing, is to do the trimming of the
src and lowsrc attribute values in the nsImageFrame::LoadImage method, for
QuirksMode only. I really think that we should NOT trim the whitespace in
standard mode, but I have yet to find anything in the specs to back me up...
Attached file Simple testcase
Attached patch PatchSplinter Review
should this go in nsImageFrame::LoadImage so that it catches the case when 
someone changes the source dynamically?
Should a similar patch go in for:

<form action=>
<body background=>
<blockquote/q/del/ins cite=>
<object classid=>
<object/applit codebase=>
<object data=>
<a/area/link/base href=>
<img/frame/iframe longdesc=>
<head profile=>
<script/input/frame/iframe/img src=>
<img/input/object usemap=>

(There are the attributes with type URI at:
http://www.w3.org/TR/html4/index/attributes.html)

Is it ever legal to have whitespace at the beginning of a URI?  Could all this
be done in one place?
According to RFC 2396, the space character is forbidden everywhere within any
URI.  So Moz should just do whatever it is Moz usually does with a bogus
attribute value.  Stripping the whitespace out is probably the wrong thing, as
it may hide the problem from a page author using Mozilla to test his work.  Most
validating tools won't catch this, because they don't include URI parsers.

Regarding the original problem page: when did <image> get in the HTML spec? 
Mozilla seems to understand <image> = <img>, but I don't know why it should.
Specs aside, there are existing pages that expect this behavior.  All other
browsers on the market (including Mozilla up to 21-April) do strip leading
whitespace.  Let's not make this bug a repeat of bug 57968 and ship a browser,
then run around telling webmasters why their images are mysteriously not showing
up.  (If we must do this for strict mode, fine...)

Also, this has been our behavior up until 21-April-2001, so there is good
support for the theory that we won't encounter too many problems if we do strip
the whitespace for all URLs...
But as I said, do not strip whitespace in the middle, unless we were doing that
previously. We can (or at least used to) handle spaces in URIs. There are
websites that actually use spaces in directory and file names... (in the middle,
not in the beginning or end that I know of). Even Microsoft's BizTalk website
had them when it originally appeared, but they changed it fast :)
First, I strongly oppose putting this change *outside* of Quirks mode.
Thewhitespace is illegal, and the only reason we want to handle it at all is
because legacy browsers did, hence QuirksMode. In Standard mode, we want to
adhere to the specifications as closely as possibly, regardless of legacy
behavior (God save the web!). Furthermore, I can see no reason to muck with
whitespace inside the URL at all, just leading and trailing.

Re: pavlov's comment: Yea, maybe it should be done in LoadImage for the reasons
you stated and because it is actually easier there (less code). I'll redo tha patch.
I've attached a patch to trim the leading and trailing whitespace in the
LoadImage method (in QuirksMode only). This will handle dynamic changes to the
src attribute as well, but there is the additional cost of a string-copy of the
src attribute values now.

Please review patchID 39521 and lemme know what y'all think.

Also, I still have not heard why this is critical to 0.9.2 and RTM+ - it
mystifies me since it is marked severity:minor and there are no top50 URLs
noted. Anyway, it is easy enough, so marking target milestone 0.9.2
Priority: -- → P3
Target Milestone: --- → mozilla0.9.2
There is a typo in the comment, and you do not need the quirksmode variable
since you only test once. Otherwise looks ok to me.
Thanks Heikki, I fixed the typo and removed the quirksMode variable. Also, I
moved the whitespace static array into the block where it is used (could not see
any reason to remove it though).
Who is the sr= person for this bug?
Whiteboard: PDT+, critical for 0.9.2 → PDT+, r=heikki, need sr=
Good question - trying shaver (for no good reason)
Keywords: patch
Summary: IMG tags with a space in the src= path are not loaded → [PATCH] IMG tags with a space in the src= path are not loaded
To reiterate, the checkin on 21-April that broke this one case (<img src=>) also
broke up to 20 other cases, listed above - every other case where a URL is
specified as an attribute.  This patch only addresses one of them.  Should we
open up a separate bug for the others?

I've only tested a few, but, our solution seems to be the least consistent of
any browser out there.  Some strip whitespace, some do not. This makes our
behavior probably all the more confusing/frustrating/misleading for web designers.

<form action=" foo.cgi">
  IE, Nav 4.x, and Mozilla now all handle this differently
  Netscape 6 handles it like IE.

<body background=" foo.gif">
  IE, Nav 4.x, and Netscape 6 all show this image, we do not.

<script/input/img src=>
  IE, Nav 4.x, and Netscape 6 all handle the script src=, input type=image src=,
and img src= cases, we only handle the img src= case

For example, take a look at this page.  Nav 4.x and IE render it one way
(arguably incorrectly), and we go a different route:
http://eric.302easyst.net/mozilla/bugs/attrws.html

My feeling is that if we fix img src= we should fix the other cases too...
Keywords: patch
Summary: [PATCH] IMG tags with a space in the src= path are not loaded → IMG tags with a space in the src= path are not loaded
Whiteboard: PDT+, r=heikki, need sr= → PDT+, critical for 0.9.2
Eric, I think your reasoning is sound. I'd like to look at the problem at the
sink level and see what I can come up with. Of course, this bug is marked PDT+
critical for some reason, so maybe I'll need to get this in before adressing the
more fundamental problem. In that case, this hack should be removed if/when the
more basic issue is resolved.

Regarding the April change that regressed this: if the sink was stripping
whitespace when it set attributes, then is it true that the dynamic case was not
handled? Is that OK? Also, how much semantic knowledge do we want the sink to
have about attribute values?
True, well...  as this is PDT+, I think the patch above is definitely better
than not taking any action at all, as it brings us one step closer to compatability!
I agree that whitespace should not be stripped except in quirks mode.  The
danger here is that whitepace might be signifigant in the case of an image that
is actually generated by a cgi.

sr=blizzard
After prodding from Mike.  Can you guys make sure that you open bugs to deal
with and investigate the other isses that Eric has brought up here?  Making sure
that we pick something to be consistent with and making a better long term
solution?  THanks.
Chofmann didn't say it when he added PDT+, but the reason to want this badly is
that we have so many pages out there where we don't "just work" as other
browsers do (or at least as well as 4.x ;-)  I assume this class of quirks
(including the spinoff bugs suggested by Eric) is a significant contributor to
the problem.  If this patch isn't going to introduce a bunch of risk, then I
agree it's good to get what we have and work on doing better.  If the patch
might cause a bunch of regressions, I'd rather wait (unless current behavior is
*worse* than what we shipped in N6.)

What's the prognosis on fixing the problem in the general case to maximize
compatibility with IE and N4.x across the full range of URIs with whitespace?
Whiteboard: PDT+, critical for 0.9.2 → PDT+, critical for 0.9.2, have patch, r and sr
Fixing the general problem is going to be more risky. I think the risk can be
mitigated, however, by getting enough eyes on it. The change I am proposing is
pretty simple, but it adds some semantic knowledge to the ContentSink that I
don't think it should have. That said, there are many of examples of the Sink
already doing this kind of element-specific / attribute-specific processing,
mostly for quirks. I'll put a patch together that takes care of stripping
leading and trailing whitespace from attributes while not regression the option
values (the reason this was changed and regressed in the first place). Once the
patch is there, people can look at it and decide if the risk is reasonable and
the benefits desirable...
I attached a patch to the content sink that makes it strip out whitespace for
attributes EXCEPT those that are in a special table of non-strippable
attributes. Currently, the table is only built up of attribute names, though in
theory it might need tag names too (since attributes in different contexts might
have different legacy behavior). Currently, the table only has the 'value'
attribute so it does not regress the option value bug (gets us back to where we
were when bug 24346 was fixed, more or less).

The benefit is that all of the cases that Eric Pollman enumerated are handled.

The drawbacks to this approach are: cannot distinguish between Quirks and
Standard mode, and a slight performance hit as each attribute has to be looked
up in a small table.

Reviews? Thoughts? Concerns? Come on, lemme have it!
Sounds backwards.  We know what attributes have a URI type, so why not strip
whitespace for only those attributes.  Actually it always seemed to me that
whitespace stripping should be done by whatever code is responsible for
initiating the HTTP request, or resolving the relative URI.  That code knows it
is working on a URI, and that neither leading nor trailing whitespace belong
there.  That would definitely have less of a performance impact as well. 
Consulting a lookup table for every attribute can add up on pages like LXR, or
W3C recs, others, but stripping whitespace need only happen when navigating to
another page or loading style sheets, scripts, objects, and images.
Jeffrey, I agree with you, in fact the whole idea of stripping whitespace on
attributes stinks if you ask me. I was trying to minimize the impact, and since
we used to strip spaces for ALL attributes, and then changed it for the 'value'
attribute bug, I coded up a patch that is closest to what we had been doing for
some time. Clearly we were not discriminating between URL attributes and non-URL
attributes, so why should we now? It all seems so wrong, really, but then what
do I know?

Bear in mind that we are EVEN NOW stripping whitespace on all attributes, just
not the 'space' character. The current code has the Trim in there but no space
char in the set. So, the impact of the patch is just to add a check against the
set of attributes...
I would suggest that we'd move the whitespace stripping code to
HTMLContentSink::AddAttributes() where we can do the name compares by simply
comparing atom pointers and not having to do a ASCII and Unicode string
comparison  for all attributes, we already have the atom in
HTMLContentSink::AddAttributes() so we might as well use it.

PS. Since we're changing this code, we might as well replace Truncate() and
Append() with one call to Assign() in HTMLContentSink::GetAttributeValueAt() too.
Please do not penalize standards mode performance to be compatible with illegal
html. Lowest common denominator is not what standards mode is all about. Leave
compatibility for quirks and Netscape.
Bug 83603 is caused by spaces in a script src tag...  So if this one is fixed
generically, close that one as well.
This is a 0.9.2 bug.  Can someone summarize what's going on with this bug?  Thanks.
Summary: we used to strip leading whitespace from all attributes in the
ContentSink. We still do this, but we do not include the 'space' character in
the set of whitespace chars to strip. So, now if a URL has a leading space, it
is not handled properly. If it is a src attribute for an image, the image is not
displayed. If it is a src attribute for a script tag, the script is not loaded.

My initial stab at fixing this was to handle images only, and in quirks mode
only. Then it was pointed out that lots of attributes need the same processing,
so I made a hack into the content sink to basically revert it back to what it
was doing back in April (i.e. always stripping whitespace, including spaces,
except I excluded the 'value' attribute since that is the one that it was
changed for). The big problem is, this is non-standard behavior - whitespace in
the URL is illegal, and by stripping it we are not conforming to spec. In the
content sink it is not really possible (I think) to distinguish between Standard
and Quirks mode. There is also a small performance penalty in stripping
whitespace from every attribute...

Another proposed solution is to handle it in the URL parser - I was looking at
stripping the whitespace in the NS_NewURI call, but I am not very familiar with
that code, and not sure if it can be condition on Std. vs. Quirks mode.

There is a patch attached that handles <IMG> only. The question is about the
general case...
Adding dougt since he can give us advice on the URL parsing bits.
I was looking at the NS_NewURI methods in nsNetUtil and it looks like we could
do the stripping there, but I cannot tell if the better place would be in the
IOService itself. The NS_NewURI method is inline, which seems quite strange
since it is called so often: if it is really being inlined then it seems like a
cause of potential bloat, if it is not being inlined then it is probably
screwing up code locality (anybody know if it is inlined on the various platforms?).

I can take a stab at it, but I keep thinking about johnhall's plea to NOT
penalize standards mode performance by stripping illegal whitespace - and I
agree with him. Stripping at the URI parsing would cause this penalty. 
But stripping at URI parsing is going to happen a lot less frequently than
stripping in the attribute parsing.
PRessure is mounting: should I just checkin the hack in patch 39521 to fix the 
image problem, and then we can work on the general URL-stripping problem for 
0.9.3? Please advise - I have r and sr but need approval from drivers...
This patch also fixes the problem, and might address some of the performance
concerns.  I copied the code that was already in nsStdUrl::SetSpec to strip
leading whitespace from absolute url's, to nsStdUrl::Resolve.  It may, in fact,
only need to be in one of the two places.  ???  Unfortunately, I don't really
know this code all that well...

As for performance:
  We are already doing the exact same logic for every URL that gets created in
SetSpec().  In the normal case (no whitespace), the logic reduces to a few
instructions.  As for how often this code is called, in loading the testcase I
posted above:
http://eric.302easyst.net/mozilla/bugs/attrws.html

SetSpec is called 109 times (no change here)
Resove is called 16 times (this patch would add a few extra instructions per call)

I am not advocating this as the correct solution, because I really, Really don't
know what the ramifications would be, but I'm posting it here as yet another
solution to consider, if needed.
Whiteboard: PDT+, critical for 0.9.2, have patch, r and sr → PDT+, critical for 0.9.2, have patch, r and sr; need a decision on what patch to use from drivers and people in bug
at this late stage in the game I'd rather take the lower risk hack (for images
only) on the 0.9.2 branch and put the real fix on the trunk.  a= asa@mozilla.org
(on behalf of drivers) for images only fix on the 0.9.2 branch. 
Whiteboard: PDT+, critical for 0.9.2, have patch, r and sr; need a decision on what patch to use from drivers and people in bug → PDT+, critical for 0.9.2, have patch, r and sr; a=drivers for image only fix on 0.9.2 branch
Patch for images only checked into trunk. The remaining, more general, issue is
being dealt with both here and in bug 87298 - I'm going to open a new bug to
cover the URL problem specifically, and reference it in both bugs (this and 87298).
Bug 87894 opened to cover remaining, general issue with whitespace in URLs.
Marking this one FIXED
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Re-opening, can't see the spinning logo on https://webmail.jadedirect.com, or
that cute little baby in the test case
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Terri, did you use a BRANCH build? This is BRANCH only...
Sorry, my comment on  
   Additional Comments From Marc Attinasi 2001-06-26 14:47
was wrong, it should have said BRANCH, not trunk. This hacked fix is for the
BRANCH only.
Alrighty, FIXED ON BRANCH ;-)
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
Verified fixed w2k branch build 2001062803
Verified fixed linux branch build 2001062706
Verified fixed mac branch build 2001062809
Status: RESOLVED → VERIFIED
This has only been fixed on the 0.9.2 branch, so nothing regressed, the issue is
still there.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: