Last Comment Bug 328684 - Crash displaying multipart JPEG
: Crash displaying multipart JPEG
Status: RESOLVED FIXED
: crash, fixed1.8.0.4, fixed1.8.1
Product: Core
Classification: Components
Component: ImageLib (show other bugs)
: Trunk
: x86 Linux
: -- normal (vote)
: ---
Assigned To: Robert O'Callahan (:roc) (email my personal email if necessary)
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-02-26 17:55 PST by Robert O'Callahan (:roc) (email my personal email if necessary)
Modified: 2006-04-23 21:29 PDT (History)
6 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
fix? (1.84 KB, patch)
2006-02-26 17:58 PST, Robert O'Callahan (:roc) (email my personal email if necessary)
darin.moz: review+
darin.moz: superreview+
darin.moz: approval‑branch‑1.8.1+
dveditz: approval1.8.0.4+
Details | Diff | Splinter Review
fix? (1.84 KB, patch)
2006-02-26 17:59 PST, Robert O'Callahan (:roc) (email my personal email if necessary)
no flags Details | Diff | Splinter Review
nsHttp log (222.18 KB, text/plain)
2006-02-27 13:51 PST, Robert O'Callahan (:roc) (email my personal email if necessary)
no flags Details
fix (4.32 KB, patch)
2006-02-27 20:44 PST, Robert O'Callahan (:roc) (email my personal email if necessary)
roc: review-
roc: superreview-
Details | Diff | Splinter Review

Description Robert O'Callahan (:roc) (email my personal email if necessary) 2006-02-26 17:55:31 PST
I will attach a testcase that loads a JPEG which refreshes using multipart and causes a crash on Linux/GTK2. What happens is that the imgRequest sees an initial content type of image/jpeg, and decides that this means it's not multipart, so calls SetMutable(PR_FALSE) on the gfxImageFrame. Later it sees a multiper/x-mixed-replace content type, calls SetMutable(PR_TRUE), and starts mutating the data. This eventually causes a crash in GTK, apparently because we've thrown away some data structure.
Comment 1 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-02-26 17:58:57 PST
Created attachment 213271 [details] [diff] [review]
fix?

This patch fixes the bug for this testcase. I don't know if it's correct without studying this code a lot more, so I'd appreciate some advice. In particular, I'm not sure if it's a bug in GTK (should it be able to handle a SetMutable(PR_TRUE) after a SetMutable(PR_FALSE)?), imgRequest (is this patch the correct way to detect possible multipart-ness?) or necko (is the multipart channel returning the right initial content type?).
Comment 2 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-02-26 17:59:42 PST
Created attachment 213272 [details] [diff] [review]
fix?

This patch fixes the bug for this testcase. I don't know if it's correct without studying this code a lot more, so I'd appreciate some advice. In particular, I'm not sure if it's a bug in GTK (should it be able to handle a SetMutable(PR_TRUE) after a SetMutable(PR_FALSE)?), imgRequest (is this patch the correct way to detect possible multipart-ness?) or necko (is the multipart channel returning the right initial content type?).
Comment 3 Darin Fisher 2006-02-27 07:51:01 PST
Hmm.. the channel's content type is not supposed to change after it calls OnStartRequest.  Could you please upload a NSPR_LOG_MODULES=nsHttp:5 log?
Comment 4 Boris Zbarsky [:bz] (still a bit busy) 2006-02-27 08:39:33 PST
So is there a testcase?  I bet it's a multipart with another multipart in it, right?
Comment 5 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-02-27 13:35:08 PST
You should be able to reproduce with this image:

http://www.airport-nuernberg.de/_/tools/webcam.html?_FRAME=64&refresh=0&datei=webcam-0-0.jpg

The original bug report is here:
https://bugzilla.novell.com/show_bug.cgi?id=140416
Comment 6 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-02-27 13:51:42 PST
Created attachment 213373 [details]
nsHttp log

Here's the log. I actually made it with a 1.8 branch build, because the crash is much harder to reproduce on trunk (but I'm quite sure that I have reproduced it there).
Comment 7 Darin Fisher 2006-02-27 13:57:40 PST
NOTE: The original channel is redirected.  The redirected channel has a content-type of text/html.  Maybe image lib is reading the content type from the redirected channel instead of the newly created channel?
Comment 8 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-02-27 14:07:43 PST
The reason I was having trouble reproducing the bug on trunk is because I was running my patched build :-)

(In reply to comment #7)
> NOTE: The original channel is redirected.  The redirected channel has a
> content-type of text/html.  Maybe image lib is reading the content type from
> the redirected channel instead of the newly created channel?

imgRequest is getting the type from mChannel, which is the base channel for the multipart. http://lxr.mozilla.org/mozilla/source/modules/libpr0n/src/imgRequest.cpp#620
Maybe that's set wrong further up the stack, but I can't see it from reading the code.

If we agree that it's a necko issue I'll debug further to try to figure out where the initial content type is coming from.
Comment 9 Boris Zbarsky [:bz] (still a bit busy) 2006-02-27 14:13:31 PST
> imgRequest is getting the type from mChannel

Which is set when we start the load.  So comment 7 is right on target.  Sounds to me like image loader wants to observe redirects, no?

That said, making this not depend on the type like that is also a good idea, I suspect.
Comment 10 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-02-27 14:56:00 PST
I see. I think that means my patch is correct.
Comment 11 Darin Fisher 2006-02-27 15:13:41 PST
Comment on attachment 213271 [details] [diff] [review]
fix?

Two possible correct fixes:

1) Observe redirects, and update mChannel accordingly.

2) Use aRequest instead of mChannel to read the content type.

NOTE: #1 probably needs to be done to avoid another bug.  imgRequest::Cancel cancels mChannel, but it does not cancel any channels that result from redirects.  Bad libpr0n!
Comment 12 Darin Fisher 2006-02-27 15:15:49 PST
The image loader is so in need of an overhaul.  This is just one of many related bugs.  It does not use necko correctly.
Comment 13 Stuart Parmenter 2006-02-27 17:07:16 PST
indeed. if anyone has lots of spare time, i can make some good suggestions on how it should be overhauled :)
Comment 14 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-02-27 20:44:46 PST
Created attachment 213411 [details] [diff] [review]
fix

How about this? It seems to work.
Comment 15 Darin Fisher 2006-02-27 21:32:55 PST
I don't understand how that works.  Are you sure that OnChannelRedirect is being called?  The channel's notification callbacks are set to point at the load group's notification callbacks.  I don't see how the channel would access the imgRequest as a nsIChannelEventSink.
Comment 16 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-02-27 21:41:50 PST
No, I'm not sure. I guess I need to study this code some more.
Comment 17 Darin Fisher 2006-02-28 10:34:50 PST
roc: I think you need to set some image lib object as the notification callbacks for the channel before calling AsyncOpen (see the call to NS_NewChannel).  Notification callbacks is a nsIInterfaceRequestor.  That object will be asked for a nsIChannelEventSink when a redirect occurs.  That gives you the opportunity to observe the redirect.  That is a bit of a complex solution.  Another solution is as follows:

Construct a load group for the channel.  The channel will then insert any channels that it creates via redirect into the load group.  When imgRequest::Cancel is called, cancel the load group instead of mChannel.  That will take care of canceling all channels.  In fact, it is probably not necessary to keep mChannel (and that might be a good thing to do to keep people from being tempted into using it).  Then, in imgRequest::OnStartRequest, QI aRequest to nsIChannel to read the content type.

I think this second solution might be better.
Comment 18 Darin Fisher 2006-02-28 10:40:41 PST
nsLoadGroup implements nsISupportsPriority, so it'd be easy to replace those calls on mChannel with similar calls on "mLoadGroup."  Perhaps a delicate issue is with the calls to mChannel->GetOriginalURI.
Comment 19 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-02-28 13:29:16 PST
darin, what do you think of taking https://bugzilla.mozilla.org/attachment.cgi?id=213271 as a branch fix (and on trunk)? It seems to be both correct and a simplification, although it doesn't solve all the problems.

I'm going to try this loadgroup approach for trunk, but I'm afraid that it's going to be a significant risk for branch.

One question: if I set the loadgroup for the channel, won't that cause problems if it already belongs to a loadgroup?
Comment 20 Darin Fisher 2006-02-28 14:41:54 PST
> darin, what do you think of taking
> https://bugzilla.mozilla.org/attachment.cgi?id=213271 as a branch fix (and on
> trunk)? It seems to be both correct and a simplification, although it doesn't
> solve all the problems.

Let's do this as a last resort... let's try for the trunk patch first?


> I'm going to try this loadgroup approach for trunk, but I'm afraid that it's
> going to be a significant risk for branch.

Well, it occured to me that I can basically implement <a ping> today using "new Image" combined with this bug.  So, from the point of view of ensuring that all content loads stop when a page transition occurs, we should probably fix the "cancelation" bug for 2.0 as well.


> One question: if I set the loadgroup for the channel, won't that cause 
> problems if it already belongs to a loadgroup?

Image lib currently does not put channels into a loadgroup!  *gasp*
Comment 21 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-02-28 15:11:10 PST
> > darin, what do you think of taking
> > https://bugzilla.mozilla.org/attachment.cgi?id=213271 as a branch fix (and
> > on trunk)? It seems to be both correct and a simplification, although it
> > doesn't solve all the problems.
> 
> Let's do this as a last resort... let's try for the trunk patch first?

Apart from it not fixing all problems, what's wrong with it? It seems cleaner to do a QI than to check the MIME type.

(In reply to comment #20)
> > One question: if I set the loadgroup for the channel, won't that cause 
> > problems if it already belongs to a loadgroup?
> 
> Image lib currently does not put channels into a loadgroup!  *gasp*

Yeah, so why doesn't it?
http://lxr.mozilla.org/mozilla/source/modules/libpr0n/src/imgLoader.cpp#207
You went to all the trouble of passing a load group in there, but decided not to use it?

But even though no-one currently puts image load channels in a loadgroup, is it really right to rely on that for the future? Shouldn't we be fixing NewImageChannel to use the loadgroup?
Comment 22 Darin Fisher 2006-02-28 15:17:18 PST
> Apart from it not fixing all problems, what's wrong with it? It seems cleaner
> to do a QI than to check the MIME type.

Agreed.  It's important that aRequest be QI'd instead of mChannel, which is what you are doing.  So, OK... yeah, let's go for it, and then spin off the other issue to another bug (or whatever).


> > > One question: if I set the loadgroup for the channel, won't that cause 
> > > problems if it already belongs to a loadgroup?
> > 
> > Image lib currently does not put channels into a loadgroup!  *gasp*
> 
> Yeah, so why doesn't it?
> http://lxr.mozilla.org/mozilla/source/modules/libpr0n/src/imgLoader.cpp#207
> You went to all the trouble of passing a load group in there, but decided not
> to use it?
> 
> But even though no-one currently puts image load channels in a loadgroup, is 
> it really right to rely on that for the future? Shouldn't we be fixing
> NewImageChannel to use the loadgroup?

So, we have these imgRequestProxy things that are put into the loadgroup of the corresponding docshell.  Since we may have multiple requests for the same URL, we chain them all together on a single imgRequest that actually talks to necko.  Given that there is not a 1-1 relationship between imgRequestProxy and necko channel, we cannot put the necko channel in the docshell's loadgroup.  If we did, then stop a load in one page would stop the same image load that may be happening in another page.  By the way, I'm not defending this architecture -- only explaining it ;-)  The fact that we do this across documents is very broken in my opinion.
Comment 23 Darin Fisher 2006-02-28 15:18:24 PST
Comment on attachment 213271 [details] [diff] [review]
fix?

r+sr=darin for this patch.  but please file a bug on the cancelation issue.  you can assign it to me.
Comment 24 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-02-28 15:41:41 PST
Filed bug 328903. Thanks!
Comment 25 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-02-28 15:43:47 PST
checked in on trunk
Comment 26 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-02-28 15:44:37 PST
Comment on attachment 213271 [details] [diff] [review]
fix?

I think we should take this on branches too, since it fixes a crasher and appears to be low risk.
Comment 27 Robert O'Callahan (:roc) (email my personal email if necessary) 2006-03-01 16:51:22 PST
Checked into 1.8.1
Comment 28 Wolfgang Rosenauer [:wolfiR] 2006-03-26 20:20:34 PST
Comment on attachment 213271 [details] [diff] [review]
fix?

approval for 1.8.0.3?
Comment 29 Tim Riley [:timr] 2006-03-27 07:15:58 PST
This is a crasher.  It's been baking ont he trunk since 2/28/06.  My pre-vote for  the 1.5.0.3 Release Team is to approve.

Just to claify, we want attachment 213271 [details] [diff] [review] ("fix?") for 1.8.0.3 even though attachement 213411 ("fix") was created more recently and does not have a "?" in the attachement name.
Comment 30 Daniel Veditz [:dveditz] 2006-04-21 13:38:18 PDT
Comment on attachment 213271 [details] [diff] [review]
fix?

approved for 1.8.0 branch, a=dveditz for drivers
Comment 31 Wolfgang Rosenauer [:wolfiR] 2006-04-23 21:29:07 PDT
Checking in modules/libpr0n/src/imgRequest.cpp;
/cvsroot/mozilla/modules/libpr0n/src/imgRequest.cpp,v  <--  imgRequest.cpp
new revision: 1.84.8.1.2.1; previous revision: 1.84.8.1
done

Note You need to log in before you can comment on or make changes to this bug.