Closed Bug 744309 (Opened 12 years ago, Closed 12 years ago)

Random 100% CPU usage, due to image decoding

Categories

(Core :: Graphics: ImageLib, defect)

Hardware: x86
OS: All
Type: defect
Priority: Not set
Severity: major

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla18

Tracking Status:
firefox14 + fixed
firefox16 --- wontfix
firefox17 + fixed

People

(Reporter: bzbarsky, Assigned: justin.lebar+bug)

Details

(Whiteboard: [qa-])

Attachments

(1 file)

I know this is a crappy bug report, so please bear with me.

For the last week or two, my nightly (shark branch) builds have every so often started chewing up 100% of a CPU core; once it starts, it keeps happening until I restart the build.  Today I finally managed to get a profile of it when it happened.  The profile showed about 50% of the time under mozilla::image::imgDecodeRequestor::Run.  The rest was handling all the Mac event loop gunk to deal with the resulting paint events from the invalidates.

The image(s) in question seem to be GIFs based on the profiler stacks.

While this is happening, memory usage slowly grows (where by "slowly" I mean about 0.5MB/s or so).

I have the profiles still open in case there are questions I can answer.

Nominating for tracking, because whatever the heck is going on we _really_ don't want to ship this.  :(
This rings a bell with bug 583069. I don't think that explains why we're thrashing, though.

This could happen if mInDecoder somehow got stuck at true, since then each call into RequestDecode would do an async dispatch, forever. But it's not clear to me how that could happen, modulo memory corruption...
> This could happen if mInDecoder somehow got stuck to true, since then each call into RequestDecode 
> would do an async dispatch, forever.

But then we wouldn't decode anything, right?  We'd just spin and see that there's nothing to do?

This could also happen if we repeatedly try to draw an image which isn't locked.  The image isn't locked, so we decode it, then we toss it (very quickly these days, under some circumstances).

Next time this happens, you could try making image.mem.max_decoded_image_kb large.  If that immediately stops the loop, then we're thrashing there, somehow.
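
For illustration, here's a tiny standalone program sketching the decode/discard thrash described above.  It is not Gecko code; the class is only an analogy to the real discard tracker, and the cap plays the role of image.mem.max_decoded_image_kb.

    // Standalone illustration (not Gecko code): a decoded-image "cache" whose byte
    // cap is smaller than the image being drawn, so every paint decodes the image
    // and the cache immediately discards it again -- i.e. CPU is spent on decodes.
    #include <cstddef>
    #include <cstdio>

    struct Image {
      bool decoded = false;
      std::size_t decodedBytes = 512 * 1024;   // pretend the decoded frames are 512 KB
    };

    struct DiscardTracker {                    // analogy only, not imagelib's tracker
      std::size_t maxBytes;                    // plays the role of max_decoded_image_kb
      std::size_t usedBytes;
      void Add(const Image& img) { usedBytes += img.decodedBytes; }
      void Enforce(Image& img) {
        if (usedBytes > maxBytes) {            // over the cap: toss the decoded frames
          usedBytes -= img.decodedBytes;
          img.decoded = false;
        }
      }
    };

    int main() {
      Image img;
      DiscardTracker tracker{256 * 1024, 0};   // cap is smaller than one decoded image
      int decodes = 0;
      for (int paint = 0; paint < 5; ++paint) {  // five paints of the same image
        if (!img.decoded) {                      // frames were discarded: decode again
          img.decoded = true;
          tracker.Add(img);
          ++decodes;
        }
        tracker.Enforce(img);                    // immediately discarded again
      }
      std::printf("5 paints caused %d decodes\n", decodes);   // prints 5
    }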
bz: Is this _only_ happening on trunk (shark branch)? Or have you noticed it in a release/beta/aurora?
I don't really browse with release/beta/aurora, so I have no idea whether it's happening there....

Justin, the next time it happens (likely to be a day or two), I'll try what you suggest.
Happened again.  I tried bumping that pref up by 100x.  No effect.  :(
This has been happening to me pretty reliably, averaging about once every 36 hours or so.  :(
I'm not sure what we should be doing here.  If you can catch it in a debugger and see what's causing the decodes, that would help, I guess.

It sure sounds like we're decoding and then immediately discarding an image.  If it's not because of image.mem.max_decoded_image_kb, I don't know what it would be due to.  You've been browsing with max_decoded_image_kb set high and still see this problem (as opposed to increasing the pref only while you're in the infinite loop)?
I can do that if I try browsing in a debug build, but that's been an exercise in pain when I've tried it recently...  The nightlies we ship, including the shark ones, don't seem to have usable debugging symbols that gdb can use.  If such symbols do exist, _please_ tell me!

> You've been browsing with max_decoded_image_kb set high

No.  I can try that if you want.  Let me know.
> You've been browsing with max_decoded_image_kb set high
>
> No.  I can try that if you want.  Let me know.

I can't speak to the debug symbols (our shark builds really *should* ship with symbols!), but yes, if we can just increase this max_decoded_image_kb pref, we should do so.  (See also bug 746055 comment 9.)
> our shark builds really *should* ship with symbols

They have function names, but nothing else.
OK.  Bumped the pref to 2000000 and restarted.  We'll see how it goes.
It's possible that we're touching the discard logic off main thread on Mac (we know we're doing it on Windows).  If so, the discard tracker's linked list could become corrupted, or our count of how much memory is devoted to allocations could go negative.  Either way, that would be Bad.  See bug 745662.
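
(For reference, the kind of main-thread guard being talked about would look roughly like the sketch below.  The method shown is a stand-in, not the real discard tracker interface; only MOZ_ASSERT and NS_IsMainThread are actual Gecko helpers.)

    // Hypothetical sketch: make off-main-thread use of the discard tracker fail
    // loudly.  The method name and body are assumed for illustration.
    void DiscardTracker::Remove(Node* aNode)
    {
      MOZ_ASSERT(NS_IsMainThread(),
                 "discard tracker's linked list is not thread-safe");
      // ... unlink aNode and adjust the decoded-image byte count ...
    }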
For what it's worth, since comment 11 it's been 6 days and the problem hasn't come up yet....
(In reply to Justin Lebar [:jlebar] from comment #12)
> It's possible that we're touching the discard logic off main thread on Mac
> (we know we're doing it on Windows).  If so, the discard tracker's linked
> list could become corrupted, or our count of how much memory is devoted to
> allocations could go negative.  Either way, that would be Bad.  See bug
> 745662.

Maybe we should reland that assert after fixing the peek thing?
(In reply to Justin Lebar [:jlebar] from comment #9)
> I can't speak to the debug symbols (our shark builds really *should* ship
> with symbols!), but yes, if we can just increase this max_decoded_image_kb
> pref, we should do so.  (See also bug 746055 comment 9.)

Given Comment 13, can we move forward with this fix?
Assignee: nobody → justin.lebar+bug
> Given Comment 13, can we move forward with this fix?

I bumped the pref to 250MB in bug 746055.

That's a wallpaper fix, though, so I don't think we should close this bug.  I presume we can still get into this state -- whatever it is, somehow.

We could bump the pref higher, but it's serving a purpose (preventing OOMs), so I'm not sure we want to make it infinity.

My plan is to revisit this after we fix bug 745141.  Once that bug is fixed, we can add release-time main-thread assertions to the discard logic, so we can ensure that we're not hitting another race condition there.  With that ruled out, I'll be able to dig in here, I hope.

I've also been browsing with the pref set very low, in the hopes of triggering this bug, but so far, no dice.
With this bug wallpapered over, I'm OK no longer tracking for FF14, unless we hear more reports of this happening to people.
That seems totally reasonable to me.
Justin, if you believe this is unaffected for 14, can you mark this bug as such?
I think this is fixed for FF14 by the patch for Aurora in bug 745141, yes.

The story for FF15 is cloudier; we're basically hoping that the 250MB cap is big enough not to trigger this problem.  Unfortunately I don't have room on my plate to dig in deeper anytime soon; b2g is eating me alive.  We could just disable the max-image cap everywhere.  Or we could hope that 250MB is good enough.
Has anyone seen this happen lately?  It's really a bad bug, and I'm afraid I only papered over it...
I haven't seen this recur, no...
I can sometimes see this on Windows with image.mem.max_decoded_image_kb set to 1, but I don't have a definitive way of reproducing beyond browsing around and hoping it happens.

I got a profile of it happening at [1]. It looks like roughly half the time is spent under either imgDecodeRequestor::Run or DiscardRunnable::Run.

[1] https://people.mozilla.com/~bgirard/cleopatra/?report=8d36a4b437652a22391d4c2ca40f745d34274b2f
I just saw this happening on my Linux box (where I have image.mem.max_decoded_image_kb set to very low, at 256, in an attempt to trigger this bug).

24% of my time is spent in nsGIFDecoder2::DoLzw.  Here's the stack (although it's kind of obvious):

            --- mozilla::image::nsGIFDecoder2::DoLzw(unsigned char const*)
               |          
               |--99.85%-- mozilla::image::nsGIFDecoder2::WriteInternal(char const*, unsigned int)
               |          mozilla::image::RasterImage::WriteToDecoder(char const*, unsigned int)
               |          mozilla::image::RasterImage::DecodeWorker::DecodeSomeOfImage(mozilla::image::RasterImage*, mozilla::image::RasterImage::DecodeWorker::DecodeType)
               |          mozilla::image::RasterImage::DecodeWorker::DecodeABitOf(mozilla::image::RasterImage*)
               |          mozilla::image::RasterImage::RequestDecode()
               |          mozilla::image::imgDecodeRequestor::Run()
               |          nsThread::ProcessNextEvent(bool, bool*)
               |          NS_ProcessNextEvent_P(nsIThread*, bool)
               |          mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)
               |          MessageLoop::Run()
               |          nsBaseAppShell::Run()
               |          nsAppStartup::Run()
               |          XREMain::XRE_mainRun()
               |          XREMain::XRE_main(int, char**, nsXREAppData const*)
               |          XRE_main
               |          main
Summary: Random 100% CPU usage on Mac, apparently due to image decoding → Random 100% CPU usage, due to image decoding
OS: Mac OS X → All
I caught this in gdb.  It looks like we're calling RequestDecode on the same image repeatedly.  On my machine, I actually have three or four images which are all getting RequestDecode() called on them repeatedly.

In GDB, I see four calls in a row to RequestDecode() for the same image.  Then we move on to the next image.  Here are backtraces for one set of four calls:

== First hit ==

> Breakpoint 1, mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> 2416	  if (mError)
> (gdb) bt
> #0  mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> #1  0x00007f2c93321db5 in mozilla::image::imgDecodeRequestor::Run() () from /home/jlebar/code/moz/nightly/libxul.so
> #2  0x00007f2c93ec841c in nsThread::ProcessNextEvent(bool, bool*) () from /home/jlebar/code/moz/nightly/libxul.so
> #3  0x00007f2c93e92b3c in NS_ProcessNextEvent_P(nsIThread*, bool) () from /home/jlebar/code/moz/nightly/libxul.so
> #4  0x00007f2c93b751db in nsXULWindow::ShowModal() () from /home/jlebar/code/moz/nightly/libxul.so
> #5  0x00007f2c93b51f7a in nsWindowWatcher::OpenWindowInternal(nsIDOMWindow*, char const*, char const*, char const*, bool, bool, bool, nsIArray*, nsIDOMWindow**) () from /home/jlebar/code/moz/nightly/libxul.so
> 
Looks like we're in a nested event loop; I don't know if that's relevant.

This call does InitDecoder(/* aDoSizeDecode = */ false), then returns.

== Second hit ==

> Breakpoint 1, mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> 2416	  if (mError)
> (gdb) bt
> #0  mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> #1  0x00007f2c9331d1d1 in WantDecodedFrames (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2406
> #2  mozilla::image::RasterImage::GetImgFrame (this=0x7f2c5be98b60, framenum=0)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:590
> #3  0x00007f2c933200a8 in mozilla::image::RasterImage::SetFrameDisposalMethod(unsigned int, int) ()
>    from /home/jlebar/code/moz/nightly/libxul.so
> #4  0x00007f2c93337c6c in mozilla::image::nsGIFDecoder2::BeginImageFrame (this=0x7f2c2e563000, 
>     aDepth=<optimized out>) at /builds/slave/m-cen-lnx64-ntly/build/image/decoders/nsGIFDecoder2.cpp:202
> #5  0x00007f2c93338648 in mozilla::image::nsGIFDecoder2::WriteInternal (this=0x7f2c2e563000, 
>     aBuffer=<optimized out>, aCount=<optimized out>)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/decoders/nsGIFDecoder2.cpp:922
> #6  0x00007f2c9331f2bb in mozilla::image::RasterImage::WriteToDecoder(char const*, unsigned int) ()
>    from /home/jlebar/code/moz/nightly/libxul.so
> #7  0x00007f2c9331f58b in mozilla::image::RasterImage::DecodeWorker::DecodeSomeOfImage(mozilla::image::RasterImage*, mozilla::image::RasterImage::DecodeWorker::DecodeType) () from /home/jlebar/code/moz/nightly/libxul.so
> #8  0x00007f2c9331d860 in mozilla::image::RasterImage::DecodeWorker::DecodeABitOf (this=0x7f2c73ebfa10, 
>     aImg=0x7f2c5be98b60) at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2904
> #9  0x00007f2c9331d020 in mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2467
> #10 0x00007f2c93321db5 in mozilla::image::imgDecodeRequestor::Run() () from /home/jlebar/code/moz/nightly/libxul.so
> #11 0x00007f2c93ec841c in nsThread::ProcessNextEvent(bool, bool*) () from /home/jlebar/code/moz/nightly/libxul.so
> #12 0x00007f2c93e92b3c in NS_ProcessNextEvent_P(nsIThread*, bool) () from /home/jlebar/code/moz/nightly/libxul.so
> #13 0x00007f2c93b751db in nsXULWindow::ShowModal() () from /home/jlebar/code/moz/nightly/libxul.so
> #14 0x00007f2c93b51f7a in nsWindowWatcher::OpenWindowInternal(nsIDOMWindow*, char const*, char const*, char const*, bool, bool, bool, nsIArray*, nsIDOMWindow**) () from /home/jlebar/code/moz/nightly/libxul.so
> #15 0x00007f2c93b4e6c2 in nsWindowWatcher::OpenWindow(nsIDOMWindow*, char const*, char const*, char const*, nsISupports*, nsIDOMWindow**) () from /home/jlebar/code/moz/nightly/libxul.so

This call does 

    if (mDecoder && !mDecoder->IsSizeDecode())

sees that the condition is true, and returns.

== Third hit ==

> Breakpoint 1, mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> 2416	  if (mError)
> (gdb) bt
> #0  mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> #1  0x00007f2c9331d1d1 in WantDecodedFrames (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2406
> #2  mozilla::image::RasterImage::GetImgFrame (this=0x7f2c5be98b60, framenum=0)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:590
> #3  0x00007f2c93320058 in mozilla::image::RasterImage::SetFrameTimeout(unsigned int, int) ()
>    from /home/jlebar/code/moz/nightly/libxul.so
> #4  0x00007f2c93337db6 in mozilla::image::nsGIFDecoder2::EndImageFrame (this=0x7f2c2e563000)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/decoders/nsGIFDecoder2.cpp:262
> #5  0x00007f2c93338a61 in mozilla::image::nsGIFDecoder2::WriteInternal (this=0x7f2c2e563000, 
>     aBuffer=<optimized out>, aCount=<optimized out>)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/decoders/nsGIFDecoder2.cpp:1017
> #6  0x00007f2c9331f2bb in mozilla::image::RasterImage::WriteToDecoder(char const*, unsigned int) ()
>    from /home/jlebar/code/moz/nightly/libxul.so
> #7  0x00007f2c9331f58b in mozilla::image::RasterImage::DecodeWorker::DecodeSomeOfImage(mozilla::image::RasterImage*, mozilla::image::RasterImage::DecodeWorker::DecodeType) () from /home/jlebar/code/moz/nightly/libxul.so
> #8  0x00007f2c9331d860 in mozilla::image::RasterImage::DecodeWorker::DecodeABitOf (this=0x7f2c73ebfa10, 
>     aImg=0x7f2c5be98b60) at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2904
> #9  0x00007f2c9331d020 in mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2467
> #10 0x00007f2c93321db5 in mozilla::image::imgDecodeRequestor::Run() () from /home/jlebar/code/moz/nightly/libxul.so
> #11 0x00007f2c93ec841c in nsThread::ProcessNextEvent(bool, bool*) () from /home/jlebar/code/moz/nightly/libxul.so
> #12 0x00007f2c93e92b3c in NS_ProcessNextEvent_P(nsIThread*, bool) () from /home/jlebar/code/moz/nightly/libxul.so
> #13 0x00007f2c93b751db in nsXULWindow::ShowModal() () from /home/jlebar/code/moz/nightly/libxul.so
> #14 0x00007f2c93b51f7a in nsWindowWatcher::OpenWindowInternal(nsIDOMWindow*, char const*, char const*, char const*, bool, bool, bool, nsIArray*, nsIDOMWindow**) () from /home/jlebar/code/moz/nightly/libxul.so
> #15 0x00007f2c93b4e6c2 in nsWindowWatcher::OpenWindow(nsIDOMWindow*, char const*, char const*, char const*, nsISupports*, nsIDOMWindow**) () from /home/jlebar/code/moz/nightly/libxul.so

Just like the second hit, this call does 

    if (mDecoder && !mDecoder->IsSizeDecode())

sees that the condition is true, and returns.

== Fourth hit ==

> Breakpoint 1, mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> 2416	  if (mError)
> (gdb) bt
> #0  mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> #1  0x00007f2c9331d1d1 in WantDecodedFrames (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2406
> #2  mozilla::image::RasterImage::GetImgFrame (this=0x7f2c5be98b60, framenum=0)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:590
> #3  0x00007f2c9331ffb8 in mozilla::image::RasterImage::SetFrameAsNonPremult(unsigned int, bool) ()
>    from /home/jlebar/code/moz/nightly/libxul.so
> #4  0x00007f2c9331c29e in mozilla::image::Decoder::PostDecodeDone (this=0x7f2c2e563000)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/Decoder.cpp:275
> #5  0x00007f2c93337e8b in mozilla::image::nsGIFDecoder2::FinishInternal (this=0x7f2c2e563000)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/decoders/nsGIFDecoder2.cpp:115
> #6  0x00007f2c9331c353 in mozilla::image::Decoder::Finish (this=0x7f2c2e563000)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/Decoder.cpp:88
> #7  0x00007f2c9331f1d7 in mozilla::image::RasterImage::ShutdownDecoder(mozilla::image::RasterImage::eShutdownIntent) () from /home/jlebar/code/moz/nightly/libxul.so
> #8  0x00007f2c9331f66e in mozilla::image::RasterImage::DecodeWorker::DecodeSomeOfImage(mozilla::image::RasterImage*, mozilla::image::RasterImage::DecodeWorker::DecodeType) () from /home/jlebar/code/moz/nightly/libxul.so
> #9  0x00007f2c9331d860 in mozilla::image::RasterImage::DecodeWorker::DecodeABitOf (this=0x7f2c73ebfa10, 
>     aImg=0x7f2c5be98b60) at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2904
> #10 0x00007f2c9331d020 in mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2467
> #11 0x00007f2c93321db5 in mozilla::image::imgDecodeRequestor::Run() () from /home/jlebar/code/moz/nightly/libxul.so
> #12 0x00007f2c93ec841c in nsThread::ProcessNextEvent(bool, bool*) () from /home/jlebar/code/moz/nightly/libxul.so
> #13 0x00007f2c93e92b3c in NS_ProcessNextEvent_P(nsIThread*, bool) () from /home/jlebar/code/moz/nightly/libxul.so
> #14 0x00007f2c93b751db in nsXULWindow::ShowModal() () from /home/jlebar/code/moz/nightly/libxul.so
> #15 0x00007f2c93b51f7a in nsWindowWatcher::OpenWindowInternal(nsIDOMWindow*, char const*, char const*, char const*, bool, bool, bool, nsIArray*, nsIDOMWindow**) () from /home/jlebar/code/moz/nightly/libxul.so

This call does |if (mInDecoder)|, sees that the condition is true, and then does

  nsRefPtr<imgDecodeRequestor> requestor = new imgDecodeRequestor(this);

and re-enqueues itself into the event loop!
I may understand what's going on here.

> // Simplified a bit
> RasterImage::RequestDecode()
> {
>   if (mError || mDecoded || !StoringSourceData() || (mDecoder && !mDecoder->IsSizeDecode()))
>     return;
> 
>   if (mInDecoder) {
>     nsRefPtr<imgDecodeRequestor> requestor = new imgDecodeRequestor(this);
>     return NS_DispatchToCurrentThread(requestor);
>     // asynchronously calls this->RequestDecode();
>   }

What I think is happening is, after Decoder::Finish() finishes, we're setting
neither mError nor mDecoded.

Therefore, when the imgDecodeRequestor runs, it starts a new decode.  This
process repeats ad infinitum.

Of course, we shouldn't even be spawning this imgDecodeRequestor in the first
place.  But I don't think that's the main bug.
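
Spelled out as a condensed trace, the loop I'm hypothesizing looks like this (pieced together from the backtraces above; illustrative, not logged output):

    // Condensed trace of the hypothesized loop, pieced together from the
    // backtraces above (illustrative; not actual output):
    //
    //   imgDecodeRequestor::Run()
    //     -> RasterImage::RequestDecode()          // mError/mDecoded still false
    //       -> DecodeWorker::DecodeABitOf()
    //         -> ... -> ShutdownDecoder() -> decoder->Finish()
    //           -> nsGIFDecoder2::FinishInternal() -> PostDecodeDone()
    //             -> ... -> RequestDecode()        // mInDecoder is true here
    //               -> dispatch a new imgDecodeRequestor, and around we go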
> we're setting neither mError nor mDecoded.

Although that's pretty odd, because Decoder::PostDecodeDone (in the last stack trace in comment 25) should call RasterImage::DecodingComplete, which will ensure that either mError or mDecoded is set.

Unfortunately GDB seems to have died on me, so the above stacks are probably all we're going to get from this session.
PostDecodeDone calls DecodingComplete _after_ calling SetFrameAsNonPremult, which calls GetImgFrame, which calls WantDecodedFrames, which calls RequestDecode, which actually re-enqueues the decode request.
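
Roughly, the shape of that sequence (simplified from the stack; the receiver and argument names are placeholders, not the real source):

    // Simplified sketch of PostDecodeDone(), per the call chain above; names of
    // the receiver and arguments are placeholders.
    void Decoder::PostDecodeDone()
    {
      // ...
      image->SetFrameAsNonPremult(frameNum, isNonPremult);
        // -> GetImgFrame() -> WantDecodedFrames() -> RequestDecode(),
        //    which re-enqueues an imgDecodeRequestor because mInDecoder is true
      image->DecodingComplete();
        // mDecoded/mError only get set here, *after* the re-enqueue above
    }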

However, that shouldn't have any effect if the image is actually already decoded. AFAICT the only way this should not be a no-op is if a request for the image to be discarded or for the image in a format (nonpremult vs premult, etc) that it isn't already in.

I'd be very interested to see where/whether there is a call to RasterImage::Discard for this image after the 4th RequestDecode call; that is, if you can ever reproduce this in a debugger again. :)
> or for the image in a format (nonpremult vs premult, etc) that it isn't already in.

I think you a word here, but I'm not quite sure what it is.

> I'd be very interested to see where/whether there is a call to RasterImage::Discard for this image 
> after the 4th RequestDecode call; that is, if you can ever reproduce this in a debugger again. :)

It would make sense that discarding has something to do with this, since it seems to be easier to reproduce this issue after decreasing the max-decoded-image-bytes threshold.

I can make the imgDecodeRequestor not get enqueued when we're in the process of finishing a decode.  And I can modify the discard heuristics to be flexible with the max-decoded-image-bytes limit, although I'm afraid that could only paper over this problem.  (Instead of decoding in a loop, we could end up decoding once per second, which would still be awful for battery life on B2G!)

Any other ideas for how to fix this, assuming I don't get another shot at this in gdb?
Attached patch Patch, v1 (Splinter Review)
I'm not at all pleased with this.
Attachment #659932 - Flags: review?(joe)
Comment on attachment 659932 [details] [diff] [review]
Patch, v1

o god o god

You don't actually need this extra flag - (mDecoder && mDecoder->GetDecodeDone()) should be enough, I think.

I can't quite convince myself that this is the fundamentally wrong way to solve this bug, sadly. I might even r+ a fix that used the above :(
Attachment #659932 - Flags: review?(joe) → review-
(In reply to Justin Lebar [:jlebar] from comment #29)
> > or for the image in a format (nonpremult vs premult, etc) that it isn't already in.
> 
> I think you a word here, but I'm not quite sure what it is.

Slightly rewritten:

"Ask for the image in a format different from the one it's stored in."

As for reproducing this in gdb - maybe I can start browsing with the set of values you've tried, since they clearly work!
> You don't actually need this extra flag - (mDecoder && mDecoder->GetDecodeDone()) should be enough, 
> I think.

We set mDecoder to null before calling decoder->Finish(), in ShutdownDecoder.

> As for reproducing this in gdb - maybe I can start browsing with the set of values you've tried, 
> since they clearly work!

They may work, but it had been more than a month since I last saw this bug.
Comment on attachment 659932 [details] [diff] [review]
Patch, v1

Back to r? because I don't think comment 31 works.
Attachment #659932 - Flags: review- → review?(joe)
(In reply to Justin Lebar [:jlebar] from comment #33)
> > You don't actually need this extra flag - (mDecoder && mDecoder->GetDecodeDone()) should be enough, 
> > I think.
> 
> We set mDecoder to null before calling decoder->Finish(), in ShutdownDecoder.

I don't understand why this would affect things. Surely we would check just after IsSizeDecode(), and hence ShutdownDecoder won't have been called?
Maybe I don't understand comment 31 correctly.  To be really concrete about it:

> You don't actually need this extra flag - (mDecoder &&
> mDecoder->GetDecodeDone()) should be enough, I think.

I thought/think you were/are suggesting that I should replace 

> @@ -2423,16 +2426,22 @@ RasterImage::RequestDecode()
>    if (mDecoder && !mDecoder->IsSizeDecode())
>      return NS_OK;
> +  if (mFinishing)
> +    return NS_OK;

with

> @@ -2423,16 +2426,22 @@ RasterImage::RequestDecode()
>    if (mDecoder && !mDecoder->IsSizeDecode())
>      return NS_OK;
> +  if (mDecoder && mDecoder->GetDecodeDone())
> +    return NS_OK;

What I was saying is that the RequestDecode() call we want to be affected by this early return happens during the call to decoder->Finish() inside ShutdownDecoder:

> RasterImage::ShutdownDecoder() {
>   [...]
>   nsRefPtr<Decoder> decoder = mDecoder;
>   mDecoder = nullptr;
> 
>   mInDecoder = true;
>   decoder->Finish();
>   mInDecoder = false;
>   [...]
> }

But notice that before we call decoder->Finish(), we've already set mDecoder = nullptr.  So |mDecoder && mDecoder->GetDecodeDone()| will not cause the desired early return.
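
To make it concrete, here's the same ShutdownDecoder sketch with the new flag pencilled in; where exactly mFinishing gets set is my assumption for illustration, not copied from the patch:

    // Sketch only: where mFinishing would be toggled so the early return in
    // RequestDecode() covers the re-entrant call made from decoder->Finish().
    // The placement is assumed, not taken verbatim from the patch.
    RasterImage::ShutdownDecoder() {
      [...]
      nsRefPtr<Decoder> decoder = mDecoder;
      mDecoder = nullptr;

      mFinishing = true;    // assumed: set before Finish() can re-enter RequestDecode()
      mInDecoder = true;
      decoder->Finish();    // may call RequestDecode(); it now returns early on mFinishing
      mInDecoder = false;
      mFinishing = false;
      [...]
    }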
Oh, I missed that in the callstack.

..

this sucks.
Comment on attachment 659932 [details] [diff] [review]
Patch, v1

i hope you're happy ahhhhh
Attachment #659932 - Flags: review?(joe) → review+
> i hope you're happy ahhhhh

I am not at all happy, but at least we may finally have fixed this bug.

I may or may not have had a try run for this, but try is currently very backed up, so ima push to inbound.

https://hg.mozilla.org/integration/mozilla-inbound/rev/62a48b2712fb
Comment on attachment 659932 [details] [diff] [review]
Patch, v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): maybe bug 732820?

User impact if declined: Occasional 100% CPU usage by Firefox until browser is restarted.  Browser remains relatively responsive throughout, in my tests.

Testing completed (on m-c, etc.): We need to let this bake on m-c, of course.  In fact, it may even get backed out.  But I didn't want to forget to come back in a few days and request approval.

Risk to taking this patch (and alternatives if risky): This code is so messed up... there's a possibility I regressed something else.  But if I did break something, I think we'd notice pretty quickly.

String or UUID changes made by this patch: None
Attachment #659932 - Flags: approval-mozilla-beta?
Attachment #659932 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/62a48b2712fb
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
Comment on attachment 659932 [details] [diff] [review]
Patch, v1

[Triage Comment]
Given where we are in the cycle, the risk here, and the fact that FF15 was affected (without significant user feedback), let's fix in FF17 first.
Attachment #659932 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Attachment #659932 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Boris, can you please check if this is now fixed for you with the Firefox 17 Beta builds?
Whiteboard: [qa-]
I have no way to really check this.  See "Random" in summary...
(In reply to Boris Zbarsky (:bz) from comment #45)
> I have no way to really check this.  See "Random" in summary...

Okay, thanks. I think we'll just need to trust the fix and keep an eye out on our feedback channels after we release.