Bug 744309 - Random 100% CPU usage, due to image decoding
Status: RESOLVED FIXED
Whiteboard: [qa-]
Product: Core
Classification: Components
Component: ImageLib
Version: Trunk
Hardware: x86 All
Severity: major
Target Milestone: mozilla18
Assigned To: Justin Lebar (not reading bugmail)
Reported: 2012-04-10 21:32 PDT by Boris Zbarsky [:bz]
Modified: 2012-12-02 11:18 PST
CC: 14 users
Attachments
Patch, v1 (3.37 KB, patch)
2012-09-10 19:01 PDT, Justin Lebar (not reading bugmail)
joe: review+
akeybl: approval-mozilla-aurora+
akeybl: approval-mozilla-beta-

Description Boris Zbarsky [:bz] 2012-04-10 21:32:20 PDT
I know this is a crappy bug report, so please bear with me.

The last week or two my nightly (shark branch) builds have, every so often, started chewing up 100% of a CPU core; it would keep happening until I restarted the build.  Today I finally managed to get a profile of it when it happened.  The profile showed about 50% of the time under mozilla::image::imgDecodeRequestor::Run.  The rest was handling all the mac event loop gunk to deal with the resulting paint events from the invalidates.

The image(s) in question seem to be GIFs based on the profiler stacks.

While this is happening, memory usage slowly grows (where by "slowly" I mean about 0.5MB/s or so).

I have the profiles still open in case there are questions I can answer.

Nominating for tracking, because whatever the heck is going on we _really_ don't want to ship this.  :(
Comment 1 Bobby Holley (PTO through June 13) 2012-04-10 22:38:48 PDT
This rings a bell with bug 583069. I don't think that explains why we're thrashing, though.

This could happen if mInDecoder somehow got stuck to true, since then each call into RequestDecode would do an async dispatch, forever. But it's not clear to me how that could happen modulo memory corruption...
Comment 2 Justin Lebar (not reading bugmail) 2012-04-10 23:23:16 PDT
> This could happen if mInDecoder somehow got stuck to true, since then each call into RequestDecode 
> would do an async dispatch, forever.

But then we wouldn't decode anything, right?  We'd just spin and see that there's nothing to do?

This could also happen if we try to draw an image which isn't locked over and over.  The image isn't locked, we decode it, then we toss it (very quickly these days under some circumstances).

Next time this happens, you could try making image.mem.max_decoded_image_kb large.  If that immediately stops the loop, then we're thrashing there, somehow.
Comment 3 Scott Johnson (:jwir3) 2012-04-11 09:18:11 PDT
bz: Is this _only_ happening on trunk (shark branch)? Or have you noticed it in a release/beta/aurora?
Comment 4 Boris Zbarsky [:bz] 2012-04-11 09:58:12 PDT
I don't really browse with release/beta/aurora, so I have no idea whether it's happening there....

Justin, the next time it happens (likely to be a day or two), I'll try what you suggest.
Comment 5 Boris Zbarsky [:bz] 2012-04-11 19:58:17 PDT
Happened again.  I tried bumping that pref up by 100x.  No effect.  :(
Comment 6 Boris Zbarsky [:bz] 2012-04-17 13:03:29 PDT
This has been happening to me pretty reliably, averaging about once every 36 hours or so.  :(
Comment 7 Justin Lebar (not reading bugmail) 2012-04-17 17:01:16 PDT
I'm not sure what we should be doing here.  If you can catch it in a debugger and see what's causing the decodes, that would help, I guess.

It sure sounds like we're decoding and then immediately discarding an image.  If it's not because of image.mem.max_decoded_image_kb, I don't know what it would be due to.  You've been browsing with max_decoded_image_kb set high and still see this problem (as opposed to increasing the pref while you're infinite-looping)?
Comment 8 Boris Zbarsky [:bz] 2012-04-17 17:29:11 PDT
I can do that if I try browsing in a debug build, but that's been an exercise in pain when I've tried it recently...  The nightlies we ship, including the shark ones, don't seem to have usable debugging symbols that gdb can use.  If such symbols do exist, _please_ tell me!

> You've been browsing with max_decoded_image_kb set high

No.  I can try that if you want.  Let me know.
Comment 9 Justin Lebar (not reading bugmail) 2012-04-17 17:32:42 PDT
> You've been browsing with max_decoded_image_kb set high
>
> No.  I can try that if you want.  Let me know.

I can't speak to the debug symbols (our shark builds really *should* ship with symbols!), but yes, if we can just increase this max_decoded_image_kb pref, we should do so.  (See also bug 746055 comment 9.)
Comment 10 Boris Zbarsky [:bz] 2012-04-17 17:39:16 PDT
> our shark builds really *should* ship with symbols

They have function names, but nothing else.
Comment 11 Boris Zbarsky [:bz] 2012-04-17 17:41:02 PDT
OK.  Bumped the pref to 2000000 and restarted.  We'll see how it goes.
Comment 12 Justin Lebar (not reading bugmail) 2012-04-22 21:29:06 PDT
It's possible that we're touching the discard logic off main thread on Mac (we know we're doing it on Windows).  If so, the discard tracker's linked list could become corrupted, or our count of how much memory is devoted to allocations could go negative.  Either way, that would be Bad.  See bug 745662.
Comment 13 Boris Zbarsky [:bz] 2012-04-22 21:31:36 PDT
For what it's worth, since comment 11 it's been 6 days and the problem hasn't come up yet....
Comment 14 Bobby Holley (PTO through June 13) 2012-04-23 02:48:24 PDT
(In reply to Justin Lebar [:jlebar] from comment #12)
> It's possible that we're touching the discard logic off main thread on Mac
> (we know we're doing it on Windows).  If so, the discard tracker's linked
> list could become corrupted, or our count of how much memory is devoted to
> allocations could go negative.  Either way, that would be Bad.  See bug
> 745662.

Maybe we should reland that assert after fixing the peek thing?
Comment 15 Alex Keybl [:akeybl] 2012-05-02 15:48:02 PDT
(In reply to Justin Lebar [:jlebar] from comment #9)
> I can't speak to the debug symbols (our shark builds really *should* ship
> with symbols!), but yes, if we can just increase this max_decoded_image_kb
> pref, we should do so.  (See also bug 746055 comment 9.)

Given Comment 13, can we move forward with this fix?
Comment 16 Justin Lebar (not reading bugmail) 2012-05-02 23:18:01 PDT
> Given Comment 13, can we move forward with this fix?

I bumped the pref to 250MB in bug 746055.

That's a wallpaper fix, though, so I don't think we should close this bug.  I presume we can still get into this state -- whatever it is, somehow.

We could bump the pref higher, but it's serving a purpose (preventing OOMs), so I'm not sure we want to make it infinity.

My plan is to revisit this after we fix bug 745141.  Once that bug is fixed, we can add release-time main-thread assertions to the discard logic, so we can ensure that we're not hitting another race condition there.  With that ruled out, I'll be able to dig in here, I hope.

I've also been browsing with the pref set very low, in the hopes of triggering this bug, but so far, no dice.
Comment 17 Justin Lebar (not reading bugmail) 2012-05-02 23:27:01 PDT
With this bug wallpapered over, I'm OK no longer tracking for FF14, unless we hear more reports of this happening to people.
Comment 18 Boris Zbarsky [:bz] 2012-05-03 14:48:43 PDT
That seems totally reasonable to me.
Comment 19 Joe Drew (not getting mail) 2012-05-14 15:20:13 PDT
Justin, if you believe this is unaffected for 14, can you mark this bug as such?
Comment 20 Justin Lebar (not reading bugmail) 2012-05-14 15:33:07 PDT
I think this is fixed for FF14 by the patch for Aurora in bug 745141, yes.

The story for FF15 is cloudier; we're basically hoping that the 250mb cap is big enough not to trigger this problem.  Unfortunately I don't have room on my plate to dig in deeper anytime soon; b2g is eating me alive.  We could just disable the max-image cap everywhere.  Or we could hope that 250mb is good enough.
Comment 21 Justin Lebar (not reading bugmail) 2012-08-02 22:19:34 PDT
Has anyone seen this happen lately?  It's really a bad bug, and I'm afraid I only papered over it...
Comment 22 Boris Zbarsky [:bz] 2012-08-02 22:34:08 PDT
I haven't seen this recur, no...
Comment 23 Robert Lickenbrock [:rclick] 2012-08-06 15:21:59 PDT
I can sometimes see this on Windows with image.mem.max_decoded_image_kb set to 1, but I don't have a definitive way of reproducing beyond browsing around and hoping it happens.

I got a profile of it happening at [1]. It looks like roughly half the time is spent under either imgDecodeRequestor::Run or DiscardRunnable::Run.

[1] https://people.mozilla.com/~bgirard/cleopatra/?report=8d36a4b437652a22391d4c2ca40f745d34274b2f
Comment 24 Justin Lebar (not reading bugmail) 2012-09-10 13:43:23 PDT
I just saw this happening on my Linux box (where I have image.mem.max_decoded_image_kb set to very low, at 256, in an attempt to trigger this bug).

24% of my time is spent in nsGIFDecoder2::DoLzw.  Here's the stack (although it's kind of obvious):

            --- mozilla::image::nsGIFDecoder2::DoLzw(unsigned char const*)
               |          
               |--99.85%-- mozilla::image::nsGIFDecoder2::WriteInternal(char const*, unsigned int)
               |          mozilla::image::RasterImage::WriteToDecoder(char const*, unsigned int)
               |          mozilla::image::RasterImage::DecodeWorker::DecodeSomeOfImage(mozilla::image::RasterImage*, mozilla::image::RasterImage::DecodeWorker::DecodeType)
               |          mozilla::image::RasterImage::DecodeWorker::DecodeABitOf(mozilla::image::RasterImage*)
               |          mozilla::image::RasterImage::RequestDecode()
               |          mozilla::image::imgDecodeRequestor::Run()
               |          nsThread::ProcessNextEvent(bool, bool*)
               |          NS_ProcessNextEvent_P(nsIThread*, bool)
               |          mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)
               |          MessageLoop::Run()
               |          nsBaseAppShell::Run()
               |          nsAppStartup::Run()
               |          XREMain::XRE_mainRun()
               |          XREMain::XRE_main(int, char**, nsXREAppData const*)
               |          XRE_main
               |          main
Comment 25 Justin Lebar (not reading bugmail) 2012-09-10 14:34:53 PDT
I caught this in gdb.  It looks like we're calling RequestDecode on the same image repeatedly.  On my machine, I actually have three or four images which are all getting repeatedly RequestDecoded()'ed.

In GDB, I see four calls in a row to RequestDecode() for the same image.  Then we move on to the next image.  Here are backtraces for one set of four calls:

== First hit ==

> Breakpoint 1, mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> 2416	  if (mError)
> (gdb) bt
> #0  mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> #1  0x00007f2c93321db5 in mozilla::image::imgDecodeRequestor::Run() () from /home/jlebar/code/moz/nightly/libxul.so
> #2  0x00007f2c93ec841c in nsThread::ProcessNextEvent(bool, bool*) () from /home/jlebar/code/moz/nightly/libxul.so
> #3  0x00007f2c93e92b3c in NS_ProcessNextEvent_P(nsIThread*, bool) () from /home/jlebar/code/moz/nightly/libxul.so
> #4  0x00007f2c93b751db in nsXULWindow::ShowModal() () from /home/jlebar/code/moz/nightly/libxul.so
> #5  0x00007f2c93b51f7a in nsWindowWatcher::OpenWindowInternal(nsIDOMWindow*, char const*, char const*, char const*, bool, bool, bool, nsIArray*, nsIDOMWindow**) () from /home/jlebar/code/moz/nightly/libxul.so
> 
Looks like we're in a nested event loop; I don't know if that's relevant.

This call does InitDecoder(/* aDoSizeDecode = */ false), then returns.

== Second hit ==

> Breakpoint 1, mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> 2416	  if (mError)
> (gdb) bt
> #0  mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> #1  0x00007f2c9331d1d1 in WantDecodedFrames (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2406
> #2  mozilla::image::RasterImage::GetImgFrame (this=0x7f2c5be98b60, framenum=0)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:590
> #3  0x00007f2c933200a8 in mozilla::image::RasterImage::SetFrameDisposalMethod(unsigned int, int) ()
>    from /home/jlebar/code/moz/nightly/libxul.so
> #4  0x00007f2c93337c6c in mozilla::image::nsGIFDecoder2::BeginImageFrame (this=0x7f2c2e563000, 
>     aDepth=<optimized out>) at /builds/slave/m-cen-lnx64-ntly/build/image/decoders/nsGIFDecoder2.cpp:202
> #5  0x00007f2c93338648 in mozilla::image::nsGIFDecoder2::WriteInternal (this=0x7f2c2e563000, 
>     aBuffer=<optimized out>, aCount=<optimized out>)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/decoders/nsGIFDecoder2.cpp:922
> #6  0x00007f2c9331f2bb in mozilla::image::RasterImage::WriteToDecoder(char const*, unsigned int) ()
>    from /home/jlebar/code/moz/nightly/libxul.so
> #7  0x00007f2c9331f58b in mozilla::image::RasterImage::DecodeWorker::DecodeSomeOfImage(mozilla::image::RasterImage*, mozilla::image::RasterImage::DecodeWorker::DecodeType) () from /home/jlebar/code/moz/nightly/libxul.so
> #8  0x00007f2c9331d860 in mozilla::image::RasterImage::DecodeWorker::DecodeABitOf (this=0x7f2c73ebfa10, 
>     aImg=0x7f2c5be98b60) at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2904
> #9  0x00007f2c9331d020 in mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2467
> #10 0x00007f2c93321db5 in mozilla::image::imgDecodeRequestor::Run() () from /home/jlebar/code/moz/nightly/libxul.so
> #11 0x00007f2c93ec841c in nsThread::ProcessNextEvent(bool, bool*) () from /home/jlebar/code/moz/nightly/libxul.so
> #12 0x00007f2c93e92b3c in NS_ProcessNextEvent_P(nsIThread*, bool) () from /home/jlebar/code/moz/nightly/libxul.so
> #13 0x00007f2c93b751db in nsXULWindow::ShowModal() () from /home/jlebar/code/moz/nightly/libxul.so
> #14 0x00007f2c93b51f7a in nsWindowWatcher::OpenWindowInternal(nsIDOMWindow*, char const*, char const*, char const*, bool, bool, bool, nsIArray*, nsIDOMWindow**) () from /home/jlebar/code/moz/nightly/libxul.so
> #15 0x00007f2c93b4e6c2 in nsWindowWatcher::OpenWindow(nsIDOMWindow*, char const*, char const*, char const*, nsISupports*, nsIDOMWindow**) () from /home/jlebar/code/moz/nightly/libxul.so

This call evaluates

    if (mDecoder && !mDecoder->IsSizeDecode())

finds that the condition is true, and returns.

== Third hit ==

> Breakpoint 1, mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> 2416	  if (mError)
> (gdb) bt
> #0  mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> #1  0x00007f2c9331d1d1 in WantDecodedFrames (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2406
> #2  mozilla::image::RasterImage::GetImgFrame (this=0x7f2c5be98b60, framenum=0)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:590
> #3  0x00007f2c93320058 in mozilla::image::RasterImage::SetFrameTimeout(unsigned int, int) ()
>    from /home/jlebar/code/moz/nightly/libxul.so
> #4  0x00007f2c93337db6 in mozilla::image::nsGIFDecoder2::EndImageFrame (this=0x7f2c2e563000)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/decoders/nsGIFDecoder2.cpp:262
> #5  0x00007f2c93338a61 in mozilla::image::nsGIFDecoder2::WriteInternal (this=0x7f2c2e563000, 
>     aBuffer=<optimized out>, aCount=<optimized out>)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/decoders/nsGIFDecoder2.cpp:1017
> #6  0x00007f2c9331f2bb in mozilla::image::RasterImage::WriteToDecoder(char const*, unsigned int) ()
>    from /home/jlebar/code/moz/nightly/libxul.so
> #7  0x00007f2c9331f58b in mozilla::image::RasterImage::DecodeWorker::DecodeSomeOfImage(mozilla::image::RasterImage*, mozilla::image::RasterImage::DecodeWorker::DecodeType) () from /home/jlebar/code/moz/nightly/libxul.so
> #8  0x00007f2c9331d860 in mozilla::image::RasterImage::DecodeWorker::DecodeABitOf (this=0x7f2c73ebfa10, 
>     aImg=0x7f2c5be98b60) at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2904
> #9  0x00007f2c9331d020 in mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2467
> #10 0x00007f2c93321db5 in mozilla::image::imgDecodeRequestor::Run() () from /home/jlebar/code/moz/nightly/libxul.so
> #11 0x00007f2c93ec841c in nsThread::ProcessNextEvent(bool, bool*) () from /home/jlebar/code/moz/nightly/libxul.so
> #12 0x00007f2c93e92b3c in NS_ProcessNextEvent_P(nsIThread*, bool) () from /home/jlebar/code/moz/nightly/libxul.so
> #13 0x00007f2c93b751db in nsXULWindow::ShowModal() () from /home/jlebar/code/moz/nightly/libxul.so
> #14 0x00007f2c93b51f7a in nsWindowWatcher::OpenWindowInternal(nsIDOMWindow*, char const*, char const*, char const*, bool, bool, bool, nsIArray*, nsIDOMWindow**) () from /home/jlebar/code/moz/nightly/libxul.so
> #15 0x00007f2c93b4e6c2 in nsWindowWatcher::OpenWindow(nsIDOMWindow*, char const*, char const*, char const*, nsISupports*, nsIDOMWindow**) () from /home/jlebar/code/moz/nightly/libxul.so

Just like the second hit, this call evaluates

    if (mDecoder && !mDecoder->IsSizeDecode())

finds that the condition is true, and returns.

== Fourth hit ==

> Breakpoint 1, mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> 2416	  if (mError)
> (gdb) bt
> #0  mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2416
> #1  0x00007f2c9331d1d1 in WantDecodedFrames (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2406
> #2  mozilla::image::RasterImage::GetImgFrame (this=0x7f2c5be98b60, framenum=0)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:590
> #3  0x00007f2c9331ffb8 in mozilla::image::RasterImage::SetFrameAsNonPremult(unsigned int, bool) ()
>    from /home/jlebar/code/moz/nightly/libxul.so
> #4  0x00007f2c9331c29e in mozilla::image::Decoder::PostDecodeDone (this=0x7f2c2e563000)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/Decoder.cpp:275
> #5  0x00007f2c93337e8b in mozilla::image::nsGIFDecoder2::FinishInternal (this=0x7f2c2e563000)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/decoders/nsGIFDecoder2.cpp:115
> #6  0x00007f2c9331c353 in mozilla::image::Decoder::Finish (this=0x7f2c2e563000)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/Decoder.cpp:88
> #7  0x00007f2c9331f1d7 in mozilla::image::RasterImage::ShutdownDecoder(mozilla::image::RasterImage::eShutdownIntent) () from /home/jlebar/code/moz/nightly/libxul.so
> #8  0x00007f2c9331f66e in mozilla::image::RasterImage::DecodeWorker::DecodeSomeOfImage(mozilla::image::RasterImage*, mozilla::image::RasterImage::DecodeWorker::DecodeType) () from /home/jlebar/code/moz/nightly/libxul.so
> #9  0x00007f2c9331d860 in mozilla::image::RasterImage::DecodeWorker::DecodeABitOf (this=0x7f2c73ebfa10, 
>     aImg=0x7f2c5be98b60) at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2904
> #10 0x00007f2c9331d020 in mozilla::image::RasterImage::RequestDecode (this=0x7f2c5be98b60)
>     at /builds/slave/m-cen-lnx64-ntly/build/image/src/RasterImage.cpp:2467
> #11 0x00007f2c93321db5 in mozilla::image::imgDecodeRequestor::Run() () from /home/jlebar/code/moz/nightly/libxul.so
> #12 0x00007f2c93ec841c in nsThread::ProcessNextEvent(bool, bool*) () from /home/jlebar/code/moz/nightly/libxul.so
> #13 0x00007f2c93e92b3c in NS_ProcessNextEvent_P(nsIThread*, bool) () from /home/jlebar/code/moz/nightly/libxul.so
> #14 0x00007f2c93b751db in nsXULWindow::ShowModal() () from /home/jlebar/code/moz/nightly/libxul.so
> #15 0x00007f2c93b51f7a in nsWindowWatcher::OpenWindowInternal(nsIDOMWindow*, char const*, char const*, char const*, bool, bool, bool, nsIArray*, nsIDOMWindow**) () from /home/jlebar/code/moz/nightly/libxul.so

This call does |if (mInDecoder)|, sees that the condition is true, and then does

  nsRefPtr<imgDecodeRequestor> requestor = new imgDecodeRequestor(this);

and re-enqueues itself into the event loop!
Comment 26 Justin Lebar (not reading bugmail) 2012-09-10 15:00:04 PDT
I may understand what's going on here.

> // Simplified a bit
> RasterImage::RequestDecode()
> {
>   if (mError || mDecoded || !StoringSourceData() || (mDecoder && !mDecoder->IsSizeDecode()))
>     return;
> 
>   if (mInDecoder) {
>     nsRefPtr<imgDecodeRequestor> requestor = new imgDecodeRequestor(this);
>     return NS_DispatchToCurrentThread(requestor);
>     // asynchronously calls this->RequestDecode();
>   }

What I think is happening is, after Decoder::Finish() finishes, we're setting
neither mError nor mDecoded.

Therefore, when the imgDecodeRequestor runs, it starts a new decode.  This
process repeats ad infinitum.

Of course, we shouldn't even be spawning this imgDecodeRequestor in the first
place.  But I don't think that's the main bug.
Comment 27 Justin Lebar (not reading bugmail) 2012-09-10 15:18:06 PDT
> we're setting neither mError nor mDecoded.

Although that's pretty odd, because Decoder::PostDecodeDone (in the last stack trace in comment 25) should call RasterImage::DecodingComplete, which will ensure that either mError or mDecoded is set.

Unfortunately GDB seems to have died on me, so the above stacks are probably all we're going to get from this session.
Comment 28 Joe Drew (not getting mail) 2012-09-10 15:51:47 PDT
PostDecodeDone calls DecodingComplete _after_ calling SetFrameAsNonPremult, which is what's calling GetImgFrame that calls WantDecodedFrames that calls RequestDecode that actually re-enqueues the decode request.

However, that shouldn't have any effect if the image is actually already decoded. AFAICT the only way this should not be a no-op is if a request for the image to be discarded or for the image in a format (nonpremult vs premult, etc) that it isn't already in.

I'd be very interested to see where/whether there is a call to RasterImage::Discard for this image after the 4th RequestDecode call; that is, if you can ever reproduce this in a debugger again. :)
Comment 29 Justin Lebar (not reading bugmail) 2012-09-10 16:31:49 PDT
> or for the image in a format (nonpremult vs premult, etc) that it isn't already in.

I think you a word here, but I'm not quite sure what it is.

> I'd be very interested to see where/whether there is a call to RasterImage::Discard for this image 
> after the 4th RequestDecode call; that is, if you can ever reproduce this in a debugger again. :)

That discarding has something to do with this would make sense, since it seems that it may be easier to reproduce this issue by decreasing the max-decoded-image-bytes threshold.

I can make the imgDecodeRequestor not get enqueued when we're in the process of finishing a decode.  And I can modify the discard heuristics to be flexible with the max-decoded-image-bytes limit, although I'm afraid that could only paper over this problem.  (Instead of decoding in a loop, we could end up decoding once per second, which would still be awful for battery life on B2G!)

Any other ideas for how to fix this, assuming I don't get another shot at this in gdb?
Comment 30 Justin Lebar (not reading bugmail) 2012-09-10 19:01:59 PDT
Created attachment 659932 [details] [diff] [review]
Patch, v1

I'm not at all pleased with this.
Comment 31 Joe Drew (not getting mail) 2012-09-10 19:32:26 PDT
Comment on attachment 659932 [details] [diff] [review]
Patch, v1

o god o god

You don't actually need this extra flag - (mDecoder && mDecoder->GetDecodeDone()) should be enough, I think.

I can't quite convince myself that this is the fundamentally wrong way to solve this bug, sadly. I might even r+ a fix that used the above :(
Comment 32 Joe Drew (not getting mail) 2012-09-10 19:33:54 PDT
(In reply to Justin Lebar [:jlebar] from comment #29)
> > or for the image in a format (nonpremult vs premult, etc) that it isn't already in.
> 
> I think you a word here, but I'm not quite sure what it is.

Slightly rewritten:

"Ask for the image in a format different from the one it's stored in."

As for reproducing this in gdb - maybe I can start browsing with the set of values you've tried, since they clearly work!
Comment 33 Justin Lebar (not reading bugmail) 2012-09-10 19:51:09 PDT
> You don't actually need this extra flag - (mDecoder && mDecoder->GetDecodeDone()) should be enough, 
> I think.

We set mDecoder to null before calling decoder->Finish(), in ShutdownDecoder.

> As for reproducing this in gdb - maybe I can start browsing with the set of values you've tried, 
> since they clearly work!

They may work, but it had been more than a month since I last saw this bug.
Comment 34 Justin Lebar (not reading bugmail) 2012-09-10 20:49:04 PDT
Comment on attachment 659932 [details] [diff] [review]
Patch, v1

Back to r? because I don't think comment 31 works.
Comment 35 Joe Drew (not getting mail) 2012-09-17 14:45:42 PDT
(In reply to Justin Lebar [:jlebar] from comment #33)
> > You don't actually need this extra flag - (mDecoder && mDecoder->GetDecodeDone()) should be enough, 
> > I think.
> 
> We set mDecoder to null before calling decoder->Finish(), in ShutdownDecoder.

I don't understand why this would affect things. Surely we would check just after IsSizeDecode(), and hence ShutdownDecoder won't have been called?
Comment 36 Justin Lebar (not reading bugmail) 2012-09-17 18:29:44 PDT
Maybe I don't understand comment 31 correctly.  To be really concrete about it:

> You don't actually need this extra flag - (mDecoder &&
> mDecoder->GetDecodeDone()) should be enough, I think.

I thought/think you were/are suggesting that I should replace 

> @@ -2423,16 +2426,22 @@ RasterImage::RequestDecode()
>    if (mDecoder && !mDecoder->IsSizeDecode())
>      return NS_OK;
> +  if (mFinishing)
> +    return NS_OK;

with

> @@ -2423,16 +2426,22 @@ RasterImage::RequestDecode()
>    if (mDecoder && !mDecoder->IsSizeDecode())
>      return NS_OK;
> +  if (mDecoder && mDecoder->GetDecodeDone())
> +    return NS_OK;

What I was saying is that the RequestDecode() call we want to be affected by this early return happens during the call to decoder->Finish() inside ShutdownDecoder:

> RasterImage::ShutdownDecoder() {
>   [...]
>   nsRefPtr<Decoder> decoder = mDecoder;
>   mDecoder = nullptr;
> 
>   mInDecoder = true;
>   decoder->Finish();
>   mInDecoder = false;
>   [...]
> }

But notice that before we call decoder->Finish(), we've already set mDecoder = nullptr.  So |mDecoder && mDecoder->GetDecodeDone()| will not cause the desired early return.
Comment 37 Joe Drew (not getting mail) 2012-09-17 18:32:34 PDT
Oh, I missed that in the callstack.

..

this sucks.
Comment 38 Joe Drew (not getting mail) 2012-09-17 18:32:59 PDT
Comment on attachment 659932 [details] [diff] [review]
Patch, v1

i hope you're happy ahhhhh
Comment 39 Justin Lebar (not reading bugmail) 2012-09-17 18:55:16 PDT
> i hope you're happy ahhhhh

I am not at all happy, but at least we may finally have fixed this bug.

I may or may not have had a try run for this, but try is currently very backed up, so ima push to inbound.

https://hg.mozilla.org/integration/mozilla-inbound/rev/62a48b2712fb
Comment 40 Justin Lebar (not reading bugmail) 2012-09-17 18:59:54 PDT
Comment on attachment 659932 [details] [diff] [review]
Patch, v1

[Approval Request Comment]
Bug caused by (feature/regressing bug #): maybe bug 732820?

User impact if declined: Occasional 100% CPU usage by Firefox until browser is restarted.  Browser remains relatively responsive throughout, in my tests.

Testing completed (on m-c, etc.): We need to let this bake on m-c, of course.  In fact, it may even get backed out.  But I didn't want to forget to come back in a few days and request approval.

Risk to taking this patch (and alternatives if risky): This code is so messed up...there's a possibility I regressed something else.  I think it's unlikely we wouldn't notice pretty quickly if I did break something.

String or UUID changes made by this patch: None
Comment 41 Graeme McCutcheon [:graememcc] 2012-09-18 05:02:58 PDT
https://hg.mozilla.org/mozilla-central/rev/62a48b2712fb
Comment 42 Alex Keybl [:akeybl] 2012-09-18 09:35:52 PDT
Comment on attachment 659932 [details] [diff] [review]
Patch, v1

[Triage Comment]
Given where we are in the cycle, the risk here, and the fact that FF15 was affected (without significant user feedback), let's fix in FF17 first.
Comment 43 Justin Lebar (not reading bugmail) 2012-09-20 09:50:55 PDT
Landed for Aurora 17: https://hg.mozilla.org/releases/mozilla-aurora/rev/847afafd1c6d
Comment 44 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-10-16 16:01:09 PDT
Boris, can you please check if this is now fixed for you with the Firefox 17 Beta builds?
Comment 45 Boris Zbarsky [:bz] 2012-10-17 08:17:12 PDT
I have no way to really check this.  See "Random" in summary...
Comment 46 Anthony Hughes (:ashughes) [GFX][QA][Mentor] 2012-10-17 11:37:24 PDT
(In reply to Boris Zbarsky (:bz) from comment #45)
> I have no way to really check this.  See "Random" in summary...

Okay, thanks. I think we'll just need to trust the fix and keep an eye out on our feedback channels after we release.

Note You need to log in before you can comment on or make changes to this bug.