Closed
Bug 824658
Opened 12 years ago
Closed 12 years ago
Long-lived loop decoding a PNG
Categories
(Core :: Graphics: ImageLib, defect)
Tracking
()
RESOLVED
FIXED
mozilla21
People
(Reporter: justin.lebar+bug, Assigned: khuey)
References
Details
(Keywords: regression, Whiteboard: [Snappy:P1])
Attachments
(1 file)
2.99 KB,
patch
|
joe
:
review+
akeybl
:
approval-mozilla-aurora+
akeybl
:
approval-mozilla-beta-
overholt
:
approval-mozilla-b2g18+
|
Details | Diff | Splinter Review |
We've seen this before, and I remember that it took us months to figure out what was going on. Unfortunately I just saw it again. :(
This loop was not infinite; it went away after a while.
I guess I'll try to catch it in gdb.
http://people.mozilla.com/~bgirard/cleopatra/#report=7ee7f6a5f43900029ad8624ab13f66b723ff66a3
Reporter | ||
Comment 1•12 years ago
|
||
This seems to be happening quite often. The jank is awful.
http://people.mozilla.com/~bgirard/cleopatra/#
Whiteboard: [Snappy:P1]
Reporter | ||
Comment 2•12 years ago
|
||
This bug also prevents Firefox from cleanly shutting down. So a user will have to kill it using the task manager before starting Firefox up again.
Reporter | ||
Comment 3•12 years ago
|
||
> http://people.mozilla.com/~bgirard/cleopatra/#
erm, sorry, I guess that's not a profile. :) Anyway, it looked just like the one in comment 0.
tracking-firefox20:
--- → ?
Reporter | ||
Updated•12 years ago
|
Keywords: regression
Reporter | ||
Comment 4•12 years ago
|
||
I caught this in GDB and, before it segfaulted Firefox, I observed that we appeared to be discarding the image that we'd then re-decode. We were discarding from the discard manager, which indicates that the image we were drawing was not locked.
tracking-firefox20:
? → ---
Reporter | ||
Updated•12 years ago
|
Assignee: nobody → justin.lebar+bug
Reporter | ||
Comment 5•12 years ago
|
||
I've verified that some images are ::Drawn while discardable. I suspect such images may be the problem here.
If so, my plan is to notice when an image is Drawn while discardable and discard it less aggressively. (In particular, drawn-while-discardable images won't be subject to our hard cap on decoded image memory.)
But I need to test for a bit.
Reporter | ||
Comment 6•12 years ago
|
||
This is bad, right?
> #0 mozilla::image::RasterImage::RequestDecodeCore (this=0x12c667b80, aDecodeType=<value temporarily unavailable, due to optimizations>) at ../../../../image/src/RasterImage.cpp:2784
> #1 0x0000000101362a45 in mozilla::image::RasterImage::GetImgFrame (this=0x12c667b80, framenum=0) at ../../../../image/src/RasterImage.cpp:817
> #2 0x0000000101363f17 in mozilla::image::RasterImage::SetFrameAsNonPremult (this=0x18, aFrameNum=1, aIsNonPremult=false) at ../../../../image/src/RasterImage.cpp:1567
> #3 0x000000010135ff90 in mozilla::image::Decoder::PostDecodeDone (this=0x10b2a6870) at ../../../../image/src/Decoder.cpp:282
> #4 0x000000010259a96b in MOZ_PNG_push_have_end [inlined] () at /Users/jlebar/code/moz/ff-inbound/src/media/libpng/pngpread.c:1453
> #5 0x000000010259a96b in MOZ_PNG_push_read_chunk (png_ptr=0x10fe9d800, info_ptr=0x127119a80) at ../../../../media/libpng/pngpread.c:2784
> #6 0x000000010259a52b in MOZ_PNG_proc_some_data [inlined] () at /Users/jlebar/code/moz/ff-inbound/src/media/libpng/pngpread.c:121
> #7 0x000000010259a52b in MOZ_PNG_process_data (png_ptr=0x10fe9d800, info_ptr=0x127119a80, buffer=<value temporarily unavailable, due to optimizations>, buffer_size=<value temporarily unavailable, due to optimizations>) at ../../../../media/libpng/pngpread.c:2784
> #8 0x000000010137f941 in mozilla::image::nsPNGDecoder::WriteInternal (this=<value temporarily unavailable, due to optimizations>, aBuffer=<value temporarily unavailable, due to optimizations>, aCount=<value temporarily unavailable, due to optimizations>) at ../../../../image/decoders/nsPNGDecoder.cpp:348
> #9 0x00000001013646e5 in mozilla::image::RasterImage::WriteToDecoder (this=0x12c667b80, aBuffer=0x12d647008 "?PNG\r\n\032\n", aCount=7867) at ../../../../image/src/RasterImage.cpp:2688
> #10 0x0000000101366aca in mozilla::image::RasterImage::DecodeWorker::DecodeSomeOfImage (this=<value temporarily unavailable, due to optimizations>, aImg=0x12c667b80, aDecodeType=mozilla::image::RasterImage::DecodeWorker::DECODE_TYPE_NORMAL) at ../../../../image/src/RasterImage.cpp:3200
Reporter | ||
Comment 7•12 years ago
|
||
Yeah, and the image's lock count is 0, so as soon as it finishes decoding, we discard it. Then the runnable comes 'round the mountain, and we're screwed.
Reporter | ||
Comment 8•12 years ago
|
||
Here is my hypothesis as to how this is happening.
1. Image A's decode finishes. Image A sets mDecoded = true. Image A's lock
count is 0, so it calls DiscardTracker::Reset.
2. DiscardTracker::Reset notices that we're low on memory and calls
MaybeDiscardSoon, which enqueues a task to run DiscardTracker::DiscardNow.
3. From image B's Decoder::PostDecodeDone, we end up in RequestDecodeCore.
This notices that mInDecoder and dispatches a new imgDecodeRequestor to the
main thread.
4. Image B's decode finishes. Its lock count is also 0.
5. DiscardTracker::DiscardNow runs. This happens before image B's
imgDecodeRequestor runs, because DiscardNow was enqueued after image A's
decode.
DiscardNow discards image B (and maybe also image A, but we don't care).
6. Image B's imgDecodeRequestor runs and decodes image B.
Comment 9•12 years ago
|
||
It really looks like we should modify SetFrameAsNonPremult (and similar) to call GetImgFrameNoDecode(), since they're called from within decoders. If we do we no longer call RequestDecode at all afaict.
Reporter | ||
Comment 10•12 years ago
|
||
> It really looks like we should modify SetFrameAsNonPremult (and similar) to call GetImgFrameNoDecode(), since they're called from
> within decoders.
I can do that. But we've already had another bug just like this, so I also want to change the discard tracker so that it's more forgiving of this sort of thing.
Reporter | ||
Comment 11•12 years ago
|
||
Heh, so: If you ever load a tab in the background, we'll decode images in that tab while they're unlocked. And that's of course all you need to trigger this bug.
I'm becoming increasingly skeptical that this logic where we willingly keep only X kb of decoded image data can work properly.
If we're over the limit, should we even decode an unlocked image? We're going to throw it away immediately...
Reporter | ||
Comment 12•12 years ago
|
||
I'm probably seeing this problem when nobody else is because I've set image.mem.max_decoded_kb to 1024. (I set this specifically to tickle bugs like this.)
But we set this value to 5mb on B2G, so it's not inconceivable that we'd hit this bug there.
Assignee | ||
Comment 13•12 years ago
|
||
Assignee: justin.lebar+bug → khuey
Status: NEW → ASSIGNED
Attachment #700350 -
Flags: review?(justin.lebar+bug)
Reporter | ||
Comment 14•12 years ago
|
||
Comment on attachment 700350 [details] [diff] [review]
Patch
Whatchathink, joe?
khuey is also reworking the discard tracker to avoid this class of issue (per comment 10), but he says he's going to do this in a separate bug.
Attachment #700350 -
Flags: review?(justin.lebar+bug) → review?(joe)
Comment 15•12 years ago
|
||
Comment on attachment 700350 [details] [diff] [review]
Patch
Review of attachment 700350 [details] [diff] [review]:
-----------------------------------------------------------------
This looks lovely
Attachment #700350 -
Flags: review?(joe) → review+
Reporter | ||
Comment 16•12 years ago
|
||
Comment on attachment 700350 [details] [diff] [review]
Patch
[Approval Request Comment]
Per comment 15, I'm afraid B2G is particularly susceptible to this bug. It's hard to notice that it's occurring on the phone, except you'll run down your battery and the phone will be sluggish.
Imagelib changes are always scary, but I think we should take this.
No string changes, etc.
Attachment #700350 -
Flags: approval-mozilla-b2g18?
Reporter | ||
Comment 17•12 years ago
|
||
> Per comment 15, I'm afraid B2G is particularly susceptible to this bug.
Sorry, comment 12.
Updated•12 years ago
|
blocking-b2g: --- → tef?
Reporter | ||
Comment 18•12 years ago
|
||
khuey is away, so I landed this with Joe's blessing so that we could get as much bake time on nightly as possible before porting to b2g. Hope that's OK, Kyle!
https://hg.mozilla.org/integration/mozilla-inbound/rev/efc12cf7cf4a
(I realized that this may not have been pushed to try only after pushing this to inbound. If it burns, let it be known that it's my fault, not khuey's.)
Assignee | ||
Comment 19•12 years ago
|
||
Yep that's fine. I was on a plane :-P
Updated•12 years ago
|
Attachment #700350 -
Flags: approval-mozilla-b2g18? → approval-mozilla-b2g18+
Comment 20•12 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
Comment 21•12 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g18/rev/e85226806816
This be nominated for Aurora/Beta as well?
status-b2g18:
--- → fixed
status-firefox19:
--- → affected
status-firefox20:
--- → affected
status-firefox21:
--- → fixed
Reporter | ||
Comment 22•12 years ago
|
||
(In reply to Ryan VanderMeulen from comment #21)
> https://hg.mozilla.org/releases/mozilla-b2g18/rev/e85226806816
>
> This be nominated for Aurora/Beta as well?
0.5% of users (well, 0.5% of sessions) are hitting this bug. I'm not sure if it's worth taking on Aurora and Beta. Maybe Aurora, though.
http://goo.gl/tIBTX
Reporter | ||
Comment 23•12 years ago
|
||
Comment on attachment 700350 [details] [diff] [review]
Patch
[Approval Request Comment]
* Bug caused by (feature/regressing bug #): Probably bug 732820.
* User impact if declined: 0.5% of users will experience high CPU usage and a janky browser for at least some minutes, possibly until they restart the browser. This may also cause shutdown hangs.
* Testing completed (on m-c, etc.): On m-c.
* Risk to taking this patch (and alternatives if risky): I'd say moderate risk; imagelib changes often have regressions. I'm not convinced this is worth taking on beta, but maybe it's worthwhile for Aurora.
String or UUID changes made by this patch: none
Attachment #700350 -
Flags: approval-mozilla-beta?
Attachment #700350 -
Flags: approval-mozilla-aurora?
Comment 24•12 years ago
|
||
Comment on attachment 700350 [details] [diff] [review]
Patch
Agreed on taking this on Aurora only given the risk profile. We're early enough in the Aurora cycle for this to get the necessary bake time for release.
Attachment #700350 -
Flags: approval-mozilla-beta?
Attachment #700350 -
Flags: approval-mozilla-beta-
Attachment #700350 -
Flags: approval-mozilla-aurora?
Attachment #700350 -
Flags: approval-mozilla-aurora+
Updated•12 years ago
|
blocking-b2g: tef? → ---
Comment 25•12 years ago
|
||
Reporter | ||
Comment 26•12 years ago
|
||
Oops, sorry guys. I just pushed an empty cset for this to Aurora.
https://hg.mozilla.org/releases/mozilla-aurora/rev/6ee29f8c87d7
(It claims to have modified some files, but there's no diff in those files; this is a known bug in hg.)
You need to log in
before you can comment on or make changes to this bug.
Description
•