Bug 824658 (Closed) — Opened 7 years ago, Closed 7 years ago

Long-lived loop decoding a PNG


(Core :: ImageLib, defect)




Tracking Status
firefox19 --- wontfix
firefox20 --- fixed
firefox21 --- fixed
b2g18 --- fixed


(Reporter: justin.lebar+bug, Assigned: khuey)



(Keywords: regression, Whiteboard: [Snappy:P1])


(1 file)

We've seen this before, and I remember that it took us months to figure out what was going on.  Unfortunately I just saw it again.  :(

This loop was not infinite; it went away after a while.

I guess I'll try to catch it in gdb.
This seems to be happening quite often.  The jank is awful.
Whiteboard: [Snappy:P1]
This bug also prevents Firefox from cleanly shutting down.  So a user will have to kill it using the task manager before starting Firefox up again.

erm, sorry, I guess that's not a profile.  :)  Anyway, it looked just like the one in comment 0.
Keywords: regression
I caught this in GDB and, before it segfaulted Firefox, I observed that we appeared to be discarding the image that we'd then re-decode.  We were discarding from the discard manager, which indicates that the image we were drawing was not locked.
Assignee: nobody → justin.lebar+bug
Blocks: 824583
I've verified that some images are ::Drawn while discardable.  I suspect such images may be the problem here.

If so, my plan is to notice when an image is Drawn while discardable and discard it less aggressively.  (In particular, drawn-while-discardable images won't be subject to our hard cap on decoded image memory.)

But I need to test for a bit.
This is bad, right?

> #0  mozilla::image::RasterImage::RequestDecodeCore (this=0x12c667b80, aDecodeType=<value temporarily unavailable, due to optimizations>) at ../../../../image/src/RasterImage.cpp:2784
> #1  0x0000000101362a45 in mozilla::image::RasterImage::GetImgFrame (this=0x12c667b80, framenum=0) at ../../../../image/src/RasterImage.cpp:817
> #2  0x0000000101363f17 in mozilla::image::RasterImage::SetFrameAsNonPremult (this=0x18, aFrameNum=1, aIsNonPremult=false) at ../../../../image/src/RasterImage.cpp:1567
> #3  0x000000010135ff90 in mozilla::image::Decoder::PostDecodeDone (this=0x10b2a6870) at ../../../../image/src/Decoder.cpp:282
> #4  0x000000010259a96b in MOZ_PNG_push_have_end [inlined] () at /Users/jlebar/code/moz/ff-inbound/src/media/libpng/pngpread.c:1453
> #5  0x000000010259a96b in MOZ_PNG_push_read_chunk (png_ptr=0x10fe9d800, info_ptr=0x127119a80) at ../../../../media/libpng/pngpread.c:2784
> #6  0x000000010259a52b in MOZ_PNG_proc_some_data [inlined] () at /Users/jlebar/code/moz/ff-inbound/src/media/libpng/pngpread.c:121
> #7  0x000000010259a52b in MOZ_PNG_process_data (png_ptr=0x10fe9d800, info_ptr=0x127119a80, buffer=<value temporarily unavailable, due to optimizations>, buffer_size=<value temporarily unavailable, due to optimizations>) at ../../../../media/libpng/pngpread.c:2784
> #8  0x000000010137f941 in mozilla::image::nsPNGDecoder::WriteInternal (this=<value temporarily unavailable, due to optimizations>, aBuffer=<value temporarily unavailable, due to optimizations>, aCount=<value temporarily unavailable, due to optimizations>) at ../../../../image/decoders/nsPNGDecoder.cpp:348
> #9  0x00000001013646e5 in mozilla::image::RasterImage::WriteToDecoder (this=0x12c667b80, aBuffer=0x12d647008 "?PNG\r\n\032\n", aCount=7867) at ../../../../image/src/RasterImage.cpp:2688
> #10 0x0000000101366aca in mozilla::image::RasterImage::DecodeWorker::DecodeSomeOfImage (this=<value temporarily unavailable, due to optimizations>, aImg=0x12c667b80, aDecodeType=mozilla::image::RasterImage::DecodeWorker::DECODE_TYPE_NORMAL) at ../../../../image/src/RasterImage.cpp:3200
Yeah, and the image's lock count is 0, so as soon as it finishes decoding, we discard it.  Then the runnable comes 'round the mountain, and we're screwed.
Here is my hypothesis as to how this is happening.

1. Image A's decode finishes.  Image A sets mDecoded = true.  Image A's lock
   count is 0, so it calls DiscardTracker::Reset.
2. DiscardTracker::Reset notices that we're low on memory and calls
   MaybeDiscardSoon, which enqueues a task to run DiscardTracker::DiscardNow.

3. From image B's Decoder::PostDecodeDone, we end up in RequestDecodeCore.
   This notices that mInDecoder is true and dispatches a new imgDecodeRequestor
   to the main thread.

4. Image B's decode finishes.  Its lock count is also 0.

5. DiscardTracker::DiscardNow runs.  This happens before image B's
   imgDecodeRequestor runs, because DiscardNow was enqueued (in step 2) before
   the imgDecodeRequestor (in step 3).  DiscardNow discards image B (and maybe
   also image A, but we don't care).

6. Image B's imgDecodeRequestor runs and decodes image B.
It really looks like we should modify SetFrameAsNonPremult (and similar) to call GetImgFrameNoDecode(), since they're called from within decoders.  If we do, we no longer call RequestDecode at all, AFAICT.
> It really looks like we should modify SetFrameAsNonPremult (and similar) to call GetImgFrameNoDecode(), since they're called from 
> within decoders.

I can do that.  But we've already had another bug just like this, so I also want to change the discard tracker so that it's more forgiving of this sort of thing.
Heh, so: If you ever load a tab in the background, we'll decode images in that tab while they're unlocked.  And that's of course all you need to trigger this bug.

I'm becoming increasingly skeptical that this logic, where we willingly keep only X KB of decoded image data, can work properly.

If we're over the limit, should we even decode an unlocked image?  We're going to throw it away immediately...
I'm probably seeing this problem when nobody else is because I've set image.mem.max_decoded_kb to 1024.  (I set this specifically to tickle bugs like this.)

But we set this value to 5 MB on B2G, so it's not inconceivable that we'd hit this bug there.
Attached patch Patch (Splinter Review)
Assignee: justin.lebar+bug → khuey
Attachment #700350 - Flags: review?(justin.lebar+bug)
Comment on attachment 700350 [details] [diff] [review]

Whatchathink, joe?

khuey is also reworking the discard tracker to avoid this class of issue (per comment 10), but he says he's going to do this in a separate bug.
Attachment #700350 - Flags: review?(justin.lebar+bug) → review?(joe)
Comment on attachment 700350 [details] [diff] [review]

Review of attachment 700350 [details] [diff] [review]:

This looks lovely.
Attachment #700350 - Flags: review?(joe) → review+
Comment on attachment 700350 [details] [diff] [review]

[Approval Request Comment]
Per comment 15, I'm afraid B2G is particularly susceptible to this bug.  It's hard to notice that it's occurring on the phone, except that it will run down your battery and make the phone sluggish.

Imagelib changes are always scary, but I think we should take this.

No string changes, etc.
Attachment #700350 - Flags: approval-mozilla-b2g18?
> Per comment 15, I'm afraid B2G is particularly susceptible to this bug.

Sorry, comment 12.
blocking-b2g: --- → tef?
khuey is away, so I landed this with Joe's blessing so that we could get as much bake time on nightly as possible before porting to b2g.  Hope that's OK, Kyle!

(I realized that this may not have been pushed to try only after pushing this to inbound.  If it burns, let it be known that it's my fault, not khuey's.)
Yep that's fine.  I was on a plane :-P
Attachment #700350 - Flags: approval-mozilla-b2g18? → approval-mozilla-b2g18+
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
(In reply to Ryan VanderMeulen from comment #21)
> Should this be nominated for Aurora/Beta as well?

0.5% of users (well, 0.5% of sessions) are hitting this bug.  I'm not sure if it's worth taking on Aurora and Beta.  Maybe Aurora, though.
Comment on attachment 700350 [details] [diff] [review]

[Approval Request Comment]
* Bug caused by (feature/regressing bug #): Probably bug 732820.
* User impact if declined: 0.5% of users will experience high CPU usage and a janky browser for at least some minutes, possibly until they restart the browser.  This may also cause shutdown hangs.
* Testing completed (on m-c, etc.): On m-c.
* Risk to taking this patch (and alternatives if risky): I'd say moderate risk; imagelib changes often have regressions.  I'm not convinced this is worth taking on beta, but maybe it's worthwhile for Aurora.
* String or UUID changes made by this patch: none
Attachment #700350 - Flags: approval-mozilla-beta?
Attachment #700350 - Flags: approval-mozilla-aurora?
Comment on attachment 700350 [details] [diff] [review]

Agreed on taking this on Aurora only given the risk profile. We're early enough in the Aurora cycle for this to get the necessary bake time for release.
Attachment #700350 - Flags: approval-mozilla-beta?
Attachment #700350 - Flags: approval-mozilla-beta-
Attachment #700350 - Flags: approval-mozilla-aurora?
Attachment #700350 - Flags: approval-mozilla-aurora+
blocking-b2g: tef? → ---
Oops, sorry guys.  I just pushed an empty cset for this to Aurora.

(It claims to have modified some files, but there's no diff in those files; this is a known bug in hg.)