Closed
Bug 1120279
Opened 10 years ago
Closed 8 years ago
crash in mozilla::image::imgFrame::GetAnimationData()
Categories
(Core :: Graphics: ImageLib, defect)
Tracking
()
RESOLVED
FIXED
mozilla52
People
(Reporter: away, Assigned: aosmond)
References
Details
(Keywords: crash, Whiteboard: gfx-noted)
Crash Data
Attachments
(1 file, 2 obsolete files)
1.82 KB,
patch
|
eflores
:
review+
gchang
:
approval-mozilla-aurora+
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-e801057b-cd50-4aa7-999f-018fe2150109.
=============================================================
Null |frame| in GetTimeoutForFrame. This began with build 20150109030224. Bug 1116733 is definitely a suspect but it's not clear whether this is a new issue or just a new signature due to refactoring.
Flags: needinfo?(seth)
Comment 1•10 years ago
|
||
I'm pretty certain that this bug, bug 1108034, and bug 1119518 are dupes that only look different due to refactoring. I will do some more detailed triage and confirm that soon.
Updated•10 years ago
|
Whiteboard: gfx-noted
Updated•10 years ago
|
Crash Signature: [@ mozilla::image::imgFrame::GetAnimationData()] → [@ mozilla::image::imgFrame::GetAnimationData()]
[@ mozilla::image::imgFrame::GetAnimationData() const ]
OS: Windows NT → All
Comment 2•10 years ago
|
||
This should be fixed by bug 1079627. The underlying problem will be fixed in bug 1126739. The aim is to uplift both of these to Aurora. I cannot set the dependencies correctly here because of a circular dependency, unfortunately.
Depends on: 1126739
Comment 3•10 years ago
|
||
I think I hit this too: https://crash-stats.mozilla.com/report/index/6a40a5b3-2f74-415c-bfaa-d6eb32150205
Updating to see if I just don't have the latest version.
Updated•9 years ago
|
Crash Signature: [@ mozilla::image::imgFrame::GetAnimationData()]
[@ mozilla::image::imgFrame::GetAnimationData() const ] → [@ mozilla::image::imgFrame::GetAnimationData()]
[@ mozilla::image::imgFrame::GetAnimationData() const ]
[@ mozilla::image::imgFrame::GetAnimationData]
[@ mozilla::image::imgFrame::GetAnimationData const ]
Given that we just have a steady stream of these crashes coming in, I'd say none of the listed possible dependencies fully resolved the issue.
Assignee: nobody → aosmond
Assignee | ||
Comment 5•8 years ago
|
||
It fails trying to get the pointer for imgFrame::mMonitor::mMutex::mLock (at offset 0x4 or 0x8 depending on arch), which means it is imgFrame pointer itself that is null. As such, the signature would be different in a debug build as we would trip the assertion at https://dxr.mozilla.org/mozilla-central/rev/b1156b0eb96fcb49966b20e5fcf6a01f634ea2ee/image/FrameAnimator.cpp#419 but I don't see any bugs filed against it, so thus far it has only been observed in release builds.
Assignee | ||
Comment 6•8 years ago
|
||
We failed to get the previous frame for the blend. We don't know about the status for the next frame (for the purposes of FrameAnimator::DoBlend, aNextFrameIndex is always aPrevFrameIndex + 1.)
We know that FrameAnimator::GetRawFrame failed gracefully because if it thought it successfully acquired an imgFrame, it would have failed on the exact same error earlier when creating an RawAccessFrameRef from the DrawableSurface's imgFrame pointer, as imgFrame::LockImageData acquires mMonitor. So https://dxr.mozilla.org/mozilla-central/rev/b1156b0eb96fcb49966b20e5fcf6a01f634ea2ee/image/FrameAnimator.cpp#405 was never reached.
Thus either the DrawableSurface::Seek or SurfaceCache::Lookup failed.
Assignee | ||
Comment 7•8 years ago
|
||
Before FrameAnimator::DoBlend is called, FrameAnimator::AdvanceFrame acquires the RawAccessFrameRef for the next frame. In order to continue, the decoding must either be finished, or we have the complete next frame.
https://dxr.mozilla.org/mozilla-central/rev/b1156b0eb96fcb49966b20e5fcf6a01f634ea2ee/image/FrameAnimator.cpp#206
It might be a problem with just the previous/current frame then rather than both *or* the decoding completed, updated the frame count but did not supply the necessary frames.
Assignee | ||
Comment 8•8 years ago
|
||
Analysis of a minidump indicates the following:
1) aPrevFrameIndex is 0, aNextFrameIndex is 1.
2) FrameAnimator::GetRawFrame returns null for both indices.
3) The stack suggests the possibility that SurfaceCache::Lookup found the image key but failed on the surface key step.
The implications of 1 and 2 while in FrameAnimator::DoBlend
1) AnimationState::mDoneDecoding is true, because we did not already return (https://dxr.mozilla.org/mozilla-central/rev/82d0a583a9a39bf0b0000bccbf6d5c9ec2596bcc/image/FrameAnimator.cpp#206).
2) AnimationState::KnownFrameCount > 1, because we did not already return (https://dxr.mozilla.org/mozilla-central/rev/82d0a583a9a39bf0b0000bccbf6d5c9ec2596bcc/image/FrameAnimator.cpp#180).
This suggests the animated image successfully decoded, but its frames failed to be put in SurfaceCache or they somehow got evicted.
Assignee | ||
Comment 9•8 years ago
|
||
One possible sequence of events:
1) Precondition: Animated image is fully decoded.
2) imgFrame::Draw fails (maybe the surface/volatile buffer was purged, for example).
3) RasterImage::RecoverFromInvalidFrames is called, triggers the removal of the image/surfaces from SurfaceCache and re-adds only the image entry (https://dxr.mozilla.org/mozilla-central/rev/8a494adbc5cced90a4edf0c98cffde906bf7f3ae/image/RasterImage.cpp#1220).
4) One of the following happened
a) RasterImage::mHasSourceData remains false. Perhaps the final RasterImage::OnImageDataAvailable (off main thread) has triggered, we finished decoding, and a Draw came in before OnImageDataComplete (main thread) could trigger. As a result, our requested RasterImage::Decode would happen asynchronously instead of synchronously (https://dxr.mozilla.org/mozilla-central/rev/8a494adbc5cced90a4edf0c98cffde906bf7f3ae/image/RasterImage.cpp#1092).
b) Decoder::InitInternal failed (specifically nsPNGDecoder::InitInternal is in theory but unlikely to be fallible), decode aborted silently (https://dxr.mozilla.org/mozilla-central/rev/8a494adbc5cced90a4edf0c98cffde906bf7f3ae/image/DecoderFactory.cpp#188).
c) SurfaceCache::Insert failed, decode aborted silently (https://dxr.mozilla.org/mozilla-central/rev/8a494adbc5cced90a4edf0c98cffde906bf7f3ae/image/DecoderFactory.cpp#203).
5) Image still sits in the nsRefreshDriver list, it attempts to advance the frame, AnimationState::mDoneDecoding remains true (because we didn't reset it) but there are no frames in SurfaceCache (because we deleted them, but either failed to start the redecode, or haven't yet finished it).
6) Boom, crash.
Assignee | ||
Comment 10•8 years ago
|
||
It is difficult to know if I really captured the crash sequence given I have not been able to reproduce myself :). Thankfully (??) we see this each day on recent nightlies from the crash reports, so if they stop after landing, that should be sufficient confirmation.
try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6ae0310a037a
Attachment #8791739 -
Flags: review?(edwin)
Assignee | ||
Comment 11•8 years ago
|
||
Just for posterity; if the above patch doesn't fix it, this has a good chance of avoiding the crash (but doesn't answer why we are missing the next frame when decoding says we are done).
Assignee | ||
Updated•8 years ago
|
Attachment #8791741 -
Attachment is obsolete: true
Assignee | ||
Updated•8 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 12•8 years ago
|
||
It is possible bug 1296706 is related to this. The assert implies that AnimationState::mDoneDecoding is true but we are adding more frames. Perhaps the original decode ended early for some transient condition, but on a redecode, it was able to complete. Attachment 8791739 [details] [diff] would potentially fix this, although I think we would need to also reset the frame count if true, because if a redecode can give more frames, it could also give fewer in theory.
See Also: → 1296706
Comment on attachment 8791739 [details] [diff] [review]
Ensure consistent animated image state when redecoding, v1
Review of attachment 8791739 [details] [diff] [review]:
-----------------------------------------------------------------
Sounds plausible.
Attachment #8791739 -
Flags: review?(edwin) → review+
Comment 14•8 years ago
|
||
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c0021d576b3a
Ensure the animated image state is consistent if redecoding fails or is asynchronous. r=edwin
Comment 15•8 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
status-firefox51:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
Comment 16•8 years ago
|
||
Well, at least you got one nightly, maybe that'll be enough to tell whether you're on the right track for fixing the crash.
Backed out in https://hg.mozilla.org/mozilla-central/rev/17242152a8fb for reftest failures, I'm not sure I'll find all of the ones we've filed but certainly all of bug 1303449 bug 1303490 bug 1303492 bug 1303505 bug 1303506 bug 1303491 and my favorite for the assertions overflowing the maximum log size bug 1303502.
Status: RESOLVED → REOPENED
status-firefox51:
fixed → ---
Resolution: FIXED → ---
Target Milestone: mozilla51 → ---
Assignee | ||
Comment 17•8 years ago
|
||
The crash reports confirmed that the issue (at least with my other patch breaking things) that this issue can happen on Linux as well as Mac OS X and Windows. The intermittent failures were likely a result of breaking multiple decode requests (mea culpa) in the normal case, not the failure case.
I have a new theory which extends from comment 9 which is probably the most likely scenario.
4) d) extends a) where by the first decoder is still active when the Draw fails (i.e. we started the animation because we have some frames but not all of them yet). As a result, we evict the old surface provider and create a new one, however the original decoding is still happening in the background. When the first decoder finishes, it sets AnimationState::mDoneDecoding to true (shared state between multiple decoders!) even though the surface provider for the new decode attempt may not have processed *any* frames yet.
The silver lining here is that the present state of AnimationState living in RasterImage instead of FrameAnimator and AnimationState living in AnimatedSurfaceProvider is temporary, as per Bug 1288040 Comment 0. That would be its very nature solve these races on the shared state between multiple decoders.
Given a much better understanding now of how we get into this state (and the fact that we know the code needs to be refactored anyways), I am much more comfortable with removing the mDoneDecoding check which is a very minor optimization given we already incurred the cost of hitting SurfaceCache for the next frame beforehand (which can also tell us what we want to know in a much safer manner).
Assignee | ||
Comment 18•8 years ago
|
||
In addition to the above, this change should be much safer and *not* break half the tests which require image decoding... ;).
try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e44ce3071ef8
Attachment #8791739 -
Attachment is obsolete: true
Attachment #8793336 -
Flags: review?(edwin)
Attachment #8793336 -
Flags: review?(edwin) → review+
Let's do a try run this time to make sure, eh. ;-)
Comment 20•8 years ago
|
||
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6a41bd18fe6c
Always check if the next frame is available before advancing an animation.
Assignee | ||
Comment 21•8 years ago
|
||
(In reply to Edwin Flores [:eflores] [:edwin] from comment #19)
> Let's do a try run this time to make sure, eh. ;-)
Yes. It had a clean run this time!
Comment 22•8 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
status-firefox52:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Assignee | ||
Comment 23•8 years ago
|
||
Comment on attachment 8793336 [details] [diff] [review]
Ignore decoding complete and always check next frame exists, v2
Approval Request Comment
[Feature/regressing bug #]: N/A
[User impact if declined]: May crash when displaying through an animation.
[Describe test coverage new/current, TreeHerder]: Only has been observed via crash reports as condition necessary to reproduce depends on the system releasing graphics buffers (known possibility but poorly handled for animated images). Crash reports have ceased in builds containing the patch since landing.
[Risks and why]: Very low risk. We crashed because we assumed if we have finished decoding, we must have all of the necessary frames, rather than check each frame before use. Now we no longer make that assumption and always check the frame before use. This makes the code path before and after finishing decoding the same, so it was already well tested.
[String/UUID change made/needed]: N/A
Attachment #8793336 -
Flags: approval-mozilla-aurora?
Comment 24•8 years ago
|
||
Comment on attachment 8793336 [details] [diff] [review]
Ignore decoding complete and always check next frame exists, v2
Fix a crash. Take it in 51 aurora.
Attachment #8793336 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•8 years ago
|
status-firefox51:
--- → affected
Comment 25•8 years ago
|
||
bugherder uplift |
Assignee | ||
Comment 26•8 years ago
|
||
Comment on attachment 8793336 [details] [diff] [review]
Ignore decoding complete and always check next frame exists, v2
Further soaking in aurora and nightly hasn't revealed any problems as a result of this, and the crashes have stopped for aurora since uplifting. Still seeing recent reports from beta users.
Approval Request Comment
[Feature/regressing bug #]: N/A
[User impact if declined]: May crash when displaying through an animation.
[Describe test coverage new/current, TreeHerder]: Only has been observed via crash reports as condition necessary to reproduce depends on the system releasing graphics buffers (known possibility but poorly handled for animated images). Crash reports have ceased in nightly and aurora builds containing the patch since landing.
[Risks and why]: Very low risk. We crashed because we assumed if we have finished decoding, we must have all of the necessary frames, rather than check each frame before use. Now we no longer make that assumption and always check the frame before use. This makes the code path before and after finishing decoding the same, so it was already well tested.
[String/UUID change made/needed]: N/A
Attachment #8793336 -
Flags: approval-mozilla-beta?
Assignee | ||
Comment 27•8 years ago
|
||
I highly suspect the crash signature [@ mozilla::image::FrameAnimator::AdvanceFrame ] is a variant of the crash tracked here, but is also fixed by this patch.
Assignee | ||
Updated•8 years ago
|
Crash Signature: [@ mozilla::image::imgFrame::GetAnimationData()]
[@ mozilla::image::imgFrame::GetAnimationData() const ]
[@ mozilla::image::imgFrame::GetAnimationData]
[@ mozilla::image::imgFrame::GetAnimationData const ] → [@ mozilla::image::imgFrame::GetAnimationData()]
[@ mozilla::image::imgFrame::GetAnimationData() const ]
[@ mozilla::image::imgFrame::GetAnimationData]
[@ mozilla::image::imgFrame::GetAnimationData const ]
[@ mozilla::image::FrameAnimator::AdvanceFram…
Comment on attachment 8793336 [details] [diff] [review]
Ignore decoding complete and always check next frame exists, v2
Fixes a crash, has stabilized on Nightly and Aurora, Beta50+
Attachment #8793336 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
status-firefox50:
--- → affected
Comment 29•8 years ago
|
||
bugherder uplift |
Comment 30•8 years ago
|
||
Bug 1346510 provides a more likely seeming explanation for how we were hitting this case.
You need to log in
before you can comment on or make changes to this bug.
Description
•