Crash in ExpirationTrackerImpl<T>::RemoveObjectLocked

RESOLVED FIXED in Firefox 56

Status

()

defect
P3
critical
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: calixte, Assigned: aosmond)

Tracking

({crash})

57 Branch
mozilla57
Unspecified
All
Points:
---
Bug Flags:
qe-verify -

Firefox Tracking Flags

(firefox-esr52 wontfix, firefox55 wontfix, firefox56 fixed, firefox57 verified)

Details

(Whiteboard: [gfx-noted], crash signature)

Attachments

(2 attachments, 1 obsolete attachment)

This bug was filed from the Socorro interface and is 
report bp-875c3e4f-d371-4700-8655-84dee0170906.
=============================================================

There are 16 crashes in nightly 57, 167 in beta 56, 168 in release 55 and 227 in esr 52.

:aosmond, could you investigate please ?
Flags: needinfo?(aosmond)
I can guess what happened. ExpirationTrackerImpl::AddObjectLocked can fail due to a number of conditions. If it fails even once, it can cause a signature similar to this crash later on.
Assignee: nobody → aosmond
Status: NEW → ASSIGNED
Component: Graphics → ImageLib
Priority: -- → P3
Whiteboard: [gfx-noted]
This patch discards the surface if it cannot track it (only can happen if it is unlocked, so in theory, safe to discard even if premature).

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=26d2e2d084091eccbe8f56875aeb11d661a7dc2e
Flags: needinfo?(aosmond)
Attachment #8905159 - Flags: review?(tnikkel)
Comment on attachment 8905159 [details] [diff] [review]
Gracefully handle failures in SurfaceCacheImpl::StartTracking., v1

Hmm, just realized that MarkUsed suffers from the same problem. Hold on.
Attachment #8905159 - Flags: review?(tnikkel)
(In reply to Andrew Osmond [:aosmond] from comment #4)
> Created attachment 8905199 [details] [diff] [review]
> Gracefully handle failures in SurfaceCacheImpl::StartTracking., v2
> 
> Let's try that again...
> 
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=84ff209245173549bb463f8c2ab6ec781ad3a522

Dammit...thinking once more... Handling cache insert and unlock failures with removals was probably okay. Handling *mark used* the same way might be not the best thing to do, as we clearly *just accessed* the surface. Timothy, maybe we are better off just adding some extra state which indicates whether or not we *expect* the surface to be in the tracker (simply so we can assert on it to catch any coding errors...) rather than removing it or asserting it is in shutdown (could easily be an OOM)?
(In reply to Andrew Osmond [:aosmond] from comment #5)
> (In reply to Andrew Osmond [:aosmond] from comment #4)
> > Created attachment 8905199 [details] [diff] [review]
> > Gracefully handle failures in SurfaceCacheImpl::StartTracking., v2
> > 
> > Let's try that again...
> > 
> > https://treeherder.mozilla.org/#/
> > jobs?repo=try&revision=84ff209245173549bb463f8c2ab6ec781ad3a522
> 
> Dammit...thinking once more... Handling cache insert and unlock failures
> with removals was probably okay. Handling *mark used* the same way might be
> not the best thing to do, as we clearly *just accessed* the surface.
> Timothy, maybe we are better off just adding some extra state which
> indicates whether or not we *expect* the surface to be in the tracker
> (simply so we can assert on it to catch any coding errors...) rather than
> removing it or asserting it is in shutdown (could easily be an OOM)?

I think it's reasonable to handle MarkUsed that way even if we just accessed the surface. If it fails we are in bad shape and we shouldn't try to pretend to be doing well.
Attachment #8905199 - Flags: review?(tnikkel) → review+
Pushed by philringnalda@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1e6956da301f
Gracefully handle failures in SurfaceCacheImpl::StartTracking. r=tnikkel
https://hg.mozilla.org/mozilla-central/rev/1e6956da301f
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Going off crash-stats, it looks like this patch was effective. Can you please provide a rebased patch for Beta (doesn't graft cleanly) and nominate it for approval? Thanks!
Flags: needinfo?(aosmond)
Crash Signature: [@ ExpirationTrackerImpl<T>::RemoveObjectLocked] → [@ ExpirationTrackerImpl<T>::RemoveObjectLocked] [@ InvalidArrayIndex_CRASH | ExpirationTrackerImpl<T>::RemoveObjectLocked]
Flags: needinfo?(aosmond)
try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=50faf92d5a8b1239afb6f86a089c96f146b1eda2

Rebased for mozilla-beta. Waiting on the try results before requesting uplift to make sure I didn't make any mistakes.
Attachment #8906608 - Flags: review+
Comment on attachment 8906608 [details] [diff] [review]
[uplift for beta] Gracefully handle failures in SurfaceCacheImpl::StartTracking., v1 [carries r=tnikkel]

Approval Request Comment
[Feature/Bug causing the regression]: N/A
[User impact if declined]: Occasional crashes due to OOM and/or during shutdown, mostly in content process.
[Is this code covered by automated tests?]: No. We test the surface cache which are where the changes are, but we never seem to hit the OOM or shutdown condition necessary to crash.
[Has the fix been verified in Nightly?]: Yes. Crash reports stopped as soon as the fix landed in nightly.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: It is well contained in one module and boils down to actually handling previously ignored error codes.
[String changes made/needed]: None.
Attachment #8906608 - Flags: approval-mozilla-beta?
Comment on attachment 8906608 [details] [diff] [review]
[uplift for beta] Gracefully handle failures in SurfaceCacheImpl::StartTracking., v1 [carries r=tnikkel]

Crash fix, looks useful from nightly results, let's take this for beta 11.
Attachment #8906608 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Andrew Osmond [:aosmond] from comment #11)
> [Needs manual test from QE? If yes, steps to reproduce]: No.
Marking this issue as qe-verify -, per Andrew's assessment on manual qa needs.
Flags: qe-verify-
No more crash on 57.
Bah, I uplifted this yesterday but forgot to mark the bug afterward. This made it in for b11.
https://hg.mozilla.org/releases/mozilla-beta/rev/0b5841c7c399
I looked at the crashes with the given signatures on 56.0b11 -- they are completely different issues from this one, so I wouldn't expect the crash rate to go to zero.
You need to log in before you can comment on or make changes to this bug.