Closed Bug 1109283 Opened 5 years ago Closed 5 years ago

Debug build shutdown crash in mozilla::image::SurfaceCacheImpl::StopTracking, when starting Firefox with no profile. (with "###!!! ASSERTION: Tried to remove an object that's not tracked: 'state->IsTracked()', nsExpirationTracker.h, line 141")

Categories

(Core :: ImageLib, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla37
Tracking Status
firefox36 --- fixed
firefox37 --- fixed

People

(Reporter: dholbert, Assigned: seth)

References

Details

(Keywords: assertion, crash, regression)

Attachments

(2 files, 1 obsolete file)

Attached file backtrace
STR:
 1. In a debug build, run:
    ./dist/bin/firefox -profile ./bogusPath -no-remote
  (where "./bogusPath" is a path to a nonexistant directory)

 2. Click "OK" on the "Profile Missing" dialog that pops up.

ACTUAL RESULTS:
Assertion followed by crash.
The assertion is:
###!!! ASSERTION: Tried to remove an object that's not tracked: 'state->IsTracked()', file ../../dist/include/nsExpirationTracker.h, line 141

And the crash happens *while* we're trying to evaluate this assertion's condition:
> 144     NS_ASSERTION(generation.Length() > index &&
> 145                  generation[index] == aObj, "Object is lying about its index");
https://mxr.mozilla.org/mozilla-central/source/xpcom/ds/nsExpirationTracker.h#144

I'm not exactly sure why we crash there -- gdb told me 'index' was 173693530, and it crashed while trying to evaluate the result of "p generation.Length()".

This is inside of ~RasterImage's call into SurfaceCache::RemoveImage, FWIW. Full backtrace attached.
Attachment #8533931 - Attachment description: shutdown-crash.txt → backtrace
Flags: needinfo?(seth)
Keywords: regression
On OS X the STR above don't work for me, but I can reproduce by running:

> ./obj/dist/NightlyDebug.app/Contents/MacOS/firefox -ProfileManager -no-remote

And then clicking on the "Exit" button.
Flags: needinfo?(seth)
So during XPCOM shutdown, nsExpirationTracker::AddObject can fail because it tries to call |do_CreateInstance("@mozilla.org/timer;1")| via nsExpirationTracker::CheckStartTimer. This doesn't work during shutdown because XPCOM.

The solution is to check if that happened and warn instead of trying to remove the surface from the nsExpirationTracker in SurfaceCacheImpl::StopTracking. For now I'm not inclined to do anything special in StartTracking when this happens, even though we can detect it there; this is generally harmless since the surface will get freed when the image is released or imgIContainer::RequestDiscard is called or if we have a memory pressure event. And it seems almost impossible to hit this except during XPCOM shutdown.
This patch implements the strategy described in the previous comment. After applying this patch, I can no longer reproduce an assertion or a crash using the STR in this bug. (I do sometimes get the warning added in this patch, which is expected!)
Attachment #8534010 - Flags: review?(dholbert)
Assignee: nobody → seth
Status: NEW → ASSIGNED
Comment on attachment 8534010 [details] [diff] [review]
Handle failure of nsExpirationTracker::AddObject in the SurfaceCache

Looks good! Just a few nits:

>+++ b/image/src/SurfaceCache.cpp
>@@ -380,33 +380,40 @@ public:
> 
>     mAvailableCost -= costEntry.GetCost();
> 
>     if (aSurface->IsLocked()) {
>       mLockedCost += costEntry.GetCost();
>       MOZ_ASSERT(mLockedCost <= mMaxCost, "Locked more than we can hold?");
>     } else {
>       mCosts.InsertElementSorted(costEntry);
>+      // This may fail during XPCOM shutdown; we need to ensure the object is
>+      // tracked before calling RemoveObject in StopTracking, below.
>       mExpirationTracker.AddObject(aSurface);

Nit:
 s/XPCOM shutdown; we/XPCOM shutdown, which means we/

(Otherwise it sounds a bit like you're just stating two independent facts. The causal relationship is important to emphasize.)

>   void StopTracking(CachedSurface* aSurface)
>   {
[...]
>+      if (aSurface->GetExpirationState()->IsTracked()) {
>+        mExpirationTracker.RemoveObject(aSurface);
>+      } else {
>+        NS_WARNING("Not expiration tracking an unlocked surface!");
>+      }

(1) Maybe use MOZ_LIKELY() on this condition? It should be extremely rare that we fail the check.

(2) Add a comment somewhere near this code, explaining why we're bothering with the IsTracked() check.  (You have a comment in StartTracking, but don't have one here.)

e.g. maybe add this to the clause with the NS_WARNING:
        // Our AddObject() call in StartTracking() must've failed (e.g. due to OOM or
        // because xpcom is being shut down).

(3) Maybe add a hyphen to "expiration-tracking", in the NS_WARNING, to make it clearer where the verb in that sentence begins & ends. :)
Attachment #8534010 - Flags: review?(dholbert) → review+
Sure, I'll make those changes. Thanks for the review!
Attachment #8534010 - Attachment is obsolete: true
Here's a try job, out of paranoia:

https://tbpl.mozilla.org/?tree=Try&rev=06628cbbf103
https://hg.mozilla.org/mozilla-central/rev/6bfaf3bde1a0
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Comment on attachment 8534173 [details] [diff] [review]
Handle failure of nsExpirationTracker::AddObject in the SurfaceCache

Approval Request Comment
[Feature/regressing bug #]: 1060869
[User impact if declined]: Potential crashes at shutdown.
[Describe test coverage new/current, TBPL]: On central.
[Risks and why]: Very low risk. Just makes us check if an operation has failed before we rely on its result.
[String/UUID change made/needed]: None.
Attachment #8534173 - Flags: approval-mozilla-aurora?
Attachment #8534173 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Depends on: 1161722
You need to log in before you can comment on or make changes to this bug.