Free ImageSurfaceCache outside of the SurfaceCache mutex lock

RESOLVED FIXED in Firefox 57

Status

()

enhancement
P3
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: aosmond, Assigned: aosmond)

Tracking

unspecified
mozilla57
Points:
---

Firefox Tracking Flags

(firefox57 fixed)

Details

(Whiteboard: [gfx-noted])

Attachments

(3 attachments, 6 obsolete attachments)

1.42 KB, patch
tnikkel
: review+
Details | Diff | Splinter Review
3.45 KB, patch
aosmond
: review+
Details | Diff | Splinter Review
11.83 KB, patch
tnikkel
: review+
Details | Diff | Splinter Review
(Assignee)

Description

2 years ago
Whenever we need to free an ImageSurfaceCache object, we should do so outside of the SurfaceCache mutex lock to avoid deadlock scenarios. There is a problem today where the object stored inside the cache (i.e. the surface provider) needs to free other objects (i.e. RasterImage, Decoder, SourceBuffer) which can call back into the SurfaceCache (which is not re-entrant). We work around this by doing a dispatch, like so:

http://searchfox.org/mozilla-central/rev/4b79f3b23aebb4080ea85e94351fd4046116a957/image/AnimationSurfaceProvider.cpp#46

Ideally we could remove the dispatches and always free immediately. This will be useful for changes such as those in bug 523950 which keep the image and/or decoder objects around for a lot longer than is typical today (e.g. until shutdown) which will either trigger the deadlock detector (because we hold the same lock twice) or the leak detector (because our system group dispatch gets tossed on the floor).

This can be accomplished by keeping an array inside the SurfaceCache which is a holding area for ImageSurfaceCache objects that we wish to discard. Whenever we call an API which may cause a discard, we need to move the array contents into an array on the stack while holding the lock, and release the stack array outside the lock.
(Assignee)

Updated

2 years ago
Assignee: nobody → aosmond
Status: NEW → ASSIGNED
Priority: -- → P3
Whiteboard: [gfx-noted]
(Assignee)

Comment 4

2 years ago
I realized I missed the most important part -- shutdown should also release the instance outside the lock too!

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=963bb2060a12b616bfdda821c5bdf4221f83fd2f
Attachment #8896268 - Attachment is obsolete: true
(Assignee)

Comment 5

2 years ago
Fix minor nit, I had a different approach initially and that was left over in the DiscardAll method that should not have been altered. No functional change.
Attachment #8896282 - Attachment is obsolete: true
(Assignee)

Updated

2 years ago
Blocks: 523950
(Assignee)

Updated

2 years ago
Attachment #8896267 - Flags: review?(nfroyd)
(Assignee)

Updated

2 years ago
Attachment #8896284 - Flags: review?(tnikkel)
(Assignee)

Updated

2 years ago
Attachment #8896269 - Flags: review?(tnikkel)
Comment on attachment 8896267 [details] [diff] [review]
Part 1. Add nsExpirationTracker::NotifyEndTransaction(Locked) callbacks., v1

Review of attachment 8896267 [details] [diff] [review]:
-----------------------------------------------------------------

r=me, but let's discuss the below to see if we could do better for the common case here.

::: xpcom/ds/nsExpirationTracker.h
@@ +346,5 @@
> +   * This may be overridden to perform any post-aging work that needs to be
> +   * done while still holding the lock. It will be called once after each timer
> +   * event, and each low memory event has been handled.
> +   */
> +  virtual void NotifyEndTransactionLocked(const AutoLock&) { };

Hm, I guess we don't want to call this NotifyEndAging{,Locked} because it's not being called after all calls to Age*.  NotifyHandlerEnd{,Locked}, to suggest that it's called from HandleLowMemory and HandleTimeout?  "Transaction" just doesn't feel like the best description of what's going on here.

@@ +353,5 @@
> +   * This may be overridden to perform any post-aging work that needs to be
> +   * done outside the lock. It will be called once after each
> +   * NotifyEndTransactionLocked call.
> +   */
> +  virtual void NotifyEndTransaction() { };

I'm not super-excited to add virtual functions which won't get much use.

Could we keep these, and add `final` overrides in nsExpirationTracker that do nothing, so maybe in the common case, the compiler can see that the virtual calls can actually be lowered to normal calls, and then inlined, which would make the overhead go away?  I think that's compatible with your overrides in SurfaceTracker.

I'm not sure if that's what would actually happen--whether the compiler is smart enough to do all that--but it'd be worth a shot, and would probably merit a comment describing what we're doing.  I'm also not sure if subclasses of nsExpirationTracker would actually want to override the methods...but if we come to that bridge, there are ways of solving that problem.
Attachment #8896267 - Flags: review?(nfroyd) → review+
(Assignee)

Comment 7

2 years ago
(In reply to Nathan Froyd [:froydnj] from comment #6)
> Comment on attachment 8896267 [details] [diff] [review]
> Part 1. Add nsExpirationTracker::NotifyEndTransaction(Locked) callbacks., v1
> 
> Review of attachment 8896267 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r=me, but let's discuss the below to see if we could do better for the
> common case here.
> 
> ::: xpcom/ds/nsExpirationTracker.h
> @@ +346,5 @@
> > +   * This may be overridden to perform any post-aging work that needs to be
> > +   * done while still holding the lock. It will be called once after each timer
> > +   * event, and each low memory event has been handled.
> > +   */
> > +  virtual void NotifyEndTransactionLocked(const AutoLock&) { };
> 
> Hm, I guess we don't want to call this NotifyEndAging{,Locked} because it's
> not being called after all calls to Age*.  NotifyHandlerEnd{,Locked}, to
> suggest that it's called from HandleLowMemory and HandleTimeout? 
> "Transaction" just doesn't feel like the best description of what's going on
> here.
> 

Done.

> @@ +353,5 @@
> > +   * This may be overridden to perform any post-aging work that needs to be
> > +   * done outside the lock. It will be called once after each
> > +   * NotifyEndTransactionLocked call.
> > +   */
> > +  virtual void NotifyEndTransaction() { };
> 
> I'm not super-excited to add virtual functions which won't get much use.
> 
> Could we keep these, and add `final` overrides in nsExpirationTracker that
> do nothing, so maybe in the common case, the compiler can see that the
> virtual calls can actually be lowered to normal calls, and then inlined,
> which would make the overhead go away?  I think that's compatible with your
> overrides in SurfaceTracker.
> 
> I'm not sure if that's what would actually happen--whether the compiler is
> smart enough to do all that--but it'd be worth a shot, and would probably
> merit a comment describing what we're doing.  I'm also not sure if
> subclasses of nsExpirationTracker would actually want to override the
> methods...but if we come to that bridge, there are ways of solving that
> problem.

I implemented it as above, works for me. In the worst case where it is not smart enough, it will need to add an extra entry to its vtable that was not there before (for each subclass).

I guess the alternative would be to try and break the Notify* callbacks into another template to remove the virtual keyword entirely, which I assume will make it more likely to inline and remove the overhead.
Attachment #8896267 - Attachment is obsolete: true
(Assignee)

Updated

2 years ago
Attachment #8896399 - Flags: review+
(Assignee)

Comment 8

2 years ago
Update given the changes in part 1.
Attachment #8896284 - Attachment is obsolete: true
Attachment #8896284 - Flags: review?(tnikkel)
Attachment #8896400 - Flags: review?(tnikkel)
(Assignee)

Comment 9

2 years ago
Add an assert to TakeImageCachesDiscard to ensure the accepting discard array starts empty. If it isn't then that will cause a free while we hold a lock. That should never happen, but the SurfaceTracker case does involve a class variable instead of a stack variable like the rest.
Attachment #8896400 - Attachment is obsolete: true
Attachment #8896400 - Flags: review?(tnikkel)
Attachment #8896406 - Flags: review?(tnikkel)
(Assignee)

Updated

2 years ago
Attachment #8896406 - Flags: review?(tnikkel)
(Assignee)

Comment 10

2 years ago
Comment on attachment 8896269 [details] [diff] [review]
Part 3. AnimationSurfaceProvider no longer needs to always dispatch to free its RasterImage., v1

Doing some further testing, it definitely fixes the leak messages, but I still get the deadlock messages. I don't know how I missed them. Ugh. This needs to fix ImageSurfaceCache::Remove really. (Still need P1 though.)
Attachment #8896269 - Flags: review?(tnikkel)
(Assignee)

Comment 11

2 years ago
So it turns out it did work for the scenarios I originally tested (I would see on shutdown for example and that got fixed), but about:memory minimize did not (although with most other cases). Now it should be properly fixed. Just in case though, let's run the linux64 tests with the animated image decoding changes which should help trigger the deadlock case a lot more.

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4cc636c688fe084f94995c6ad0a28273ef78d623
Attachment #8896406 - Attachment is obsolete: true
(Assignee)

Updated

2 years ago
Attachment #8896530 - Flags: review?(tnikkel)
(Assignee)

Updated

2 years ago
Attachment #8896269 - Flags: review?(tnikkel)
Comment on attachment 8896530 [details] [diff] [review]
Part 2. Make the SurfaceCache free ImageSurfaceCache objects outside of the lock., v6

Hmm, we call Lookup pretty often, are we sure we want to do discarding of all surfaces in it?
(Assignee)

Comment 13

2 years ago
(In reply to Timothy Nikkel (:tnikkel) from comment #12)
> Comment on attachment 8896530 [details] [diff] [review]
> Part 2. Make the SurfaceCache free ImageSurfaceCache objects outside of the
> lock., v6
> 
> Hmm, we call Lookup pretty often, are we sure we want to do discarding of
> all surfaces in it?

Each call which may cause surfaces to be discarded should have a corresponding call to SurfaceCacheImpl::TakeDiscard. As a result, the calls shouldn't be freeing any more surfaces than they have historically. The only time Lookup will cause a surface to be freed is if a volatile buffer was purged.

That said, it does add a small amount of overhead to construct/destruct the LookupResult and nsTArray<RefPtr<CachedSurface>> objects on the stack. In theory it should be fairly rare and we could potentially let some other event clean it up for us at a less critical juncture...? We would be hanging onto the surface provider longer than we need to however. Worst case, when the expiration tracker next expires?
Attachment #8896530 - Flags: review?(tnikkel) → review+
Attachment #8896269 - Flags: review?(tnikkel) → review+

Comment 14

2 years ago
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/635b60924015
Part 1. Add nsExpirationTracker::NotifyEndTransaction(Locked) callbacks for subclasses to know when an aging iteration is complete. r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/3f2e443dd48c
Part 2. Make the SurfaceCache free ImageSurfaceCache objects outside of the lock. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/9ddc28603bea
Part 3. AnimationSurfaceProvider no longer needs to always dispatch to free its RasterImage. r=tnikkel

Comment 15

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/635b60924015
https://hg.mozilla.org/mozilla-central/rev/3f2e443dd48c
https://hg.mozilla.org/mozilla-central/rev/9ddc28603bea
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in before you can comment on or make changes to this bug.