Bug 1396870 (Closed): opened 7 years ago, closed 7 years ago

imgLoader::CheckCacheLimits() janks vnc_playback.html

Categories

Component: Core :: Graphics: ImageLib
Version: 53 Branch
Type: enhancement
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla57
Performance Impact: high
Tracking Status: firefox57 --- fixed

People

(Reporter: bkelly, Assigned: bkelly)

References

(Depends on 1 open bug)

Attachments

(2 files)

+++ This bug was initially created as a clone of Bug #1369317 +++

While we have fixed some data: URL performance issues, vnc_playback.html is still seeing jank. Profile:

https://perfht.ml/2w3mLa9

For example, destroying HTMLImageElement objects seems quite slow because of some binary heap operations in imgCache.
It looks like a lot of this is coming from imgLoader::CheckCacheLimits() here:

http://searchfox.org/mozilla-central/source/image/imgLoader.cpp#1624

We should probably limit how much work we do at any one time in this loop and maybe defer it to an idle runnable.

Timothy, what do you think?
Flags: needinfo?(tnikkel)
Summary: destroy HTMLImageElement objects janks vnc_playback.html → imgLoader::CheckCacheLimits() janks vnc_playback.html
I see all the time spent in imgLoader::CheckCacheLimits when it is being called from IdleRunnableWrapper, which is calling AsyncFreeSnowWhite. So isn't all this already being done on an idle runnable (which seems to have bad timing)?
Flags: needinfo?(tnikkel)
(In reply to Timothy Nikkel (:tnikkel) from comment #2)
> I see all the time spent in imgLoader::CheckCacheLimits when it is being
> called from IdleRunnableWrapper which is calling AsyncFreeSnowWhite. So
> isn't all this already being done on an idle runnable (which seems to have
> bad timing)?

Yes, but it's running a while loop with no break.  I was suggesting it could stop the loop early if it's exceeding some time budget (4 ms, etc.) and then schedule another idle runnable to start the loop again.
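
Roughly like this; just a sketch with made-up names, not the real imgLoader code (a real version would go through our idle dispatch machinery rather than a bare ScheduleIdle() stub):

#include <chrono>
#include <deque>

// Hypothetical sketch of a time-budgeted eviction loop; all names here
// are invented for illustration.
struct BudgetedEvictor {
  std::deque<int> entries;   // stand-in for the image cache queue
  size_t limit = 100;        // stand-in for the cache size limit

  // Pretend this posts another idle runnable that calls Run() again.
  void ScheduleIdle() {}

  void Run() {
    using Clock = std::chrono::steady_clock;
    const auto deadline = Clock::now() + std::chrono::milliseconds(4);

    while (entries.size() > limit) {
      entries.pop_front();              // evict one entry
      if (Clock::now() >= deadline) {   // budget exhausted
        ScheduleIdle();                 // finish later instead of janking
        return;
      }
    }
  }
};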

I see now, though, that it's actually spending most of the time in make_heap(), which is probably triggered by the IsDirty() check firing Refresh():

http://searchfox.org/mozilla-central/source/image/imgLoader.cpp#1011

So it seems like we're constantly re-sorting the queue during the destruction of the HTMLImageElements for some reason.  I wonder if it's re-marking itself dirty within the loop somehow.
Oh, I see.  Every HTMLImageElement that is destroyed triggers this:

* ~HTMLImageElement() calls DestroyImageLoadingContent()
* DestroyImageLoadingContent() ends up calling imgRequestProxy::CancelAndForgetObserver()
* imgRequestProxy::CancelAndForgetObserver() eventually calls imgLoader::SetHasNoProxies()
* imgLoader::SetHasNoProxies() does a queue.Push(), which dirties the list
* imgLoader::SetHasNoProxies() does a CheckCacheLimits(), which may completely re-sort the list because it's dirty

I wonder why queue.Push() doesn't do a std::push_heap() if the queue is not dirty.  That is only O(log n), while make_heap() is O(n) (up to ~3n comparisons).  You would have to do a lot of Push() calls to make it worth doing the full make_heap().
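
In std terms the idea is roughly this (a sketch only; the real imgCacheQueue holds refcounted imgCacheEntry objects with its own comparator, and these names are made up):

#include <algorithm>
#include <vector>

struct HeapQueue {
  std::vector<int> q;   // kept in max-heap order while clean
  bool dirty = false;

  void Push(int v) {
    q.push_back(v);
    if (!dirty) {
      // O(log n): sift the new element into place, keeping the heap clean.
      std::push_heap(q.begin(), q.end());
    }
    // If already dirty, defer the O(n) make_heap() to the next Refresh().
  }

  void Refresh() {
    std::make_heap(q.begin(), q.end());  // O(n), up to ~3n comparisons
    dirty = false;
  }

  int Pop() {  // precondition: !q.empty()
    if (dirty) {
      Refresh();
    }
    std::pop_heap(q.begin(), q.end());   // O(log n)
    int top = q.back();
    q.pop_back();
    return top;
  }
};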
(In reply to Ben Kelly [:bkelly] from comment #4)
> Oh, I see.  Every HTMLImageElement that is destroyed triggers this:
> 
> * ~HTMLImageElement() calls DestroyImageLoadingContent()
> * DestroyImageLoadingContent() ends up calling
> imgRequestProxy::CancelAndForgetObserver()
> * imgRequestProxy::CancelAndForgetObserver() eventually calls
> imgLoader::SetHasNoProxies()
> * imgLoader::SetHasNoProxies() does a queue.Push(), which dirties the list
> * imgLoader::SetHasNoProxies() does a CheckCacheLimits(), which may
> completely re-sort the list because it's dirty
> 
> I wonder why queue.Push() doesn't do a std::push_heap() if the queue is not
> dirty.  That is only O(log n), while make_heap() is O(n) (up to ~3n
> comparisons).  You would have to do a lot of Push() calls to make it worth
> doing the full make_heap().

Good find. I doubt there is a good reason. The code dates back to when the image cache was first created, so the issue may simply never have been considered.
I have a patch.  Let me see how it looks on try and maybe we can ask Pierre to test with it.
Attachment #8904701 - Flags: review+
Whiteboard: [qf] → [qf:p1]
I think that to make the previous patch more effective we should avoid dirtying the queue in a few more places.  This patch:

1. Avoids marking an empty or single-entry queue as dirty.  By definition the sort can't be wrong for these queues.
2. Keeps the sort order intact when Remove() is called on the first entry in a clean list.

For example, previously calling EvictEntries() would leave the queue dirtied.  The dirty flag would then persist while many new entries were added, and the next Pop() would trigger a painful full sort.  This patch addresses that by clearing the dirty flag when the queue becomes empty, so items can be inserted efficiently as we go.
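
A sketch of the removal paths, continuing the earlier made-up std-only HeapQueue sketch (again not the actual imgCacheQueue code):

#include <algorithm>
#include <vector>

struct HeapQueue {
  std::vector<int> q;   // kept in max-heap order while clean
  bool dirty = false;

  void Remove(int v) {
    if (!dirty && !q.empty() && q.front() == v) {
      // Removing the front of a clean heap keeps the rest valid:
      // pop_heap moves it to the back in O(log n), no dirtying needed.
      std::pop_heap(q.begin(), q.end());
      q.pop_back();
      return;
    }
    auto it = std::find(q.begin(), q.end(), v);
    if (it == q.end()) {
      return;               // not present; heap untouched
    }
    q.erase(it);            // the general case breaks the heap order...
    dirty = q.size() > 1;   // ...unless 0 or 1 entries remain, which are
                            // trivially sorted.
  }
};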

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a5e567a720626553455f24fbc44f8a8accd41f69
Attachment #8904726 - Flags: review?(tnikkel)
Blocks: 1396982
Attachment #8904726 - Flags: review?(tnikkel) → review+
Pierre, can you test one of the builds in this try push to see whether the jank is improved?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a5e567a720626553455f24fbc44f8a8accd41f69

For example, if you are on Linux, I think this download link should work:

https://queue.taskcluster.net/v1/task/FA4EfneFRaeJiBf2dkjkfw/runs/0/artifacts/public/build/target.tar.bz2

Or Mac:

https://queue.taskcluster.net/v1/task/E1fUpLb7TLS8pg8dI3ydiQ/runs/0/artifacts/public/build/target.dmg

(Note: on Mac you have to turn off the app signing checks in System Preferences.)

The Windows build is not done yet, but it should be ready later this evening.

Thanks!
Flags: needinfo?(ossman)
Pushed by bkelly@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c3218dfedd68
P1 Avoid dirtying the image cache queue when inserting a new item into a clean queue. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/5ac7748ad699
P2 Avoid dirtying when removing front entry or when the queue is empty. r=tnikkel
Visually it seems to be a lot better. No clear janks in the animation. However, the run times still vary quite a bit.

I've uploaded another trace here:

https://perfht.ml/2wFhfOV

Something differs, though, as I'm only getting a single content row, and there's more profiling data missing at the start of that row.
Flags: needinfo?(ossman)
https://hg.mozilla.org/mozilla-central/rev/c3218dfedd68
https://hg.mozilla.org/mozilla-central/rev/5ac7748ad699
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
(In reply to Pierre Ossman from comment #11)
> Visually it seems to be a lot better. No clear janks in the animation.

Great!

> However the run times still vary quite a bit.

The cause of that might be hard to track down, unfortunately.  As we move more work to idle dispatch, this should improve.

If you have time to capture a profile for both a slow run and a fast run we might be able to see what the differences are between them.

> I've uploaded another trace here:
> 
> https://perfht.ml/2wFhfOV

Thanks!  I filed bug 1397297 to maybe improve our memory management a bit more here.  I also filed bug 1397304 for another minor image cache optimization.  It probably won't make a huge difference here, though.

> Something differs, though, as I'm only getting a single content row, and
> there's more profiling data missing at the start of that row.

This just means you only had a single content process running.  That could be because your profile was opted into single-process mode by one of our experiments, or because you only opened one tab.  The "missing" data at the start is likely just the time before the content process started.  So I think the profile is normal.

By the way, I'm impressed you got symbols in this profile from the builds I gave you.  I don't think the tool does that automatically for the try builds.  Did you load the symbols manually?

Thanks again!
(In reply to Ben Kelly [:bkelly] from comment #13)
> By the way, I'm impressed you got symbols in this profile from the builds I
> gave you.  I don't think the tool does that automatically for the try
> builds.  Did you load the symbols manually?

Apparently the Linux build has symbols in the binary, so I assume that's how it worked.  Sorry for my confusion.
(In reply to Ben Kelly [:bkelly] from comment #13)
> 
> If you have time to capture a profile for both a slow run and a fast run we
> might be able to see what the differences are between them.
> 

Same scenario, but real time run:

https://perfht.ml/2wGjS2Q

It seems like the profiler is limited in how much data it stores? The run was actually 7 minutes.
(In reply to Pierre Ossman from comment #15)
> (In reply to Ben Kelly [:bkelly] from comment #13)
> > 
> > If you have time to capture a profile for both a slow run and a fast run we
> > might be able to see what the differences are between them.
> > 
> 
> Same scenario, but real time run:
> 
> https://perfht.ml/2wGjS2Q
> 
> It seems like the profiler is limited in how much data it stores? The run
> was actually 7 minutes.

Hmm, yeah, the sample buffer is limited.  It may not be possible to measure the entire length of the run.

Did this profile run faster than the last one?
You can increase the buffer size to record longer runs: open the profiler settings and adjust the "Buffer size" slider.
(In reply to Ben Kelly [:bkelly] from comment #16)
> 
> Did this profile run faster than the last one?

I think I might have misunderstood what you asked for. Did you want two as-fast-as-possible runs that simply ended up having different run times? :)
If so, that's already in the previous trace. You can see a little peak in the "Content" graph between runs, so just split the trace at each such peak and you should be able to compare.
Performance Impact: --- → P1
Whiteboard: [qf:p1]