Closed Bug 1396870 Opened 7 years ago Closed 7 years ago

imgLoader::CheckCacheLimits() janks vnc_playback.html

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla57

Project Flags:

Performance Impact

high

Tracking Flags:

Tracking

Status

firefox57

---

fixed

People

(Reporter: bkelly, Assigned: bkelly)

References

(Depends on 1 open bug)

Details

Attachments

(2 files)

Avoid dirting the image cache queue when inserting a new item into a clean queue. r=tnikkel 7 years ago Ben Kelly [:bkelly, not reviewing] 1.28 KB, patch	tnikkel : review+	Details \| Diff \| Splinter Review
P2 Avoid dirtying when removing front entry or when the queue is empty. r=tnikkel 7 years ago Ben Kelly [:bkelly, not reviewing] 2.25 KB, patch	tnikkel : review+	Details \| Diff \| Splinter Review

Ben Kelly [:bkelly, not reviewing]

Assignee

Description

•

7 years ago

+++ This bug was initially created as a clone of Bug #1369317 +++ While we have fixed some data:url performance issues vnc_playback.html is still seeing jank. Profile: https://perfht.ml/2w3mLa9 For example, destroying HtmlImageElements seems quite slow because of some binary heap operations in imgCache.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 1

•

7 years ago

Looks like a lot of this is coming from imgLoader::CheckCacheLimits() here: http://searchfox.org/mozilla-central/source/image/imgLoader.cpp#1624 We should probably limit how much work we do at any one time in this loop and maybe defer it to an idle runnable. Timothy, what do you think?

Flags: needinfo?(tnikkel)

Summary: destroy HTMLImageElement objects janks vnc_playback.html → imgLoader::CheckCacheLimits() janks vnc_playback.html

Timothy Nikkel (:tnikkel)

Comment 2

•

7 years ago

I see all the time spent in imgLoader::CheckCacheLimits when it is being called from IdleRunnableWrapper which is calling AsyncFreeSnowWhite. So isn't all this already being done on an idle runnable (which seems to have bad timing)?

Flags: needinfo?(tnikkel)

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 3

•

7 years ago

(In reply to Timothy Nikkel (:tnikkel) from comment #2) > I see all the time spent in imgLoader::CheckCacheLimits when it is being > called from IdleRunnableWrapper which is calling AsyncFreeSnowWhite. So > isn't all this already being done on an idle runnable (which seems to have > bad timing)? Yes, but its running a while loop with no break. I was suggesting it could stop the loop early if its exceeding some time budget (4ms, etc) and then schedule another idle runnable to start the loop again. I see now, though, that its actually spending most of the time in make_heap() which is probably triggered from the IsDirty() check triggering Refresh(): http://searchfox.org/mozilla-central/source/image/imgLoader.cpp#1011 So it seems like we're constantly re-sorting the queue during the destruction of the HTMLImageElements for some reason. I wonder if its remarking itself dirty within the loop somehow.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 4

•

7 years ago

Oh, I see. Every HTMLImageElement that is destroyed triggers this: * ~HTMLImageElement() calls DestroyImageLoadingContent() * DestroyImageLoadingContent() ends up calling imgRequestProxy::CancelAndForgetObserver() * imgRequestProxy::CancelAndForgetObserver() eventually calls imgLoader::SetHasNoProxies() * imgLoader::SetHasNoProxies() does a queue.Push() which dirties the list * imgLoader::SetHasNoProxies() does a CheckCacheLimits() which may completely resort the list because its dirty I wonder why queue.Push() doesn't do a std::push_heap() if the queue is not dirty. That is only O(log n) while make_heap is always O(3 * n). You would have to do a lot of Push() calls to make it worth doing the full make_heap call.

Timothy Nikkel (:tnikkel)

Comment 5

•

7 years ago

(In reply to Ben Kelly [:bkelly] from comment #4) > Oh, I see. Every HTMLImageElement that is destroyed triggers this: > > * ~HTMLImageElement() calls DestroyImageLoadingContent() > * DestroyImageLoadingContent() ends up calling > imgRequestProxy::CancelAndForgetObserver() > * imgRequestProxy::CancelAndForgetObserver() eventually calls > imgLoader::SetHasNoProxies() > * imgLoader::SetHasNoProxies() does a queue.Push() which dirties the list > * imgLoader::SetHasNoProxies() does a CheckCacheLimits() which may > completely resort the list because its dirty > > I wonder why queue.Push() doesn't do a std::push_heap() if the queue is not > dirty. That is only O(log n) while make_heap is always O(3 * n). You would > have to do a lot of Push() calls to make it worth doing the full make_heap > call. Good find. I doubt there is a good reason. The code dates back to when the image cache was first created, so the issue could easily have not been considered.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 6

•

7 years ago

I have a patch. Let me see how it looks on try and maybe we can ask Pierre to test with it.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 7

•

7 years ago

Attached patch Avoid dirting the image cache queue when inserting a new item into a clean queue. r=tnikkel — Details — Splinter Review

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8927c12ec8bb848ed0ae4893e69eb9dcf0623297

Assignee: nobody → bkelly

Status: NEW → ASSIGNED

Timothy Nikkel (:tnikkel)

Updated

•

7 years ago

Attachment #8904701 - Flags: review+

Jean Gong :jgong

Updated

•

7 years ago

Whiteboard: [qf] → [qf:p1]

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 8

•

7 years ago

Attached patch P2 Avoid dirtying when removing front entry or when the queue is empty. r=tnikkel — Details — Splinter Review

I think to make the previous patch more effective we should try to avoid dirting the queue in some more places. This patch: 1. Avoids marking an empty or single-entry queue as dirty. By definition the sort can't be wrong for these queues. 2. Keeps the sort order intact when Remove() is called on the first entry in a clean list. For example, previously calling EvictEntries would leave the queue dirtied. This dirty flag would persist while many new entries are added. When a Pop() occurs then it triggers a painful sort. This patch tries to solve the problem by cleaning the dirty flag when empty so items can be efficiently inserted as we go. https://treeherder.mozilla.org/#/jobs?repo=try&revision=a5e567a720626553455f24fbc44f8a8accd41f69

Attachment #8904726 - Flags: review?(tnikkel)

Ben Kelly [:bkelly, not reviewing]

Assignee

Updated

•

7 years ago

Blocks: 1396982

Timothy Nikkel (:tnikkel)

Updated

•

7 years ago

Attachment #8904726 - Flags: review?(tnikkel) → review+

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 9

•

7 years ago

Pierre, can you test one of the builds in this try push to see if the jank is any improved? https://treeherder.mozilla.org/#/jobs?repo=try&revision=a5e567a720626553455f24fbc44f8a8accd41f69 For example, if you are on linux I think this download link should work: https://queue.taskcluster.net/v1/task/FA4EfneFRaeJiBf2dkjkfw/runs/0/artifacts/public/build/target.tar.bz2 Or mac: https://queue.taskcluster.net/v1/task/E1fUpLb7TLS8pg8dI3ydiQ/runs/0/artifacts/public/build/target.dmg (Note, for mac you have to turn off app signing checks in system preferences.) The windows build is not done yet, but should be later this evening. Thanks!

Flags: needinfo?(ossman)

Pulsebot

Comment 10

•

7 years ago

Pushed by bkelly@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/c3218dfedd68 P1 Avoid dirting the image cache queue when inserting a new item into a clean queue. r=tnikkel https://hg.mozilla.org/integration/mozilla-inbound/rev/5ac7748ad699 P2 Avoid dirtying when removing front entry or when the queue is empty. r=tnikkel

Pierre Ossman

Comment 11

•

7 years ago

Visually it seems to be a lot better. No clear janks in the animation. However the run times still vary quite a bit. I've uploaded another trace here: https://perfht.ml/2wFhfOV Something differs though as I'm only getting a single content row, and there's more profiling data missing in that start of that row.

Flags: needinfo?(ossman)

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 12

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/c3218dfedd68 https://hg.mozilla.org/mozilla-central/rev/5ac7748ad699

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox57: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla57

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 13

•

7 years ago

(In reply to Pierre Ossman from comment #11) > Visually it seems to be a lot better. No clear janks in the animation. Great! > However the run times still vary quite a bit. This might be hard to track down the cause, unfortunately. As we move more work to idle dispatch this should improve. If you have time to capture a profile for both a slow run and a fast run we might be able to see what the differences are between them. > I've uploaded another trace here: > > https://perfht.ml/2wFhfOV Thanks! I filed bug 1397297 to maybe improve our memory management a bit more here. I also filed bug 1397304 for another minor image cache optimization. It probably won't make a huge difference here, though. > Something differs though as I'm only getting a single content row, and > there's more profiling data missing in that start of that row. This just means you only had a single content process running. This could be because your profile was opted in to running a single process by one of our experiments or because you only opened one tab. The "missing" data at the start is likely just time before the content process started. So I think the profile is normal. By the way, I'm impressed you got symbols in this profile from the builds I gave you. I don't think the tool does that automatically for the try builds. Did you load the symbols manually? Thanks again!

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 14

•

7 years ago

(In reply to Ben Kelly [:bkelly] from comment #13) > By the way, I'm impressed you got symbols in this profile from the builds I > gave you. I don't think the tool does that automatically for the try > builds. Did you load the symbols manually? Apparently the linux build has symbols in the binary, so I assume thats how it worked. Sorry for my confusion.

Pierre Ossman

Comment 15

•

7 years ago

(In reply to Ben Kelly [:bkelly] from comment #13) > > If you have time to capture a profile for both a slow run and a fast run we > might be able to see what the differences are between them. > Same scenario, but real time run: https://perfht.ml/2wGjS2Q It seems like the profiler is limited in how much data it stores? The run was actually 7 minutes.

Ben Kelly [:bkelly, not reviewing]

Assignee

Comment 16

•

7 years ago

(In reply to Pierre Ossman from comment #15) > (In reply to Ben Kelly [:bkelly] from comment #13) > > > > If you have time to capture a profile for both a slow run and a fast run we > > might be able to see what the differences are between them. > > > > Same scenario, but real time run: > > https://perfht.ml/2wGjS2Q > > It seems like the profiler is limited in how much data it stores? The run > was actually 7 minutes. Hmm, yea the sample buffer is limited. It may not be possible to measure the entire length of the run. Did this profile run faster than the last one?

Nathan Froyd [:froydnj]

Comment 17

•

7 years ago

You can modify the buffer size being used to be able to record longer runs: click open the profiler settings and adjust the slider for "Buffer size".

Pierre Ossman

Comment 18

•

7 years ago

(In reply to Ben Kelly [:bkelly] from comment #16) > > Did this profile run faster than the last one? I think I might have misunderstood what you asked for. Did you want two as-fast-as-possible runs that simply ended up having different run times? :) If so, then that's already in the previous trace. You can see a little peak in the "Content" graph between each run, so just divide the run on each such peak and you should be able to compare.

Dave Hunt [:davehunt] [he/him] ⌚BST

Updated

•

3 years ago

Performance Impact: --- → P1

Whiteboard: [qf:p1]

You need to log in before you can comment on or make changes to this bug.