Closed Bug 1392453 Opened 7 years ago Closed 7 years ago

arstechnica.com is a lot slower with OMTP enabled on D2D

Categories

(Core :: Graphics, defect, P1)

x86
Windows 10
defect

Tracking

()

RESOLVED FIXED
mozilla58
Tracking Status
firefox57 --- unaffected
firefox58 --- fixed

People

(Reporter: mchang, Assigned: bas.schouten)

References

Details

(Whiteboard: [gfx-noted])

Attachments

(2 files, 1 obsolete file)

Attached patch Cache blend effect (obsolete) — Splinter Review
We were getting 2+ second Present calls, which would make the paint thread wait on the compositor thread, which would make the main thread wait on the paint thread.

This was happening because arstechnica uses the multiply mixed blend mode on a basic layer. That means d2d was being accessed on the main thread while capturing, the paint thread while painting, and the compositor thread at potentially all the same time.

Following Bas' advice, I started commenting out moz2d draw commands and saw DrawSurface was slow. I narrowed this down to the mixed blend mode. I don't exactly know why this works, but caching the blend effect fixed the issue for me and now OMTP is better than w/o OMTP. I did read creating effects is expensive, but maybe we just happen to hit a buffer size that lets us skid by? Either way, not recreating effects is probably a good thing.
Attachment #8899629 - Flags: review?(bas)
Comment on attachment 8899629 [details] [diff] [review]
Cache blend effect

Review of attachment 8899629 [details] [diff] [review]:
-----------------------------------------------------------------

I remember this having some issues. Mainly, if you get a blend effect being used multiple times in a row when the drawing is executed, all draws used the most recently set blend mode if I recall correctly. You should probably double check that this isn't a problem anymore for some reason. Other than that it looks fine.
Attachment #8899629 - Flags: review?(bas) → review+
Maybe good news? I tried on the latest master on my intel HD machine and draw calls dropped from 3-4K draw calls to 5-10. Yes, 5-10, with and without OMTP. I then tried arstechnica on my old build versus today's master and it's much better (still not great, but much much better).

Bisecting now (git revisions):

Bad - cf6a9260e78c7aa799a10af30edbd78d09d085e7
good - d0bae45c359a030ea2851058d0a28ef592348c49
(In reply to Mason Chang PTO Until 9/5 [:mchang] from comment #2)
> Maybe good news? I tried on the latest master on my intel HD machine and
> draw calls dropped from 3-4K draw calls to 5-10. Yes, 5-10, with and without
> OMTP. I then tried arstechnica on my old build versus today's master and
> it's much better (still not great, but much much better).
> 
> Bisecting now (git revisions):
> 
> Bad - cf6a9260e78c7aa799a10af30edbd78d09d085e7
> good - d0bae45c359a030ea2851058d0a28ef592348c49

Well, I tried locally, measuring the time taken by MLGDeviceD3D11::Present on 2 NVidia GPUs and an intel GPU. The NVidia GPUs didn't show any issues at all. The Intel GPU showed the occasional present taking 10-25ms, this is somewhat disappointing, however it's not new to OMTP. I can't seem to reproduce the issue of the original bug as reported anywhere.
Flags: needinfo?(mchang)
Also, fwiw, these present numbers are similar with D2D vs Skia on all my machines.
Additional information: I re-ran my testing on the -lower- half of the arstechnica page, where invalidation seems to be very buggy and we keep redrawing everything while scrolling. On the NVidia GPUs I don't see any issue with that bit either, however on my intel GPU I definitely see occasional spikes in present duration there, the highest that I've seen appears to be 150ms, however it might be that you can reproduce your even longer spikes on the other half as well, possibly something on the page or in gecko changed to improve invalidation on the upper portion.
This completely fixes arstechnica.com for me with and without OMTP. If we have a complex blend, we always create a new bitmap the size of the current bitmap[1], which on a 5K display is quite large. However, in the case of arstechnica, we would create a small temporary DrawTarget to do the push/pop layer grouping in basic layers, and always DrawSurface onto the same destination target. Thus, we'd create 30 temporary large bitmaps which was super slow.

This patch reuses the blending bitmap if we have a complex blend.

[1] http://searchfox.org/mozilla-central/source/gfx/2d/DrawTargetD2D1.cpp#1539
Attachment #8899629 - Attachment is obsolete: true
Flags: needinfo?(mchang)
Attachment #8905721 - Flags: review?(bas)
Also note I tried setting bounds on the push/pop layer but that didn't seem to matter and Skia was doing the same thing.
Comment on attachment 8905721 [details] [diff] [review]
Reuse complex blend temporary bitmap

Review of attachment 8905721 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/2d/DrawTargetD2D1.cpp
@@ +1558,5 @@
>  
>      RefPtr<ID2D1Bitmap1> tmpBitmap;
>      if (mDidComplexBlendWithListInList) {
> +      if (!EnsureComplexBlendBitmap()) {
> +        return nullptr;

I think this is correct in that we don't have to stream the list from above since we already tried to draw the list and failed. But I'd like to double check this is the case.
Blocks: 1394193
Comment on attachment 8905721 [details] [diff] [review]
Reuse complex blend temporary bitmap

Review of attachment 8905721 [details] [diff] [review]:
-----------------------------------------------------------------

I wonder if we want to evict this if it hasn't been used for a while since this potentially doubles memory usage.
Attachment #8905721 - Flags: review?(bas) → review+
Priority: -- → P3
Whiteboard: [gfx-noted]
See Also: → 1293586
I can't reproduce the issues from bug 1293586 and bug 1293134, which is why we initially created a temporary bitmap so that we wouldn't have command lists in command lists. Just getting rid of the temporary bitmap and returning the command list seems to work, so maybe it was fixed in a windows update?

Side note, caching the temporary bitmap causes some reftests to fail and I have no idea why. In theory, the bitmap is temporary, so if we force a Flush() after we draw the blend, which has the bitmap as an input, we shouldn't need the bitmap anymore. But that doesn't seem to work for no idea why. Bas, any ideas if we can either a) delete the temporary bitmap list in a list or b) Why we can't use the temporary bitmap? Does comment 1 apply even if we force a flush after the blend is drawn?
Flags: needinfo?(bas)
Try with just deleting the temporary bitmap and directly use the command list on the layer - https://treeherder.mozilla.org/#/jobs?repo=try&revision=4cdd3ba0c0b310cf0f21cfc2f002e7714f45f717
Flags: needinfo?(bas)
Comment on attachment 8907213 [details]
Bug 1392453: Allow a small amount of list-in-list drawing commands.

https://reviewboard.mozilla.org/r/178884/#review184006
Attachment #8907213 - Flags: review?(mchang) → review+
Comment on attachment 8907213 [details]
Bug 1392453: Allow a small amount of list-in-list drawing commands.

https://reviewboard.mozilla.org/r/178884/#review184008
Pushed by mchang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f84944ac039d
Allow limiting the blend surface area and the portion of the layer being resolved. r=mchang
https://hg.mozilla.org/mozilla-central/rev/f84944ac039d
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Depends on: 1400573
Mason, this looks like it's a decent sized change for non-OMTP case as well; given bug 1400573, maybe we it should be backed out and wait for 58.
Flags: needinfo?(mchang)
I've put the backout on bug 14000573. Mason, can you try if this simpler patch fixes the issue for you as well?

(FWIW, this fixes a perf bug in non-OMTP as well)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to Milan Sreckovic [:milan] from comment #19)
> Mason, this looks like it's a decent sized change for non-OMTP case as well;
> given bug 1400573, maybe we it should be backed out and wait for 58.

Yeah, I just r+ Bas' backout.
Flags: needinfo?(mchang)
Comment on attachment 8907213 [details]
Bug 1392453: Allow a small amount of list-in-list drawing commands.

https://reviewboard.mozilla.org/r/178884/#review187148

This is still really slow on arstechnica. I can't tell if it's any better than without the patch as both are still really bad.
Attachment #8907213 - Flags: review+
Flags: needinfo?(bas)
I did the backout.

I'm hoping to come up with a simple patch that works for 57. This is a big perf improvement for very little work.
Assignee: mchang → bas
Flags: needinfo?(bas)
Target Milestone: mozilla57 → ---
How about this one Mason?
Flags: needinfo?(mchang)
(In reply to Bas Schouten (:bas.schouten) from comment #26)
> How about this one Mason?

I just blew my windows machine before the need info :(.
Flags: needinfo?(mchang)
Hopefully I redirected the review request correctly. Can you test this on a windows machine to see if it improves arstechnica for you?
Flags: needinfo?(dvander)
Attachment #8907213 - Flags: review?(mchang)
Comment on attachment 8907213 [details]
Bug 1392453: Allow a small amount of list-in-list drawing commands.

https://reviewboard.mozilla.org/r/178884/#review187708

With this patch arstechnica seems a lot better. Paint times go from ~350ms to ~70ms.

::: gfx/2d/DrawTargetD2D1.cpp:1506
(Diff revision 3)
>      }
>    }
>    return true;
>  }
>  
> +static uint32_t sComplexBlendsWithListAllowedInList = 4;

Is this meant to be const?
Attachment #8907213 - Flags: review+
Pushed by bschouten@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5d42f8d147c6
Allow a small amount of list-in-list drawing commands. r=mchang
https://hg.mozilla.org/mozilla-central/rev/5d42f8d147c6
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
You need to log in before you can comment on or make changes to this bug.