Open Bug 1416651 Opened 2 years ago Updated 7 months ago

Bad performance on running tresize talos suite

Categories

(Core :: Graphics: WebRender, defect, P3)

x86_64
All
defect

Tracking Status
firefox57 --- unaffected
firefox58 --- unaffected

People

(Reporter: vliu, Unassigned)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

(Whiteboard: [wr-reserve] [gfx-noted][needs-investigation])

See [1] and [2] for the detailed comparison links. They show the following differences:

tresize opt e10s in Linux x64 opt: 33.71%
tresize opt e10s stylo_disabled Linux x64 opt: 33.38%

tresize opt e10s in Windows 10 x64 opt: 89.89%
tresize opt e10s stylo_disabled in Windows 10 x64 opt: 78.57%

[1]: Comparison link on Linux x64 opt
     https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=226893bbeb3e&newProject=try&newRevision=624d8ef848c2d76eed6a2854e03e3c54950ffd12&framework=1
[2]: Comparison link on Windows 10 x64 opt
     https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=e20b42426413&newProject=try&newRevision=ec5b6c878d2abaf1c98a5cdc1284fd38625f1822&framework=1
Blocks: 1416635
Whiteboard: [wr-mvp] [triage]
Whiteboard: [wr-mvp] [triage] → [wr-mvp]
Whiteboard: [wr-mvp] → [wr-mvp] [gfx-noted]
I pushed to try with geckoProfile enabled, once with WR disabled and once with WR enabled. [1] and [2] are the try results.

disable wr:
[1]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=77301c9c3e7f8038e78c838ea5292180db65fde8

enable wr
[2]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cb38bd99ffc84342caeee0363d2beaa088c74d12

Looking at the geckoProfile results above, I found that the execution time of the sync call PCompositorBridge::Msg_FlushRendering was many times shorter with WR disabled than with WR enabled. It is worth taking the time to see what is going on in this part.
Assignee: nobody → vliu
Status: NEW → ASSIGNED
Priority: P2 → P1
(In reply to Vincent Liu[:vliu] from comment #1)
> I tried to sent try with geckoProfile enabled for both disable/enable wr.
> [1] and [2] were the try result.
> 
> disable wr:
> [1]:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=77301c9c3e7f8038e78c838ea5292180db65fde8
> 
> enable wr
> [2]:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=cb38bd99ffc84342caeee0363d2beaa088c74d12
> 
> From looked into geckoProfile above, I found the execution time of a sync
> call PCompositorBridge::Msg_FlushRendering in wr disabled was many times
> shorter than enabled case. It is worth taking time to see what is going on
> at this part.

Looking into it further, most of the time in this Msg_FlushRendering sync call is spent in WaitFlushed().
WaitFlushed() waits until all pending WebRender tasks are done. WebRender needs this wait on every run because it works in a multi-threaded way. Also, there is no good way to turn this into an async call in the current design.
I will set the assignee to nobody because more than one bottleneck might be affecting this bug. If I find any of them, I will file a new bug and link it as a dependency of this one.
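To illustrate why a synchronous flush gets more expensive as work passes through more threads, here is a minimal sketch of a flush barrier over a toy multi-threaded pipeline. It is not Gecko's or WebRender's actual code: the `Pipeline` class and its `submit` and `wait_flushed` methods are hypothetical names, and the stages do no real work. It only shows the shape of the problem: the caller blocks until every stage has drained.

```python
import queue
import threading


class Pipeline:
    """Toy pipeline (hypothetical): each stage runs on its own thread."""

    def __init__(self, stage_count):
        self.queues = [queue.Queue() for _ in range(stage_count)]
        self.completed = []  # tasks that have left the last stage
        for index in range(stage_count):
            worker = threading.Thread(
                target=self._run_stage, args=(index,), daemon=True
            )
            worker.start()

    def _run_stage(self, index):
        inbox = self.queues[index]
        while True:
            task = inbox.get()
            if index + 1 < len(self.queues):
                # Forward to the next stage before marking the task done,
                # so a drained stage implies its work was handed off.
                self.queues[index + 1].put(task)
            else:
                self.completed.append(task)
            inbox.task_done()

    def submit(self, task):
        self.queues[0].put(task)

    def wait_flushed(self):
        # Block the caller until every stage has drained, in pipeline order.
        # The more stages there are, the longer the caller may wait.
        for stage_queue in self.queues:
            stage_queue.join()


pipeline = Pipeline(stage_count=3)
for frame in range(5):
    pipeline.submit(frame)
pipeline.wait_flushed()  # synchronous: returns only after all frames finish
print(pipeline.completed)  # prints [0, 1, 2, 3, 4]
```

In this sketch the wait grows with the number of stages and the amount of queued work, which is one plausible reason a sync flush could cost more in a deeper pipeline.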
Assignee: vliu → nobody
Status: ASSIGNED → NEW
Priority: P1 → P2
I will take this bug for better tracking.
Assignee: nobody → vliu
Status: NEW → ASSIGNED
Priority: P2 → P1
Attaching a geckoProfile link from a run on a Mac machine for more detailed profiling.

[1]: https://perfht.ml/2Aj5YnC
After discussing this with Vincent, this should be related to the longer pipeline in WR compared to Gecko. We can deal with this after solving other critical perf bugs.
Assignee: vliu → nobody
Status: ASSIGNED → NEW
Priority: P1 → P3
Whiteboard: [wr-mvp] [gfx-noted] → [wr-reserve] [gfx-noted]
https://treeherder.mozilla.org/perf.html#/graphs?series=autoland,1645986,1,1&series=autoland,1682853,1,1&series=autoland,1650995,1,1&series=autoland,1683335,1,1

Windows and Linux are both worse on tresize with WR enabled than WR disabled. Bumping to P1 for Windows regression.
Blocks: stage-wr-trains
No longer blocks: stage-wr-nightly
OS: Unspecified → All
Priority: P3 → P1
Hardware: Unspecified → x86_64
The WR update in bug 1466549 helps on both Linux and Windows, but there's still a bit of a regression compared to non-WR.
Depends on: 1466549
This got better on linux64-qr because of bug 1471962, but it's still slightly worse than non-WR on both Linux and Windows.
Assignee: nobody → a.beingessner
Bug 1477783 helped but not enough. Still worse on both Linux and Windows.
Depends on: 1477783
We can't release this to the field, but we can let this ride to beta. However, we want to investigate and understand the bad perf numbers ASAP.
Whiteboard: [wr-reserve] [gfx-noted] → [wr-reserve] [gfx-noted][needs-investigation]
Priority: P1 → P2
Assignee: a.beingessner → nobody
Once bug 1489189 merges, we have pretty reasonable results.

8.89 for WR vs 7.6 without.

I've spent some time investigating the test and it seems like we're measuring the right things here.

The remaining difference is pretty small; I don't think we need to block release on this.
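For scale, the two numbers above can be turned into a relative difference. This is only a sketch of the arithmetic, assuming both values are tresize results from the same platform and that lower is better; the function name is made up for illustration.

```python
def relative_difference_percent(baseline, candidate):
    """Percent change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


wr_enabled = 8.89   # value quoted above for WR
wr_disabled = 7.6   # value quoted above without WR

gap = relative_difference_percent(wr_disabled, wr_enabled)
print(f"{gap:.1f}%")  # prints 17.0%
```

Under those assumptions WR is roughly 17% slower on this test, well below the 33% to 90% differences in the original report.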
Priority: P2 → P3
Blocks: stage-wr-next
No longer blocks: stage-wr-trains