0.57 - 3.87% tscrollx (linux64-shippable-qr) regression on push 5f5c269d15d6e521b15243b5014a65965df0cc08 (Sat November 7 2020)
Categories
(Core :: Graphics: WebRender, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox82 | --- | unaffected |
firefox83 | --- | unaffected |
firefox84 | --- | fixed |
firefox85 | --- | fixed |
People
(Reporter: alexandrui, Assigned: jnicol)
References
(Regression)
Details
(4 keywords)
Attachments
(1 file)
47 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
|
Details | Review |
Perfherder has detected a talos performance regression from push 5f5c269d15d6e521b15243b5014a65965df0cc08. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
4% | tscrollx | linux64-shippable-qr | e10s stylo webrender | 1.57 -> 1.63 | |
1% | tscrollx | linux64-shippable-qr | e10s stylo webrender | 1.58 -> 1.59 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.
For more information on performance sheriffing please see our FAQ.
Reporter | ||
Updated•4 years ago
|
Comment 1•4 years ago
|
||
Set release status flags based on info from the regressing bug 1661528
Assignee | ||
Comment 2•4 years ago
|
||
Please don't back this out. It's an improvement on real world sites with larger amounts of texture upload. The regression here comes from testcases with a very low amount of texture upload. I will look for a way to mitigate the impact in those cases, but we're better off with the change even as it is.
Assignee | ||
Comment 3•4 years ago
|
||
Dzmitry, from looking at the profile it appears that the glFenceSync call after uploading to the vertex data textures explains the regression. I know we discussed only requiring 1 flush per frame, by using a single texture uploader for all uploads. However, even that won't help in this test case: after the initial upload of glyphs, we only upload very small amounts of vertex data each frame. There are no more texture cache uploads.
Is glFenceSync expected to be taking that long? If not, any ideas on how to avoid it?
Do you think this is representative of real world enough to worry about it? If so, should we avoid reusing PBOs altogether when they are so small that the fence is more expensive than allocating a new one?
Comment 4•4 years ago
|
||
I don't think creation of a fence is expensive. It should be totally fine to have one fence per frame created.
I suspect what's happening here is that inserting a fence requires the driver to stop recording the command list, and internally submit it to a GPU queue. After that, we continue submitting commands, so it has to record them into other command streams, and then submit again at the end of the frame. So we end up with at least 2 submissions instead of 1.
What we need to do is to merge these two submissions together, i.e. insert the fence only at a place where we expect the driver to finish recording command stream anyway, at the end of the frame (like we discussed).
Updated•4 years ago
|
Assignee | ||
Comment 5•4 years ago
|
||
UploadPBOPool::end_frame() creates a GLsync object for synchronizing
PBO access, which internaly may require flushing the command
stream. Previously this was being called mid-frame, and potentially
multiple times per frame. This overhead caused talos regressions in
cases with very small amounts of texture upload per frame. To fix
this, call UploadPBO::end_frame() only once at the end of the frame.
Updated•4 years ago
|
Comment 7•4 years ago
|
||
bugherder |
Comment 8•4 years ago
|
||
IIUC we're targeting Mali GPUs in 84. Do we need to uplift this to Beta in support of that?
Assignee | ||
Comment 9•4 years ago
|
||
We are, and in fact because this regression affects other platforms too there is even more reason to uplift.
Assignee | ||
Comment 10•4 years ago
|
||
Comment on attachment 9189848 [details]
Bug 1676390 - Call UploadPBOPool::end_frame() once per frame and at the end of the frame. r?kvark
Beta/Release Uplift Approval Request
- User impact if declined: Performance regression on some Talos tests / probably in the wild too
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Minor change, just delays creating a gl sync object until later in frame.
- String changes made/needed:
Comment 11•4 years ago
|
||
Comment on attachment 9189848 [details]
Bug 1676390 - Call UploadPBOPool::end_frame() once per frame and at the end of the frame. r?kvark
Approved for 84.0b7.
Comment 12•4 years ago
|
||
bugherder uplift |
Updated•4 years ago
|
Description
•