Graphics is lagged when moving mouse in circular motion

RESOLVED FIXED in Firefox 51

Status

()

defect
P2
normal
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: alice0775, Assigned: kip)

Tracking

({perf, regression})

51 Branch
mozilla53
x86
Windows 10
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox50 unaffected, firefox51+ fixed, firefox52+ fixed, firefox53+ fixed)

Details

(Whiteboard: [gfx-noted])

Attachments

(3 attachments)

[Tracking Requested - why for this release]: Performance regression

The problem is remarkable when disabled e10s.

Reproducible: always

Steps To Reproduce:
1. Open http://paperjs.org/examples/path-intersections/
2. Wait for 10-20sec until the CPU/HDD settles down if any
3. Moving mouse in circular motion so that intersections will appear

Actual Results:
Graphics is lagged forever.

Expected Results:
Graphics should be smooth.


Regression Window:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=ee721407a44afb26541d386c21c002a5b61a826b&tochange=ec7229cceafcb353f5293347aa653f4242219c88

Regressed by: Bug 1250244
Posted file about:support
Alice, I can reproduce same perf drop on 50.1.0 when e10s was off.
It seems a carryover regression since the issue was not observed on Firefox 48 that e10s is by default OFF.
Could you confirm again ?
(In reply to Astley Chen [:astley] UTC+8 from comment #3)
> Alice, I can reproduce same perf drop on 50.1.0 when e10s was off.
> It seems a carryover regression since the issue was not observed on Firefox
> 48 that e10s is by default OFF.
> Could you confirm again ?

Umm. I cannot reproduce any performance drop on Firefox50.1.0 with e10s is disabled.
https://hg.mozilla.org/releases/mozilla-release/rev/8612c3320053b796678921f8f23358e3e9df997e
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:50.0) Gecko/20100101 Firefox/50.0 ID:20161208153507
Keep, can you take a look at this (see comment 0)
Flags: needinfo?(kgilbert)
Whiteboard: [gfx-noted]
Via local build
Last Good: a4140fefc590
First Bad: 7b1349cb7487

Triggered by:	7b1349cb7487	Kearwood (Kip) Gilbert — Bug 1250244 - Part 7: Implement WebVR 1.0 API,r=bz
(In reply to Milan Sreckovic [:milan] from comment #5)
> Keep, can you take a look at this (see comment 0)

Sigh.  I meant Kip, of course.
I have been unable to reproduce with the current Nightly build (Tried with MacOS 10.12.2 and Windows 10 with e10s off and e10s on).  I saw high CPU usage while moving the mouse in circles; however, could not see any persistent effect when I stopped moving the mouse.  The CPU usage and memory usage increased but returned when I stopped the activity.

Astley Chen: I was able to test with intel integrated GPU and with an NVidia discrete GPU.  I notice that the original report was with an AMD GPU.  Would the machine you reproduced on happen to have an AMD GPU as well?
Flags: needinfo?(kgilbert) → needinfo?(aschen)
Screencast(Nightly53.0a1)
Disabled e10s: https://youtu.be/mWh4YcfH3yk
Enabled e10s: https://youtu.be/bHzUU3fsoZA
(In reply to Alice0775 White from comment #9)
> Screencast(Nightly53.0a1)
> Disabled e10s: https://youtu.be/mWh4YcfH3yk
> Enabled e10s: https://youtu.be/bHzUU3fsoZA

Thanks for the youtube clips.  I haven't been able to see a result like that yet, but I am still digging deeper.

I have tried disabling hardware accelerated layers and using 32-bit builds, but not yet able to reproduce.
(In reply to :kip (Kearwood Gilbert) from comment #8)
> I have been unable to reproduce with the current Nightly build (Tried with
> MacOS 10.12.2 and Windows 10 with e10s off and e10s on).  I saw high CPU
> usage while moving the mouse in circles; however, could not see any
> persistent effect when I stopped moving the mouse.  The CPU usage and memory
> usage increased but returned when I stopped the activity.
> 
> Astley Chen: I was able to test with intel integrated GPU and with an NVidia
> discrete GPU.  I notice that the original report was with an AMD GPU.  Would
> the machine you reproduced on happen to have an AMD GPU as well?

In addition, could you please provide some details about the machine that was reproducing the lag?  Perhaps the performance of my local machine is masking the issue.  Could you provide details on what CPU it has and how much memory?  Also, if possible it would be useful to know if any of the CPU cores are being saturated during the test (Can check using Windows Resource Monitor's CPU tab, expanding out the right-most panel to see all CPU cores as a graph)

Thanks for your help in tracking it down!
I was able to reproduce the effect seen in the video linked in Comment 9 by artificially throttling my CPU with BES (Battle Encoded Shirase).

I could see much better performance with e10s enabled; however, running mozreview in this configuration I was unable to see a difference correlating with the regression window.

I will continue checking over the regressing patch for any potential cause until we get more details on the machines that are able to see the regression.
Posted image CPU usage
PC: Core2Quad Q8300 2.5GHz, 8GB RAM
(In reply to :kip (Kearwood Gilbert) from comment #8)
> Astley Chen: I was able to test with intel integrated GPU and with an NVidia
> discrete GPU.  I notice that the original report was with an AMD GPU.  Would
> the machine you reproduced on happen to have an AMD GPU as well?

On my Windows 10 desktop with Nvidia GPU, performance drop between 50.1.0 & 51.b11 with e10s off is not that obvious. I don't AMD GPU around me now. However, I can reproduce on VMWare guest OS running Windows 10. I suppose it's probably not about particular GPU but really a problem happened on low-end machines which magnify the perf drop significantly.
Flags: needinfo?(aschen)
Priority: -- → P2
FYI,
I can also reproduce the problem on Linux Mint18 xfce x64(VMWare guest).
And I got a same regression window...
Sounds like a recent perf regression introduced in 51, track for 51/52/53.
It appears that there is a lot of GC happening while moving the mouse around.  Is this possibly related to Bug 1293262?
Flags: needinfo?(jcoppeard)
On a most similar Windows 10 machine, which had an Intel Q6600 and 8gb of RAM, we could replicate the lag without relying on a VM or artificially throttling the CPU.  As per the original report, it is much smoother with e10s.

MozRegression runs are not always consistent for this machine -- a couple of runs included Bug 1293262.
Thanks to :ashughes -- he has replicated the original bisection from Alice on a machine locally here.

It is not likely due to Bug 1293262.  I will now bisect the "Bug 1250244 - Part 7" patch itself and get to the root of this.

Thanks for the help everyone!
Flags: needinfo?(jcoppeard)
Assignee: nobody → kgilbert
Comment on attachment 8824609 [details]
Bug 1325810 - Reduce unneeded IPC when WebVR is not active

This is an experiment, and not yet confirmed to fix the perf regression.
I tested the builds and all of them failed except the "webvr_part7_fix" build, this includes the Tryserver build attached above. 

Here are the results:
> Nightly 51.0a1 20170106133656 a4140fefc590400e6fc644b16361bc8c19a8035f [webvr_part6]: FAILED
> Nightly 51.0a1 20170106145122 7b1349cb7487ce1b03a6f91c016315f9e8fdd0ec [webvr_part7]: FAILED
> Nightly 51.0a1 20170106154431 7b1349cb7487ce1b03a6f91c016315f9e8fdd0ec [webvr_part7_disable]: FAILED
> Nightly 51.0a1 20170106153858 7b1349cb7487ce1b03a6f91c016315f9e8fdd0ec [webvr_part7_fix]: PASS
> Nightly 51.0a1 20170106155732 cc38fe6bde478931ea9e64fd156fdc8eed32b7a5 [webvr_part8]: FAILED
> Nightly 53.0a1 20170106164412 65676423889790350bc9ad8cc6f0f4a24a7cae49 [webvr_try]: FAILED
Just to clarify the results:
Part8 appears to be the worst build.
Part7 and the Try build are less janky than Part8 but still affected by this bug.
Part6 and Part7_fix are the least janky and but it's hard to say objectively if they reproduce this bug.
I have sent a dropbox folder invite to you, Alice.  Could you try each of these builds, in addition to the build in the try push, and see which ones reproduce the jankiness?

So far, I am suspecting a cumulative effect, and expect that the patch I have attached to reduce the impact of the part7 patch in Bug 1250244.

To validate that the fix makes an improvement, could you also compare:

- Compare part7 with part7_fix

- Compare the try build to the current nightly


Thanks for all of your help in tracking this down!
Flags: needinfo?(alice0775)
Firefox50.1 : good (011-020 011-020 000)
Firefox51.0b12 : bad (003 003 000)
Aurora52.0a2 : bad (002 002 000)
Nightly53.0a1 : bad (003 002-003 000)

webvr_part6 : good (007-015 007-018 000)
webvr_part7 : bad (002-003 002-003 000)
webvr_part7_disable : good (006-016 007-015 000)
webvr_part7_fix : good (006-015 007-015 000)
webvr_part8 : bad (002 002 000)

Nightly53.0a1 with enable e10s : good (040-045 040-045 000)

where; (approx fps)value that indicated with layers.acceleration.draw-fps=true
Flags: needinfo?(alice0775)
I have tried the try build with an Oculus DK2 and the webvr.info test pages as a sanity check.  It does not appear to regress any WebVR support.
(In reply to Alice0775 White from comment #25)
> Firefox50.1 : good (011-020 011-020 000)
> Firefox51.0b12 : bad (003 003 000)
> Aurora52.0a2 : bad (002 002 000)
> Nightly53.0a1 : bad (003 002-003 000)
> 
> webvr_part6 : good (007-015 007-018 000)
> webvr_part7 : bad (002-003 002-003 000)
> webvr_part7_disable : good (006-016 007-015 000)
> webvr_part7_fix : good (006-015 007-015 000)
> webvr_part8 : bad (002 002 000)
> 
> Nightly53.0a1 with enable e10s : good (040-045 040-045 000)
> 
> where; (approx fps)value that indicated with
> layers.acceleration.draw-fps=true

Thanks for the detailed testing.

Your results are consistent with the fix reducing the regression.

I'll assign the patch for review and then check with release management team to see if this can be uplifted.
Attachment #8824609 - Flags: review?(dmu)
I expect the risk of this patch to be quite low, as it is a simple "if" statement around a call only needed for WebVR functionality.  This "if" statement will be evaluated once per vsync cycle, on the Compositor thread.  This change is localized to a single function in VRManager.cpp

I have manually tested that this does not break WebVR using the webvr.info sample pages.

Once I have landed this in Nightly, I will request uplift up to beta.

If this patch is not accepted in Beta, the impact will be limited to users that do not enable e10s.  The effect is manifested as inconsistent frame rates on pages that cause high CPU utilization due to JS, garbage collection, and/or non-accelerated 2d canvas operations.
Comment on attachment 8824609 [details]
Bug 1325810 - Reduce unneeded IPC when WebVR is not active

https://reviewboard.mozilla.org/r/103032/#review104046

LGTM, it looks like VRManagerParent sends redundant IPC while playing non-WebVR content to the client side, and it will trigger FireDOMVRDisplayPresentChangeEvent() that makes CPU be busy to handle some JS code, https://dxr.mozilla.org/mozilla-central/rev/2977ca1224525680cbfb5c3ce3018818b6dfd8f2/gfx/vr/VRManager.cpp#173
Attachment #8824609 - Flags: review?(dmu) → review+
Pushed by kgilbert@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/850ff3e59146
Reduce unneeded IPC when WebVR is not active r=daoshengmu
Comment on attachment 8824609 [details]
Bug 1325810 - Reduce unneeded IPC when WebVR is not active

Approval Request Comment
[Feature/Bug causing the regression]: 1250244
[User impact if declined]: Performance regression causing excess CPU overhead, resulting in sites with heavy Javascript or software rendering to run at inconsistent framerates 
[Is this code covered by automated tests?]: The affected code is hit on every VSync even outside WebVR usage, and would be executing during most automated tests
[Has the fix been verified in Nightly?]: Fix has been verified by original reporter and is currently in Mozilla-inbound 
[Needs manual test from QE? If yes, steps to reproduce]: The code in the fix will execute immediately upon viewing any web page.  A simple "smoke test", viewing any web site should suffice.  Our automated tests should cover this.
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: Low risk
[Why is the change risky/not risky?]: The change is a simple "if" statement that is self contained and simply exits a WebVR specific function on the compositor side of an IPC rather than on the content side.
[String changes made/needed]: no changes
Attachment #8824609 - Flags: approval-mozilla-beta?
Attachment #8824609 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/850ff3e59146
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment on attachment 8824609 [details]
Bug 1325810 - Reduce unneeded IPC when WebVR is not active

fix perf regression from webvr, aurora52+
Attachment #8824609 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8824609 [details]
Bug 1325810 - Reduce unneeded IPC when WebVR is not active

Fix a graphic perf regression. Beta51+. Should be in 51 beta 14.
Attachment #8824609 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8824609 [details]
Bug 1325810 - Reduce unneeded IPC when WebVR is not active

checkin-needed for aurora and beta
Attachment #8824609 - Flags: checkin?
Attachment #8824609 - Flags: checkin?
You need to log in before you can comment on or make changes to this bug.