Closed Bug 1514417 Opened 6 years ago Closed 6 years ago

WebVR framerate drops to < 1 FPS after several minutes, requires full restart

Categories

(Core :: WebVR, defect)

Desktop
Unspecified
defect
Not set
major

Tracking

()

VERIFIED FIXED
mozilla67
Tracking Status
firefox64 --- wontfix
firefox65 --- wontfix
firefox66 --- verified
firefox67 --- verified

People

(Reporter: gfodor, Assigned: daoshengmu)

References

Details

Attachments

(3 files, 1 obsolete file)

In Nightly, Firefox 64, and I believe 63, if you run the basic A-Frame hello world app on Rift or Vive (https://aframe.io/examples/showcase/helloworld/), the frame rate drops suddenly to less than 1 FPS after several minutes. The browser and VR driver require a restart to recover. This also happens in Hubs: https://hubs.mozilla.com Profile showing the dropoff: https://perfht.ml/2Bf8KLj
Also confirmed that this happens with the basic WebVR sample in Nightly: https://webvr.info/samples/04-simple-mirroring.html https://perfht.ml/2BkYqRY
Attached image perf.io
(In reply to Daosheng Mu[:daoshengmu] from comment #2)
> Created attachment 9031649 [details]
> perf.io

We can see that two markers in the content process, RunFrameRequestCallbacks and SubmitFrameAtVRDisplay, each take around 70~80 ms.

"RunFrameRequestCallbacks": this is called by VRDisplay.requestAnimationFrame. It is usually your render loop, so it is affected by WebGL draw calls. For this snapshot, I think the cost is caused by VRDisplay.SubmitFrame(). https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/gfx/vr/ipc/VRManagerChild.cpp#430

"SubmitFrameAtVRDisplay": this is triggered by VRDisplay.SubmitFrame(). We submit the WebGL texture to our backend in the GPU process. I believe this performance issue might be related to getting the VR frame from the Canvas or to VRManagerChild::syncObject(). https://dxr.mozilla.org/mozilla-central/rev/c2593a3058afdfeaac5c990e18794ee8257afe99/dom/vr/VRDisplay.cpp#681

I haven't looked at what's going on in the GPU process yet, but at least the content process spends too much time in these two APIs.
Some more data points about this bug, if it helps. I've tested it with Firefox 63 using an Oculus Rift and Windows Mixed Reality. Firefox 63 does not exhibit the bug (I tested up to 15 minutes in VR). The bug occurs consistently in Firefox 64, usually within 10 minutes, with both the Rift and WMR. We've also seen it with the Vive.
I would suggest trying it in FF 65 because we made some threading adjustments in the FF backend. But if the perf issue comes from the content process, it would behave the same as in FF 64.
Confirmed the issue still exists in FF Beta 65.0b5 with the 04-simple-mirroring sample and an Oculus Rift - https://perfht.ml/2UXF7ar Also re-confirmed the issue in the latest Nightly (66.0a1 build 20181219094656) - https://perfht.ml/2UWs9K1
I can reproduce it in FF 64 and confirm FF 63 is good. According to the perf file, https://perfht.ml/2UWs9K1, this performance issue happens in the content process when calling WebGLContext::GetVRFrame(). After turning off ANGLE with webgl.disable-angle;true, the issue is gone. So I suspect the ANGLE update in 64 causes this regression: https://bugzilla.mozilla.org/show_bug.cgi?id=1489279.
Attached image WebVRPref.JPG

To reproduce this, go to https://webvr.info/samples/04-simple-mirroring.html in FF 64 or later and enter immersive mode. It usually takes about 20 minutes of waiting. Then we can see the FPS drop below 20.

If we turn off ANGLE with webgl.disable-angle;true, it does not happen. Per this attachment, we can see this regression comes from WebGLContext::GetVRFrame(). @jgilbert, do you have any idea about the recent ANGLE update?

Flags: needinfo?(jgilbert)

It can still be reproduced after turning off ANGLE with webgl.disable-angle;true, although I have to wait around 30 minutes. So I don't think it is related to ANGLE.

Flags: needinfo?(jgilbert)

After using MozRegression, I found that the regression was introduced between the builds of 2018-10-10 and 2018-10-13. One patch that may have caused this regression is Bug 1473399.

See Also: → 1515886

We are also seeing this problem with a WebVR app. Everything grinds to a total halt after a certain time period (this varies). Firefox 62 has full performance. Firefox 63 demonstrated choppier, lower-FPS behaviour. Firefox 64 runs for a while, then dies and requires a restart.

We can reproduce this on a Windows 64-bit optimized release build. It doesn't matter whether we disable ANGLE, or enable the VR process and dom.vr.external. I think it is caused by a deadlock between the VRService thread and the VR_SubmitFrame thread when both of them want to access mAPIShmem. VR_SubmitFrame is writing to mExternalShmem->browserState at [1] while the VRService thread is reading mAPIShmem at [2] simultaneously. So browserGenerationA is not equal to browserGenerationB when this deadlock happens, and that blocks the loop at [3].

[1] https://searchfox.org/mozilla-central/rev/c035ee7d3a5cd6913e7143e1bce549ffb4a566ff/gfx/vr/gfxVRExternal.cpp#848
[2] https://searchfox.org/mozilla-central/rev/c035ee7d3a5cd6913e7143e1bce549ffb4a566ff/gfx/vr/service/VRService.cpp#464
[3] https://searchfox.org/mozilla-central/rev/c035ee7d3a5cd6913e7143e1bce549ffb4a566ff/gfx/vr/gfxVRExternal.cpp#275
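
The generation-counter check described above can be sketched roughly as follows. This is a minimal illustration, not the Firefox code: `SharedState`, `BeginWrite`, `EndWrite`, `TryRead`, and `ReadWithRetries` are hypothetical names, and the payload field is a placeholder. It only assumes the pattern the comment describes, where a reader accepts a snapshot of the shared memory when the two generation counters match and otherwise keeps looping.

```cpp
#include <cstdint>

// Hypothetical stand-in for the shared browser state. The counter names echo
// the comment above; the real Firefox structures are not reproduced here.
struct SharedState {
  std::uint64_t generationA = 0;  // bumped when a write starts
  int payload = 0;                // placeholder for the frame data
  std::uint64_t generationB = 0;  // set equal to generationA when the write ends
};

// Writer side, split in two so a half-finished write can be simulated.
void BeginWrite(SharedState& s, int value) {
  ++s.generationA;
  s.payload = value;
}
void EndWrite(SharedState& s) { s.generationB = s.generationA; }

// Reader side: copy the state and accept it only if both counters match.
// Returns false when the snapshot was taken in the middle of a write.
bool TryRead(const SharedState& s, int& out) {
  SharedState copy = s;
  if (copy.generationA != copy.generationB) return false;
  out = copy.payload;
  return true;
}

// Bounded retry loop. An unbounded version would never exit if every
// snapshot it took had mismatched counters.
bool ReadWithRetries(const SharedState& s, int maxAttempts, int& out) {
  for (int i = 0; i < maxAttempts; ++i) {
    if (TryRead(s, out)) return true;
  }
  return false;
}
```

Calling `BeginWrite` without `EndWrite` leaves the counters mismatched, so `ReadWithRetries` exhausts its attempts and returns false. That is the situation in which an unbounded read loop would stay blocked.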

I have verified this patch works with SteamVR (mac-beta) on Mac OS as well after waiting more than 30 mins.

Assigning this to myself because I am providing a patch to fix it.

:ccomorasu, if you are interested in helping QA, I can tell you how to verify it after it lands to m-c.

Thanks.

Assignee: cristian.comorasu → dmu
Status: NEW → ASSIGNED
QA Contact: cristian.comorasu

Hello Daosheng Mu!
That would be awesome, thank you in advance!

Hi, how would I go about testing this fix once it is available in a branch or in an experimental build? Would love to verify that our WebVR app no longer dies with Firefox 63 and newer.

Pushed by dmu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e0034670b26a Add mutex to avoid VR shmem be deadlock when VRService and VRSubmitFrame threads are accessing it. r=kip

(In reply to Esa Ruoho from comment #18)

The Nightly build should be available within 24 hours. Alternatively, you can try this experimental build (https://queue.taskcluster.net/v1/task/Cg_EIVSwTZebqapj9kWLRA/runs/0/artifacts/public/build/install/sea/target.installer.exe).

Thanks.

Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66

(In reply to Cristina Coroiu [:ccoroiu] from comment #21)

https://hg.mozilla.org/mozilla-central/rev/e0034670b26a

:ccoroiu, please help back out my patch. I have verified in the current Nightly that my patch is not the right fix.

Thanks for the help :)

Flags: needinfo?(ccoroiu)

Backed out changeset e0034670b26a (bug 1514417) - “Verified it still drop frames from Nightly build”
Backout: https://hg.mozilla.org/integration/autoland/rev/6b9d943ecc2affe25094a0046dbc498ff2d1b96c

Status: RESOLVED → REOPENED
Flags: needinfo?(ccoroiu) → needinfo?(dmu)
Resolution: FIXED → ---
Target Milestone: mozilla66 → ---
Attachment #9038985 - Attachment is obsolete: true

MozReview-Commit-ID: FhLBI8l9SJm

After using a Windows named mutex to protect our shmem from deadlock when the VRSubmit_Frame and VRService threads try to access it in multi-threaded or multi-process mode, this issue no longer happens for me. I have verified it on several Windows machines.

One more thing to note is that this fix only addresses the issue on Windows. For other platforms, like Mac OS and Linux, we should use pthread as we did on Android. But in order to resolve this issue ASAP, and considering that most users are on Windows, IMHO we can handle the other platforms in a follow-up bug.
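
The idea of serializing both threads behind one lock can be sketched as follows. This is a portable illustration only: the patch uses a Windows named mutex, which can also be shared across processes, while `std::mutex` here is an in-process stand-in. `GuardedState`, `WriteFrame`, `ReadIsConsistent`, and `CountTornReads` are hypothetical names, not Firefox code.

```cpp
#include <mutex>
#include <thread>

// Hypothetical shared state. std::mutex stands in for the named mutex from
// the patch and only works within a single process.
struct GuardedState {
  std::mutex lock;
  long first = 0;
  long second = 0;  // invariant while unlocked: first == second
};

// Writer: updates both fields while holding the lock.
void WriteFrame(GuardedState& s, long frameId) {
  std::lock_guard<std::mutex> guard(s.lock);
  s.first = frameId;
  s.second = frameId;
}

// Reader: takes the same lock, so it can never observe a half-written state.
bool ReadIsConsistent(GuardedState& s) {
  std::lock_guard<std::mutex> guard(s.lock);
  return s.first == s.second;
}

// Runs a writer thread and a reader thread concurrently and returns how many
// inconsistent snapshots the reader observed.
int CountTornReads(int iterations) {
  GuardedState state;
  int torn = 0;  // written only by the reader thread, read after join()
  std::thread writer([&] {
    for (long i = 1; i <= iterations; ++i) WriteFrame(state, i);
  });
  std::thread reader([&] {
    for (int i = 0; i < iterations; ++i) {
      if (!ReadIsConsistent(state)) ++torn;
    }
  });
  writer.join();
  reader.join();
  return torn;
}
```

With the lock in place, `CountTornReads` returns 0 regardless of how the two threads interleave, because the reader can never observe the state between the two field writes.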

Flags: needinfo?(dmu)
Pushed by dmu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ff2d5c2588a3 Using named mutex for VR threads to access Shmem on Windows. r=kip
Status: REOPENED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67

Steps for QA verification:

Test 1:

  1. Use the latest version of Nightly.
  2. Make sure dom.vr.process.enabled and dom.vr.external are true in about:config.
  3. Go to https://webvr.info/samples/04-simple-mirroring.html and click Enter VR.
  4. Confirm the FPS is 90, then wait for about 30 minutes. If the FPS never drops below 30, the problem is resolved.

Test 2:

  1. Same as Test 1, but set dom.vr.process.enabled and dom.vr.external to false.
  2. Go to https://webvr.info/samples/04-simple-mirroring.html and click Enter VR.
  3. Confirm the FPS is 90, then wait for about 30 minutes. If the FPS never drops below 30, the problem is resolved.
Flags: needinfo?(ccoroiu)

(In reply to Daosheng Mu[:daoshengmu] from comment #29)

Sorry, I set the wrong ni?. I was trying to reach :ccomorasu.

Flags: needinfo?(ccoroiu) → needinfo?(cristian.comorasu)
Depends on: 1523926

I reproduced this issue using Fx 67.0a1 (2018-12-14), on Windows 10 x64 with Oculus Rift.
I can confirm this issue is fixed; I verified using both sets of STR from comment 30 in the previously mentioned environment.
Cheers!

Status: RESOLVED → VERIFIED
No longer depends on: 1523925
Flags: needinfo?(cristian.comorasu)

Comment on attachment 9039693 [details]
Bug 1514417 - Using named mutex for VR threads to access Shmem on Windows.

Beta/Release Uplift Approval Request

Feature/Bug causing the regression

Bug 1473399

User impact if declined

Firefox will continue to drop below 20 FPS after several minutes when running WebVR content.

Is this code covered by automated tests?

Yes

Has the fix been verified in Nightly?

Yes

Needs manual test from QE?

No

If yes, steps to reproduce

List of other uplifts needed

Bug 1523926

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

This is a regression from Firefox 64, and it has affected lots of WebVR users. We hope we can take this patch and the patch from Bug 1523926 to fix the thread deadlock issue in FF 66 and 65, if that is acceptable. I have gotten QA verification from :ccomorasu, and the patch has also been tested by WebVR automated tests like WPT, Mochitest, and Reftest. Therefore, I think the risk is low.

String changes made/needed

Attachment #9039693 - Flags: approval-mozilla-beta?

Comment on attachment 9039693 [details]
Bug 1514417 - Using named mutex for VR threads to access Shmem on Windows.

Fix for major WebVR regression, verified in Nightly.
Let's take this for beta 6.

Attachment #9039693 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1523923
Flags: qe-verify+
Whiteboard: [qa-triaged]

This issue is fixed on Fx 66.0b6. I verified using Oculus Rift on Windows 10 x64.

Flags: qe-verify+
Depends on: 1530588
QA Whiteboard: [qa-triaged]
Whiteboard: [qa-triaged]
No longer depends on: 1530588
Regressions: 1530588