Closed Bug 1518184 Opened 6 years ago Closed 4 months ago

Webrender-enabled Firefox hangs on power-state transitions.

Category: Core :: Graphics: WebRender (defect, P3)

Status: RESOLVED INACTIVE

People: Reporter mhoye; Unassigned

References: Depends on 1 open bug

Details

This is current Nightly on a fully up-to-date Windows 10; the hardware is a 2nd-gen, dual-GPU Surface Book.

WebRender-enabled Nightly seems to hang during power-management state transitions. When switching from performance to power-saving mode, and when resuming from hibernation, the browser is wedged and unresponsive for some time. This is particularly severe when the machine was put to sleep in battery-saver mode and resumed while plugged in: the browser becomes completely unresponsive for several minutes and often needs to be killed.

If it helps at all, I've force-crashed Firefox a few times with crashfirefox64.exe; searching for my email address and reading the comments will turn up the relevant crash reports.

Steps to reproduce are either "run nightly/win10 with webrender enabled through the transition to battery-saver mode and observe the results" or "put the laptop to sleep in battery saver, plug it in and resume".
Depends on: 1409869

When did you submit these crash reports? The only one I see from you in the last two weeks with WR+ is this:

https://crash-stats.mozilla.com/report/index/5c64df7b-57a4-41a0-9869-cb54b0181226

Thanks. So it looks like the machine has dual GPUs: one NVIDIA and one Intel. The crash reports all list the Intel one as the primary GPU, but I don't know whether that means it was active at the time of the crash or is just the first one in the list of GPUs. Power-state changes might trigger a GPU switch, so that might be a factor.

The crash stack itself is not relevant here, since the crash was forced via crashfirefox64.exe. More relevant is what thread 0 was doing at the time, since that would be the thing causing the hang. The first three in your list of crash reports are somewhere in the JS engine doing regex things, which is kind of odd. The next five are in OneCore code, which appears to be Microsoft code from what I can tell. Dunno what's going on there; the only possibly relevant Gecko code is the PDocAccessible::RecvFocus frame.

Given that neither the JS engine nor the accessibility code is particularly related to WR, the most likely explanation is that the browser appears frozen because the graphics stack is wedged, and it's not actually the main thread that's stuck. And the graphics stack might be wedged because of GPU switching triggered by power-state changes? Just a guess, but that's all I've got so far.

The typical symptoms, once I open up Task Manager, are six or seven idle Nightly processes (which look a lot like tabs that aren't busy with anything, to my naive eye) and one tab that's lost its mind completely. If I kill only that process (either via Task Manager or via crashfirefox64), it takes the rest of them with it.

I don't see anything in the Windows event logs related to this, though I don't know that I'd expect to. That said, I'm confident that I can reproduce this at will. What else can I do to make this bug actionable?

(In reply to Mike Hoye [:mhoye] from comment #4)

The typical symptoms once I open up task manager are six or seven idle Nightly processes - that look a lot like tabs that aren't busy with anything, to my naive eye - and one tab that's lost its mind completely.

It's plausible that this is the GPU process, since it also shows up as a "nightly" process in Task Manager. One thing that might help is to confirm this: when you first start Firefox, go to about:support and check the Graphics section, where the PID of the GPU process should be listed. Then, the next time you see the problem, check whether the PID of the process that's gone berserk matches the GPU process PID.

If that is the case, then we'll want to somehow inspect what the GPU process is doing, possibly by attaching a debugger, since apparently killing it takes down everything for you (which also shouldn't happen).

Priority: -- → P3
Severity: normal → S3
Status: NEW → RESOLVED
Closed: 4 months ago
Resolution: --- → INACTIVE