Open Bug 1592530 Opened 5 years ago Updated 2 years ago

GLXVsyncThread occupied high CPU (GLXtest process failed (exited with status 1): Unable to load libGL.so.1)

Categories

(Core :: Graphics, defect, P2)

68 Branch
x86_64
Linux
defect

Tracking

()

UNCONFIRMED
Tracking Status
firefox-esr68 - ---

People

(Reporter: junjie_jia, Unassigned)

References

(Blocks 2 open bugs)

Details

Attachments

(4 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:70.0) Gecko/20100101 Firefox/70.0

Steps to reproduce:

  1. I use Selenium and Firefox 68.2 ESR to play back scripts that were recorded with IDE 3.12.
  2. Playback works well, but the top command shows very high CPU usage (about 200%):

top

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28019 root 20 0 3417340 631952 75424 R 200.0 3.9 2:53.54 firefox-bin

  3. I checked the threads of Firefox with the command below and saw that GLXVsyncThread was using a lot of CPU (about 90%):

top -bH -d 5 -n10 -p $PID

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28065 root 20 0 3573004 847248 75960 R 95.0 5.2 2:13.32 GLXVsyncThread
28019 root 20 0 3573004 847248 75960 R 64.3 5.2 1:40.77 firefox-bin
28066 root 20 0 3573004 847248 75960 S 23.0 5.2 0:37.47 Compositor
28024 root 20 0 3573004 847248 75960 S 1.2 5.2 0:01.11 JS Helper
28029 root 20 0 3573004 847248 75960 S 1.0 5.2 0:01.45 JS Helper
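
For anyone who wants to script this check, here is a minimal sketch that parses batch-mode `top -bH` output and sorts threads by CPU usage. It assumes the 12-column layout shown above and a locale that prints decimals with a dot; `parse_top_threads` and `ThreadSample` are illustrative names, not part of any existing tool.

```python
from typing import List, NamedTuple


class ThreadSample(NamedTuple):
    tid: int
    cpu_percent: float
    name: str


def parse_top_threads(output: str) -> List[ThreadSample]:
    """Parse `top -bH` rows and return threads sorted by %CPU, highest first."""
    samples = []
    for line in output.splitlines():
        # Split into at most 12 fields so thread names with spaces stay intact.
        fields = line.split(None, 11)
        # Skip headers, blank lines, and anything that is not a data row.
        if len(fields) < 12 or not fields[0].isdigit():
            continue
        samples.append(
            ThreadSample(int(fields[0]), float(fields[8]), fields[11].strip())
        )
    return sorted(samples, key=lambda s: s.cpu_percent, reverse=True)


# Sample rows copied from the report above.
SAMPLE = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28065 root 20 0 3573004 847248 75960 R 95.0 5.2 2:13.32 GLXVsyncThread
28019 root 20 0 3573004 847248 75960 R 64.3 5.2 1:40.77 firefox-bin
28066 root 20 0 3573004 847248 75960 S 23.0 5.2 0:37.47 Compositor
28024 root 20 0 3573004 847248 75960 S 1.2 5.2 0:01.11 JS Helper
28029 root 20 0 3573004 847248 75960 S 1.0 5.2 0:01.45 JS Helper
"""

hottest = parse_top_threads(SAMPLE)[0]
print(hottest.name, hottest.cpu_percent)
```

In practice the input would come from something like `top -bH -n1 -p $PID` rather than a hard-coded string.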

Actual results:

The GLXVsyncThread thread of Firefox uses a lot of CPU.

Expected results:

This works well on Firefox 60.8.0 ESR, where GLXVsyncThread uses only a little CPU. I think Firefox 68.2 should behave the same way.

Component: Untriaged → Audio/Video: Playback
Product: Firefox → Core
Component: Audio/Video: Playback → Graphics

Looks important, since I assume it's making Firefox unusable on that machine.
Steps to take from here:

  • provide "about:support" output please
  • check on FF 70 and/or nightly, potentially with/without WR
  • find a regression window (with mozregression)
OS: Unspecified → Windows
Priority: -- → P2
Hardware: Unspecified → x86_64

Thanks for your response!

a. I have uploaded the output of about:support.

b. We are only allowed to use Firefox ESR versions. I tried the latest ESR (68.2) and got the same issue in my environment. I will still try FF 70 and let you know the result.

c. I haven't used mozregression before. I will try it and update you once I have a result.

Thanks!

Hello, we can solve this problem by disabling headless mode. Thanks a lot for your help!

Hi,

I'm also encountering this problem regularly. My desktop machine at work runs Firefox (76.0+build2-0ubuntu0.19.10.1), i.e. the default version, on Ubuntu 19.10 running X-Windows (1:7.7+19ubuntu12) with the proprietary NVIDIA drivers.

I've always noticed this problem under the following circumstances, though potentially there are other ways of reproducing it.

  1. I leave Firefox running on my office machine
  2. I go home, the screen locks automatically after a few minutes.
  3. When I'm at home, I SSH into my office machine to run some CUDA workloads. Performance is catastrophic.

It took me a while to figure this out, but the reason for this is that the Firefox VSync process is hogging the GPU for no reason (see the attached screenshot from top -H).

What I suspect is that because the screen is disabled, there is no VSync to speak of anymore, and this refresh mechanism goes haywire. If that is the case, it should probably be disabled or limited to a reasonable frequency (e.g. max 60 Hz).

Thanks,
Wenzel
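
The rate limit suggested in the comment above could look roughly like the following. This is only a sketch under stated assumptions, not Firefox's actual code: `wait_for_vblank` is a hypothetical stand-in for whatever blocking GLX call the vsync thread uses, and the 60 Hz cap is the value proposed in the comment. The idea is that if the wait returns immediately (for example because the display is off), the loop still sleeps until a minimum interval has passed.

```python
import time
from typing import Callable

MAX_RATE_HZ = 60.0                 # cap proposed in the comment above
MIN_INTERVAL = 1.0 / MAX_RATE_HZ   # minimum seconds between two ticks


def throttle_delay(last_tick: float, now: float,
                   min_interval: float = MIN_INTERVAL) -> float:
    """Seconds to sleep so that ticks are at least min_interval apart."""
    return max(0.0, min_interval - (now - last_tick))


def run_vsync_loop(wait_for_vblank: Callable[[], None],
                   on_tick: Callable[[float], None],
                   ticks: int,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> None:
    """Run a fixed number of ticks, never faster than MAX_RATE_HZ."""
    last_tick = clock()
    for _ in range(ticks):
        # Hypothetical hardware wait; it may return immediately or fail
        # when there is no active display.
        wait_for_vblank()
        delay = throttle_delay(last_tick, clock())
        if delay > 0:
            sleep(delay)  # software fallback keeps the loop from spinning
        last_tick = clock()
        on_tick(last_tick)


if __name__ == "__main__":
    # A wait that returns immediately, mimicking the display-off case.
    run_vsync_loop(lambda: None, lambda t: print(f"tick at {t:.3f}"), ticks=3)
```

With a working hardware wait the extra sleep is normally zero, so the throttle only matters in the failure case.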

Jeff, who can look at this one?

Flags: needinfo?(jmuizelaar)

Maybe Andrew? I think we should consider switching to software vsync on Linux (Perhaps check that it's what Chrome does). The current implementation seems to have too many problems.

Flags: needinfo?(jmuizelaar) → needinfo?(aosmond)
Blocks: vsync
OS: Windows → Linux
See Also: → 1640779

(In reply to Jeff Muizelaar [:jrmuizel] from comment #8)

I think we should consider switching to software vsync on Linux (Perhaps check that it's what Chrome does). The current implementation seems to have too many problems.

Looks like the reporter might not have any working GL hardware acceleration, and Wenzel is using the proprietary nvidia drivers. Maybe some kind of fallback would make sense under those circumstances, but it seems to work fine with working GL hardware acceleration using free drivers.

(Jun Jie Jia from comment #2)

"windowLayerManagerType": "Basic",
"adapterDescription": "GLXtest process failed (exited with status 1): Unable to load libGL.so.1\n\n",
"webgl1Renderer": "WebGL creation failed: \n* Refused to create native OpenGL context because of blacklist entry: FEATURE_FAILURE_GLXTEST_FAILED\n* Exhausted GL driver options.",
"webgl2Renderer": "WebGL creation failed: \n* Refused to create WebGL2 context because of blacklist entry: FEATURE_FAILURE_GLXTEST_FAILED",

Summary: GLXVsyncThread occupied high CPU → GLXVsyncThread occupied high CPU (GLXtest process failed (exited with status 1): Unable to load libGL.so.1)

I think there is more of a mystery here than simply turning it off. As noted the GLXtest process failed, which is noteworthy. Also, the about:support indicates HW_COMPOSITING is blocked -- we aren't supposed to even use GLX VSync in that case:

https://searchfox.org/mozilla-central/rev/bc3600def806859c31b2c7ac06e3d69271052a89/gfx/thebes/gfxPlatformGtk.cpp#708

Could you also provide your about:support to cross reference? It isn't clear to me if you also have hardware acceleration off (and yet somehow got vsync). Thanks!

Flags: needinfo?(aosmond) → needinfo?(wenzel.jakob)

The ESR 68 about:support from the original reporter does not suggest they ever requested hardware compositing; no relevant prefs are modified. I was thinking perhaps the GPU process was enabled, we crashed a few times, disabled OpenGL compositing, but continued using GLX vsync, since it doesn't appear we have a mechanism to fallback to software in that case.
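
To make the suspected gap concrete, here is a hypothetical sketch of the missing fallback step. None of these names exist in Gecko; `VsyncSelector` and `on_compositor_fallback` are invented purely to illustrate the idea that the vsync source should be re-evaluated when OpenGL compositing gets disabled at runtime.

```python
from enum import Enum


class VsyncSource(Enum):
    GLX = "glx"
    SOFTWARE = "software"


def pick_vsync_source(hw_compositing_active: bool,
                      glx_available: bool) -> VsyncSource:
    """Use GLX vsync only while hardware compositing is actually in use."""
    if hw_compositing_active and glx_available:
        return VsyncSource.GLX
    return VsyncSource.SOFTWARE


class VsyncSelector:
    """Hypothetical holder for the active vsync source."""

    def __init__(self, hw_compositing_active: bool, glx_available: bool):
        self.glx_available = glx_available
        self.source = pick_vsync_source(hw_compositing_active, glx_available)

    def on_compositor_fallback(self) -> None:
        # Assumed hook: called when OpenGL compositing is disabled after
        # repeated crashes. Re-evaluating here is the step that appears
        # to be missing in the scenario described above.
        self.source = pick_vsync_source(False, self.glx_available)


selector = VsyncSelector(hw_compositing_active=True, glx_available=True)
selector.on_compositor_fallback()
print(selector.source)
```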

(In reply to Andrew Osmond [:aosmond] from comment #12)

Could you also provide your about:support to cross reference? It isn't clear to me if you also have hardware acceleration off (and yet somehow got vsync). Thanks!

To be clear, I am most interested in your about:support before you see the problem, but bonus points if you attach it after seeing the problem as well. It might help pinpoint if something material changed.

To answer the question what Chrome does:

https://source.chromium.org/chromium/chromium/src/+/master:ui/gl/gl_surface_glx.cc;drc=2f3d2af68d16baaac1ebd5412ddb98640db5e38b;l=333?originalUrl=https:%2F%2Fcs.chromium.org%2F

It appears to be similar to us. They reset the GL context each time (and have since the initial implementation), which is slightly different from us. They also don't appear to worry about the call failing or tracking the sync counter.

A notable difference between our software vsync implementation, and our glx implementation, is that the latter uses nanosleep and the former uses timers.

(In reply to Andrew Osmond [:aosmond] from comment #12)

Could you also provide your about:support to cross reference? It isn't clear to me if you also have hardware acceleration off (and yet somehow got vsync). Thanks!

Apologies for the delay, I don't have regular access to this machine due to the C-19 related shutdown. Attached now.

Flags: needinfo?(wenzel.jakob)

(Wenzel Jakob from comment #5)

proprietary NVIDIA drivers.

What I suspect is that because the screen is disabled, there is no VSync to speak of anymore, and this refresh mechanism goes haywire. If that is the case, it should probably be disabled or limited to a reasonable frequency (e.g. max 60 Hz).

bug 1628913 tracks that.

See Also: → 1628913

I experimented a bit more with this and can confirm my suspicion that Firefox's vsync thread begins using 100% CPU once the display is turned off or enters standby mode, and it immediately recovers when the display becomes available again. For reference, this is with the proprietary NVIDIA drivers (which are, well, proprietary... but widely used).

The behavior is independent of whether the screen is locked (GNOME lock screen). In particular, I tried locking the screen and SSHed into the machine, at which point the vsync thread does not show up in top -H. But once I turn off the display, it begins hogging the CPU.

So it would appear to me that the relevant GLX calls not only fail, but that the timer mechanism mentioned above also fails to throttle the interaction with GLX.

Severity: normal → S3