Open Bug 1709584 Opened 3 years ago Updated 1 year ago

TEST-UNEXPECTED-ERROR | testing/marionette/harness/marionette_harness/tests/unit/test_window_rect.py TestWindowRect.test_resize_larger_than_screen | OSError: Process has been unexpectedly closed (Exit code: 1) (Reason: No data received over socket)

Categories

(Core :: Widget: Gtk, defect)

Desktop
Linux
defect

Tracking

()

REOPENED
Tracking Status
firefox112 --- disabled

People

(Reporter: rmader, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Logs point to a GDK/X11 IO error here.

Severity: -- → S3
Component: Graphics → Widget: Gtk

So IIUC this should be reproducible locally by running:

MOZ_X11_EGL=1 LIBGL_ALWAYS_SOFTWARE=1 ./mach marionette-test --enable-webrender testing/marionette/harness/marionette_harness/tests/unit/test_window_rect.py

I verified that the env variable is picked up - MOZ_ENABLE_WAYLAND=1 also works and produces the expected results (tests including set_position fail there).

I can't reproduce the crash here - all I get is an occasional:

FAIL testing/marionette/harness/marionette_harness/tests/unit/test_window_rect.py TestWindowRect.test_resize_larger_than_screen - AssertionError: 1536 != 3072

But it happens on both EGL and GLX (not on Wayland though). So it looks to me like this might be a Mesa bug that's already fixed in newer versions (21.0 here).

Ouch, unfortunately this was a typo by me: MOZ_x11_EGL instead of MOZ_X11_EGL. Once corrected, things fail again: https://treeherder.mozilla.org/jobs?repo=try&revision=c47509a1ed13f382b480cb4941b54ccc5801b833&selectedTaskRun=GgBWjnDoTPyIYXLksRYS-Q.0
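
Environment variable names are case-sensitive on Linux, so a misspelling like the one above is silently ignored rather than rejected. A minimal sketch of a pre-flight check follows; the helper name `find_case_mismatches` and the list of expected names are hypothetical and not part of mach or the Marionette harness:

```python
import os

# Variable names used in this bug; the list is illustrative, not exhaustive.
EXPECTED_VARS = ("MOZ_X11_EGL", "MOZ_ENABLE_WAYLAND", "LIBGL_ALWAYS_SOFTWARE")


def find_case_mismatches(environ, expected=EXPECTED_VARS):
    """Return sorted (found, intended) pairs for variables that match an
    expected name only when letter case is ignored, e.g. MOZ_x11_EGL."""
    by_upper = {name.upper(): name for name in expected}
    mismatches = []
    for found in environ:
        intended = by_upper.get(found.upper())
        if intended is not None and found != intended:
            mismatches.append((found, intended))
    return sorted(mismatches)


# Report suspicious names in the current environment, if any.
for found, intended in find_case_mismatches(os.environ):
    print(f"warning: {found} is set, did you mean {intended}?")
```

Running something like this before pushing to try would have flagged MOZ_x11_EGL; it only catches case differences, not other typos.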

Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---

bug 1723112 mentioned EGL and Marionette. Could bug 1712665 also play a role?

https://firefoxci.taskcluster-artifacts.net/GgBWjnDoTPyIYXLksRYS-Q/0/public/logs/live_backing.log

[task 2021-08-24T19:04:22.934Z] 19:04:22 INFO - [Parent 4599, Main Thread] WARNING: Failed to create EGLContext with khr_rbab_attribs: file /builds/worker/checkouts/gecko/gfx/gl/GLContextProviderEGL.cpp:735
[task 2021-08-24T19:04:22.935Z] 19:04:22 INFO - [Parent 4599, Main Thread] WARNING: Failed to create EGLContext with khr_robustness_attribs: file /builds/worker/checkouts/gecko/gfx/gl/GLContextProviderEGL.cpp:747

Is this relevant?

https://firefoxci.taskcluster-artifacts.net/GgBWjnDoTPyIYXLksRYS-Q/0/public/test/public/xsession-errors.log

gnome-session-check-accelerated: GL Helper exited with code 512
gnome-session-check-accelerated: GLES Helper exited with code 512
gnome-session-binary[57]: WARNING: Could not get session id for session. Check that logind is properly installed and pam_systemd is getting used at login.
_IceTransmkdir: ERROR: euid != 0,directory /tmp/.ICE-unix will not be created.
gnome-session-binary[57]: WARNING: Could not parse desktop file nm-applet.desktop or it references a not found TryExec binary
gnome-keyring-daemon: insufficient process capabilities, insecure memory might get used
GNOME_KEYRING_CONTROL=/builds/worker/.cache/keyring-YPMT80
gnome-keyring-daemon: insufficient process capabilities, insecure memory might get used
gnome-keyring-daemon: insufficient process capabilities, insecure memory might get used
GNOME_KEYRING_CONTROL=/builds/worker/.cache/keyring-YPMT80
GNOME_KEYRING_CONTROL=/builds/worker/.cache/keyring-YPMT80
SSH_AUTH_SOCK=/builds/worker/.cache/keyring-YPMT80/ssh

(gnome-shell:278): mutter-WARNING **: 18:55:38.099: Failed to use linear monitor configuration: Invalid mode 1600x1200 (-nan) for monitor 'unknown unknown'

(gnome-shell:278): mutter-WARNING **: 18:55:38.099: Failed to use fallback monitor configuration: Invalid mode 1600x1200 (-nan) for monitor 'unknown unknown'
Window manager warning: Display “:0” already has a window manager; try using the --replace option to replace the current window manager.gnome-session-binary[57]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1

(gnome-shell:332): mutter-WARNING **: 18:55:38.453: Failed to use linear monitor configuration: Invalid mode 1600x1200 (-nan) for monitor 'unknown unknown'

(gnome-shell:332): mutter-WARNING **: 18:55:38.454: Failed to use fallback monitor configuration: Invalid mode 1600x1200 (-nan) for monitor 'unknown unknown'
Window manager warning: Display “:0” already has a window manager; try using the --replace option to replace the current window manager.gnome-session-binary[57]: WARNING: App 'org.gnome.Shell.desktop' exited with code 1
gnome-session-binary[57]: WARNING: App 'org.gnome.Shell.desktop' respawning too quickly
gnome-session-binary[57]: CRITICAL: We failed, but the fail whale is dead. Sorry....

(In reply to Darkspirit from comment #5)

bug 1723112 mentioned EGL and Marionette. Could bug 1712665 also play a role?

bug 1723112 looks unrelated to me - there headless mode is used and libegl only gets mentioned because it runs in glxtest even in headless mode.

bug 1712665 could indeed play a role and we should fix it either way - will look into it.

https://firefoxci.taskcluster-artifacts.net/dWu5td-QRL6ZyGGG11X_mw/0/public/logs/live_backing.log

[task 2021-08-25T09:59:38.115Z] executing ['/builds/worker/bin/test-linux.sh', '--setpref=toolkit.asyncshutdown.log=true', '--setpref=media.peerconnection.mtransport_process=false', '--setpref=network.process.enabled=false', '--allow-software-gl-layers', '--enable-webrender', '--setpref=layers.d3d11.enable-blacklist=false', '--download-symbols=true']

[task 2021-08-25T10:01:03.351Z] 10:01:03 INFO - 'MOZ_LAYERS_ALLOW_SOFTWARE_GL': '1',

[task 2021-08-25T09:59:38.121Z] ++ VERSION='18.04.5 LTS (Bionic Beaver)'

WARNING: GLX_swap_control unsupported, ASAP mode may still block on buffer swaps.: file /builds/worker/checkouts/gecko/gfx/gl/GLContextProviderGLX.cpp:225
WARNING: SGI_video_sync unsupported. Falling back to software vsync.: file /builds/worker/checkouts/gecko/gfx/thebes/gfxPlatformGtk.cpp:870

So is this LIBGL_ALWAYS_SOFTWARE=1 MOZ_X11_EGL=1 MOZ_WEBRENDER=1 ./firefox on Ubuntu 18.04 with software vsync and dmabuf webgl enabled by default?

Testing on my Debian Testing:
GALLIUM_DRIVER=softpipe is super slow.
GALLIUM_DRIVER=llvmpipe is good. Hopefully llvmpipe is the default on Ubuntu 18.04?
The third option seems removed:
$ GALLIUM_DRIVER=swr LIBGL_ALWAYS_SOFTWARE=1 MOZ_X11_EGL=1 MOZ_WEBRENDER=1 ./firefox

[GFX1-]: glxtest: libEGL initialize failed
[GFX1-]: glxtest: X error, error_code=158, request_code=150, minor_code=6
[GFX1-]: glxtest: process failed (exited with status 1)
libGL error: failed to create dri screen
libGL error: failed to load driver: swrast
[GFX1-]: Failed GL context creation for WebRender: 0
[GFX1-]: FEATURE_FAILURE_WEBRENDER_INITIALIZE_UNSPECIFIED
[GFX1-]: Failed to connect WebRenderBridgeChild.
[GFX1-]: Fallback WR to SW-WR

No EGL and no GLX (bug 1680512) = SW-WR

bug 1709585 and bug 1709586 seem to run on hardware (Worker Group: mdc1, Worker ID: t-linux64-ms-011), but this one runs in Docker:
https://firefox-ci-tc.services.mozilla.com/tasks/dWu5td-QRL6ZyGGG11X_mw

Worker Group: us-east-1
Worker ID: i-0fa95d586b5398196
docker-image-ubuntu1804-test

https://searchfox.org/mozilla-central/rev/00be3c92c269d789663791cf518161d0f47c9b96/taskcluster/ci/docker-image/kind.yml#56
https://searchfox.org/mozilla-central/source/taskcluster/docker/recipes/ubuntu1804-test-system-setup-base.sh
-> libegl-mesa0 could simply be missing here. This xvfb tutorial explicitly installs it.

(In reply to Darkspirit from comment #8)

...
-> libegl-mesa0 could simply be missing here. This xvfb tutorial explicitly installs it.

To me it does not look like it directly fails - it only fails on specific tasks such as:

[task 2021-08-25T10:11:34.585Z] 10:11:34     INFO -  1629886294583	Marionette	TRACE	[37] MarionetteCommands actor created for window id 4294967297
[task 2021-08-25T10:11:34.588Z] 10:11:34     INFO -  1629886294588	Marionette	DEBUG	21 <- [1,6,null,{"value":{"width":1600,"height":1200}}]
[task 2021-08-25T10:11:34.589Z] 10:11:34     INFO -  1629886294589	Marionette	DEBUG	21 -> [0,7,"WebDriver:SetWindowRect",{"x":null,"y":null,"height":1040,"width":1280}]
[task 2021-08-25T10:11:34.591Z] 10:11:34     INFO -  1629886294591	Marionette	DEBUG	21 <- [1,7,null,{"x":0,"y":0,"width":1280,"height":1040}]
[task 2021-08-25T10:11:34.592Z] 10:11:34     INFO -  1629886294592	Marionette	DEBUG	21 -> [0,8,"WebDriver:SetWindowRect",{"x":null,"y":null,"height":2400,"width":3200}]
[task 2021-08-25T10:11:34.618Z] 10:11:34     INFO -  1629886294617	Marionette	DEBUG	21 <- [1,8,null,{"x":0,"y":0,"width":3200,"height":2400}]
[task 2021-08-25T10:11:35.039Z] 10:11:35     INFO -  Gdk-Message: 10:11:35.038: firefox: Fatal IO error 0 (Success) on X server :0.

One random guess of mine is that:

  • the test runs on software mesa
  • EGL may require more RAM than GLX, e.g. because it's double buffered or so
  • allocating very big buffers (3200x2400) thus gets us OOMed or simply fails

So one thing to try here could be increasing the RAM limit - maybe it works around the issue.
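
To put a rough number on the guess above, here is a back-of-the-envelope estimate of the raw buffer sizes involved. It assumes 4 bytes per pixel (for example RGBA8), which is an assumption and not something taken from the logs, and `buffer_bytes` is a hypothetical helper:

```python
MIB = 1024 * 1024


def buffer_bytes(width, height, bytes_per_pixel=4, buffers=1):
    """Raw size of `buffers` uncompressed color buffers of the given size.
    bytes_per_pixel=4 is an assumed format (e.g. RGBA8)."""
    return width * height * bytes_per_pixel * buffers


single = buffer_bytes(3200, 2400)
double = buffer_bytes(3200, 2400, buffers=2)
print(f"single buffer: {single / MIB:.1f} MiB")  # 29.3 MiB
print(f"double buffer: {double / MIB:.1f} MiB")  # 58.6 MiB
```

These figures cover only the raw color buffers under the stated assumption; they say nothing about whatever else llvmpipe, WebRender or the X server allocate for a window of that size.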

From matrix:

it would be --worker-override "t-linux-large=gecko-t/t-linux-xlarge"
or you can adjust instance-size from default -> xlarge
where to edit: https://searchfox.org/mozilla-central/source/taskcluster/ci/test/marionette.yml#49
example of using xlarge: https://searchfox.org/mozilla-central/source/taskcluster/ci/test/awsy.yml#8

A corresponding try run unfortunately still fails[1], BUT appears to get slightly further:

[task 2021-08-26T14:36:39.114Z] 14:36:39     INFO -  1629988599114	Marionette	TRACE	[37] MarionetteCommands actor created for window id 4294967297
[task 2021-08-26T14:36:39.121Z] 14:36:39     INFO -  1629988599120	Marionette	DEBUG	21 <- [1,6,null,{"value":{"width":1600,"height":1200}}]
[task 2021-08-26T14:36:39.123Z] 14:36:39     INFO -  1629988599122	Marionette	DEBUG	21 -> [0,7,"WebDriver:SetWindowRect",{"x":null,"y":null,"height":1040,"width":1280}]
[task 2021-08-26T14:36:39.126Z] 14:36:39     INFO -  1629988599125	Marionette	DEBUG	21 <- [1,7,null,{"x":0,"y":0,"width":1280,"height":1040}]
[task 2021-08-26T14:36:39.128Z] 14:36:39     INFO -  1629988599127	Marionette	DEBUG	21 -> [0,8,"WebDriver:SetWindowRect",{"x":null,"y":null,"height":2400,"width":3200}]
[task 2021-08-26T14:36:39.154Z] 14:36:39     INFO -  1629988599153	Marionette	DEBUG	21 <- [1,8,null,{"x":0,"y":0,"width":3200,"height":2400}]
[task 2021-08-26T14:36:39.160Z] 14:36:39     INFO -  1629988599160	Marionette	DEBUG	21 -> [0,9,"WebDriver:GetWindowRect",{}]
[task 2021-08-26T14:36:39.164Z] 14:36:39     INFO -  1629988599163	Marionette	DEBUG	21 <- [1,9,null,{"x":0,"y":0,"width":3200,"height":2400}]
[task 2021-08-26T14:36:39.168Z] 14:36:39     INFO -  1629988599168	Marionette	DEBUG	21 -> [0,10,"WebDriver:SetWindowRect",{"x":0,"y":0,"height":1040,"width":1280}]
[task 2021-08-26T14:36:39.174Z] 14:36:39     INFO -  [Child 7127, Main Thread] WARNING: Scrolled rect smaller than scrollport?: file /builds/worker/checkouts/gecko/layout/generic/nsGfxScrollFrame.cpp:7067
[task 2021-08-26T14:36:39.175Z] 14:36:39     INFO -  [Child 7127, Main Thread] WARNING: Scrolled rect smaller than scrollport?: file /builds/worker/checkouts/gecko/layout/generic/nsGfxScrollFrame.cpp:7070
[task 2021-08-26T14:36:39.187Z] 14:36:39     INFO -  1629988599186	Marionette	DEBUG	21 <- [1,10,null,{"x":0,"y":0,"width":3200,"height":2400}]
[task 2021-08-26T14:36:39.195Z] 14:36:39     INFO -  1629988599194	Marionette	DEBUG	21 -> [0,11,"WebDriver:ExecuteScript",{"script":"return document.fullscreenElement;","args":[],"newSandbox":true,"sandbox":null,"line":52,"filename":"tests/testing/marionette/harness/marionette_harness/tests/unit/test_window_rect.py"}]
[task 2021-08-26T14:36:39.206Z] 14:36:39     INFO -  [Child 7127, Main Thread] WARNING: Scrolled rect smaller than scrollport?: file /builds/worker/checkouts/gecko/layout/generic/nsGfxScrollFrame.cpp:7067
[task 2021-08-26T14:36:39.206Z] 14:36:39     INFO -  [Child 7127, Main Thread] WARNING: Scrolled rect smaller than scrollport?: file /builds/worker/checkouts/gecko/layout/generic/nsGfxScrollFrame.cpp:7070
[task 2021-08-26T14:36:39.686Z] 14:36:39     INFO -  Gdk-Message: 14:36:39.685: firefox: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.

So we might be on the right track here. As said in comment 2, I also suspect this to be a Mesa bug, as I can't reproduce the issue locally - maybe an update to Ubuntu 20.04.3, which has a very recent Mesa (given the hardware enablement stack is enabled), would help.

1: https://treeherder.mozilla.org/jobs?repo=try&revision=09e173171492804ec4e46d0d7640df34641bed90&selectedTaskRun=bhZaJ2FHS6OOjs390c3OQw.0
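
Comparing the two Marionette excerpts by eye is error-prone, so here is a small sketch that reports which commands never received a response before the log ends. It relies on the packet layout visible in the logs, where `[0, id, name, params]` is a command and `[1, id, error, result]` is a response; the function name `unanswered_commands` and the regular expression are assumptions made for illustration and are not part of the harness:

```python
import json
import re

# Matches the Marionette DEBUG lines quoted in this bug:
#   "... Marionette DEBUG 21 -> [0,8,"WebDriver:SetWindowRect",{...}]"
# "->" lines carry commands, "<-" lines carry responses.
PACKET_RE = re.compile(r"Marionette\s+DEBUG\s+\d+\s+(->|<-)\s+(\[.*\])\s*$")


def unanswered_commands(log_lines):
    """Return {message id: command name} for commands without a response."""
    pending = {}
    for line in log_lines:
        match = PACKET_RE.search(line)
        if match is None:
            continue  # TRACE lines, Gecko warnings, Gdk messages, ...
        direction, payload = match.groups()
        packet = json.loads(payload)
        msg_type, msg_id = packet[0], packet[1]
        if direction == "->" and msg_type == 0:
            pending[msg_id] = packet[2]
        elif direction == "<-" and msg_type == 1:
            pending.pop(msg_id, None)
    return pending
```

Fed the excerpt from 2021-08-26, this reports command 11 (WebDriver:ExecuteScript) as unanswered. For the excerpt from 2021-08-25 it returns an empty dict, i.e. the browser died after it had already replied to command 8.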

See Also: → 1725245

Correction: another run with xlarge doesn't show us getting further, so the run above was probably an outlier. As the crash does not reproduce locally, I'll hope for bug 1725245 to maybe fix this.

Interestingly, the test in question passes in an optimized build. It fails later in test_resize_to_available_screen_size, but that could be unrelated (possibly related to bug 1684194, need to confirm).

https://treeherder.mozilla.org/jobs?repo=try&revision=265a6aa60911e6f7126abdc7250993bbb22e8745&selectedTaskRun=bfo0xsi3SJ-oBZjQV6S5Ow.0

FTR, I tried again to reproduce the issue locally, but it does not reproduce on my machine.

Moving this to bug 788319.

Blocks: linux-egl
No longer blocks: 1695933

This currently cannot be tested because of bug 1732671. Update: works again, that bug seems to be fixed.

Depends on: 1732671

Some small updates:

  1. The test failures affect at least test-linux1804-64-qr/debug-marionette-e10s, test-linux1804-64-qr/debug-marionette-fis-e10s, test-linux1804-64-qr/opt-marionette-e10s and test-linux1804-64-qr/opt-marionette-fis-e10s, i.e. they affect both debug and optimized builds.
  2. They can be run via ./mach try fuzzy --full --env MOZ_X11_EGL=1.
  3. The test usually fails in TestWindowRect.test_resize_larger_than_screen (link) at [0,8,"WebDriver:SetWindowRect",{"x":null,"y":null,"height":2400,"width":3200}], but sometimes finishes successfully and only fails early in the following TestWindowRect.test_resize_to_available_screen_size test (link).
  4. The test currently runs on llvmpipe, Mesa 20.0.8, LLVM 10.0.0 (Ubuntu 18.04.6).
  5. Local reproducer should AFAIK be: MOZ_X11_EGL=1 LIBGL_ALWAYS_SOFTWARE=1 MOZ_WEBRENDER=1 ./mach marionette-test --enable-webrender testing/marionette/harness/marionette_harness/tests/unit/test_window_rect.py.
  6. Not yet sure whether increasing the RAM size (--worker-override "t-linux-large=gecko-t/t-linux-xlarge") helps - if it does, not by much.
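
Since the failure is intermittent, the local reproducer from item 5 may need several attempts. The following is a minimal sketch that wraps that exact command line; `build_reproducer`, `run_reproducer` and the default attempt count are hypothetical, and it assumes it is started from the root of a mozilla-central checkout with a finished build:

```python
import os
import subprocess

TEST_PATH = (
    "testing/marionette/harness/marionette_harness/tests/unit/test_window_rect.py"
)


def build_reproducer(base_env=None):
    """Return (argv, env) for the local reproducer quoted in item 5."""
    env = dict(os.environ if base_env is None else base_env)
    env.update(
        {"MOZ_X11_EGL": "1", "LIBGL_ALWAYS_SOFTWARE": "1", "MOZ_WEBRENDER": "1"}
    )
    argv = ["./mach", "marionette-test", "--enable-webrender", TEST_PATH]
    return argv, env


def run_reproducer(attempts=5):
    """Run the test several times and return the list of exit codes.
    The attempt count is arbitrary; nothing in the bug specifies one."""
    argv, env = build_reproducer()
    return [subprocess.run(argv, env=env).returncode for _ in range(attempts)]
```

Calling `run_reproducer()` and looking for non-zero entries gives a quick impression of how often the test fails on a given machine, if at all.
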
Blocks: 1744060
No longer blocks: linux-egl

Please test disabling GLX vsync.

(In reply to Darkspirit from comment #18)

Please test disabling GLX vsync.

Already tried, doesn't help :(

Can you execute $ xrandr --listproviders in a try build (bug 1742708)?

(In reply to Darkspirit from comment #20)

Can you execute $ xrandr --listproviders in a try build (bug 1742708)?

This is on a headless system AFAIK, i.e. no graphics hardware (mesa driver is llvmpipe).

Blocks: linux-egl
No longer blocks: 1744060
Depends on: 1725245
OS: Unspecified → Linux
Hardware: Unspecified → Desktop
See Also: 1725245