Open Bug 1914495 Opened 27 days ago Updated 2 days ago

gnome-screenshot binary seems to be missing on Linux 22.04 x64 Wayland opt Reftests test-linux2204-64-wayland/opt-crashtest ("Failed to spawn child process “/usr/bin/gnome-screenshot” (No such file or directory)")

Categories

(Infrastructure & Operations :: RelOps: Posix OS, defect, P2)

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: dholbert, Assigned: aerickson)

References

Details

(Whiteboard: [relops-linux])

Bug 1914343 hit a harness timeout during a crashtest run, and tried to take a screenshot, but the screenshot failed, with:

Failed to spawn child process “/usr/bin/gnome-screenshot” (No such file or directory)

Log link: https://treeherder.mozilla.org/logviewer?job_id=471302633&repo=autoland

It looks like that error message is saying that the gnome-screenshot binary is missing. I think our usage of gnome-screenshot goes back to bug 1835707, and it was noted there that we needed to install a new package in order for the utility to be available (bug 1835707 comment 12, bug 1835707 comment 17). So I'm guessing that the package may not be installed on this particular configuration, which is why the binary is missing.

ahal & aerickson, looks like you both were looking into adding/verifying that gnome-screenshot was available over in bug 1835707 -- could one of you check why it's not present on this particular config, and get it added so that this sort of test-failure can be properly screenshotted?

Thanks!

Flags: needinfo?(aerickson)

https://bugzilla.mozilla.org/show_bug.cgi?id=1835707 was the VirtualBox VM; this is the bare GCP VM running Wayland.

This is a new image that's based on a bare GCP VM. gnome-screenshot was missed during image creation and validation (:alissy didn't end up using it, instead choosing to record video I think).

I have a PR to add it (https://github.com/mozilla-platform-ops/monopacker/pull/146) and have created an image, will get a pool setup for testing shortly (tracking work in https://mozilla-hub.atlassian.net/browse/RELOPS-1063).
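Since the binary was missed during image validation, here is a minimal sketch of the kind of check an image-qualification step could run. The helper name `check_gnome_screenshot` is hypothetical (it is not part of monopacker or any existing validation suite), and it assumes, as the error message in comment 0 suggests, that screentopng spawns the absolute path `/usr/bin/gnome-screenshot`.

```python
import os
import shutil
from typing import Tuple

# Absolute path taken from the spawn error in this bug. Assumption: the
# screenshot helper spawns this exact path, so it is checked first.
EXPECTED_PATH = "/usr/bin/gnome-screenshot"


def check_gnome_screenshot(path: str = EXPECTED_PATH) -> Tuple[bool, str]:
    """Return (ok, message) describing whether the screenshot binary is usable.

    Hypothetical image-validation helper, for illustration only.
    """
    if os.path.isfile(path) and os.access(path, os.X_OK):
        return True, f"found executable {path}"
    # A copy elsewhere on PATH is reported separately: spawning the
    # absolute path would still fail with "No such file or directory".
    name = os.path.basename(path)
    on_path = shutil.which(name)
    if on_path:
        return False, f"{name} found at {on_path}, not {path}"
    return False, f"{path}: No such file or directory"
```

A validation run on a freshly built worker image could call `check_gnome_screenshot()` and fail the qualification if it returns `False`, which would have caught this gap before the image reached a test pool.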

Assignee: nobody → aerickson
Status: NEW → ASSIGNED
Flags: needinfo?(aerickson)

Thanks!

Duplicate of this bug: 1912561

The severity field is not set for this bug.
:jmaher, could you have a look please?

For more information, please visit BugBot documentation.

Flags: needinfo?(jmaher)
Severity: -- → S2
Flags: needinfo?(jmaher)
Priority: -- → P2
Component: General → RelOps: Posix OS
Product: Testing → Infrastructure & Operations
Whiteboard: [relops-linux]

Image deployed to a qualification worker pool. I've got a general qualification mach try run at https://treeherder.mozilla.org/jobs?repo=try&revision=0bbf3b6c735f2849ac0520d1599e36d20ef58642.

:dholbert, will you run some test jobs that try to use gnome-screenshot?

You should be able to target the test pool I've set up with ./mach try fuzzy --worker-override t-linux-wayland=gecko-t/t-linux-2204-wayland-relsre-image-qual ....

Flags: needinfo?(dholbert)

Thanks for doing that!

I actually don't know offhand how to reliably cause a test run to invoke gnome-screenshot; it only really happens in certain "things are looking pretty broken" scenarios.

In bug 1914343, it was a harness timeout, which I think maybe (?) we can trigger by adding an infinite loop to a testcase. Let's see if that does it:
https://treeherder.mozilla.org/jobs?repo=try&revision=1e8d4fba557e634e55f0b4f73237c939b7c9d861

Looks like that worked.

Another way to get a timeout is to increase the internal reftest timeout here

https://searchfox.org/mozilla-central/rev/ddd14e6b96331624d56d56b78c1d9a359d3e57d3/layout/tools/reftest/reftest.sys.mjs#234

and then just stick reftest-wait on a test and never remove it.

(In reply to Timothy Nikkel (:tnikkel) from comment #7)
> Looks like that worked.

Indeed -- we successfully got a screenshot!

Comparing...

The "bad" log that brought this issue to our attention (linked in comment 0) had:

[task 2024-08-22T09:01:49.643Z] 09:01:49    ERROR - REFTEST ERROR | layout/generic/crashtests/1032613-1.svg | application timed out after 370 seconds with no output
[task 2024-08-22T09:01:49.694Z] 09:01:49  WARNING - REFTEST WARNING | Force-terminating active process(es).
[task 2024-08-22T09:01:49.710Z] 09:01:49     INFO - REFTEST TEST-INFO | started process screentopng
[task 2024-08-22T09:01:49.994Z] 09:01:49     INFO -  /task_172431526227533/build/tests/bin/screentopng: g_spawn_sync() of gnome-screenshot failed: Failed to spawn child process “/usr/bin/gnome-screenshot” (No such file or directory)
[task 2024-08-22T09:01:49.994Z] 09:01:49     INFO -  /task_172431526227533/build/tests/bin/screentopng: failed to create screenshot Wayland/GdkPixbuf
[task 2024-08-22T09:01:49.997Z] 09:01:49     INFO - REFTEST TEST-INFO | screentopng: exit 1

The "good" log in my Try run in comment 6 has this instead, showing screentopng successfully completing (presumably successfully invoking gnome-screenshot internally):

[task 2024-09-12T22:46:02.508Z] 22:46:02    ERROR - REFTEST ERROR | layout/generic/crashtests/1032613-1.svg | application timed out after 370 seconds with no output
[task 2024-09-12T22:46:02.508Z] 22:46:02  WARNING - REFTEST WARNING | Force-terminating active process(es).
[task 2024-09-12T22:46:02.508Z] 22:46:02     INFO - REFTEST TEST-INFO | started process screentopng
[task 2024-09-12T22:46:03.796Z] 22:46:03     INFO - REFTEST TEST-INFO | screentopng: exit 0

And the screenshot is linked in the log header and looks "real":
https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/DO4V1FyETV2X19FIzoGHlA/runs/0/artifacts/public/test_info/mozilla-test-fail-screenshot_51qa8if0.png
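The "bad"/"good" comparison above comes down to two log signals: whether screentopng reported a `g_spawn_sync()` failure, and what exit code the harness logged for it. A minimal sketch of a hypothetical helper (`screenshot_status` is not part of the reftest harness) that extracts both from a task log, assuming the message formats quoted in the two excerpts stay stable:

```python
import re
from typing import Optional, TypedDict


class ScreenshotStatus(TypedDict):
    exit_code: Optional[int]    # value from "screentopng: exit N", if logged
    spawn_error: Optional[str]  # GLib spawn error text, if logged


# Harness line, e.g. "REFTEST TEST-INFO | screentopng: exit 0".
EXIT_RE = re.compile(r"REFTEST TEST-INFO \| screentopng: exit (\d+)")
# screentopng's own failure line, e.g.
# "g_spawn_sync() of gnome-screenshot failed: <reason>".
SPAWN_FAIL_RE = re.compile(r"g_spawn_sync\(\) of (\S+) failed: (.*)$")


def screenshot_status(log_text: str) -> ScreenshotStatus:
    """Scan a task log and report how the screenshot attempt went."""
    status: ScreenshotStatus = {"exit_code": None, "spawn_error": None}
    for line in log_text.splitlines():
        if m := EXIT_RE.search(line):
            status["exit_code"] = int(m.group(1))
        if m := SPAWN_FAIL_RE.search(line):
            status["spawn_error"] = m.group(2)
    return status


def screenshot_succeeded(log_text: str) -> bool:
    """True only if screentopng exited 0 and no spawn failure was logged."""
    status = screenshot_status(log_text)
    return status["exit_code"] == 0 and status["spawn_error"] is None
```

Run against the "bad" excerpt, this reports exit code 1 plus the "No such file or directory" spawn error; against the "good" excerpt it reports exit code 0 and no spawn error. Matching on exact message text keeps the sketch simple, but it would need updating if the harness or screentopng ever reword those lines.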

--> Kicking needinfo back to Andrew for whatever steps remain here (or just closing as fixed). Thanks again!

Flags: needinfo?(dholbert)
Flags: needinfo?(aerickson)

One probably-unrelated-but-notable thing in my "good" Try log: it shows an ERROR MissingSystemInfo log line for the minidump processing, and I'm not sure I've seen that before (it's not present in the original "bad" log here when it's doing the equivalent minidump processing).

Here's that line in my Try run:
https://treeherder.mozilla.org/logviewer?job_id=474098451&repo=try&lineNumber=18098

[task 2024-09-12T22:46:16.784Z] 22:46:16     INFO - Crash dump filename: /tmp/tmp5lssyx1i.mozrunner/minidumps/75db50f4-ceb4-0b0e-d642-1563014b3d0e.dmp
[task 2024-09-12T22:46:16.785Z] 22:46:16     INFO - stderr from minidump-stackwalk:
[task 2024-09-12T22:46:16.785Z] 22:46:16     INFO - ERROR MissingSystemInfo - Error processing dump: The system information stream was not found

I'm not sure if that's normal for a Try run (e.g. if it's referencing some artifact that we don't bother generating in Try builds), or was just a weird one-off; but I wanted to mention it in case it's something that's missing on this new configuration (particularly with the changes that you've made recently to create and improve this pool).

I don't think the minidump error is related to the gnome-screenshot change (I've only added a new binary). The error seems to come from https://github.com/rust-minidump/rust-minidump/blob/8f0de712671302acf63d08ffacca3b086f03c0f1/minidump-processor/src/processor.rs#L522. Not sure why that would fail.

Thanks for the confirmation. I will proceed with rolling this new image out to the production pools.

Flags: needinfo?(aerickson)

Image promoted to prod via https://github.com/mozilla-releng/fxci-config/pull/100.

Keeping this bug open to track deployment of an L3 image, and of L1 and L3 arm64 images, with this change.
