1895092 - Ubuntu 18.04 test runners have test failures due to X failing to start and a resulting unexpected/unpredictable system theme (usually Adwaita, but sometimes Ambiance which leads to different results)

Reporter

Description

2 months ago

In investigating bug 1894564 and related bugs, I've discovered that our Ubuntu 18.04 test runners seem to use an unpredictable GTK theme.

I kicked off a try run with logging to print the GTK system theme that we detect and use for our own native coloring. In "good" test runs, my logging prints "Adwaita" for both the light theme and dark theme. But in "bad" test runs, my logging prints "Ambiance" for the light theme (and still "Adwaita" for the dark theme).

I suspect this is an indication that these machines are inconsistently configured, or something along those lines... (In a perfect world, our tests should all be valid regardless of theme, but there are various reasons that that's not universally true right now -- e.g. fuzzy annotations that vary per platform, system icon dependencies as in bug 1894445, etc.)

This seems to be a recent regression, and it seems to be leading to a lot of new & frequent intermittent bugs, so we should try to get to the bottom of this and get it addressed ASAP.

Daniel Holbert [:dholbert]

Reporter

Comment 1

2 months ago

Here's a sample "good" log (with Adwaita, no mention of Ambiance):
https://treeherder.mozilla.org/logviewer?job_id=456974466&repo=try&lineNumber=1844

Here's a sample "bad" log (with Ambiance as our detected "light" system theme):
https://treeherder.mozilla.org/logviewer?job_id=456974462&repo=try&lineNumber=1839

(These logs are the same job on the same try push, so there's no reason they should differ.)

I'm suspicious that this is fallout from the machine configuration change in bug 1888460, as noted in bug 1888460 comment 14. Tentatively flagging as a regression from that bug.

Regressions: 1888460

Daniel Holbert [:dholbert]

Reporter

Comment 2

2 months ago

Edited

(In reply to Daniel Holbert [:dholbert] from comment #1)

I'm suspicious that this is fallout from the machine configuration change in bug 1888460

I confirmed bug 1888460 as the thing-that-caused-this, with one caveat.

I wrote a patch that intentionally crashes in nsLookAndFeel::PerThemeData::Init if we notice that we're using the Ambiance theme, and I pushed that patch to Try (with just reftest tasks), parented on two hg commits:
(1) bug 1888460's commit (the one that switched us to the new test runners):

URL: https://treeherder.mozilla.org/jobs?repo=try&revision=2ef749a98db14ffa537982180c2cc2a210f4d604
This one has a bunch of reftest oranges where we detected Ambiance and immediately crashed (per my patch). It's just a fraction of the test runs, though it's substantially more for tsan/asan than for non-tsan/asan. (Not sure what to make of that.)
So: some small-but-substantial portion of the reftest tasks are using the Ambiance GTK theme.

(2) the commit right before that one (where we're using the previous test runner config):

URL: https://treeherder.mozilla.org/jobs?repo=try&revision=1612019f08d077ea57cccbfe9a072ac3c1a49960
This one has no such reftest oranges.
So: none of the reftest tasks are using the Ambiance GTK theme.

However, interestingly, here's the caveat that I alluded to: both commits show that the "Linux x64 Shippable Bpgo(run)" task is failing 100%-of-the-time for both pushes, due to having the Ambiance theme.

So it seems the presence of the Ambiance theme isn't entirely new in our CI tasks, but it used to be only/consistently on the Bpgo(run) task, it seems? Whereas now, as of bug 1888460, we're getting runners with the Ambiance theme for reftest tasks as well, where it seems to be causing [or associated with] test failures due to e.g. having a different icon set (bug 1894445) and transparent scrollbar tracks (bug 1894564). These failures manifest as intermittents, but in fact they're not really intermittent -- they're pretty reliable if/when we get a runner with this theme config.

We could conceivably adjust our tests & annotations to account for this variability (like the paper-over patch that I landed in bug 1894564). But also: ideally we should have these runners use a reliable system theme, for reproducibility & consistency (at least for the same treeherder "track"); it's weird that you might retrigger a task and end up getting a retrigger that's got a completely different system-configuration.

jmaher, can you make sense of this?

Flags: needinfo?(jmaher)

Emilio Cobos Álvarez (:emilio)

Updated

2 months ago

Keywords: regression

Regressed by: 1888460

No longer regressions: 1888460

Emilio Cobos Álvarez (:emilio)

Comment 3

2 months ago

Note that Adwaita should always be available, so a more reliable potential workaround is just setting the GTK_THEME=Adwaita environment variable on our tests, but we should ideally get to the bottom of this...
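A minimal sketch of that workaround, assuming the harness starts Firefox from Python via subprocess. build_test_env() and launch_browser() are hypothetical names, not the real harness code; the only point is that GTK_THEME has to be present in the browser process's environment when GTK initializes.

```python
import os
import subprocess
from typing import List, Mapping, Optional


def build_test_env(base: Optional[Mapping[str, str]] = None,
                   theme: str = "Adwaita") -> dict:
    """Return a copy of the environment with GTK_THEME pinned (hypothetical helper)."""
    env = dict(os.environ if base is None else base)
    # GTK 3 reads GTK_THEME at startup, overriding the desktop's configured theme.
    env["GTK_THEME"] = theme
    return env


def launch_browser(cmd: List[str]) -> subprocess.Popen:
    """Hypothetical launcher: the browser inherits the pinned theme."""
    return subprocess.Popen(cmd, env=build_test_env())
```

Copying the environment rather than mutating os.environ keeps the override scoped to the browser process.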

BugBot [:suhaib / :marco/ :calixte]

Comment 4

2 months ago

Set release status flags based on info from the regressing bug 1888460

status-firefox125: --- → affected

status-firefox126: --- → affected

status-firefox127: --- → affected

status-firefox-esr115: --- → unaffected

Donal Meehan [:dmeehan]

Updated

2 months ago

status-firefox125: affected → unaffected

status-firefox-esr115: unaffected → affected

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 5

2 months ago

thanks for the investigation here. The change made was more on the backend, not the actual worker. We use the exact same docker image on the same instance type, just without a scratch disk. What we found was the scratch disk wasn't available to use, both in the ubuntu 18 worker and the host OS docker image. Now it could be that there is something on the docker host where the scratch disk was used for caching or something which then would affect timing.

probably the right thing we should do is ensure at the start of a test job before launching the browser that we are running the correct theme (whichever we choose to be desired). This would give consistency longer term to our CI.
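One hedged way to express that pre-launch check, assuming some get_theme() callable exists that reports the current desktop theme (for example, a gsettings query). ensure_theme() and WrongThemeError are illustrative names only, and "Adwaita" as the default is just one possible choice of desired theme.

```python
from typing import Callable


class WrongThemeError(RuntimeError):
    """Raised when the desktop theme is not the one the tests expect."""


def ensure_theme(get_theme: Callable[[], str], expected: str = "Adwaita") -> str:
    """Check the theme before the browser starts; raise instead of running tests."""
    actual = get_theme()
    if actual != expected:
        raise WrongThemeError(f"expected GTK theme {expected!r}, found {actual!r}")
    return actual
```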

Why do we get different themes? I wonder if the theme sometimes changes mid run, i.e. something happens to the OS and we fall back to Ambiance.

There is a pattern I notice: the failures all take >5 minutes to load the docker image (typically 10+ minutes), whereas the successful tasks all do this in <5 minutes (usually 4 minutes). Specifically, this is not the downloading or decompressing, but the loading of the docker image:

[taskcluster 2024-05-04 22:34:54.694Z] Loading docker image from downloaded archive.
[taskcluster 2024-05-04 22:49:43.820Z] Image 'public/image.tar.zst' from task 'VxysSFtSS96HTdlQMrt6nQ' loaded.
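To triage logs for this pattern, the load time can be computed from the two taskcluster timestamps. This is a sketch that assumes the exact line formats shown in the excerpt; docker_load_duration() and the SLOW_LOAD threshold (the 5-minute cutoff mentioned here) are hypothetical. For the excerpt above it yields 14 minutes 49.126 seconds.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Optional

# Matches the "[taskcluster 2024-05-04 22:34:54.694Z]" prefix from the excerpt.
TS_RE = re.compile(r"^\[taskcluster (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)Z\]")
SLOW_LOAD = timedelta(minutes=5)  # threshold taken from the discussion


def docker_load_duration(lines: Iterable[str]) -> Optional[timedelta]:
    """Time between 'Loading docker image' and the following '... loaded.' line."""
    start = None
    for line in lines:
        m = TS_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S.%f")
        if "Loading docker image from downloaded archive" in line:
            start = ts
        elif start is not None and line.rstrip().endswith("loaded."):
            return ts - start
    return None
```

A log would then be flagged as slow when the returned duration exceeds SLOW_LOAD.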

The workers are the same workers, and I see no pattern related to region. I can think of a few reasons this would take longer:

host machine is busy doing something else
host machine has bad/invalid hardware (ram/disk going bad)

I don't know conclusively if there are passing reftest jobs that take >5 minutes to load the docker image. I did look at ~40 logs, and the pattern was really clear. In fact, just loading the docker image faster would result in overall cost savings and fewer intermittents.

Flags: needinfo?(jmaher)

Daniel Holbert [:dholbert]

Reporter

Comment 6

2 months ago

Edited

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #5)

probably the right thing we should do is ensure at the start of a test job before launching the browser that we are running the correct theme (whichever we choose to be desired). This would give consistency longer term to our CI.

That sounds great, as long as we can either ensure or re-validate that the theme stays correct throughout the run.

Why do we get different themes?

That is a good question. :) The theme that we're using just comes from a call to a GTK API, so something about the system GTK configuration is intermittently different, it seems.

I wonder if the theme sometimes changes mid run, i.e. something happens to the OS and we fall back to Ambiance.

That's an interesting idea, yeah. This does seem possible given that this seems to be more frequent on (though not exclusive to) "heavier" test-runs like TSAN/ASAN. Though note that we are hitting this issue the first time that Firefox starts up (in e.g. the "bad" try run linked in comment 2 here); so if something goes wrong partway through, it's earlier than when Firefox itself starts.
[EDIT: sorry, left this^ thought unfinished at first. editing comment in-place to fill out what I had meant to say.]

There is a pattern I notice: the failures all take >5 minutes to load the docker image (typically 10+ minutes), whereas the successful tasks all do this in <5 minutes (usually 4 minutes). Specifically, this is not the downloading or decompressing, but the loading of the docker image

I noticed a similar pattern in bug 1894564, but I also noticed some counterexamples where we had reftest failures of this sort (with transparent scrollbars and an unexpected icon-set) and yet still the log was pretty short. e.g. this one, where the whole job was "only" 22min long, and only 1 second elapses from the start of the log to Image 'public/image.tar.zst' from task 'Xy1CZOamS5SeLiFFT-mRrA' loaded:
https://treeherder.mozilla.org/logviewer?job_id=456585940&repo=autoland&lineNumber=12227
https://firefoxci.taskcluster-artifacts.net/bUKWv5afRVq1mHu_D8nj7w/0/public/logs/live_backing.log

Not sure what to make of that. (There's no mention of docker in this log, so perhaps the setup is slightly different.)

Daniel Holbert [:dholbert]

Reporter

Comment 7

2 months ago

[Note: "good"/"bad" below are just with respect to tests failing/passing.]

FWIW, I just spun up a fresh Ubuntu 18.04 VM locally -- inside gnome-boxes, installed from the latest official ISO which is ubuntu-18.04.6-desktop-amd64.iso (released in 2021) I think. Interestingly, its default configuration seems to match the "bad" state here.

In particular, in my fresh Ubuntu 18.04 VM:

gnome-tweaks shows that my theme is Ambiance, though it does seem to indicate that Adwaita is the default. (I suspect Adwaita is default-for-the-Gnome-project, but Ambiance is default-for-Ubuntu.)
In Firefox Nightly, about:support shows OS Theme: Ambiance / Adwaita
In Firefox Nightly, scrollbars have transparent trough/track, e.g. at data:text/html,<body style="background:red;height:9000px"> (including if I hover them, and if I enable Firefox Preference's "Always Show Scrollbars" checkbox)
However: if I explicitly set the GTK_THEME=Adwaita environment variable when running Firefox, then I do get troughs/backgrounds on my scrollbars (when hovered or with the always-show-scrollbars checkbox checked).
In Firefox Nightly, the moz-icon URIs from bug 1894445 comment 5 look just like the reftest screenshot there (they show two different blank-document icons), and that's true regardless of whether I run with Adwaita or not.

So now it's looking like the "bad" behavior here is just what a stock Ubuntu 18.04.6 looks like, which is interesting. And running with GTK_THEME=Adwaita env variable is one way to get "good" behavior locally at least for scrollbars, but I don't know how to get the "good" behavior with respect to icons. So: how do/did we ever get the "good" behavior in our test runners?

Maybe we've been getting that by accident, due to a race condition or some other sort of bit of chance?
Or, maybe some of our tests runners were based on older vs. newer snapshots of Ubuntu 18.04, and the older ones are what we've been using up until now?
Or, maybe some of our test runners were using some sort of more-minimal installation of Ubuntu (e.g. server vs. desktop, or something along those lines)?

Daniel Holbert [:dholbert]

Reporter

Comment 8

2 months ago

Edited

(In reply to Daniel Holbert [:dholbert] from comment #7)

Or, maybe some of our test runners were using some sort of more-minimal installation of Ubuntu (e.g. server vs. desktop, or something along those lines)?

FWIW I just created a second Ubuntu 18.04.6 VM, this time using the "server" ISO which lacks a graphical interface, plus the "gnome" meta-package which gets me a desktop environment.

That one does give me "OS Theme: Adwaita / Adwaita" shown in Firefox about:support, with opaque gray scrollbar backgrounds; and it renders moz-icon://.txt?size=16 with an icon that matches the one that we're expecting in bug 1894445 comment 3, though the moz-icon://bogus-unrecognized-icon.bogusunknown845?size=16 URI from that bug seems to just not be recognized in this environment (and renders as a broken-image icon if I reference it in an <img>). So: this Ubuntu-18.04-server-install-plus-gnome seems closer in terms of behavior to what we've been seeing on our "good" test runs.

I can get into essentially the same "close to good test-run" configuration if I remove the light-themes and ubuntu-mono packages from a full Ubuntu 18.04 desktop installation, too. (light-themes is what provides Ambiance; ubuntu-mono provides the icon-theme that's shown in the unexpected failures in bug 1894445).
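A quick way to tell which of these configurations a runner has is to look for the files those two packages install. This sketch assumes light-themes puts Ambiance under /usr/share/themes/Ambiance and ubuntu-mono puts its icon themes under /usr/share/icons/ubuntu-mono-*; those paths are assumptions about the stock 18.04 packages, not something verified on the CI image, and the function name is hypothetical. The root parameter exists only so the check can be pointed at another filesystem tree (e.g. an unpacked image).

```python
from pathlib import Path
from typing import Dict


def ubuntu_theme_packages_present(root: Path = Path("/")) -> Dict[str, bool]:
    """Report whether files from light-themes / ubuntu-mono appear to be installed.

    Assumed locations: light-themes provides /usr/share/themes/Ambiance and
    ubuntu-mono provides icon themes named /usr/share/icons/ubuntu-mono-*.
    """
    share = root / "usr" / "share"
    icons = share / "icons"
    return {
        "light-themes": (share / "themes" / "Ambiance").is_dir(),
        "ubuntu-mono": icons.is_dir() and any(icons.glob("ubuntu-mono-*")),
    }
```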

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 9

2 months ago

I looked at 4500 linux logs (2400 downloaded, 110 logs >5 minutes to load docker image) from the top 3 m-c pushes and found many logs in CI that take >4 minutes to load and have success, but very few reftest logs fall into that category.

setting up ubuntu to run a full desktop inside of docker is not straightforward and is a bit hacky. As this is in-tree, anyone can edit things and it might not be obvious what changed to cause a rebuild.

should we force the theme to be GTK_THEME=Ambiance ? This could be done, and since we launch the browser fresh for each manifest (reftest, mochitest, wpt), it should help reduce the chance of it failing. We could also check if the theme changes during the run and post an error or flag the job/instance as bad. I am looking into how the theme might change during a test run.
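The "check if the theme changes during running" idea could look roughly like this: query the theme after each manifest (each of which gets a fresh browser) and collect the manifests after which it no longer matches the forced theme. run_manifest and get_theme are hypothetical callables standing in for whatever the harness actually uses.

```python
from typing import Callable, Iterable, List, Tuple


def run_with_theme_drift_check(
    manifests: Iterable[str],
    run_manifest: Callable[[str], None],
    get_theme: Callable[[], str],
    expected: str,
) -> List[Tuple[str, str]]:
    """Run each manifest, then record (manifest, theme) whenever the theme has drifted."""
    drifted = []
    for manifest in manifests:
        run_manifest(manifest)  # each manifest gets a fresh browser
        theme = get_theme()
        if theme != expected:
            drifted.append((manifest, theme))
    return drifted
```

A non-empty result is what would be reported as an error, or used to flag the job/instance as bad.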

Daniel Holbert [:dholbert]

Reporter

Comment 10

•

2 months ago

Edited

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #9)

should we force the theme to be GTK_THEME=Ambiance ?

Maybe yes, though Adwaita (not Ambiance)?

Considerations:

Neither Ambiance nor Adwaita is what modern Ubuntu uses (yaru), so from that perspective there's no strong reason to prefer either one.
Given that Ambiance is the default on Ubuntu, it must be missing in some of our test runs, or else it would have been used, right? So setting it in GTK_THEME might not help.
But GTK_THEME=Adwaita would probably help (it should consistently be available since it's part of base gnome).
However, that won't help with bug 1894445; changing GTK_THEME doesn't change the system icon set. (Maybe that's ~fine, but it's still weird that our system icons are variable and might cause some other variability outside of that bug... Likely all part of the same underlying strangeness.)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 11

2 months ago

Attached file Bug 1895092 - Ensure linux gtk theme is "Adwaita" for reftests. r=gbrown — Details

Phabricator Automation

Updated

2 months ago

Assignee: nobody → jmaher

Status: NEW → ASSIGNED

Daniel Holbert [:dholbert]

Reporter

Comment 12

2 months ago

Forcing the theme seems probably-fine, though it feels like a bit of a band-aid specifically for scrollbar-tracks (and maybe other coloring). It doesn't address other forms of instability that we were seeing here (that trace back to the same regressor). In particular:

There's still some new intermittency in font availability, per bug 1892222
There's still some new intermittency in icon theming, per bug 1894445 and bug 1812425

We can paper over that remaining intermittency by adjusting annotations, but it feels not-great and would be nice to get to the bottom of what's going on in those cases as well, since it seems to have started with the same commit and have the same (somewhat mysterious) root cause.

Phabricator Automation

Updated

2 months ago

Attachment #9400371 - Attachment description: Bug 1895092 - Ensure linux gtk theme is "Adwaita". r=gbrown → Bug 1895092 - Ensure linux gtk theme is "Adwaita" for reftests. r=gbrown

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 13

2 months ago

I am out of ideas for why fonts and icons would be changed as a result of removing the extra scratch disk. Given the theory that the theme is crashing or failing to fully load, I imagine the fonts are related to that as well. Maybe there is a timing issue or something or different issues with larger amounts of disk IO (reftests do run very fast).

Some thoughts on next steps:

We could move reftests back to workers with scratch disks.
for certain manifests we could run the reftests a little slower using slow-if (src)
something else I haven't thought of yet?

Daniel Holbert [:dholbert]

Reporter

Comment 14

2 months ago

Edited

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #13)

I am out of ideas for why fonts and icons would be changed as a result of removing the extra scratch disk. Given the theory that the theme is crashing or failing to fully load, I imagine the fonts are related to that as well.

Right, it seems some collection of "inessential system resources" are occasionally not available.

Maybe there is a timing issue or something or different issues with larger amounts of disk IO (reftests do run very fast).

Note that when things go wrong and we get the wrong theme, they go wrong before we even start running the tests; we get the wrong theme right at Firefox startup time.

Some thoughts on next steps:

We could move reftests back to workers with scratch disks.

This seems worth trying to me.

for certain manifests we could run the reftests a little slower using slow-if (src)

It's not clear to me that that annotation would make a difference here, though I might be misunderstanding.

something else I haven't thought of yet?

Ideally, it feels worth getting closer to the bottom of this, by e.g. testing much-reduced versions of our tasks to see if we can identify when/where things go wrong, and if they ever go from wrong to right or vice-versa partway through a task, etc. I'm not sure how easy it is to troubleshoot these runners, but e.g. simply seeing what gsettings get org.gnome.desktop.interface gtk-theme returns seems like a reasonable investigation tactic. (I would bet that this yields different results near the start of "good" vs. "bad" tasks, at least.) (Note: this gsettings command is just cribbed from the attached patch in Phabricator here.)
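A small sketch of that query from Python. gsettings prints the value in GVariant text form (e.g. 'Adwaita' including the single quotes), so the quotes are stripped. get_gtk_theme() here is a hypothetical helper, not the getGtkTheme() function from the attached patch, and subprocess.run(capture_output=..., text=...) assumes Python 3.7+.

```python
import subprocess


def parse_gsettings_string(output: str) -> str:
    """gsettings prints GVariant text, e.g. 'Adwaita' plus a newline; strip both."""
    value = output.strip()
    if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
        value = value[1:-1]
    return value


def get_gtk_theme() -> str:
    """Hypothetical helper: ask gsettings for the configured GTK theme."""
    result = subprocess.run(
        ["gsettings", "get", "org.gnome.desktop.interface", "gtk-theme"],
        capture_output=True, text=True, check=True,
    )
    return parse_gsettings_string(result.stdout)
```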

I'm not sure how much (more) time it's worth sinking into that, though, particularly if it's specific to these 18.04 machines & assuming that there's a sunset on our usage of Ubuntu 18.04 on the horizon anyway (which I imagine there might be before too long). So that makes me lean towards option (1) as a quick-and-easy return to a more reliable state, even if it's a bit unsatisfying.

Daniel Holbert [:dholbert]

Reporter

Comment 15

2 months ago

(In reply to Daniel Holbert [:dholbert] from comment #14)

Ideally, it feels worth getting closer to the bottom of this [...] e.g. simply seeing what gsettings get org.gnome.desktop.interface gtk-theme returns

I did a Try run to do a version of this^, just logging (in runreftest.py) what the getGtkTheme() function from jmaher's attached patch finds.

It seems to always find Adwaita (what we've traditionally been getting in "good" runs), even in runs where Firefox is getting "Ambiance". That's... very strange.

Sample log:
https://firefoxci.taskcluster-artifacts.net/A0ZahdudRDC6U60KtMLCZQ/0/public/logs/live_backing.log

22:57:39     INFO -  ******dholbert: Python getGtkTheme returned: Adwaita
[...]
22:57:40     INFO - ****dholbert nsLookAndFeel::PerThemeData::Init() for theme Ambiance getting scrollbar colors:
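Since the two log lines disagree, one cheap triage step is to scan a log for both values and flag a mismatch. The regular expressions below match only the temporary debug logging shown in this excerpt; theme_mismatch() is a hypothetical helper.

```python
import re
from typing import Iterable, Optional, Tuple

# These patterns match only the temporary debug logging from this Try run.
PY_THEME_RE = re.compile(r"Python getGtkTheme returned: (\S+)")
FX_THEME_RE = re.compile(r"PerThemeData::Init\(\) for theme (\S+)")


def theme_mismatch(lines: Iterable[str]) -> Optional[Tuple[str, str]]:
    """Return (python_theme, firefox_theme) if the first reported values differ."""
    py_theme = fx_theme = None
    for line in lines:
        if py_theme is None:
            m = PY_THEME_RE.search(line)
            if m:
                py_theme = m.group(1)
        if fx_theme is None:
            m = FX_THEME_RE.search(line)
            if m:
                fx_theme = m.group(1)
    if py_theme is not None and fx_theme is not None and py_theme != fx_theme:
        return py_theme, fx_theme
    return None
```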

Daniel Holbert [:dholbert]

Reporter

Comment 16

2 months ago

Edited

One new clue, earlier in the log: there's a step where we run Running command: pactl list modules short which lists a bunch of modules.

On "good" runs (where Firefox ends up automatically using Adwaita), that list has this at the end -- notably including some display/x11-related stuff:

[task 2024-05-04T23:20:59.177Z] 23:20:59     INFO -  24	module-filter-apply
[task 2024-05-04T23:20:59.177Z] 23:20:59     INFO -  25	module-x11-publish	display=:0
[task 2024-05-04T23:20:59.177Z] 23:20:59     INFO -  26	module-x11-bell	display=:0 sample=bell.ogg
[task 2024-05-04T23:20:59.177Z] 23:20:59     INFO -  27	module-x11-cork-request	display=:0
[task 2024-05-04T23:20:59.177Z] 23:20:59     INFO -  28	module-x11-xsmp	display=:0 session_manager=local/cc34549f0870:@/tmp/.ICE-unix/58,unix/cc34549f0870:/tmp/.ICE-unix/58
[task 2024-05-04T23:20:59.177Z] 23:20:59     INFO -  29	module-null-sink

source: https://firefoxci.taskcluster-artifacts.net/TuF1Oh7PSuWVljIrfejJYA/0/public/logs/live_backing.log

Whereas on "bad" runs (where we end up with Ambiance & test failures), that list doesn't have the X11 entries and instead just ends like this:

[task 2024-05-05T03:02:21.120Z] 03:02:21     INFO -  24	module-filter-apply
[task 2024-05-05T03:02:21.120Z] 03:02:21     INFO -  25	module-null-sink

Source: https://firefoxci.taskcluster-artifacts.net/MsmS-9LiSmGJt1PYFSTx6g/0/public/logs/live_backing.log

Aside from those module-x11-* entries, the list is otherwise the same.
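A sketch of turning that observation into a check. It assumes raw `pactl list modules short` output (tab-separated: index, module name, arguments), not the mozharness-prefixed log lines quoted above; the function names are hypothetical.

```python
from typing import List


def loaded_modules(pactl_output: str) -> List[str]:
    """Module names from `pactl list modules short` (tab-separated: index, name, args)."""
    names = []
    for line in pactl_output.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2 and fields[0].strip().isdigit():
            names.append(fields[1].strip())
    return names


def has_x11_modules(pactl_output: str) -> bool:
    """True if any module-x11-* entry is loaded; these were missing in the bad runs."""
    return any(name.startswith("module-x11-") for name in loaded_modules(pactl_output))
```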

So it looks like some X11 modules aren't getting loaded, or something, and so maybe we're rendering with a nerfed display in the bad runs, in some sense?

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 17

2 months ago

I wonder if we could restart services and force-load stuff. For example, pulse and gnome-desktop might need extra love. Good idea; I have some ideas on how we could have a pre-task sanity check.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 18

2 months ago

the output you see is from here:
https://searchfox.org/mozilla-central/source/testing/mozharness/scripts/desktop_unittest.py#918

I am checking for module-x11 in the output of list modules and if it is missing trying to kill/restart, sleep, then check again. looking at the logs I downloaded from m-c, we miss these modules a lot of the time (40%); I wonder if it is a timing thing and a small sleep will help.
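The check/restart/sleep/re-check loop described here could be sketched as follows. check() would be something like a module-x11 test on the pactl output and restart() whatever kills and relaunches the services; both, along with the attempt count and delay, are placeholders rather than the values used in the actual mozharness change. The sleep function is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable


def wait_for_x11_modules(
    check: Callable[[], bool],
    restart: Callable[[], None],
    attempts: int = 3,
    delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Check; if it fails, restart, sleep, and check again, up to `attempts` checks."""
    for attempt in range(attempts):
        if check():
            return True
        if attempt < attempts - 1:  # no point restarting after the final check
            restart()
            sleep(delay)
    return False
```

A False return is the point where a harness would give up on the worker (e.g. retry the task elsewhere) rather than run tests against a half-started desktop.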

while the desire is to move to ubuntu 22.04 (or probably now 24.04), there seems to be lack of priority around that. Also Wayland vs X11 vs hybrid is muddying the water.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 19

2 months ago

I also find it interesting that there is a high correlation of long docker image load times to failures and then the docker image fails to initialize pulse; this makes me wonder if there is a lack of consistent hardware available to our instances.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 20

2 months ago

oh, and of course- why would adding a scratch disk reduce this? I think I need to test to see if a scratch disk added back results in the same number of longer docker image loads and the same number of missed modules/pulseaudio

Daniel Holbert [:dholbert]

Reporter

Comment 21

2 months ago

(heads-up: for organizational purposes, I'm going to mark the various bugs-that-seem-to-be-really-this-issue as depending on this bug.)

Daniel Holbert [:dholbert]

Reporter

Updated

2 months ago

Blocks: 1894445, 1894564, 1812425, 1892222, 1893370

Daniel Holbert [:dholbert]

Reporter

Updated

2 months ago

Blocks: 1894565, 1893192

Daniel Holbert [:dholbert]

Reporter

Updated

2 months ago

Blocks: 1893809

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 22

2 months ago

Attached file Bug 1895092 - retry task and worker if we do not start with gnome-session and pulseaudio. r=gbrown — Details

Pulsebot

Comment 23

2 months ago

Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/bd094de94794
Ensure linux gtk theme is "Adwaita" for reftests. r=dholbert

Phabricator Automation

Updated

•

2 months ago

Attachment #9400984 - Attachment description: Bug 1895092 - terminate task and worker if we do not start with gnome-session and pulseaudio. r=gbrown → Bug 1895092 - retry task and worker if we do not start with gnome-session and pulseaudio. r=gbrown

Pulsebot

Comment 24

2 months ago

Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/455cd83ca518
retry task and worker if we do not start with gnome-session and pulseaudio. r=gbrown

Cristian Tuns

Comment 25

•

2 months ago

Backed out for causing xpcshell exception

Flags: needinfo?(jmaher)

Natalia Csoregi [:nataliaCs]

Comment 26

2 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/bd094de94794

Status: ASSIGNED → RESOLVED

Closed: 2 months ago

status-firefox127: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → 127 Branch

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 27

•

1 month ago

the problem here is bugbug does test selection on autoland and didn't do it on try for me. On try the xpcshell manifests were evenly distributed among the 8 chunks, but on autoland 213 manifests were added to chunk 1 and the remaining 10 manifests were distributed among the other 7 chunks. This is always the case, but upon further investigation we found a seg fault on ubuntu which causes the window manager to not be fully active. In Bug 1896221 we handle this and once that is landed on central successfully I will reland this patch.

Flags: needinfo?(jmaher)

Pulsebot

Comment 28

•

1 month ago

Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ecbe11f231b2
retry task and worker if we do not start with gnome-session and pulseaudio. r=gbrown

Cosmin Sabou [:CosminS]

Comment 29

•

1 month ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/ecbe11f231b2

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Updated

1 month ago

status-firefox126: affected → wontfix

status-firefox-esr115: affected → wontfix

Daniel Holbert [:dholbert]

Reporter

Comment 30

1 month ago

Edited

Perhaps worth uplifting these to beta? [EDIT: hmm, per next comment, maybe there's not something upliftable, yet beta still seems to be showing signs of this issue]

I'm noticing that some dependent bugs -- bug 1892222, bug 1894445, and bug 1812425 -- are still having a fair amount of intermittent unexpected-failures/passes on beta and release (to the level where it's resulting in "This failure happened more than 30 times this week! Resolving this bug is a high priority." and "Try to resolve this bug as soon as possible. If unresolved for 2 weeks, the affected test(s) may be disabled" comments from Intermittent Failures Robot).

Flags: needinfo?(jmaher)

Daniel Holbert [:dholbert]

Reporter

Comment 31

1 month ago

er, I guess this was fixed in the 127 Nightly timeframe, which is now current beta. Nonetheless, we seem to be getting substantial intermittent unexpected results on beta127 in the bugs that I referenced in the previous comment.

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 32

1 month ago

didn't we just move 127 to beta TODAY (may 27th?) - I looked at bug 1892222 and there were 2 instances on beta, both earlier hours of the day. We will know in a couple days if this is stopped on beta, then I would vote for leaving this alone and letting it merge in 4 weeks to release.

leaving the needinfo to look on wednesday

Timothy Nikkel (:tnikkel)

Comment 33

29 days ago

merge day was May 13 according to https://whattrainisitnow.com/calendar/

Daniel Holbert [:dholbert]

Reporter

Comment 34

29 days ago

Edited

Aha, thanks tnikkel -- the world makes sense, given a May 13th merge date.

Here's what happened here:

We landed the initial band-aid patch (forcing Adwaita) before the merge, in comment 23. That was sufficient to fix some scrollbar-related issues, as I recall (and that made it in before Nightly 127 went to beta, which is why e.g. bug 1893809 doesn't have any intermittent volume on beta at this point).
The bug was automatically resolved as FIXED when that band-aid patch merged to central, which was still when Nightly was at version 127; that's why this bug was marked as fixed-in-127. However, the bug was not fully fixed at that point, since we still hadn't mitigated the underlying issue where X was failing to start.
The patch for that more-fundamental part hit central on May 15, 2 days after the merge, in comment 28 - comment 29, when Nightly was version 128.
So beta 127 is still getting instances of this issue where X is failing to start (looking at a recent instance on beta, I confirmed that's what's happening -- module-x11 never appears in the log).
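The version arithmetic behind this can be sketched as a one-line rule, assuming the Nightly version number is bumped on merge day itself (so a landing on or after that date belongs to the next Nightly). nightly_version_on() is a hypothetical helper; the dates are the ones from this bug.

```python
from datetime import date


def nightly_version_on(landed: date, merge_day: date, nightly_before_merge: int) -> int:
    """Nightly version for a mozilla-central landing (assumes the bump happens on merge day)."""
    return nightly_before_merge + 1 if landed >= merge_day else nightly_before_merge
```

For the May 15 landing in comment 28 this gives 128, which is why beta 127 needed an uplift to get the fix.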

So: the patch in comment 28 - comment 29 is potentially worth uplifting from central (v128) to beta127, to fix the high-volume-orange on that channel for bug 1892222, bug 1894445, and bug 1812425. I imagine the uplift request should be trivial in terms of justification, since the change isn't user-facing.

Daniel Holbert [:dholbert]

Reporter

Comment 35

29 days ago

Edited

I'm adjusting the target milestone and release tracking flags to consider this not-fixed-until-128, i.e. not until the patch that landed in comment 28 - comment 29. With the benefit of hindsight, we probably should've marked this as "leave-open" when the first patch landed in comment 23, given that we knew it was a band-aid and there was a more fundamental problem that still needed fixing (or at least detecting & bailing), and not closed it as fixed until comment 29, which was the 128 timeframe.

I imagine that will make the uplift request (that I hope we can do :) ) less confusing to release management folks; it's weird to request uplift to 127 on a bug that's already flagged as fixed in 127. :)

status-firefox127: fixed → affected

status-firefox128: --- → fixed

Summary: Ubuntu 18.04 test runners have test failures due to unexpected/unpredictable system theme (usually Adwaita, but sometimes Ambiance which leads to different results) → Ubuntu 18.04 test runners have test failures due to X failing to start and a resulting unexpected/unpredictable system theme (usually Adwaita, but sometimes Ambiance which leads to different results)

Target Milestone: 127 Branch → 128 Branch

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 36

29 days ago

Comment on attachment 9400371 [details]
Bug 1895092 - Ensure linux gtk theme is "Adwaita" for reftests. r=gbrown

Beta/Release Uplift Approval Request

User impact if declined: n/a
Is this code covered by automated tests?: Yes
Has the fix been verified in Nightly?: Yes
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): this is test image only and should reduce the intermittent failures we see.
String changes made/needed:
Is Android affected?: No

Flags: needinfo?(jmaher)

Attachment #9400371 - Flags: approval-mozilla-beta?

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 37

29 days ago

Comment on attachment 9400984 [details]
Bug 1895092 - retry task and worker if we do not start with gnome-session and pulseaudio. r=gbrown

Beta/Release Uplift Approval Request

User impact if declined: n/a
Is this code covered by automated tests?: Yes
Has the fix been verified in Nightly?: Yes
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): this is related to the test image only and will help reduce intermittent failures.
String changes made/needed:
Is Android affected?: Yes

Attachment #9400984 - Flags: approval-mozilla-beta?

Daniel Holbert [:dholbert]

Reporter

Comment 38

29 days ago

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #36)

Comment on attachment 9400371 [details]
Bug 1895092 - Ensure linux gtk theme is "Adwaita" for reftests. r=gbrown

Beta/Release Uplift Approval Request

This first patch is already on beta127 per comment 34 first bullet point. Only the second one needs uplifting.

Pascal Chevrel:pascalc

Comment 39

28 days ago

Comment on attachment 9400371 [details]
Bug 1895092 - Ensure linux gtk theme is "Adwaita" for reftests. r=gbrown

This one is already on 127

Attachment #9400371 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

Pascal Chevrel:pascalc

Comment 40

28 days ago

Comment on attachment 9400984 [details]
Bug 1895092 - retry task and worker if we do not start with gnome-session and pulseaudio. r=gbrown

Approved for 127 beta 8, thanks.

Attachment #9400984 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Pulsebot

Comment 41

28 days ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/865a759929d2

Pascal Chevrel:pascalc

Updated

28 days ago

status-firefox127: affected → fixed

Daniel Holbert [:dholbert]

Reporter

Comment 42

2 days ago

Maybe worth an esr uplift? We're seeing these failures there now, I think.

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 43

•

1 day ago

Comment on attachment 9400371 [details]
Bug 1895092 - Ensure linux gtk theme is "Adwaita" for reftests. r=gbrown

ESR Uplift Approval Request

If this is not a sec:{high,crit} bug, please state case for ESR consideration: test automation tweaks for consistency to remove noise.
User impact if declined: n/a
Fix Landed on Version: 128
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): just test automation

Flags: needinfo?(jmaher)

Attachment #9400371 - Flags: approval-mozilla-esr115?

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 44

1 day ago

Comment on attachment 9400984 [details]
Bug 1895092 - retry task and worker if we do not start with gnome-session and pulseaudio. r=gbrown

ESR Uplift Approval Request

If this is not a sec:{high,crit} bug, please state case for ESR consideration: test automation tweaks for consistency to remove noise.
User impact if declined: n/a
Fix Landed on Version: 128
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): test automation only.

Attachment #9400984 - Flags: approval-mozilla-esr115?

Bug 1895092 - Ensure linux gtk theme is "Adwaita" for reftests. r=gbrown 2 months ago Joel Maher ( :jmaher ) (UTC -8) 48 bytes, text/x-phabricator-request	pascalc : approval-mozilla-beta- jmaher : approval-mozilla-esr115?
Bug 1895092 - retry task and worker if we do not start with gnome-session and pulseaudio. r=gbrown 2 months ago Joel Maher ( :jmaher ) (UTC -8) 48 bytes, text/x-phabricator-request	pascalc : approval-mozilla-beta+ jmaher : approval-mozilla-esr115?