Intermittent TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/classic/new_session/unhandled_prompt_behavior.py | single tracking bug
Categories
(Remote Protocol :: Marionette, defect, P5)
Tracking
(Not tracked)
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: intermittent-failure)
Filed by: hskupin [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=478095037&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/C80X4ubHTzyEQH4HCzyq1A/runs/0/artifacts/public/logs/live_backing.log
Comment hidden (Intermittent Failures Robot) |
Comment 2•4 months ago
|
||
This specific test timeout only happening on MacOS started around October 13th and then manifested on October 15th by producing around 5 failures per day for shippable builds. Checking the logs shows that we have a delay in starting Firefox which means that the final-ui-startup
notification is sent out delayed with ~2s. In working cases the notification is received after at maximum 2.5s but now its between 4-6s for jobs timing out.
We didn't change this test nor any behavior in Remote Agent (Marionette). So I wonder if there was any change for MacOS specific recently which might have caused some slowness in startup. Stephen do you maybe have an idea? Thanks.
Comment 3•4 months ago
|
||
As it looks like the issue goes back even further and we have a huge difference in the test duration between different Wd4 jobs on MacOS for opt/shippable builds. These Treeherder jobs, which are next to each other show that.
While one has 129s the older one has 163s.
Overall the tests are running more on their limit these days which means some other factor came in as well. But that is hard to figure out.
Joel, any idea where this huge difference can originate from? Are the Mac workers shared as well between different jobs which may cause slowness of Firefox?
Comment 4•4 months ago
|
||
Locally I the tests take 80s on MacOS, and on wpt.fyi the tests on 60s only on Linux.
Comment 5•4 months ago
|
||
yes the mac workers are shared with all unittests and perftests, not builders and signers though. Nothing has changed with our task -> pools for a long time. We did force cleaning up these directories (September 24-26):
/opt/worker/downloads
/Users/cltbld/Library/Caches/Mozilla/updates/opt/worker/tasks/
if for some reason these tests were depending on data in those directories between tasks, then I could see that being affected.
I have noticed that failing tasks have:
[task 2024-10-22T12:54:06.429Z] 12:54:06 INFO - PID 3551 | 1729601646428 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort
[task 2024-10-22T12:54:06.429Z] 12:54:06 INFO - PID 3551 | 1729601646428 geckodriver::marionette TRACE Retrying in 100ms
[task 2024-10-22T12:54:06.556Z] 12:54:06 INFO - PID 3551 | 1729601646555 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort
[task 2024-10-22T12:54:06.556Z] 12:54:06 INFO - PID 3551 | 1729601646555 geckodriver::marionette TRACE Retrying in 100ms
[task 2024-10-22T12:54:06.791Z] 12:54:06 INFO - PID 3551 | 1729601646791 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort
[task 2024-10-22T12:54:06.792Z] 12:54:06 INFO - PID 3551 | 1729601646791 geckodriver::marionette TRACE Retrying in 100ms
[task 2024-10-22T12:54:07.025Z] 12:54:07 INFO - PID 3551 | 1729601647025 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort
[task 2024-10-22T12:54:07.026Z] 12:54:07 INFO - PID 3551 | 1729601647025 geckodriver::marionette TRACE Retrying in 100ms
[task 2024-10-22T12:54:07.256Z] 12:54:07 INFO - PID 3551 | 1729601647255 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort
[task 2024-10-22T12:54:07.256Z] 12:54:07 INFO - PID 3551 | 1729601647255 geckodriver::marionette TRACE Retrying in 100ms
[task 2024-10-22T12:54:07.495Z] 12:54:07 INFO - PID 3551 | 1729601647495 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort
[task 2024-10-22T12:54:07.496Z] 12:54:07 INFO - PID 3551 | 1729601647495 geckodriver::marionette TRACE Retrying in 100ms
[task 2024-10-22T12:54:07.731Z] 12:54:07 INFO - PID 3551 | 1729601647731 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort
[task 2024-10-22T12:54:07.732Z] 12:54:07 INFO - PID 3551 | 1729601647731 geckodriver::marionette TRACE Retrying in 100ms
it is probably likely that these messages exist and only are printed out when we have a failure- so we only see these messages on the failed (timedout) tasks.
for macosx, I went back on m-c to oct 10th, and osx/shippable was taking in the range of 130-165 seconds. I did a handful of retriggers and most of the tasks failed, in fact some passing ones finished faster; I think this is intermittent- but it could be machine dependent. I did notice that headless
was a lot faster, even today.
retriggering off the tip of m-c, I see ranges of 146-182s on the retriggers (180+ for original jobs). Not as many retriggers timed out on the tip, but the overall runtimes seemed longer.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 8•4 months ago
|
||
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #5)
yes the mac workers are shared with all unittests and perftests, not builders and signers though. Nothing has changed with our task -> pools for a long time. We did force cleaning up these directories (September 24-26):
Ok, so it could happen that some other test job runs in parallel and claiming too much resources so that due to limited resources on our side it takes longer for Firefox to startup.
if for some reason these tests were depending on data in those directories between tasks, then I could see that being affected.
No, we do not have any dependency. It is just that Firefox takes longer as expected to startup.
I have noticed that failing tasks have:
[task 2024-10-22T12:54:06.429Z] 12:54:06 INFO - PID 3551 | 1729601646428 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort [task 2024-10-22T12:54:06.429Z] 12:54:06 INFO - PID 3551 | 1729601646428 geckodriver::marionette TRACE Retrying in 100ms [task 2024-10-22T12:54:06.556Z] 12:54:06 INFO - PID 3551 | 1729601646555 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort [task 2024-10-22T12:54:06.556Z] 12:54:06 INFO - PID 3551 | 1729601646555 geckodriver::marionette TRACE Retrying in 100ms [task 2024-10-22T12:54:06.791Z] 12:54:06 INFO - PID 3551 | 1729601646791 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort [task 2024-10-22T12:54:06.792Z] 12:54:06 INFO - PID 3551 | 1729601646791 geckodriver::marionette TRACE Retrying in 100ms [task 2024-10-22T12:54:07.025Z] 12:54:07 INFO - PID 3551 | 1729601647025 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort [task 2024-10-22T12:54:07.026Z] 12:54:07 INFO - PID 3551 | 1729601647025 geckodriver::marionette TRACE Retrying in 100ms [task 2024-10-22T12:54:07.256Z] 12:54:07 INFO - PID 3551 | 1729601647255 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort [task 2024-10-22T12:54:07.256Z] 12:54:07 INFO - PID 3551 | 1729601647255 geckodriver::marionette TRACE Retrying in 100ms [task 2024-10-22T12:54:07.495Z] 12:54:07 INFO - PID 3551 | 1729601647495 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort [task 2024-10-22T12:54:07.496Z] 12:54:07 INFO - PID 3551 | 1729601647495 geckodriver::marionette TRACE Retrying in 100ms [task 2024-10-22T12:54:07.731Z] 12:54:07 INFO - PID 3551 | 1729601647731 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort [task 2024-10-22T12:54:07.732Z] 12:54:07 INFO - PID 3551 | 1729601647731 geckodriver::marionette TRACE Retrying in 100ms
it is probably likely that these messages exist and only are printed out when we have a failure- so we only see these messages on the failed (timedout) tasks.
No, these messages are added in all the cases and not only for failing tasks. It just means that geckodriver tries to read the Marionette port of the started Firefox process but it isn't ready yet. But the amount of entries of these lines indeed demonstrates a startup delay.
for macosx, I went back on m-c to oct 10th, and osx/shippable was taking in the range of 130-165 seconds. I did a handful of retriggers and most of the tasks failed, in fact some passing ones finished faster; I think this is intermittent- but it could be machine dependent. I did notice that
headless
was a lot faster, even today.
Maybe this means that anything to graphic related initialization could be involved and especially in a shared worker environment causes longer delays for startup?
Markus, do you have an idea?
Comment 9•4 months ago
|
||
Because other tests are affected as well, I moved the needinfo request over to bug 1927416.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•2 months ago
|
Reporter | ||
Comment 12•2 months ago
|
||
treeherder |
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=487363245&repo=try
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 16•15 days ago
|
||
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
For more information, please visit BugBot documentation.
Reporter | ||
Comment 17•13 days ago
|
||
treeherder |
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=493406902&repo=autoland
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Description
•