Intermittent TEST-UNEXPECTED-TIMEOUT | /webdriver/tests/classic/new_session/unhandled_prompt_behavior.py | single tracking bug
Categories
(Remote Protocol :: Marionette, defect, P5)
Tracking
(Not tracked)
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: intermittent-failure)
Filed by: hskupin [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=478095037&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/C80X4ubHTzyEQH4HCzyq1A/runs/0/artifacts/public/logs/live_backing.log
| Comment hidden (Intermittent Failures Robot) |
Comment 2•1 year ago
|
||
This specific test timeout only happening on MacOS started around October 13th and then manifested on October 15th by producing around 5 failures per day for shippable builds. Checking the logs shows that we have a delay in starting Firefox which means that the final-ui-startup notification is sent out delayed with ~2s. In working cases the notification is received after at maximum 2.5s but now its between 4-6s for jobs timing out.
We didn't change this test nor any behavior in Remote Agent (Marionette). So I wonder if there was any change for MacOS specific recently which might have caused some slowness in startup. Stephen do you maybe have an idea? Thanks.
Comment 3•1 year ago
|
||
As it looks like the issue goes back even further and we have a huge difference in the test duration between different Wd4 jobs on MacOS for opt/shippable builds. These Treeherder jobs, which are next to each other show that.
While one has 129s the older one has 163s.
Overall the tests are running more on their limit these days which means some other factor came in as well. But that is hard to figure out.
Joel, any idea where this huge difference can originate from? Are the Mac workers shared as well between different jobs which may cause slowness of Firefox?
Comment 4•1 year ago
|
||
Locally I the tests take 80s on MacOS, and on wpt.fyi the tests on 60s only on Linux.
Comment 5•1 year ago
|
||
yes the mac workers are shared with all unittests and perftests, not builders and signers though. Nothing has changed with our task -> pools for a long time. We did force cleaning up these directories (September 24-26):
/opt/worker/downloads
/Users/cltbld/Library/Caches/Mozilla/updates/opt/worker/tasks/
if for some reason these tests were depending on data in those directories between tasks, then I could see that being affected.
I have noticed that failing tasks have:
[task 2024-10-22T12:54:06.429Z] 12:54:06 INFO - PID 3551 | 1729601646428 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort
[task 2024-10-22T12:54:06.429Z] 12:54:06 INFO - PID 3551 | 1729601646428 geckodriver::marionette TRACE Retrying in 100ms
[task 2024-10-22T12:54:06.556Z] 12:54:06 INFO - PID 3551 | 1729601646555 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort
[task 2024-10-22T12:54:06.556Z] 12:54:06 INFO - PID 3551 | 1729601646555 geckodriver::marionette TRACE Retrying in 100ms
[task 2024-10-22T12:54:06.791Z] 12:54:06 INFO - PID 3551 | 1729601646791 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort
[task 2024-10-22T12:54:06.792Z] 12:54:06 INFO - PID 3551 | 1729601646791 geckodriver::marionette TRACE Retrying in 100ms
[task 2024-10-22T12:54:07.025Z] 12:54:07 INFO - PID 3551 | 1729601647025 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort
[task 2024-10-22T12:54:07.026Z] 12:54:07 INFO - PID 3551 | 1729601647025 geckodriver::marionette TRACE Retrying in 100ms
[task 2024-10-22T12:54:07.256Z] 12:54:07 INFO - PID 3551 | 1729601647255 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort
[task 2024-10-22T12:54:07.256Z] 12:54:07 INFO - PID 3551 | 1729601647255 geckodriver::marionette TRACE Retrying in 100ms
[task 2024-10-22T12:54:07.495Z] 12:54:07 INFO - PID 3551 | 1729601647495 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort
[task 2024-10-22T12:54:07.496Z] 12:54:07 INFO - PID 3551 | 1729601647495 geckodriver::marionette TRACE Retrying in 100ms
[task 2024-10-22T12:54:07.731Z] 12:54:07 INFO - PID 3551 | 1729601647731 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort
[task 2024-10-22T12:54:07.732Z] 12:54:07 INFO - PID 3551 | 1729601647731 geckodriver::marionette TRACE Retrying in 100ms
it is probably likely that these messages exist and only are printed out when we have a failure- so we only see these messages on the failed (timedout) tasks.
for macosx, I went back on m-c to oct 10th, and osx/shippable was taking in the range of 130-165 seconds. I did a handful of retriggers and most of the tasks failed, in fact some passing ones finished faster; I think this is intermittent- but it could be machine dependent. I did notice that headless was a lot faster, even today.
retriggering off the tip of m-c, I see ranges of 146-182s on the retriggers (180+ for original jobs). Not as many retriggers timed out on the tip, but the overall runtimes seemed longer.
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 8•1 year ago
|
||
(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #5)
yes the mac workers are shared with all unittests and perftests, not builders and signers though. Nothing has changed with our task -> pools for a long time. We did force cleaning up these directories (September 24-26):
Ok, so it could happen that some other test job runs in parallel and claiming too much resources so that due to limited resources on our side it takes longer for Firefox to startup.
if for some reason these tests were depending on data in those directories between tasks, then I could see that being affected.
No, we do not have any dependency. It is just that Firefox takes longer as expected to startup.
I have noticed that failing tasks have:
[task 2024-10-22T12:54:06.429Z] 12:54:06 INFO - PID 3551 | 1729601646428 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort [task 2024-10-22T12:54:06.429Z] 12:54:06 INFO - PID 3551 | 1729601646428 geckodriver::marionette TRACE Retrying in 100ms [task 2024-10-22T12:54:06.556Z] 12:54:06 INFO - PID 3551 | 1729601646555 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort [task 2024-10-22T12:54:06.556Z] 12:54:06 INFO - PID 3551 | 1729601646555 geckodriver::marionette TRACE Retrying in 100ms [task 2024-10-22T12:54:06.791Z] 12:54:06 INFO - PID 3551 | 1729601646791 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort [task 2024-10-22T12:54:06.792Z] 12:54:06 INFO - PID 3551 | 1729601646791 geckodriver::marionette TRACE Retrying in 100ms [task 2024-10-22T12:54:07.025Z] 12:54:07 INFO - PID 3551 | 1729601647025 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort [task 2024-10-22T12:54:07.026Z] 12:54:07 INFO - PID 3551 | 1729601647025 geckodriver::marionette TRACE Retrying in 100ms [task 2024-10-22T12:54:07.256Z] 12:54:07 INFO - PID 3551 | 1729601647255 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort [task 2024-10-22T12:54:07.256Z] 12:54:07 INFO - PID 3551 | 1729601647255 geckodriver::marionette TRACE Retrying in 100ms [task 2024-10-22T12:54:07.495Z] 12:54:07 INFO - PID 3551 | 1729601647495 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort [task 2024-10-22T12:54:07.496Z] 12:54:07 INFO - PID 3551 | 1729601647495 geckodriver::marionette TRACE Retrying in 100ms [task 2024-10-22T12:54:07.731Z] 12:54:07 INFO - PID 3551 | 1729601647731 geckodriver::browser TRACE Failed to open /var/folders/p4/_gm58y794639164h1xnkpyyw000014/T/tmpqvlzhgva.mozrunner/MarionetteActivePort [task 2024-10-22T12:54:07.732Z] 12:54:07 INFO - PID 3551 | 1729601647731 geckodriver::marionette TRACE Retrying in 100msit is probably likely that these messages exist and only are printed out when we have a failure- so we only see these messages on the failed (timedout) tasks.
No, these messages are added in all the cases and not only for failing tasks. It just means that geckodriver tries to read the Marionette port of the started Firefox process but it isn't ready yet. But the amount of entries of these lines indeed demonstrates a startup delay.
for macosx, I went back on m-c to oct 10th, and osx/shippable was taking in the range of 130-165 seconds. I did a handful of retriggers and most of the tasks failed, in fact some passing ones finished faster; I think this is intermittent- but it could be machine dependent. I did notice that
headlesswas a lot faster, even today.
Maybe this means that anything to graphic related initialization could be involved and especially in a shared worker environment causes longer delays for startup?
Markus, do you have an idea?
Comment 9•1 year ago
|
||
Because other tests are affected as well, I moved the needinfo request over to bug 1927416.
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Updated•1 year ago
|
| Reporter | ||
Comment 12•11 months ago
|
||
| treeherder | ||
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=487363245&repo=try
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 16•10 months ago
|
||
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
For more information, please visit BugBot documentation.
| Reporter | ||
Comment 17•9 months ago
|
||
| treeherder | ||
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=493406902&repo=autoland
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 26•7 months ago
|
||
All the recent failures here are actually from mozilla-beta because the test is marked to timeout, while on mozilla-central we have an expectation of OK, Timeout. Maybe we should just uplift the change from the latest wpt-sync? Sasha, could you please have a look? Thanks!
Comment 27•7 months ago
|
||
Lets actually move the last comment over to bug 1958736.
Updated•7 months ago
|
Comment 28•7 months ago
|
||
No more other failures. Lets close as incomplete for now.
| Reporter | ||
Comment 29•6 months ago
|
||
| treeherder | ||
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=507283755&repo=mozilla-beta
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 33•5 months ago
|
||
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
For more information, please visit BugBot documentation.
| Comment hidden (Intermittent Failures Robot) |
| Reporter | ||
Comment 35•2 months ago
|
||
| treeherder | ||
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=526825259&repo=mozilla-central
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 43•1 month ago
|
||
Bug 1996510 is causing a lot of failures. Lets add it to the dependency list. Maybe the upcoming patch on bug 1996347 might help.
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 46•14 days ago
|
||
This failure stopped to happen end of October on mozilla-central. Lets close the bug for now.
Description
•