Intermittent /mozilla/tests/webdriver/new_session/profile_root.py | single tracking bug
Categories
(Remote Protocol :: Marionette, defect, P5)
Tracking
(Not tracked)
People
(Reporter: jmaher, Unassigned)
References
(Regression)
Details
(Keywords: intermittent-failure, intermittent-testcase, regression)
Attachments
(2 obsolete files)
Reporter | ||
Comment 1•2 years ago
|
||
Additional information about this bug failures and frequency patterns can be found by running: ./mach test-info failure-report --bug 1785893
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 6•2 years ago
|
||
I'm still not sure how to handle single tracking bugs in case of multiple underlying issues. :( So nearly all the failures here are actually related to bug 1770042 which got duped to this one. And it's a regression as caused by bug 1766125. But should we mark the bug as being a regression? I don't know.
Here the exact failure message:
[task 2022-05-18T20:22:37.772Z] 20:22:37 INFO - STDOUT: > geckodriver(config=config, extra_args=extra_args)
[task 2022-05-18T20:22:37.773Z] 20:22:37 INFO - STDOUT: E Failed: DID NOT RAISE <class 'Exception'>
Updated•2 years ago
|
Comment 7•2 years ago
|
||
:jgraham, since you are the author of the regressor, bug 1766125, could you take a look?
For more information, please visit auto_nag documentation.
Comment 8•2 years ago
|
||
I spent some time trying to reproduce this before, failed, and judged that the frequency was too low to be worth spending a lot of additional time trying to track down what is presumably some kind of race condition.
Comment 9•2 years ago
|
||
Yeah not sure why P3 was set for this intermittent failure. Single tracking bugs should also get a P5 assigned by default.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 14•2 years ago
|
||
James, I saw this interesting failure:
https://treeherder.mozilla.org/logviewer?job_id=392030052&repo=try&lineNumber=92541
[task 2022-10-02T13:08:34.283Z] 13:08:34 INFO - STDOUT: Running command: ['/builds/worker/fetches/geckodriver', '--port', '45898', '-vv', '--profile-root', '/tmp/wdspec-ah4aw5os/pytest/test_profile_root_missing0/missing-path']
[task 2022-10-02T13:08:34.287Z] 13:08:34 INFO - STDOUT: /builds/worker/fetches/geckodriver: error: Unable to write to temporary directory; consider --profile-root with a writeable directory
So it seems that we have a previously created temporary folder here for this affected test? In the case of a non-writable folder we only print the usage but do not raise an error in geckodriver. We probably should do so?
Comment 15•2 years ago
|
||
The usage!
macro causes that function to return Err, which should cause geckodriver to terminate. On the test side, that should raise an exception. This happens before the server is started, so there shouldn't be a race condition there. tmp_path
is supposed to be unique per test, and if there was a problem with that we wouldn't see this message (which is what we expect). So my best guess is the wait-for-connection code is sometimes returning when it shouldn't. Maybe there's a race between shutting down a previous geckodriver instance and the current one? We could try passing in a random server port (or making the server port system-allocated and allowing clients to read it, which would be better but not trivial to ship in a backwards compatible way).
Comment 16•2 years ago
|
||
Ah I see. I missed that the macro raises an exception. So wdspec tests already use a different port for geckodriver so I don't think this is a race with another instance. Also the former test ends successfully.
To hardening this specific test I would suggest that we simply add another assert
before calling geckodriver to make sure that the temporary profile path doesn't exist. Maybe that will help us to get closer to the problem. I'll push a try build with this change and maybe we can reproduce it.
https://treeherder.mozilla.org/jobs?repo=try&revision=75753ba4ad9ab23894ea5373fbc67e175bda3fce
Comment hidden (Intermittent Failures Robot) |
Comment 18•2 years ago
|
||
Updated•2 years ago
|
Updated•2 years ago
|
Comment 19•2 years ago
|
||
And we got a failure and that shows that the additional assertion didn't fail so the profile path indeed doesn't exist:
https://treeherder.mozilla.org/logviewer?job_id=392610363&repo=try&lineNumber=152356
So it means due to some reason the wptrunner doesn't detect the thrown exception from geckodriver.
Updated•2 years ago
|
Comment 20•2 years ago
•
|
||
I pushed another try build with the debug tbpl logging enabled in CI:
https://treeherder.mozilla.org/jobs?repo=try&revision=6bc4417662cf4a592c6255c476d9ef9c55be89be
Comment hidden (obsolete) |
Updated•2 years ago
|
Comment hidden (obsolete) |
Updated•2 years ago
|
Comment 23•2 years ago
|
||
(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #20)
I pushed another try build with the debug tbpl logging enabled in CI:
https://treeherder.mozilla.org/jobs?repo=try&revision=6bc4417662cf4a592c6255c476d9ef9c55be89be
Hm, none of the Wd jobs are failing but I also cannot see any indication of debug level logs from wptrunner in the logs. Using your suggested mach logger level doesn't work given that in CI we do not run via mach. But also using --log-raw-level=debug
doesn't help.
James, do you have any further ideas? Maybe something is wrong with the --log-raw-level
argument?
Comment 24•2 years ago
|
||
We always log raw logs at debug level. If yo look at, say, https://firefoxci.taskcluster-artifacts.net/WSz0nqBrSzmaVKyGhPMPXA/1/public/test_info/wpt_raw.log you can find the debug level logs. To get the human readable output to include debug level logs you need to pass in --log-tbpl-level=debug
, because we're using the TBPL log formatter in CI for compatibility with regexp based log parsing.
Comment 25•2 years ago
|
||
But --log-tbpl-level=debug
didn't work because of ValueError: Options specified for unused formatter(s) (['tbpl']) have no effect
:
https://treeherder.mozilla.org/logviewer?job_id=392817348&repo=try&lineNumber=1673
So for me it means that we do not use the TBPL formatter or I'm missing something else.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•2 years ago
|
Comment 28•2 years ago
|
||
Ah, so we actually just output the logs to stdout and then mozharness has its own byzatine processing of them.
https://searchfox.org/mozilla-central/source/testing/mozharness/mozharness/base/config.py#373-381 suggests there's a log_level
config option that needs to be set to the numeric value of logging.DEBUG
which seems to be 10
. So probably you can do that in https://searchfox.org/mozilla-central/source/testing/mozharness/configs/web_platform_tests/prod_config.py
Or, alternatively, as I mentioned before all the logging ends up in the wpt_raw.log
artifact, so you can also just look in that one (and even reformat it using the tbpl or mach formatters), without even rerunning the jobs.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 33•2 years ago
|
||
(In reply to James Graham [:jgraham] from comment #28)
Or, alternatively, as I mentioned before all the logging ends up in the
wpt_raw.log
artifact, so you can also just look in that one (and even reformat it using the tbpl or mach formatters), without even rerunning the jobs.
As James pointed out on Element the following mozlog script can be used to do the reformatting:
https://searchfox.org/mozilla-central/source/testing/mozbase/mozlog/mozlog/scripts/format.py
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 43•2 years ago
|
||
Looks like I forgot to add the last related try build as comment and now I cannot find it anymore. I pushed a new try build with the --log-raw-level=debug
entry for the mozharness script.
https://treeherder.mozilla.org/jobs?repo=try&revision=83d30ce9dd6245db4c6fb6ad6035f58358b06dff
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 48•2 years ago
|
||
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 52•2 years ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #43)
Looks like I forgot to add the last related try build as comment and now I cannot find it anymore. I pushed a new try build with the
--log-raw-level=debug
entry for the mozharness script.https://treeherder.mozilla.org/jobs?repo=try&revision=83d30ce9dd6245db4c6fb6ad6035f58358b06dff
All of the triggered wdspec jobs didn't fail for this issue. As such I triggered a new try build now:
https://treeherder.mozilla.org/jobs?repo=try&revision=4b6ba9c2e551a4e5eb0f862848874f69a661d471
Comment 53•2 years ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #52)
All of the triggered wdspec jobs didn't fail for this issue. As such I triggered a new try build now:
https://treeherder.mozilla.org/jobs?repo=try&revision=4b6ba9c2e551a4e5eb0f862848874f69a661d471
I re-triggered around 110 wdspec tests on various platforms and build types but was not able to get this failure reproduced even once.
Maybe we have to land some extra log output in case of such a failure which includes the user permissions as set for the folder. Maybe this might give a hint why we cannot use it.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 61•2 years ago
|
||
The last failure here is from April 20th. So as it looks like the patch on bug 1824841 fixed this intermittent test failure. I'll check back later this week if that is still true and then mark the bug as WFM.
Comment 62•2 years ago
|
||
I can no longer see a problem with the test test_profile_root_missing
so this high frequent failure is gone now.
But there is still a low frequent one remaining around for test_profile_root
which runs before. Here we fail because the address is still in use:
https://treeherder.mozilla.org/logviewer?job_id=413563326&repo=autoland&lineNumber=99193-99197
[task 2023-04-24T13:32:42.413Z] 13:32:42 INFO - STDOUT: Running command: ['/builds/worker/fetches/geckodriver', '--port', '45314', '-vv', '--profile-root', '/tmp/wdspec-5mg9t7yw/pytest/test_profile_root0/geckodriver-test']
[task 2023-04-24T13:32:42.414Z] 13:32:42 INFO - Test harness output was not a valid structured log message
[task 2023-04-24T13:32:42.414Z] 13:32:42 INFO - /builds/worker/fetches/geckodriver: error: Address in use (os error 98)
[task 2023-04-24T13:32:42.415Z] 13:32:42 INFO - geckodriver 0.33.0 (94f7e9ac107e 2023-04-20 21:50 +0000)
[task 2023-04-24T13:32:42.415Z] 13:32:42 INFO - WebDriver implementation for Firefox
This is strange given that we try to find a free port before, but given that we do not lock it some other process might grab it as well. There is probably nothing to worry right now given that it happens very rarely.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 84•1 year ago
|
||
The current still left-over failure should indeed be related to bug 1571426.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 89•1 year ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #84)
The current still left-over failure should indeed be related to bug 1571426.
That one has been fixed now and since its landing there were no more failures.
Closing as WFM for now. Lets reopen if failures start to happen again.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 93•7 months ago
|
||
This is happening again on esr115: https://treeherder.mozilla.org/logviewer?job_id=458126541&repo=mozilla-esr115&lineNumber=214009
Comment 94•7 months ago
|
||
The fix for bug 1853120 landed for Firefox 120 and we never uplifted. It might help with the esr115 branch but given that there was only a single failure yet I don't think we should consider an uplift.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 97•5 months ago
|
||
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
For more information, please visit BugBot documentation.
Comment 98•3 months ago
|
||
treeherder |
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=474132409&repo=mozilla-esr115
Comment 99•3 months ago
|
||
(In reply to Treeherder Bug Filer from comment #98)
New failure instance: https://treeherder.mozilla.org/logviewer?job_id=474132409&repo=mozilla-esr115
This failure is most likely based on the following line in the logs:
https://treeherder.mozilla.org/logviewer?job_id=474132409&repo=mozilla-esr115&lineNumber=51783
/builds/worker/fetches/geckodriver: error: Unable to write to temporary directory; consider --profile-root with a writeable directory
It means that geckodriver fails to create the Firefox profile because the tmp folder is not writeable? Lets see if that happens again or if it just was a random issue.
Comment hidden (Intermittent Failures Robot) |
Comment 101•2 months ago
|
||
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
For more information, please visit BugBot documentation.
Description
•