Open Bug 1818909 Opened 2 years ago Updated 14 days ago

Intermittent Automation Error: Received unexpected exception while running application | socket.timeout: Timed out waiting for connection on 127.0.0.1:2828 | Connection timed out after 360s

Categories

(Testing :: General, defect, P5)

Tracking

(Not tracked)

People

(Reporter: intermittent-bug-filer, Unassigned)

References

(Depends on 2 open bugs)

Details

(Keywords: intermittent-failure)

Filed by: sstanca [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=406945874&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/EyxJz9rfR3qz_2xqR1h7rw/runs/0/artifacts/public/logs/live_backing.log


[task 2023-02-25T20:36:47.563Z] 20:36:47     INFO - TEST-INFO | started process GECKO(2278)
[task 2023-02-25T20:36:47.605Z] 20:36:47     INFO - GECKO(2278) | ExceptionHandler::GenerateDump cloned child 2286
[task 2023-02-25T20:36:47.607Z] 20:36:47     INFO - GECKO(2278) | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2023-02-25T20:36:47.619Z] 20:36:47     INFO - GECKO(2278) | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2023-02-25T20:39:47.594Z] 20:39:47     INFO - runtests.py | Waiting for browser...
[task 2023-02-25T20:39:47.595Z] 20:39:47     INFO - TEST-INFO | Main app process: killed by SIGSEGV
[task 2023-02-25T20:39:47.596Z] 20:39:47     INFO - Buffered messages finished
[task 2023-02-25T20:39:47.596Z] 20:39:47    ERROR - TEST-UNEXPECTED-FAIL | automation.py | application terminated with exit code -11
[task 2023-02-25T20:39:47.600Z] 20:39:47     INFO - runtests.py | Application ran for: 0:03:00.041212
[task 2023-02-25T20:39:47.600Z] 20:39:47     INFO - zombiecheck | Reading PID log: /tmp/tmpm9e5h2o8pidlog
[task 2023-02-25T20:39:47.601Z] 20:39:47     INFO -  Traceback (most recent call last):
[task 2023-02-25T20:39:47.601Z] 20:39:47     INFO -    File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 3589, in doTests
[task 2023-02-25T20:39:47.602Z] 20:39:47     INFO -      crashAsPass=options.crashAsPass,
[task 2023-02-25T20:39:47.602Z] 20:39:47     INFO -    File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 2891, in runApp
[task 2023-02-25T20:39:47.603Z] 20:39:47     INFO -      six.reraise(exc, value, tb)
[task 2023-02-25T20:39:47.604Z] 20:39:47     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/six.py", line 696, in reraise
[task 2023-02-25T20:39:47.605Z] 20:39:47     INFO -      raise value
[task 2023-02-25T20:39:47.606Z] 20:39:47     INFO -    File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 2751, in runApp
[task 2023-02-25T20:39:47.606Z] 20:39:47     INFO -      self.marionette.start_session()
[task 2023-02-25T20:39:47.607Z] 20:39:47     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/decorators.py", line 34, in _
[task 2023-02-25T20:39:47.607Z] 20:39:47     INFO -      m._handle_socket_failure()
[task 2023-02-25T20:39:47.608Z] 20:39:47     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/marionette.py", line 707, in _handle_socket_failure
[task 2023-02-25T20:39:47.608Z] 20:39:47     INFO -      reraise(exc_cls, exc, tb)
[task 2023-02-25T20:39:47.609Z] 20:39:47     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/six.py", line 696, in reraise
[task 2023-02-25T20:39:47.609Z] 20:39:47     INFO -      raise value
[task 2023-02-25T20:39:47.610Z] 20:39:47     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/decorators.py", line 24, in _
[task 2023-02-25T20:39:47.610Z] 20:39:47     INFO -      return func(*args, **kwargs)
[task 2023-02-25T20:39:47.611Z] 20:39:47     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/marionette.py", line 1252, in start_session
[task 2023-02-25T20:39:47.611Z] 20:39:47     INFO -      self.raise_for_port(timeout=timeout)
[task 2023-02-25T20:39:47.612Z] 20:39:47     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.6/site-packages/marionette_driver/marionette.py", line 633, in raise_for_port
[task 2023-02-25T20:39:47.612Z] 20:39:47     INFO -      self.host, self.port
[task 2023-02-25T20:39:47.613Z] 20:39:47     INFO -  socket.timeout: Timed out waiting for connection on 127.0.0.1:2828!
[task 2023-02-25T20:39:47.613Z] 20:39:47    ERROR - Automation Error: Received unexpected exception while running application
[task 2023-02-25T20:39:47.614Z] 20:39:47    ERROR - 
[task 2023-02-25T20:39:47.614Z] 20:39:47     INFO - Stopping web server
[task 2023-02-25T20:39:47.617Z] 20:39:47     INFO - Server shut down.
[task 2023-02-25T20:39:47.637Z] 20:39:47     INFO - Web server killed.
[task 2023-02-25T20:39:47.638Z] 20:39:47     INFO - Stopping web socket server
[task 2023-02-25T20:39:47.658Z] 20:39:47     INFO - Stopping ssltunnel
[task 2023-02-25T20:39:47.679Z] 20:39:47     INFO - Stopping gst for v4l2loopback
[task 2023-02-25T20:39:47.680Z] 20:39:47  WARNING - leakcheck | refcount logging is off, so leaks can't be detected!
[task 2023-02-25T20:39:47.681Z] 20:39:47     INFO - runtests.py | Running tests: end.
[task 2023-02-25T20:39:47.709Z] 20:39:47     INFO - Buffered messages finished
[task 2023-02-25T20:39:47.710Z] 20:39:47     INFO - Running manifest: browser/base/content/test/popupNotifications/browser.ini
[task 2023-02-25T20:39:47.739Z] 20:39:47     INFO -  Setting pipeline to PAUSED ...
[task 2023-02-25T20:39:47.741Z] 20:39:47     INFO -  Pipeline is PREROLLING ...
[task 2023-02-25T20:39:47.743Z] 20:39:47     INFO -  Pipeline is PREROLLED ...
[task 2023-02-25T20:39:47.744Z] 20:39:47     INFO -  Setting pipeline to PLAYING ...
[task 2023-02-25T20:39:47.745Z] 20:39:47     INFO -  New clock: GstSystemClock
[task 2023-02-25T20:39:47.777Z] 20:39:47     INFO -  Got EOS from element "pipeline0".
[task 2023-02-25T20:39:47.779Z] 20:39:47     INFO -  Execution ended after 0:00:00.033321247
[task 2023-02-25T20:39:47.781Z] 20:39:47     INFO -  Setting pipeline to PAUSED ...
[task 2023-02-25T20:39:47.781Z] 20:39:47     INFO -  Setting pipeline to READY ...
[task 2023-02-25T20:39:47.782Z] 20:39:47     INFO -  (gst-launch-1.0:2298): GStreamer-CRITICAL **: 20:39:47.776: gst_object_unref: assertion '((GObject *) object)->ref_count > 0' failed
[task 2023-02-25T20:39:47.783Z] 20:39:47     INFO -  Setting pipeline to NULL ...
[task 2023-02-25T20:39:47.783Z] 20:39:47     INFO -  Freeing pipeline ...
[task 2023-02-25T20:39:47.873Z] 20:39:47     INFO - PID 2322 | pk12util: PKCS12 IMPORT SUCCESSFUL
[task 2023-02-25T20:39:47.914Z] 20:39:47     INFO - MochitestServer : launching ['/builds/worker/workspace/build/tests/bin/xpcshell', '-g', '/builds/worker/workspace/build/application/firefox', '-f', '/builds/worker/workspace/build/tests/bin/components/httpd.js', '-e', "const _PROFILE_PATH = '/tmp/tmpz_ahhiuw.mozrunner'; const _SERVER_PORT = '8888'; const _SERVER_ADDR = '127.0.0.1'; const _TEST_PREFIX = undefined; const _DISPLAY_RESULTS = false;", '-f', '/builds/worker/workspace/build/tests/mochitest/server.js']
[task 2023-02-25T20:39:47.915Z] 20:39:47     INFO - runtests.py | Server pid: 2328
[task 2023-02-25T20:39:47.934Z] 20:39:47     INFO - runtests.py | Websocket server pid: 2331
[task 2023-02-25T20:39:47.957Z] 20:39:47     INFO - runtests.py | SSL tunnel pid: 2335
[task 2023-02-25T20:39:48.010Z] 20:39:48     INFO - runtests.py | Running with scheme: http
[task 2023-02-25T20:39:48.011Z] 20:39:48     INFO - runtests.py | Running with e10s: True
[task 2023-02-25T20:39:48.012Z] 20:39:48     INFO - runtests.py | Running with fission: True
[task 2023-02-25T20:39:48.013Z] 20:39:48     INFO - runtests.py | Running with cross-origin iframes: False
[task 2023-02-25T20:39:48.014Z] 20:39:48     INFO - runtests.py | Running with serviceworker_e10s: True
[task 2023-02-25T20:39:48.015Z] 20:39:48     INFO - runtests.py | Running with socketprocess_e10s: False
[task 2023-02-25T20:39:48.016Z] 20:39:48     INFO - runtests.py | Running tests: start.
[task 2023-02-25T20:39:48.016Z] 20:39:48     INFO - 
[task 2023-02-25T20:39:48.033Z] 20:39:48  WARNING - Found 'firefox' running before starting test browser!
[task 2023-02-25T20:39:48.034Z] 20:39:48  WARNING - {'cmdline': [], 'name': 'firefox-bin', 'pid': 2285, 'ppid': 1, 'username': 'worker'}
[task 2023-02-25T20:39:48.035Z] 20:39:48  WARNING - 
[task 2023-02-25T20:39:48.036Z] 20:39:48     INFO - Application command: /builds/worker/workspace/build/application/firefox/firefox -marionette -foreground -profile /tmp/tmpz_ahhiuw.mozrunner
[task 2023-02-25T20:39:48.051Z] 20:39:48     INFO - runtests.py | Application pid: 2356
[task 2023-02-25T20:39:48.052Z] 20:39:48     INFO - TEST-INFO | started process GECKO(2356)

The underlying problem here is the hang at "Waiting for browser...". As such it's most likely a dupe of bug 1414495.
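For context, Marionette's `raise_for_port` essentially polls the TCP port until the browser's Marionette server accepts a connection, and gives up after the timeout. A minimal sketch of that polling logic (illustrative only, not the actual marionette_driver implementation):

```python
import socket
import time

def wait_for_port(host, port, timeout=360.0, poll_interval=0.1):
    """Poll until a TCP server accepts connections on host:port.

    Raises socket.timeout if nothing accepts within `timeout` seconds,
    mirroring the error seen in the log above.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection performs the full TCP handshake.
            with socket.create_connection((host, port), timeout=poll_interval):
                return True
        except OSError:
            # Connection refused / timed out: browser not ready yet.
            time.sleep(poll_interval)
    raise socket.timeout(
        "Timed out waiting for connection on {}:{}".format(host, port)
    )
```

When the browser crashes on startup (as with the SIGSEGV above), nothing ever binds the port, so this loop runs until the deadline and raises the `socket.timeout` seen in the traceback.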

Summary: Intermittent Automation Error: Received unexpected exception while running application | socket.timeout: Timed out waiting for connection on 127.0.0.1:2828! → Intermittent Automation Error: Received unexpected exception while running application | socket.timeout: Timed out waiting for connection on 127.0.0.1:2828 | Connection timed out after 360s

This seems to be quite frequent starting with the 2nd of August, and the increased failure rate seems to be limited to OSX.
Some issue with 127.0.0.1:8888 is being tracked in bug 1846923, which is OSX-only as well. Quote from that bug: "The problem here is that 127.0.0.1:8888 is already in use."
Maybe the spike in failure rate and bug 1846923 have the same cause?

Btw, this is the backfill range and retriggers for bug 1846923, which point to this wptsync as the culprit.

Summary: Intermittent Automation Error: Received unexpected exception while running application | socket.timeout: Timed out waiting for connection on 127.0.0.1:2828 | Connection timed out after 360s → Frequent Automation Error: Received unexpected exception while running application | socket.timeout: Timed out waiting for connection on 127.0.0.1:2828 | Connection timed out after 360s

James, could you please have a look at this? The latest wptsync seems to have caused a new top intermittent failure. Thanks.

Flags: needinfo?(james)

Just an update here: I've investigated a bit more into the bug mentioned above, and it seems that it was not caused by the recent wptsync. The initial conclusion was based on 31 retriggers of that job without a single failure. Here is another round of backfills and retriggers that shows bug 1846923 failed before the wptsync.

This might just be something off-tree, but we need confirmation from :masterwayz; check the sheriffs channel on Element for an update on this.

Thank you, and sorry for the initial confusion.

So when checking all the failures over the last 30 days, it is visible that there is a heavy increase of this failure starting August 2nd. What's interesting is that this only affects Mochitest-related tests, but not the Marionette unit tests.

Flags: needinfo?(james)

So yesterday I updated some software on these machines; the timeline matches, so that could also be it.
Looking at the machines affected, those are the machines I updated.

(In reply to Michelle Goossens [:masterwayz] from comment #24)

So yesterday I updated some software on these machines; the timeline matches, so that could also be it.
Looking at the machines affected, those are the machines I updated.

Do you have some details about which updates were applied? Maybe that would help to exclude possible candidates.

Note that whenever this happens on macOS, the following line can be seen for the very first test that starts Firefox:
https://treeherder.mozilla.org/logviewer?job_id=424909675&repo=mozilla-central&lineNumber=1925-1927

[task 2023-08-04T05:02:25.169Z] 05:02:25  WARNING - Found 'firefox' running before starting test browser!
[task 2023-08-04T05:02:25.170Z] 05:02:25  WARNING - {'ppid': 2006, 'username': 'cltbld', 'pid': 2007, 'cmdline': ['/opt/worker/tasks/task_169110230590190/build/application/Firefox Nightly.app/Contents/MacOS/firefox', '--marionette', '--allow-downgrade', '-profile', '/var/folders/jw/dwqnl9cd18z1sxd9j8t0klz8000014/T/tmpm8l7ewr7/profile', '-foreground', '-no-remote'], 'name': 'firefox'}
[task 2023-08-04T05:02:25.170Z] 05:02:25  WARNING - 

Is there a way to see which job(s) were running on that machine beforehand? I would like to know why the process is still running and hasn't been killed, in case there was a shutdown hang or another issue.

OS: Unspecified → macOS
Hardware: Unspecified → All

To avoid being affected by that port issue, Mochitest could use a random Marionette port, but that is currently blocked on bug 1750630.
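The random-port idea can be sketched by letting the OS pick an ephemeral port instead of the hard-coded 2828 (a common pattern, not the actual Mochitest/Marionette wiring; the helper name is an assumption):

```python
import socket

def get_free_port(host="127.0.0.1"):
    """Let the OS pick an ephemeral port instead of hard-coding 2828.

    Note: the port is released before the browser would bind it, so a
    small race window remains; this only reduces collisions with other
    listeners, it does not eliminate them.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        # Binding to port 0 asks the kernel for any free ephemeral port.
        s.bind((host, 0))
        return s.getsockname()[1]
```

The harness would then pass the chosen port to Firefox (e.g. via the `marionette.port` preference) instead of relying on the fixed default.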

Let's see if bug 1845518 might help here, given that it caused the top #1 intermittent last week and ended up in a shutdown hang. If that is not the case, I would have to take a look into bug 1750630.

Nevertheless, we should make sure that no Firefox processes are left behind after the test job finishes.
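A cheap guard here would be a pre-flight check over the process list, extending the existing "Found 'firefox' running before starting test browser!" warning into an actionable list. A hypothetical sketch operating on process records shaped like the ones printed in the log above (the helper name and record shape are assumptions, not harness API):

```python
def check_for_leftover_browsers(processes, names=("firefox", "firefox-bin")):
    """Return leftover browser processes and matching warning lines.

    `processes` is a list of dicts shaped like the warning output in
    the log ({'name': ..., 'pid': ..., 'ppid': ..., ...}).
    """
    leftovers = [p for p in processes if p.get("name") in names]
    warnings = [
        "Found '{0}' running before starting test browser! (pid {1})".format(
            p["name"], p["pid"]
        )
        for p in leftovers
    ]
    return leftovers, warnings
```

In a real harness the records would come from something like `psutil.process_iter()`, and anything returned would be killed (or the job aborted) before launching the new browser instance.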

Depends on: 1750630

Just in terms of that wptsync, I don't see any tooling changes that I'd expect to affect our CI jobs. So of course it's not impossible, but other theories seem more likely.

whimboo: the macOS hosts are supposed to reboot once a job finishes, effectively killing any leftover processes. If they are supposed to shut down on their own, perhaps that logic isn't working and it was never caught, as the hosts reboot after each job regardless.

Assignee: nobody → mgoossens
Status: NEW → ASSIGNED

Michelle, can you link to the code changes or a bug which describes the exact change? Thanks

Currently rolling out a potential fix, fingers crossed.
Though keep in mind that provisioning 87 machines in one go (only machines 96 and below should be affected) takes a while.

Whiteboard: [stockwell disable-recommended]
Depends on: 1855669
Assignee: michelle → nobody
Status: ASSIGNED → NEW

All the recent test failures are actually happening because of bug 1819763.

Depends on: 1819763

Please note that all the classified failures for this bug seem to be related to bug 1819763. It's visible when scrolling up a few lines to see why the browser actually died:

https://treeherder.mozilla.org/logviewer?job_id=444903567&repo=autoland&lineNumber=3803

Sheriffs, maybe you could re-classify the failures and make a note of what to check in such a case? Thanks.

Flags: needinfo?(sheriffs)

Also, I think a better place for this general bug, which spans different test suites, is Testing :: General.

Component: Workers → General
Product: Taskcluster → Testing

(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #60)

Please note that all the classified failures for this bug seem to be related to bug 1819763. It's visible when scrolling up a few lines to see why the browser actually died:

https://treeherder.mozilla.org/logviewer?job_id=444903567&repo=autoland&lineNumber=3803

Sheriffs, maybe you could re-classify the failures and make a note of what to check in such a case? Thanks.

Will re-classify the failures with that bug and will let the other sheriffs know.

Flags: needinfo?(sheriffs)

All the failures here this year except one were bug 1819763. The one exception was this TV failure.

[task 2024-01-17T10:53:54.041Z] 10:53:54     INFO - TEST-START | browser/extensions/formautofill/test/browser/address/browser_address_doorhanger_non_mergeable_fields.js
[task 2024-01-17T10:53:56.032Z] 10:53:56     INFO - GECKO(5382) | MEMORY STAT vsizeMaxContiguous not supported in this build configuration.
[task 2024-01-17T10:53:56.033Z] 10:53:56     INFO - GECKO(5382) | MEMORY STAT | vsize 11576MB | residentFast 472MB | heapAllocated 286MB
[task 2024-01-17T10:53:56.034Z] 10:53:56     INFO - TEST-OK | browser/extensions/formautofill/test/browser/address/browser_address_doorhanger_non_mergeable_fields.js | took 1995ms
[task 2024-01-17T10:53:56.072Z] 10:53:56     INFO - checking window state
[task 2024-01-17T10:53:56.188Z] 10:53:56     INFO - GECKO(5382) | Completed ShutdownLeaks collections in process 5382
[task 2024-01-17T10:53:56.189Z] 10:53:56     INFO - TEST-START | Shutdown
[task 2024-01-17T10:53:56.190Z] 10:53:56     INFO - Browser Chrome Test Summary
[task 2024-01-17T10:53:56.190Z] 10:53:56     INFO - Passed:  13
[task 2024-01-17T10:53:56.191Z] 10:53:56     INFO - Failed:  0
[task 2024-01-17T10:53:56.193Z] 10:53:56     INFO - Todo:    0
[task 2024-01-17T10:53:56.193Z] 10:53:56     INFO - Mode:    e10s
[task 2024-01-17T10:53:56.195Z] 10:53:56     INFO - *** End BrowserChrome Test Results ***
[task 2024-01-17T10:53:56.209Z] 10:53:56     INFO - GECKO(5382) | Exiting due to channel error.
[task 2024-01-17T10:53:56.209Z] 10:53:56     INFO - GECKO(5382) | Exiting due to channel error.
[task 2024-01-17T10:53:56.209Z] 10:53:56     INFO - GECKO(5382) | Exiting due to channel error.
[task 2024-01-17T10:53:56.213Z] 10:53:56     INFO - GECKO(5382) | Exiting due to channel error.
[task 2024-01-17T10:53:56.213Z] 10:53:56     INFO - GECKO(5382) | Exiting due to channel error.
[task 2024-01-17T10:53:56.230Z] 10:53:56     INFO - TEST-INFO | Main app process: exit 0
[task 2024-01-17T10:53:56.231Z] 10:53:56     INFO - runtests.py | Application ran for: 0:08:04.750085
[task 2024-01-17T10:53:56.232Z] 10:53:56     INFO - zombiecheck | Reading PID log: /tmp/tmpn6o_a9cepidlog
[task 2024-01-17T10:53:56.233Z] 10:53:56     INFO - ==> process 5382 launched child process 5450
[task 2024-01-17T10:53:56.233Z] 10:53:56     INFO - ==> process 5382 launched child process 5483
[task 2024-01-17T10:53:56.233Z] 10:53:56     INFO - ==> process 5382 launched child process 5524
[task 2024-01-17T10:53:56.233Z] 10:53:56     INFO - ==> process 5382 launched child process 5550
[task 2024-01-17T10:53:56.233Z] 10:53:56     INFO - ==> process 5382 launched child process 5551
[task 2024-01-17T10:53:56.233Z] 10:53:56     INFO - ==> process 5382 launched child process 5554
[task 2024-01-17T10:53:56.233Z] 10:53:56     INFO - ==> process 5382 launched child process 5594
[task 2024-01-17T10:53:56.233Z] 10:53:56     INFO - zombiecheck | Checking for orphan process with PID: 5450
[task 2024-01-17T10:53:56.233Z] 10:53:56     INFO - zombiecheck | Checking for orphan process with PID: 5483
[task 2024-01-17T10:53:56.233Z] 10:53:56     INFO - zombiecheck | Checking for orphan process with PID: 5550
[task 2024-01-17T10:53:56.233Z] 10:53:56     INFO - zombiecheck | Checking for orphan process with PID: 5551
[task 2024-01-17T10:53:56.233Z] 10:53:56     INFO - zombiecheck | Checking for orphan process with PID: 5554
[task 2024-01-17T10:53:56.233Z] 10:53:56     INFO - zombiecheck | Checking for orphan process with PID: 5524
[task 2024-01-17T10:53:56.233Z] 10:53:56     INFO - zombiecheck | Checking for orphan process with PID: 5594
[task 2024-01-17T10:53:56.235Z] 10:53:56     INFO -  Traceback (most recent call last):
[task 2024-01-17T10:53:56.235Z] 10:53:56     INFO -    File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 3769, in doTests
[task 2024-01-17T10:53:56.235Z] 10:53:56     INFO -      ret, _ = self.runApp(
[task 2024-01-17T10:53:56.235Z] 10:53:56     INFO -    File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 3029, in runApp
[task 2024-01-17T10:53:56.235Z] 10:53:56     INFO -      six.reraise(exc, value, tb)
[task 2024-01-17T10:53:56.235Z] 10:53:56     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/six.py", line 696, in reraise
[task 2024-01-17T10:53:56.235Z] 10:53:56     INFO -      raise value
[task 2024-01-17T10:53:56.236Z] 10:53:56     INFO -    File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 2905, in runApp
[task 2024-01-17T10:53:56.237Z] 10:53:56     INFO -      self.execute_start_script()
[task 2024-01-17T10:53:56.237Z] 10:53:56     INFO -    File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 2152, in execute_start_script
[task 2024-01-17T10:53:56.237Z] 10:53:56     INFO -      return self.marionette.execute_script(
[task 2024-01-17T10:53:56.237Z] 10:53:56     INFO -    File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
[task 2024-01-17T10:53:56.238Z] 10:53:56     INFO -      next(self.gen)
[task 2024-01-17T10:53:56.239Z] 10:53:56     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/marionette.py", line 1600, in using_context
[task 2024-01-17T10:53:56.239Z] 10:53:56     INFO -      self.set_context(scope)
[task 2024-01-17T10:53:56.243Z] 10:53:56     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/marionette.py", line 1578, in set_context
[task 2024-01-17T10:53:56.244Z] 10:53:56     INFO -      self._send_message("Marionette:SetContext", {"value": context})
[task 2024-01-17T10:53:56.244Z] 10:53:56     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/decorators.py", line 34, in _
[task 2024-01-17T10:53:56.244Z] 10:53:56     INFO -      m._handle_socket_failure()
[task 2024-01-17T10:53:56.245Z] 10:53:56     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/marionette.py", line 820, in _handle_socket_failure
[task 2024-01-17T10:53:56.245Z] 10:53:56     INFO -      reraise(exc_cls, exc, tb)
[task 2024-01-17T10:53:56.245Z] 10:53:56     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/six.py", line 696, in reraise
[task 2024-01-17T10:53:56.246Z] 10:53:56     INFO -      raise value
[task 2024-01-17T10:53:56.246Z] 10:53:56     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/decorators.py", line 24, in _
[task 2024-01-17T10:53:56.246Z] 10:53:56     INFO -      return func(*args, **kwargs)
[task 2024-01-17T10:53:56.247Z] 10:53:56     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/marionette.py", line 769, in _send_message
[task 2024-01-17T10:53:56.247Z] 10:53:56     INFO -      msg = self.client.request(name, params)
[task 2024-01-17T10:53:56.247Z] 10:53:56     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/transport.py", line 382, in request
[task 2024-01-17T10:53:56.247Z] 10:53:56     INFO -      return self.receive()
[task 2024-01-17T10:53:56.248Z] 10:53:56     INFO -    File "/builds/worker/workspace/build/venv/lib/python3.8/site-packages/marionette_driver/transport.py", line 218, in receive
[task 2024-01-17T10:53:56.248Z] 10:53:56     INFO -      raise socket.timeout(
[task 2024-01-17T10:53:56.248Z] 10:53:56     INFO -  socket.timeout: Connection timed out after 360s
[task 2024-01-17T10:53:56.248Z] 10:53:56    ERROR - Automation Error: Received unexpected exception while running application
[task 2024-01-17T10:53:56.248Z] 10:53:56    ERROR - 
[task 2024-01-17T10:53:56.249Z] 10:53:56     INFO - Stopping web server
[task 2024-01-17T10:53:56.249Z] 10:53:56     INFO - Server shut down.
[task 2024-01-17T10:53:56.249Z] 10:53:56     INFO - Web server killed.
[task 2024-01-17T10:53:56.249Z] 10:53:56     INFO - Stopping web socket server
[task 2024-01-17T10:53:56.249Z] 10:53:56     INFO - Stopping ssltunnel
[task 2024-01-17T10:53:56.250Z] 10:53:56     INFO - Stopping gst for v4l2loopback
[task 2024-01-17T10:53:56.250Z] 10:53:56     INFO - runtests.py | Running tests: end.
[task 2024-01-17T10:53:56.311Z] 10:53:56     INFO - Buffered messages finished
[task 2024-01-17T10:53:56.312Z] 10:53:56     INFO -  TEST-INFO | checking window state
[task 2024-01-17T10:53:56.312Z] 10:53:56     INFO -  Browser Chrome Test Summary
[task 2024-01-17T10:53:56.313Z] 10:53:56     INFO -  	Passed: 364
[task 2024-01-17T10:53:56.313Z] 10:53:56     INFO -  	Failed: 0
[task 2024-01-17T10:53:56.315Z] 10:53:56     INFO -  	Todo: 0
[task 2024-01-17T10:53:56.315Z] 10:53:56     INFO -  	Mode: e10s
[task 2024-01-17T10:53:56.315Z] 10:53:56     INFO -  *** End BrowserChrome Test Results ***
[task 2024-01-17T10:53:56.315Z] 10:53:56     INFO - Buffered messages finished
[task 2024-01-17T10:53:56.315Z] 10:53:56     INFO - SUITE-END | took 485s
Summary: Frequent Automation Error: Received unexpected exception while running application | socket.timeout: Timed out waiting for connection on 127.0.0.1:2828 | Connection timed out after 360s → Intermittent Automation Error: Received unexpected exception while running application | socket.timeout: Timed out waiting for connection on 127.0.0.1:2828 | Connection timed out after 360s

(In reply to Intermittent Failures Robot from comment #79)

15 failures in 4202 pushes (0.004 failures/push) were associated with this bug in the last 7 days.

Lots of these failures these days are related to bug 1819763. I've requested sheriffs to check and re-classify this list.

(In reply to Intermittent Failures Robot from comment #102)

Platform and build breakdown:

  • linux2204-64-wayland: 2
    • debug: 1
    • opt: 1

For more details, see:
https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1818909&startday=2024-10-21&endday=2024-10-27&tree=all

Both are jobs running under Wayland, and they face the following compositor issue:

[task 2024-10-21T20:32:29.411Z] 20:32:29     INFO - GECKO(3791) | Warning: ConnectToCompositor() try again : Connection refused
[task 2024-10-21T20:32:29.412Z] 20:32:29     INFO - GECKO(3791) | Warning: ConnectToCompositor() try again : Connection refused
[task 2024-10-21T20:32:29.412Z] 20:32:29     INFO - GECKO(3791) | Error: ConnectToCompositor() connect() failed repeatedly : Connection refused
[task 2024-10-21T20:32:29.413Z] 20:32:29     INFO - GECKO(3791) | Error: ProxiedConnection::Process(): Failed to connect to compositor
[task 2024-10-21T20:32:29.414Z] 20:32:29     INFO - GECKO(3791) |  : Connection refused
[task 2024-10-21T20:32:29.415Z] 20:32:29     INFO - GECKO(3791) | Error: we don't have any display, WAYLAND_DISPLAY='wayland-0' DISPLAY='(null)'
[task 2024-10-21T20:35:29.217Z] 20:35:29     INFO - runtests.py | Waiting for browser...
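The "we don't have any display" error above suggests the socket named by `WAYLAND_DISPLAY` was not present under `XDG_RUNTIME_DIR` when Firefox started. A small diagnostic sketch of that lookup (hypothetical helpers, not part of the harness; the fallback path is an assumption based on the usual libwayland convention):

```python
import os

def wayland_socket_path():
    """Return the expected Wayland socket path, or None when the
    environment does not point at a compositor at all."""
    display = os.environ.get("WAYLAND_DISPLAY")
    if not display:
        return None
    # Absolute WAYLAND_DISPLAY values are used as-is by libwayland.
    if os.path.isabs(display):
        return display
    runtime_dir = os.environ.get("XDG_RUNTIME_DIR")
    if not runtime_dir:
        runtime_dir = "/run/user/%d" % os.getuid()
    return os.path.join(runtime_dir, display)

def compositor_reachable():
    """True only if the compositor's socket actually exists on disk."""
    path = wayland_socket_path()
    return path is not None and os.path.exists(path)
```

In the failing jobs, `WAYLAND_DISPLAY='wayland-0'` was set but the compositor was refusing connections, so a check like `compositor_reachable()` run before launching the browser could turn this three-minute hang into an immediate, clearly attributed infrastructure error.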

Joel, is that something that is already tracked on another bug?

Flags: needinfo?(jmaher)
OS: macOS → All

I am not aware of this bug; I assume this is a headless run? If not, maybe there are some older scripts we used as hacks to get Wayland coverage on 18.04, and bits of that are lingering.

Flags: needinfo?(jmaher)

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #104)

I am not aware of this bug; I assume this is a headless run? If not, maybe there are some older scripts we used as hacks to get Wayland coverage on 18.04, and bits of that are lingering.

No, those are all new and happen on beta as well. Shall we file a dedicated new bug for those failures?

(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #106)

(In reply to Joel Maher ( :jmaher ) (UTC -8) from comment #104)

I am not aware of this bug; I assume this is a headless run? If not, maybe there are some older scripts we used as hacks to get Wayland coverage on 18.04, and bits of that are lingering.

No, those are all new and happen on beta as well. Shall we file a dedicated new bug for those failures?

The last time those Wayland-specific failures appeared was on 2024-10-25. Since then, no more failures have been classified.
