High frequency Linux [tier 2] wpt TEST-UNEXPECTED-CRASH <test-name> | expected OK when Gecko 138 merges to beta on 2025-03-31
Categories
(Testing :: web-platform-tests, defect)
Tracking
(firefox-esr115 unaffected, firefox-esr128 unaffected, firefox136 unaffected, firefox137 unaffected, firefox138+ affected)
Version | Tracking | Status |
---|---|---|
firefox-esr115 | --- | unaffected |
firefox-esr128 | --- | unaffected |
firefox136 | --- | unaffected |
firefox137 | --- | unaffected |
firefox138 | + | affected |
People
(Reporter: abutkovits, Unassigned, NeedInfo)
References
Details
(Keywords: intermittent-failure)
[task 2025-03-13T15:36:18.966Z] 15:36:18 INFO - PID 9779 | [9779] Sandbox: SandboxBroker: thread creation failed: ENOMEM
[task 2025-03-13T15:36:18.967Z] 15:36:18 INFO - PID 9779 | A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down
[task 2025-03-13T15:36:18.974Z] 15:36:18 INFO - PID 9779 | [Parent 9779, IPC I/O Parent] WARNING: process 20721 exited on signal 15: file /builds/worker/checkouts/gecko/ipc/chromium/src/chrome/common/process_watcher_posix_sigchld.cc:132
[task 2025-03-13T15:36:18.984Z] 15:36:18 INFO - PID 9779 | [GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt
[task 2025-03-13T15:36:19.375Z] 15:36:19 INFO - Closing window 064e1624-b26c-41b2-ab8e-862686a57712
[task 2025-03-13T15:36:19.383Z] 15:36:19 INFO - PID 9779 | 1741880179382 Marionette INFO Stopped listening on port 37321
[task 2025-03-13T15:36:19.420Z] 15:36:19 INFO - NoSuchWindowException on command, setting status to CRASH
[task 2025-03-13T15:36:19.422Z] 15:36:19 INFO - TEST-UNEXPECTED-CRASH | /css/css-grid/alignment/grid-row-axis-alignment-positioned-items-014.html | expected OK
[task 2025-03-13T15:36:19.422Z] 15:36:19 INFO - TEST-INFO took 472ms
[task 2025-03-13T15:36:19.426Z] 15:36:19 INFO - PID 9779 | JavaScript error: chrome://remote/content/marionette/cert.sys.mjs, line 47: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICertOverrideService.setDisableAllSecurityChecksAndLetAttackersInterceptMyData]
[task 2025-03-13T15:36:19.509Z] 15:36:19 INFO - Browser exited with return code -15
[task 2025-03-13T15:36:19.511Z] 15:36:19 INFO - Closing logging queue
[task 2025-03-13T15:36:19.511Z] 15:36:19 INFO - queue closed
[task 2025-03-13T15:36:19.551Z] 15:36:19 INFO - Application command: /builds/worker/workspace/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmppran9exy
[task 2025-03-13T15:36:19.564Z] 15:36:19 INFO - PID 10911 | Gtk-Message: 15:35:10.646: Failed to load module "canberra-gtk-module"
[task 2025-03-13T15:36:19.564Z] 15:36:19 INFO - PID 10911 | Gtk-Message: 15:35:10.647: Failed to load module "canberra-gtk-module"
[task 2025-03-13T15:36:19.564Z] 15:36:19 INFO - PID 10911 | [GFX1-]: glxtest: libpci missing
[task 2025-03-13T15:36:19.564Z] 15:36:19 INFO - PID 10911 | [GFX1-]: glxtest: libEGL missing
[task 2025-03-13T15:36:19.564Z] 15:36:19 INFO - PID 10911 | [GFX1-]: glxtest: libGL.so.1 missing
[task 2025-03-13T15:36:19.564Z] 15:36:19 INFO - PID 10911 | [GFX1-]: No GPUs detected via PCI
[task 2025-03-13T15:36:19.564Z] 15:36:19 INFO - PID 10911 | 1741880111084 Marionette INFO Marionette enabled
[task 2025-03-13T15:36:19.564Z] 15:36:19 INFO - PID 10911 | 1741880111316 Marionette INFO Listening on port 39140
[task 2025-03-13T15:36:19.565Z] 15:36:19 INFO - PID 10911 | [GFX1-]: Failed GL context creation for WebRender: 0
[task 2025-03-13T15:36:19.565Z] 15:36:19 INFO - PID 10911 | [GFX1-]: FEATURE_FAILURE_WEBRENDER_INITIALIZE_UNSPECIFIED
[task 2025-03-13T15:36:19.565Z] 15:36:19 INFO - PID 10911 | [GFX1-]: Failed to connect WebRenderBridgeChild. isParent=true
[task 2025-03-13T15:36:19.565Z] 15:36:19 INFO - PID 10911 | [GFX1-]: Fallback WR to SW-WR
[task 2025-03-13T15:36:19.566Z] 15:36:19 INFO - PID 10911 | console.error: ({})
[task 2025-03-13T15:36:19.567Z] 15:36:19 INFO - PID 10911 | [ERROR fog_control] Boo, couldn't open serverknobs file at /builds/worker/workspace/build/application/firefox/interesting_serverknobs.json
[task 2025-03-13T15:36:19.567Z] 15:36:19 INFO - PID 10911 | GLib-GIO-Message: 15:35:28.903: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.
[task 2025-03-13T15:36:19.567Z] 15:36:19 INFO - Starting runner
[task 2025-03-13T15:36:20.132Z] 15:36:20 INFO - TEST-START | /css/css-grid/alignment/grid-row-axis-alignment-positioned-items-015.html
Comment 1•1 month ago
The bug is marked as tracked for firefox138 (nightly). We have limited time to fix this, the soft freeze is in 10 days. However, the bug still isn't assigned.
:Honza, could you please find an assignee for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.
For more information, please visit BugBot documentation.
Comment 2•1 month ago
Henrik, could you please take a look? Thank you.
Comment 3•1 month ago
This is not Marionette-related; it looks strongly like we hit out-of-memory situations in all of those failing cases. Here is the related line from the log:
[task 2025-03-13T15:51:13.362Z] 15:51:13 INFO - PID 18494 | [18494] Sandbox: SandboxBroker: thread creation failed: ENOMEM
Maybe Jed knows more in case something changed for the SandboxBroker recently.
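To check whether other failing runs show the same pattern, a log can be scanned for the broker ENOMEM line followed by a crash report. The following is a minimal sketch, assuming the log line formats shown in the excerpts above; the function name `crashes_after_enomem` is hypothetical and not part of any Mozilla tooling.

```python
import re

# Sandbox failure line, as quoted in the log excerpt above.
ENOMEM_RE = re.compile(r"SandboxBroker: thread creation failed: ENOMEM")
# Harness crash report line, as it appears in the log excerpt above.
CRASH_RE = re.compile(r"TEST-UNEXPECTED-CRASH \| (\S+) \|")


def crashes_after_enomem(log_lines):
    """Return names of tests whose crash report follows a broker ENOMEM line."""
    seen_enomem = False
    crashed = []
    for line in log_lines:
        if ENOMEM_RE.search(line):
            seen_enomem = True
            continue
        match = CRASH_RE.search(line)
        if match and seen_enomem:
            crashed.append(match.group(1))
            # Require a fresh ENOMEM line before pairing another crash.
            seen_enomem = False
    return crashed
```

A crash report that is not preceded by an ENOMEM line is ignored, so the result only lists crashes that fit the out-of-memory theory.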
Comment 4•1 month ago
The most recent thing that seems relevant is bug 1553850, but that's in 137. And this isn't an issue with exceeding the thread limit (RLIMIT_NPROC), because that would fail with EAGAIN, not ENOMEM.
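The distinction drawn here can be written down as a small lookup. This is only a sketch of the reasoning in this comment, not Gecko code; the function name `describe_thread_creation_errno` is hypothetical, and the wording of the returned strings is an illustration.

```python
import errno
import os


def describe_thread_creation_errno(err):
    """Map an errno from a failed thread creation to the cause argued above."""
    if err == errno.EAGAIN:
        # Per the comment: exceeding the thread limit would surface as EAGAIN.
        return "EAGAIN: consistent with hitting a limit such as RLIMIT_NPROC"
    if err == errno.ENOMEM:
        # Per the comment: ENOMEM points at memory or address space instead.
        return "ENOMEM: consistent with running out of memory or address space"
    # Anything else: fall back to the symbolic name and the system message.
    return f"{errno.errorcode.get(err, err)}: {os.strerror(err)}"
```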
There were a couple of issues with memory leaks or increased memory use from bug 1942129, but I think those were resolved before this test run.
I notice that the failing run is 32-bit, so we're probably running out of address space rather than memory per se. Two things I can think of:
- Keep a count of the number of extant SandboxBroker instances and log that in the crashing case, to see if it looks unreasonably large.
- If it's large but doesn't seem to be a leak or otherwise fixable, the per-broker address space consumption can be optimized somewhat.
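The first idea, counting live instances and logging the count on failure, could look roughly like the sketch below. It is written in Python purely for illustration; `BrokerSketch` and its methods are hypothetical stand-ins, not the Gecko SandboxBroker class, and the message format is an assumption modeled on the "(N brokers)" suffix seen in the logs in this bug.

```python
import logging
import threading

log = logging.getLogger("sandbox_sketch")


class BrokerSketch:
    """Hypothetical stand-in for a broker that tracks how many are alive."""

    _lock = threading.Lock()
    _live = 0  # number of instances created and not yet closed

    def __init__(self):
        with BrokerSketch._lock:
            BrokerSketch._live += 1
        self._closed = False

    def close(self):
        # Idempotent: closing twice must not decrement the counter twice.
        if not self._closed:
            self._closed = True
            with BrokerSketch._lock:
                BrokerSketch._live -= 1

    @classmethod
    def live_count(cls):
        with cls._lock:
            return cls._live

    @classmethod
    def report_failure(cls, reason):
        """Log a failure together with the current number of live brokers."""
        message = f"thread creation failed: {reason} ({cls.live_count()} brokers)"
        log.error(message)
        return message
```

Logging the count at the failure site, rather than periodically, keeps the number tied to the exact moment the allocation failed.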
Leaving needinfo to myself to look into this a little more.
Comment 5•1 month ago
I can't remember the details, but this very likely rings a bell.
Comment 6•1 month ago
This is a reminder regarding comment #1!
The bug is marked as tracked for firefox138 (nightly). We have limited time to fix this, the soft freeze is in 3 days. However, the bug still isn't assigned.
Comment 7•1 month ago
I couldn't reproduce this on Try, even by using the same mach try release as the original failing run. I'll see if I can improve log messages to narrow it down a little from “nonspecific 32-bit OOM”.
Comment 8•25 days ago
Jed, here are three failures from the latest beta simulation:
- crash with 285 brokers (treeherder)
- crash with 331 brokers (treeherder)
- crash with 297 brokers (treeherder)
What is the threshold when we are talking about too many brokers? Is that the case for those runs?
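To get a feel for why a few hundred brokers could matter on a 32-bit build, here is a rough back-of-the-envelope sketch. Both constants are assumptions chosen for illustration only: the 8 MiB per-broker reservation is a hypothetical figure (not measured from Gecko), and 4 GiB is simply the upper bound of a 32-bit address space.

```python
MIB = 1024 * 1024
GIB = 1024 * MIB

# ASSUMPTION: hypothetical address space reserved per broker, not a Gecko value.
ASSUMED_PER_BROKER_BYTES = 8 * MIB
# Upper bound for any 32-bit process; the usable amount is typically smaller.
ADDRESS_SPACE_BYTES = 4 * GIB


def reserved_fraction(broker_count, per_broker_bytes, address_space_bytes):
    """Fraction of the address space reserved by the given number of brokers."""
    return broker_count * per_broker_bytes / address_space_bytes


# Broker counts reported for the three failures listed above.
for count in (285, 297, 331):
    fraction = reserved_fraction(count, ASSUMED_PER_BROKER_BYTES, ADDRESS_SPACE_BYTES)
    print(f"{count} brokers -> {fraction:.0%} of a 32-bit address space (assumed sizes)")
```

Changing `ASSUMED_PER_BROKER_BYTES` to a measured value would turn this into an actual estimate; with the placeholder it only shows the shape of the calculation.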
Comment 9•25 days ago
The last time I investigated this, as far as I recall, we were talking about just a few brokers. That many is not expected; I'll take a look today.
Comment 10•24 days ago
8:23.51 TEST_START: /service-workers/service-worker/xsl-base-url.https.html
8:23.51 INFO Closing window f5c2853c-3753-408b-8ba1-b26d6d791d75
8:23.52 pid:4422 [4422] Sandbox: SandboxBroker: socketpair success (362 brokers)
8:23.52 pid:4422 [4422] Sandbox: SandboxBroker: thread creation success (362 brokers)
8:23.60 pid:4422 [4422] Sandbox: SandboxBroker: socketpair success (363 brokers)
8:23.60 pid:4422 [4422] Sandbox: SandboxBroker: thread creation success (363 brokers)
8:23.74 TEST_END: Test OK. Subtests passed 1/1. Unexpected 0
8:23.74 INFO No more tests
This is from just running an amd64 opt build with the wpt service worker tests. I wonder whether the wpt service worker tests are simply keeping too many service workers alive, which in turn keeps the sandbox broker references alive as well.
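To compare runs, the peak broker count can be pulled out of a log with a small script. This is a minimal sketch that assumes the "(N brokers)" suffix shown in the log lines above; the function name `peak_broker_count` is hypothetical.

```python
import re

# Matches the broker count suffix on SandboxBroker log lines, e.g. "(363 brokers)".
BROKER_COUNT_RE = re.compile(r"SandboxBroker: .*\((\d+) brokers\)")


def peak_broker_count(log_lines):
    """Return the highest broker count seen in the log, or None if there is none."""
    counts = []
    for line in log_lines:
        match = BROKER_COUNT_RE.search(line)
        if match:
            counts.append(int(match.group(1)))
    return max(counts, default=None)
```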
Comment 11•24 days ago
So I came up with a bit of a hack (maybe we should be releasing something earlier) that brings the number of brokers down to ~8, yet we still crash with ENOMEM: https://treeherder.mozilla.org/logviewer?job_id=501882332&repo=try&lineNumber=11568
Comment 13•16 days ago
(In reply to :gerard-majax from comment #11)
> So I came up with a bit of a hack, maybe we should be releasing something earlier and I could improve the number of brokers down to ~8, yet we still crash: https://treeherder.mozilla.org/logviewer?job_id=501882332&repo=try&lineNumber=11568 we still hit ENOMEM
That's a nice drop in the number of brokers! Do you think putting that patch in a separate bug, as you suggested, and landing it once it's polished could help address at least one symptom? It might even reduce the number of crashes, even though it doesn't fix the problem completely.
Comment 14•16 days ago
(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #13)
> (In reply to :gerard-majax from comment #11)
> > So I came up with a bit of a hack, maybe we should be releasing something earlier and I could improve the number of brokers down to ~8, yet we still crash: https://treeherder.mozilla.org/logviewer?job_id=501882332&repo=try&lineNumber=11568 we still hit ENOMEM
> That's a nice drop in the number of brokers! Do you think putting that patch in a separate bug, as you suggested, and landing it once it's polished could help address at least one symptom? It might even reduce the number of crashes, even though it doesn't fix the problem completely.
That was really just a hack to check whether the theory held. I think Jed mentioned that we also depend on CC (cycle collection) happening: https://bugzilla.mozilla.org/show_bug.cgi?id=1936938#c10