Intermittent [tier 2] /css/css-grid/alignment/grid-row-axis-alignment-sticky-positioned-items-002.html | single tracking bug
Categories
(Core :: Layout: Grid, defect, P5)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox125 | --- | unaffected |
| firefox126 | --- | unaffected |
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: intermittent-failure, intermittent-testcase, regression, Whiteboard: [retriggered])
Filed by: sstanca [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=453745674&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/ANUBZeQPSZOBO6kqxSV9hg/runs/0/artifacts/public/logs/live_backing.log
[task 2024-04-06T10:55:34.452Z] 10:55:34 INFO - TEST-OK | /css/css-grid/alignment/grid-row-axis-alignment-sticky-positioned-items-001.html | took 338ms
[task 2024-04-06T10:55:34.453Z] 10:55:34 INFO - TEST-START | /css/css-grid/alignment/grid-row-axis-alignment-sticky-positioned-items-002.html
[task 2024-04-06T10:55:34.457Z] 10:55:34 INFO - Closing window e31cb419-d2d8-498e-adf7-e97e48432f69
[task 2024-04-06T10:55:34.476Z] 10:55:34 INFO - PID 32485 | [GFX1-]: wr_renderer_render: OutOfMemory
[task 2024-04-06T10:55:34.499Z] 10:55:34 INFO - PID 32485 | [GFX1-]: Handling webrender error 2
[task 2024-04-06T10:55:34.500Z] 10:55:34 INFO - PID 32485 | [GFX1-]: Fallback remains SW-WR
[task 2024-04-06T10:55:34.647Z] 10:55:34 INFO - PID 32485 | [32485] Sandbox: SandboxBroker: thread creation failed: ENOMEM
[task 2024-04-06T10:55:34.650Z] 10:55:34 INFO - PID 32485 | [32485] Sandbox: SandboxBroker: thread creation failed: ENOMEM
[task 2024-04-06T10:55:34.650Z] 10:55:34 INFO - PID 32485 | [Parent 32485, IPC I/O Parent] WARNING: process 12541 exited on signal 15: file /builds/worker/checkouts/gecko/ipc/chromium/src/base/process_util_posix.cc:335
[task 2024-04-06T10:55:34.651Z] 10:55:34 INFO - PID 32485 | A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down
[task 2024-04-06T10:55:34.653Z] 10:55:34 INFO - PID 32485 | [Parent 32485, IPC I/O Parent] WARNING: process 12543 exited on signal 15: file /builds/worker/checkouts/gecko/ipc/chromium/src/base/process_util_posix.cc:335
[task 2024-04-06T10:55:34.661Z] 10:55:34 INFO - PID 32485 | [GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt
[task 2024-04-06T10:55:35.033Z] 10:55:35 INFO - PID 32485 | A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down
[task 2024-04-06T10:55:35.038Z] 10:55:35 INFO - PID 32485 | 1712400935037 Marionette INFO Stopped listening on port 51613
[task 2024-04-06T10:55:35.077Z] 10:55:35 INFO - NoSuchWindowException on command, setting status to CRASH
[task 2024-04-06T10:55:35.078Z] 10:55:35 INFO - TEST-UNEXPECTED-CRASH | /css/css-grid/alignment/grid-row-axis-alignment-sticky-positioned-items-002.html | expected OK
[task 2024-04-06T10:55:35.078Z] 10:55:35 INFO - TEST-INFO took 626ms
[task 2024-04-06T10:55:35.082Z] 10:55:35 INFO - PID 32485 | JavaScript error: chrome://remote/content/marionette/cert.sys.mjs, line 47: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICertOverrideService.setDisableAllSecurityChecksAndLetAttackersInterceptMyData]
[task 2024-04-06T10:55:35.083Z] 10:55:35 INFO - PID 32485 | JavaScript error: chrome://remote/content/marionette/cert.sys.mjs, line 47: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICertOverrideService.setDisableAllSecurityChecksAndLetAttackersInterceptMyData]
[task 2024-04-06T10:55:45.096Z] 10:55:45 INFO - PID 32485 | WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"ShieldRecipeClient: Cleaning up","state":"(none)","filename":"resource://normandy/lib/CleanupManager.sys.mjs","lineNumber":39,"stack":["resource://normandy/lib/CleanupManager.sys.mjs:cleanup:39","resource://normandy/Normandy.sys.mjs:uninit:161","resource:///modules/BrowserGlue.sys.mjs:_onQuitApplicationGranted/tasks<:2135","resource:///modules/BrowserGlue.sys.mjs:_onQuitApplicationGranted:2153","resource:///modules/BrowserGlue.sys.mjs:BG_observe:1120","resource:///modules/ContentCrashHandlers.sys.mjs:observe:165"]}] Barrier: profile-before-change
[task 2024-04-06T10:55:50.095Z] 10:55:50 INFO - PID 32485 | console.error: (new Error("SessionFile is closed", "resource:///modules/sessionstore/SessionFile.sys.mjs", 438))
[task 2024-04-06T10:56:36.096Z] 10:56:36 INFO - PID 32485 | FATAL ERROR: AsyncShutdown timeout in profile-before-change Conditions: [{"name":"ShieldRecipeClient: Cleaning up","state":"(none)","filename":"resource://normandy/lib/CleanupManager.sys.mjs","lineNumber":39,"stack":["resource://normandy/lib/CleanupManager.sys.mjs:cleanup:39","resource://normandy/Normandy.sys.mjs:uninit:161","resource:///modules/BrowserGlue.sys.mjs:_onQuitApplicationGranted/tasks<:2135","resource:///modules/BrowserGlue.sys.mjs:_onQuitApplicationGranted:2153","resource:///modules/BrowserGlue.sys.mjs:BG_observe:1120","resource:///modules/ContentCrashHandlers.sys.mjs:observe:165"]}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.
[task 2024-04-06T10:56:36.097Z] 10:56:36 INFO - PID 32485 | [Parent 32485, Main Thread] ###!!! ABORT: file resource://normandy/lib/CleanupManager.sys.mjs:39
[task 2024-04-06T10:56:36.100Z] 10:56:36 INFO - PID 32485 | ExceptionHandler::GenerateDump cloned child 12549
[task 2024-04-06T10:56:36.101Z] 10:56:36 INFO - PID 32485 | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2024-04-06T10:56:36.101Z] 10:56:36 INFO - PID 32485 | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2024-04-06T10:56:36.263Z] 10:56:36 INFO - Browser exited with return code 11
[task 2024-04-06T10:56:36.264Z] 10:56:36 INFO - Closing logging queue
[task 2024-04-06T10:56:36.265Z] 10:56:36 INFO - queue closed
[task 2024-04-06T10:56:36.284Z] 10:56:36 INFO - Application command: /builds/worker/workspace/build/application/firefox/firefox --marionette about:blank -profile /tmp/tmpea84wgwc
[task 2024-04-06T10:56:36.293Z] 10:56:36 INFO - PID 2348 | Gtk-Message: 10:54:36.773: Failed to load module "canberra-gtk-module"
[task 2024-04-06T10:56:36.294Z] 10:56:36 INFO - PID 2348 | Gtk-Message: 10:54:36.775: Failed to load module "canberra-gtk-module"
[task 2024-04-06T10:56:36.294Z] 10:56:36 INFO - PID 2348 | [GFX1-]: glxtest: libpci missing
[task 2024-04-06T10:56:36.294Z] 10:56:36 INFO - PID 2348 | [GFX1-]: glxtest: libEGL missing
[task 2024-04-06T10:56:36.295Z] 10:56:36 INFO - PID 2348 | [GFX1-]: glxtest: libGL.so.1 missing
[task 2024-04-06T10:56:36.295Z] 10:56:36 INFO - PID 2348 | [GFX1-]: No GPUs detected via PCI
[task 2024-04-06T10:56:36.295Z] 10:56:36 INFO - PID 2348 | 1712400877170 Marionette INFO Marionette enabled
[task 2024-04-06T10:56:36.296Z] 10:56:36 INFO - PID 2348 | 1712400877301 Marionette INFO Listening on port 38192
[task 2024-04-06T10:56:36.296Z] 10:56:36 INFO - PID 2348 | [GFX1-]: Failed GL context creation for WebRender: 0
[task 2024-04-06T10:56:36.297Z] 10:56:36 INFO - PID 2348 | [GFX1-]: FEATURE_FAILURE_WEBRENDER_INITIALIZE_UNSPECIFIED
[task 2024-04-06T10:56:36.297Z] 10:56:36 INFO - PID 2348 | [GFX1-]: Failed to connect WebRenderBridgeChild. isParent=true
[task 2024-04-06T10:56:36.298Z] 10:56:36 INFO - PID 2348 | [GFX1-]: Fallback WR to SW-WR
[task 2024-04-06T10:56:36.298Z] 10:56:36 INFO - PID 2348 | GLib-GIO-Message: 10:54:39.375: Using the 'memory' GSettings backend. Your settings will not be saved or shared with other applications.
[task 2024-04-06T10:56:36.299Z] 10:56:36 INFO - PID 2348 | console.error: ({})
[task 2024-04-06T10:56:36.299Z] 10:56:36 INFO - PID 2348 | [ERROR error_support::handling] suggest-unexpected: Error from Remote Settings: Error parsing URL: relative URL with a cannot-be-a-base base
[task 2024-04-06T10:56:36.300Z] 10:56:36 INFO - PID 2348 | console.error: URLBar - QuickSuggest.SuggestBackendRust: "Ingest error: Error from Remote Settings: Error parsing URL: relative URL with a cannot-be-a-base base"
[task 2024-04-06T10:56:36.300Z] 10:56:36 INFO - Starting runner
[task 2024-04-06T10:56:36.899Z] 10:56:36 INFO - TEST-START | /css/css-grid/alignment/grid-row-axis-self-baseline-synthesized-001.html
Comment 1•2 years ago
|
||
Hi Kelly! Can you please take a look at this? I think this might be something regressed by the recent changes from Bug 1889769.
Thank you!
| Comment hidden (Intermittent Failures Robot) |
Comment 3•2 years ago
|
||
:emilio, do you know what might have caused this? kelly's patch was just to firefoxview CSS and is not a potential cause. But knowing what is might get her off the hook and give the sherifs something to work with.
Comment 4•2 years ago
|
||
[GFX1-]: wr_renderer_render: OutOfMemory
So a graphics issue? Can we get this bisected if it's frequent enough?
Comment 5•2 years ago
|
||
This is the way the backfills and retriggers look like. First time it appeared, from 14 runs, it only failed 2 times. I can do a more in depth bisection if you want.
Comment 6•2 years ago
|
||
I don't see much there that would obviously increase the chances of OOM in webrender in those patches. I guess the culprint could be anywhere really, so if you could bisect the intermediate patches it'd be great....
Comment 7•2 years ago
|
||
Hi Henrik! Can you please take a look at this and the bugs in the see also section? It seems that they are regressed by Bug 1522790.
Emilio if you want to take a look at the in depth bisection you can take a look at the link above!
Thank you!
Comment 8•2 years ago
|
||
The patch for bug 1522790 cannot have caused it but made it potentially more prominent and easier to reproduce. Please note that the original patch got backed out but we will re-land shortly again. The number of failures is still low so I would probably advice to not mark the test as intermittent crash.
Also it looks like that only Linux 18.04 WebRender Shippable is affected here. But what's strange is that we do not have any generated minidump file for all of these crashes. Gabriele, is that something you have seen before?
Here the related lines from the log:
[task 2024-04-12T23:01:01.145Z] 23:01:01 INFO - PID 32290 | [GFX1-]: wr_renderer_render: OutOfMemory
[task 2024-04-12T23:01:01.152Z] 23:01:01 INFO - PID 32290 | [GFX1-]: Handling webrender error 2
[task 2024-04-12T23:01:01.153Z] 23:01:01 INFO - PID 32290 | [GFX1-]: Fallback remains SW-WR
[task 2024-04-12T23:01:01.332Z] 23:01:01 INFO - PID 32290 | [32290] Sandbox: SandboxBroker: thread creation failed: ENOMEM
[task 2024-04-12T23:01:01.332Z] 23:01:01 INFO - PID 32290 | [32290] Sandbox: SandboxBroker: thread creation failed: ENOMEM
[task 2024-04-12T23:01:01.333Z] 23:01:01 INFO - PID 32290 | A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down
[task 2024-04-12T23:01:01.338Z] 23:01:01 INFO - PID 32290 | [Parent 32290, IPC I/O Parent] WARNING: process 12280 exited on signal 15: file /builds/worker/checkouts/gecko/ipc/chromium/src/base/process_util_posix.cc:335
[task 2024-04-12T23:01:01.338Z] 23:01:01 INFO - PID 32290 | [Parent 32290, IPC I/O Parent] WARNING: process 12282 exited on signal 15: file /builds/worker/checkouts/gecko/ipc/chromium/src/base/process_util_posix.cc:335
[task 2024-04-12T23:01:01.348Z] 23:01:01 INFO - PID 32290 | [GFX1-]: RenderCompositorSWGL failed mapping default framebuffer, no dt
[task 2024-04-12T23:01:01.720Z] 23:01:01 INFO - PID 32290 | A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down
[task 2024-04-12T23:01:01.725Z] 23:01:01 INFO - PID 32290 | 1712962861723 Marionette INFO Stopped listening on port 42181
[task 2024-04-12T23:01:01.767Z] 23:01:01 INFO - NoSuchWindowException on command, setting status to CRASH
[task 2024-04-12T23:01:01.767Z] 23:01:01 INFO - TEST-UNEXPECTED-CRASH | /css/css-grid/alignment/grid-row-axis-alignment-sticky-positioned-items-002.html | expected OK
[task 2024-04-12T23:01:01.767Z] 23:01:01 INFO - TEST-INFO took 633ms
[task 2024-04-12T23:01:01.768Z] 23:01:01 INFO - PID 32290 | JavaScript error: chrome://remote/content/marionette/cert.sys.mjs, line 47: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICertOverrideService.setDisableAllSecurityChecksAndLetAttackersInterceptMyData]
[task 2024-04-12T23:01:01.816Z] 23:01:01 INFO - PID 32290 | JavaScript error: chrome://remote/content/marionette/cert.sys.mjs, line 47: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICertOverrideService.setDisableAllSecurityChecksAndLetAttackersInterceptMyData]
[task 2024-04-12T23:01:11.832Z] 23:01:11 INFO - PID 32290 | WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"ShieldRecipeClient: Cleaning up","state":"(none)","filename":"resource://normandy/lib/CleanupManager.sys.mjs","lineNumber":39,"stack":["resource://normandy/lib/CleanupManager.sys.mjs:cleanup:39","resource://normandy/Normandy.sys.mjs:uninit:161","resource:///modules/BrowserGlue.sys.mjs:_onQuitApplicationGranted/tasks<:2135","resource:///modules/BrowserGlue.sys.mjs:_onQuitApplicationGranted:2153","resource:///modules/BrowserGlue.sys.mjs:BG_observe:1120","resource:///modules/ContentCrashHandlers.sys.mjs:observe:165"]}] Barrier: profile-before-change
[task 2024-04-12T23:01:16.832Z] 23:01:16 INFO - PID 32290 | console.error: (new Error("SessionFile is closed", "resource:///modules/sessionstore/SessionFile.sys.mjs", 438))
[task 2024-04-12T23:02:02.833Z] 23:02:02 INFO - PID 32290 | FATAL ERROR: AsyncShutdown timeout in profile-before-change Conditions: [{"name":"ShieldRecipeClient: Cleaning up","state":"(none)","filename":"resource://normandy/lib/CleanupManager.sys.mjs","lineNumber":39,"stack":["resource://normandy/lib/CleanupManager.sys.mjs:cleanup:39","resource://normandy/Normandy.sys.mjs:uninit:161","resource:///modules/BrowserGlue.sys.mjs:_onQuitApplicationGranted/tasks<:2135","resource:///modules/BrowserGlue.sys.mjs:_onQuitApplicationGranted:2153","resource:///modules/BrowserGlue.sys.mjs:BG_observe:1120","resource:///modules/ContentCrashHandlers.sys.mjs:observe:165"]}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.
[task 2024-04-12T23:02:02.833Z] 23:02:02 INFO - PID 32290 | [Parent 32290, Main Thread] ###!!! ABORT: file resource://normandy/lib/CleanupManager.sys.mjs:39
[task 2024-04-12T23:02:02.836Z] 23:02:02 INFO - PID 32290 | ExceptionHandler::GenerateDump cloned child 12318
[task 2024-04-12T23:02:02.837Z] 23:02:02 INFO - PID 32290 | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2024-04-12T23:02:02.837Z] 23:02:02 INFO - PID 32290 | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2024-04-12T23:02:02.977Z] 23:02:02 INFO - Browser exited with return code 11
Maybe it's also due to the delayed shutdown for ShieldRecipeClient?
Comment 9•2 years ago
|
||
I can see that the crash machinery kicked in from the log, so I would expect a minidump to have been generated... however there's signs that the machine was running out of memory at that point. It's possible that minidump generation failed silently because of that.
Comment 10•2 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #9)
I can see that the crash machinery kicked in from the log, so I would expect a minidump to have been generated... however there's signs that the machine was running out of memory at that point. It's possible that minidump generation failed silently because of that.
But that is for the shutdown hang that we experience here as well, or isn't it? Given that there was no such bug yet I filed bug 1891418 for Normandy client.
The resource usage of the machine doesn't actually see a OOM but maybe it's not seen by the resource usage profile?
Comment 11•2 years ago
|
||
There's a cascade of OOM failures here though they appear to be gracefully handled. Maybe it's not memory per se we're running out of, maybe it's some hard limit? Without knowing this code in detail it's hard to tell.
Comment 12•2 years ago
|
||
Hm, maybe there is a problem with GFX code that causes our crash reporter to miss the creation of minidump files? I just saw something similar on bug 1785367 comment 28 which is not a OOM situation.
Comment 13•2 years ago
•
|
||
This seems to have been fixed-by-backout of bug 1522790, which is still backed out at the moment (most recently as of bug 1522790 comment 69), though presumably it's re-landing soonish.
I see there's been various activity on that bug after the initial landing to avoid various test failures; it's not clear to me at this point whether that's going to fix this when bug 1522790 re-lands or not.
[setting status to 'unaffected' for 125 and 126 to get this off the bugdash "important regressions needing decision" list, since this doesn't actually seem to be a gecko regression per se, and the thing that made it start happening on automation is backed out for the time being]
Comment 14•2 years ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+1] from comment #12)
Hm, maybe there is a problem with GFX code that causes our crash reporter to miss the creation of minidump files? I just saw something similar on bug 1785367 comment 28 which is not a OOM situation.
When checking the results of this job it can be seen in the profile resource usage (profiler data) that the memory usage indeed didn't went up. So maybe some code tried to allocate too much memory? Or the error code is misleading? Not sure how much memory it costs to create a new SandboxBroker thread but I also don't see how NOMEM gets set when logging.
Jed, do you have an idea?
Comment 15•2 years ago
|
||
The error for exceeding RLIMIT_NPROC is EAGAIN. From a quick look at kernel/fork.c in the kernel source, ENOMEM from clone really does mean that a kernel memory allocation failed.
We are doing something slightly wrong here, because pthread_create is specified to return an error code and not (necessarily) set errno, but from a quick look at nptl/pthread_create.c in the glibc source it does both, at least for the call to clone. So the ENOMEM in the error message really does look like it's indicating an actual OOM, especially combined with the OOM error from WebRender right before it.
Comment 16•1 year ago
•
|
||
Daniel, do you remember which platforms were affected here? Sadly there is no intermittent failure accessible (and listed) anymore in the above table even not with the full history. As of now the only expected meta data that is set is for Android with Fission enabled. Here we have [OK, TIMEOUT]. Someone could try to remove and push to try and check if the test is stable now.
Comment 17•1 year ago
|
||
(In reply to Henrik Skupin [:whimboo][⌚️UTC+2] from comment #16)
Daniel, do you remember which platforms were affected here? Sadly there is no intermittent failure accessible (and listed) anymore in the above table even not with the full history.
Comment 8 says this only affected Linux 18.04 WebRender Shippable.
Comment 18•1 year ago
|
||
Ah I missed that. In that case lets close this bug as incomplete.
Description
•