Closed Bug 1919226 Opened 1 year ago Closed 1 year ago

Intermittent [Parent 2057, Main Thread] ###!!! ABORT: file resource://gre/modules/CrashMonitor.sys.mjs:165 [@ nsDebugImpl::Abort] | single tracking bug

Categories

(Toolkit :: Crash Reporting, defect)

defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: crash, intermittent-failure, intermittent-testcase)

Crash Data

Filed by: smolnar [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=474607541&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/EjEUsmWMQWOCDoQI9P5xDw/runs/0/artifacts/public/logs/live_backing.log


 mozcrash Saved app info as /opt/worker/tasks/task_172655993997700/build/blobber_upload_dir/B9E53654-7F13-409B-B5D4-F088B114BB08.extra
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - PROCESS-CRASH | [Parent 2057, Main Thread] ###!!! ABORT: file resource://gre/modules/CrashMonitor.sys.mjs:165 [@ nsDebugImpl::Abort] | testing/marionette/harness/marionette_harness/tests/unit/test_windowless.py TestWindowless.test_no_window_handles_after_silent_restart 
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - Process type: main
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - Process pid: 2057
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - Mozilla crash reason: [Parent 2057, Main Thread] ###!!! ABORT: file resource://gre/modules/CrashMonitor.sys.mjs:165
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - Crash dump filename: /var/folders/wn/b6z3hdsd5tnf7hwgjtlxlcjh000014/T/tmps6spx7wb.mozrunner/minidumps/B9E53654-7F13-409B-B5D4-F088B114BB08.dmp
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - Operating system: Mac OS X
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -                   10.15.7 19H2026
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - CPU: amd64
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -      family 6 model 158 stepping 10
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -      12 CPUs
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - 
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - Crash address: 0x0000000000000000
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - Crashing instruction: `mov dword [0x0], 0x202`
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - Memory accessed by instruction:
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -   0. Address: 0x0000000000000000
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -      Size: 4
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - Mac Crash Info:
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - 
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - Mac Boot Args: chunklist-security-epoch=0 -chunklist-no-rev2-dev
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - 
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - Process uptime: 62 seconds
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - 
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO - Thread 0 MainThread (crashed)
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -  0  XUL!MOZ_Crash(char const*, int, char const*) [Assertions.h:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 317]
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -     Found by: inlining
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -  1  XUL!Abort(char const*) [nsDebugImpl.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 514]
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -     Found by: inlining
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -  2  XUL!NS_DebugBreak [nsDebugImpl.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 471 + 0xf]
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -      rax = 0x000000010f771a48    rdx = 0x0000000000000307
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -      rcx = 0x000000011e40a2e0    rbx = 0x000000011e91092a
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -      rsi = 0x0000000000000307    rdi = 0x0000000000000000
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -      rbp = 0x00007ffee050f790    rsp = 0x00007ffee050f070
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -       r8 = 0x7fffffffffffffff     r9 = 0xffffffff00000000
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -      r10 = 0x000000011445c318    r11 = 0x0000000006346a78
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -      r12 = 0x0000000000000000    r13 = 0x000000011edc8b08
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -      r14 = 0x00007ffee050f150    r15 = 0x00007ffee050f0e0
[task 2024-09-17T08:13:36.046Z] 08:13:36     INFO -      rip = 0x00000001169fabe6
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -     Found by: given as instruction pointer in context
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -  3  XUL!nsDebugImpl::Abort(char const*, int) [nsDebugImpl.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 129 + 0xd]
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      rbx = 0x0000000000000002    rbp = 0x00007ffee050f7a0
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      rsp = 0x00007ffee050f7a0    r12 = 0x000000012fd6e1a0
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      r13 = 0x0000000000000009    r14 = 0x0000000000000002
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      r15 = 0x0000000000000000    rip = 0x00000001169facb8
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -     Found by: call frame info
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -  4  XUL!NS_InvokeByIndex + 0x8d
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      rbx = 0x0000000000000002    rbp = 0x00007ffee050f7d0
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      rsp = 0x00007ffee050f7b0    r12 = 0x000000012fd6e1a0
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      r13 = 0x0000000000000009    r14 = 0x0000000000000002
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      r15 = 0x0000000000000000    rip = 0x0000000116aed0ce
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -     Found by: call frame info
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -  5  XUL!CallMethodHelper::Invoke() [XPCWrappedNative.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 1620]
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -     Found by: inlining
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -  6  XUL!CallMethodHelper::Call() [XPCWrappedNative.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 1174]
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -     Found by: inlining
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -  7  XUL!XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) [XPCWrappedNative.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 1120 + 0xab1]
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      rbx = 0x0000000000000002    rbp = 0x00007ffee050f9f0
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      rsp = 0x00007ffee050f7e0    r12 = 0x00007ffee050f8a0
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      r13 = 0x0000000000000001    r14 = 0x0000000000000002
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      r15 = 0x0000000000000000    rip = 0x000000011732eb11
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -     Found by: call frame info
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -  8  XUL!XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) [XPCWrappedNativeJSOps.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 966 + 0x7]
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      rbx = 0x00007ffee050fa48    rbp = 0x00007ffee050fb10
 INFO -      rsp = 0x00007ffee050fa00    r12 = 0x0000249c71592ad8
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      r13 = 0x0000249c71592ad8    r14 = 0x0000000149194ef0
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      r15 = 0x0000000149194f10    rip = 0x000000011733015d
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -     Found by: call frame info
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -  9  XUL!CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&) [Interpreter.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 518]
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -     Found by: inlining
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO - 10  XUL!js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) [Interpreter.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 612 + 0xc8]
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      rbx = 0x000000012f335100    rbp = 0x00007ffee050fbb0
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      rsp = 0x00007ffee050fb20    r12 = 0x000000012f3e7c00
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      r13 = 0x000006f6342e4510    r14 = 0x00007ffee050fe80
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      r15 = 0x000000011732fed0    rip = 0x000000011afa2533
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -     Found by: call frame info
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO - 11  XUL!InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) [Interpreter.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 679]
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -     Found by: inlining
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO - 12  XUL!js::CallFromStack(JSContext*, JS::CallArgs const&, js::CallReason) [Interpreter.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 684]
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -     Found by: inlining
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO - 13  XUL!js::Interpret(JSContext*, js::RunState&) [Interpreter.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 3509 + 0x9]
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      rbx = 0x0000000000000000    rbp = 0x00007ffee050fef0
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      rsp = 0x00007ffee050fbc0    r12 = 0x00000000000000f6
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      r13 = 0x000000012f335100    r14 = 0x0000000133044160
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -      r15 = 0x0000000000000075    rip = 0x000000011afad011
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -     Found by: call frame info
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO - 14  XUL!MaybeEnterInterpreterTrampoline(JSContext*, js::RunState&) [Interpreter.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 432]
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO -     Found by: inlining
[task 2024-09-17T08:13:36.047Z] 08:13:36     INFO - 15  XUL!js::RunScript(JSContext*, js::RunState&) [Interpreter.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 490 + 0x132]
[task 2024-09-17T08:13:36.048Z] 08:13:36     INFO -      rbx = 0x000000012f335100    rbp = 0x00007ffee050ff70
[task 2024-09-17T08:13:36.048Z] 08:13:36     INFO -      rsp = 0x00007ffee050ff00    r12 = 0x0000000000000000
[task 2024-09-17T08:13:36.048Z] 08:13:36     INFO -      r13 = 0x00007ffee050ffc8    r14 = 0x00007ffee050ffb0
[task 2024-09-17T08:13:36.048Z] 08:13:36     INFO -      r15 = 0x0000000000000001    rip = 0x000000011afa2157
[task 2024-09-17T08:13:36.048Z] 08:13:36     INFO -     Found by: call frame info
[task 2024-09-17T08:13:36.048Z] 08:13:36     INFO - 16  XUL!js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason) [Interpreter.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 644 + 0x7]
[task 2024-09-17T08:13:36.048Z] 08:13:36     INFO -      rbx = 0x000000012f335100    rbp = 0x00007ffee0510010
[task 2024-09-17T08:13:36.048Z] 08:13:36     INFO -      rsp = 0x00007ffee050ff80    r12 = 0x000000012f3e7c00
[task 2024-09-17T08:13:36.048Z] 08:13:36     INFO -      r13 = 0x00001fa8be8518f8    r14 = 0x00007ffee0510120
[task 2024-09-17T08:13:36.048Z] 08:13:36     INFO -      r15 = 0x0000000000000000    rip = 0x000000011afa27f1
[task 2024-09-17T08:13:36.048Z] 08:13:36     INFO -     Found by: call frame info
[task 2024-09-17T08:13:36.048Z] 08:13:36     INFO - 17  XUL!InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason) [Interpreter.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 679]
[task 2024-09-17T08:13:36.048Z] 08:13:36     INFO -     Found by: inlining
[task 2024-09-17T08:13:36.048Z] 08:13:36     INFO - 18  XUL!js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason) [Interpreter.cpp:11273eec0c94bee63d38d01c68d33b4785cc88a3 : 711 + 0xc]
[task 2024-09-17T08:13:36.048Z] 08:13:36     INFO -      rbx = 0x00007ffee05100e8    rbp = 0x00007ffee0510050
<...>

Looks like this is some DOM code running as triggered by the cycle collector. If I'm wrong please move to another component.

Component: Marionette → Cycle Collector
Product: Remote Protocol → Core

This doesn't have anything to do with the cycle collector. We're running some Promise code, which is running JS, which looks like it is running CrashMonitor.sys.mjs and somehow in there we are calling abort. From the line number, maybe this is related to IOUtils.profileBeforeChange.addBlocker().

Component: Cycle Collector → General
Product: Core → Toolkit
Duplicate of this bug: 1919481
Duplicate of this bug: 1921357

Mossop, would you have any idea about this? (someone suggested in triage this could be related to multi-profile work?)

Flags: needinfo?(dtownsend)

profileBeforeChange is unrelated to profiles and just a way to do stuff on shutdown. We're probably crashing here in response to a blocker taking too long to execute. If it is actually CrashMonitor then that would be this which is odd because nothing in there looks like it would take very long.

Flags: needinfo?(dtownsend)

I disagree with Dave: writing the session store file could be a long operation, especially in automation (but depending on many details).

There's a .dmp and .extra file, which should include the crash annotations, and those annotations should identify the outstanding shutdown blockers (or the active one). But I don't know how to extract crash annotations from those files.

:gsvelto: can you link to docs about inspecting those crash details from automation? Thanks!

I wonder if we even should be writing the session store on exit in automation. In the same way that we don't write Places data to disk in tests (https://bugzilla.mozilla.org/show_bug.cgi?id=1272025, but I see now that it adds options but doesn't enable them), maybe we don't want to crash in tests due to slow automation disks?

Flags: needinfo?(gsvelto)

I'm afraid we don't really have some documentation about this (but we should). The .extra file is just JSON, holding a list of annotations and their contents. In particular I believe you would be looking at AsyncShutdownTimeout which should contain the information of the blocker Firefox got stuck into. The contents of AsyncShutdownTimeout are also JSON and in particular it should have two fields (name and conditions) which are described here. Automation typically calls mozcrash.check_for_crashes() which iterates over the crashes in the system, processes them and eventually move them to another directory for further processing (the crash reports are typically cleaned up at the end of each test). In particular when we're enumerating crashes, the info.extra field here should contain all the crash annotations of each crash.

Flags: needinfo?(gsvelto)

Also FYI we plan on removing the CrashManager entirely in the near future, see bug 1911301.

(In reply to Gabriele Svelto [:gsvelto] from comment #11)

I'm afraid we don't really have some documentation about this (but we should). The .extra file is just JSON, holding a list of annotations and their contents. In particular I believe you would be looking at AsyncShutdownTimeout which should contain the information of the blocker Firefox got stuck into. The contents of AsyncShutdownTimeout are also JSON and in particular it should have two fields (name and conditions) which are described here. Automation typically calls mozcrash.check_for_crashes() which iterates over the crashes in the system, processes them and eventually move them to another directory for further processing (the crash reports are typically cleaned up at the end of each test). In particular when we're enumerating crashes, the info.extra field here should contain all the crash annotations of each crash.

Looking at the .extra here, I see:

  "AsyncShutdownTimeout": "{\"phase\":\"IOUtils: waiting for profileBeforeChange IO to complete\",\"conditions\":[{\"name\":\"CrashMonitor: Writing notifications to file after receiving profile-before-change and awaiting all checkpoints written\",\"state\":{\"profile-after-change\":true,\"final-ui-startup\":true,\"quit-application-granted\":true,\"sessionstore-windows-restored\":true,\"quit-application\":true,\"profile-change-net-teardown\":true,\"profile-change-teardown\":true,\"profile-before-change\":true},\"filename\":\"resource://gre/modules/CrashMonitor.sys.mjs\",\"lineNumber\":165,\"stack\":[\"resource://gre/modules/CrashMonitor.sys.mjs:init:165\",\"resource://gre/modules/nsCrashMonitor.sys.mjs:observe:20\"]}]}",

Digging in, it's very clear that this is waiting for the session store final write to occur, via some observer notifications. Either that write is legitimately taking too long, or the observer notification is getting lost.

Looking at more of the crashes, they're all in the Marionette test TestWindowless.test_no_window_handles_after_silent_restart. It seems plausible that we're doing something unusual with session restore after these restarts, triggering this issue. Whether the issue is "real", or an artifact of the test harness, I cannot say.

:whimboo: you seem most knowledgable about the test in question. Are you aware of any possible interactions with the session restore mechanism that might cause the notification to get missed (or not sent, i.e., if no windows are opened?) that might be relevant? Can you investigate a little further based on the logs?

Flags: needinfo?(hskupin)

(In reply to Nick Alexander :nalexander [he/him] from comment #13)

Looking at more of the crashes, they're all in the Marionette test TestWindowless.test_no_window_handles_after_silent_restart. It seems plausible that we're doing something unusual with session restore after these restarts, triggering this issue. Whether the issue is "real", or an artifact of the test harness, I cannot say.

:whimboo: you seem most knowledgable about the test in question. Are you aware of any possible interactions with the session restore mechanism that might cause the notification to get missed (or not sent, i.e., if no windows are opened?) that might be relevant? Can you investigate a little further based on the logs?

This test isn't doing anything with session restore but it might be a side-effect for MacOS only when restarting Firefox in silent mode? Maybe someone more familiar with session restore could comment on?

Is that failure also reproducible locally? A Gecko profile would be nice to have which could indicate some insights.

Flags: needinfo?(hskupin)

Alexander this could actually be related to bug 1758634 which actually depends on the "AsyncShutdown timeout in IOUtils: waiting for profileBeforeChange IO to complete" issue from the crash reporter (bug 1764420 ).

Component: General → Crash Reporting
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.