Intermittent [tier 2] twinopen | application crashed [@ PLDHashTable::Search(void const*) const]
Categories
(Core :: DOM: Content Processes, defect)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox-esr91 | --- | unaffected |
| firefox99 | --- | unaffected |
| firefox100 | --- | fixed |
| firefox101 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: jstutte)
References
(Regression)
Details
(Keywords: crash, intermittent-failure, regression)
Crash Data
Attachments
(1 obsolete file)
Filed by: mlaza [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=372881883&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Sz_GhY_qRwKW5JAQbFN3ow/runs/0/artifacts/public/logs/live_backing.log
[task 2022-03-30T20:42:40.634Z] 20:42:40 INFO - TEST-INFO | 1136: exit 1
[task 2022-03-30T20:42:40.635Z] 20:42:40 INFO - mozcrash checking /var/folders/lr/3q5jb2915jg6946yxtmc2498000014/T/tmpj_dw6b8e/profile/minidumps for minidumps...
[task 2022-03-30T20:42:40.635Z] 20:42:40 INFO - mozcrash Downloading symbols from: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/MRTUYzu4S8e1CjTQPaRNWg/artifacts/public/build/target.crashreporter-symbols.zip
[task 2022-03-30T20:43:27.741Z] 20:43:27 INFO - mozcrash Copy/paste: /opt/worker/tasks/task_164867257605894/fetches/minidump-stackwalk/minidump-stackwalk --symbols-url=https://symbols.mozilla.org/ --human /var/folders/lr/3q5jb2915jg6946yxtmc2498000014/T/tmpj_dw6b8e/profile/minidumps/A288DE5D-9DBA-4A02-85AD-EA83171A13D3.dmp /var/folders/lr/3q5jb2915jg6946yxtmc2498000014/T/tmpk2z677v3
[task 2022-03-30T20:43:33.926Z] 20:43:33 INFO - mozcrash Saved minidump as /opt/worker/tasks/task_164867257605894/build/blobber_upload_dir/A288DE5D-9DBA-4A02-85AD-EA83171A13D3.dmp
[task 2022-03-30T20:43:33.926Z] 20:43:33 INFO - mozcrash Saved app info as /opt/worker/tasks/task_164867257605894/build/blobber_upload_dir/A288DE5D-9DBA-4A02-85AD-EA83171A13D3.extra
[task 2022-03-30T20:43:33.926Z] 20:43:33 INFO - PROCESS-CRASH | twinopen | application crashed [@ PLDHashTable::Search(void const*) const]
[task 2022-03-30T20:43:33.927Z] 20:43:33 INFO - Crash dump filename: /var/folders/lr/3q5jb2915jg6946yxtmc2498000014/T/tmpj_dw6b8e/profile/minidumps/A288DE5D-9DBA-4A02-85AD-EA83171A13D3.dmp
[task 2022-03-30T20:43:33.927Z] 20:43:33 INFO - Operating system: Mac OS X
[task 2022-03-30T20:43:33.927Z] 20:43:33 INFO - 10.15.7 19H524
[task 2022-03-30T20:43:33.927Z] 20:43:33 INFO - CPU: amd64
[task 2022-03-30T20:43:33.927Z] 20:43:33 INFO - family 6 model 158 stepping 10
[task 2022-03-30T20:43:33.927Z] 20:43:33 INFO - 12 CPUs
[task 2022-03-30T20:43:33.928Z] 20:43:33 INFO - Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
[task 2022-03-30T20:43:33.928Z] 20:43:33 INFO - Crash address: 0x8
[task 2022-03-30T20:43:33.928Z] 20:43:33 INFO - Mac Crash Info:
[task 2022-03-30T20:43:33.928Z] 20:43:33 INFO - Process uptime: 43 seconds
[task 2022-03-30T20:43:33.928Z] 20:43:33 INFO - Thread 0 MainThread (crashed)
[task 2022-03-30T20:43:33.928Z] 20:43:33 INFO - 0 XUL!PLDHashTable::Search(void const*) const [PLDHashTable.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 496 + 0xd]
[task 2022-03-30T20:43:33.929Z] 20:43:33 INFO - rax = 0x0000000000000019 rdx = 0x0000000000000080
[task 2022-03-30T20:43:33.929Z] 20:43:33 INFO - rcx = 0x0000000000000001 rbx = 0x0000000000000000
[task 2022-03-30T20:43:33.929Z] 20:43:33 INFO - rsi = 0x00007ffeeee148e8 rdi = 0x0000000000000000
[task 2022-03-30T20:43:33.929Z] 20:43:33 INFO - rbp = 0x00007ffeeee14860 rsp = 0x00007ffeeee14810
[task 2022-03-30T20:43:33.929Z] 20:43:33 INFO - r8 = 0x00000000ffffffff r9 = 0x0000000100f00140
[task 2022-03-30T20:43:33.930Z] 20:43:33 INFO - r10 = 0x0000000100f00148 r11 = 0x0000000100f00000
[task 2022-03-30T20:43:33.930Z] 20:43:33 INFO - r12 = 0x0000000000000000 r13 = 0x00000001048f4400
[task 2022-03-30T20:43:33.930Z] 20:43:33 INFO - r14 = 0x000000010108f860 r15 = 0x00000001094d1000
[task 2022-03-30T20:43:33.930Z] 20:43:33 INFO - rip = 0x000000010b1d9611
[task 2022-03-30T20:43:33.931Z] 20:43:33 INFO - Found by: given as instruction pointer in context
[task 2022-03-30T20:43:33.931Z] 20:43:33 INFO - 1 XUL!mozilla::dom::ContentParent::ActorDestroy(mozilla::ipc::IProtocol::ActorDestroyReason) [ContentParent.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 2093 + 0x18]
[task 2022-03-30T20:43:33.931Z] 20:43:33 INFO - rbx = 0x0000000000000000 rbp = 0x00007ffeeee149b0
[task 2022-03-30T20:43:33.931Z] 20:43:33 INFO - rsp = 0x00007ffeeee14870 r12 = 0x0000000000000000
[task 2022-03-30T20:43:33.932Z] 20:43:33 INFO - r13 = 0x00000001048f4400 r14 = 0x000000010108f860
[task 2022-03-30T20:43:33.932Z] 20:43:33 INFO - r15 = 0x00000001094d1000 rip = 0x000000010bd188b5
[task 2022-03-30T20:43:33.932Z] 20:43:33 INFO - Found by: call frame info
[task 2022-03-30T20:43:33.932Z] 20:43:33 INFO - 2 XUL!mozilla::ipc::IProtocol::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) [ProtocolUtils.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 577 + 0xc]
[task 2022-03-30T20:43:33.932Z] 20:43:33 INFO - rbx = 0x00000001094d1110 rbp = 0x00007ffeeee14a00
[task 2022-03-30T20:43:33.933Z] 20:43:33 INFO - rsp = 0x00007ffeeee149c0 r12 = 0x00000001094d1110
[task 2022-03-30T20:43:33.933Z] 20:43:33 INFO - r13 = 0x00000001094d1000 r14 = 0x0000000000000003
[task 2022-03-30T20:43:33.933Z] 20:43:33 INFO - r15 = 0x00000001094d1000 rip = 0x000000010b4be44e
[task 2022-03-30T20:43:33.933Z] 20:43:33 INFO - Found by: call frame info
[task 2022-03-30T20:43:33.933Z] 20:43:33 INFO - 3 XUL!mozilla::dom::PContentParent::OnChannelClose() [PContentParent.cpp: : 16719 + 0x9]
[task 2022-03-30T20:43:33.933Z] 20:43:33 INFO - rbx = 0x00000001094d1000 rbp = 0x00007ffeeee14a20
[task 2022-03-30T20:43:33.934Z] 20:43:33 INFO - rsp = 0x00007ffeeee14a10 r12 = 0x000000010486d380
[task 2022-03-30T20:43:33.934Z] 20:43:33 INFO - r13 = 0x00007ffeeee14a38 r14 = 0x000000010483c980
[task 2022-03-30T20:43:33.934Z] 20:43:33 INFO - r15 = 0x00000001094d1078 rip = 0x000000010bd9a533
[task 2022-03-30T20:43:33.934Z] 20:43:33 INFO - Found by: call frame info
[task 2022-03-30T20:43:33.934Z] 20:43:33 INFO - 4 XUL!mozilla::ipc::MessageChannel::Close() [MessageChannel.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 2082 + 0x66]
[task 2022-03-30T20:43:33.935Z] 20:43:33 INFO - rbx = 0x000000010104eb30 rbp = 0x00007ffeeee14a80
[task 2022-03-30T20:43:33.935Z] 20:43:33 INFO - rsp = 0x00007ffeeee14a30 r12 = 0x000000010486d380
[task 2022-03-30T20:43:33.935Z] 20:43:33 INFO - r13 = 0x00007ffeeee14a38 r14 = 0x000000010483c980
[task 2022-03-30T20:43:33.935Z] 20:43:33 INFO - r15 = 0x00000001094d1078 rip = 0x000000010b4ab8c9
[task 2022-03-30T20:43:33.935Z] 20:43:33 INFO - Found by: call frame info
[task 2022-03-30T20:43:33.935Z] 20:43:33 INFO - 5 XUL!mozilla::ThreadEventQueue::RunShutdownTasks() [ThreadEventQueue.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 315 + 0x5]
[task 2022-03-30T20:43:33.936Z] 20:43:33 INFO - rbx = 0x0000000000000001 rbp = 0x00007ffeeee14ad0
[task 2022-03-30T20:43:33.936Z] 20:43:33 INFO - rsp = 0x00007ffeeee14a90 r12 = 0x0000000101083108
[task 2022-03-30T20:43:33.936Z] 20:43:33 INFO - r13 = 0x0000000123837100 r14 = 0x0000000000000001
[task 2022-03-30T20:43:33.936Z] 20:43:33 INFO - r15 = 0x0000000000000001 rip = 0x000000010b22d842
[task 2022-03-30T20:43:33.936Z] 20:43:33 INFO - Found by: call frame info
[task 2022-03-30T20:43:33.937Z] 20:43:33 INFO - 6 XUL!nsThreadManager::Shutdown() [nsThreadManager.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 368 + 0xd]
[task 2022-03-30T20:43:33.937Z] 20:43:33 INFO - rbx = 0x00007ffeeee14cb0 rbp = 0x00007ffeeee14c50
[task 2022-03-30T20:43:33.937Z] 20:43:33 INFO - rsp = 0x00007ffeeee14ae0 r12 = 0x0000000000000000
[task 2022-03-30T20:43:33.937Z] 20:43:33 INFO - r13 = 0x0000000000000000 r14 = 0x0000000101016ea8
[task 2022-03-30T20:43:33.937Z] 20:43:33 INFO - r15 = 0x00000001127f9120 rip = 0x000000010b240106
[task 2022-03-30T20:43:33.937Z] 20:43:33 INFO - Found by: call frame info
[task 2022-03-30T20:43:33.938Z] 20:43:33 INFO - 7 XUL!mozilla::ShutdownXPCOM(nsIServiceManager*) [XPCOMInit.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 667 + 0xc]
[task 2022-03-30T20:43:33.938Z] 20:43:33 INFO - rbx = 0x00007ffeeee14cb0 rbp = 0x00007ffeeee14cf0
[task 2022-03-30T20:43:33.938Z] 20:43:33 INFO - rsp = 0x00007ffeeee14c60 r12 = 0x0000000000000000
[task 2022-03-30T20:43:33.938Z] 20:43:33 INFO - r13 = 0x0000000000000000 r14 = 0x0000000101016ea8
[task 2022-03-30T20:43:33.938Z] 20:43:33 INFO - r15 = 0x0000000101087c00 rip = 0x000000010b2607ac
[task 2022-03-30T20:43:33.939Z] 20:43:33 INFO - Found by: call frame info
[task 2022-03-30T20:43:33.939Z] 20:43:33 INFO - 8 XUL!ScopedXPCOMStartup::~ScopedXPCOMStartup() [nsAppRunner.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 2055 + 0x7]
[task 2022-03-30T20:43:33.939Z] 20:43:33 INFO - rbx = 0x00007ffeeee14d10 rbp = 0x00007ffeeee14d90
[task 2022-03-30T20:43:33.939Z] 20:43:33 INFO - rsp = 0x00007ffeeee14d00 r12 = 0x0000000000000000
[task 2022-03-30T20:43:33.939Z] 20:43:33 INFO - r13 = 0x0000000000000000 r14 = 0x000000010100f248
[task 2022-03-30T20:43:33.940Z] 20:43:33 INFO - r15 = 0x0000000000000000 rip = 0x000000010fc7a894
[task 2022-03-30T20:43:33.940Z] 20:43:33 INFO - Found by: call frame info
[task 2022-03-30T20:43:33.940Z] 20:43:33 INFO - 9 XUL!XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [nsAppRunner.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 5941 + 0x18]
[task 2022-03-30T20:43:33.940Z] 20:43:33 INFO - rbx = 0x000000010100f248 rbp = 0x00007ffeeee14e10
[task 2022-03-30T20:43:33.940Z] 20:43:33 INFO - rsp = 0x00007ffeeee14da0 r12 = 0x0000000101079a80
[task 2022-03-30T20:43:33.941Z] 20:43:33 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2022-03-30T20:43:33.941Z] 20:43:33 INFO - r15 = 0x00007ffeeee14e28 rip = 0x000000010fc7d435
[task 2022-03-30T20:43:33.941Z] 20:43:33 INFO - Found by: call frame info
[task 2022-03-30T20:43:33.941Z] 20:43:33 INFO - 10 XUL!XRE_main(int, char**, mozilla::BootstrapConfig const&) [nsAppRunner.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 5991 + 0xf]
[task 2022-03-30T20:43:33.941Z] 20:43:33 INFO - rbx = 0x0000000000000004 rbp = 0x00007ffeeee14f50
[task 2022-03-30T20:43:33.942Z] 20:43:33 INFO - rsp = 0x00007ffeeee14e20 r12 = 0x00007ffeeee14e28
[task 2022-03-30T20:43:33.942Z] 20:43:33 INFO - r13 = 0x00007ffeeee153e0 r14 = 0x00007ffeeee14f78
[task 2022-03-30T20:43:33.942Z] 20:43:33 INFO - r15 = 0x00007ffeeee153e0 rip = 0x000000010fc7d767
[task 2022-03-30T20:43:33.942Z] 20:43:33 INFO - Found by: call frame info
[task 2022-03-30T20:43:33.942Z] 20:43:33 INFO - 11 firefox!main [nsBrowserApp.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 395 + 0x78]
[task 2022-03-30T20:43:33.942Z] 20:43:33 INFO - rbx = 0x00007ffeeee14f78 rbp = 0x00007ffeeee153c0
[task 2022-03-30T20:43:33.943Z] 20:43:33 INFO - rsp = 0x00007ffeeee14f60 r12 = 0x0000000100deccf7
[task 2022-03-30T20:43:33.943Z] 20:43:33 INFO - r13 = 0x00007ffeeee153e0 r14 = 0x0000000000000004
[task 2022-03-30T20:43:33.943Z] 20:43:33 INFO - r15 = 0x00007ffeeee14f00 rip = 0x0000000100dec27f
[task 2022-03-30T20:43:33.943Z] 20:43:33 INFO - Found by: call frame info
[task 2022-03-30T20:43:33.943Z] 20:43:33 INFO - 12 libdyld.dylib!start + 0x0
[task 2022-03-30T20:43:33.943Z] 20:43:33 INFO - rbx = 0x0000000000000000 rbp = 0x00007ffeeee153d0
[task 2022-03-30T20:43:33.944Z] 20:43:33 INFO - rsp = 0x00007ffeeee153d0 r12 = 0x0000000000000000
[task 2022-03-30T20:43:33.944Z] 20:43:33 INFO - r13 = 0x0000000000000000 r14 = 0x0000000000000000
[task 2022-03-30T20:43:33.944Z] 20:43:33 INFO - r15 = 0x0000000000000000 rip = 0x00007fff6f297cc9
[task 2022-03-30T20:43:33.944Z] 20:43:33 INFO - Found by: call frame info
[task 2022-03-30T20:43:33.944Z] 20:43:33 INFO - Thread 1
Updated•3 years ago
|
Comment 1•3 years ago
|
||
Another shutdown thing. ContentProcessManager is null?
| Assignee | ||
Comment 2•3 years ago
•
|
||
So there are two levels of things here:
-
Bug 1632740 did add a lifecycle check if we are too late in shutdown to create a new
ContentProcessManager. In general any use ofcpmshould be null-checked now, in particular in shutdown related functions likeActorDestroyas seen here. This would paper over this null crash, at least, and we should do this. -
But why would we want to
ActorDestroyaContentParentthis late in shutdown? We already enteredShutdownPhase::XPCOMShutdownFinalwhen we call thensThreadManager::get().Shutdown();which in turn callsThreadEventQueue::RunShutdownTaskson the main thread. IIUC this has been added by bug 1738103 and bug 1738104 together withRegisterShutdownTaskandUnregisterShutdownTaskfunctions. I assume (without having tried to follow down the code paths) that we registered a shutdown task that somehow kept alive theContentParentuntil now. This might indicate a missingUnregisterShutdownTasksomewhere?
| Assignee | ||
Updated•3 years ago
|
| Assignee | ||
Comment 3•3 years ago
•
|
||
(In reply to Jens Stutte [:jstutte] from comment #2)
- But why would we want to
ActorDestroyaContentParentthis late in shutdown? We already enteredShutdownPhase::XPCOMShutdownFinalwhen we call thensThreadManager::get().Shutdown();which in turn callsThreadEventQueue::RunShutdownTaskson the main thread. IIUC this has been added by bug 1738103 and bug 1738104 together withRegisterShutdownTaskandUnregisterShutdownTaskfunctions. I assume (without having tried to follow down the code paths) that we registered a shutdown task that somehow kept alive theContentParentuntil now. This might indicate a missingUnregisterShutdownTasksomewhere?
Looking at our sequence in ShutdownXPCOM I also would probably expect to run ThreadEventQueue::RunShutdownTasks for the main thread earlier (maybe as part of ShutdownPhase::XPCOMShutdownThreads). Where it happens right now it might be easily bypassed by our fast shutdown, and I am not sure this is expected?
Updated•3 years ago
|
| Assignee | ||
Comment 4•3 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #2)
I assume (without having tried to follow down the code paths) that we registered a shutdown task that somehow kept alive the
ContentParentuntil now. This might indicate a missingUnregisterShutdownTasksomewhere?
Talking with Olli it might also just be a side-effect of the ContentParent not being cycle collected earlier and thus outliving the ContentProcessManager. It might well be just the right thing to check the cpm for existence in ActorDestroy to not rely on fragile ordering problems.
Leaving the ni? to :nika more as a FYI than a real question.
Updated•3 years ago
|
Comment 5•3 years ago
|
||
What this crash means is that we had a ContentParent which didn't get shut down properly before the main thread is being shut down. This should generally not happen due to the async shutdown blocker (https://searchfox.org/mozilla-central/rev/711e1cea1cb584057c50aac0a26a3f7c969eda66/dom/ipc/ContentParent.cpp#3559-3570). In this case we must be starting the content process and its actor too late during shutdown such that it isn't getting caught before the main thread is shutting down. Without this change it would probably have been a memory leak of some kind as ActorDestroy would never be called.
This has nothing to do with cycle collection timing, as the ContentParent can't be cycle-collected while an IPC connection is holding it alive, as that's a non-cycle-collected reference from the IPC layer.
| Assignee | ||
Comment 6•3 years ago
|
||
(In reply to Nika Layzell [:nika] (ni? for response) from comment #5)
What this crash means is that we had a
ContentParentwhich didn't get shut down properly before the main thread is being shut down.
Thanks! This confirms my gut feeling from comment 2 that we should not see this.
In this case we must be starting the content process and its actor too late during shutdown such that it isn't getting caught before the main thread is shutting down. Without this change it would probably have been a memory leak of some kind as
ActorDestroywould never be called.
The revision that failed here does already contain the AppShutdown::IsInOrBeyond(ShutdownPhase::AppShutdown) check on browser creation. Unless there is another way to create a ContentParent I would rule out we see a late creation here.
This should generally not happen due to the async shutdown blocker (https://searchfox.org/mozilla-central/rev/711e1cea1cb584057c50aac0a26a3f7c969eda66/dom/ipc/ContentParent.cpp#3559-3570).
It seems to me that (unless we have a reason to KillHard) we remove the shutdown blockers only at ActorDestroy. I assume the reasoning here is:
- We send a
ShutDownProcess(SEND_SHUTDOWN_MESSAGE);to the child and mark the process as dead but keep the actor alive whenBlockShutdownis called. - We expect the child to close the channel and thus receive an
ActorDestroyautomatically from the IPC layer that then removes the shutdown blocker.
Now apparently we do not hang at "profile-before-change", so the shutdown blocker seemed to have no effect here. We might see a race here where this condition makes us not add the shutdown blockers too early ? The shutdown blockers are added in Init which is only called from ContentParent::LaunchSubprocessResolve, so some event loop cycles later than CreateBrowser and we might have entered shutdown in the meantime.
I wonder if we should just add the blockers earlier during creation.
Updated•3 years ago
|
| Assignee | ||
Comment 8•3 years ago
|
||
Updated•3 years ago
|
Updated•3 years ago
|
| Comment hidden (Intermittent Failures Robot) |
| Comment hidden (Intermittent Failures Robot) |
Comment 11•3 years ago
|
||
Comment on attachment 9270385 [details]
Bug 1762299: Avoid race between process creation callback and application shutdown. r?smaug
Revision D142660 was moved to bug 1764251. Setting attachment 9270385 [details] to obsolete.
| Assignee | ||
Comment 12•3 years ago
|
||
Crashes are down and bug 1764251 is fixed as well as some other shutdown weirdness.
Updated•3 years ago
|
Description
•