Closed Bug 1762299 Opened 3 years ago Closed 3 years ago

Intermittent [tier 2] twinopen | application crashed [@ PLDHashTable::Search(void const*) const]

Categories

(Core :: DOM: Content Processes, defect)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox-esr91 --- unaffected
firefox99 --- unaffected
firefox100 --- fixed
firefox101 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jstutte)

References

(Regression)

Details

(Keywords: crash, intermittent-failure, regression)

Crash Data

Attachments

(1 obsolete file)

Filed by: mlaza [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=372881883&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Sz_GhY_qRwKW5JAQbFN3ow/runs/0/artifacts/public/logs/live_backing.log


[task 2022-03-30T20:42:40.634Z] 20:42:40     INFO -  TEST-INFO | 1136: exit 1
[task 2022-03-30T20:42:40.635Z] 20:42:40     INFO -  mozcrash checking /var/folders/lr/3q5jb2915jg6946yxtmc2498000014/T/tmpj_dw6b8e/profile/minidumps for minidumps...
[task 2022-03-30T20:42:40.635Z] 20:42:40     INFO -  mozcrash Downloading symbols from: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/MRTUYzu4S8e1CjTQPaRNWg/artifacts/public/build/target.crashreporter-symbols.zip
[task 2022-03-30T20:43:27.741Z] 20:43:27     INFO -  mozcrash Copy/paste: /opt/worker/tasks/task_164867257605894/fetches/minidump-stackwalk/minidump-stackwalk --symbols-url=https://symbols.mozilla.org/ --human /var/folders/lr/3q5jb2915jg6946yxtmc2498000014/T/tmpj_dw6b8e/profile/minidumps/A288DE5D-9DBA-4A02-85AD-EA83171A13D3.dmp /var/folders/lr/3q5jb2915jg6946yxtmc2498000014/T/tmpk2z677v3
[task 2022-03-30T20:43:33.926Z] 20:43:33     INFO -  mozcrash Saved minidump as /opt/worker/tasks/task_164867257605894/build/blobber_upload_dir/A288DE5D-9DBA-4A02-85AD-EA83171A13D3.dmp
[task 2022-03-30T20:43:33.926Z] 20:43:33     INFO -  mozcrash Saved app info as /opt/worker/tasks/task_164867257605894/build/blobber_upload_dir/A288DE5D-9DBA-4A02-85AD-EA83171A13D3.extra
[task 2022-03-30T20:43:33.926Z] 20:43:33     INFO -  PROCESS-CRASH | twinopen | application crashed [@ PLDHashTable::Search(void const*) const]
[task 2022-03-30T20:43:33.927Z] 20:43:33     INFO -  Crash dump filename: /var/folders/lr/3q5jb2915jg6946yxtmc2498000014/T/tmpj_dw6b8e/profile/minidumps/A288DE5D-9DBA-4A02-85AD-EA83171A13D3.dmp
[task 2022-03-30T20:43:33.927Z] 20:43:33     INFO -  Operating system: Mac OS X
[task 2022-03-30T20:43:33.927Z] 20:43:33     INFO -                    10.15.7 19H524
[task 2022-03-30T20:43:33.927Z] 20:43:33     INFO -  CPU: amd64
[task 2022-03-30T20:43:33.927Z] 20:43:33     INFO -       family 6 model 158 stepping 10
[task 2022-03-30T20:43:33.927Z] 20:43:33     INFO -       12 CPUs
[task 2022-03-30T20:43:33.928Z] 20:43:33     INFO -  Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
[task 2022-03-30T20:43:33.928Z] 20:43:33     INFO -  Crash address: 0x8
[task 2022-03-30T20:43:33.928Z] 20:43:33     INFO -  Mac Crash Info:
[task 2022-03-30T20:43:33.928Z] 20:43:33     INFO -  Process uptime: 43 seconds
[task 2022-03-30T20:43:33.928Z] 20:43:33     INFO -  Thread 0 MainThread (crashed)
[task 2022-03-30T20:43:33.928Z] 20:43:33     INFO -   0  XUL!PLDHashTable::Search(void const*) const [PLDHashTable.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 496 + 0xd]
[task 2022-03-30T20:43:33.929Z] 20:43:33     INFO -       rax = 0x0000000000000019    rdx = 0x0000000000000080
[task 2022-03-30T20:43:33.929Z] 20:43:33     INFO -       rcx = 0x0000000000000001    rbx = 0x0000000000000000
[task 2022-03-30T20:43:33.929Z] 20:43:33     INFO -       rsi = 0x00007ffeeee148e8    rdi = 0x0000000000000000
[task 2022-03-30T20:43:33.929Z] 20:43:33     INFO -       rbp = 0x00007ffeeee14860    rsp = 0x00007ffeeee14810
[task 2022-03-30T20:43:33.929Z] 20:43:33     INFO -        r8 = 0x00000000ffffffff     r9 = 0x0000000100f00140
[task 2022-03-30T20:43:33.930Z] 20:43:33     INFO -       r10 = 0x0000000100f00148    r11 = 0x0000000100f00000
[task 2022-03-30T20:43:33.930Z] 20:43:33     INFO -       r12 = 0x0000000000000000    r13 = 0x00000001048f4400
[task 2022-03-30T20:43:33.930Z] 20:43:33     INFO -       r14 = 0x000000010108f860    r15 = 0x00000001094d1000
[task 2022-03-30T20:43:33.930Z] 20:43:33     INFO -       rip = 0x000000010b1d9611
[task 2022-03-30T20:43:33.931Z] 20:43:33     INFO -      Found by: given as instruction pointer in context
[task 2022-03-30T20:43:33.931Z] 20:43:33     INFO -   1  XUL!mozilla::dom::ContentParent::ActorDestroy(mozilla::ipc::IProtocol::ActorDestroyReason) [ContentParent.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 2093 + 0x18]
[task 2022-03-30T20:43:33.931Z] 20:43:33     INFO -       rbx = 0x0000000000000000    rbp = 0x00007ffeeee149b0
[task 2022-03-30T20:43:33.931Z] 20:43:33     INFO -       rsp = 0x00007ffeeee14870    r12 = 0x0000000000000000
[task 2022-03-30T20:43:33.932Z] 20:43:33     INFO -       r13 = 0x00000001048f4400    r14 = 0x000000010108f860
[task 2022-03-30T20:43:33.932Z] 20:43:33     INFO -       r15 = 0x00000001094d1000    rip = 0x000000010bd188b5
[task 2022-03-30T20:43:33.932Z] 20:43:33     INFO -      Found by: call frame info
[task 2022-03-30T20:43:33.932Z] 20:43:33     INFO -   2  XUL!mozilla::ipc::IProtocol::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) [ProtocolUtils.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 577 + 0xc]
[task 2022-03-30T20:43:33.932Z] 20:43:33     INFO -       rbx = 0x00000001094d1110    rbp = 0x00007ffeeee14a00
[task 2022-03-30T20:43:33.933Z] 20:43:33     INFO -       rsp = 0x00007ffeeee149c0    r12 = 0x00000001094d1110
[task 2022-03-30T20:43:33.933Z] 20:43:33     INFO -       r13 = 0x00000001094d1000    r14 = 0x0000000000000003
[task 2022-03-30T20:43:33.933Z] 20:43:33     INFO -       r15 = 0x00000001094d1000    rip = 0x000000010b4be44e
[task 2022-03-30T20:43:33.933Z] 20:43:33     INFO -      Found by: call frame info
[task 2022-03-30T20:43:33.933Z] 20:43:33     INFO -   3  XUL!mozilla::dom::PContentParent::OnChannelClose() [PContentParent.cpp: : 16719 + 0x9]
[task 2022-03-30T20:43:33.933Z] 20:43:33     INFO -       rbx = 0x00000001094d1000    rbp = 0x00007ffeeee14a20
[task 2022-03-30T20:43:33.934Z] 20:43:33     INFO -       rsp = 0x00007ffeeee14a10    r12 = 0x000000010486d380
[task 2022-03-30T20:43:33.934Z] 20:43:33     INFO -       r13 = 0x00007ffeeee14a38    r14 = 0x000000010483c980
[task 2022-03-30T20:43:33.934Z] 20:43:33     INFO -       r15 = 0x00000001094d1078    rip = 0x000000010bd9a533
[task 2022-03-30T20:43:33.934Z] 20:43:33     INFO -      Found by: call frame info
[task 2022-03-30T20:43:33.934Z] 20:43:33     INFO -   4  XUL!mozilla::ipc::MessageChannel::Close() [MessageChannel.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 2082 + 0x66]
[task 2022-03-30T20:43:33.935Z] 20:43:33     INFO -       rbx = 0x000000010104eb30    rbp = 0x00007ffeeee14a80
[task 2022-03-30T20:43:33.935Z] 20:43:33     INFO -       rsp = 0x00007ffeeee14a30    r12 = 0x000000010486d380
[task 2022-03-30T20:43:33.935Z] 20:43:33     INFO -       r13 = 0x00007ffeeee14a38    r14 = 0x000000010483c980
[task 2022-03-30T20:43:33.935Z] 20:43:33     INFO -       r15 = 0x00000001094d1078    rip = 0x000000010b4ab8c9
[task 2022-03-30T20:43:33.935Z] 20:43:33     INFO -      Found by: call frame info
[task 2022-03-30T20:43:33.935Z] 20:43:33     INFO -   5  XUL!mozilla::ThreadEventQueue::RunShutdownTasks() [ThreadEventQueue.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 315 + 0x5]
[task 2022-03-30T20:43:33.936Z] 20:43:33     INFO -       rbx = 0x0000000000000001    rbp = 0x00007ffeeee14ad0
[task 2022-03-30T20:43:33.936Z] 20:43:33     INFO -       rsp = 0x00007ffeeee14a90    r12 = 0x0000000101083108
[task 2022-03-30T20:43:33.936Z] 20:43:33     INFO -       r13 = 0x0000000123837100    r14 = 0x0000000000000001
[task 2022-03-30T20:43:33.936Z] 20:43:33     INFO -       r15 = 0x0000000000000001    rip = 0x000000010b22d842
[task 2022-03-30T20:43:33.936Z] 20:43:33     INFO -      Found by: call frame info
[task 2022-03-30T20:43:33.937Z] 20:43:33     INFO -   6  XUL!nsThreadManager::Shutdown() [nsThreadManager.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 368 + 0xd]
[task 2022-03-30T20:43:33.937Z] 20:43:33     INFO -       rbx = 0x00007ffeeee14cb0    rbp = 0x00007ffeeee14c50
[task 2022-03-30T20:43:33.937Z] 20:43:33     INFO -       rsp = 0x00007ffeeee14ae0    r12 = 0x0000000000000000
[task 2022-03-30T20:43:33.937Z] 20:43:33     INFO -       r13 = 0x0000000000000000    r14 = 0x0000000101016ea8
[task 2022-03-30T20:43:33.937Z] 20:43:33     INFO -       r15 = 0x00000001127f9120    rip = 0x000000010b240106
[task 2022-03-30T20:43:33.937Z] 20:43:33     INFO -      Found by: call frame info
[task 2022-03-30T20:43:33.938Z] 20:43:33     INFO -   7  XUL!mozilla::ShutdownXPCOM(nsIServiceManager*) [XPCOMInit.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 667 + 0xc]
[task 2022-03-30T20:43:33.938Z] 20:43:33     INFO -       rbx = 0x00007ffeeee14cb0    rbp = 0x00007ffeeee14cf0
[task 2022-03-30T20:43:33.938Z] 20:43:33     INFO -       rsp = 0x00007ffeeee14c60    r12 = 0x0000000000000000
[task 2022-03-30T20:43:33.938Z] 20:43:33     INFO -       r13 = 0x0000000000000000    r14 = 0x0000000101016ea8
[task 2022-03-30T20:43:33.938Z] 20:43:33     INFO -       r15 = 0x0000000101087c00    rip = 0x000000010b2607ac
[task 2022-03-30T20:43:33.939Z] 20:43:33     INFO -      Found by: call frame info
[task 2022-03-30T20:43:33.939Z] 20:43:33     INFO -   8  XUL!ScopedXPCOMStartup::~ScopedXPCOMStartup() [nsAppRunner.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 2055 + 0x7]
[task 2022-03-30T20:43:33.939Z] 20:43:33     INFO -       rbx = 0x00007ffeeee14d10    rbp = 0x00007ffeeee14d90
[task 2022-03-30T20:43:33.939Z] 20:43:33     INFO -       rsp = 0x00007ffeeee14d00    r12 = 0x0000000000000000
[task 2022-03-30T20:43:33.939Z] 20:43:33     INFO -       r13 = 0x0000000000000000    r14 = 0x000000010100f248
[task 2022-03-30T20:43:33.940Z] 20:43:33     INFO -       r15 = 0x0000000000000000    rip = 0x000000010fc7a894
[task 2022-03-30T20:43:33.940Z] 20:43:33     INFO -      Found by: call frame info
[task 2022-03-30T20:43:33.940Z] 20:43:33     INFO -   9  XUL!XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [nsAppRunner.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 5941 + 0x18]
[task 2022-03-30T20:43:33.940Z] 20:43:33     INFO -       rbx = 0x000000010100f248    rbp = 0x00007ffeeee14e10
[task 2022-03-30T20:43:33.940Z] 20:43:33     INFO -       rsp = 0x00007ffeeee14da0    r12 = 0x0000000101079a80
[task 2022-03-30T20:43:33.941Z] 20:43:33     INFO -       r13 = 0x0000000000000000    r14 = 0x0000000000000000
[task 2022-03-30T20:43:33.941Z] 20:43:33     INFO -       r15 = 0x00007ffeeee14e28    rip = 0x000000010fc7d435
[task 2022-03-30T20:43:33.941Z] 20:43:33     INFO -      Found by: call frame info
[task 2022-03-30T20:43:33.941Z] 20:43:33     INFO -  10  XUL!XRE_main(int, char**, mozilla::BootstrapConfig const&) [nsAppRunner.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 5991 + 0xf]
[task 2022-03-30T20:43:33.941Z] 20:43:33     INFO -       rbx = 0x0000000000000004    rbp = 0x00007ffeeee14f50
[task 2022-03-30T20:43:33.942Z] 20:43:33     INFO -       rsp = 0x00007ffeeee14e20    r12 = 0x00007ffeeee14e28
[task 2022-03-30T20:43:33.942Z] 20:43:33     INFO -       r13 = 0x00007ffeeee153e0    r14 = 0x00007ffeeee14f78
[task 2022-03-30T20:43:33.942Z] 20:43:33     INFO -       r15 = 0x00007ffeeee153e0    rip = 0x000000010fc7d767
[task 2022-03-30T20:43:33.942Z] 20:43:33     INFO -      Found by: call frame info
[task 2022-03-30T20:43:33.942Z] 20:43:33     INFO -  11  firefox!main [nsBrowserApp.cpp:0dbe666a3aa6a97c41dd0e67b3f92a7e36a601fa : 395 + 0x78]
[task 2022-03-30T20:43:33.942Z] 20:43:33     INFO -       rbx = 0x00007ffeeee14f78    rbp = 0x00007ffeeee153c0
[task 2022-03-30T20:43:33.943Z] 20:43:33     INFO -       rsp = 0x00007ffeeee14f60    r12 = 0x0000000100deccf7
[task 2022-03-30T20:43:33.943Z] 20:43:33     INFO -       r13 = 0x00007ffeeee153e0    r14 = 0x0000000000000004
[task 2022-03-30T20:43:33.943Z] 20:43:33     INFO -       r15 = 0x00007ffeeee14f00    rip = 0x0000000100dec27f
[task 2022-03-30T20:43:33.943Z] 20:43:33     INFO -      Found by: call frame info
[task 2022-03-30T20:43:33.943Z] 20:43:33     INFO -  12  libdyld.dylib!start + 0x0
[task 2022-03-30T20:43:33.943Z] 20:43:33     INFO -       rbx = 0x0000000000000000    rbp = 0x00007ffeeee153d0
[task 2022-03-30T20:43:33.944Z] 20:43:33     INFO -       rsp = 0x00007ffeeee153d0    r12 = 0x0000000000000000
[task 2022-03-30T20:43:33.944Z] 20:43:33     INFO -       r13 = 0x0000000000000000    r14 = 0x0000000000000000
[task 2022-03-30T20:43:33.944Z] 20:43:33     INFO -       r15 = 0x0000000000000000    rip = 0x00007fff6f297cc9
[task 2022-03-30T20:43:33.944Z] 20:43:33     INFO -      Found by: call frame info
[task 2022-03-30T20:43:33.944Z] 20:43:33     INFO -  Thread 1
Component: XPCOM → DOM: Content Processes

Another shutdown thing. ContentProcessManager is null?

So there are two levels of things here:

  1. Bug 1632740 did add a lifecycle check if we are too late in shutdown to create a new ContentProcessManager. In general any use of cpm should be null-checked now, in particular in shutdown related functions like ActorDestroy as seen here. This would paper over this null crash, at least, and we should do this.

  2. But why would we want to ActorDestroy a ContentParent this late in shutdown? We already entered ShutdownPhase::XPCOMShutdownFinal when we call the nsThreadManager::get().Shutdown(); which in turn calls ThreadEventQueue::RunShutdownTasks on the main thread. IIUC this has been added by bug 1738103 and bug 1738104 together with RegisterShutdownTask and UnregisterShutdownTask functions. I assume (without having tried to follow down the code paths) that we registered a shutdown task that somehow kept alive the ContentParent until now. This might indicate a missing UnregisterShutdownTask somewhere?

Flags: needinfo?(nika)
Regressed by: 1632740, 1738103
Regressed by: 1738104
No longer regressed by: 1738103

(In reply to Jens Stutte [:jstutte] from comment #2)

  1. But why would we want to ActorDestroy a ContentParent this late in shutdown? We already entered ShutdownPhase::XPCOMShutdownFinal when we call the nsThreadManager::get().Shutdown(); which in turn calls ThreadEventQueue::RunShutdownTasks on the main thread. IIUC this has been added by bug 1738103 and bug 1738104 together with RegisterShutdownTask and UnregisterShutdownTask functions. I assume (without having tried to follow down the code paths) that we registered a shutdown task that somehow kept alive the ContentParent until now. This might indicate a missing UnregisterShutdownTask somewhere?

Looking at our sequence in ShutdownXPCOM I also would probably expect to run ThreadEventQueue::RunShutdownTasks for the main thread earlier (maybe as part of ShutdownPhase::XPCOMShutdownThreads). Where it happens right now it might be easily bypassed by our fast shutdown, and I am not sure this is expected?

Has Regression Range: --- → yes

(In reply to Jens Stutte [:jstutte] from comment #2)

I assume (without having tried to follow down the code paths) that we registered a shutdown task that somehow kept alive the ContentParent until now. This might indicate a missing UnregisterShutdownTask somewhere?

Talking with Olli it might also just be a side-effect of the ContentParent not being cycle collected earlier and thus outliving the ContentProcessManager. It might well be just the right thing to check the cpm for existence in ActorDestroy to not rely on fragile ordering problems.

Leaving the ni? to :nika more as a FYI than a real question.

See Also: → 1762086, 1762084
See Also: → 1762404

What this crash means is that we had a ContentParent which didn't get shut down properly before the main thread is being shut down. This should generally not happen due to the async shutdown blocker (https://searchfox.org/mozilla-central/rev/711e1cea1cb584057c50aac0a26a3f7c969eda66/dom/ipc/ContentParent.cpp#3559-3570). In this case we must be starting the content process and its actor too late during shutdown such that it isn't getting caught before the main thread is shutting down. Without this change it would probably have been a memory leak of some kind as ActorDestroy would never be called.

This has nothing to do with cycle collection timing, as the ContentParent can't be cycle-collected while an IPC connection is holding it alive, as that's a non-cycle-collected reference from the IPC layer.

Flags: needinfo?(nika)

(In reply to Nika Layzell [:nika] (ni? for response) from comment #5)

What this crash means is that we had a ContentParent which didn't get shut down properly before the main thread is being shut down.

Thanks! This confirms my gut feeling from comment 2 that we should not see this.

In this case we must be starting the content process and its actor too late during shutdown such that it isn't getting caught before the main thread is shutting down. Without this change it would probably have been a memory leak of some kind as ActorDestroy would never be called.

The revision that failed here does already contain the AppShutdown::IsInOrBeyond(ShutdownPhase::AppShutdown) check on browser creation. Unless there is another way to create a ContentParent I would rule out we see a late creation here.

This should generally not happen due to the async shutdown blocker (https://searchfox.org/mozilla-central/rev/711e1cea1cb584057c50aac0a26a3f7c969eda66/dom/ipc/ContentParent.cpp#3559-3570).

It seems to me that (unless we have a reason to KillHard) we remove the shutdown blockers only at ActorDestroy. I assume the reasoning here is:

Now apparently we do not hang at "profile-before-change", so the shutdown blocker seemed to have no effect here. We might see a race here where this condition makes us not add the shutdown blockers too early ? The shutdown blockers are added in Init which is only called from ContentParent::LaunchSubprocessResolve, so some event loop cycles later than CreateBrowser and we might have entered shutdown in the meantime.

I wonder if we should just add the blockers earlier during creation.

Crash Signature: [@ PLDHashTable::Search(void const*) const] → [@ PLDHashTable::Search(void const*) const] [@ PLDHashTable::Search | mozilla::dom::ContentParent::ActorDestroy]
Assignee: nobody → jstutte
Status: NEW → ASSIGNED
See Also: → 1762604
Crash Signature: [@ PLDHashTable::Search(void const*) const] [@ PLDHashTable::Search | mozilla::dom::ContentParent::ActorDestroy] → [@ PLDHashTable::Search(void const*) const] [@ PLDHashTable::Search | mozilla::dom::ContentParent::ActorDestroy]
Keywords: regression
See Also: → 1764181

Comment on attachment 9270385 [details]
Bug 1762299: Avoid race between process creation callback and application shutdown. r?smaug

Revision D142660 was moved to bug 1764251. Setting attachment 9270385 [details] to obsolete.

Attachment #9270385 - Attachment is obsolete: true
Depends on: 1764251

Crashes are down and bug 1764251 is fixed as well as some other shutdown weirdness.

Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: