Closed Bug 1383025 Opened 7 years ago Closed 7 years ago

Crash in OOM | large | NS_ABORT_OOM | nsBaseHashtable<T>::Put | std::_Func_impl<T>::_Do_call

Categories

(Core :: IPC, defect)

56 Branch
All
Windows
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla56
Tracking Status
firefox-esr52 --- unaffected
firefox54 --- unaffected
firefox55 --- unaffected
firefox56 --- fixed

People

(Reporter: philipp, Assigned: cyu)

Details

(4 keywords)

Crash Data

Attachments

(1 file, 1 obsolete file)

This bug was filed from the Socorro interface and is 
report bp-c04b26a9-f0e3-4e5d-8b4c-8ce6a0170721.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	NS_ABORT_OOM(unsigned __int64) 	xpcom/base/nsDebugImpl.cpp:610
1 	xul.dll 	nsBaseHashtable<nsUint32HashKey, nsString, nsString>::Put(unsigned int const&, nsString const&) 	obj-firefox/dist/include/nsBaseHashtable.h:139
2 	xul.dll 	std::_Func_impl<<lambda_91a466a6b5af4ebe45e0470871dfbe1e>, std::allocator<int>, void, nsString>::_Do_call(nsString&&) 	vs2015u3/VC/include/functional:212
3 	xul.dll 	std::_Func_class<void, nsString>::operator()(nsString) 	vs2015u3/VC/include/functional:279
4 	xul.dll 	mozilla::detail::RunnableFunction<<lambda_086226435a855bd91635c9f307dd2ae4> >::Run() 	obj-firefox/dist/include/nsThreadUtils.h:507
5 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1446
6 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:97
7 	xul.dll 	MessageLoop::RunHandler() 	ipc/chromium/src/base/message_loop.cc:313
8 	xul.dll 	MessageLoop::Run() 	ipc/chromium/src/base/message_loop.cc:293
9 	xul.dll 	nsBaseAppShell::Run() 	widget/nsBaseAppShell.cpp:156
10 	xul.dll 	nsAppShell::Run() 	widget/windows/nsAppShell.cpp:271
11 	xul.dll 	nsAppStartup::Run() 	toolkit/components/startup/nsAppStartup.cpp:287
12 	xul.dll 	XREMain::XRE_mainRun() 	toolkit/xre/nsAppRunner.cpp:4596
13 	xul.dll 	XREMain::XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4779
14 	xul.dll 	XRE_main(int, char** const, mozilla::BootstrapConfig const&) 	toolkit/xre/nsAppRunner.cpp:4874
15 	firefox.exe 	NS_internal_main(int, char**, char**) 	browser/app/nsBrowserApp.cpp:309
16 	firefox.exe 	wmain 	toolkit/xre/nsWindowsWMain.cpp:115
17 	firefox.exe 	__scrt_common_main_seh 	f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
18 	kernel32.dll 	BaseThreadInitThunk 	
19 	ntdll.dll 	RtlUserThreadStart

crash reports like these started appearing on the nightly channel after 56.0a1 build 20170623045418.
this would be the changelog to the build one day earlier: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=edb7e1ddd9b61e2af2a75cfe5baa0f92a54a2716&tochange=594cc32b632396a867ef1f98428968b224d82151
The only hashtable I could find mapping nsUint32HashKey to nsString is HangMonitorParent::mBrowserCrashDumpIds. Could bug 1360308  be related? It landed on 6-22, so it might have been first included in the 6-23 Nightly.
Flags: needinfo?(cyu)
Oh, it's bug 1360308. For this crash https://crash-stats.mozilla.com/report/index/599198df-c7aa-4694-bf1f-3b9a90170724#tab-details the crash stack is:

 	xul.dll!PLDHashTable::ChangeTable Line 495	C++
 	xul.dll!PLDHashTable::Add Line 573	C++
 	xul.dll!nsBaseHashtable<nsUint32HashKey,nsString,nsString>::Put Line 138	C++
>	xul.dll!`anonymous-namespace'::HangMonitorParent::SendHangNotification::__l14::<lambda> Line 698	C++
 	[External Code]	
 	xul.dll!mozilla::detail::RunnableFunction<<lambda_c7b88c9c8753ac0d9600ffa25701d4eb> >::Run Line 528	C++
(...)

I'll take a look.
Assignee: nobody → cyu
Group: core-security
This is the missing kungFuDeathGrip in bug 1360308 when we are taking minidumps async in HangMonitorParent::SendHangNotification() and the lambda references the corrupted "this". Fix by refcounting HangMonitorParent.
Attachment #8889810 - Flags: review?(continuation)
Group: core-security → dom-core-security
Component: General → IPC
Comment on attachment 8889810 [details] [diff] [review]
Retain the HangMonitorParent instance when generating minidumps async

Review of attachment 8889810 [details] [diff] [review]:
-----------------------------------------------------------------

Bill said he could review this. I'm not sure how the lifetime for top level protocols is supposed to work. Off-hand it doesn't seem like just adding an AddRef and Release would fix this.

::: dom/ipc/ProcessHangMonitor.cpp
@@ +212,5 @@
>  {
>  public:
>    explicit HangMonitorParent(ProcessHangMonitor* aMonitor);
> +
> +  NS_DECL_ISUPPORTS

You don't need to make this class ISupports just to get refcounting. You should instead declare NS_INLINE_DECL_REFCOUNTING(HangMonitorParent).

@@ +698,5 @@
>      // We've been handed a partial minidump; complete it with plugin and
>      // content process dumps.
>      const PluginHangData& phd = aHangData.get_PluginHangData();
>  
> +    RefPtr<HangMonitorParent> self{this};

Should this be "(this)" instead of "{this}"? I don't know what the brackets are for.
Attachment #8889810 - Flags: review?(continuation) → review?(wmccloskey)
Comment on attachment 8889810 [details] [diff] [review]
Retain the HangMonitorParent instance when generating minidumps async

Review of attachment 8889810 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ProcessHangMonitor.cpp
@@ +698,5 @@
>      // We've been handed a partial minidump; complete it with plugin and
>      // content process dumps.
>      const PluginHangData& phd = aHangData.get_PluginHangData();
>  
> +    RefPtr<HangMonitorParent> self{this};

This is a new C++ feature, but I don't think we should use it here since it's inconsistent with all our other code.

@@ +708,2 @@
>                         aResult);
> +        self->OnTakeFullMinidumpComplete(aHangData, aResult);

I'm pretty sure that this will still crash since OnTakeFullMinidumpComplete accesses mProcess, which will be null after RemoveProcess is called.

Do we still care about the dump after the process has shut down? I suspect not. In that case, maybe we could take a WeakPtr (from mfbt) to the HangMonitorParent and exit if it's no longer valid.
Attachment #8889810 - Flags: review?(wmccloskey) → review-
FWIW, top-level actors are manually managed, so the refcounting parts of the patch are correct.
Oh, I was also going to say that if you are going to refcount processhangmonitor, you should make mHangMonitorActor into a RefPtr<>, rather than leaving it as a raw pointer. That will make the ownership clearer.
I use WeakPtr for this case. The consumer of HangMonitorParent::SendHangNotification() is http://searchfox.org/mozilla-central/rev/8a61c71153a79cda2e1ae7d477564347c607cc5f/browser/modules/ProcessHangMonitor.jsm#178 which updates the UI and increment a telemetry counter. Updating UI for a dead process is pointless. The only concern is the telemetry counter, but I guess this is acceptable.
Attachment #8889810 - Attachment is obsolete: true
Attachment #8890327 - Flags: review?(wmccloskey)
Attachment #8890327 - Flags: review?(wmccloskey) → review+
https://hg.mozilla.org/mozilla-central/rev/c5ccbf267da5
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Group: dom-core-security → core-security-release
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.