Closed Bug 976363 Opened 6 years ago Closed 6 years ago

MessageLoop can leak its pending tasks

Categories

(Core :: IPC, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla30

People

(Reporter: ehsan, Assigned: ehsan)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(1 file, 1 obsolete file)

This was uncovered with my work in bug 935778 where I got some instances of a WeakReference<MessageLoopIdleTask> leak.

So the chromium code that we imported here just throws up its hands and leaks these objects explicitly.  I checked the current upstream code and the leaks are still there but the mentions of purify are removed.

I think we need to do better.  It could be that deleting these objects can result to double-frees, but we should hopefully be able to find and fix those leaks.  Also, note that this currently blocks me landing the checks in bug 935778.
Comment on attachment 8381012 [details] [diff] [review]
Do not leak chromium Task objects when a MessageLoop goes away

(I'm actually interested in both of your reviews here!)
Attachment #8381012 - Flags: review?(bent.mozilla)
Attachment #8381012 - Flags: review?(benjamin)
Whiteboard: [MemShrink]
ping?  I'd really like to land this sooner than later before people start adding new leaks!
Comment on attachment 8381012 [details] [diff] [review]
Do not leak chromium Task objects when a MessageLoop goes away

Review of attachment 8381012 [details] [diff] [review]:
-----------------------------------------------------------------

This looks ok to me. I wish we had more confidence in this code...
Attachment #8381012 - Flags: review?(bent.mozilla) → review+
I'm more than a little concerned about this. In what case do we actually have pending tasks in the MessageLoop destructor, and what kind are they? We take extraordinary pains for that never to happen in the XPCOM version of the event loop and it seems like we ought to be similarly anal about this version.
Flags: needinfo?(ehsan)
Here's a stack to the MessageLoop destructor.  The only object that I've found which has this problem so far is MessageLoopIdleTask, but only because it inherits from SupportsWeakPtr, which means that its weak references are now leak-checked.  We may run into this problem with other objects that go undetected too.

(lldb) bt
* thread #1: tid = 0x41d014, 0x00000001015d7cd0 XUL`(anonymous namespace)::MessageLoopIdleTask::~MessageLoopIdleTask(this=0x00000001216afb50) + 16 at nsMessageLoop.cpp:36, queue = 'com.apple.main-thread, stop reason = step in
    frame #0: 0x00000001015d7cd0 XUL`(anonymous namespace)::MessageLoopIdleTask::~MessageLoopIdleTask(this=0x00000001216afb50) + 16 at nsMessageLoop.cpp:36
    frame #1: 0x0000000101bc8e7d XUL`MessageLoop::DeletePendingTasks(this=0x00000001003e7330) + 381 at message_loop.cc:410
    frame #2: 0x0000000101bc8ba4 XUL`MessageLoop::~MessageLoop(this=0x00000001003e7330) + 276 at message_loop.cc:168
    frame #3: 0x00000001015a72c5 XUL`MessageLoopForUI::~MessageLoopForUI(this=0x00000001003e7330) + 21 at message_loop.h:459
    frame #4: 0x00000001015a7275 XUL`MessageLoopForUI::~MessageLoopForUI(this=0x00000001003e7330) + 21 at message_loop.h:459
    frame #5: 0x00000001015a7299 XUL`MessageLoopForUI::~MessageLoopForUI(this=0x00000001003e7330) + 25 at message_loop.h:459
    frame #6: 0x000000010159db0e XUL`mozilla::ShutdownXPCOM(servMgr=0x0000000000000000) + 2206 at nsXPComInit.cpp:869
    frame #7: 0x000000010159d265 XUL`NS_ShutdownXPCOM(servMgr=0x00000001003e74e8) + 21 at nsXPComInit.cpp:645
    frame #8: 0x00000001052be0aa XUL`ScopedXPCOMStartup::~ScopedXPCOMStartup(this=0x000000010030e490) + 234 at nsAppRunner.cpp:1189
    frame #9: 0x00000001052bdfb5 XUL`ScopedXPCOMStartup::~ScopedXPCOMStartup(this=0x000000010030e490) + 21 at nsAppRunner.cpp:1170
    frame #10: 0x00000001052c7a07 XUL`XREMain::XRE_main(this=0x00007fff5fbfeae0, argc=5, argv=0x00007fff5fbff3e0, aAppData=0x00007fff5fbfed78) + 807 at nsAppRunner.cpp:4088
    frame #11: 0x00000001052c7e0d XUL`XRE_main(argc=5, argv=0x00007fff5fbff3e0, aAppData=0x00007fff5fbfed78, aFlags=0) + 77 at nsAppRunner.cpp:4273
    frame #12: 0x00000001000020d7 firefox`do_main(argc=5, argv=0x00007fff5fbff3e0, xreDirectory=0x0000000100327340) + 1639 at nsBrowserApp.cpp:282
    frame #13: 0x0000000100001611 firefox`main(argc=5, argv=0x00007fff5fbff3e0) + 321 at nsBrowserApp.cpp:643
    frame #14: 0x0000000100001074 firefox`start + 52
Flags: needinfo?(ehsan)
Note that for reproducing, all one has to do is to run the mochitests in dom/browser-element.
Whiteboard: [MemShrink] → [MemShrink:P2]
Andrew, have you found this through LeakSanitizer?  It should happen on all mochitest-2 runs iirc (if those are the ones running dom/browser-element)
Flags: needinfo?(continuation)
I don't have LSAN working on mochitests yet.
Flags: needinfo?(continuation)
LSAN works on Mochitests now.  The only things I saw containing MessageLoop looked like this, one in M4 and one in M5.  Let me know if the allocation stacks wouldn't contain MessageLoop or something.

Indirect leak of 456 byte(s) in 3 object(s) allocated from:
  #0 0x446434 in calloc /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:90
  #1 0x7f047e591224 in event_mm_calloc_ /builds/slave/try-l64-asan-00000000000000000/build/ipc/chromium/src/third_party/libevent/event.c:2663
  #2 0x7f047e591224 in event_base_once /builds/slave/try-l64-asan-00000000000000000/build/ipc/chromium/src/third_party/libevent/event.c:1684
  #3 0x7f047e5cbde3 in base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) /builds/slave/try-l64-asan-00000000000000000/build/ipc/chromium/src/base/message_pump_libevent.cc:339
  #4 0x7f047e6187f3 in RunInternal /builds/slave/try-l64-asan-00000000000000000/build/ipc/chromium/src/base/message_loop.cc:226
  #5 0x7f047e6187f3 in RunHandler /builds/slave/try-l64-asan-00000000000000000/build/ipc/chromium/src/base/message_loop.cc:219
  #6 0x7f047e6187f3 in MessageLoop::Run() /builds/slave/try-l64-asan-00000000000000000/build/ipc/chromium/src/base/message_loop.cc:193
  #7 0x7f047e63fa32 in base::Thread::ThreadMain() /builds/slave/try-l64-asan-00000000000000000/build/ipc/chromium/src/base/thread.cc:162
(In reply to Andrew McCreight [:mccr8] from comment #11)
> LSAN works on Mochitests now.  The only things I saw containing MessageLoop
> looked like this, one in M4 and one in M5.  Let me know if the allocation
> stacks wouldn't contain MessageLoop or something.

The leaking object is allocated here: <http://mxr.mozilla.org/mozilla-central/source/xpcom/base/nsMessageLoop.cpp#149>, with stacks resembling:

(lldb) bt
* thread #1: tid = 0x819644, 0x00000001015d6c73 XUL`nsMessageLoop::PostIdleTask(this=0x00000001196e52a0, aTask=0x00000001196e52e0, aEnsureRunsAfterMS=2000) + 19 at nsMessageLoop.cpp:148, queue = 'com.apple.main-thread, stop reason = breakpoint 1.1
    frame #0: 0x00000001015d6c73 XUL`nsMessageLoop::PostIdleTask(this=0x00000001196e52a0, aTask=0x00000001196e52e0, aEnsureRunsAfterMS=2000) + 19 at nsMessageLoop.cpp:148
    frame #1: 0x00000001016a8c69 XUL`NS_InvokeByIndex(that=0x00000001196e52a0, methodIndex=3, paramCount=2, params=0x00007fff5fbf4a58) + 537 at xptcinvoke_x86_64_unix.cpp:162
    frame #2: 0x00000001032ba6f4 XUL`CallMethodHelper::Invoke(this=0x00007fff5fbf4a10) + 84 at XPCWrappedNative.cpp:2406
    frame #3: 0x00000001032aecd7 XUL`CallMethodHelper::Call(this=0x00007fff5fbf4a10) + 263 at XPCWrappedNative.cpp:1747
    frame #4: 0x000000010328e1b2 XUL`XPCWrappedNative::CallMethod(ccx=0x00007fff5fbf4c00, mode=CALL_METHOD) + 226 at XPCWrappedNative.cpp:1714
    frame #5: 0x00000001032907ad XUL`XPC_WN_CallMethod(cx=0x00000001105b4170, argc=2, vp=0x0000000110f75170) + 909 at XPCWrappedNativeJSOps.cpp:1285
    frame #6: 0x000000010630f005 XUL`js::CallJSNative(cx=0x00000001105b4170, native=0x0000000103290420, args=0x00007fff5fbf52a0)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) + 165 at jscntxtinlines.h:239
    frame #7: 0x00000001062cf57c XUL`js::Invoke(cx=0x00000001105b4170, args=CallArgs at 0x00007fff5fbf52a0, construct=NO_CONSTRUCT) + 1164 at Interpreter.cpp:476
    frame #8: 0x00000001062c59ff XUL`Interpret(cx=0x00000001105b4170, state=0x00007fff5fbf8120) + 53311 at Interpreter.cpp:2614
    frame #9: 0x00000001062b88b9 XUL`js::RunScript(cx=0x00000001105b4170, state=0x00007fff5fbf8120) + 505 at Interpreter.cpp:423
    frame #10: 0x00000001062cf6b2 XUL`js::Invoke(cx=0x00000001105b4170, args=CallArgs at 0x00007fff5fbf81d0, construct=NO_CONSTRUCT) + 1474 at Interpreter.cpp:495
    frame #11: 0x000000010604dc48 XUL`js_fun_apply(cx=0x00000001105b4170, argc=2, vp=0x0000000110f75098) + 1720 at jsfun.cpp:1021
    frame #12: 0x000000010630f005 XUL`js::CallJSNative(cx=0x00000001105b4170, native=0x000000010604d590, args=0x00007fff5fbf8cd0)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&) + 165 at jscntxtinlines.h:239
    frame #13: 0x00000001062cf57c XUL`js::Invoke(cx=0x00000001105b4170, args=CallArgs at 0x00007fff5fbf8cd0, construct=NO_CONSTRUCT) + 1164 at Interpreter.cpp:476
    frame #14: 0x00000001062c59ff XUL`Interpret(cx=0x00000001105b4170, state=0x00007fff5fbfbb50) + 53311 at Interpreter.cpp:2614
    frame #15: 0x00000001062b88b9 XUL`js::RunScript(cx=0x00000001105b4170, state=0x00007fff5fbfbb50) + 505 at Interpreter.cpp:423
    frame #16: 0x00000001062cf6b2 XUL`js::Invoke(cx=0x00000001105b4170, args=CallArgs at 0x00007fff5fbfbc00, construct=NO_CONSTRUCT) + 1474 at Interpreter.cpp:495
    frame #17: 0x00000001062cfead XUL`js::Invoke(cx=0x00000001105b4170, thisv=0x00007fff5fbfbe10, fval=0x00007fff5fbfc3b0, argc=1, argv=0x00007fff5fbfc2e0, rval=JS::MutableHandleValue at 0x00007fff5fbfbd80) + 829 at Interpreter.cpp:532
    frame #18: 0x000000010602b35f XUL`JS_CallFunctionValue(cx=0x00000001105b4170, obj=JS::HandleObject at 0x00007fff5fbfbe50, fval=JS::HandleValue at 0x00007fff5fbfbe48, args=0x00007fff5fbfc260, rval=JS::MutableHandleValue at 0x00007fff5fbfbe38) + 335 at jsapi.cpp:4935
    frame #19: 0x0000000103aa6789 XUL`nsFrameMessageManager::ReceiveMessage(this=0x00000001281def60, aTarget=0x000000011467e6c0, aMessage=0x000000012ae6f738, aIsSync=false, aCloneData=0x00007fff5fbfc9c8, aCpows=0x00007fff5fbfc9a0, aPrincipal=0x0000000000000000, aJSONRetVal=0x0000000000000000) + 7833 at nsFrameMessageManager.cpp:1046
    frame #20: 0x0000000103a9b241 XUL`nsAsyncMessageToChild::Run(this=0x000000012ae6f710) + 513 at nsFrameLoader.cpp:2300
    frame #21: 0x000000010168f299 XUL`nsThread::ProcessNextEvent(this=0x00000001003289c0, mayWait=false, result=0x00007fff5fbfcbd3) + 1561 at nsThread.cpp:643
    frame #22: 0x000000010158fdab XUL`NS_ProcessPendingEvents(thread=0x00000001003289c0, timeout=20) + 171 at nsThreadUtils.cpp:210
    frame #23: 0x0000000103148359 XUL`nsBaseAppShell::NativeEventCallback(this=0x00000001105ed480) + 201 at nsBaseAppShell.cpp:98
    frame #24: 0x00000001030d3f4c XUL`nsAppShell::ProcessGeckoEvents(aInfo=0x00000001105ed480) + 428 at nsAppShell.mm:388
    frame #25: 0x00007fff9324b731 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
    frame #26: 0x00007fff9323cea2 CoreFoundation`__CFRunLoopDoSources0 + 242
    frame #27: 0x00007fff9323c62f CoreFoundation`__CFRunLoopRun + 831
    frame #28: 0x00007fff9323c0b5 CoreFoundation`CFRunLoopRunSpecific + 309
    frame #29: 0x00007fff92ab2a0d HIToolbox`RunCurrentEventLoopInMode + 226
    frame #30: 0x00007fff92ab2685 HIToolbox`ReceiveNextEventCommon + 173
    frame #31: 0x00007fff92ab25bc HIToolbox`_BlockUntilNextEventMatchingListInModeWithFilter + 65
    frame #32: 0x00007fff9441a3de AppKit`_DPSNextEvent + 1434
    frame #33: 0x00007fff94419a2b AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 122
    frame #34: 0x00000001030d2eb7 XUL`-[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:](self=0x0000000100322980, _cmd=0x00007fff94e4d5c3, mask=18446744073709551615, expiration=0x422d63c37f00000d, mode=0x00007fff7d491d00, flag='\x01') + 119 at nsAppShell.mm:165
    frame #35: 0x00007fff9440db2c AppKit`-[NSApplication run] + 553
    frame #36: 0x00000001030d4a32 XUL`nsAppShell::Run(this=0x00000001105ed480) + 162 at nsAppShell.mm:742
    frame #37: 0x0000000104d9591c XUL`nsAppStartup::Run(this=0x0000000110f0c0b0) + 156 at nsAppStartup.cpp:276
    frame #38: 0x0000000104c4eb44 XUL`XREMain::XRE_mainRun(this=0x00007fff5fbfeca0) + 6004 at nsAppRunner.cpp:4008
    frame #39: 0x0000000104c4f349 XUL`XREMain::XRE_main(this=0x00007fff5fbfeca0, argc=5, argv=0x00007fff5fbff5a0, aAppData=0x00007fff5fbfef38) + 697 at nsAppRunner.cpp:4075
    frame #40: 0x0000000104c4f7bd XUL`XRE_main(argc=5, argv=0x00007fff5fbff5a0, aAppData=0x00007fff5fbfef38, aFlags=0) + 77 at nsAppRunner.cpp:4285
    frame #41: 0x0000000100002107 firefox`do_main(argc=5, argv=0x00007fff5fbff5a0, xreDirectory=0x0000000100327340) + 1639 at nsBrowserApp.cpp:282
    frame #42: 0x0000000100001641 firefox`main(argc=5, argv=0x00007fff5fbff5a0) + 321 at nsBrowserApp.cpp:643
    frame #43: 0x00000001000010a4 firefox`start + 52
Yeah, that doesn't look like the stack I saw in comment 11, which was the only stack I saw that contained the string "MessageLoop"... Oh, but wait, I've seen a bajillion stacks with NS_InvokeByIndex in it, and nothing above it, so it is possible that is at least some of what that is.  I really need to try LSAN with non-opt builds.
Comment on attachment 8381012 [details] [diff] [review]
Do not leak chromium Task objects when a MessageLoop goes away

This code is really the ugliest hybrid of the XPCOM event loop and the chromium event loop I can imagine. The method in question is supposed to run a nsIRunnable when the event loop is idle and it does this by posting a chromium Task using PostIdleTask.

In practice when we shut down, the MessageLoop should be empty because we've already stopped all competing threads and spun the event loop to remove all pending tasks. This however doesn't process "idle" tasks so we're left leaking this one.

So what I'd like to do is this:

In MessageLoop::DeletePendingTasks:
* MOZ_ASSERT if work_queue_ isn't empty (replace the first part of the patch here)
* delete everything in deferred_non_nestable_work_queue_ (keep the second part of the patch here)

Ehsan can you try that and see if it passes try?
Attachment #8381012 - Flags: review?(benjamin)
Seems to work locally, testing on try now.
Attachment #8381012 - Attachment is obsolete: true
Comment on attachment 8388142 [details] [diff] [review]
Assert that the work queue is empty when a MessageLoop goes away, and delete the tasks in the deferred queue

This seems to work.
Attachment #8388142 - Flags: review?(benjamin)
Attachment #8388142 - Flags: review?(benjamin) → review+
(Also removed a now useless comment: https://hg.mozilla.org/integration/mozilla-inbound/rev/f16600ef3313)
https://hg.mozilla.org/mozilla-central/rev/bf5956ac5ba7
https://hg.mozilla.org/mozilla-central/rev/f16600ef3313
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Depends on: 986113
You need to log in before you can comment on or make changes to this bug.