Closed Bug 1392130 Opened 7 years ago Closed 6 years ago

Intermittent leakcheck | default process: 1992|1720|984 bytes leaked (BackstagePass, CondVar, IdlePeriod, Mutex, ThreadEventTarget, ...)

Categories

(Firefox :: Session Restore, defect, P3)

defect

Tracking

()

RESOLVED INCOMPLETE
Firefox 60
Tracking Status
firefox60 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: mikedeboer)

References

(Depends on 1 open bug)

Details

(Keywords: intermittent-failure, memory-leak, Whiteboard: [stockwell unknown])

Attachments

(1 file)

Keywords: mlk
This bug has failed 37 times in the last 7 days. Most of the failures happen on windows and the rest on linux. This happens only on debug.

Here is an example of a recent log:
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=139249228&lineNumber=9134

This is a part of the log where the test failed:
17:45:35     INFO -  TEST-INFO | leakcheck | default process: leaked 2 XPCWrappedNative
17:45:35     INFO -  TEST-INFO | leakcheck | default process: leaked 1 XPCWrappedNativeProto
17:45:35     INFO -  TEST-INFO | leakcheck | default process: leaked 2 XPCWrappedNativeTearOff
17:45:35     INFO -  TEST-INFO | leakcheck | default process: leaked 1 nsJSPrincipals
17:45:35     INFO -  TEST-INFO | leakcheck | default process: leaked 1 nsStringBuffer
17:45:35     INFO -  TEST-INFO | leakcheck | default process: leaked 5 nsTArray_base
17:45:35     INFO -  TEST-INFO | leakcheck | default process: leaked 1 nsThread
17:45:35     INFO -  TEST-INFO | leakcheck | default process: leaked 1 nsTimer
17:45:35     INFO -  TEST-INFO | leakcheck | default process: leaked 1 nsTimerImpl
17:45:35     INFO -  TEST-INFO | leakcheck | default process: leaked 1 nsXPCWrappedJS
17:45:35     INFO -  TEST-INFO | leakcheck | default process: leaked 1 nsXPCWrappedJSClass
17:45:35     INFO -  TEST-INFO | leakcheck | default process: leaked 3 xptiInterfaceInfo
17:45:35    ERROR -  3133 ERROR TEST-UNEXPECTED-FAIL | leakcheck | default process: 984 bytes leaked (BackstagePass, CondVar, IdlePeriod, Mutex, ThreadEventTarget, ...)
17:45:35     INFO -  == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, gpu process 1148
17:45:35     INFO -       |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
17:45:35     INFO -       |                                      | Per-Inst   Leaked|   Total      Rem|
17:45:35     INFO -     0 |TOTAL                                 |       27        0|   49044        0|
17:45:35     INFO -  nsTraceRefcnt::DumpStatistics: 136 entries
17:45:35     INFO -  TEST-PASS | leakcheck | gpu process: no leaks detected!
17:45:35     INFO -  == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 4672
17:45:35     INFO -       |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
17:45:35     INFO -       |                                      | Per-Inst   Leaked|   Total      Rem|
17:45:35     INFO -     0 |TOTAL                                 |       32       12|   62502        2|
17:45:35     INFO -   696 |nsStringBuffer                        |        8        8|   17658        1|
17:45:35     INFO -   735 |nsTArray_base                         |        4        4|   14687        1|
17:45:35     INFO -  nsTraceRefcnt::DumpStatistics: 799 entries
17:45:35     INFO -  TEST-INFO | leakcheck | tab process: leaked 1 nsStringBuffer
17:45:35     INFO -  TEST-INFO | leakcheck | tab process: leaked 1 nsTArray_base
17:45:35     INFO -  leakcheck | tab process: 12 bytes leaked (nsStringBuffer, nsTArray_base)
17:45:35     INFO -  == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 4780
17:45:35     INFO -       |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
17:45:35     INFO -       |                                      | Per-Inst   Leaked|   Total      Rem|
17:45:35     INFO -     0 |TOTAL                                 |       28       12|  199862        2|
17:45:35     INFO -   766 |nsStringBuffer                        |        8        8|   26265        1|
17:45:35     INFO -   807 |nsTArray_base                         |        4        4|   56779        1|
17:45:35     INFO -  nsTraceRefcnt::DumpStatistics: 880 entries
17:45:35     INFO -  TEST-INFO | leakcheck | tab process: leaked 1 nsStringBuffer
17:45:35     INFO -  TEST-INFO | leakcheck | tab process: leaked 1 nsTArray_base
17:45:35     INFO -  leakcheck | tab process: 12 bytes leaked (nsStringBuffer, nsTArray_base)
17:45:35     INFO -  runtests.py | Running tests: end.


:selenamarie : Hi, can you please take a look at this bug?
Flags: needinfo?(sdeckelmann)
Whiteboard: [stockwell needswork]
This is a wild guess.. passing over to XPConnect given that BackstagePass lives there.
Component: General → XPConnect
Flags: needinfo?(sdeckelmann)
Andrew -- Can you have a look and tell me if I'm barking up the wrong tree?
Flags: needinfo?(continuation)
(In reply to Selena Deckelmann :selenamarie :selena use ni? pronoun: she from comment #14)
> Andrew -- Can you have a look and tell me if I'm barking up the wrong tree?

That's certainly a reasonable place to start.

Looking at these failures, they are scattered across a wide variety of test suites. They all also involve an nsTimer as well as a backstage pass. I think the cause of this is going to be somebody adding a new timer somewhere that gets added when we're in the middle of shutdown. I'm not sure how to investigate this. Maybe we could log information about leaking backstage passes.

One of the failures is in dom/battery/test/test_battery_unprivileged.html which does almost nothing.

Do you have any ideas on how we could investigate this, Nathan? I'll try to think of something.
Flags: needinfo?(continuation) → needinfo?(nfroyd)
It looks like timers are shut down before we do the final GC, so I'm not sure how timer leaks like this can happen.
I have been running builds with leak logging all day, though I have yet to get something actionable.  We'll see what the latest round of retriggers returns.
Finally hit something for a leaking nsTimer:

21:47:29     INFO -  #00: NS_LogAddRef [xpcom/base/nsTraceRefcnt.cpp:1035]
21:47:29     INFO -  #01: nsTimer::AddRef() [xpcom/threads/nsTimerImpl.cpp:316]
21:47:29     INFO -  #02: nsTimerImpl::nsTimerImpl(nsITimer *) [xpcom/threads/nsTimerImpl.cpp:338]
21:47:29     INFO -  #03: nsTimer::nsTimer() [xpcom/threads/nsTimerImpl.h:224]
21:47:29     INFO -  #04: mozilla::GenericFactory::CreateInstance(nsISupports *,nsID const &,void * *) [xpcom/components/GenericFactory.cpp:17]
21:47:29     INFO -  #05: nsComponentManagerImpl::CreateInstance(nsID const &,nsISupports *,nsID const &,void * *) [xpcom/components/nsComponentManager.cpp:1003]
21:47:29     INFO -  #06: nsJSCID::CreateInstance(JS::Handle<JS::Value>,JSContext *,unsigned char,JS::MutableHandle<JS::Value>) [js/xpconnect/src/XPCJSID.cpp:661]
21:47:29     INFO -  #07: NS_InvokeByIndex
21:47:29     INFO -  #08: CallMethodHelper::Invoke() [js/xpconnect/src/XPCWrappedNative.cpp:1996]
21:47:29     INFO -  #09: XPCWrappedNative::CallMethod(XPCCallContext &,XPCWrappedNative::CallMode) [js/xpconnect/src/XPCWrappedNative.cpp:1282]
21:47:29     INFO -  #10: XPC_WN_CallMethod(JSContext *,unsigned int,JS::Value *) [js/xpconnect/src/XPCWrappedNativeJSOps.cpp:929]
21:47:29     INFO -  #11: js::CallJSNative(JSContext *,bool (*)(JSContext *,unsigned int,JS::Value *),JS::CallArgs const &) [js/src/jscntxtinlines.h:291]
21:47:29     INFO -  #12: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:473]
21:47:29     INFO -  #13: InternalCall [js/src/vm/Interpreter.cpp:522]
21:47:29     INFO -  #14: Interpret [js/src/vm/Interpreter.cpp:3067]
21:47:29     INFO -  #15: js::RunScript(JSContext *,js::RunState &) [js/src/vm/Interpreter.cpp:423]
21:47:29     INFO -  #16: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:495]
21:47:29     INFO -  #17: InternalCall [js/src/vm/Interpreter.cpp:522]
21:47:29     INFO -  #18: Interpret [js/src/vm/Interpreter.cpp:3067]
21:47:29     INFO -  #19: js::RunScript(JSContext *,js::RunState &) [js/src/vm/Interpreter.cpp:423]
21:47:29     INFO -  #20: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:495]
21:47:29     INFO -  #21: InternalCall [js/src/vm/Interpreter.cpp:522]
21:47:29     INFO -  #22: Interpret [js/src/vm/Interpreter.cpp:3067]
21:47:29     INFO -  #23: js::RunScript(JSContext *,js::RunState &) [js/src/vm/Interpreter.cpp:423]
21:47:29     INFO -  #24: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:495]
21:47:29     INFO -  #25: InternalCall [js/src/vm/Interpreter.cpp:522]
21:47:29     INFO -  #26: Interpret [js/src/vm/Interpreter.cpp:3067]
21:47:29     INFO -  #27: js::RunScript(JSContext *,js::RunState &) [js/src/vm/Interpreter.cpp:423]
21:47:29     INFO -  #28: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:495]
21:47:29     INFO -  #29: InternalCall [js/src/vm/Interpreter.cpp:522]
21:47:29     INFO -  #30: Interpret [js/src/vm/Interpreter.cpp:3067]
21:47:29     INFO -  #31: js::RunScript(JSContext *,js::RunState &) [js/src/vm/Interpreter.cpp:423]
21:47:29     INFO -  #32: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:495]
21:47:29     INFO -  #33: InternalCall [js/src/vm/Interpreter.cpp:522]
21:47:29     INFO -  #34: Interpret [js/src/vm/Interpreter.cpp:3067]
21:47:29     INFO -  #35: js::RunScript(JSContext *,js::RunState &) [js/src/vm/Interpreter.cpp:423]
21:47:29     INFO -  #36: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:495]
21:47:29     INFO -  #37: InternalCall [js/src/vm/Interpreter.cpp:522]
21:47:29     INFO -  #38: js::Call(JSContext *,JS::Handle<JS::Value>,JS::Handle<JS::Value>,js::AnyInvokeArgs const &,JS::MutableHandle<JS::Value>) [js/src/vm/Interpreter.cpp:541]
21:47:29     INFO -  #39: JS_CallFunctionValue(JSContext *,JS::Handle<JSObject *>,JS::Handle<JS::Value>,JS::HandleValueArray const &,JS::MutableHandle<JS::Value>) [js/src/jsapi.cpp:2962]
21:47:29     INFO -  #40: nsXPCWrappedJSClass::CallMethod(nsXPCWrappedJS *,unsigned short,XPTMethodDescriptor const *,nsXPTCMiniVariant *) [js/xpconnect/src/XPCWrappedJSClass.cpp:1317]
21:47:29     INFO -  #41: nsXPCWrappedJS::CallMethod(unsigned short,XPTMethodDescriptor const *,nsXPTCMiniVariant *) [js/xpconnect/src/XPCWrappedJS.cpp:615]
21:47:29     INFO -  #42: PrepareAndDispatch [xpcom/reflect/xptcall/md/win32/xptcstubs.cpp:85]
21:47:29     INFO -  #43: SharedStub [xpcom/reflect/xptcall/md/win32/xptcstubs.cpp:113]
21:47:29     INFO -  #44: nsObserverList::NotifyObservers(nsISupports *,char const *,char16_t const *) [xpcom/ds/nsObserverList.cpp:112]
21:47:29     INFO -  #45: nsObserverService::NotifyObservers(nsISupports *,char const *,char16_t const *) [xpcom/ds/nsObserverService.cpp:300]
21:47:29     INFO -  #46: nsAppStartup::Quit(unsigned int) [toolkit/components/startup/nsAppStartup.cpp:426]
21:47:29     INFO -  #47: NS_InvokeByIndex
21:47:29     INFO -  #48: CallMethodHelper::Invoke() [js/xpconnect/src/XPCWrappedNative.cpp:1996]
21:47:29     INFO -  #49: XPCWrappedNative::CallMethod(XPCCallContext &,XPCWrappedNative::CallMode) [js/xpconnect/src/XPCWrappedNative.cpp:1282]
21:47:29     INFO -  #50: XPC_WN_CallMethod(JSContext *,unsigned int,JS::Value *) [js/xpconnect/src/XPCWrappedNativeJSOps.cpp:929]
21:47:29     INFO -  #51: js::CallJSNative(JSContext *,bool (*)(JSContext *,unsigned int,JS::Value *),JS::CallArgs const &) [js/src/jscntxtinlines.h:291]
21:47:29     INFO -  #52: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:473]
21:47:29     INFO -  #53: InternalCall [js/src/vm/Interpreter.cpp:522]
21:47:29     INFO -  #54: Interpret [js/src/vm/Interpreter.cpp:3067]
21:47:29     INFO -  #55: js::RunScript(JSContext *,js::RunState &) [js/src/vm/Interpreter.cpp:423]
21:47:29     INFO -  #56: js::InternalCallOrConstruct(JSContext *,JS::CallArgs const &,js::MaybeConstruct) [js/src/vm/Interpreter.cpp:495]
21:47:29     INFO -  #57: InternalCall [js/src/vm/Interpreter.cpp:522]
21:47:29     INFO -  #58: js::Call(JSContext *,JS::Handle<JS::Value>,JS::Handle<JS::Value>,js::AnyInvokeArgs const &,JS::MutableHandle<JS::Value>) [js/src/vm/Interpreter.cpp:541]
21:47:29     INFO -  #59: JS_CallFunctionValue(JSContext *,JS::Handle<JSObject *>,JS::Handle<JS::Value>,JS::HandleValueArray const &,JS::MutableHandle<JS::Value>) [js/src/jsapi.cpp:2962]
21:47:29     INFO -  #60: nsFrameMessageManager::ReceiveMessage(nsISupports *,nsIFrameLoader *,bool,nsTSubstring<char16_t> const &,bool,mozilla::dom::ipc::StructuredCloneData *,mozilla::jsipc::CpowHolder *,nsIPrincipal *,nsTArray<mozilla::dom::ipc::StructuredCloneData> *) [dom/base/nsFrameMessageManager.cpp:1097]
21:47:29     INFO -  #61: nsFrameMessageManager::ReceiveMessage(nsISupports *,nsIFrameLoader *,bool,nsTSubstring<char16_t> const &,bool,mozilla::dom::ipc::StructuredCloneData *,mozilla::jsipc::CpowHolder *,nsIPrincipal *,nsTArray<mozilla::dom::ipc::StructuredCloneData> *) [dom/base/nsFrameMessageManager.cpp:1127]
21:47:29     INFO -  #62: nsFrameMessageManager::ReceiveMessage(nsISupports *,nsIFrameLoader *,bool,nsTSubstring<char16_t> const &,bool,mozilla::dom::ipc::StructuredCloneData *,mozilla::jsipc::CpowHolder *,nsIPrincipal *,nsTArray<mozilla::dom::ipc::StructuredCloneData> *) [dom/base/nsFrameMessageManager.cpp:1127]
21:47:29     INFO -  #63: nsFrameMessageManager::ReceiveMessage(nsISupports *,nsIFrameLoader *,bool,nsTSubstring<char16_t> const &,bool,mozilla::dom::ipc::StructuredCloneData *,mozilla::jsipc::CpowHolder *,nsIPrincipal *,nsTArray<mozilla::dom::ipc::StructuredCloneData> *) [dom/base/nsFrameMessageManager.cpp:1127]
21:47:29     INFO -  #64: nsFrameMessageManager::ReceiveMessage(nsISupports *,nsIFrameLoader *,nsTSubstring<char16_t> const &,bool,mozilla::dom::ipc::StructuredCloneData *,mozilla::jsipc::CpowHolder *,nsIPrincipal *,nsTArray<mozilla::dom::ipc::StructuredCloneData> *) [dom/base/nsFrameMessageManager.cpp:908]
21:47:29     INFO -  #65: mozilla::dom::TabParent::ReceiveMessage(nsTString<char16_t> const &,bool,mozilla::dom::ipc::StructuredCloneData *,mozilla::jsipc::CpowHolder *,nsIPrincipal *,nsTArray<mozilla::dom::ipc::StructuredCloneData> *) [dom/ipc/TabParent.cpp:2606]
21:47:29     INFO -  #66: mozilla::dom::TabParent::RecvAsyncMessage(nsTString<char16_t> const &,nsTArray<mozilla::jsipc::CpowEntry> &&,IPC::Principal const &,mozilla::dom::ClonedMessageData const &) [dom/ipc/TabParent.cpp:1756]
21:47:29     INFO -  #67: mozilla::dom::PBrowserParent::OnMessageReceived(IPC::Message const &) [s3:gecko-generated-sources:4aa086ea2887982c9d7757e867174071ef64bbfdf5b075da7d23b8f2a7d9f8a18f0a6b1aed915bc9ee47c909ccf8e86d82ec9b128cdae060dc80cdf518827817/ipc/ipdl/PBrowserParent.cpp::1962]
21:47:29     INFO -  #68: mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const &) [s3:gecko-generated-sources:bdd99131a225ce0cd9726225e43b73c1ca406c159a53bf5fd35da36f33512ab0d2c07abd7636a9b762352908731d60452485279361871e159412de383d61d69d/ipc/ipdl/PContentParent.cpp::3201]
21:47:29     INFO -  #69: mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const &) [ipc/glue/MessageChannel.cpp:2120]
21:47:29     INFO -  #70: mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message &&) [ipc/glue/MessageChannel.cpp:2051]
21:47:29     INFO -  #71: mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask &) [ipc/glue/MessageChannel.cpp:1895]
21:47:29     INFO -  #72: mozilla::ipc::MessageChannel::MessageTask::Run() [ipc/glue/MessageChannel.cpp:1929]
21:47:29     INFO -  #73: nsThread::ProcessNextEvent(bool,bool *) [xpcom/threads/nsThread.cpp:1038]
21:47:29     INFO -  #74: NS_ProcessNextEvent(nsIThread *,bool) [xpcom/threads/nsThreadUtils.cpp:513]
21:47:29     INFO -  #75: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate *) [ipc/glue/MessagePump.cpp:125]
21:47:29     INFO -  #76: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:326]
21:47:29     INFO -  #77: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:320]
21:47:29     INFO -  #78: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:300]
21:47:29     INFO -  #79: nsBaseAppShell::Run() [widget/nsBaseAppShell.cpp:160]
21:47:29     INFO -  #80: nsAppShell::Run() [widget/windows/nsAppShell.cpp:230]
21:47:29     INFO -  #81: nsAppStartup::Run() [toolkit/components/startup/nsAppStartup.cpp:289]
21:47:29     INFO -  #82: XREMain::XRE_mainRun() [toolkit/xre/nsAppRunner.cpp:4686]
21:47:29     INFO -  #83: XREMain::XRE_main(int,char * * const,mozilla::BootstrapConfig const &) [toolkit/xre/nsAppRunner.cpp:4848]
21:47:29     INFO -  #84: XRE_main(int,char * * const,mozilla::BootstrapConfig const &) [toolkit/xre/nsAppRunner.cpp:4943]
21:47:29     INFO -  #85: mozilla::BootstrapImpl::XRE_main(int,char * * const,mozilla::BootstrapConfig const &) [toolkit/xre/Bootstrap.cpp:49]
21:47:29     INFO -  #86: do_main [browser/app/nsBrowserApp.cpp:232]
21:47:29     INFO -  #87: NS_internal_main(int,char * *,char * *) [browser/app/nsBrowserApp.cpp:304]
21:47:29     INFO -  #88: wmain [toolkit/xre/nsWindowsWMain.cpp:114]
21:47:29     INFO -  #89: __scrt_common_main_seh [f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253]

A JS stack here would be super-nice; nsAppStartup::Quit is notifying for quit-application and there are...surprisingly many JS observers for quit-application.  It might almost be easier to figure out who's creating timers in JS and work backwards from there.  Andrew, does that shed any light on the situation?
Flags: needinfo?(nfroyd) → needinfo?(continuation)
The profiler knows how to walk JS frames...I wonder if we could teach the leak logger to understand JS frames too.
Thanks. That confirms what I thought about it being JS. The quit thing is interesting. I can try looking at the observers for that like you said. There's DumpJSStack() or something which can get a JS stack, so presumably something could be juryrigged for the leak logger to get a JS stack. I can take a look at that tomorrow unless you want to. We probably only need the top frame.
- 46 failures in the last 7 days
- there is an increase in number of failures, starting with November 9th
- occurs only on the debug build type
- most failures are on the windows 7 and windows10-64 platforms

Here is a recent log file: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=143948978&lineNumber=33754

And a snippet with the failure from it:

05:26:46     INFO -  TEST-INFO | leakcheck | default process: leaked 1 nsTimerImpl
33751
05:26:46     INFO -  TEST-INFO | leakcheck | default process: leaked 1 nsXPCWrappedJS
33752
05:26:46     INFO -  TEST-INFO | leakcheck | default process: leaked 1 nsXPCWrappedJSClass
33753
05:26:46     INFO -  TEST-INFO | leakcheck | default process: leaked 3 xptiInterfaceInfo
33754
05:26:46    ERROR -  2803 ERROR TEST-UNEXPECTED-FAIL | leakcheck | default process: 984 bytes leaked (BackstagePass, CondVar, IdlePeriod, Mutex, ThreadEventTarget, ...)
33755
05:26:46     INFO -  == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, gpu process 3472
33756
05:26:46     INFO -       |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
33757
05:26:46     INFO -       |                                      | Per-Inst   Leaked|   Total      Rem|
33758
05:26:46     INFO -     0 |TOTAL                                 |       23        0| 1051005        0|
33759
05:26:46     INFO -  nsTraceRefcnt::DumpStatistics: 138 entries
Flags: needinfo?(overholt)
Comment 18, comment 19, and comment 20 seem like they're on to something. I'll give this to mccr8 assuming he's still picking at options here.
Assignee: nobody → continuation
Flags: needinfo?(overholt)
Priority: -- → P3
Summary: Intermittent leakcheck | default process: 1992 bytes leaked (BackstagePass, CondVar, IdlePeriod, Mutex, ThreadEventTarget, ...) → Intermittent leakcheck | default process: 1992|1720|984 bytes leaked (BackstagePass, CondVar, IdlePeriod, Mutex, ThreadEventTarget, ...)
The frequency seems to have dropped off, though I don't know how the two dupes factor into that. I'll file a bug about adding a mode to get JS stacks for leaks.
Flags: needinfo?(continuation)
Depends on: 1420975
This has failed 33 times in the last 7 days on Linux x32, Linux x64, Windows 10, Windows 7, affecting debug build type. 
Failing test suites: debug-mochitest-e10, debug-mochitest-browser-chrome-e10,debug-crashtest-e10, debug-mochitest-browser-chrome-3, debug-mochitest-browser-chrome-1.

Link to a recent log: https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=154771041&lineNumber=13077.

This bug depends on https://bugzilla.mozilla.org/show_bug.cgi?id=1420975 

:overholt do you have any updates on this bug?
Flags: needinfo?(overholt)
I'll try to figure out something here.
Flags: needinfo?(overholt)
This increased a lot Sunday, so maybe something landed that made it worse.
It looks like about half of these failures are in -qr platforms. Maybe this leak happens a lot more with QuantumRender and that got enabled on tree herder recently?
It sounds like the windows QR builds are recent, but they were added in December, so if there's a regression, it was likely a code change instead of the builds being added.
I have a try push for win10-64-qe crash-e10s tests going, with JS stack logging added for nsTimer, but it hasn't hit the leak yet, in 93 retriggers. It doesn't help that about half of those runs are blue ("Duration:Not started (queued for 1,412 minute(s))").

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e52927ec4b6291ae6895e7338e22a2a478316f6f
After about 120 retriggers I finally got a JS stack:

00:15:05     INFO -  JS allocation stack:
00:15:05     INFO -  0 looseTimer() ["resource:///modules/sessionstore/SessionStore.jsm":4825]
00:15:05     INFO -  1 ssi_onQuitApplicationGranted/<() ["resource:///modules/sessionstore/SessionStore.jsm":1617]
00:15:05     INFO -  2 trigger() ["resource://gre/modules/AsyncShutdown.jsm":704]
00:15:05     INFO -  3 _wait() ["resource://gre/modules/AsyncShutdown.jsm":851]
00:15:05     INFO -  4 wait() ["resource://gre/modules/AsyncShutdown.jsm":835]
00:15:05     INFO -  5 observe() ["resource://gre/modules/AsyncShutdown.jsm":518]
00:15:05     INFO -  6 onStopped() ["chrome://reftest/content/reftest.jsm":732]
00:15:05     INFO -  7 _notifyStopped() ["chrome://reftest/content/httpd.jsm":814]
00:15:05     INFO -  8 run() ["chrome://reftest/content/httpd.jsm":495]
A comment in that code says:
    // Ensure that the timer is both canceled once we are done with it
    // and not garbage-collected until then.
So maybe sometimes we're holding onto it for a little too long?

This code was added in bug 1255977.
Component: XPConnect → Session Restore
Product: Core → Firefox
Will, could you possibly look at this intermittent leak? Thanks.
Assignee: continuation → nobody
Flags: needinfo?(wiwang)
FWIW, that comment looks copy-pasted from code added in bug 917764, so maybe it's worth to take a look at that other promise too.
Depends on: 1429512
Thank you for finding this, Andrew!

I'm considering several solutions to resolve both this intermittent failure and the original crash issue, and I am going to post further actions.

Keep the NI for tracking.
Some possible solutions for now:
1. hold the timer shorter, as Andrew said in the comment 44
2. cancel the timer at timing between notification 'quit-application' and some notification at later stage
3. not using the loose timer?

For some reasons, I cannot reply soon and do more tests...
but I will try to leave my comment if I can.

Andrew and Nathan, thank you again for locating the timer here, I think it's not easy to find.

Yes emilio, I think it's a good time to examine that as well.
Flags: needinfo?(wiwang)
Taking a look tomorrow (GMT).
Assignee: nobody → mdeboer
Status: NEW → ASSIGNED
NB: the concern raised in comment 46 is fortunately _not_ a concern, because AsyncShutdown.jsm uses PromiseUtils.defer() properly and cancels the timer when it should, releasing its hold on the thread.
Comment on attachment 8943930 [details]
Bug 1392130 - Ensure that when one of the Sessionstore shutdown blockers finishes and wins the race, to properly cleanup the running timer and observers.

https://reviewboard.mozilla.org/r/214276/#review220448

::: browser/components/sessionstore/SessionStore.jsm:1626
(Diff revision 1)
> -              resolve();
> -            }, "ipc:content-shutdown");
> -          });
> -          promises.push(promiseICS);
>  
> -          return Promise.race(promises);
> +          let defers = [];

Now that there are `promises` and `defers`, could you document both?

::: browser/components/sessionstore/SessionStore.jsm:1628
(Diff revision 1)
> -          });
> -          promises.push(promiseICS);
>  
> -          return Promise.race(promises);
> +          let defers = [];
> +          // Ensure that the timer fires earlier than the AsyncShutdown crash timer.
> +          defers.push(this.looseTimer(AsyncShutdown.DELAY_CRASH_MS - 10000),

That is a very magic constant. Could you update the documentation of the method? Also, I'd be more comfortable if you gave a name to this constant.

Also, what happens if that's < 0 for some reason?

::: browser/components/sessionstore/SessionStore.jsm:1635
(Diff revision 1)
> +          // Add these monitors to the list of Promises to start the race.
> +          promises.push(...defers.map(deferred => deferred.promise));
> +
> +          // When a Promise won the race, make sure we clean up the running timer
> +          // and observers.
> +          return Promise.race(promises).then(() => defers.forEach(deferred => deferred.reject()));

Could you put this on several lines?

e.g.

```js
return Promise.race(promises)
    .then(() =>
       defers.forEach(deferred => deferred.reject())
    ));
```

::: toolkit/components/asyncshutdown/AsyncShutdown.jsm:365
(Diff revision 1)
>        return getPhase;
>      }
>      return undefined;
> +  },
> +
> +  get DELAY_CRASH_MS() {

Good idea.

Could you document it? Basically informing users of where this constant comes from.
Attachment #8943930 - Flags: review?(dteller) → review+
Pushed by mdeboer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6594f1cefd6a
Ensure that when one of the Sessionstore shutdown blockers finishes and wins the race, to properly cleanup the running timer and observers. r=Yoric
Comment on attachment 8943930 [details]
Bug 1392130 - Ensure that when one of the Sessionstore shutdown blockers finishes and wins the race, to properly cleanup the running timer and observers.

https://reviewboard.mozilla.org/r/214276/#review220510


Static analysis found 1 defect in this patch.
 - 1 defect found by mozlint

You can run this analysis locally with:
 - `./mach lint check path/to/file` (Python/Javascript/wpt)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: browser/components/sessionstore/SessionStore.jsm:1628
(Diff revision 2)
> -              Services.obs.removeObserver(obs, topic);
> -              resolve();
> -            }, "oop-frameloader-crashed");
> -          });
> -          promises.push(promiseOFC);
> +            let deferred = PromiseUtils.defer();
> +            const cleanup = () => Services.obs.removeObserver(deferred.resolve, topic);
> +            Services.obs.addObserver(deferred.resolve, topic);
> +            deferred.promise.then(cleanup, cleanup);
> +            return deferred;
> +          }

Error: Missing semicolon. [eslint: semi]
Backout by aiakab@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9913d09905e2
Backed out changeset 6594f1cefd6a for eslint failures on builds/worker/checkouts/gecko/browser/components/sessionstore/SessionStore.jsm:1628:12
Pushed by mdeboer@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d08baa0066c7
Ensure that when one of the Sessionstore shutdown blockers finishes and wins the race, to properly cleanup the running timer and observers. r=Yoric
https://hg.mozilla.org/mozilla-central/rev/d08baa0066c7
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 60
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Andrew, I think my patch here should've fixed the leak... isn't the bug supposed to be reopened when we can verify that the patch that just landed did _not_ fix the leak? Or did you see that already?
Flags: needinfo?(continuation)
My impression was that for intermittent oranges it was good to leave them open for a bit to verify that the orange actually went away. This signature is also generic enough that there may be other causes for this leak.
Flags: needinfo?(continuation)
Alright, a bit of process here and there never hurts ;)
Status: REOPENED → ASSIGNED
https://wiki.mozilla.org/Bug_Triage#Intermittent_Test_Failure_Cleanup
Status: ASSIGNED → RESOLVED
Closed: 6 years ago6 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: