Closed Bug 1308652 Opened 3 years ago Closed 3 years ago
58 bytes, text/x-review-board-request
58 bytes, text/x-review-board-request
I'm looking at bug 1271182 some more, and in particular the negative leaks of an IPC::Message, and my current best guess is that there is some kind of double free occurring. Either that or there is some mistake with, say, the way MOZ_COUNT_CTOR or DTOR is called with a move or copy constructor for IPC::Message. I added a check to NS_LogDtor, in the part of the code relating to tracking serial numbers of live objects, to ensure that when we destroy an object that we currently think it is live, and it fails to match that, with this stack: #01: IPC::Message::~Message() [mfbt/RefPtr.h:77] #02: mozilla::ipc::MessageChannel::DequeueOne(IPC::Message*) [clang/include/c++/v1/deque:2572] #03: mozilla::ipc::MessageChannel::OnMaybeDequeueOne() [ipc/glue/MessageChannel.cpp:1567] #04: mozilla::detail::RunnableMethodImpl<bool (mozilla::ipc::MessageChannel::*)(), false, true>::Run [xpcom/glue/nsThreadUtils.h:729] #05: mozilla::ipc::MessageChannel::DequeueTask::Run() [ipc/glue/MessageChannel.h:561] #06: nsThread::ProcessNextEvent(bool, bool*) [xpcom/glue/nsCOMPtr.h:402] #07: NS_ProcessPendingEvents(nsIThread*, unsigned int) [xpcom/glue/nsThreadUtils.cpp:232] #08: nsBaseAppShell::NativeEventCallback() [widget/nsBaseAppShell.cpp:98] [...] #24: XREMain::XRE_main(int, char**, nsXREAppData const*) [toolkit/xre/nsAppRunner.cpp:4542] from this try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=022493e337d1 ASan runs haven't picked up anything like this, so a logging problem seems more likely, but then again this negative leak issue is usually happening on OSX, and we only run ASan on Linux. I'm adding some additional instrumentation to distinguish the case where we never saw a ctor at all from the case where we already saw a dtor for the object. Hopefully I'll still be able to reproduce it. https://treeherder.mozilla.org/#/jobs?repo=try&revision=49def6c4c5e633a160662498e6723a95e5f31513
I should also say that the negative leak of an IPC::Message always occurs alongside a negative leak of CancelableRunnable and a Runnable, but sometimes there is a negative leak of the runnables without the message.
The test in question, browser_aboutURLs.js, loads every about: page. I'm not sure what the goal of that is. In one failing run that I added logging to, the "double free" was happening during the loading of about:bloat. This page, with the behavior defined in nsAboutBloat.cpp, calls nsTraceRefcnt::DumpStatistics(). One of the first things that method does is set gLogging to NoLogging, which stops functions like NS_LogCtor from doing anything. I suspect what is happening is that while we're in the middle of this call, some other thread (in this particular case it appears to be IPC related, but I'm not sure how this is happening on another thread) creates some stuff, but nsTraceRefcnt doesn't record that fact because it is NoLogging. Then later we destroy that object, so we get a negative leak. The easiest fix is just to make this test not do about:bloat (there's already a mechanism for it). Fixing about:bloat doesn't really seem like it is worth the effort, but maybe at least a non-fatal assert for this situation would at least prevent this mysterious failures. I'll confirm my hypothesis and unhide this bug if it is correct. As a side note, the test browser_aboutURLs.js looks like it is permafail on L64 debug now: Assertion failure: originAttrsLoadInfo.mUserContextId == originAttrsLoadContext.mUserContextId (The value of mUserContextId in the loadContext and in the loadInfo are not the same!) https://treeherder.mozilla.org/#/jobs?repo=try&revision=0bc0f0053c844e2d614c9f585cbb7225caea2929&selectedJob=28866297 I'm not sure how bad that assertion is...
Assignee: nobody → continuation
Component: IPC → XPCOM
(In reply to Andrew McCreight [:mccr8] from comment #2) > The easiest fix is just to make this test not do about:bloat (there's > already a mechanism for it). Fixing about:bloat doesn't really seem like it > is worth the effort, but maybe at least a non-fatal assert for this > situation would at least prevent this mysterious failures. Does anybody really use about:bloat at this point? Could we just remove it entirely?
(In reply to Nathan Froyd [:froydnj] from comment #3) > Does anybody really use about:bloat at this point? Could we just remove it > entirely? I doubt it. I'll email dbaron, as he seems like the most likely person.
dbaron said via email he does not use about:bloat. I'll remove it. I haven't been able to verify that this is what is happening here, but I'm almost certain that is what the issue is. There's some nsTraceRefcnt infrastructure that exists only to support about:bloat, and I'll file a followup to remove that.
Component: XPCOM → Networking
Summary: Possible double free in MessageChannel::DequeueOne() → Remove about:bloat
try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d160c400f75002751696c764c5490b68e661d6d9 I'm waiting on some retriggers of bc4 to try to get higher confidence that this actually fixed the issue.
Comment on attachment 8799551 [details] Bug 1308652, part 2 - Only allow nsTraceRefcnt::DumpStatistics to be called once. https://reviewboard.mozilla.org/r/84692/#review83298
Attachment #8799551 - Flags: review?(nfroyd) → review+
Comment on attachment 8799550 [details] Bug 1308652, part 1 - Remove about:bloat. https://reviewboard.mozilla.org/r/84690/#review83296 Do we not need to remove any tests that reference this? Or are those tests perma-disabled everywhere?
Attachment #8799550 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #10) > Do we not need to remove any tests that reference this? Or are those tests > perma-disabled everywhere? There are no tests as far as I can tell, aside from aboutURLs which just iterates over every about: page automatically, so no changes are needed. If there were, there would be more intermittent leaks.
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/autoland/rev/e90822faa5d1 part 1 - Remove about:bloat. r=froydnj https://hg.mozilla.org/integration/autoland/rev/ebae7411d587 part 2 - Only allow nsTraceRefcnt::DumpStatistics to be called once. r=froydnj
(In reply to Andrew McCreight [:mccr8] from comment #2) > As a side note, the test browser_aboutURLs.js looks like it is permafail on > L64 debug now: Is there a follow-up bug filed for fixing and re-enabling this test on Linux/Windows?
Comment on attachment 8799550 [details] Bug 1308652, part 1 - Remove about:bloat. Approval Request Comment [Feature/regressing bug #]: Sort of bug 1266022, though the underlying issue is very old. (from at least 1999...) [User impact if declined]: None. This is just causing intermittent failures on TreeHerder. [Describe test coverage new/current, TreeHerder]: I added some assertions to guard against this particular failure. [Risks and why]: Very low. It just deletes some debug-only code. [String/UUID change made/needed]: none
(In reply to Ryan VanderMeulen [:RyanVM] from comment #13) > Is there a follow-up bug filed for fixing and re-enabling this test on > Linux/Windows? I was just going to do that in bug 1271182.
Comment on attachment 8799550 [details] Bug 1308652, part 1 - Remove about:bloat. Fixes an intermittent, the patch looks trivial, Aurora51+, Beta50+
You need to log in before you can comment on or make changes to this bug.