Closed Bug 833143 Opened 8 years ago Closed 4 years ago

Don't GC in nsXREDirProvider::DoShutdown()


(Core :: XPCOM, defect)

Not set



Tracking Status
firefox-esr52 --- wontfix
firefox54 --- wontfix
firefox55 --- wontfix
firefox56 --- fixed


(Reporter: smaug, Assigned: smaug)


(Blocks 3 open bugs)



(1 file, 1 obsolete file)

For some reason we force a GC (but not CC) in nsXREDirProvider::DoShutdown().
Just running one GC may not release too much.

Based on the blame the code has been there way before cycle collector was implemented,849#827
Summary: Don't for GC in nsXREDirProvider::DoShutdown() → Don't GC in nsXREDirProvider::DoShutdown()
Benjamin, by any chance, do you recall the reason for the GC?
Is that Phase 2c really needed and why?
No, this was a port of the old seamonkey startup code in the Firefox 0.9.3 cycle, and so I kept this out of caution, not because I think we need it.
Assignee: nobody → bugs
Attached patch don't GC (obsolete) — Splinter Review
Crossing fingers try results will look ok

(though, I really don't see any reasons for that GC.)
Originally the JS_GC is from Bug 97622.
The reasoning to add GC calls was weak
The crashes should have been fixed in some other way.
Kaie, perhaps you have some comments to this?
I guess the problems related to NSS shutdown were fixed in Bug 189205, and the other problem
was about autocomplete+mork which isn't a combination used anymore.
Attachment #705095 - Flags: review?(wmccloskey)
Attachment #705095 - Flags: review?(benjamin)
Attachment #705095 - Flags: review?(wmccloskey) → review+
Attachment #705095 - Flags: review?(benjamin) → review+
Closed: 8 years ago
Resolution: --- → FIXED
And backed out. WebRTC code has some bug...
Depends on: 834383
No longer depends on: 834383
Resolution: FIXED → ---
Depends on: 834383
This is, unsurprising, the major contributor for a few of the slow shutdown profiles collected recently.
Yup. Can't land the patch before WebRTC (bug 834383) is fixed.

Crossing fingers. Maybe better luck this time. Tryserver looked ok.
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Depends on: 837843
Resolution: FIXED → ---
Duplicate of this bug: 580068
Is this available again to try to fix?
Flags: needinfo?(ryanvm)
Better to ask smaug that.
Flags: needinfo?(ryanvm) → needinfo?(bugs)
FWIW, about:compartments doesn't exist any more. Of course, nobody figured out what the problem was, so there could be other lurking problems.
How can we know if we don't try to merge the code again and see if someone finds a failure?
Flags: needinfo?(continuation)
Well, somebody should at least try opening various about:pages and see if it hangs, with the patch applied.
Flags: needinfo?(continuation)
brunoais, maybe you could test this? The patch applies to m-c with --fuzz=6
Flags: needinfo?(bugs)
Sorry. I don't know the terminology well.
What does "m-c" mean?

Is "--fuzz=6" the command line argument to add to "activate" this merge?
Flags: needinfo?(bugs)
I mean could you build latest version of mozilla-central (m-c) with the patch.
To patch the source code you'd need to run for example
patch -p1 --fuzz=6 < /path/to/the/patch
Flags: needinfo?(bugs)
Where can I find the program called "patch"?
Flags: needinfo?(bugs)
Ok, perhaps it is easier if I just upload the patch to try and you can try the binaries :)

Wait for an hour or two, and check whether
the builds are ready (no need to wait all the tests to be ready), and after that they should show up 
Look for 947d95a6a657 there.
Flags: needinfo?(bugs)
I have been experimenting with a copy of my main firefox profile and it seems to be working as expected. I'll try now to keep it on for a full day and use it a bit in the mean time.
I've been trying through all this days as a secondary browser. I've been using almost as a mirror or my main version (current stable one) (side-by-side and repeating operations except in things that should only be done once).

It ran as expected and without problems. So I don't know if I was really useful or not because I was unable to break it... I hope I was.
Flags: needinfo?(bugs)
Sounds like we could try landing the patch again.
(In theory the patch itself is just fine, and if it causes problems, then issues are elsewhere. )

Keeping the needinfo so that I don't forget to land the patch.
Really odd looking crash. Dalvik doesn't like the change?
The crashes are all inside Java_org_mozilla_gecko_GeckoAppShell_notifyBatteryChange.  I wonder if there's some kind of battery notification thing that is normally getting shut down by this last GC, but with your patch it isn't, so very deep in shutdown we get a battery notification and but the event loop is gone or whatever and we crash.
kats, do you know anything about this notifyBatteryCharge thing and how it is supposed to be shut down, or who might know about it?  Thanks.
Flags: needinfo?(bugmail.mozilla)
I don't know too much about it but from looking at the code that notification should not get fired if the notifications are disabled by calling [1]. Working backwards that should happen once all the observers are unregistered from the sBatteryObservers manager in Hal.cpp, by calling [2]. There's a handful of callers to this [3] so perhaps it would be easiest to add logging to those callers and see which one fires when we have a GC but doesn't fire without the GC. Or if you're familiar with those call sites you could do a code inspection and maybe figure it out.

Flags: needinfo?(bugmail.mozilla)
It sounds like this is causing a common source of shutdown hangs, so it might be worth doing a full try push and see how things are.
I guess I'll do that this week. If Android is still a problem, we can just leave this GC on Android. There's probably less to clean up there, so it shouldn't be as much of an issue.
Flags: needinfo?(continuation)
Depends on: 1359935
Android looked good. I only saw one failure, bug 833143, which was just a minor test problem, which I have a patch for.
Attachment #705095 - Attachment is obsolete: true
Note to sheriffs: this may cause new intermittent leaks.
Keywords: checkin-needed
Pushed by
Don't GC in nsXREDirProvider::DoShutdown. r=bsmedberg, r=billm
Keywords: checkin-needed
Backed out bug 1355480 and bug 833143 on suspicion of triggering assertion CompositorThreadHolder::IsInCompositorThread() on Windows 8 x64 M-e10s:

Bug 1355480

Bug 833143

Push with failure:
Failure log:

13:02:10     INFO - GECKO(1800) | Assertion failure: CompositorThreadHolder::IsInCompositorThread(), at c:/builds/moz2_slave/m-in-w64-d-0000000000000000000/build/src/gfx/layers/AnimationHelper.cpp:28
13:02:10     INFO - GECKO(1800) | #01: mozilla::layers::CompositorAnimationStorage::Release() [obj-firefox/dist/include/mozilla/layers/AnimationHelper.h:83]
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #02: mozilla::layers::CompositorBridgeParent::ActorDestroy(mozilla::ipc::IProtocol::ActorDestroyReason) [gfx/layers/ipc/CompositorBridgeParent.cpp:644]
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #03: mozilla::layers::PCompositorBridgeParent::OnChannelClose() [obj-firefox/ipc/ipdl/PCompositorBridgeParent.cpp:1808]
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #04: mozilla::ipc::MessageChannel::NotifyChannelClosed() [ipc/glue/MessageChannel.cpp:2651]
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #05: mozilla::ipc::MessageChannel::OnNotifyMaybeChannelError() [ipc/glue/MessageChannel.cpp:2512]
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #06: mozilla::detail::RunnableMethodImpl<mozilla::ipc::MessageChannel * const,void ( mozilla::ipc::MessageChannel::*)(void),0,1>::Run() [xpcom/threads/nsThreadUtils.h:911]
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #07: MessageLoop::RunTask(already_AddRefed<mozilla::Runnable>) [ipc/chromium/src/base/]
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #08: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask &&) [ipc/chromium/src/base/]
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #09: MessageLoop::DoWork() [ipc/chromium/src/base/]
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #10: base::MessagePumpForUI::DoRunLoop() [ipc/chromium/src/base/]
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #11: base::MessagePumpWin::RunWithDispatcher(base::MessagePump::Delegate *,base::MessagePumpWin::Dispatcher *) [ipc/chromium/src/base/]
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #12: MessageLoop::RunHandler() [ipc/chromium/src/base/]
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #13: MessageLoop::Run() [ipc/chromium/src/base/]
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #14: base::Thread::ThreadMain() [ipc/chromium/src/base/]
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #15: `anonymous namespace'::ThreadFunc [ipc/chromium/src/base/]
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #16: KERNEL32.DLL + 0x167e
13:02:10     INFO - 
13:02:10     INFO - GECKO(1800) | #17: ntdll.dll + 0x1c3f1
Flags: needinfo?(continuation)
Depends on: 1361825
Flags: needinfo?(continuation)
There's a similar looking assertion failure on OSX in the failure link in comment 43. I'm unable to reproduce locally. I'll see if it shows up in a retrigger. There's no stack, which might make it harder to figure out. Bug 1361825 (and the possible OS X issue) feels like they are just an existing race between IPDL ActorDestroy (for Compositor or GMP) and thread shutdown.
This looks green on OSX and Windows:

Maybe some existing code fixed the compositor shutdown race or whatever it was.
Component: General → XPCOM
Keywords: checkin-needed
Pushed by
Don't GC in nsXREDirProvider::DoShutdown. r=bsmedberg, r=billm
Keywords: checkin-needed
Closed: 8 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment on attachment 8863040 [details] [diff] [review]
Don't GC in nsXREDirProvider::DoShutdown

Approval Request Comment
[Feature/Bug causing the regression]: very old
[User impact if declined]: This fixes a major source of shutdown hangs. I see almost 10,000 crashes across all versions that have shutdownhang in the signature and JS_GC in the proto signature, and this patch should fix at least some of them. I see 160k shutdownhang crashes that do NOT contain JS_GC, so maybe that's about 6.25% of all shutdown hangs.
[Is this code covered by automated tests?]: yes
[Has the fix been verified in Nightly?]: It has been in Nightly for a couple of weeks. No regressions have been reported, and no shutdown hangs with JS_GC in the proto signature have been reported since then.
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: Somewhat.
[Why is the change risky/not risky?]:  Changing shutdown is always a little risky, and this patch did get backed out a few times over the years for weird issues in debug builds (that were detected in automation).
[String changes made/needed]: none
Attachment #8863040 - Flags: approval-mozilla-beta?
> so maybe that's about 6.25% of all shutdown hangs.
Sorry, my math was wrong. That should be "about 5.8% of all shutdown hangs".
OS: Linux → All
Hardware: x86 → All
Version: unspecified → Trunk
Comment on attachment 8863040 [details] [diff] [review]
Don't GC in nsXREDirProvider::DoShutdown

I think given the age of this bug it doesn't need to skip trains.
Attachment #8863040 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Depends on: 1340425
Duplicate of this bug: 1364798
You need to log in before you can comment on or make changes to this bug.