833143 - Don't GC in nsXREDirProvider::DoShutdown()

Assignee

Description

•

13 years ago

For some reason we force a GC (but not CC) in nsXREDirProvider::DoShutdown(). Just running one GC may not release too much. Based on the blame the code has been there way before cycle collector was implemented http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/toolkit/xre/nsXREDirProvider.cpp&rev=1.69&mark=827,849#827

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 1

•

13 years ago

More recent version of the code http://hg.mozilla.org/mozilla-central/annotate/425835f6a9e2/toolkit/xre/nsXREDirProvider.cpp#l838

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Updated

•

13 years ago

Summary: Don't for GC in nsXREDirProvider::DoShutdown() → Don't GC in nsXREDirProvider::DoShutdown()

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 2

•

13 years ago

Benjamin, by any chance, do you recall the reason for the GC? Is that Phase 2c really needed and why?

Benjamin Smedberg

Comment 3

•

13 years ago

No, this was a port of the old seamonkey startup code in the Firefox 0.9.3 cycle, and so I kept this out of caution, not because I think we need it.

Benoit Girard (:BenWa)

Updated

•

13 years ago

Blocks: shutdown-faster

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Updated

•

13 years ago

Assignee: nobody → bugs

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 4

•

13 years ago

Attached patch don't GC (obsolete) — Details — Splinter Review

Crossing fingers try results will look ok https://tbpl.mozilla.org/?tree=Try&rev=e278f5a296fb (though, I really don't see any reasons for that GC.)

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 5

•

13 years ago

Originally the JS_GC is from Bug 97622. The reasoning to add GC calls was weak https://bugzilla.mozilla.org/show_bug.cgi?id=97622#c66 The crashes should have been fixed in some other way.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 6

•

13 years ago

Kaie, perhaps you have some comments to this?

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 7

•

13 years ago

I guess the problems related to NSS shutdown were fixed in Bug 189205, and the other problem was about autocomplete+mork which isn't a combination used anymore.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Updated

•

13 years ago

Attachment #705095 - Flags: review?(wmccloskey)

Attachment #705095 - Flags: review?(benjamin)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Updated

•

13 years ago

Attachment #705095 - Flags: review?(wmccloskey) → review+

Benjamin Smedberg

Updated

•

13 years ago

Attachment #705095 - Flags: review?(benjamin) → review+

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 8

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/689690a17de3

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 9

•

13 years ago

And backed out. WebRTC code has some bug... https://hg.mozilla.org/mozilla-central/rev/680e46fecff0

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Updated

•

13 years ago

Depends on: 834383

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Updated

•

13 years ago

Status: RESOLVED → REOPENED

No longer depends on: 834383

Resolution: FIXED → ---

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Updated

•

13 years ago

Depends on: 834383

Benoit Girard (:BenWa)

Comment 10

•

13 years ago

This is, unsurprising, the major contributor for a few of the slow shutdown profiles collected recently.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 11

•

13 years ago

Yup. Can't land the patch before WebRTC (bug 834383) is fixed.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 12

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/31268d71c33c Crossing fingers. Maybe better luck this time. Tryserver looked ok.

Status: REOPENED → RESOLVED

Closed: 13 years ago → 13 years ago

Resolution: --- → FIXED

Alice0775 White

Updated

•

13 years ago

Depends on: 837843

Ryan VanderMeulen [:RyanVM]

Comment 13

•

13 years ago

Backed out for bug 837843. https://hg.mozilla.org/mozilla-central/rev/2360c3c46aca

Ryan VanderMeulen [:RyanVM]

Updated

•

13 years ago

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

brunoais

Comment 15

•

11 years ago

Is this available again to try to fix?

brunoais

Updated

•

11 years ago

Flags: needinfo?(ryanvm)

Ryan VanderMeulen [:RyanVM]

Comment 16

•

11 years ago

Better to ask smaug that.

Flags: needinfo?(ryanvm) → needinfo?(bugs)

Andrew McCreight [:mccr8]

Comment 17

•

11 years ago

FWIW, about:compartments doesn't exist any more. Of course, nobody figured out what the problem was, so there could be other lurking problems.

brunoais

Comment 18

•

11 years ago

How can we know if we don't try to merge the code again and see if someone finds a failure?

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Comment 19

•

11 years ago

Well, somebody should at least try opening various about:pages and see if it hangs, with the patch applied.

Flags: needinfo?(continuation)

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 20

•

11 years ago

brunoais, maybe you could test this? The patch applies to m-c with --fuzz=6

Flags: needinfo?(bugs)

brunoais

Comment 21

•

11 years ago

Sorry. I don't know the terminology well. What does "m-c" mean? Is "--fuzz=6" the command line argument to add to "activate" this merge?

Flags: needinfo?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 22

•

11 years ago

I mean could you build latest version of mozilla-central (m-c) with the patch. To patch the source code you'd need to run for example patch -p1 --fuzz=6 < /path/to/the/patch

Flags: needinfo?(bugs)

brunoais

Comment 23

•

11 years ago

Where can I find the program called "patch"?

brunoais

Updated

•

11 years ago

Flags: needinfo?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 24

•

11 years ago

Ok, perhaps it is easier if I just upload the patch to try and you can try the binaries :) Wait for an hour or two, and check https://tbpl.mozilla.org/?tree=Try&rev=947d95a6a657 whether the builds are ready (no need to wait all the tests to be ready), and after that they should show up in http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/?C=N;O=A Look for 947d95a6a657 there.

Flags: needinfo?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 25

•

11 years ago

Builds are here http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/opettay@mozilla.com-947d95a6a657/

brunoais

Comment 26

•

11 years ago

I have been experimenting with a copy of my main firefox profile and it seems to be working as expected. I'll try now to keep it on for a full day and use it a bit in the mean time.

brunoais

Comment 27

•

11 years ago

I've been trying through all this days as a secondary browser. I've been using almost as a mirror or my main version (current stable one) (side-by-side and repeating operations except in things that should only be done once). It ran as expected and without problems. So I don't know if I was really useful or not because I was unable to break it... I hope I was.

Flags: needinfo?(bugs)

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 28

•

11 years ago

Sounds like we could try landing the patch again. (In theory the patch itself is just fine, and if it causes problems, then issues are elsewhere. ) Keeping the needinfo so that I don't forget to land the patch.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 29

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/006353e45fa7

Flags: needinfo?(bugs)

Ed Morley [:emorley]

Comment 30

•

11 years ago

Reverted for robocop crashes: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=006353e45fa7&jobname=robocop remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/d014b12e6afc

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 31

•

11 years ago

Really odd looking crash. Dalvik doesn't like the change?

Andrew McCreight [:mccr8]

Comment 32

•

11 years ago

The crashes are all inside Java_org_mozilla_gecko_GeckoAppShell_notifyBatteryChange. I wonder if there's some kind of battery notification thing that is normally getting shut down by this last GC, but with your patch it isn't, so very deep in shutdown we get a battery notification and but the event loop is gone or whatever and we crash.

Andrew McCreight [:mccr8]

Comment 33

•

11 years ago

kats, do you know anything about this notifyBatteryCharge thing and how it is supposed to be shut down, or who might know about it? Thanks.

Flags: needinfo?(bugmail.mozilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 34

•

11 years ago

I don't know too much about it but from looking at the code that notification should not get fired if the notifications are disabled by calling [1]. Working backwards that should happen once all the observers are unregistered from the sBatteryObservers manager in Hal.cpp, by calling [2]. There's a handful of callers to this [3] so perhaps it would be easiest to add logging to those callers and see which one fires when we have a GC but doesn't fire without the GC. Or if you're familiar with those call sites you could do a code inspection and maybe figure it out. [1] http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/GeckoBatteryManager.java?rev=2b5e76962106#190 [2] http://mxr.mozilla.org/mozilla-central/source/hal/Hal.cpp?rev=326396f9cc4e#350 [3] http://mxr.mozilla.org/mozilla-central/ident?i=UnregisterBatteryObserver

Flags: needinfo?(bugmail.mozilla)

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Blocks: 1359840

Andrew McCreight [:mccr8]

Comment 35

•

9 years ago

It sounds like this is causing a common source of shutdown hangs, so it might be worth doing a full try push and see how things are.

Andrew McCreight [:mccr8]

Comment 36

•

9 years ago

I guess I'll do that this week. If Android is still a problem, we can just leave this GC on Android. There's probably less to clean up there, so it shouldn't be as much of an issue.

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Comment 37

•

9 years ago

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=83b53f8d4bec1fe94e11ccb8b9574251f0c255cc

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Depends on: 1359935

Andrew McCreight [:mccr8]

Comment 38

•

9 years ago

Android looked good. I only saw one failure, bug 833143, which was just a minor test problem, which I have a patch for.

Andrew McCreight [:mccr8]

Comment 39

•

9 years ago

Attached patch Don't GC in nsXREDirProvider::DoShutdown — Details — Splinter Review

Attachment #8863040 - Flags: review+

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Attachment #705095 - Attachment is obsolete: true

Andrew McCreight [:mccr8]

Comment 40

•

9 years ago

try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=686633edd3a68ac6eeb8e10dc0e4e3532a0bbce1

Andrew McCreight [:mccr8]

Comment 41

•

9 years ago

Note to sheriffs: this may cause new intermittent leaks.

Keywords: checkin-needed

Pulsebot

Comment 42

•

9 years ago

Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/5e01d4124eae Don't GC in nsXREDirProvider::DoShutdown. r=bsmedberg, r=billm

Keywords: checkin-needed

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 43

•

9 years ago

Backed out bug 1355480 and bug 833143 on suspicion of triggering assertion CompositorThreadHolder::IsInCompositorThread() on Windows 8 x64 M-e10s: Bug 1355480 https://hg.mozilla.org/integration/mozilla-inbound/rev/1e002af2f7019f8b45a05722efb882043962ae75 Bug 833143 https://hg.mozilla.org/integration/mozilla-inbound/rev/eb15e1b1baee587c057ecc907759e223bf421315 Push with failure: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=5e01d4124eaefeb0702f31689a9223f14351368e&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=96034710&repo=mozilla-inbound 13:02:10 INFO - GECKO(1800) | Assertion failure: CompositorThreadHolder::IsInCompositorThread(), at c:/builds/moz2_slave/m-in-w64-d-0000000000000000000/build/src/gfx/layers/AnimationHelper.cpp:28 13:02:10 INFO - GECKO(1800) | #01: mozilla::layers::CompositorAnimationStorage::Release() [obj-firefox/dist/include/mozilla/layers/AnimationHelper.h:83] 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #02: mozilla::layers::CompositorBridgeParent::ActorDestroy(mozilla::ipc::IProtocol::ActorDestroyReason) [gfx/layers/ipc/CompositorBridgeParent.cpp:644] 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #03: mozilla::layers::PCompositorBridgeParent::OnChannelClose() [obj-firefox/ipc/ipdl/PCompositorBridgeParent.cpp:1808] 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #04: mozilla::ipc::MessageChannel::NotifyChannelClosed() [ipc/glue/MessageChannel.cpp:2651] 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #05: mozilla::ipc::MessageChannel::OnNotifyMaybeChannelError() [ipc/glue/MessageChannel.cpp:2512] 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #06: mozilla::detail::RunnableMethodImpl<mozilla::ipc::MessageChannel * const,void ( mozilla::ipc::MessageChannel::*)(void),0,1>::Run() [xpcom/threads/nsThreadUtils.h:911] 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #07: MessageLoop::RunTask(already_AddRefed<mozilla::Runnable>) [ipc/chromium/src/base/message_loop.cc:362] 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #08: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask &&) [ipc/chromium/src/base/message_loop.cc:372] 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #09: MessageLoop::DoWork() [ipc/chromium/src/base/message_loop.cc:444] 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #10: base::MessagePumpForUI::DoRunLoop() [ipc/chromium/src/base/message_pump_win.cc:213] 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #11: base::MessagePumpWin::RunWithDispatcher(base::MessagePump::Delegate *,base::MessagePumpWin::Dispatcher *) [ipc/chromium/src/base/message_pump_win.cc:58] 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #12: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:232] 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #13: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:212] 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #14: base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:182] 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #15: `anonymous namespace'::ThreadFunc [ipc/chromium/src/base/platform_thread_win.cc:29] 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #16: KERNEL32.DLL + 0x167e 13:02:10 INFO - 13:02:10 INFO - GECKO(1800) | #17: ntdll.dll + 0x1c3f1

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Depends on: 1361825

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Flags: needinfo?(continuation)

Andrew McCreight [:mccr8]

Comment 44

•

9 years ago

There's a similar looking assertion failure on OSX in the failure link in comment 43. I'm unable to reproduce locally. I'll see if it shows up in a retrigger. There's no stack, which might make it harder to figure out. Bug 1361825 (and the possible OS X issue) feels like they are just an existing race between IPDL ActorDestroy (for Compositor or GMP) and thread shutdown.

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Blocks: 1359863

Andrew McCreight [:mccr8]

Comment 45

•

8 years ago

This looks green on OSX and Windows: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7474ea38e9ed8a5a946cc17a42a06fdc75d42472 Maybe some existing code fixed the compositor shutdown race or whatever it was.

Component: General → XPCOM

Keywords: checkin-needed

Pulsebot

Comment 46

•

8 years ago

Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/e2efdd8dcd82 Don't GC in nsXREDirProvider::DoShutdown. r=bsmedberg, r=billm

Keywords: checkin-needed

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 47

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/e2efdd8dcd82

Status: REOPENED → RESOLVED

Closed: 13 years ago → 8 years ago

status-firefox56: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla56

Andrew McCreight [:mccr8]

Updated

•

8 years ago

Blocks: 1375592

Andrew McCreight [:mccr8]

Comment 48

•

8 years ago

Comment on attachment 8863040 [details] [diff] [review] Don't GC in nsXREDirProvider::DoShutdown Approval Request Comment [Feature/Bug causing the regression]: very old [User impact if declined]: This fixes a major source of shutdown hangs. I see almost 10,000 crashes across all versions that have shutdownhang in the signature and JS_GC in the proto signature, and this patch should fix at least some of them. I see 160k shutdownhang crashes that do NOT contain JS_GC, so maybe that's about 6.25% of all shutdown hangs. [Is this code covered by automated tests?]: yes [Has the fix been verified in Nightly?]: It has been in Nightly for a couple of weeks. No regressions have been reported, and no shutdown hangs with JS_GC in the proto signature have been reported since then. [Needs manual test from QE? If yes, steps to reproduce]: no [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: Somewhat. [Why is the change risky/not risky?]: Changing shutdown is always a little risky, and this patch did get backed out a few times over the years for weird issues in debug builds (that were detected in automation). [String changes made/needed]: none

Attachment #8863040 - Flags: approval-mozilla-beta?

Andrew McCreight [:mccr8]

Comment 49

•

8 years ago

> so maybe that's about 6.25% of all shutdown hangs. Sorry, my math was wrong. That should be "about 5.8% of all shutdown hangs".

Andrew McCreight [:mccr8]

Updated

•

8 years ago

status-firefox54: --- → wontfix

status-firefox55: --- → affected

status-firefox-esr52: --- → wontfix

OS: Linux → All

Hardware: x86 → All

Version: unspecified → Trunk

Julien Cristau [:jcristau]

Comment 50

•

8 years ago

Comment on attachment 8863040 [details] [diff] [review] Don't GC in nsXREDirProvider::DoShutdown I think given the age of this bug it doesn't need to skip trains.

Attachment #8863040 - Flags: approval-mozilla-beta? → approval-mozilla-beta-

Julien Cristau [:jcristau]

Updated

•

8 years ago

status-firefox55: affected → wontfix

Andrew McCreight [:mccr8]

Updated

•

8 years ago

Depends on: 1340425

don't GC 13 years ago Olli Pettay [:smaug][bugs@pettay.fi] 1.26 KB, patch	benjamin : review+ billm : review+	Details \| Diff \| Splinter Review
Don't GC in nsXREDirProvider::DoShutdown 9 years ago Andrew McCreight [:mccr8] 1.41 KB, patch	mccr8 : review+ jcristau : approval-mozilla-beta-	Details \| Diff \| Splinter Review