Closed Bug 1279614 Opened 8 years ago Closed 8 years ago

2,000 instances of "XPCOM objects created/destroyed from static ctor/dtor" emitted from xpcom/base/nsTraceRefcnt.cpp during linux64 debug testing

Categories

(Core :: XPCOM, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox50 --- fixed

People

(Reporter: erahm, Assigned: jseward)

References

(Blocks 1 open bug)

Details

(Whiteboard: [measurement:client])

Attachments

(1 file, 1 obsolete file)

> 2005 WARNING: XPCOM objects created/destroyed from static ctor/dtor: file xpcom/base/nsTraceRefcnt.cpp, line 174 This warning [1] shows up in the following test suites: > 527 - [TC] Linux64 cppunit Cpp > 188 - [TC] Linux64 mochitest-2 2 > 89 - [TC] Linux64 firefox-ui-tests functional e10s en-US > 64 - [TC] Linux64 mochitest-browser-chrome-e10s-4 bc4 > 57 - [TC] Linux64 gtest GTest > 56 - [TC] Linux64 mochitest-clipboard-e10s cl > 56 - [TC] Linux64 mochitest-browser-chrome-e10s-6 bc6 > 50 - [TC] Linux64 mochitest-browser-chrome-e10s-1 bc1 > 42 - [TC] Linux64 mochitest-plain-e10s-5 5 > 42 - [TC] Linux64 mochitest-media-e10s mda > 42 - [TC] Linux64 mochitest-browser-chrome-e10s-2 bc2 > 40 - [TC] Linux64 mochitest-browser-chrome-e10s-3 bc3 > 40 - [TC] Linux64 mochitest-browser-chrome-e10s-5 bc5 > 38 - [TC] Linux64 mochitest-browser-chrome-2 bc2 > 34 - [TC] Linux64 mochitest-browser-chrome-e10s-7 bc7 > 34 - [TC] Linux64 mochitest-plain-e10s-9 9 > 32 - [TC] Linux64 mochitest-plain-e10s-6 6 > 32 - [TC] Linux64 mochitest-plain-e10s-3 3 > 32 - [TC] Linux64 mochitest-media mda > 30 - [TC] Linux64 mochitest-chrome-1 c1 > 28 - [TC] Linux64 mochitest-plain-e10s-4 4 > 28 - [TC] Linux64 mochitest-plain-e10s-2 2 > 28 - [TC] Linux64 mochitest-5 5 > 26 - [TC] Linux64 mochitest-plain-e10s-10 10 > 26 - [TC] Linux64 mochitest-plain-e10s-7 7 > 26 - [TC] Linux64 mochitest-plain-e10s-8 8 > 26 - [TC] Linux64 mochitest-plain-e10s-1 1 > 20 - [TC] Linux64 mochitest-browser-chrome-4 bc4 > 20 - [TC] Linux64 mochitest-4 4 > 18 - [TC] Linux64 mochitest-chrome-3 c3 > 12 - [TC] Linux64 mochitest-gpu-e10s gpu > 12 - [TC] Linux64 mochitest-9 9 > 12 - [TC] Linux64 mochitest-3 3 > 12 - [TC] Linux64 mochitest-browser-chrome-6 bc6 > 10 - [TC] Linux64 mochitest-browser-chrome-1 bc1 > 10 - [TC] Linux64 reftest-noaccel-e10s-3 Ru3 > 10 - [TC] Linux64 reftest-e10s-3 R-e10s3 > 8 - [TC] Linux64 mochitest-clipboard cl > 8 - [TC] Linux64 mochitest-browser-chrome-3 bc3 > 8 - [TC] Linux64 reftest-3 Ru3 > 8 - [TC] Linux64 mochitest-gl-e10s gl > 8 - [TC] Linux64 crashtest e10s C > 8 - [TC] Linux64 reftest-3 R3 > 8 - [TC] Linux64 web-platform-tests-e10s-5 5 > 6 - [TC] Linux64 crashtest C > 6 - [TC] Linux64 mochitest-a11y a11y > 6 - [TC] Linux64 mochitest-browser-chrome-5 bc5 > 6 - [TC] Linux64 mochitest-10 10 > 4 - [TC] Linux64 reftest-e10s-6 R-e10s6 > 4 - [TC] Linux64 Dbg B > 4 - [TC] Linux64 mochitest-browser-chrome-7 bc7 > 4 - [TC] Linux64 web-platform-tests-e10s-7 7 > 4 - [TC] Linux64 reftest-noaccel-e10s-6 Ru6 > 4 - [TC] Linux64 web-platform-tests-reftests-e10s Wr-e10s > 2 - [TC] Linux64 reftest-6 R6 > 2 - [TC] Linux64 web-platform-tests-e10s-12 12 > 2 - [TC] Linux64 web-platform-tests-e10s-9 9 > 2 - [TC] Linux64 web-platform-tests-e10s-6 6 > 2 - [TC] Linux64 reftest-6 Ru6 > 2 - [TC] Linux64 reftest-e10s-8 R-e10s8 > 2 - [TC] Linux64 reftest-e10s-2 R-e10s2 > 2 - [TC] Linux64 reftest-noaccel-e10s-1 Ru1 > 2 - [TC] Linux64 web-platform-tests-e10s-4 4 > 2 - [TC] Linux64 web-platform-tests-e10s-1 1 > 2 - [TC] Linux64 reftest-e10s-4 R-e10s4 > 2 - [TC] Linux64 reftest-noaccel-e10s-5 Ru5 > 2 - [TC] Linux64 web-platform-tests-e10s-3 3 > 2 - [TC] Linux64 reftest-e10s-1 R-e10s1 > 2 - [TC] Linux64 web-platform-tests-e10s-11 11 > 2 - [TC] Linux64 web-platform-tests-e10s-2 2 > 2 - [TC] Linux64 jsreftest e10s J > 2 - [TC] Linux64 reftest-noaccel-e10s-8 Ru8 > 2 - [TC] Linux64 web-platform-tests-e10s-10 10 > 2 - [TC] Linux64 reftest-e10s-5 R-e10s5 > 2 - [TC] Linux64 reftest-e10s-7 R-e10s7 > 2 - [TC] Linux64 web-platform-tests-e10s-8 8 > 2 - [TC] Linux64 mochitest-1 1 > 2 - [TC] Linux64 reftest-noaccel-e10s-4 Ru4 > 2 - [TC] Linux64 reftest-noaccel-e10s-7 Ru7 > 2 - [TC] Linux64 reftest-noaccel-e10s-2 Ru2 It shows up in 410 tests. A few of the most prevalent: > 488 - [e10s] Shutdown > 228 - TestHashtables > 44 - TestThreadUtils > 32 - TestWebGLElementArrayCache > 32 - TestStringAPI > 26 - [e10s] browser/components/sessionstore/test/browser_crashedTabs.js > 16 - browser/base/content/test/chat/browser_chatwindow.js > 14 - TestTextFormatter > 14 - browser/base/content/test/chat/browser_focus.js > 14 - [e10s] browser/components/customizableui/test/browser_967000_button_sync.js > 12 - [e10s] browser/base/content/test/general/browser_aboutTabCrashed.js > 8 - TestAudioMixer > 8 - dom/media/tests/mochitest/test_peerConnection_scaleResolution.html > 8 - dom/base/test/chrome/test_swapFrameLoaders.xul > 6 - GeckoMediaPlugins.GMPCrossOrigin > 6 - GeckoMediaPlugins.GMPStorageAsyncShutdownTimeout > 6 - toolkit/components/aboutmemory/tests/test_aboutmemory5.xul > 6 - toolkit/components/aboutmemory/tests/test_memoryReporters2.xul > 6 - toolkit/components/aboutmemory/tests/test_dumpGCAndCCLogsToFile.xul > 6 - browser/base/content/test/social/browser_social_chatwindow.js [1] https://hg.mozilla.org/mozilla-central/annotate/b6f7d0eb61b1/xpcom/base/nsTraceRefcnt.cpp#l174
The majority appear to be mostly in cpp unit tests. There are ton in e10s shutdown as well, so that may be a concern. It appears bug 1271593 is what bumped this onto my radar, but I think that was more of a straw that broke the camels back situation.
Slightly better bisection points to bug 1258183 being the culprit adding about 1400.
Blocks: 1258183
Flags: needinfo?(jseward)
Those patches are rather large, it's not clear to me what's going on. The only thing that pops out to me is the new static global: > // This is never de-initialised, so it has to be an OffTheBooks > // variant in order not to show up as a leak in BloatView. > static OffTheBooksMutex > gTelemetryHistogramMutex("mutex for TelemetryHistogram " > "external interface"); But it's OffTheBooks, so we shouldn't be hitting traceRefcnt at all. For reference this is what things looked like *before* bug 1258183 landed: > 485 - [TC] Linux64 cppunit Cpp > 18 - [TC] Linux64 firefox-ui-tests functional e10s en-US > 4 - [TC] Linux64 Dbg B > 4 - Static Checking Build S > 3 - [TC] Linux64 gtest GTest
(In reply to Eric Rahm [:erahm] from comment #3) > Those patches are rather large, it's not clear to me what's going on. The > only thing that pops out to me is the new static global: > > > // This is never de-initialised, so it has to be an OffTheBooks > > // variant in order not to show up as a leak in BloatView. > > static OffTheBooksMutex > > gTelemetryHistogramMutex("mutex for TelemetryHistogram " > > "external interface"); So, I'm confused. Let me explain. The fragment you show, using an OffTheBooksMutex for gTelemetryHistogramMutex is slightly out of date, and got changed, for in the landed version, to a StaticMutex: static StaticMutex gTelemetryHistogramMutex; exactly so as to remove these messages. And I thought I had, but now it seems like I was only partly successful. Now I am suspecting that I fixed it for the chrome process but not for the content process.
Flags: needinfo?(jseward)
Here is the stack leading to the message. It does indeed involve gTelemetryHistogramMutex, as you suspected: XXXXXXXXX at 0x59868EB: VALGRIND_PRINTF_BACKTRACE(char const*, ...) (/usr/include/valgrind/valgrind.h:6818) by 0x5986B84: AssertActivityIsLegal() (MC-RACE/xpcom/base/nsTraceRefcnt.cpp:171) by 0x598D1AF: NS_LogCtor (MC-RACE/xpcom/base/nsTraceRefcnt.cpp:1209) by 0x5925A8C: nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::nsTArray_base() (MC-RACE/ff-Og-linux64-dbg/dist/include/nsTArray-inl.h:15) by 0x5A32D01: nsTArray_Impl (MC-RACE/xpcom/glue/nsTArray.h:834) by 0x5A32D01: nsTArray (MC-RACE/xpcom/glue/nsTArray.h:2095) by 0x5A32D01: OrderingEntry (MC-RACE/ff-Og-linux64-dbg/dist/include/mozilla/DeadlockDetector.h:89) by 0x5A32D01: Add (MC-RACE/ff-Og-linux64-dbg/dist/include/mozilla/DeadlockDetector.h:177) by 0x5A32D01: mozilla::BlockingResourceBase::BlockingResourceBase(char const*, mozilla::BlockingResourceBase::BlockingResourceType) (MC-RACE/xpcom/glue/BlockingResourceBase.cpp:223) by 0x934FF1C: OffTheBooksMutex (MC-RACE/ff-Og-linux64-dbg/dist/include/mozilla/Mutex.h:47) by 0x934FF1C: Mutex (MC-RACE/ff-Og-linux64-dbg/dist/include/mozilla/StaticMutex.h:66) by 0x934FF1C: Lock (MC-RACE/ff-Og-linux64-dbg/dist/include/mozilla/StaticMutex.h:44) by 0x934FF1C: BaseAutoLock (MC-RACE/ff-Og-linux64-dbg/dist/include/mozilla/Mutex.h:164) by 0x934FF1C: TelemetryHistogram::CreateStatisticsRecorder() (MC-RACE/toolkit/components/telemetry/TelemetryHistogram.cpp:1705) by 0x9340661: mozilla::Telemetry::CreateStatisticsRecorder() (MC-RACE/toolkit/components/telemetry/Telemetry.cpp:2911) by 0x93E1179: XRE_InitChildProcess (MC-RACE/toolkit/xre/nsEmbedFunctions.cpp:321) by 0x4096D3: content_process_main(int, char**) (MC-RACE/ipc/app/../contentproc/plugin-container.cpp:224) by 0x40977F: main (MC-RACE/ipc/app/MozillaRuntimeMain.cpp:11) [12077] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/sewardj/MOZ/MC-RACE/xpcom/base/nsTraceRefcnt.cpp, line 175 (a second repeat of this follows)
I have two observations about this: (1) The call chain seems to start within main, so how can this really involve a static constructor? I thought they all ran before we arrived at main -- or before dlopen() returns, but there is no dlopen here. (2) The call at TelemetryHistogram.cpp:1705, which is this StaticMutexAutoLock locker(gTelemetryHistogramMutex); goes to BaseAutoLock() which just locks the underlying OffTheBooksMutex in the normal way. That seems to cause an entry to be lodged in DeadlockDetector's tables in the normal way, which eventually leads to AssertActivityIsLegal(). I see why that happens -- because StaticMutex.h says this: typedef BaseAutoUnlock<Mutex> MutexAutoUnlock; --------- Hmm. An OffTheBooksMutex participates in deadlock checking, but not in leak checking. A StaticMutex is basically an OffTheBooksMutex, so it also participates in deadlock checking. That *sounds* like it should be OK, but for some reason it isn't. I don't understand this confluence of Mutex kinds and startup limitations enough to make much more sense of this. Do you know who can help us here?
(In reply to Eric Rahm [:erahm] from comment #3) > But it's OffTheBooks, so we shouldn't be hitting traceRefcnt at all. Just to be clear .. I think it's not traceRefcnt that we're hitting. It's the lock cycle detection mechanism.
I'm guessing the problem here is that TelemetryHistograms is getting initialized before nsTraceRefCnt. If you move the call to |CreateStatisticsRecorder| [1] to after |NS_LogInit| [2] it should clear things up. You'd need to do that everywhere |CreateStatisticsRecorder| is called. Does that makes sense Julian? [1] https://dxr.mozilla.org/mozilla-central/rev/3c5025f98e561a20e24d97c91a9e4e0ec28015ea/toolkit/xre/nsEmbedFunctions.cpp#321-322 [2] https://dxr.mozilla.org/mozilla-central/rev/3c5025f98e561a20e24d97c91a9e4e0ec28015ea/toolkit/xre/nsEmbedFunctions.cpp#366-367
Flags: needinfo?(jseward)
Flags: needinfo?(jseward)
(In reply to Eric Rahm [:erahm] from comment #8) > I'm guessing the problem here is that TelemetryHistograms is getting > initialized before nsTraceRefCnt. If you move the call to > |CreateStatisticsRecorder| [1] to after |NS_LogInit| [2] it should clear > things up. You'd need to do that everywhere |CreateStatisticsRecorder| is > called. > > Does that makes sense Julian? > > [1] > https://dxr.mozilla.org/mozilla-central/rev/ > 3c5025f98e561a20e24d97c91a9e4e0ec28015ea/toolkit/xre/nsEmbedFunctions. > cpp#321-322 > [2] > https://dxr.mozilla.org/mozilla-central/rev/ > 3c5025f98e561a20e24d97c91a9e4e0ec28015ea/toolkit/xre/nsEmbedFunctions. > cpp#366-367 We should confirm if that fixes it. But looking at the things that happen between the two locations, that might be a little late - the GMP and sandboxing code might record Telemetry here.
Priority: -- → P2
Whiteboard: [measurement:client]
Attached patch bug1279614-maybe-fix-1.cset (obsolete) — Splinter Review
A possible fix, as described by erahm in comment 8. It changes the ordering in the two places I could find, that |CreateStatisticsRecorder| is called. It *seems* to fix the problem and not introduce any new races or deadlocks, but I am very nervous about messing with the initialisation order, so sanity-checking/feedback by those who understand initialisation-order issues would be a Good Thing.
Revised patch following irc chat with gfritzsche.
Attachment #8764210 - Attachment is obsolete: true
Attachment #8764280 - Flags: review?(gfritzsche)
Assignee: nobody → jseward
Comment on attachment 8764280 [details] [diff] [review] bug1279614-2.cset Review of attachment 8764280 [details] [diff] [review]: ----------------------------------------------------------------- Calling `Telemetry::CreateStatisticsRecorder()` later is only a problem if we would not record histograms as early anymore as before. Histogram recording is enabled from when `TelemetryHistogram::InitializeGlobalState()` is called. I checked with and without this patch applied and the call order of the two didn't change (`TelemetryHistogram::CreateStastisticsRecorder()` is still called before `TelemetryHistogram::InitializeGlobalState()`). So, from the perspective of Telemetry initialization order this is fine. @Jim: I'd appreciate a second look here for nsAppRunner.cpp and nsEmbedFunctions.cpp, as i don't feel so knowledgeable there. ::: toolkit/xre/nsAppRunner.cpp @@ +4442,5 @@ > > + // NB: this must happen after the creation of |ScopedLogging log| since > + // ScopedLogging::ScopedLogging calls NS_LogInit, and > + // XRE_CreateStatsObject calls Telemetry::CreateStatisticsRecorder, > + // and NS_LogInit must be called before Telemetry::CreateStatisticsRecorder. This should mention why it must be called before.
Attachment #8764280 - Flags: review?(jmathies)
Attachment #8764280 - Flags: review?(gfritzsche)
Attachment #8764280 - Flags: review+
Comment on attachment 8764280 [details] [diff] [review] bug1279614-2.cset Review of attachment 8764280 [details] [diff] [review]: ----------------------------------------------------------------- I don't see an issue with this.
Attachment #8764280 - Flags: review?(jmathies) → review+
Pushed by jseward@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/da77b49e469a 2,000 instances of "XPCOM objects created/destroyed from static ctor/dtor" emitted from xpcom/base/nsTraceRefcnt.cpp during linux64 debug testing. r=gfritzsche,jmathies.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: