Closed
Bug 1279614
Opened 8 years ago
Closed 8 years ago
2,000 instances of "XPCOM objects created/destroyed from static ctor/dtor" emitted from xpcom/base/nsTraceRefcnt.cpp during linux64 debug testing
Categories
(Core :: XPCOM, defect, P2)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla50
Tracking | Status | |
---|---|---|
firefox50 | --- | fixed |
People
(Reporter: erahm, Assigned: jseward)
References
(Blocks 1 open bug)
Details
(Whiteboard: [measurement:client])
Attachments
(1 file, 1 obsolete file)
4.75 KB,
patch
|
gfritzsche
:
review+
jimm
:
review+
|
Details | Diff | Splinter Review |
> 2005 WARNING: XPCOM objects created/destroyed from static ctor/dtor: file xpcom/base/nsTraceRefcnt.cpp, line 174
This warning [1] shows up in the following test suites:
> 527 - [TC] Linux64 cppunit Cpp
> 188 - [TC] Linux64 mochitest-2 2
> 89 - [TC] Linux64 firefox-ui-tests functional e10s en-US
> 64 - [TC] Linux64 mochitest-browser-chrome-e10s-4 bc4
> 57 - [TC] Linux64 gtest GTest
> 56 - [TC] Linux64 mochitest-clipboard-e10s cl
> 56 - [TC] Linux64 mochitest-browser-chrome-e10s-6 bc6
> 50 - [TC] Linux64 mochitest-browser-chrome-e10s-1 bc1
> 42 - [TC] Linux64 mochitest-plain-e10s-5 5
> 42 - [TC] Linux64 mochitest-media-e10s mda
> 42 - [TC] Linux64 mochitest-browser-chrome-e10s-2 bc2
> 40 - [TC] Linux64 mochitest-browser-chrome-e10s-3 bc3
> 40 - [TC] Linux64 mochitest-browser-chrome-e10s-5 bc5
> 38 - [TC] Linux64 mochitest-browser-chrome-2 bc2
> 34 - [TC] Linux64 mochitest-browser-chrome-e10s-7 bc7
> 34 - [TC] Linux64 mochitest-plain-e10s-9 9
> 32 - [TC] Linux64 mochitest-plain-e10s-6 6
> 32 - [TC] Linux64 mochitest-plain-e10s-3 3
> 32 - [TC] Linux64 mochitest-media mda
> 30 - [TC] Linux64 mochitest-chrome-1 c1
> 28 - [TC] Linux64 mochitest-plain-e10s-4 4
> 28 - [TC] Linux64 mochitest-plain-e10s-2 2
> 28 - [TC] Linux64 mochitest-5 5
> 26 - [TC] Linux64 mochitest-plain-e10s-10 10
> 26 - [TC] Linux64 mochitest-plain-e10s-7 7
> 26 - [TC] Linux64 mochitest-plain-e10s-8 8
> 26 - [TC] Linux64 mochitest-plain-e10s-1 1
> 20 - [TC] Linux64 mochitest-browser-chrome-4 bc4
> 20 - [TC] Linux64 mochitest-4 4
> 18 - [TC] Linux64 mochitest-chrome-3 c3
> 12 - [TC] Linux64 mochitest-gpu-e10s gpu
> 12 - [TC] Linux64 mochitest-9 9
> 12 - [TC] Linux64 mochitest-3 3
> 12 - [TC] Linux64 mochitest-browser-chrome-6 bc6
> 10 - [TC] Linux64 mochitest-browser-chrome-1 bc1
> 10 - [TC] Linux64 reftest-noaccel-e10s-3 Ru3
> 10 - [TC] Linux64 reftest-e10s-3 R-e10s3
> 8 - [TC] Linux64 mochitest-clipboard cl
> 8 - [TC] Linux64 mochitest-browser-chrome-3 bc3
> 8 - [TC] Linux64 reftest-3 Ru3
> 8 - [TC] Linux64 mochitest-gl-e10s gl
> 8 - [TC] Linux64 crashtest e10s C
> 8 - [TC] Linux64 reftest-3 R3
> 8 - [TC] Linux64 web-platform-tests-e10s-5 5
> 6 - [TC] Linux64 crashtest C
> 6 - [TC] Linux64 mochitest-a11y a11y
> 6 - [TC] Linux64 mochitest-browser-chrome-5 bc5
> 6 - [TC] Linux64 mochitest-10 10
> 4 - [TC] Linux64 reftest-e10s-6 R-e10s6
> 4 - [TC] Linux64 Dbg B
> 4 - [TC] Linux64 mochitest-browser-chrome-7 bc7
> 4 - [TC] Linux64 web-platform-tests-e10s-7 7
> 4 - [TC] Linux64 reftest-noaccel-e10s-6 Ru6
> 4 - [TC] Linux64 web-platform-tests-reftests-e10s Wr-e10s
> 2 - [TC] Linux64 reftest-6 R6
> 2 - [TC] Linux64 web-platform-tests-e10s-12 12
> 2 - [TC] Linux64 web-platform-tests-e10s-9 9
> 2 - [TC] Linux64 web-platform-tests-e10s-6 6
> 2 - [TC] Linux64 reftest-6 Ru6
> 2 - [TC] Linux64 reftest-e10s-8 R-e10s8
> 2 - [TC] Linux64 reftest-e10s-2 R-e10s2
> 2 - [TC] Linux64 reftest-noaccel-e10s-1 Ru1
> 2 - [TC] Linux64 web-platform-tests-e10s-4 4
> 2 - [TC] Linux64 web-platform-tests-e10s-1 1
> 2 - [TC] Linux64 reftest-e10s-4 R-e10s4
> 2 - [TC] Linux64 reftest-noaccel-e10s-5 Ru5
> 2 - [TC] Linux64 web-platform-tests-e10s-3 3
> 2 - [TC] Linux64 reftest-e10s-1 R-e10s1
> 2 - [TC] Linux64 web-platform-tests-e10s-11 11
> 2 - [TC] Linux64 web-platform-tests-e10s-2 2
> 2 - [TC] Linux64 jsreftest e10s J
> 2 - [TC] Linux64 reftest-noaccel-e10s-8 Ru8
> 2 - [TC] Linux64 web-platform-tests-e10s-10 10
> 2 - [TC] Linux64 reftest-e10s-5 R-e10s5
> 2 - [TC] Linux64 reftest-e10s-7 R-e10s7
> 2 - [TC] Linux64 web-platform-tests-e10s-8 8
> 2 - [TC] Linux64 mochitest-1 1
> 2 - [TC] Linux64 reftest-noaccel-e10s-4 Ru4
> 2 - [TC] Linux64 reftest-noaccel-e10s-7 Ru7
> 2 - [TC] Linux64 reftest-noaccel-e10s-2 Ru2
It shows up in 410 tests. A few of the most prevalent:
> 488 - [e10s] Shutdown
> 228 - TestHashtables
> 44 - TestThreadUtils
> 32 - TestWebGLElementArrayCache
> 32 - TestStringAPI
> 26 - [e10s] browser/components/sessionstore/test/browser_crashedTabs.js
> 16 - browser/base/content/test/chat/browser_chatwindow.js
> 14 - TestTextFormatter
> 14 - browser/base/content/test/chat/browser_focus.js
> 14 - [e10s] browser/components/customizableui/test/browser_967000_button_sync.js
> 12 - [e10s] browser/base/content/test/general/browser_aboutTabCrashed.js
> 8 - TestAudioMixer
> 8 - dom/media/tests/mochitest/test_peerConnection_scaleResolution.html
> 8 - dom/base/test/chrome/test_swapFrameLoaders.xul
> 6 - GeckoMediaPlugins.GMPCrossOrigin
> 6 - GeckoMediaPlugins.GMPStorageAsyncShutdownTimeout
> 6 - toolkit/components/aboutmemory/tests/test_aboutmemory5.xul
> 6 - toolkit/components/aboutmemory/tests/test_memoryReporters2.xul
> 6 - toolkit/components/aboutmemory/tests/test_dumpGCAndCCLogsToFile.xul
> 6 - browser/base/content/test/social/browser_social_chatwindow.js
[1] https://hg.mozilla.org/mozilla-central/annotate/b6f7d0eb61b1/xpcom/base/nsTraceRefcnt.cpp#l174
Reporter | ||
Comment 1•8 years ago
|
||
The majority appear to be mostly in cpp unit tests. There are ton in e10s shutdown as well, so that may be a concern.
It appears bug 1271593 is what bumped this onto my radar, but I think that was more of a straw that broke the camels back situation.
Reporter | ||
Comment 2•8 years ago
|
||
Slightly better bisection points to bug 1258183 being the culprit adding about 1400.
Blocks: 1258183
Flags: needinfo?(jseward)
Reporter | ||
Comment 3•8 years ago
|
||
Those patches are rather large, it's not clear to me what's going on. The only thing that pops out to me is the new static global:
> // This is never de-initialised, so it has to be an OffTheBooks
> // variant in order not to show up as a leak in BloatView.
> static OffTheBooksMutex
> gTelemetryHistogramMutex("mutex for TelemetryHistogram "
> "external interface");
But it's OffTheBooks, so we shouldn't be hitting traceRefcnt at all.
For reference this is what things looked like *before* bug 1258183 landed:
> 485 - [TC] Linux64 cppunit Cpp
> 18 - [TC] Linux64 firefox-ui-tests functional e10s en-US
> 4 - [TC] Linux64 Dbg B
> 4 - Static Checking Build S
> 3 - [TC] Linux64 gtest GTest
Assignee | ||
Comment 4•8 years ago
|
||
(In reply to Eric Rahm [:erahm] from comment #3)
> Those patches are rather large, it's not clear to me what's going on. The
> only thing that pops out to me is the new static global:
>
> > // This is never de-initialised, so it has to be an OffTheBooks
> > // variant in order not to show up as a leak in BloatView.
> > static OffTheBooksMutex
> > gTelemetryHistogramMutex("mutex for TelemetryHistogram "
> > "external interface");
So, I'm confused. Let me explain.
The fragment you show, using an OffTheBooksMutex for gTelemetryHistogramMutex
is slightly out of date, and got changed, for in the landed version, to a
StaticMutex:
static StaticMutex gTelemetryHistogramMutex;
exactly so as to remove these messages. And I thought I had, but now
it seems like I was only partly successful. Now I am suspecting that
I fixed it for the chrome process but not for the content process.
Flags: needinfo?(jseward)
Assignee | ||
Comment 5•8 years ago
|
||
Here is the stack leading to the message. It does indeed involve
gTelemetryHistogramMutex, as you suspected:
XXXXXXXXX
at 0x59868EB: VALGRIND_PRINTF_BACKTRACE(char const*, ...) (/usr/include/valgrind/valgrind.h:6818)
by 0x5986B84: AssertActivityIsLegal() (MC-RACE/xpcom/base/nsTraceRefcnt.cpp:171)
by 0x598D1AF: NS_LogCtor (MC-RACE/xpcom/base/nsTraceRefcnt.cpp:1209)
by 0x5925A8C: nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::nsTArray_base() (MC-RACE/ff-Og-linux64-dbg/dist/include/nsTArray-inl.h:15)
by 0x5A32D01: nsTArray_Impl (MC-RACE/xpcom/glue/nsTArray.h:834)
by 0x5A32D01: nsTArray (MC-RACE/xpcom/glue/nsTArray.h:2095)
by 0x5A32D01: OrderingEntry (MC-RACE/ff-Og-linux64-dbg/dist/include/mozilla/DeadlockDetector.h:89)
by 0x5A32D01: Add (MC-RACE/ff-Og-linux64-dbg/dist/include/mozilla/DeadlockDetector.h:177)
by 0x5A32D01: mozilla::BlockingResourceBase::BlockingResourceBase(char const*, mozilla::BlockingResourceBase::BlockingResourceType) (MC-RACE/xpcom/glue/BlockingResourceBase.cpp:223)
by 0x934FF1C: OffTheBooksMutex (MC-RACE/ff-Og-linux64-dbg/dist/include/mozilla/Mutex.h:47)
by 0x934FF1C: Mutex (MC-RACE/ff-Og-linux64-dbg/dist/include/mozilla/StaticMutex.h:66)
by 0x934FF1C: Lock (MC-RACE/ff-Og-linux64-dbg/dist/include/mozilla/StaticMutex.h:44)
by 0x934FF1C: BaseAutoLock (MC-RACE/ff-Og-linux64-dbg/dist/include/mozilla/Mutex.h:164)
by 0x934FF1C: TelemetryHistogram::CreateStatisticsRecorder() (MC-RACE/toolkit/components/telemetry/TelemetryHistogram.cpp:1705)
by 0x9340661: mozilla::Telemetry::CreateStatisticsRecorder() (MC-RACE/toolkit/components/telemetry/Telemetry.cpp:2911)
by 0x93E1179: XRE_InitChildProcess (MC-RACE/toolkit/xre/nsEmbedFunctions.cpp:321)
by 0x4096D3: content_process_main(int, char**) (MC-RACE/ipc/app/../contentproc/plugin-container.cpp:224)
by 0x40977F: main (MC-RACE/ipc/app/MozillaRuntimeMain.cpp:11)
[12077] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/sewardj/MOZ/MC-RACE/xpcom/base/nsTraceRefcnt.cpp, line 175
(a second repeat of this follows)
Assignee | ||
Comment 6•8 years ago
|
||
I have two observations about this:
(1)
The call chain seems to start within main, so how can this really
involve a static constructor? I thought they all ran before we
arrived at main -- or before dlopen() returns, but there is no
dlopen here.
(2)
The call at TelemetryHistogram.cpp:1705, which is this
StaticMutexAutoLock locker(gTelemetryHistogramMutex);
goes to BaseAutoLock() which just locks the underlying OffTheBooksMutex
in the normal way. That seems to cause an entry to be lodged in
DeadlockDetector's tables in the normal way, which eventually leads
to AssertActivityIsLegal().
I see why that happens -- because StaticMutex.h says this:
typedef BaseAutoUnlock<Mutex> MutexAutoUnlock;
---------
Hmm. An OffTheBooksMutex participates in deadlock checking, but
not in leak checking. A StaticMutex is basically an OffTheBooksMutex,
so it also participates in deadlock checking. That *sounds* like it
should be OK, but for some reason it isn't.
I don't understand this confluence of Mutex kinds and startup limitations
enough to make much more sense of this. Do you know who can help us here?
Assignee | ||
Comment 7•8 years ago
|
||
(In reply to Eric Rahm [:erahm] from comment #3)
> But it's OffTheBooks, so we shouldn't be hitting traceRefcnt at all.
Just to be clear .. I think it's not traceRefcnt that we're hitting.
It's the lock cycle detection mechanism.
Reporter | ||
Comment 8•8 years ago
|
||
I'm guessing the problem here is that TelemetryHistograms is getting initialized before nsTraceRefCnt. If you move the call to |CreateStatisticsRecorder| [1] to after |NS_LogInit| [2] it should clear things up. You'd need to do that everywhere |CreateStatisticsRecorder| is called.
Does that makes sense Julian?
[1] https://dxr.mozilla.org/mozilla-central/rev/3c5025f98e561a20e24d97c91a9e4e0ec28015ea/toolkit/xre/nsEmbedFunctions.cpp#321-322
[2] https://dxr.mozilla.org/mozilla-central/rev/3c5025f98e561a20e24d97c91a9e4e0ec28015ea/toolkit/xre/nsEmbedFunctions.cpp#366-367
Flags: needinfo?(jseward)
Assignee | ||
Updated•8 years ago
|
Flags: needinfo?(jseward)
Comment 9•8 years ago
|
||
(In reply to Eric Rahm [:erahm] from comment #8)
> I'm guessing the problem here is that TelemetryHistograms is getting
> initialized before nsTraceRefCnt. If you move the call to
> |CreateStatisticsRecorder| [1] to after |NS_LogInit| [2] it should clear
> things up. You'd need to do that everywhere |CreateStatisticsRecorder| is
> called.
>
> Does that makes sense Julian?
>
> [1]
> https://dxr.mozilla.org/mozilla-central/rev/
> 3c5025f98e561a20e24d97c91a9e4e0ec28015ea/toolkit/xre/nsEmbedFunctions.
> cpp#321-322
> [2]
> https://dxr.mozilla.org/mozilla-central/rev/
> 3c5025f98e561a20e24d97c91a9e4e0ec28015ea/toolkit/xre/nsEmbedFunctions.
> cpp#366-367
We should confirm if that fixes it.
But looking at the things that happen between the two locations, that might be a little late - the GMP and sandboxing code might record Telemetry here.
Updated•8 years ago
|
Priority: -- → P2
Whiteboard: [measurement:client]
Assignee | ||
Comment 10•8 years ago
|
||
A possible fix, as described by erahm in comment 8. It changes
the ordering in the two places I could find, that |CreateStatisticsRecorder|
is called.
It *seems* to fix the problem and not introduce any new races or
deadlocks, but I am very nervous about messing with the initialisation
order, so sanity-checking/feedback by those who understand initialisation-order
issues would be a Good Thing.
Assignee | ||
Comment 11•8 years ago
|
||
Revised patch following irc chat with gfritzsche.
Attachment #8764210 -
Attachment is obsolete: true
Assignee | ||
Updated•8 years ago
|
Attachment #8764280 -
Flags: review?(gfritzsche)
Updated•8 years ago
|
Assignee: nobody → jseward
Comment 12•8 years ago
|
||
Comment on attachment 8764280 [details] [diff] [review]
bug1279614-2.cset
Review of attachment 8764280 [details] [diff] [review]:
-----------------------------------------------------------------
Calling `Telemetry::CreateStatisticsRecorder()` later is only a problem if we would not record histograms as early anymore as before.
Histogram recording is enabled from when `TelemetryHistogram::InitializeGlobalState()` is called.
I checked with and without this patch applied and the call order of the two didn't change (`TelemetryHistogram::CreateStastisticsRecorder()` is still called before `TelemetryHistogram::InitializeGlobalState()`).
So, from the perspective of Telemetry initialization order this is fine.
@Jim: I'd appreciate a second look here for nsAppRunner.cpp and nsEmbedFunctions.cpp, as i don't feel so knowledgeable there.
::: toolkit/xre/nsAppRunner.cpp
@@ +4442,5 @@
>
> + // NB: this must happen after the creation of |ScopedLogging log| since
> + // ScopedLogging::ScopedLogging calls NS_LogInit, and
> + // XRE_CreateStatsObject calls Telemetry::CreateStatisticsRecorder,
> + // and NS_LogInit must be called before Telemetry::CreateStatisticsRecorder.
This should mention why it must be called before.
Attachment #8764280 -
Flags: review?(jmathies)
Attachment #8764280 -
Flags: review?(gfritzsche)
Attachment #8764280 -
Flags: review+
Comment 13•8 years ago
|
||
Comment on attachment 8764280 [details] [diff] [review]
bug1279614-2.cset
Review of attachment 8764280 [details] [diff] [review]:
-----------------------------------------------------------------
I don't see an issue with this.
Attachment #8764280 -
Flags: review?(jmathies) → review+
Comment 14•8 years ago
|
||
Pushed by jseward@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/da77b49e469a
2,000 instances of "XPCOM objects created/destroyed from static ctor/dtor" emitted from xpcom/base/nsTraceRefcnt.cpp during linux64 debug testing. r=gfritzsche,jmathies.
Comment 15•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox50:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
You need to log in
before you can comment on or make changes to this bug.
Description
•