Closed
Bug 1137963
Opened 9 years ago
Closed 9 years ago
Consider using a spin lock for TraceRefCnt
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
mozilla39
Tracking | Status | |
---|---|---|
firefox39 | --- | fixed |
People
(Reporter: mstange, Assigned: mstange)
References
Details
Attachments
(1 file, 2 obsolete files)
9.01 KB,
patch
|
Details | Diff | Splinter Review |
We're seeing very high overhead of the prlock used in nsTraceRefCnt.cpp in debug crashtests on Mac. Using a spin lock cuts the time for debug crashtests on 10.10 down by a factor of 4, from 45min to 11min.
Assignee | ||
Comment 1•9 years ago
|
||
Assignee | ||
Comment 2•9 years ago
|
||
I fired off two try pushes with debug tests on all platforms. Without patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=430cdd99c7f0 With patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=362310d7cb6b
Comment 3•9 years ago
|
||
Is it the PR_Lock overhead that's causing problems, or just that the platform's locks (pthread_mutex_t &co) are too expensive?
Assignee | ||
Comment 4•9 years ago
|
||
The platform's lock. The profiles at http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://tests.themasta.com/hostedprofiles/profiles-10.10.zip show almost all the time being spent in __psynch_mutexwait and __psynch_mutexdrop.
Comment 5•9 years ago
|
||
Lovely. Does OS X happen to support pthread_spinlock_t, out of curiosity?
Assignee | ||
Comment 6•9 years ago
|
||
Apparently not, but it looks like it has a replacement called OSSpinLock: https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man3/spinlock.3.html
Comment 7•9 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #0) > We're seeing very high overhead of the prlock used in nsTraceRefCnt.cpp in > debug crashtests on Mac. > > Using a spin lock cuts the time for debug crashtests on 10.10 down by a > factor of 4, from 45min to 11min. What difference does it make on older versions?
Comment 8•9 years ago
|
||
An not-so-interesting 16min to 12min for crashtests on 10.8 and 10.6, but mochitest-3, which has been worrying me more than crashtests, went from 75min to 35 min on 10.8 (10.6 hit an intermittent assertion, retriggered). 10.10 m-3 is going to be difficult to measure, because while the one with the patch finished in 35 minutes, the one without is currently at 116 minutes, and will time out when it hits the suite maxtime. Looks like a wash on Windows, and slightly worse on Linux, but I'm in love with the results on Mac.
Comment 9•9 years ago
|
||
Nice find!
Comment 10•9 years ago
|
||
Unfortunately, the 10.6 mochitest-3 assertion looks real and non-intermittent.
Comment 11•9 years ago
|
||
good stuff! Looking forward for this to land.
Assignee | ||
Comment 12•9 years ago
|
||
(In reply to Phil Ringnalda (:philor) from comment #10) > Unfortunately, the 10.6 mochitest-3 assertion looks real and > non-intermittent. I've filed bug 1138604 about this and am looking at it now.
Comment 13•9 years ago
|
||
I think on TreeHerder we only use the bloat log to track when objects are created and destroyed. If the overhead here is so dramatic, we might be able to have some kind of lighter mode that would only record stuff for Ctor() and Dtor(). I think we don't generally care about the actual number of addrefs and releases.
Assignee | ||
Comment 14•9 years ago
|
||
This is the last patch plus a PR_Sleep(PR_INTERVAL_NO_WAIT), and has had a green mochitest-3 10.6 debug run on try.
Attachment #8570800 -
Attachment is obsolete: true
Attachment #8572172 -
Flags: review?(nfroyd)
Comment 15•9 years ago
|
||
Comment on attachment 8572172 [details] [diff] [review] patch Waldo might be able to move this along.
Attachment #8572172 -
Flags: review?(jwalden+bmo)
Comment 16•9 years ago
|
||
Comment on attachment 8572172 [details] [diff] [review] patch Review of attachment 8572172 [details] [diff] [review]: ----------------------------------------------------------------- Woo for lock-free code! I don't have a sense for whether immediately yielding on these things, versus trying to acquire a bunch of times, and only if those bunches of times fail fall back to yielding before retrying, is faster. I'm simply going to assume this is better. At the very least, tho, it is *correct*. If you're not in particular rush for this to land this week or anything, I'd mildly prefer another person look at this as well -- because it's been a few months since I looked at anything atomic so maybe I've gone partially stupid (quite likely at the edges, at least!), and because I suggested an optimization, and because atomics can always use more eyes. But this is basic enough I'm fine landing and getting froydnj's look post-landing, too -- I probably haven't forgotten *that* much. :-) ::: xpcom/base/nsTraceRefcnt.cpp @@ +33,5 @@ > #endif > > #include "mozilla/BlockingResourceBase.h" > #include "mozilla/PoisonIOInterposer.h" > +#include "mozilla/Atomics.h" Alphabetize. @@ +77,5 @@ > +// We use a spin lock instead of a regular mutex because this lock is usually > +// only held for a very short time, and gets grabbed at a very high frequency > +// (~100000 times per second). On Mac, the overhead of using a regular lock > +// is very high, see bug 1137963. > +static mozilla::Atomic<bool> gTraceLogLocked(false); Remove the (false) here. That suggests it's *necessary* to do that, but it's not -- the relevant member variable is initialized by the class as mValue(), which direct-initializes the field, which performs value initialization, which zero-initializes it. I'd rather have these uses never specify a false, so that people who read it get used to its semantics. Additionally. That this is effectively a lock, that users acquire to do protected work, then release for the next user to use, suggests you can use a more efficient memory ordering: ReleaseAcquire, which doesn't force threads to sync up with each other to totally order atomic operations on *this* variable with atomic operations on unrelated memory in other threads. That won't force threads to sync up quite so often, which theoretically will make acquiring and releasing faster. (In practice x86 doesn't take advantage of this optimization opportunity, but ARM does.) static mozilla::Atomic<bool, mozilla::ReleaseAcquire> gTraceLogLocked; @@ +83,5 @@ > + while (!gTraceLogLocked.compareExchange(false, true)) { \ > + PR_Sleep(PR_INTERVAL_NO_WAIT); /* yield */ \ > + } > +#define UNLOCK_TRACELOG() \ > + do { gTraceLogLocked = false; } while(0) Is there some reason these macros can't be made into inline functions, avoiding all the backslashing and syntactic-nicety nonsense?
Attachment #8572172 -
Flags: review?(jwalden+bmo) → review+
Assignee | ||
Comment 17•9 years ago
|
||
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #16) > I don't have a sense for whether immediately yielding on these things, > versus trying to acquire a bunch of times, and only if those bunches of > times fail fall back to yielding before retrying, is faster. Yeah, me neither. I just checked that it doesn't make crashtests slower by multiple minutes, and that seems to be the case. > If you're not in particular rush for this to land this week or anything, I'd > mildly prefer another person look at this as well Now that Andrew has already made the tests a lot faster going a different route (bug 1138616), I'm not in a rush to land this one. > Additionally. That this is effectively a lock, that users acquire to do > protected work, then release for the next user to use, suggests you can use > a more efficient memory ordering: ReleaseAcquire, which doesn't force > threads to sync up with each other to totally order atomic operations on > *this* variable with atomic operations on unrelated memory in other threads. > That won't force threads to sync up quite so often, which theoretically will > make acquiring and releasing faster. (In practice x86 doesn't take > advantage of this optimization opportunity, but ARM does.) > > static mozilla::Atomic<bool, mozilla::ReleaseAcquire> gTraceLogLocked; Thanks, I wanted to ask about this, but forgot. I didn't even find the place where we get the default value for the ordering policy if you don't specify any (which I did before). > @@ +83,5 @@ > > + while (!gTraceLogLocked.compareExchange(false, true)) { \ > > + PR_Sleep(PR_INTERVAL_NO_WAIT); /* yield */ \ > > + } > > +#define UNLOCK_TRACELOG() \ > > + do { gTraceLogLocked = false; } while(0) > > Is there some reason these macros can't be made into inline functions, > avoiding all the backslashing and syntactic-nicety nonsense? Probably not. I just assumed there was a reason for these to be macros in the old code, but I don't see one, so I'll make these inline functions.
Comment 18•9 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #17) > Thanks, I wanted to ask about this, but forgot. I didn't even find the place > where we get the default value for the ordering policy if you don't specify > any (which I did before). The default is the most cautious variant, SequentiallyConsistent, since the others require more thought to say whether they are good enough for your situation.
Comment 19•9 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #17) > I didn't even find the place > where we get the default value for the ordering policy if you don't specify > any (which I did before). It's on the declaration of mozilla::Atomic. http://hg.mozilla.org/integration/mozilla-inbound/annotate/16f240fd86de/mfbt/Atomics.h#l1001 The declaration looks this way because Atomic<T> is only defined for particular sorts of T, through specializations targeting those particular sorts. The default must go on the declaration, even tho that leaves things a little unclear to someone looking for it at the location of some specialization.
Comment 20•9 years ago
|
||
Comment on attachment 8572172 [details] [diff] [review] patch Review of attachment 8572172 [details] [diff] [review]: ----------------------------------------------------------------- This looks fine to me. If you take Waldo's advice to make {LOCK,UNLOCK}_TRACELOG inline functions, please go one step further and make them part of an RAII class. ::: xpcom/base/nsTraceRefcnt.cpp @@ +76,4 @@ > > +// We use a spin lock instead of a regular mutex because this lock is usually > +// only held for a very short time, and gets grabbed at a very high frequency > +// (~100000 times per second). On Mac, the overhead of using a regular lock Nit: please say "On some versions of OS X...", since my understanding is that it's only a problem on 10.10 (and maybe 10.9?). Have we filed a bug with Apple about this? Surely we can't be the only people running into this.
Attachment #8572172 -
Flags: review?(nfroyd) → review+
Comment 21•9 years ago
|
||
(In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #20) > Nit: please say "On some versions of OS X...", since my understanding is > that it's only a problem on 10.10 (and maybe 10.9?). Have we filed a bug > with Apple about this? Surely we can't be the only people running into this. I think it is a problem on earlier versions, too. My patch to avoid getting the lock on every addref and release sped up 10.6 debug by ~20% on a number of tests, but had no real impact on Linux. It is just much worse in 10.10.
Comment 22•9 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #21) > (In reply to Nathan Froyd [:froydnj] [:nfroyd] from comment #20) > > Nit: please say "On some versions of OS X...", since my understanding is > > that it's only a problem on 10.10 (and maybe 10.9?). Have we filed a bug > > with Apple about this? Surely we can't be the only people running into this. > > I think it is a problem on earlier versions, too. My patch to avoid getting > the lock on every addref and release sped up 10.6 debug by ~20% on a number > of tests, but had no real impact on Linux. It is just much worse in 10.10. OK! That can stay as-is, then.
Comment 23•9 years ago
|
||
Looking forward to seeing this land. Any idea on the timeframe for that? We are anxious to move debug tests to 10.10 in bug 1125998 and bug 1140246.
Flags: needinfo?(mstange)
Assignee | ||
Comment 24•9 years ago
|
||
I'll finish this up within the next few days. But I think you can go ahead with turning on tests for 10.10 since bug 1138616 has landed in the meantime which should also have made them much faster.
Flags: needinfo?(mstange)
Comment 25•9 years ago
|
||
(In reply to Markus Stange [:mstange] from comment #24) > I'll finish this up within the next few days. But I think you can go ahead > with turning on tests for 10.10 since bug 1138616 has landed in the meantime > which should also have made them much faster. Well, that caused a frequent orange, so it may need to get backed out. Though this patch may just cause the same problem.
Assignee | ||
Comment 26•9 years ago
|
||
Oh. :-(
Comment 27•9 years ago
|
||
So it looks like the bug 1138616 is still in production and the intermittents are being tracked in bug 1122249. I'm going to ask relops to reimage more 10.10 machines in bug 1140246.
Assignee | ||
Comment 28•9 years ago
|
||
With review comments addressed. https://treeherder.mozilla.org/#/jobs?repo=try&revision=94eb8844378e Using an RAII class for the lock caused a shutdown hang and turned all debug tests red, which was the reason why I didn't land this sooner. I've made a small change that fixes the deadlock. The problem was in nsTraceRefCnt::DumpStatistics, which has an nsTArray<...> entries, and we were holding the lock while that array was destroyed, and gLogging was no longer NoLogging, so the array destructor logging deadlocked. The order before my patch was: LOCK() gLogging = NoLogging create array gLogging = OnlyBloatLogging UNLOCK() destroy array <-- destructor logging here The bad version of my patch had this order: LOCK() gLogging = NoLogging create array gLogging = OnlyBloatLogging destroy array <-- destructor logging deadlocks here UNLOCK() And with AutoRestore for gLogging (current patch): LOCK() gLogging = NoLogging create array destroy array <-- destructor doesn't try to log gLogging = OnlyBloatLogging UNLOCK() The version before the patch looks wrong to me, since we count the destruction of the array, but not its creation. But since that only happens after we've already dumped the statistics table, it probably didn't matter.
Attachment #8572172 -
Attachment is obsolete: true
Comment 29•9 years ago
|
||
Did you mean to set any flags on that and/or land it?
Assignee | ||
Comment 30•9 years ago
|
||
I meant to land it as soon as the try results were in, which they are now. https://hg.mozilla.org/integration/mozilla-inbound/rev/b83f949ac00d
https://hg.mozilla.org/mozilla-central/rev/b83f949ac00d
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox39:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
You need to log in
before you can comment on or make changes to this bug.
Description
•