Closed
Bug 587853
Opened 14 years ago
Closed 14 years ago
Use PR_ATOMIC_{INCREMENT|DECREMENT} macros for threadsafe addref / release
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)
References
Details
(Keywords: perf)
Attachments
(1 file, 2 obsolete files)
14.73 KB,
patch
|
benjamin
:
review+
benjamin
:
approval2.0+
|
Details | Diff | Splinter Review |
Right now our threadsafe addref/release code uses PR_Atomic{Increment|Decrement} (see nsISupportsImpl.h). The PR_Atomic* methods are not inlined, and are implemented using locks. pthread_(un)lock shows up in profiles as taking ~3% of total execution time on my machine, so this is probably a code path worth looking at. I think we could speed this up by inlining the calls and using atomic integer operations where available (x86/x64 at the very least, maybe also ARM).
Assignee | ||
Comment 1•14 years ago
|
||
jst says we're using atomic instructions for this on Windows. Looking around more...
Assignee | ||
Comment 2•14 years ago
|
||
Ah, interesting. Perhaps NS_IMPL_THREADSAFE_{ADDREF|RELEASE} calls the wrong atomic function. nsISupportsImpl.h calls PR_AtomicIncrement, but the fast one is PR_ATOMIC_INCREMENT. dbaron, do you know what's going on here?
The problem with the macro form (PR_ATOMIC_INCREMENT) is that different platform have different requirements for the variable that they modify. Windows takes a (long volatile *) while gcc uses plain (long *) i think?
Assignee | ||
Comment 4•14 years ago
|
||
I think GCC will accept pretty much anything [1]. Looks like MSVC wants a long volatile *. Certainly this shouldn't stop us from using the fast atomic operations, though. If the intrinsics are no good, we can use inline asm, right? [1] http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
Assignee | ||
Comment 5•14 years ago
|
||
Another alternative would be typedef'ing RefcountType to be (long *) on Unix and (volatile long *) on Windows. Then we could use the macros, right?
Assignee | ||
Comment 6•14 years ago
|
||
Or rather, typedef'ing nsrefcnt to volatile long* on Windows (for both 32 and 64-bit). See nscore.h. On my Linux box, changing to the PR_ATOMIC_INCREMENT/DECREMENT macro made the generic AddRef function disappear from my profile.
Assignee | ||
Comment 7•14 years ago
|
||
And Windows is LLP64 [1], so volatile long* is a 32-bit value on both 32- and 64-bit systems, which is what we want. I wonder if we should be using InterlockedIncrementAcquire rather than InterlockedIncrement [2]. [1] http://en.wikipedia.org/wiki/64-bit#Specific_C-language_data_models [2] http://msdn.microsoft.com/en-us/library/ms683618%28v=VS.85%29.aspx
Assignee | ||
Comment 8•14 years ago
|
||
Ah, no. We probably want a full memory fence.
Assignee | ||
Comment 9•14 years ago
|
||
Tested only on Linux, where I haven't observed any problems. I'll push to try and report back here. Upon some reflection, I don't think nsrefcnt needs to be |volatile long| on Windows. So long as we only access the refcount through the atomic increment/decrement functions, I don't think it needs to be declared as volatile.
Assignee | ||
Updated•14 years ago
|
Summary: Use faster code for threadsafe addref / release → Use PR_ATOMIC_{INCREMENT|DECREMENT} macros for threadsafe addref / release
Assignee | ||
Updated•14 years ago
|
Attachment #466685 -
Flags: review? → review?(benjamin)
You don't want to simply change nsrefcnt as that would affect single threaded refcounts too.
Assignee | ||
Updated•14 years ago
|
Attachment #466685 -
Flags: review?(benjamin)
Assignee | ||
Comment 11•14 years ago
|
||
This builds fine everywhere, but crashes on Windows 2003 and 7 in AddRef(). I'll try again setting nsrefcnt to volatile everywhere. If that works, I'll make a new atomic refcnt type and see how that behaves. http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1282092514.1282093368.26864.gz http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1282100054.1282101152.1519.gz
Assignee | ||
Comment 12•14 years ago
|
||
Hm, typedef'ing everything as volatile definitely doesn't compile. I'll put away the axe and try a scalpel in the morning.
Assignee | ||
Comment 13•14 years ago
|
||
This patch looks good on Win32 debug, but the win32 opt builds are timing out on try (for reasons we suspect are unrelated to the patch). I'll report back here once we successfully run the Win32 opt tests.
Attachment #466685 -
Attachment is obsolete: true
Assignee | ||
Comment 14•14 years ago
|
||
Apparently Win32 VMs have been failing PGO builds a lot lately. Aki says new machines are coming Real Soon Now and that they'll probably fix the problem. I think we should wait on this until the new machines get up and running. We could try and spin our own PGO build and run all the tests locally, but even if they came back clean, I'd still want to push to try to confirm that the production setup doesn't break with this patch.
Assignee | ||
Comment 15•14 years ago
|
||
Got a Win32 opt build. This new one fails less than the previous patch, but still fails, now with "child process still alive after shutdown": http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1282978007.1282979433.6551.gz
Assignee | ||
Comment 16•14 years ago
|
||
This appears to be a statistically significant win on Drameo DOM on Windows and Linux: Linux x86: 176.9 (+/- 3.0) -> 180.6 = 2.1% speedup Linux x64: 223.0 (+/- 0.5) -> 226.0 = 1.3% speedup Win7 x86: 233.1 (+/- 4.5) -> 243.6 = 4.5% speedup I pushed to try before bz's landing of -fomit-frame-pointer on Mac (which gave a large speedup across the board), and TBPL isn't showing talos results from before then, so I don't have a meaningful comparison for Mac offhand. The absolute numbers are x86: 217.81, x64: 235.80, fwiw. Of course, Windows still appears to be broken, so this patch (and the perf numbers) will likely change. But this is at least promising.
Assignee | ||
Comment 17•14 years ago
|
||
Perhaps the results above were premature; my latest set of talos runs don't show a significant speedup on Drameo DOM (or any other tests that I see). New results: http://bit.ly/9X7LTf On the other hand, my last push was almost all-green on Try, with the exception of a timeout on one of the Mac Talos runs. http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1283147324.1283151667.24509.gz I'm spinning another round of tests to try and see if anything is intermittent.
Assignee | ||
Comment 18•14 years ago
|
||
Second run on try looks clean, so I'm putting this up for review.
Attachment #469087 -
Attachment is obsolete: true
Attachment #470517 -
Flags: review?(bent.mozilla)
Comment on attachment 470517 [details] [diff] [review] Patch v3 You'll need an XPCOM peer to look this over.
Attachment #470517 -
Flags: review?(bent.mozilla) → review?(benjamin)
Comment 20•14 years ago
|
||
Comment on attachment 470517 [details] [diff] [review] Patch v3 Did you check generated assembly to make sure this was actually being inlined?
Assignee | ||
Comment 21•14 years ago
|
||
They're inlined in nsJSIID::AddRef/Release on my Linux x64 box.
Updated•14 years ago
|
Attachment #470517 -
Flags: review?(benjamin)
Attachment #470517 -
Flags: review+
Attachment #470517 -
Flags: approval2.0+
Assignee | ||
Comment 22•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/93dab0790dca
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Hey, cool! Glad this landed. Any idea how the perf worked out?
Assignee | ||
Comment 24•14 years ago
|
||
I looked over the talos numbers using [1] and didn't see anything move. I'm not sure why when I initially looked on try I saw such a big difference. [1]: http://services.forerunnerdesigns.com/compare-talos/
You need to log in
before you can comment on or make changes to this bug.
Description
•