Closed Bug 587853 Opened 9 years ago Closed 9 years ago

Use PR_ATOMIC_{INCREMENT|DECREMENT} macros for threadsafe addref / release

Categories

(Core :: XPCOM, defect)

x86_64
All
defect
Not set

Tracking

()

RESOLVED FIXED

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

References

Details

(Keywords: perf)

Attachments

(1 file, 2 obsolete files)

Right now our threadsafe addref/release code uses PR_Atomic{Increment|Decrement} (see nsISupportsImpl.h).  The PR_Atomic* methods are not inlined, and are implemented using locks.

pthread_(un)lock shows up in profiles as taking ~3% of total execution time on my machine, so this is probably a code path worth looking at.

I think we could speed this up by inlining the calls and using atomic integer operations where available (x86/x64 at the very least, maybe also ARM).
jst says we're using atomic instructions for this on Windows.  Looking around more...
Ah, interesting.  Perhaps NS_IMPL_THREADSAFE_{ADDREF|RELEASE} calls the wrong atomic function.

nsISupportsImpl.h calls PR_AtomicIncrement, but the fast one is PR_ATOMIC_INCREMENT.

dbaron, do you know what's going on here?
The problem with the macro form (PR_ATOMIC_INCREMENT) is that different platform have different requirements for the variable that they modify. Windows takes a (long volatile *) while gcc uses plain (long *) i think?
I think GCC will accept pretty much anything [1].  Looks like MSVC wants a long volatile *.

Certainly this shouldn't stop us from using the fast atomic operations, though.  If the intrinsics are no good, we can use inline asm, right?

[1] http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Atomic-Builtins.html
Another alternative would be typedef'ing RefcountType to be (long *) on Unix and (volatile long *) on Windows.  Then we could use the macros, right?
Or rather, typedef'ing nsrefcnt to volatile long* on Windows (for both 32 and 64-bit).  See nscore.h.

On my Linux box, changing to the PR_ATOMIC_INCREMENT/DECREMENT macro made the generic AddRef function disappear from my profile.
And Windows is LLP64 [1], so volatile long* is a 32-bit value on both 32- and 64-bit systems, which is what we want.

I wonder if we should be using InterlockedIncrementAcquire rather than InterlockedIncrement [2].

[1] http://en.wikipedia.org/wiki/64-bit#Specific_C-language_data_models
[2] http://msdn.microsoft.com/en-us/library/ms683618%28v=VS.85%29.aspx
Ah, no.  We probably want a full memory fence.
Attached patch Patch v1 (obsolete) — Splinter Review
Tested only on Linux, where I haven't observed any problems.  I'll push to try and report back here.

Upon some reflection, I don't think nsrefcnt needs to be |volatile long| on Windows.  So long as we only access the refcount through the atomic increment/decrement functions, I don't think it needs to be declared as volatile.
Assignee: nobody → justin.lebar+bug
Status: NEW → ASSIGNED
Attachment #466685 - Flags: review?
Summary: Use faster code for threadsafe addref / release → Use PR_ATOMIC_{INCREMENT|DECREMENT} macros for threadsafe addref / release
Attachment #466685 - Flags: review? → review?(benjamin)
You don't want to simply change nsrefcnt as that would affect single threaded refcounts too.
Attachment #466685 - Flags: review?(benjamin)
This builds fine everywhere, but crashes on Windows 2003 and 7 in AddRef().

I'll try again setting nsrefcnt to volatile everywhere.  If that works, I'll make a new atomic refcnt type and see how that behaves.

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1282092514.1282093368.26864.gz

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1282100054.1282101152.1519.gz
Hm, typedef'ing everything as volatile definitely doesn't compile.

I'll put away the axe and try a scalpel in the morning.
Attached patch Patch v2 (obsolete) — Splinter Review
This patch looks good on Win32 debug, but the win32 opt builds are timing out on try (for reasons we suspect are unrelated to the patch).  I'll report back here once we successfully run the Win32 opt tests.
Attachment #466685 - Attachment is obsolete: true
Apparently Win32 VMs have been failing PGO builds a lot lately.  Aki says new machines are coming Real Soon Now and that they'll probably fix the problem.

I think we should wait on this until the new machines get up and running.  We could try and spin our own PGO build and run all the tests locally, but even if they came back clean, I'd still want to push to try to confirm that the production setup doesn't break with this patch.
Got a Win32 opt build.

This new one fails less than the previous patch, but still fails, now with "child process still alive after shutdown":

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1282978007.1282979433.6551.gz
This appears to be a statistically significant win on Drameo DOM on Windows and Linux:

Linux x86: 176.9 (+/- 3.0) -> 180.6 = 2.1% speedup
Linux x64: 223.0 (+/- 0.5) -> 226.0 = 1.3% speedup
Win7 x86:  233.1 (+/- 4.5) -> 243.6 = 4.5% speedup

I pushed to try before bz's landing of -fomit-frame-pointer on Mac (which gave a large speedup across the board), and TBPL isn't showing talos results from before then, so I don't have a meaningful comparison for Mac offhand.  The absolute numbers are x86: 217.81, x64: 235.80, fwiw.

Of course, Windows still appears to be broken, so this patch (and the perf numbers) will likely change.  But this is at least promising.
Perhaps the results above were premature; my latest set of talos runs don't show a significant speedup on Drameo DOM (or any other tests that I see).

New results: http://bit.ly/9X7LTf

On the other hand, my last push was almost all-green on Try, with the exception of a timeout on one of the Mac Talos runs.

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1283147324.1283151667.24509.gz

I'm spinning another round of tests to try and see if anything is intermittent.
Attached patch Patch v3Splinter Review
Second run on try looks clean, so I'm putting this up for review.
Attachment #469087 - Attachment is obsolete: true
Attachment #470517 - Flags: review?(bent.mozilla)
Comment on attachment 470517 [details] [diff] [review]
Patch v3

You'll need an XPCOM peer to look this over.
Attachment #470517 - Flags: review?(bent.mozilla) → review?(benjamin)
Comment on attachment 470517 [details] [diff] [review]
Patch v3

Did you check generated assembly to make sure this was actually being inlined?
They're inlined in nsJSIID::AddRef/Release on my Linux x64 box.
Attachment #470517 - Flags: review?(benjamin)
Attachment #470517 - Flags: review+
Attachment #470517 - Flags: approval2.0+
http://hg.mozilla.org/mozilla-central/rev/93dab0790dca
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Blocks: 592557
Hey, cool! Glad this landed. Any idea how the perf worked out?
I looked over the talos numbers using [1] and didn't see anything move.  I'm not sure why when I initially looked on try I saw such a big difference.

[1]: http://services.forerunnerdesigns.com/compare-talos/
You need to log in before you can comment on or make changes to this bug.