Closed Bug 592557 Opened 14 years ago Closed 13 years ago

Eliminate uses of PR_Atomic{Increment,Decrement} functions in favor of PR_ATOMIC_{INCREMENT,DECREMENT} macros

Categories

(Core :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla5

People

(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)

References

Details

(Keywords: perf)

Attachments

(1 file, 1 obsolete file)

There are many uses of PR_Atomic{Increment,Decrement} left in the code.  Let's get rid of 'em.
Wow.  Some of these calls -- especially uses of atomic set -- are really fishy.

Feast your eyes, for instance, on

  http://mxr.mozilla.org/mozilla-central/source/security/manager/ssl/src/nsNSSComponent.cpp#300

I'm going to try to only modify code which is plausibly using the atomic ops correctly.
Just figured out an ugly Windows build error and pushed to try.  We'll see tomorrow morning what the perf looks like.

Based on the results from bug 587853, I'm not hopeful that this will speed things up a lot.  But I think it's still useful if only for hygiene.
Results from my try push are
Results from my try push are trickling in.  Looks like a 2% win on Dromaeo DOM on Linux 32- and 64-bit.  No results from other platforms yet.

http://bit.ly/btOXE3
Wait, where's your patch?
Depends on: 594829
Looks like an infra problem meant that I only got Linux builds for my latest push.  I'll post here when we get the rest of the results.
Attached patch Patch v1 (obsolete) — Splinter Review
Assignee: nobody → justin.lebar+bug
Status: NEW → ASSIGNED
New results coming in at

  http://bit.ly/d1SS2D

It's hard to tell what's statistically significant and what isn't.  For instance, the Dromaeo win from comment 4 has disappeared on Linux 32-bit.  No 64-bit results yet.
We're not going to get Linux-64 results on this push-to-try, due to another infra failure.
Miracle of miracles, we actually got a full try run with talos.  Well...mostly.  We're missing some Windows results.

  http://bit.ly/96LJ1t

This is clean on try, but it actually looks like a perf regression on a number of tests.  Maybe this is noise, but tsspider looks better, while dromaeo sunspider and dromaeo v8 look worse?  dromaeo_dom is 4.5% better on Win7, dromaeo_jslib is improved ~1% everywhere.

The results are all across the board.  I find it conceivable that this patch makes things slower, but it seems more likely that the data is unreliable.  I'm not sure how to proceed here...
Taras, maybe you have a better idea how to interpret these numbers?
(In reply to comment #11)
> Taras, maybe you have a better idea how to interpret these numbers?

Nope
Since we don't have numbers we're confident in, maybe we should wait until we branch, land on trunk, and see if we observe an improvement there.  If so, we can backport to 2.0 branch.
For kicks, I pushed to try to get new talos numbers.  But I must have gotten a dying mac slave, because I saw things like a 735% regression on ts_cold_generated_med.

The high points of the results are a 5.4% improvement in dromaeo_jslib on Win7 and WinXP, and a general 1.5-4% improvement in dromaeo_css across the board (except on mac-32, but again, maybe that's just weirdness).

I'll get new numbers and see if they're more sensible.

http://goo.gl/FgFH
-        PRInt32 n = PR_AtomicIncrement((PRInt32*)&_refc);                    \
+        PRInt32 n = NS_AtomicIncrementRefcnt(_refc);                          \

please eventually try to get the alignment of the slashes right :)

i think sometimes we find that changing from inline to function call changes a function's size and thus how the compiler handles inlining which affects the instruction cache.
(In reply to comment #15)
> please eventually try to get the alignment of the slashes right :)

Thanks for catching that.  :)

> i think sometimes we find that changing from inline to function call changes a
> function's size and thus how the compiler handles inlining which affects the
> instruction cache.

Note that here we're changing from a function call to an inline, not the other way around.  Since the inlined function consists of a single instruction, I doubt it's affecting the compiler's heuristics.  But who knows...
Here's another set of numbers: http://goo.gl/6HtG (missing some Mac-64 results).

The compare talos site just went down, but I'll analyze when it comes back up.
New links:

http://goo.gl/yOHX
http://goo.gl/mWO2

These two sets of tests were run on the exact same code.  The giant Mac-32 regressions from comment 14 didn't reproduce.  It looks like the second time, I got a bad Linux slave for the same set of tests, so Linux-32 shows a ~10% regression on a bunch of them.

Regressions common to both runs are:

* dromaeo_sunspider -- The first run shows a Win7 and Linux-32 regression of 11%, and the second shows Linux flat and Win7 still down 11%.  So I'd guess that the Windows result also noise.

* ts_cold_generated_max_shutdown -- There's a consistent large (40-16%) regression on mac-32 ts_cold_generated_max_shutdown.  Seeing as this test is clearly wonky, I imagine this is not meaningful.

* ts_shutdown -- 30-50% mac-64 regression.  All the other tests are flat, so I'd guess this doesn't mean anything.

Common upshots are a little less clear.  There's a consistent 12-14% improvement on WinXP dromaeo_v8, and about a 1% improvement on Win7 tsspider_nochrome.  Other tests show some amount of improvement, but it's hard to separate signal from noise.

I'm going to push to try again and see if those three common regressions go away.  If so, I'd be in favor of taking this, since unless something really weird is going on, it should make us faster.
Depends on: 601637
Attached patch Patch v1.1Splinter Review
Fixing whitespace (comment 15).
Attachment #473609 - Attachment is obsolete: true
Yeah, I know. The way I was thinking is that whereas before we had a fat function (AddRef/Release) calling another function (pr_inc/pr_dec), it couldn't be inlined elsewhere (everyone using addref/release). Potentially we now have a "seemingly" small function (addref/release) which could be inlined lots of places (everyone uses addref/release). In the case of AddRef/Release, you could actually be hurting the cpu cache since perhaps the classic addref/release were kept hot and now the inlined ones in lots of places wouldn't share a hot point.

If I'm right, it should be pretty easy to verify. But I don't have the resources to do that. I'm just back from vacation and have a month's worth of mail to read and projects to worry about.
I'd be really surprised if the wackyness we're seeing on Talos has to do with inlining effects.  If it were, I'd expect to see consistent slow/fastness on a given platform, not certain tests (shutdown tests??) running tens of percents slower.
More Talos runs have been added to http://goo.gl/mWO2

I've verified that the three regressions identified in comment 18 are due to outliers.

The large WinXP dromaeo_v8 win may also be the work of an outlier, and the tsspider_nochrome Win7 improvement seems marginal at best.

There's a fair amount of green (well, green on my copy of compare talos, red on sdwilsh's copy) in the dromaeo suite.  I'm not convinced that this is anything but noise, though.

The solution to this uncertainty, of course, is more Talos runs.  :)
Newest numbers are at http://goo.gl/mWO2 (same link as comment 22).

I now have enough Talos runs to say that I don't think this actually results in any perf regressions.

The large (~9%) WinXP dromaeo_v8 win persisted even after a third Talos run, so I judge that as likely to be real, although maybe not quite 9%.  There's also a more tenuous but solidly plausible ~3% Win XP/7 dromaeo_dom win.

With those numbers, I'm comfortable asking for review.
Attachment #480648 - Flags: review?(benjamin)
Keywords: perf
Comment on attachment 480648 [details] [diff] [review]
Patch v1.1

This is fine. Brendan or somebody from JS should probably rubberstamp this as well.
Attachment #480648 - Flags: review?(benjamin) → review+
Attachment #480648 - Flags: superreview?(brendan)
Comment on attachment 480648 [details] [diff] [review]
Patch v1.1

r=me for the js engine and xpconnect changes
Attachment #480648 - Flags: review+
Comment on attachment 480648 [details] [diff] [review]
Patch v1.1

Requesting a2.0.  Are we good to go here, or does this need another set of eyes?
Attachment #480648 - Flags: superreview?(brendan) → approval2.0?
More than good to go this point. Go!

/be
I'll land once the tree re-opens for non-b7 blockers.
Whiteboard: [check-in-after-b7]
This has nonzero risk and so I think we should wait.
Attachment #480648 - Flags: approval2.0? → approval2.0-
(In reply to comment #29)
> This has nonzero risk and so I think we should wait.

Until when?

If Talos is to believed, this is a sizable Windows V8 win.  If the numbers we have aren't convincing enough, I can have releng kick ten runs overnight so we can see.
Until after FF4 branches. I'm being conservative on approvals: this is a medium perf boost on a semi-contrived benchmark, and a small/medium risk that we got something wrong or that the behavior of the macro is different from the function call (or that the compiler will do optimize things incorrectly with the macro and skip necessary barriers). I'm willing to be overruled! ;-)
I definitely agree that there's risk, although if we're misunderstanding something about these macros, it's possible/likely that bug 587853 also messed something up.  But that bug changed much less than this bug, so I understand the conservatism.

We could reduce risk by making this patch smaller, since I imagine that most of these callsites aren't performance-relevant.  On the other hand, an advantage of using the macro most everywhere is that if it does do the Wrong Thing, we're more likely to notice that there's a problem.
The risk isn't unknowable, is it? I mean, we're in beta still. Let's get this in and then if we start noticing problems we can easily back this out.
I just re-ran the Talos numbers [1].

The big dromaeo-v8 improvement has gone away, I presume because the JS engine has gotten rid of atomic instructions in that benchmark's hot path.

It's still an improvement of 2% on dromaeo DOM and a 1.5% on dromaeo jslib on Windows.

Given that there's no longer a large perf win to be had, I'm OK waiting until after we branch.

[1] http://goo.gl/Bg0X3
Whiteboard: [check-in-after-b7] → [check-in-after-branch]
Blocks: post2.0
No longer blocks: post2.0
Depends on: post2.0
This patch doesn't apply cleanly on trunk any more.
Whiteboard: [check-in-after-branch] → [check-in-after-branch][not-ready]
Fixed in cedar: http://hg.mozilla.org/projects/cedar/rev/d49c938dbded
Whiteboard: [check-in-after-branch][not-ready] → [fixed-in-cedar]
http://hg.mozilla.org/mozilla-central/rev/d49c938dbded
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-cedar]
Target Milestone: --- → mozilla2.2
Blocks: 661596
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: