592557 - Eliminate uses of PR_Atomic{Increment,Decrement} functions in favor of PR_ATOMIC_{INCREMENT,DECREMENT} macros

Justin Lebar (not reading bugmail)

Assignee

Description

•

15 years ago

There are many uses of PR_Atomic{Increment,Decrement} left in the code. Let's get rid of 'em.

Justin Lebar (not reading bugmail)

Assignee

Comment 1

•

15 years ago

Wow. Some of these calls -- especially uses of atomic set -- are really fishy. Feast your eyes, for instance, on http://mxr.mozilla.org/mozilla-central/source/security/manager/ssl/src/nsNSSComponent.cpp#300 I'm going to try to only modify code which is plausibly using the atomic ops correctly.

Justin Lebar (not reading bugmail)

Assignee

Comment 2

•

15 years ago

Just figured out an ugly Windows build error and pushed to try. We'll see tomorrow morning what the perf looks like. Based on the results from bug 587853, I'm not hopeful that this will speed things up a lot. But I think it's still useful if only for hygiene.

Justin Lebar (not reading bugmail)

Assignee

Comment 3

•

15 years ago

Results from my try push are

Justin Lebar (not reading bugmail)

Assignee

Comment 4

•

15 years ago

Results from my try push are trickling in. Looks like a 2% win on Dromaeo DOM on Linux 32- and 64-bit. No results from other platforms yet. http://bit.ly/btOXE3

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 5

•

15 years ago

Wait, where's your patch?

Justin Lebar (not reading bugmail)

Assignee

Updated

•

15 years ago

Depends on: 594829

Justin Lebar (not reading bugmail)

Assignee

Comment 6

•

15 years ago

Looks like an infra problem meant that I only got Linux builds for my latest push. I'll post here when we get the rest of the results.

Justin Lebar (not reading bugmail)

Assignee

Comment 7

•

15 years ago

Attached patch Patch v1 (obsolete) — Details — Splinter Review

Assignee: nobody → justin.lebar+bug

Status: NEW → ASSIGNED

Justin Lebar (not reading bugmail)

Assignee

Comment 8

•

15 years ago

New results coming in at http://bit.ly/d1SS2D It's hard to tell what's statistically significant and what isn't. For instance, the Dromaeo win from comment 4 has disappeared on Linux 32-bit. No 64-bit results yet.

Justin Lebar (not reading bugmail)

Assignee

Comment 9

•

15 years ago

We're not going to get Linux-64 results on this push-to-try, due to another infra failure.

Justin Lebar (not reading bugmail)

Assignee

Comment 10

•

15 years ago

Miracle of miracles, we actually got a full try run with talos. Well...mostly. We're missing some Windows results. http://bit.ly/96LJ1t This is clean on try, but it actually looks like a perf regression on a number of tests. Maybe this is noise, but tsspider looks better, while dromaeo sunspider and dromaeo v8 look worse? dromaeo_dom is 4.5% better on Win7, dromaeo_jslib is improved ~1% everywhere. The results are all across the board. I find it conceivable that this patch makes things slower, but it seems more likely that the data is unreliable. I'm not sure how to proceed here...

Justin Lebar (not reading bugmail)

Assignee

Comment 11

•

15 years ago

Taras, maybe you have a better idea how to interpret these numbers?

(dormant account)

Comment 12

•

15 years ago

(In reply to comment #11) > Taras, maybe you have a better idea how to interpret these numbers? Nope

Justin Lebar (not reading bugmail)

Assignee

Comment 13

•

15 years ago

Since we don't have numbers we're confident in, maybe we should wait until we branch, land on trunk, and see if we observe an improvement there. If so, we can backport to 2.0 branch.

Justin Lebar (not reading bugmail)

Assignee

Comment 14

•

15 years ago

For kicks, I pushed to try to get new talos numbers. But I must have gotten a dying mac slave, because I saw things like a 735% regression on ts_cold_generated_med. The high points of the results are a 5.4% improvement in dromaeo_jslib on Win7 and WinXP, and a general 1.5-4% improvement in dromaeo_css across the board (except on mac-32, but again, maybe that's just weirdness). I'll get new numbers and see if they're more sensible. http://goo.gl/FgFH

timeless

Comment 15

•

15 years ago

- PRInt32 n = PR_AtomicIncrement((PRInt32*)&_refc); \ + PRInt32 n = NS_AtomicIncrementRefcnt(_refc); \ please eventually try to get the alignment of the slashes right :) i think sometimes we find that changing from inline to function call changes a function's size and thus how the compiler handles inlining which affects the instruction cache.

Justin Lebar (not reading bugmail)

Assignee

Comment 16

•

15 years ago

(In reply to comment #15) > please eventually try to get the alignment of the slashes right :) Thanks for catching that. :) > i think sometimes we find that changing from inline to function call changes a > function's size and thus how the compiler handles inlining which affects the > instruction cache. Note that here we're changing from a function call to an inline, not the other way around. Since the inlined function consists of a single instruction, I doubt it's affecting the compiler's heuristics. But who knows...

Justin Lebar (not reading bugmail)

Assignee

Comment 17

•

15 years ago

Here's another set of numbers: http://goo.gl/6HtG (missing some Mac-64 results). The compare talos site just went down, but I'll analyze when it comes back up.

Justin Lebar (not reading bugmail)

Assignee

Comment 18

•

15 years ago

New links: http://goo.gl/yOHX http://goo.gl/mWO2 These two sets of tests were run on the exact same code. The giant Mac-32 regressions from comment 14 didn't reproduce. It looks like the second time, I got a bad Linux slave for the same set of tests, so Linux-32 shows a ~10% regression on a bunch of them. Regressions common to both runs are: * dromaeo_sunspider -- The first run shows a Win7 and Linux-32 regression of 11%, and the second shows Linux flat and Win7 still down 11%. So I'd guess that the Windows result also noise. * ts_cold_generated_max_shutdown -- There's a consistent large (40-16%) regression on mac-32 ts_cold_generated_max_shutdown. Seeing as this test is clearly wonky, I imagine this is not meaningful. * ts_shutdown -- 30-50% mac-64 regression. All the other tests are flat, so I'd guess this doesn't mean anything. Common upshots are a little less clear. There's a consistent 12-14% improvement on WinXP dromaeo_v8, and about a 1% improvement on Win7 tsspider_nochrome. Other tests show some amount of improvement, but it's hard to separate signal from noise. I'm going to push to try again and see if those three common regressions go away. If so, I'd be in favor of taking this, since unless something really weird is going on, it should make us faster.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

15 years ago

Depends on: 601637

Justin Lebar (not reading bugmail)

Assignee

Comment 19

•

15 years ago

Attached patch Patch v1.1 — Details — Splinter Review

Fixing whitespace (comment 15).

Attachment #473609 - Attachment is obsolete: true

timeless

Comment 20

•

15 years ago

Yeah, I know. The way I was thinking is that whereas before we had a fat function (AddRef/Release) calling another function (pr_inc/pr_dec), it couldn't be inlined elsewhere (everyone using addref/release). Potentially we now have a "seemingly" small function (addref/release) which could be inlined lots of places (everyone uses addref/release). In the case of AddRef/Release, you could actually be hurting the cpu cache since perhaps the classic addref/release were kept hot and now the inlined ones in lots of places wouldn't share a hot point. If I'm right, it should be pretty easy to verify. But I don't have the resources to do that. I'm just back from vacation and have a month's worth of mail to read and projects to worry about.

Justin Lebar (not reading bugmail)

Assignee

Comment 21

•

15 years ago

I'd be really surprised if the wackyness we're seeing on Talos has to do with inlining effects. If it were, I'd expect to see consistent slow/fastness on a given platform, not certain tests (shutdown tests??) running tens of percents slower.

Justin Lebar (not reading bugmail)

Assignee

Comment 22

•

15 years ago

More Talos runs have been added to http://goo.gl/mWO2 I've verified that the three regressions identified in comment 18 are due to outliers. The large WinXP dromaeo_v8 win may also be the work of an outlier, and the tsspider_nochrome Win7 improvement seems marginal at best. There's a fair amount of green (well, green on my copy of compare talos, red on sdwilsh's copy) in the dromaeo suite. I'm not convinced that this is anything but noise, though. The solution to this uncertainty, of course, is more Talos runs. :)

Justin Lebar (not reading bugmail)

Assignee

Comment 23

•

15 years ago

Newest numbers are at http://goo.gl/mWO2 (same link as comment 22). I now have enough Talos runs to say that I don't think this actually results in any perf regressions. The large (~9%) WinXP dromaeo_v8 win persisted even after a third Talos run, so I judge that as likely to be real, although maybe not quite 9%. There's also a more tenuous but solidly plausible ~3% Win XP/7 dromaeo_dom win. With those numbers, I'm comfortable asking for review.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

15 years ago

Attachment #480648 - Flags: review?(benjamin)

Justin Lebar (not reading bugmail)

Assignee

Updated

•

15 years ago

Keywords: perf

Benjamin Smedberg

Comment 24

•

15 years ago

Comment on attachment 480648 [details] [diff] [review] Patch v1.1 This is fine. Brendan or somebody from JS should probably rubberstamp this as well.

Attachment #480648 - Flags: review?(benjamin) → review+

Justin Lebar (not reading bugmail)

Assignee

Updated

•

15 years ago

Attachment #480648 - Flags: superreview?(brendan)

Andreas Gal :gal

Comment 25

•

15 years ago

Comment on attachment 480648 [details] [diff] [review] Patch v1.1 r=me for the js engine and xpconnect changes

Attachment #480648 - Flags: review+

Justin Lebar (not reading bugmail)

Assignee

Comment 26

•

15 years ago

Comment on attachment 480648 [details] [diff] [review] Patch v1.1 Requesting a2.0. Are we good to go here, or does this need another set of eyes?

Attachment #480648 - Flags: superreview?(brendan) → approval2.0?

Brendan Eich [:brendan]

Comment 27

•

15 years ago

More than good to go this point. Go! /be

Justin Lebar (not reading bugmail)

Assignee

Comment 28

•

15 years ago

I'll land once the tree re-opens for non-b7 blockers.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

15 years ago

Whiteboard: [check-in-after-b7]

Benjamin Smedberg

Comment 29

•

15 years ago

This has nonzero risk and so I think we should wait.

Attachment #480648 - Flags: approval2.0? → approval2.0-

Justin Lebar (not reading bugmail)

Assignee

Comment 30

•

15 years ago

(In reply to comment #29) > This has nonzero risk and so I think we should wait. Until when? If Talos is to believed, this is a sizable Windows V8 win. If the numbers we have aren't convincing enough, I can have releng kick ten runs overnight so we can see.

Benjamin Smedberg

Comment 31

•

15 years ago

Until after FF4 branches. I'm being conservative on approvals: this is a medium perf boost on a semi-contrived benchmark, and a small/medium risk that we got something wrong or that the behavior of the macro is different from the function call (or that the compiler will do optimize things incorrectly with the macro and skip necessary barriers). I'm willing to be overruled! ;-)

Justin Lebar (not reading bugmail)

Assignee

Comment 32

•

15 years ago

I definitely agree that there's risk, although if we're misunderstanding something about these macros, it's possible/likely that bug 587853 also messed something up. But that bug changed much less than this bug, so I understand the conservatism. We could reduce risk by making this patch smaller, since I imagine that most of these callsites aren't performance-relevant. On the other hand, an advantage of using the macro most everywhere is that if it does do the Wrong Thing, we're more likely to notice that there's a problem.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 33

•

15 years ago

The risk isn't unknowable, is it? I mean, we're in beta still. Let's get this in and then if we start noticing problems we can easily back this out.

Justin Lebar (not reading bugmail)

Assignee

Comment 34

•

15 years ago

I just re-ran the Talos numbers [1]. The big dromaeo-v8 improvement has gone away, I presume because the JS engine has gotten rid of atomic instructions in that benchmark's hot path. It's still an improvement of 2% on dromaeo DOM and a 1.5% on dromaeo jslib on Windows. Given that there's no longer a large perf win to be had, I'm OK waiting until after we branch. [1] http://goo.gl/Bg0X3

Justin Lebar (not reading bugmail)

Assignee

Updated

•

15 years ago

Whiteboard: [check-in-after-b7] → [check-in-after-branch]

Justin Lebar (not reading bugmail)

Assignee

Updated

•

15 years ago

Blocks: post2.0

Philip Chee

Updated

•

15 years ago

No longer blocks: post2.0

Depends on: post2.0

(no longer active)

Comment 35

•

14 years ago

This patch doesn't apply cleanly on trunk any more.

Whiteboard: [check-in-after-branch] → [check-in-after-branch][not-ready]

Justin Lebar (not reading bugmail)

Assignee

Comment 36

•

14 years ago

Fixed in cedar: http://hg.mozilla.org/projects/cedar/rev/d49c938dbded

Whiteboard: [check-in-after-branch][not-ready] → [fixed-in-cedar]

(no longer active)

Comment 37

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/d49c938dbded

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Whiteboard: [fixed-in-cedar]

Target Milestone: --- → mozilla2.2

Justin Lebar (not reading bugmail)

Assignee

Updated

•

14 years ago

Blocks: 661596

Patch v1 15 years ago Justin Lebar (not reading bugmail) 44.45 KB, patch		Details \| Diff \| Splinter Review
Patch v1.1 15 years ago Justin Lebar (not reading bugmail) 44.60 KB, patch	benjamin : review+ gal : review+ benjamin : approval2.0-	Details \| Diff \| Splinter Review