Closed Bug 629968 Opened 9 years ago Closed 9 years ago
Investigate possible regression (-j only) caused by changeset b90090c29571
AWFY had a small period of time (some five days between http://hg.mozilla.org/tracemonkey/rev/284811f39ca6 and http://hg.mozilla.org/tracemonkey/rev/b034f8e72b2f) where it didn't test any builds. The next build after this showed a huge -j SunSpider regression, which hg bisect pointed to http://hg.mozilla.org/tracemonkey/rev/b90090c29571 -- is this real? Here's the before and after on my computer (use http://pastebin.com/KBQ8J7fh if this is not readable enough). FROM is 4cda18415cd6 (b90090c29571's parent), and TO is b90090c29571. TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: *1.196x as slow* 392.7ms +/- 0.7% 469.7ms +/- 0.8% significant ============================================================================= 3d: ?? 82.5ms +/- 2.9% 84.1ms +/- 2.5% not conclusive: might be *1.019x as slow* cube: - 23.3ms +/- 8.9% 23.1ms +/- 7.4% morph: ?? 6.8ms +/- 3.6% 7.3ms +/- 10.4% not conclusive: might be *1.066x as slow* raytrace: *1.026x as slow* 52.5ms +/- 1.5% 53.8ms +/- 2.0% significant access: ?? 57.6ms +/- 0.9% 58.5ms +/- 1.4% not conclusive: might be *1.015x as slow* binary-trees: ?? 30.1ms +/- 0.9% 30.5ms +/- 1.3% not conclusive: might be *1.012x as slow* fannkuch: ?? 15.2ms +/- 1.3% 15.5ms +/- 2.5% not conclusive: might be *1.020x as slow* nbody: ?? 5.9ms +/- 2.4% 6.2ms +/- 3.7% not conclusive: might be *1.042x as slow* nsieve: - 6.3ms +/- 3.6% 6.3ms +/- 3.5% bitops: - 17.5ms +/- 2.0% 17.1ms +/- 3.2% 3bit-bits-in-byte: - 0.8ms +/- 24.0% 0.6ms +/- 39.1% bits-in-byte: - 8.8ms +/- 1.9% 8.7ms +/- 2.6% bitwise-and: - 1.9ms +/- 5.4% 1.9ms +/- 7.6% nsieve-bits: ?? 5.9ms +/- 2.4% 6.0ms +/- 3.1% not conclusive: might be *1.008x as slow* controlflow: 1.026x as fast 34.0ms +/- 0.8% 33.2ms +/- 1.0% significant recursive: 1.026x as fast 34.0ms +/- 0.8% 33.2ms +/- 1.0% significant crypto: *3.65x as slow* 23.4ms +/- 2.2% 85.7ms +/- 0.7% significant aes: *1.58x as slow* 12.2ms +/- 3.2% 19.3ms +/- 1.1% significant md5: *4.30x as slow* 7.5ms +/- 3.2% 32.5ms +/- 1.2% significant sha1: *9.18x as slow* 3.7ms +/- 5.9% 34.0ms +/- 1.0% significant date: ?? 49.0ms +/- 1.0% 49.3ms +/- 1.8% not conclusive: might be *1.007x as slow* format-tofte: - 32.4ms +/- 1.6% 32.3ms +/- 1.3% format-xparb: ?? 16.6ms +/- 1.4% 17.0ms +/- 4.2% not conclusive: might be *1.024x as slow* math: ?? 22.9ms +/- 1.2% 23.0ms +/- 1.6% not conclusive: might be *1.007x as slow* cordic: - 7.5ms +/- 3.2% 7.4ms +/- 3.2% partial-sums: *1.031x as slow* 11.3ms +/- 1.8% 11.6ms +/- 2.0% significant spectral-norm: - 4.2ms +/- 4.1% 4.0ms +/- 0.0% regexp: ?? 15.1ms +/- 1.0% 15.3ms +/- 1.7% not conclusive: might be *1.010x as slow* dna: ?? 15.1ms +/- 1.0% 15.3ms +/- 1.7% not conclusive: might be *1.010x as slow* string: *1.142x as slow* 90.7ms +/- 1.2% 103.6ms +/- 0.8% significant base64: ?? 3.9ms +/- 3.7% 4.0ms +/- 2.6% not conclusive: might be *1.013x as slow* fasta: - 16.7ms +/- 2.6% 16.6ms +/- 1.4% tagcloud: - 31.1ms +/- 1.5% 31.1ms +/- 1.1% unpack-code: ?? 30.5ms +/- 1.8% 30.6ms +/- 1.4% not conclusive: might be *1.003x as slow* validate-input: *2.51x as slow* 8.5ms +/- 2.8% 21.4ms +/- 1.1% significant
> 10ms on validate-input, and > 30ms on sha1. Looks like a pretty serious regression to me.
blocking2.0: --- → ?
-j only isn't the config we ship, unblocking.
blocking2.0: final+ → -
This didn't affect the tracer path only. Are you confident we didn't regress our total score with AWFY being down?
This is 284811f39ca6 vs b034f8e72b2f in the shell on my machine. I see a 13% improvement (with -j). ============================================================================= ** TOTAL **: 1.134x as fast 330.6ms +/- 0.3% 291.6ms +/- 0.8% significant ============================================================================= 3d: - 64.5ms +/- 1.7% 64.1ms +/- 2.1% cube: ?? 16.1ms +/- 3.5% 16.6ms +/- 5.7% not conclusive: might be *1.034x as slow* morph: ?? 5.5ms +/- 8.5% 5.7ms +/- 9.0% not conclusive: might be *1.027x as slow* raytrace: 1.026x as fast 42.9ms +/- 1.2% 41.8ms +/- 1.8% significant access: *1.017x as slow* 41.9ms +/- 0.7% 42.5ms +/- 0.9% significant binary-trees: *1.038x as slow* 19.8ms +/- 1.1% 20.5ms +/- 1.2% significant fannkuch: *1.021x as slow* 11.7ms +/- 1.9% 11.9ms +/- 0.9% significant nbody: - 4.7ms +/- 4.7% 4.7ms +/- 4.9% nsieve: - 5.7ms +/- 3.9% 5.5ms +/- 4.4% bitops: - 12.4ms +/- 3.9% 12.3ms +/- 3.5% 3bit-bits-in-byte: - 0.4ms +/- 58.7% 0.3ms +/- 83.0% bits-in-byte: ?? 6.8ms +/- 2.8% 6.8ms +/- 2.5% not conclusive: might be *1.007x as slow* bitwise-and: - 1.3ms +/- 16.9% 1.3ms +/- 16.6% nsieve-bits: - 4.0ms +/- 2.6% 4.0ms +/- 2.6% controlflow: *1.014x as slow* 20.8ms +/- 1.0% 21.1ms +/- 0.5% significant recursive: *1.014x as slow* 20.8ms +/- 1.0% 21.1ms +/- 0.5% significant crypto: 2.78x as fast 50.8ms +/- 0.6% 18.3ms +/- 2.0% significant aes: 1.48x as fast 14.0ms +/- 1.1% 9.4ms +/- 3.0% significant md5: 2.84x as fast 16.9ms +/- 1.2% 6.0ms +/- 1.8% significant sha1: 6.86x as fast 19.9ms +/- 0.7% 2.9ms +/- 5.0% significant date: ?? 38.7ms +/- 0.6% 39.8ms +/- 4.0% not conclusive: might be *1.028x as slow* format-tofte: ?? 23.8ms +/- 0.8% 24.6ms +/- 3.4% not conclusive: might be *1.034x as slow* format-xparb: ?? 14.9ms +/- 1.0% 15.2ms +/- 5.0% not conclusive: might be *1.020x as slow* math: ?? 15.5ms +/- 1.8% 15.6ms +/- 2.0% not conclusive: might be *1.006x as slow* cordic: ?? 4.7ms +/- 4.9% 4.7ms +/- 5.7% not conclusive: might be *1.011x as slow* partial-sums: ?? 7.3ms +/- 3.1% 7.5ms +/- 3.2% not conclusive: might be *1.014x as slow* spectral-norm: - 3.5ms +/- 6.8% 3.5ms +/- 6.9% regexp: ?? 12.7ms +/- 1.8% 12.8ms +/- 1.8% not conclusive: might be *1.016x as slow* dna: ?? 12.7ms +/- 1.8% 12.8ms +/- 1.8% not conclusive: might be *1.016x as slow* string: 1.129x as fast 73.4ms +/- 0.8% 65.0ms +/- 0.8% significant base64: *1.067x as slow* 3.0ms +/- 0.0% 3.2ms +/- 6.0% significant fasta: - 10.3ms +/- 2.2% 10.3ms +/- 2.2% tagcloud: *1.039x as slow* 24.3ms +/- 0.9% 25.3ms +/- 1.3% significant unpack-code: 1.035x as fast 22.1ms +/- 0.8% 21.4ms +/- 1.1% significant validate-input: 2.83x as fast 13.6ms +/- 1.7% 4.8ms +/- 4.0% significant
Yeah, I pasted the links in the wrong order, sorry about that. b034f8e72b2f is from Jan 14th (pre-regression), while 284811f39ca6 appeared on the tree on Jan 18th despite the Jan 14th stamp (post-regression).
So how does the -j -m -p difference look? And can we double-check the bisect result in comment 0? That looks pretty unlikely to me...
I know nothing about this code, unfortunately, but the only meaningful behavior (non-API) change in that patch, as far as I can see, is: + obj->methodWriteBarrier(cx, shape->slot, value); + obj->methodWriteBarrier(cx, *shape, value); Other than this change, I agree it's really unlikely that the bisected changeset could cause any slowdowns at all, but I tried reverting this one-line change manually and got the speed back. This doesn't look like a mistake (I would expect a type error judging by the names of the members, but that obviously doesn't happen so I'm pretty sure I don't get what's going on), but I couldn't keep my mouth shut so there you go.
Sorry, I meant - obj->methodWriteBarrier(cx, *shape, value); + obj->methodWriteBarrier(cx, shape->slot, value);
> but I tried reverting this one-line change manually and got the speed back. That counts as a double-check!
Oh, and methodWriteBarrier is overloaded; it has a signature that takes a uint32 slot, and another that takes a |const js::Shape&|. Which is why you don't get a type error.
Ugh. I did that so I could add the assertion immediately below. But I can live without the assertion if we're talking about a 13% performance hit. Sorry for the craziness. I'll revert.
Assignee: general → jorendorff
Any idea why there's a hit at all? Seems worth understanding (at least post-2.0).
bz gets points for spider-sense. Brendan and I just reached the same conclusion. The difference in speed between the two flavors of this method isn't enough to explain the huge hit. In fact, the one I switched to is supposedly the faster one. So what is happening? Possibly the two methods differ in behavior--one bumps the shape when the other wouldn't. That could easily cause a huge change in -j speed. But it would also probably be a bug. I have to look closer instead of just reverting the change. :-\
I just experimentally backed out b90090c29571 on my machine and got no measurable change. In fact if I I guess I'll try
In fact if I put a printf("DIE\n"); on that line of code and run all the sunspider tests, it doesn't hit. I guess I'll try re-bisecting the regression range of comment 4.
Bisecting definitely fingers b90090c29571. But I still can't find any difference between tip and tip-with-b90090c29571-backed-out.
I forgot to say that I tested applying that one-line change to b90090c29571's parent, not to tip. Sorry about that.
Jason, any tmstats diffs? /be
Is it possible that there's some sort of weird cache effect? How does tip compare to "right after b90090c29571 landed" perf-wise?
1. The two signatures of methodWriteBarrier do behave differently. That's probably bad and I'm going to track it down. 2. This line of code no longer executes in SunSpider, starting with http://hg.mozilla.org/tracemonkey/rev/e4d449cdd3be which was the fix for bug 577325. On my machine, SunSpider numbers are as follows: Parent of e4d449cdd3be: 472ms Parent of e4d449cdd3be + change to use shape->slot: 401ms Rev e4d449cdd3be: 467ms Rev e4d449cdd3be + change to use shape->slot: (no change) 469ms
The (const Shape &) signature bumps thrashCount whenever you redefine an existing method. If you're redefining global functions, eventually it unbrands the global object, which is pretty catastrophic--we basically don't trace then. The (uint32 slot) signature instead detects that the global object doesn't have a method barrier, and calls setOwnShape(), apparently in order to avoid the shape lookup.
Why don't we trace after unbranding?
And did bug 577325 make us unbrand no matter what?
Is this still a guaranteed b-? This sounds like a significant regression to me.
This sure feels like a blocker to me. I imagine thrashCount was meant to count all methodShapeChanges, not just the ones that happen to go through the methodShapeChange(JSContext *, const Shape &) signature. But if we make that change and don't retune, SunSpider apparently suffers. I have an unintrusive fix for the perf regression (coming soon), but to me it seems like thrashCount is messed up. I hope Brendan will take a look at it.
The short explanation for this patch is that the two signatures of methodWriteBarrier are meant to be the same, but there's a bug in the slot signature that makes us faster on SunSpider if we call it in this one place. It is nonetheless a bug. Filed bug 630354.
Comment on attachment 508536 [details] [diff] [review] v1 Please cite the thrashCount bug you filed in a FIXME. Sorry for the mess, agree we need to take this now, if not fix both bugs. /be
Should I add the checkin-needed keyword to this bug?
Jason, are you going to land this?
cdleary-bot mozilla-central merge info: http://hg.mozilla.org/mozilla-central/rev/3d78bfa161eb
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.