Closed Bug 629968 Opened 9 years ago Closed 9 years ago

Investigate possible regression (-j only) caused by changeset b90090c29571

Categories

(Core :: JavaScript Engine, defect)

x86
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- -

People

(Reporter: collares, Assigned: jorendorff)

References

Details

(Keywords: regression, Whiteboard: [fixed-in-tracemonkey])

Attachments

(1 file)

AWFY had a small period of time (some five days between http://hg.mozilla.org/tracemonkey/rev/284811f39ca6 and
http://hg.mozilla.org/tracemonkey/rev/b034f8e72b2f) where it didn't test any builds. The next build after this showed a huge -j SunSpider regression, which hg bisect pointed to http://hg.mozilla.org/tracemonkey/rev/b90090c29571 -- is this real?

Here's the before and after on my computer (use http://pastebin.com/KBQ8J7fh if this is not readable enough). FROM is 4cda18415cd6 (b90090c29571's parent), and TO is b90090c29571.


TEST                   COMPARISON            FROM                 TO             DETAILS

=============================================================================

** TOTAL **:           *1.196x as slow*  392.7ms +/- 0.7%   469.7ms +/- 0.8%     significant

=============================================================================

  3d:                  ??                 82.5ms +/- 2.9%    84.1ms +/- 2.5%     not conclusive: might be *1.019x as slow*
    cube:              -                  23.3ms +/- 8.9%    23.1ms +/- 7.4% 
    morph:             ??                  6.8ms +/- 3.6%     7.3ms +/- 10.4%     not conclusive: might be *1.066x as slow*
    raytrace:          *1.026x as slow*   52.5ms +/- 1.5%    53.8ms +/- 2.0%     significant

  access:              ??                 57.6ms +/- 0.9%    58.5ms +/- 1.4%     not conclusive: might be *1.015x as slow*
    binary-trees:      ??                 30.1ms +/- 0.9%    30.5ms +/- 1.3%     not conclusive: might be *1.012x as slow*
    fannkuch:          ??                 15.2ms +/- 1.3%    15.5ms +/- 2.5%     not conclusive: might be *1.020x as slow*
    nbody:             ??                  5.9ms +/- 2.4%     6.2ms +/- 3.7%     not conclusive: might be *1.042x as slow*
    nsieve:            -                   6.3ms +/- 3.6%     6.3ms +/- 3.5% 

  bitops:              -                  17.5ms +/- 2.0%    17.1ms +/- 3.2% 
    3bit-bits-in-byte: -                   0.8ms +/- 24.0%     0.6ms +/- 39.1% 
    bits-in-byte:      -                   8.8ms +/- 1.9%     8.7ms +/- 2.6% 
    bitwise-and:       -                   1.9ms +/- 5.4%     1.9ms +/- 7.6% 
    nsieve-bits:       ??                  5.9ms +/- 2.4%     6.0ms +/- 3.1%     not conclusive: might be *1.008x as slow*

  controlflow:         1.026x as fast     34.0ms +/- 0.8%    33.2ms +/- 1.0%     significant
    recursive:         1.026x as fast     34.0ms +/- 0.8%    33.2ms +/- 1.0%     significant

  crypto:              *3.65x as slow*    23.4ms +/- 2.2%    85.7ms +/- 0.7%     significant
    aes:               *1.58x as slow*    12.2ms +/- 3.2%    19.3ms +/- 1.1%     significant
    md5:               *4.30x as slow*     7.5ms +/- 3.2%    32.5ms +/- 1.2%     significant
    sha1:              *9.18x as slow*     3.7ms +/- 5.9%    34.0ms +/- 1.0%     significant

  date:                ??                 49.0ms +/- 1.0%    49.3ms +/- 1.8%     not conclusive: might be *1.007x as slow*
    format-tofte:      -                  32.4ms +/- 1.6%    32.3ms +/- 1.3% 
    format-xparb:      ??                 16.6ms +/- 1.4%    17.0ms +/- 4.2%     not conclusive: might be *1.024x as slow*

  math:                ??                 22.9ms +/- 1.2%    23.0ms +/- 1.6%     not conclusive: might be *1.007x as slow*
    cordic:            -                   7.5ms +/- 3.2%     7.4ms +/- 3.2% 
    partial-sums:      *1.031x as slow*   11.3ms +/- 1.8%    11.6ms +/- 2.0%     significant
    spectral-norm:     -                   4.2ms +/- 4.1%     4.0ms +/- 0.0% 

  regexp:              ??                 15.1ms +/- 1.0%    15.3ms +/- 1.7%     not conclusive: might be *1.010x as slow*
    dna:               ??                 15.1ms +/- 1.0%    15.3ms +/- 1.7%     not conclusive: might be *1.010x as slow*

  string:              *1.142x as slow*   90.7ms +/- 1.2%   103.6ms +/- 0.8%     significant
    base64:            ??                  3.9ms +/- 3.7%     4.0ms +/- 2.6%     not conclusive: might be *1.013x as slow*
    fasta:             -                  16.7ms +/- 2.6%    16.6ms +/- 1.4% 
    tagcloud:          -                  31.1ms +/- 1.5%    31.1ms +/- 1.1% 
    unpack-code:       ??                 30.5ms +/- 1.8%    30.6ms +/- 1.4%     not conclusive: might be *1.003x as slow*
    validate-input:    *2.51x as slow*     8.5ms +/- 2.8%    21.4ms +/- 1.1%     significant
> 10ms on validate-input, and > 30ms on sha1. Looks like a pretty serious regression to me.
blocking2.0: --- → ?
Whiteboard: [softblocker]
blocking2.0: ? → final+
Assignee: jorendorff → general
-j only isn't the config we ship, unblocking.
blocking2.0: final+ → -
Whiteboard: [softblocker]
This didn't affect the tracer path only. Are you confident we didn't regress our total score with AWFY being down?
This is 284811f39ca6 vs b034f8e72b2f in the shell on my machine. I see a 13% improvement (with -j).

=============================================================================

** TOTAL **:           1.134x as fast    330.6ms +/- 0.3%   291.6ms +/- 0.8%     significant

=============================================================================

  3d:                  -                  64.5ms +/- 1.7%    64.1ms +/- 2.1% 
    cube:              ??                 16.1ms +/- 3.5%    16.6ms +/- 5.7%     not conclusive: might be *1.034x as slow*
    morph:             ??                  5.5ms +/- 8.5%     5.7ms +/- 9.0%     not conclusive: might be *1.027x as slow*
    raytrace:          1.026x as fast     42.9ms +/- 1.2%    41.8ms +/- 1.8%     significant

  access:              *1.017x as slow*   41.9ms +/- 0.7%    42.5ms +/- 0.9%     significant
    binary-trees:      *1.038x as slow*   19.8ms +/- 1.1%    20.5ms +/- 1.2%     significant
    fannkuch:          *1.021x as slow*   11.7ms +/- 1.9%    11.9ms +/- 0.9%     significant
    nbody:             -                   4.7ms +/- 4.7%     4.7ms +/- 4.9% 
    nsieve:            -                   5.7ms +/- 3.9%     5.5ms +/- 4.4% 

  bitops:              -                  12.4ms +/- 3.9%    12.3ms +/- 3.5% 
    3bit-bits-in-byte: -                   0.4ms +/- 58.7%     0.3ms +/- 83.0% 
    bits-in-byte:      ??                  6.8ms +/- 2.8%     6.8ms +/- 2.5%     not conclusive: might be *1.007x as slow*
    bitwise-and:       -                   1.3ms +/- 16.9%     1.3ms +/- 16.6% 
    nsieve-bits:       -                   4.0ms +/- 2.6%     4.0ms +/- 2.6% 

  controlflow:         *1.014x as slow*   20.8ms +/- 1.0%    21.1ms +/- 0.5%     significant
    recursive:         *1.014x as slow*   20.8ms +/- 1.0%    21.1ms +/- 0.5%     significant

  crypto:              2.78x as fast      50.8ms +/- 0.6%    18.3ms +/- 2.0%     significant
    aes:               1.48x as fast      14.0ms +/- 1.1%     9.4ms +/- 3.0%     significant
    md5:               2.84x as fast      16.9ms +/- 1.2%     6.0ms +/- 1.8%     significant
    sha1:              6.86x as fast      19.9ms +/- 0.7%     2.9ms +/- 5.0%     significant

  date:                ??                 38.7ms +/- 0.6%    39.8ms +/- 4.0%     not conclusive: might be *1.028x as slow*
    format-tofte:      ??                 23.8ms +/- 0.8%    24.6ms +/- 3.4%     not conclusive: might be *1.034x as slow*
    format-xparb:      ??                 14.9ms +/- 1.0%    15.2ms +/- 5.0%     not conclusive: might be *1.020x as slow*

  math:                ??                 15.5ms +/- 1.8%    15.6ms +/- 2.0%     not conclusive: might be *1.006x as slow*
    cordic:            ??                  4.7ms +/- 4.9%     4.7ms +/- 5.7%     not conclusive: might be *1.011x as slow*
    partial-sums:      ??                  7.3ms +/- 3.1%     7.5ms +/- 3.2%     not conclusive: might be *1.014x as slow*
    spectral-norm:     -                   3.5ms +/- 6.8%     3.5ms +/- 6.9% 

  regexp:              ??                 12.7ms +/- 1.8%    12.8ms +/- 1.8%     not conclusive: might be *1.016x as slow*
    dna:               ??                 12.7ms +/- 1.8%    12.8ms +/- 1.8%     not conclusive: might be *1.016x as slow*

  string:              1.129x as fast     73.4ms +/- 0.8%    65.0ms +/- 0.8%     significant
    base64:            *1.067x as slow*    3.0ms +/- 0.0%     3.2ms +/- 6.0%     significant
    fasta:             -                  10.3ms +/- 2.2%    10.3ms +/- 2.2% 
    tagcloud:          *1.039x as slow*   24.3ms +/- 0.9%    25.3ms +/- 1.3%     significant
    unpack-code:       1.035x as fast     22.1ms +/- 0.8%    21.4ms +/- 1.1%     significant
    validate-input:    2.83x as fast      13.6ms +/- 1.7%     4.8ms +/- 4.0%     significant
Yeah, I pasted the links in the wrong order, sorry about that. b034f8e72b2f is from Jan 14th (pre-regression), while 284811f39ca6 appeared on the tree on Jan 18th despite the Jan 14th stamp (post-regression).
So how does the -j -m -p difference look?

And can we double-check the bisect result in comment 0?  That looks pretty unlikely to me...
I know nothing about this code, unfortunately, but the only meaningful behavior (non-API) change in that patch, as far as I can see, is:

+                obj->methodWriteBarrier(cx, shape->slot, value);
+                obj->methodWriteBarrier(cx, *shape, value);

Other than this change, I agree it's really unlikely that the bisected changeset could cause any slowdowns at all, but I tried reverting this one-line change manually and got the speed back. 

This doesn't look like a mistake (I would expect a type error judging by the names of the members, but that obviously doesn't happen so I'm pretty sure I don't get what's going on), but I couldn't keep my mouth shut so there you go.
Sorry, I meant

-                obj->methodWriteBarrier(cx, *shape, value);
+                obj->methodWriteBarrier(cx, shape->slot, value);
> but I tried reverting this one-line change manually and got the speed back. 

That counts as a double-check!
Blocks: 623863
Oh, and methodWriteBarrier is overloaded; it has a signature that takes a uint32 slot, and another that takes a |const js::Shape&|.  Which is why you don't get a type error.
Ugh. I did that so I could add the assertion immediately below. But I can live without the assertion if we're talking about a 13% performance hit. Sorry for the craziness. I'll revert.
Assignee: general → jorendorff
Any idea why there's a hit at all?  Seems worth understanding (at least post-2.0).
bz gets points for spider-sense.

Brendan and I just reached the same conclusion. The difference in speed between the two flavors of this method isn't enough to explain the huge hit. In fact, the one I switched to is supposedly the faster one.

So what is happening?

Possibly the two methods differ in behavior--one bumps the shape when the other wouldn't. That could easily cause a huge change in -j speed. But it would also probably be a bug. I have to look closer instead of just reverting the change. :-\
I just experimentally backed out b90090c29571 on my machine and got no measurable change. In fact if I I guess I'll try
In fact if I put a printf("DIE\n"); on that line of code and run all the sunspider tests, it doesn't hit.

I guess I'll try re-bisecting the regression range of comment 4.
Bisecting definitely fingers b90090c29571.

But I still can't find any difference between tip and tip-with-b90090c29571-backed-out.
I forgot to say that I tested applying that one-line change to b90090c29571's parent, not to tip. Sorry about that.
Jason, any tmstats diffs?

/be
Is it possible that there's some sort of weird cache effect?  How does tip compare to "right after b90090c29571 landed" perf-wise?
 1. The two signatures of methodWriteBarrier do behave differently. That's
    probably bad and I'm going to track it down.

 2. This line of code no longer executes in SunSpider, starting with
      http://hg.mozilla.org/tracemonkey/rev/e4d449cdd3be
    which was the fix for bug 577325.

On my machine, SunSpider numbers are as follows:

  Parent of e4d449cdd3be:                                        472ms
  Parent of e4d449cdd3be + change to use shape->slot:            401ms
  Rev e4d449cdd3be:                                              467ms
  Rev e4d449cdd3be + change to use shape->slot: (no change)      469ms
The (const Shape &) signature bumps thrashCount whenever you redefine an existing method. If you're redefining global functions, eventually it unbrands the global object, which is pretty catastrophic--we basically don't trace then.

The (uint32 slot) signature instead detects that the global object doesn't have a method barrier, and calls setOwnShape(), apparently in order to avoid the shape lookup.
Why don't we trace after unbranding?
And did bug 577325 make us unbrand no matter what?
Is this still a guaranteed b-? This sounds like a significant regression to me.
This sure feels like a blocker to me.

I imagine thrashCount was meant to count all methodShapeChanges, not just the ones that happen to go through the methodShapeChange(JSContext *, const Shape &) signature. But if we make that change and don't retune, SunSpider apparently suffers.

I have an unintrusive fix for the perf regression (coming soon), but to me it seems like thrashCount is messed up. I hope Brendan will take a look at it.
Attached patch v1Splinter Review
The short explanation for this patch is that the two signatures of methodWriteBarrier are meant to be the same, but there's a bug in the slot signature that makes us faster on SunSpider if we call it in this one place.

It is nonetheless a bug. Filed bug 630354.
Attachment #508536 - Flags: review?(gal)
Comment on attachment 508536 [details] [diff] [review]
v1

Please cite the thrashCount bug you filed in a FIXME. Sorry for the mess, agree we need to take this now, if not fix both bugs.

/be
Attachment #508536 - Flags: review?(gal)
Attachment #508536 - Flags: review+
Attachment #508536 - Flags: approval2.0?
Attachment #508536 - Flags: approval2.0? → approval2.0+
Should I add the checkin-needed keyword to this bug?
Jason, are you going to land this?
http://hg.mozilla.org/tracemonkey/rev/3d78bfa161eb
Whiteboard: [fixed-in-tracemonkey]
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.