Closed Bug 515875 Opened 15 years ago Closed 14 years ago

TM: bits-in-byte perf regression between r32000 and r32221

Categories: Core :: JavaScript Engine, defect
Platform: x86 Linux; Priority: Not set; Severity: normal

Tracking: Status RESOLVED WONTFIX

People: Reporter: n.nethercote; Assignee: Unassigned

Attachments: 1 file

Bug 515871 has the details.  I'll try to narrow the regression window down.
Here are the measurements I took while bisecting (a rough sketch of one measurement step follows the list):

r32000:  8.5ms
r32050:  8.6ms
r32062:  8.9ms
r32069:  8.5ms
r32071:  8.5ms
r32072: 13.3ms  (regression)
r32075: 13.3ms
r32100: 12.6ms
r32115: 12.6ms
r32116: 12.6ms
r32117: busted
r32118: busted  (improvement, probably)
r32119: 10.1ms
r32122: 10.2ms
r32130: 10.0ms
r32145: 10.0ms
r32152: 10.1ms
r32154: 10.1ms
r32155: 10.1ms
r32156:  9.7ms  (improvement)
r32160:  9.6ms
r32221:  9.6ms
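
For context, each data point above came from rebuilding the JS shell at the given revision and re-timing bits-in-byte. A rough sketch of one such step is below; every path, flag, and the SunSpider checkout location are assumptions about the setup, not details taken from this bug:

hg update -r 32072                      # check out the candidate revision
(cd js/src && autoconf2.13 && ./configure --enable-optimize --disable-debug && make)
# time the single SunSpider test with the tracing JIT enabled (-j)
./sunspider --shell=/path/to/tracemonkey/js/src/js --args=-j --tests=bitops-bits-in-byte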


The relevant changes are:

changeset:   32072:1e581a31c017
user:        Andreas Gal <gal@mozilla.com>
date:        Thu Aug 27 18:46:45 2009 -0700
summary:     Remove explicitSavedRegs and loop hacks from nanojit (513139, r=dvander).

changeset:   32118:85e72fce5a9c
user:        Andreas Gal <gal@mozilla.com>
date:        Tue Sep 01 16:30:51 2009 -0700
summary:     Register allocate loop-spanning references (513843, r=rreitmai).

changeset:   32156:45772700955a
user:        Igor Bukanov <igor@mir2.org>
date:        Sat Sep 05 19:59:11 2009 +0400
summary:     bug 513190 - avoiding jsint tagging of the private slot data. r=jorendorff

Andreas, you want to take a look and see if there's anything that can be done about this?
Nick, I think bug 514102 fixes this small regression. It's due to changes to the way we express the loop edge, which in turn exposes a weakness in the register allocator that 514102 fixes. Want to give it a try on your test rig?
Depends on: 514102
I tried applying that patch to tip.  It had rotted a bit, but I think I fixed it up ok (trace-test passed).  With it I get a 1.11x (1ms) *slowdown* for bits-in-byte.
Here are my results with the patch:

Testcase: [I made it run a lot longer to be able to measure more precisely]

function bitsinbyte(b) {
    var m = 1, c = 0;
    while (m < 0x100) {
        if (b & m) c++;
        m <<= 1;
    }
    return c;
}

function TimeFunc(func) {
    var x, y, t;
    for (var x = 0; x < 3000*2; x++)
        for (var y = 0; y < 2560*2; y++) func(y);
}

TimeFunc(bitsinbyte);

Results without patch:

whale:src gal$ time ./Darwin_OPT.OBJ/js -j x.js

real	0m3.261s
user	0m3.223s
sys	0m0.012s
whale:src gal$ time ./Darwin_OPT.OBJ/js -j x.js

real	0m3.260s
user	0m3.227s
sys	0m0.013s

With patch:

whale:src gal$ time ./Darwin_OPT.OBJ/js -j x.js

real	0m3.072s
user	0m3.042s
sys	0m0.016s
whale:src gal$ time ./Darwin_OPT.OBJ/js -j x.js

real	0m3.069s
user	0m3.041s
sys	0m0.012s

So no way I am making bits-in-byte slower with bug 514102. Visually, the generated code also looks better.
I just reported what I measured.  If you can update the patch on bug 514102 then I can test with more confidence.
Yeah, I know. This just shows how flawed the SS harness is. Also, if I made compilation a lot slower, that's excluded from the numbers above. So I am not ready to simply invalidate this bug just yet.
I also get a 50% perf regression at revision 32072:1e581a31c017. This is with internal 'new Date' timers, just as in SS. Can someone post a rebased version of the patch for bug 514102 so I can try that too?
1e581a31c017 caused a bunch of regressions, in particular in regexp. We had several follow-up patches to fix them. I refreshed 514102.
Attached file Full Sunspider results
Here are the results of running sunspider with --tests=bitops-bits-in-byte:

    bits-in-byte: *1.161x as slow*  10.4ms +/- 0.9%   12.1ms +/- 0.5%     significant


If I run all of SunSpider, the results are similar, but the bits-in-byte numbers are 11.3ms and 11.8ms.  I've attached the full results.
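
For reference, the "1.161x as slow ... significant" line above is the format printed by SunSpider's comparison script. A sketch of how such a before/after comparison is usually produced follows; the shell paths are placeholders and the option spellings are from memory, so treat them as assumptions rather than exact commands:

./sunspider --shell=/path/to/js.before --args=-j --tests=bitops-bits-in-byte
./sunspider --shell=/path/to/js.after  --args=-j --tests=bitops-bits-in-byte
# each run reports where it wrote its results file; feed both into:
./sunspider-compare-results --shell=/path/to/js.after <before-results> <after-results>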


And here's the before-and-after with your longer-running version:

[wave:~/moz/SunSpider] time js0o -j bib.js

real	0m3.224s
user	0m3.193s
sys	0m0.013s
[wave:~/moz/SunSpider] time js3o -j bib.js

real	0m3.957s
user	0m3.924s
sys	0m0.014s


So all three experiments show that the patch for bug 514102 causes a slow-down.


But Cachegrind doesn't show why -- the numbers for the two are very similar:

==21597== I   refs:      83,033,838
==21597== I1  misses:        16,755
==21597== L2i misses:         7,514
==21597== I1  miss rate:       0.02%
==21597== L2i miss rate:       0.00%
==21597== 
==21597== D   refs:      41,749,767  (25,922,158 rd   + 15,827,609 wr)
==21597== D1  misses:       385,793  (   371,528 rd   +     14,265 wr)
==21597== L2d misses:       160,311  (   152,224 rd   +      8,087 wr)
==21597== D1  miss rate:        0.9% (       1.4%     +        0.0%  )
==21597== L2d miss rate:        0.3% (       0.5%     +        0.0%  )
==21597== 
==21597== L2 refs:          402,548  (   388,283 rd   +     14,265 wr)
==21597== L2 misses:        167,825  (   159,738 rd   +      8,087 wr)
==21597== L2 miss rate:         0.1% (       0.1%     +        0.0%  )
==21597== 
==21597== Branches:       9,467,051  ( 9,438,791 cond +     28,260 ind)
==21597== Mispredicts:      637,328  (   615,221 cond +     22,107 ind)
==21597== Mispred rate:         6.7% (       6.5%     +       78.2%   )

==21656== I   refs:      83,045,938
==21656== I1  misses:        16,922
==21656== L2i misses:         7,505
==21656== I1  miss rate:       0.02%
==21656== L2i miss rate:       0.00%
==21656== 
==21656== D   refs:      41,664,239  (25,834,564 rd   + 15,829,675 wr)
==21656== D1  misses:       385,731  (   371,458 rd   +     14,273 wr)
==21656== L2d misses:       160,224  (   152,128 rd   +      8,096 wr)
==21656== D1  miss rate:        0.9% (       1.4%     +        0.0%  )
==21656== L2d miss rate:        0.3% (       0.5%     +        0.0%  )
==21656== 
==21656== L2 refs:          402,653  (   388,380 rd   +     14,273 wr)
==21656== L2 misses:        167,729  (   159,633 rd   +      8,096 wr)
==21656== L2 miss rate:         0.1% (       0.1%     +        0.0%  )
==21656== 
==21656== Branches:       9,470,054  ( 9,441,795 cond +     28,259 ind)
==21656== Mispredicts:      637,302  (   615,199 cond +     22,103 ind)
==21656== Mispred rate:         6.7% (       6.5%     +       78.2%   )


And I tried Shark as well but nothing leapt out at me as the cause of the difference.  So it's a mystery to me.
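
For reference, the branch and mispredict figures imply the runs above used Cachegrind's branch simulation; an invocation along these lines (binary and test-file names are placeholders) produces that style of output:

valgrind --tool=cachegrind --branch-sim=yes /path/to/js -j bib.js
# cg_diff (if available) can then compare the two cachegrind.out.<pid>
# files directly, though here the totals barely differ.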
The original patch _definitely_ causes a slowdown. If you look at the assembly you can see that state is spilled, so we get a state -> sp -> loop-variables load chain. Previously state was live in a register (ECX) across the loop edge. The patch referenced above keeps state in a register across the loop edge again.
I figure this bug is dead -- so much has changed since then, there's no point keeping this one open.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX