Open
Bug 676181
Opened 13 years ago
Updated 2 years ago
Load then use is inefficient on arm
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
NEW
People
(Reporter: mjrosenb, Unassigned)
Details
Attachments
(2 files)
7.68 KB,
patch
|
Details | Diff | Splinter Review | |
6.72 KB,
patch
|
Details | Diff | Splinter Review |
While looking at code generated by our YARR, I noticed an odd sequence: [jaeger] Insns mov r3, #0x200000 @ (2097152) [jaeger] Insns orr r3, r3, #0x20 @ (32) [jaeger] Insns orrs r5, r5, r3 instead, what we should produce is orr r5, #0x200020 orrs r5, #0x20 It looks like there are many other places where we generate a move into a temp register, followed by a use of the temp register and the target register, when doing two successive operations on the dest register will suffice. Another less obvious case that I found in the same test was: [jaeger] Insns mov r3, #0x610000 @ (6356992) [jaeger] Insns orr r3, r3, #0x61 @ (97) [jaeger] Insns cmp r5, r3 [jaeger] Insns ldrne pc, =0xffffffff @ (-1) In this case, it only works because the cmp can in fact be a teq without changing the result of the comparison. In this case, we can emit: eor r3, r5, #0x610000 teq r3, 0x61 ldrne pc, =0xffffffff the eor/teq sequence will not work if we were testing for negative/overflow rather than equality/zeroness. It may be tempting to try sub r3, r5, #0x610000 cmp r3, 0x61 but that is not guaranteed to set *all* of the bits correctly. Any bits that are dependent on the actual computation rather than just the final value written into the register have the potential to be wrong. It may be possible, by doing something like this: subs r3, r5, 0x.... cmpvc r3, 0x... ldrvs pc, .... the idea being that if we are testing for overflow, we do part of the subtraction, test for overflow, if there was overflow, then that was going to happen if we did part of the subtraction or all of it, so we don't bother doing the second half of the subtraction, and branch accordingly. However, I have not thought that through thoroughly, and am not convinced it is correct. The attached patch compiles, but is other than that *completely* *untested*. I should be testing it soon, and hopefully getting some numbers.
Reporter | ||
Comment 1•13 years ago
|
||
Initial results are disheartening. ============================================================================= ** TOTAL **: ?? 3625.4ms +/- 2.0% 3644.5ms +/- 1.7% not conclusive: might be *1.005x as slow* ============================================================================= 3d: *1.022x as slow* 677.3ms +/- 1.0% 692.3ms +/- 0.2% significant cube: - 265.7ms +/- 1.4% 264.5ms +/- 0.3% morph: *1.138x as slow* 118.7ms +/- 1.1% 135.1ms +/- 0.3% significant raytrace: - 292.9ms +/- 1.5% 292.7ms +/- 0.2% bitops: ?? 165.8ms +/- 0.6% 167.2ms +/- 1.9% not conclusive: might be *1.008x as slow* 3bit-bits-in-byte: ?? 12.9ms +/- 0.7% 13.9ms +/- 14.6% not conclusive: might be *1.073x as slow* nsieve-bits: - 52.3ms +/- 1.5% 51.9ms +/- 0.8% bitwise-and: ?? 23.9ms +/- 2.0% 24.6ms +/- 9.7% not conclusive: might be *1.029x as slow* bits-in-byte: ?? 76.8ms +/- 0.3% 76.9ms +/- 0.3% not conclusive: might be *1.002x as slow* math: - 404.6ms +/- 3.5% 398.7ms +/- 0.2% cordic: - 114.6ms +/- 4.0% 112.7ms +/- 0.5% partial-sums: - 180.1ms +/- 4.1% 177.2ms +/- 0.3% spectral-norm: - 109.9ms +/- 2.1% 108.8ms +/- 0.3% regexp: - 161.6ms +/- 7.1% 156.0ms +/- 0.3% dna: - 161.6ms +/- 7.1% 156.0ms +/- 0.3% string: ?? 852.4ms +/- 1.1% 856.8ms +/- 0.1% not conclusive: might be *1.005x as slow* tagcloud: ?? 267.8ms +/- 0.6% 269.4ms +/- 0.3% not conclusive: might be *1.006x as slow* unpack-code: *1.027x as slow* 279.1ms +/- 0.3% 286.7ms +/- 0.2% significant validate-input: - 103.9ms +/- 0.3% 103.8ms +/- 0.5% base64: - 54.8ms +/- 0.8% 54.5ms +/- 0.5% fasta: - 146.8ms +/- 5.2% 142.4ms +/- 0.4% access: ?? 528.7ms +/- 0.5% 536.0ms +/- 3.7% not conclusive: might be *1.014x as slow* nbody: ?? 120.8ms +/- 0.4% 122.8ms +/- 3.9% not conclusive: might be *1.017x as slow* binary-trees: - 73.5ms +/- 1.5% 73.0ms +/- 0.4% fannkuch: ?? 210.3ms +/- 0.8% 214.4ms +/- 5.4% not conclusive: might be *1.019x as slow* nsieve: ?? 124.1ms +/- 0.6% 125.7ms +/- 2.7% not conclusive: might be *1.013x as slow* date: ?? 472.8ms +/- 4.9% 474.3ms +/- 3.7% not conclusive: might be *1.003x as slow* format-tofte: ?? 244.3ms +/- 4.2% 246.7ms +/- 7.2% not conclusive: might be *1.010x as slow* format-xparb: - 228.5ms +/- 5.7% 227.6ms +/- 0.3% controlflow: - 33.7ms +/- 1.5% 33.6ms +/- 1.4% recursive: - 33.7ms +/- 1.5% 33.6ms +/- 1.4% crypto: ?? 328.6ms +/- 5.2% 329.7ms +/- 6.5% not conclusive: might be *1.004x as slow* md5: - 93.4ms +/- 7.3% 91.6ms +/- 2.6% aes: - 176.3ms +/- 5.8% 176.0ms +/- 5.9% sha1: ?? 58.8ms +/- 1.1% 62.1ms +/- 14.0% not conclusive: might be *1.055x as slow* ============================================================================= ** TOTAL **: *1.014x as slow* 16729.8ms +/- 0.5% 16962.2ms +/- 0.4% significant ============================================================================= v8: *1.014x as slow* 16729.8ms +/- 0.5% 16962.2ms +/- 0.4% significant crypto: - 2019.5ms +/- 0.6% 2016.6ms +/- 0.4% raytrace: *1.050x as slow* 2626.3ms +/- 2.0% 2757.1ms +/- 1.1% significant richards: ?? 1842.8ms +/- 0.2% 1858.2ms +/- 1.2% not conclusive: might be *1.008x as slow* splay: ?? 1351.0ms +/- 0.2% 1353.6ms +/- 0.3% not conclusive: might be *1.002x as slow* deltablue: ?? 3799.3ms +/- 1.2% 3848.4ms +/- 1.3% not conclusive: might be *1.013x as slow* earley-boyer: *1.015x as slow* 2705.7ms +/- 0.6% 2745.7ms +/- 0.4% significant regexp: - 2385.2ms +/- 0.2% 2382.4ms +/- 0.4% I don't really believe that the jitted code is actually slower. I can only assume that the extra calls to disInt (and getOp2) are negating any benefits of the better code. I am now trying to write a better variant of those functions using gcc's builtin clz.
Reporter | ||
Comment 2•13 years ago
|
||
So I've implemented a fster getInt. It is about 30% longer than I thought it was going to be, but trying it on every 32 bit integer, it is about 3 times as fast as the pre-existing code. It helps a bit with benchmarks, but unfortunately, not as much as I had expected. ============================================================================= ** TOTAL **: - 3631.6ms +/- 0.6% 3609.7ms +/- 0.1% ============================================================================= 3d: - 695.5ms +/- 0.3% 693.7ms +/- 0.2% cube: - 268.2ms +/- 0.7% 266.0ms +/- 0.5% morph: - 135.0ms +/- 0.2% 135.0ms +/- 0.2% raytrace: ?? 292.2ms +/- 0.1% 292.7ms +/- 0.3% not conclusive: might be *1.002x as slow* bitops: - 166.8ms +/- 2.1% 165.7ms +/- 0.7% 3bit-bits-in-byte: ?? 13.0ms +/- 0.5% 13.8ms +/- 7.5% not conclusive: might be *1.066x as slow* nsieve-bits: - 51.7ms +/- 0.4% 51.6ms +/- 0.3% bitwise-and: - 24.3ms +/- 5.6% 23.5ms +/- 0.7% bits-in-byte: - 77.8ms +/- 2.7% 76.8ms +/- 0.1% math: - 398.0ms +/- 0.2% 397.8ms +/- 0.1% cordic: 1.001x as fast 112.3ms +/- 0.1% 112.2ms +/- 0.1% significant partial-sums: ?? 176.8ms +/- 0.1% 177.1ms +/- 0.2% not conclusive: might be *1.002x as slow* spectral-norm: - 108.9ms +/- 0.7% 108.5ms +/- 0.1% regexp: - 155.5ms +/- 0.1% 155.6ms +/- 0.1% dna: - 155.5ms +/- 0.1% 155.6ms +/- 0.1% string: 1.006x as fast 856.5ms +/- 0.1% 851.2ms +/- 0.1% significant tagcloud: 1.007x as fast 269.6ms +/- 0.1% 267.7ms +/- 0.2% significant unpack-code: 1.013x as fast 285.5ms +/- 0.1% 281.9ms +/- 0.2% significant validate-input: 1.004x as fast 103.9ms +/- 0.1% 103.4ms +/- 0.1% significant base64: - 54.6ms +/- 0.3% 54.5ms +/- 0.2% fasta: ?? 143.0ms +/- 0.6% 143.7ms +/- 0.6% not conclusive: might be *1.005x as slow* access: - 532.4ms +/- 1.6% 527.8ms +/- 0.1% nbody: - 121.8ms +/- 1.2% 120.8ms +/- 0.1% binary-trees: - 73.2ms +/- 0.7% 73.0ms +/- 0.2% fannkuch: - 212.2ms +/- 2.8% 209.1ms +/- 0.1% nsieve: - 125.3ms +/- 0.7% 124.9ms +/- 0.2% date: - 471.6ms +/- 1.6% 464.3ms +/- 0.2% format-tofte: - 243.1ms +/- 2.6% 239.5ms +/- 0.3% format-xparb: 1.016x as fast 228.5ms +/- 0.7% 224.8ms +/- 0.4% significant controlflow: - 33.4ms +/- 0.2% 33.4ms +/- 0.2% recursive: - 33.4ms +/- 0.2% 33.4ms +/- 0.2% crypto: - 321.8ms +/- 1.1% 320.0ms +/- 0.1% md5: - 90.8ms +/- 0.8% 90.3ms +/- 0.1% aes: - 172.4ms +/- 1.6% 171.3ms +/- 0.1% sha1: 1.003x as fast 58.6ms +/- 0.3% 58.4ms +/- 0.2% significant ============================================================================= ** TOTAL **: *1.020x as slow* 16729.8ms +/- 0.5% 17070.0ms +/- 0.5% significant ============================================================================= v8: *1.020x as slow* 16729.8ms +/- 0.5% 17070.0ms +/- 0.5% significant crypto: ?? 2019.5ms +/- 0.6% 2029.0ms +/- 1.1% not conclusive: might be *1.005x as slow* raytrace: *1.070x as slow* 2626.3ms +/- 2.0% 2811.3ms +/- 1.5% significant richards: ?? 1842.8ms +/- 0.2% 1864.6ms +/- 1.7% not conclusive: might be *1.012x as slow* splay: *1.011x as slow* 1351.0ms +/- 0.2% 1366.3ms +/- 0.3% significant deltablue: *1.022x as slow* 3799.3ms +/- 1.2% 3882.8ms +/- 1.8% significant earley-boyer: *1.015x as slow* 2705.7ms +/- 0.6% 2745.5ms +/- 0.3% significant regexp: 1.006x as fast 2385.2ms +/- 0.2% 2370.5ms +/- 0.3% significant
Updated•13 years ago
|
OS: Linux → All
Hardware: x86_64 → ARM
Comment 3•2 years ago
|
||
The bug assignee didn't login in Bugzilla in the last 7 months, so the assignee is being reset.
Assignee: marty.rosenberg → nobody
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•