Closed Bug 1200274 Opened 10 years ago Closed 7 months ago

-80% performance regression in a small asm.js integer number crunching kernel since 2015-05-19.

Categories

(Core :: JavaScript Engine, defect)

41 Branch
x86
Windows 8.1
defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: jujjyl, Unassigned)

Details

Attachments

(1 file)

STR: Visit https://dl.dropboxusercontent.com/u/40949268/Bugs/cpu_benchmark_js.html Observed: - Firefox 40 stable channel prints ~2658. - Current Firefox Nightly prints ~532. Bisected the performance to have gone down in https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=df6f82e4a2dbc8fe04121ab4f90f48a68ded91fb&tochange=261cadb8301573f6ed2314b3d05f47c1da8b2b4c Secondary, I notice that current stable Chrome 44 prints a score of ~3400, which is +28% more than us. That score is pretty amazing, since on my system a natively compiled version of the same code also gives ~3400 on my system, so they actually match native performance exactly on that inner loop. The scores are pretty stable in Firefox and Chrome, but IE11 gets a score that fluctuates a lot between 2400-3200.
Summary: -80% performance regression in a small asm.js integer number crunching kernel sinceon 2015-05-19. → -80% performance regression in a small asm.js integer number crunching kernel since 2015-05-19.
Attached file codegen-log
There is a MoveGroup between each add! Each add has one operand which is on the stack and the other one that is in a register. Reading the loop shows that it could indeed be done only with 3 registers (plus one for the loop counter). Here's the code of the offending block: Codegen] # block1 ?:1:0 (loop header): [Codegen] .set .Llabel235, . [Codegen] instruction MoveGroup [Codegen] movl 0x1c(%rsp), %edx [Codegen] instruction AddI [Codegen] addl 0x14(%rsp), %edx [Codegen] instruction MoveGroup [Codegen] movl %edx, 0x1c(%rsp) [Codegen] movl 0x18(%rsp), %ecx [Codegen] instruction AddI [Codegen] addl 0x1c(%rsp), %ecx [Codegen] instruction MoveGroup [Codegen] movl %ecx, 0x18(%rsp) [Codegen] movl 0x14(%rsp), %ebx [Codegen] instruction AddI [Codegen] addl 0x18(%rsp), %ebx [Codegen] instruction MoveGroup [Codegen] movl %ebx, 0x14(%rsp) [Codegen] instruction AddI [Codegen] addl 0x14(%rsp), %edx [Codegen] instruction MoveGroup [Codegen] movl %edx, 0x1c(%rsp) [Codegen] instruction AddI [Codegen] addl 0x1c(%rsp), %ecx [Codegen] instruction MoveGroup [Codegen] movl %ecx, 0x18(%rsp) [Codegen] instruction AddI [Codegen] addl 0x18(%rsp), %ebx [Codegen] instruction MoveGroup [Codegen] movl %ebx, 0x14(%rsp) [Codegen] instruction AddI [Codegen] addl 0x14(%rsp), %edx [Codegen] instruction MoveGroup [Codegen] movl %edx, 0x1c(%rsp) [Codegen] instruction AddI [Codegen] addl 0x1c(%rsp), %ecx [Codegen] instruction MoveGroup [Codegen] movl %ecx, 0x18(%rsp) [Codegen] instruction AddI [Codegen] addl 0x18(%rsp), %ebx [Codegen] instruction MoveGroup [Codegen] movl %ebx, 0x14(%rsp) [Codegen] instruction AddI [Codegen] addl $1, %eax [Codegen] instruction CompareAndBranch:ne [Codegen] cmpl $0x100000, %eax [Codegen] jne .Llabel235
cc'ing register allocation specialists
It looks like we actually have two different regressions here. Here is a graphed run of the cpu_benchmark_js code (with a minor modification that instead of averaging results, the best result is taken. Here is the modified version: https://dl.dropboxusercontent.com/u/40949268/Bugs/cpu_benchmark_js_min.html) across Firefoxes every second day since 2014-09-06: https://dl.dropboxusercontent.com/u/40949268/Bugs/mips_over_time/mips_evolution.html I believe the first 4307 MIPS vs 3400 MIPS steps are noise due to CPU turbo boosting, and the 2186 MIPS vs 2769 MIPS are equally noise from turbo boosting. Therefore the most recent regression was on 2015-05-19, and the earlier regression was at 2015-01-15. Unfortunately bisection cannot find the exact commit because that is too old, but it gives only this window for the earlier regression: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=63006936ab99&tochange=c1f6345f2803 However I don't know which of the commits there might cause the issue. bbouvier: is it easy for you to check what kind of assembly code say, Firefox 2015-05-10, and Firefox 2015-01-05 emitted, if there is a difference?
(In reply to Jukka Jylänki from comment #3) > is it easy for you to check what kind of assembly code say, > Firefox 2015-05-10, and Firefox 2015-01-05 emitted, if there is a difference? I both cases, with debug builds only, you can set the environment variable IONFLAGS=codegen which will spew the generated code. Also, this would be easier for you to isolate the output of your benchmark, you might use IONFILTER (not sure if it works with asmjs), or use a JS shell.
@nbp, I don't think IONFILTER interacts with IONFLAGS=codegen (though it would be nice if it did).
Jukka: how did you benchmark this code? Have you made the module recently and tested it on former versions of Firefox? Or have you made it a long time ago and started testing it around that time? The first regression occurs after bug 1121185's patch. This bug relaxed the asm.js validation rules, which means two things: 1) this code wasn't valid asm.js code before this cset. It ran in Ion with a score of ~4200 MIPS on my machine. 2) this code is faster with --no-asmjs! When I modify the module to have it validate, I get a score of ~2700 MIPS (higher is better), which is the same as the one I have before the regalloc regression. So we have two issues here: the code is faster with --no-asmjs, and the regalloc issue made it worse *only for asm.js*. The regalloc change didn't have an impact on Ion. I suspect that the cause be either the cost of entering/leaving asm.js code, or the fact that Ion can inline more code and realize that most temporaries aren't needed.
Good catch, I did not realize old Firefoxes could not validate on such simple asm.js module. The code is new, and I automated it to download all Firefoxes from every second day using mozdownload, and run the code on them, logging the results.
Brian or Dan, any chance you could have a look at this, please?
Flags: needinfo?(sunfish)
Flags: needinfo?(bhackett1024)
Version: unspecified → 41 Branch
Brian and Dan, you were needinfoed on this 5 months ago. This commit has now been detected to impact a deployed game on Facebook, see bug 1246132. This is important! Any chance you could have a look at this?
(In reply to Jukka Jylänki from comment #9) > Brian and Dan, you were needinfoed on this 5 months ago. This commit has now > been detected to impact a deployed game on Facebook, see bug 1246132. This > is important! Any chance you could have a look at this? The commit this bug is blamed on is a major refactoring of the register allocator. This bug is about the regalloc now generating suboptimal code on a small kernel. Bug 1246132 is about the regalloc now taking a longer time than it used to on a large code base. So these seem to be different issues, and conflating them isn't really productive. I'll look at bug 1246132 next month (I'm about to leave until the end of February) since excessive regalloc time is always a problem, but it can be tricky to fix the regalloc to behave better on a particular microbenchmark without regressing other things.
Hey, I wonder if there might be a chance to revisit this at some point? Not critical for this specific snippet, but there is a minor concern whether this regression might also affect applications overall.
I don't think I'll be able to look at this anytime soon. Was performance on this testcase affected by bug 1337367?
Flags: needinfo?(bhackett1024)
Testing today's FF Nightly 54.0a1 (2017-02-19) (64-bit) on the same PC from comment 0, gives a score ~540 so does not look like bug 1337367 affected performance in either direction.
Per policy at https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Inactive_Bugs. If this bug is not an enhancement request or a bug not present in a supported release of Firefox, then it may be reopened.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INACTIVE
Status: RESOLVED → REOPENED
Resolution: INACTIVE → ---
Flags: needinfo?(sunfish)
Severity: normal → S3

Link doesnt work, and asm.js is not a priority these dayss.

Status: REOPENED → RESOLVED
Closed: 7 years ago7 months ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: