Closed Bug 1210342 Opened 9 years ago Closed 3 months ago

Executing loop with 'instanceof' 9 times takes 100x longer than executing it 8 times

Categories

(Core :: JavaScript Engine, defect)

41 Branch
defect

Tracking

()

RESOLVED WONTFIX

People

(Reporter: tszynalski, Unassigned)

Details

Attachments

(1 file)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0 Build ID: 20150929144111 Steps to reproduce: Take a look at this JSPerf test case: http://jsperf.com/instanceof-8-times-vs-9-times/ See also this StackOverflow question: http://stackoverflow.com/questions/32883847/why-does-running-this-loop-9-times-take-100x-longer-than-running-it-8-times Actual results: According to JSPerf, the test where the loop (containing a single use of 'instanceof') is executed 9 times takes 100 times longer to execute than the code where it's executed 8 times. This seems specific to 'instanceof'. Removing the 'result =' assignment makes no difference. Order of test cases makes no difference. Tested on two PCs with Firefox 41.0.1 (Windows 7 and Windows XP) and the "magic limit" above which performance tanks is always 8. Expected results: The loop that gets executed 9 times should take about 10% longer to run.
My blind guess would be that this is the cost of the baseline compilation of the for-loop. I do not recall the bug number, but I remember that somebody made a patch a while ago to estimate the number of iterations such that we can compile sooner. Maybe we can re-use the same idea to prevent the compilation if the function is not frequently used.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(jdemooij)
I've tested it with different firefox versions and it seems to be introduced in version 37. Check my jsperf results at: https://jsperf.com/instanceof-8-times-vs-9-times Maybe related with this bug 1113643
(In reply to Nicolas B. Pierron [:nbp] from comment #1) > My blind guess would be that this is the cost of the baseline compilation of > the for-loop. These loops are executed millions of time, they will be compiled anyway. (In reply to Alejandro Rodríguez from comment #2) > I've tested it with different firefox versions and it seems to be introduced > in version 37. Check my jsperf results at: > https://jsperf.com/instanceof-8-times-vs-9-times > > Maybe related with this bug 1113643 That makes sense, it looks like we don't optimize the second instanceof for some reason, I'll take a look.
Attached file Shell testcase
This is pretty tricky, but I can reproduce this in the shell with the attached testcase. The harness runs a function like this (not exactly, but the idea is the same): function(count) { var Test = function(){} var t = new Test(); var d = new Date; for (var i=0; i<count; i++) { for (var j = 0; j < 8; j++) { t instanceof Test; } } return (new Date - d); } First, it runs this function with count == 1 to warm up. When that happens, the function stays in the interpreter if we have 8 iterations. If we have 9 iterations, the function is considered hot enough to enter the baseline JIT. This means the JIT will optimize the `t instanceof Test` operation by caching a number of values, including Test.prototype. After this warmup, we call this function again with count == some large number. * When we call this function a second time, both `Test` and `Test.prototype` will be different objects. * In the i < 8 case, we enter the JIT at this point and optimize the instanceof-operation. It's the first time the Baseline JIT sees this instanceof, so it records the (new) Test.prototype. * In the i < 9 case, the Baseline JIT now has seen two different values for Test.prototype when executing the instanceof: the one during the warmup call and the new one. Then, in both cases, we transition to the (optimizing) Ion JIT compiler. In the i < 8 case, it sees we recorded a single value for Test.prototype so it can easily optimize this. In the i < 9 case, it sees two recorded values for Test.prototype and it doesn't optimize the instanceof.
Flags: needinfo?(jdemooij)
Also: (1) It's a silly benchmark. Modern JS have JIT compilers that can loop-hoist or dead-code-eliminate simple expressions like `t instanceof Test` in many cases. (I think that's what we are doing in the i < 8 case.) (2) Because the harness runs your code twice (or at least that's what I think), you end up with two different values for Test and Test.prototype, so the benchmark is not very representative of actual JS code. Our optimizing JIT should handle the non-monomorphic case better though.
(In reply to Jan de Mooij [:jandem] from comment #5) > (2) Because the harness runs your code twice (or at least that's what I > think), you end up with two different values for Test and Test.prototype, so > the benchmark is not very representative of actual JS code. > > Our optimizing JIT should handle the non-monomorphic case better though. You are correct, I just verified that JSPerf runs the setup code once per each "test run", which is about 100 times for this code. So there are actually ~100 different Test objects involved. You are also correct that because we are creating a new prototype 100 times, that is not typical JS code. What's more typical is creating 100 objects based on one prototype. If we do that, the results for i < 8 and i < 9 are similar: http://jsperf.com/instanceof-8-times-vs-9-times/11 I think we must be dealing with loop-hoisting since the results don't change if you replace 't instanceof Test' with 'result = t instanceof Test'. Or does 'result = ...' qualify as dead code, seeing as we don't do anything with 'result' later on? What is surprising is that the compiler doesn't loop-hoist in the i < 9 case.
Severity: normal → S3

test1: https://share.firefox.dev/4ecAAJ1 (158ms with the profiler)
test2: https://share.firefox.dev/3XywdCu (206ms with the profiler)

There is a perf difference, but nothing like 100x. Should we close this bug?

For the reasons Jan laid out in comment 4 and comment 5, this isn't a particularly interesting bug.

The big performance cliff was simply because one version (i < 8) had monomorphic types after leaving the interpreter, whereas the i<9 version ran just long enough for the last iteration of the first loop to run in the baseline compiler, making the set of observed types in the baseline compiler polymorphic and pessimizing the code. Any system that doesn't observe types for the first few iterations (eg our interpreter) will inevitably have a similar performance cliff, although I think it should be smaller these days.

Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: