Closed Bug 1010417 Opened 7 years ago Closed 7 years ago

Very slow sqlite performance without asm.js

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla34

People

(Reporter: azakai, Assigned: h4writer)

References

Details

Attachments

(2 files)

Attached file sqlite.html
The attached sqlite testcase (from the massive wip benchmark) prints a time in the console when it finishes. It takes less than a second with asm.js optimizations, and 10 seconds without. Worse, every several runs it takes 60 seconds.
Flags: needinfo?(jdemooij)
I ran tracelogging. I see 62% of the time in baseline and 28% in ion; ion compilation is 4%, and the rest are almost zero. So perhaps we are not reaching ion for some reason?

The function where we spend the most time says

Script	Times called	Times compiled	Total time ▴	Spend time
script sqlite.js:6622:48696	1891159	0	39.31%	Baseline: 99.68%, Interpreter: 0.02%, BaselineCompilation: 0.3%, 

so it is never compiled (I assume that means by ion, not baseline), but called a huge number of times, and runs almost entirely in baseline. The next two functions (after which the rest are negligible) are

script sqlite.js:6622:138026	1891246	4	16.82%	IonMonkey: 42.79%, Baseline: 47.04%, IonCompilation: 9.32%, Interpreter: 0.04%, IonLinking: 0.24%, BaselineCompilation: 0.57%,
script sqlite.js:6622:11	10149	2	16.12%	IonMonkey: 62.3%, Baseline: 28.81%, IonCompilation: 6.93%, GC: 0.97%, MinorGC: 0.01%, Interpreter: 0.04%, IonLinking: 0.74%, BaselineCompilation: 0.21%, 

so they do ion compile, 2 and 4 times respectively, but still spend a lot of time in baseline.

Is there a way for me to find out more (why it wouldn't compile the top function)?
This benchmark gets 20% faster with --ion-limit-script-size=off. IONFLAGS=aborts doesn't show any more compilation aborts with that set (while without it it does). Tracelogging shows baseline goes down a little to 49%, ion up a little to 33%, and ion compilation goes to 10%. The top function that was only baseline-compiled before is now 3rd from the top,

script sqlite.js:6622:138026	1891245	4	21.39%	IonMonkey: 35.52%, Baseline: 53.57%, IonCompilation: 9.41%, IonLinking: 0.24%, GC: 0.95%, Interpreter: 0%, BaselineCompilation: 0.3%, MinorGC: 0%,
script sqlite.js:6622:11	10149	2	16.48%	IonMonkey: 56.95%, Baseline: 33.08%, IonCompilation: 8.65%, Interpreter: 0.05%, IonLinking: 1.01%, BaselineCompilation: 0.26%,
script sqlite.js:6622:48696	1891164	3	14.49%	Baseline: 52.2%, IonMonkey: 33.27%, IonCompilation: 13.15%, IonLinking: 0.33%, Interpreter: 0.01%, BaselineCompilation: 1.04%, 

I ran the emscripten benchmark suite locally with and without the script size limit (all without odin). Looks like it speeds up lua and zlib a little bit. I suspect sqlite is more extreme in suffering from the limit, due to the single huge function. I wonder if this isn't also a factor in why things like the Unity DT2 demo are sluggish without odin.

As we have parallel compilation, is there a reason not to remove the script size limit?
Another thing I noticed, PhiAnalysis takes over 33% of ion compilation time in the top 4 functions in this benchmark, easily beating the next two which are GVN and RegisterAllocation. Is that expected?
Hannes, maybe you can help Alon investigate this? I assume it's possible to turn the attachment into a shell testcase without too much work. Thanks!
Flags: needinfo?(jdemooij) → needinfo?(hv1989)
(In reply to Alon Zakai (:azakai) from comment #3)
> Another thing I noticed, PhiAnalysis takes over 33% of ion compilation time
> in the top 4 functions in this benchmark, easily beating the next two which
> are GVN and RegisterAllocation. Is that expected?

A normal run will have:
1. RegisterAllocation: 20% - 40%
2. GVN: 10% - 20%

PhiAnalysis should be around 1%, so 33% is really bad! We should definitely investigate that.

(In reply to Alon Zakai (:azakai) from comment #2)
> I ran the emscripten benchmark suite locally with and without the script
> size limit (all without odin). Looks like it speeds lua and zlib a little
> bit. I suspect sqlite is more extreme in suffering from the limit, due to
> the single huge function. I wonder if this isn't also a factor in why things
> like the Unity DT2 demo are sluggish without odin.
> 
> As we have parallel compilation, is there a reason not to remove the script
> size limit?

Well, we decided to have only one compilation unit at a time, so big scripts will slow down any subsequent compilation a lot! Technically it is possible to enable multiple concurrent compilations without issues. The main reason for not doing so is to avoid disturbing running benchmarks too much, so execution keeps running "smoothly" instead of compilation taking over the whole CPU. Another reason is to not drain the battery on computers/laptops too fast...
Flags: needinfo?(hv1989)
(In reply to Jan de Mooij [:jandem] from comment #4)
> I assume it's possible to
> turn the attachment into a shell testcase without too much work.

Yes, just remove the first and last two lines (which define a body and script tag).

(In reply to Hannes Verschore [:h4writer] from comment #5)
> (In reply to Alon Zakai (:azakai) from comment #2)
> > I ran the emscripten benchmark suite locally with and without the script
> > size limit (all without odin). Looks like it speeds lua and zlib a little
> > bit. I suspect sqlite is more extreme in suffering from the limit, due to
> > the single huge function. I wonder if this isn't also a factor in why things
> > like the Unity DT2 demo are sluggish without odin.
> > 
> > As we have parallel compilation, is there a reason not to remove the script
> > size limit?
> 
> Well, we decided to have only one compilation unit at a time, so big
> scripts will slow down any subsequent compilation a lot! Technically it
> is possible to enable multiple concurrent compilations without issues.
> The main reason for not doing so is to avoid disturbing running
> benchmarks too much, so execution keeps running "smoothly" instead of
> compilation taking over the whole CPU. Another reason is to not drain
> the battery on computers/laptops too fast...

Ok, I understand about battery life, etc. However, I suspect the script size limit - which is fairly large right now :) - is only hit by large compiled code, like asm.js and mandreel output, where we don't want the limit anyhow. It does look like it affects octane-mandreel: removing the script size limit speeds it up from

27100, 27018, 27018

(3 runs) to

27218, 27249, 27249

which looks fairly consistent, almost a 1% win.

I don't see other changes on octane, although several of the tests there are very noisy so it's hard to tell - but I would be very surprised if any of them had functions big enough to hit the limit.

So far it looks like there are only benefits to removing the limit. Is there something else I should test where we might worry about a regression?
(In reply to Hannes Verschore [:h4writer] from comment #5)
> PhiAnalysis should be around 1%. So 33% is really bad! We should investigate
> that definitely.

Investigated this and it seems most of the time (50%) is spent removing redundant phis, i.e. removing phi(a,a) or phi(a,this), with 25% spent in replaceAllUsesWith and 25% in IsPhiRedundant. In this case it is mostly phi(cte) and phi(phi1, phi1). I don't see an immediate way to speed this up though :(

(In reply to Alon Zakai (:azakai) from comment #6)
> So far it looks like there are only benefits to removing the limit. Is
> there something else I should test where we might worry about a
> regression?

I guess we could do this, and maybe even tweak some things, like disabling inlining in very big scripts.

Though we should have a way to not stall the compilation thread. Maybe make it possible to pause compiling a script, in favor of new scripts, if it is taking too long? Or some metric to force-stop a compilation when it is taking too long and our compilation queue is getting too long?
Are you building with GCC 4.6 by chance? That version doesn't yet optimize the C++11 final keyword, so every getOperand call on a phi gets compiled to the general virtual method call instead of the inline direct array access. That could slow down EliminatePhis.

GVN probably has a similar problem, though in that case we'll also need to restructure some of the congruentTo methods before the final optimization can help.
(In reply to Dan Gohman [:sunfish] from comment #8)
> Are you building with GCC 4.6 by chance? That version doesn't yet optimize
> the C++11 final keyword, so every getOperand call on a phi gets compiled to
> the general virtual method call instead of the inline direct array access.
> That could slow down EliminatePhis.

Just looked and I'm using 4.7. So that shouldn't be the issue.
It's probably just a large function with many locals, so we'll have many phis to eliminate and many uses to replace.

It'd be interesting to compare this to the time Odin spends under EliminatePhis for this particular function. Most phi uses are likely from resume points, and Odin doesn't have those; we could optimize that in Ion with some work.

I also wonder if we keep updating the same uses while we eliminate more phis; maybe we could be smarter there.
(In reply to Alon Zakai (:azakai) from comment #6)
> So far it looks like there are only benefits to removing the limit. Is
> there something else I should test where we might worry about a
> regression?

A potential issue is that big scripts stall the compilation thread, but that can be solved (see bug 1013172).
Beyond that I don't see a potential issue with removing the limit.
Depends on: 1013172
sqlite.js:
1) asm.js: 1274ms
2) --no-asmjs: 8159ms
3) --no-asmjs --ion-limit-script-size=off: 6740ms
4) --no-asmjs patched: 6016ms

So this patch removes the size limit for big scripts. This isn't a big issue anymore, since we can pause a task and let a higher-priority script build first.

I also adjusted the heuristics a bit. The useCount was used to decide which script got a higher priority. Now we want small scripts to get scheduled and compiled immediately, since they take almost no time to compile and the gain is immense. As a script gets larger, its compilation takes longer and stalls the compilation thread longer. Therefore I adjusted the heuristic to what I have now. It does what is expected and improves this benchmark by 700ms.
Assignee: nobody → hv1989
Attachment #8474712 - Flags: review?(jdemooij)
Comment on attachment 8474712 [details] [diff] [review]
Remove length check

Review of attachment 8474712 [details] [diff] [review]:
-----------------------------------------------------------------

Does this also affect the asmjs shell benchmarks with --no-asmjs?

::: js/src/vm/HelperThreads.cpp
@@ +560,5 @@
>          return !first->script()->hasIonScript();
>  
>      // A higher useCount indicates a higher priority.
> +    return first->script()->getUseCount() / first->script()->length() >
> +           second->script()->getUseCount() / second->script()->length();

Probably use double(first->script()->getUseCount()), and also for the second one, to get a double instead of integer division.
Attachment #8474712 - Flags: review?(jdemooij) → review+
(In reply to Jan de Mooij [:jandem] from comment #13)
> Comment on attachment 8474712 [details] [diff] [review]
> Remove length check
> 
> Review of attachment 8474712 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Does this also affect the asmjs shell benchmarks with --no-asmjs?
No, it doesn't. I think emscripten is smart enough not to make its functions too big.

> ::: js/src/vm/HelperThreads.cpp
> @@ +560,5 @@
> >          return !first->script()->hasIonScript();
> >  
> >      // A higher useCount indicates a higher priority.
> > +    return first->script()->getUseCount() / first->script()->length() >
> > +           second->script()->getUseCount() / second->script()->length();
> 
> Probably use double(first->script()->getUseCount()), and also for the second
> one, to get a double instead of integer division.

As discussed on IRC I didn't adjust it to use doubles: that made us switch between scripts too often. With integer division the ordering is more stable.

https://hg.mozilla.org/integration/mozilla-inbound/rev/cc392a569cf4
Comment on attachment 8474712 [details] [diff] [review]
Remove length check

Review of attachment 8474712 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/HelperThreads.cpp
@@ +559,5 @@
>      if (first->script()->hasIonScript() != second->script()->hasIonScript())
>          return !first->script()->hasIonScript();
>  
>      // A higher useCount indicates a higher priority.
> +    return first->script()->getUseCount() / first->script()->length() >

Is the compilation time linear?
https://hg.mozilla.org/mozilla-central/rev/cc392a569cf4
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Yay! \o/ Thanks Hannes!

Do we still need the shell option for this? https://hg.mozilla.org/mozilla-central/file/cc392a569cf4/js/src/shell/js.cpp#l5835
(In reply to Alon Zakai (:azakai) from comment #17)
> Yay! \o/ Thanks Hannes!
> 
> Do we still need the shell option for this?
> https://hg.mozilla.org/mozilla-central/file/cc392a569cf4/js/src/shell/js.
> cpp#l5835

Yes, it is still needed. This flag also controls the script-size limit on the main thread and how many times IonBuilder can rebuild loops upon discovering new types.
Oh, ok. What is the script-size limit on the main thread? (I thought this patch removed the script size limit from all Ion compilation?)
(In reply to Alon Zakai (:azakai) from comment #19)
> Oh, ok. What is the script-size limit on the main thread? (I thought this
> patch removed the script size limit from all Ion compilation?)

This script-size limit is only used when there is no background thread, so we are forced to compile on the main thread, which would stall execution for too long. I think the limit is 2000 bytes. This only happens for people who don't have multiple (2+) cores. I have no idea how many people still have such machines and are concerned about performance...
I see, thanks. Yeah, hopefully not that many people left on single core at this point...