Closed Bug 1010417 Opened 7 years ago Closed 7 years ago

Very slow sqlite performance without asm.js

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking


RESOLVED FIXED
mozilla34

People

(Reporter: azakai, Assigned: h4writer)

References

Details

Attachments

(2 files)

Attached file sqlite.html
The attached sqlite testcase (from the massive wip benchmark) prints a time in the console when it finishes. It takes less than a second with asm.js optimizations, and 10 seconds without. Worse, every several runs it takes 60 seconds.
Flags: needinfo?(jdemooij)
I ran tracelogging. I see 62% of the time in baseline and 28% in ion; ion compilation is 4%, and the rest are almost zero. So perhaps we are not reaching ion for some reason?

The function where we spend the most time says

Script	Times called	Times compiled	Total time ▴	Spend time
script sqlite.js:6622:48696	1891159	0	39.31%	Baseline: 99.68%, Interpreter: 0.02%, BaselineCompilation: 0.3%, 

so it is never compiled (I assume that means by ion, not baseline), but called a huge number of times, and runs almost entirely in baseline. The next two functions (after which the rest are negligible) are

script sqlite.js:6622:138026	1891246	4	16.82%	IonMonkey: 42.79%, Baseline: 47.04%, IonCompilation: 9.32%, Interpreter: 0.04%, IonLinking: 0.24%, BaselineCompilation: 0.57%,
script sqlite.js:6622:11	10149	2	16.12%	IonMonkey: 62.3%, Baseline: 28.81%, IonCompilation: 6.93%, GC: 0.97%, MinorGC: 0.01%, Interpreter: 0.04%, IonLinking: 0.74%, BaselineCompilation: 0.21%, 

so they do ion compile, 2 and 4 times respectively, but still spend a lot of time in baseline.

Is there a way for me to find out more (why it wouldn't compile the top function)?
This benchmark gets 20% faster with --ion-limit-script-size=off. IONFLAGS=aborts doesn't show any more compilation aborts with that set (while without it it does). Tracelogging shows baseline goes down a little to 49%, ion up a little to 33%, and ion compilation goes to 10%. The top function that was only baseline-compiled before is now 3rd from the top,

script sqlite.js:6622:138026	1891245	4	21.39%	IonMonkey: 35.52%, Baseline: 53.57%, IonCompilation: 9.41%, IonLinking: 0.24%, GC: 0.95%, Interpreter: 0%, BaselineCompilation: 0.3%, MinorGC: 0%,
script sqlite.js:6622:11	10149	2	16.48%	IonMonkey: 56.95%, Baseline: 33.08%, IonCompilation: 8.65%, Interpreter: 0.05%, IonLinking: 1.01%, BaselineCompilation: 0.26%,
script sqlite.js:6622:48696	1891164	3	14.49%	Baseline: 52.2%, IonMonkey: 33.27%, IonCompilation: 13.15%, IonLinking: 0.33%, Interpreter: 0.01%, BaselineCompilation: 1.04%, 

I ran the emscripten benchmark suite locally with and without the script size limit (all without odin). Looks like it speeds up lua and zlib a little bit. I suspect sqlite is more extreme in suffering from the limit, due to the single huge function. I wonder if this isn't also a factor in why things like the Unity DT2 demo are sluggish without odin.

As we have parallel compilation, is there a reason not to remove the script size limit?
Another thing I noticed, PhiAnalysis takes over 33% of ion compilation time in the top 4 functions in this benchmark, easily beating the next two which are GVN and RegisterAllocation. Is that expected?
Hannes, maybe you can help Alon investigate this? I assume it's possible to turn the attachment into a shell testcase without too much work. Thanks!
Flags: needinfo?(jdemooij) → needinfo?(hv1989)
(In reply to Alon Zakai (:azakai) from comment #3)
> Another thing I noticed, PhiAnalysis takes over 33% of ion compilation time
> in the top 4 functions in this benchmark, easily beating the next two which
> are GVN and RegisterAllocation. Is that expected?

A normal run will have:
1. RegisterAllocation: 20% - 40%
2. GVN: 10% - 20%

PhiAnalysis should be around 1%, so 33% is really bad! We should definitely investigate that.

(In reply to Alon Zakai (:azakai) from comment #2)
> I ran the emscripten benchmark suite locally with and without the script
> size limit (all without odin). Looks like it speeds lua and zlib a little
> bit. I suspect sqlite is more extreme in suffering from the limit, due to
> the single huge function. I wonder if this isn't also a factor in why things
> like the Unity DT2 demo are sluggish without odin.
> 
> As we have parallel compilation, is there a reason not to remove the script
> size limit?

Well, we decided to have only one compilation unit at a time, so big scripts will slow down any subsequent compilation a lot! Technically it is possible to enable multiple concurrent compilations without issues. The main reason for not doing so is to avoid disturbing running benchmarks too much, so execution keeps running "smoothly" instead of compilation taking over the whole CPU. Another reason is to not drain the battery on computers/laptops too fast...
Flags: needinfo?(hv1989)
(In reply to Jan de Mooij [:jandem] from comment #4)
> I assume it's possible to
> turn the attachment into a shell testcase without too much work.

Yes, just remove the first and last two lines (which define a body and script tag).

(In reply to Hannes Verschore [:h4writer] from comment #5)
> (In reply to Alon Zakai (:azakai) from comment #2)
> > I ran the emscripten benchmark suite locally with and without the script
> > size limit (all without odin). Looks like it speeds lua and zlib a little
> > bit. I suspect sqlite is more extreme in suffering from the limit, due to
> > the single huge function. I wonder if this isn't also a factor in why things
> > like the Unity DT2 demo are sluggish without odin.
> > 
> > As we have parallel compilation, is there a reason not to remove the script
> > size limit?
> 
> Well, we decided to have only one compilation unit at a time, so big
> scripts will slow down any subsequent compilation a lot! Technically it
> is possible to enable multiple concurrent compilations without issues.
> The main reason for not doing so is to avoid disturbing running
> benchmarks too much, so execution keeps running "smoothly" instead of
> compilation taking over the whole CPU. Another reason is to not drain
> the battery on computers/laptops too fast...

Ok, I understand about battery life, etc. However, I suspect the script size limit - which is fairly large right now :) - is only hit by large compiled code, like asm.js and mandreel output, where we don't want the limit anyhow. It does look like it affects octane-mandreel: removing the script size limit speeds it up from

27100, 27018, 27018

(3 runs) to

27218, 27249, 27249

which looks fairly consistent, almost a 1% win.

I don't see other changes on octane, although several of the tests there are very noisy so it's hard to tell - but I would be very surprised if any of them had functions big enough to hit the limit.

So far it looks like there are only benefits to removing the limit. Is there something else I should test where we might worry about a regression?
(In reply to Hannes Verschore [:h4writer] from comment #5)
> PhiAnalysis should be around 1%. So 33% is really bad! We should investigate
> that definitely.

Investigated this and it seems most of the time (50%) is spent removing redundant phis, i.e. removing phi(a,a) or phi(a,this), with 25% spent in replaceAllUsesWith and 25% in IsPhiRedundant. In this case it is mostly phi(cte) and phi(phi1, phi1). I don't see an immediate way to speed this up though :(

(In reply to Alon Zakai (:azakai) from comment #6)
> So far it looks like there are only benefits to removing the limit. Is
> there something else I should test where we might worry about a
> regression?

I guess we could do this, and maybe even tweak some things, like disabling inlining in very big scripts.

Though we should have a way to not stall the compilation thread. Maybe make it possible to pause compiling a script, in favor of new scripts, if it is taking too long? Or some metric to force-stop a compilation when it is taking too long and our compilation queue is getting too long?
Are you building with GCC 4.6 by chance? That version doesn't yet optimize the C++11 final keyword, so every getOperand call on a phi gets compiled to the general virtual method call instead of the inline direct array access. That could slow down EliminatePhis.

GVN probably has a similar problem, though in that case we'll also need to restructure some of the congruentTo methods before the final optimization can help.
(In reply to Dan Gohman [:sunfish] from comment #8)
> Are you building with GCC 4.6 by chance? That version doesn't yet optimize
> the C++11 final keyword, so every getOperand call on a phi gets compiled to
> the general virtual method call instead of the inline direct array access.
> That could slow down EliminatePhis.

Just looked and I'm using 4.7. So that shouldn't be the issue.
It's probably just a large function with many locals, so we'll have many phis to eliminate and many uses to replace.

It'd be interesting to compare this to the time Odin spends under EliminatePhis for this particular function. Most phi uses are likely from resume points, and Odin doesn't have those; we could optimize that in Ion with some work.

I also wonder if we keep updating the same uses while we eliminate more phis; maybe we could be smarter there.
(In reply to Alon Zakai (:azakai) from comment #6)
> So far it looks like there are only benefits to removing the limit. Is
> there something else I should test where we might worry about a
> regression?

A potential issue is that big scripts stall the compilation thread, but that can be solved (see bug 1013172).
Beyond that I don't see a potential issue with removing the limit.
Depends on: 1013172
sqlite.js:
1) asm.js: 1274ms
2) --no-asmjs: 8159ms
3) --no-asmjs --ion-limit-script-size=off: 6740ms
4) --no-asmjs patched: 6016ms

So this patch removes the size limit for big scripts. This isn't a big issue anymore, since we can pause a task and let a higher-priority script build first.

I also adjusted the heuristics a bit. The useCount was used to decide which script got a higher priority. Now we want small scripts to get scheduled and compiled immediately, since they take almost no time to compile and the gain is immense. As a script gets larger, its compilation takes longer and stalls the compilation thread longer. Therefore I adjusted the heuristic to what I have now. It does what is expected and improves this benchmark by 700ms.
Assignee: nobody → hv1989
Attachment #8474712 - Flags: review?(jdemooij)
Comment on attachment 8474712 [details] [diff] [review]
Remove length check

Review of attachment 8474712 [details] [diff] [review]:
-----------------------------------------------------------------

Does this also affect the asmjs shell benchmarks with --no-asmjs?

::: js/src/vm/HelperThreads.cpp
@@ +560,5 @@
>          return !first->script()->hasIonScript();
>  
>      // A higher useCount indicates a higher priority.
> +    return first->script()->getUseCount() / first->script()->length() >
> +           second->script()->getUseCount() / second->script()->length();

Probably use double(first->script()->getUseCount()), and also for the second one, to get a double instead of integer division.
Attachment #8474712 - Flags: review?(jdemooij) → review+
(In reply to Jan de Mooij [:jandem] from comment #13)
> Comment on attachment 8474712 [details] [diff] [review]
> Remove length check
> 
> Review of attachment 8474712 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Does this also affect the asmjs shell benchmarks with --no-asmjs?
No, it doesn't. I think emscripten is smart enough not to make its functions too big.

> ::: js/src/vm/HelperThreads.cpp
> @@ +560,5 @@
> >          return !first->script()->hasIonScript();
> >  
> >      // A higher useCount indicates a higher priority.
> > +    return first->script()->getUseCount() / first->script()->length() >
> > +           second->script()->getUseCount() / second->script()->length();
> 
> Probably use double(first->script()->getUseCount()), and also for the second
> one, to get a double instead of integer division.

As discussed on IRC I didn't adjust it to use doubles: that made us switch between scripts too often. With integer division the ordering is more stable.

https://hg.mozilla.org/integration/mozilla-inbound/rev/cc392a569cf4
Comment on attachment 8474712 [details] [diff] [review]
Remove length check

Review of attachment 8474712 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/HelperThreads.cpp
@@ +559,5 @@
>      if (first->script()->hasIonScript() != second->script()->hasIonScript())
>          return !first->script()->hasIonScript();
>  
>      // A higher useCount indicates a higher priority.
> +    return first->script()->getUseCount() / first->script()->length() >

Is the compilation time linear?
https://hg.mozilla.org/mozilla-central/rev/cc392a569cf4
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Yay! \o/ Thanks Hannes!

Do we still need the shell option for this? https://hg.mozilla.org/mozilla-central/file/cc392a569cf4/js/src/shell/js.cpp#l5835
(In reply to Alon Zakai (:azakai) from comment #17)
> Yay! \o/ Thanks Hannes!
> 
> Do we still need the shell option for this?
> https://hg.mozilla.org/mozilla-central/file/cc392a569cf4/js/src/shell/js.
> cpp#l5835

Yes, it is still needed. This flag also controls the script-size limit on the main thread and how many times IonBuilder can rebuild loops upon discovering new types.
Oh, ok. What is the script-size limit on the main thread? (I thought this patch removed the script size limit from all Ion compilation?)
(In reply to Alon Zakai (:azakai) from comment #19)
> Oh, ok. What is the script-size limit on the main thread? (I thought this
> patch removed the script size limit from all Ion compilation?)

This script-size limit is only used when there is no background thread, so we are forced to compile on the main thread, which would stall execution for too long. I think the limit is 2000 bytes. This only happens for people who don't have multiple (2+) cores. I have no idea how many people still have such machines and are concerned about performance...
I see, thanks. Yeah, hopefully not that many people left on single core at this point...