Closed Bug 1198224 Opened 6 years ago Closed 6 years ago
Compilation time of an Unity exported asm
.js code blows up from 6 seconds to 586 seconds when exceptions are enabled .
I am working on a project that has about 64 megs of JS code when exported in optimized release build from Unity. The asm.js compilation of that code takes about 6 seconds. However the optimized code does not have exception catching support and in order to port the code, I need to debug with exception catching enabled. Re-deploying the same Unity project with exception catching enabled increases the asm.js code size to 96 megs, but at startup time, the asm.js compilation now took 586 seconds, which is killing iteration times for debugging. Roughly handwaving, it feels unreasonable that a 1.5x size increase in JS code should cause a 100x increase in asm.js compilation time, so I am hoping that there is something we could do here to optimize asm.js compilation of exceptions enabled code? The project is partner private under an NDA, test cases are available from me directly. Sending one to you Luke, could you peek if there is something that could be done here?
In the time report at the end of those 587 seconds, are there any slow functions reported that account for most of the time? In general, we do have problems with super-large functions taking worse-than-linear time to compile in the Ion backend. The general mitigation will be, once we have the baseline compiler (bug 1169167), to fall back on the baseline compiler for super-large functions.
There are tons of such functions. The log reads like this: Successfully compiled asm.js code (total compilation time 586134ms; stored in cache; 109 functions compiled slowly: _m84483_gshared:17:484467 (409ms), _m48681_gshared:22:1088026 (4037ms), _m60851_gshared:30:1110675 (22750ms), _m7765:37:758821 (4031ms), _m8975:38:681961 (285ms), _m33187:39:1291988 (254ms), _m11098:42:1217104 (266ms), _m36345:45:163800 (840ms), _m37401:45:1360664 (331ms), _m12253:46:196488 (271ms), _m9952:47:1034354 (323ms), _m29493:48:64999 (253ms), _m38890:49:92952 (109300ms), _m38916:49:134132 (251ms), _m38964:49:214214 (376ms), _m1308:50:9 (499ms), _m2347:50:1363973 (369ms), _m33433:52:114961 (261ms), _m25578:52:1299486 (357ms), _m20322:53:311317 (379ms), _m20345:53:356268 (310ms), _m20468:53:517335 (298ms), _m20475:53:577993 (438ms), _m20476:53:639135 (399ms), _m20578:53:797518 (826ms), _m28999:55:360788 (289ms), _m29017:55:410895 (362ms), _m29166:55:594929 (397ms), _m29298:55:784637 (554ms), _m29304:55:817573 (632ms), _m29306:55:840279 (415ms), _m29311:55:858895 (539ms), _m29314:55:876120 (289ms), _m24521:56:293970 (251ms), _m24546:56:361809 (320ms), _m24804:56:647123 (1810ms), _m24805:56:710973 (1815ms), _m19014:56:777690 (601ms), _m19017:56:809351 (556ms), _m19023:56:841189 (1118ms), _m11354:57:9 (306ms), _m11323:57:165266 (302ms), _m11555:57:508108 (531ms), _m11768:57:752201 (552ms), _m12030:57:800777 (388ms), _m12033:57:899239 (260ms), _m12036:57:1120477 (593ms), _m12077:57:1189367 (530ms), _m12079:57:1231407 (258ms), _m432:58:168089 (440ms), _m771:58:504725 (325ms), _m737:58:663615 (304ms), _m745:58:686652 (578ms), _m753:58:715085 (652ms), _m754:58:738334 (419ms), _m763:58:758434 (338ms), _m974:58:941607 (447ms), _m975:58:967161 (554ms), _m1035:58:1101067 (396ms), _m1187:58:1180788 (303ms), _m19664:60:162140 (301ms), _m19626:60:235232 (277ms), _m19692:60:338576 (270ms), _m19741:60:389861 (254ms), _m19729:60:402254 (257ms), _m20119:60:449125 (265ms), _m19774:60:532459 (366ms), _m20278:60:552664 (282ms), _m19779:60:612474 (347ms), _m19792:60:663457 (379ms), _m19822:60:707318 (360ms), _m20249:60:758163 (362ms), _m19834:60:804850 (486ms), _m20173:60:1237795 (403ms), _m38043:65:9 (2529ms), _m38044:65:63859 (3938ms), _m38063:65:291367 (265ms), _m38045:65:158173 (5365ms), _m38064:65:301972 (608ms), _m38065:65:315120 (300ms), _m38379:65:654564 (364ms), _m38470:65:850873 (272ms), _m26804:66:96690 (252ms), _m26895:66:208904 (419ms), _m26911:66:233735 (542ms), _m26944:66:274056 (1019ms), _m26962:66:316694 (1458ms), _m26987:66:392154 (1017ms), _m27005:66:435693 (1472ms), _m27126:66:630773 (279ms), _m27147:66:665700 (493ms), _m27274:66:829143 (726ms), _m27275:66:850057 (739ms), _m27287:66:900699 (1090ms), _m27306:66:964064 (442ms), _m27317:66:984827 (431ms), _m27322:66:997880 (251ms), _m37433:68:62702 (267ms), _m37438:68:75491 (287ms), _m37474:68:167799 (370ms), _m37649:68:346183 (416ms), _m37712:68:449421 (265ms), _m37857:68:591666 (1200ms), _m37902:68:663396 (534ms), _m37912:68:689968 (1557ms), _m38035:68:844445 (345ms), _m38040:68:861541 (2813ms), _m38041:68:925391 (4676ms), _m38042:68:1019705 (6112ms)) html5_release_exceptions.js Here is the function _m60851_gshared() that took 23 seconds to compile: http://pastebin.com/QJtjTicf . It's not that big, only 236 lines when unminified.
Jukka, with a stub module (all functions replaced by FFIs and global variables initialized to 0) containing the script pastebin'd from your previous comment, I get a compilation time of 1ms, on a x64 linux optimized js shell. Which platform are you using? Any chance you can provide a minimal module that compiles this particular function? (fwiw, my email is bbouvier@moco)
I am on Win 8.1 64-bit. Sent the repro demos to you via email.
Some perf recording on a 2 minutes session shows something like 30%+ of the time is spent in HashTable functions, namely the ExitMap functions. So I suspect we have a lot of collisions in the exits maps, which means a lot of FFI functions that collide (have the same signature?), or a lot of FFI functions... The hashtable operations (lookup, add, etc.) are done in addExit . The number of calls to addExit in without the exceptions is less than 5k; with exceptions, it's more than 500k. The only caller of addExit is CheckFFICall, which means we have more than 500k FFI calls when exceptions are enabled. Not sure if that is the cause of the slow compilation, or if it's an issue related to hashmap collisions.  https://dxr.mozilla.org/mozilla-central/source/js/src/asmjs/AsmJSValidate.cpp#1958
Wow, interesting. Can you #define METER in HashTable.h and then print out exits_.stats so we can see if anything stands out? I just checked ExitDescriptor::hash and it seems reasonable...
Without exceptions: ExitMap stats: searches: 4615 steps: 240632 hits: 0 misses: 4615 addOverRemoved: 0 removes: 0 removeFrees: 0 grows: 7 shrinks: 0 compresses: 0 rehashes: 0 ******************** With exceptions: searches: 1388154 steps: 3247798419 hits: 0 misses: 1388154 addOverRemoved: 0 removes: 0 removeFrees: 0 grows: 15 shrinks: 0 compresses: 0 rehashes: 0
Patch to get all the stats printed, even on optimized builds.
Hah! So it turns out that, since this cset https://hg.mozilla.org/mozilla-central/rev/cd949a8d064c#l1.144 ExitDescriptor::match has been comparing identity of the *pointers* to signatures, not the signatures themselves which means every addExit() adds a duplicate *and* we get worst-case linear chaining search in the hash table hence the slowness. Simple fix; fortunately this change is only on Aurora.
Attachment #8654441 - Flags: review?(benj)
Comment on attachment 8654441 [details] [diff] [review] oh-pointers Review of attachment 8654441 [details] [diff] [review]: ----------------------------------------------------------------- Duh, sorry about that. The number of misses seemed really high, but I only stared at the hash() function, not the match() function... Good catch!
Attachment #8654441 - Flags: review?(benj) → review+
Also: any chance we could test this? Performance related testing is hard, but in case you have an idea...
Tested locally, with the patch applied, the module without exceptions takes 5.7s to compile, and the module with exceptions takes 11.7 seconds to compile, which is much better. The linux perf tool shows that time is mostly spent in the register allocation, which is kind of expected, as mentioned in comment 1.
(In reply to Benjamin Bouvier [:bbouvier] from comment #10) > Duh, sorry about that. The number of misses seemed really high, but I only > stared at the hash() function, not the match() function... Good catch! Hehe, me too; I finally had to printf() every signature and then I saw a lot of duplicates... (In reply to Benjamin Bouvier [:bbouvier] from comment #11) > Also: any chance we could test this? Performance related testing is hard, > but in case you have an idea... It's not semantically observable, so no, not without backflips.
Comment on attachment 8654441 [details] [diff] [review] oh-pointers Approval Request Comment [Feature/regressing bug #]: 1157624 [User impact if declined]: possible massive asm.js compile time/size regression [Describe test coverage new/current, TreeHerder]: m-c [Risks and why]: very low (two char change, matching change in bug 1157624)
This needs a proper commit message for checkin. Also, please mark patches not intended for landing as obsolete so as to avoid any confusion. https://developer.mozilla.org/en-US/docs/Mercurial_FAQ#How_can_I_generate_a_patch_for_somebody_else_to_check-in_for_me.3F
Assignee: nobody → luke
Attachment #8654228 - Attachment is obsolete: true
Comment on attachment 8655042 [details] [diff] [review] oh-pointers Approval Request Comment is in comment 14.
Attachment #8655042 - Flags: approval-mozilla-aurora?
Comment on attachment 8655042 [details] [diff] [review] oh-pointers Small patch, hopefully with no regression, taking it.
Attachment #8655042 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.