Closed
Bug 793491
Opened 12 years ago
Closed 12 years ago
4 second hang in ScriptAnalysis::addTypeBarrier opening spreadsheet from Google Drive
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
VERIFIED
FIXED
mozilla18
People
(Reporter: jdm, Assigned: bhackett1024)
References
()
Details
(Keywords: regression, Whiteboard: [Snappy], sps)
Attachments
(4 files)
29.41 KB,
application/x-javascript
|
Details | |
2.21 KB,
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
1.77 KB,
patch
|
jandem
:
review+
|
Details | Diff | Splinter Review |
2.36 KB,
patch
|
jandem
:
review+
akeybl
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
http://people.mozilla.com/~bgirard/cleopatra/?report=311978ba7117c24373d0079841fbc00a48863326 This profile seems to show that it's all just poor page design, which sucks. Can someone confirm this for me?
Comment 1•12 years ago
|
||
Can you reproduce by turning on stackwalking and making sure you're running a nightly build? That should give us a more interesting profile. You can queue a profile save by using the hotkey during a jank.
Reporter | ||
Comment 2•12 years ago
|
||
I'm running a nightly (also on Linux, which might complicate things). What must I do to turn on stackwalking? Also, the hotkey doesn't work for me - it opens the places library.
Comment 4•12 years ago
|
||
The profiler linux stackwalking is in progress. Either we use another linux profiler or we try this on another platform. Do you have STR?
Comment 5•12 years ago
|
||
I reproduce it opening the first google spreadsheet I tried. Is this TI falling down? http://people.mozilla.com/~bgirard/cleopatra/?report=3640d4eab6abc225f9df62b543ab66074cf5a448
Assignee: nobody → general
Component: General → JavaScript Engine
Summary: Huge jank opening spreadsheet from Google Drive → 4 second hang in ScriptAnalysis::addTypeBarrier opening spreadsheet from Google Drive
Comment 6•12 years ago
|
||
It certainly looks like it. Normally we don't see so much addTypeBarrier in profiles. Do you know if this is a recent regression or has TI always died on this test?
Comment 7•12 years ago
|
||
Can someone link to an actual URI that shows the problem?
Comment 8•12 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #6) > Do you know if this is a recent regression or has TI always died on this > test? I've had the hang for a long time now, I don't know if the cause has changed over time though.
Comment 9•12 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #7) > Can someone link to an actual URI that shows the problem? Just create a new spreadsheet on Google Docs and it should show the problem.
Comment 10•12 years ago
|
||
Here's a link to make this easier: https://docs.google.com/spreadsheet/ccc?key=0AvYoHvO43KG_dGItOFM3bTJZdmFxa0dfM1FYQXBOUFE
Comment 11•12 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #6) > It certainly looks like it. Normally we don't see so much addTypeBarrier in > profiles. > > Do you know if this is a recent regression or has TI always died on this > test? That's hard to know since it could also have been google that changed their scripts recently. Can someone do a regression window to test if TI ever handled this correctly. If not the regression window will be TI landing so I'm flagging this as a regression.
Keywords: regression,
regressionwindow-wanted
Comment 12•12 years ago
|
||
Thanks. So what that's showing is js_Interpret directly calling types::TypeMonitorResult, which calls types::TypeSet::addType, which calls addTypeBarrier. Unfortunately, Instruments seems determined to not show me source for some reason, so I'm not sure yet what spot in js_Interpret this is. :(
Comment 13•12 years ago
|
||
Though it seems to be near a NameOperation call, fwiw.
Comment 14•12 years ago
|
||
So this looks like this code in addTypeBarrier: 2699 /* Ignore duplicate barriers. */ 2700 TypeBarrier *barrier = code.typeBarriers; 2701 while (barrier) { 2702 if (barrier->target == target && barrier->type == type && !barrier->singleton) 2703 return; 2704 barrier = barrier->next; 2705 } Any idea how to make sense of the "type" argument?
Comment 15•12 years ago
|
||
(In reply to Benoit Girard (:BenWa) from comment #5) > I reproduce it opening the first google spreadsheet I tried. Is this TI > falling down? > > http://people.mozilla.com/~bgirard/cleopatra/ > ?report=3640d4eab6abc225f9df62b543ab66074cf5a448 This profile was collected on Aurora FYI. Here's a profile collected against nightly where we get the stackwalking feature: http://people.mozilla.com/~bgirard/cleopatra/?report=9a4d82bc602e5110533bb90cad8ac3badd32a242 Pay close attention to the 'Self' count since most sample aren't spent at the leaf.
Comment 16•12 years ago
|
||
Looks like "type" tests true for isSingleObject. We're definitely in the NameOperation case in the interpreter. The op is GETGNAME or GETGPROP. I tried stopping a few times in this code, and I see things like: (gdb) pu script->ptr->getName(regs.pc)->d->u1.chars $9 = 0x115be5550 "prototype"
Comment 17•12 years ago
|
||
So my first guess is that we have some location that's seeing a huge number of objects with singleton types?
Updated•12 years ago
|
Comment 18•12 years ago
|
||
Alright. So I just added printfs to this code to get some data. Started the browser with the above URI, then quit. I see, in an opt build, 1,724,184 calls to addTypeBarrier. Of these, 515 take the early return in the loop above, while the rest make it out of the loop. Looking at these remaining 1,723,669 places where we exit the loop, I see 686,071 where we have < 100 loop iterations, 674,243 with 100 <= iterations < 200, and 363,355 with >= 200 iterations. The peak iteration count is 276, and there is only one such call. There is also only a smattering of calls with other iteration counts of > 253. But there are 6244 calls with iteration counts of 253, and 6736 calls with iteration counts in the 229-252 range. 6737 calls at 228 iterations. 6738 calls in the 139-227 range, and so forth. Looking at calls with 253 iterations, they are all, unsurprisingly, for different pc values. So it looks like there are, on this page, 6000-some different pc values each of which ends up touching several hundred objects that have singleton types. So each one ends up with a typeset of about 250 types, which requires O(N^2) time to build up because of the traversals in the above loop...
Comment 19•12 years ago
|
||
The "6000 different pc values" thing is a bit weird, I should say. ;)
Comment 20•12 years ago
|
||
bz and I looked into this. The problem is that there is a large number of GETGNAME/GETPROP instructions, each with many type barriers. The set of observed types is very small. The attached shell test demonstrates the problem, it runs 100x faster with --no-ti (350 ms vs 3 ms). Instruments shows 87% of the time is under addTypeBarrier.
Comment 21•12 years ago
|
||
I got this regression range, not sure if it's helpful http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c918ff2f0994&tochange=ae22909cef5a
Updated•12 years ago
|
Keywords: regressionwindow-wanted
Comment 22•12 years ago
|
||
Maybe from bug 773731?
Comment 23•12 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #22) > Maybe from bug 773731? I checked and you are correct.
Blocks: 773731
Comment 24•12 years ago
|
||
Requesting tracking; this is a pretty significant user-facing perf regression.
tracking-firefox17:
--- → ?
tracking-firefox18:
--- → ?
Comment 25•12 years ago
|
||
bhackett, via satphone + glue + tape, says: The problem is probably that ScriptAnalysis::addSingletonTypeBarrier needs a test similar to that in addTypeBarrier around the BARRIER_OBJECT_LIMIT test, which is designed to avoid this quadratic blowup. addSingletonTypeBarrier is used when fetching properties from specific objects, so will be used in GETGNAME and occasionally GETPROP.
Comment 26•12 years ago
|
||
I tried hacking that in, but it doesn't seem to help, because target->getObjectCount() never actually gets large here...
Updated•12 years ago
|
Whiteboard: [Snappy] → [Snappy], sps
Comment 27•12 years ago
|
||
How important is keeping bug 773731 in the build given this user regression?
Comment 28•12 years ago
|
||
From far-off lands, via unreliable communications methods, cometh this patch, which makes sense even to me. ;)
Attachment #668856 -
Flags: review?(jdemooij)
Comment 29•12 years ago
|
||
On the attached shell testcase, I see times like so: Pre-patch: ~350ms Pre-patch no-ti: 3ms Post-patch: 22ms Post-patch no-ti: 3ms So it's not perfect, but way better...
Comment 30•12 years ago
|
||
Comment on attachment 668856 [details] [diff] [review] Patch from bhackett Review of attachment 668856 [details] [diff] [review]: ----------------------------------------------------------------- Nice, thanks bz! Since AWFY is not updating atm it would be good to check Kraken and V8 in the shell, just to be sure. I can also do that and land the patch first thing tomorrow.
Attachment #668856 -
Flags: review?(jdemooij) → review+
Comment 31•12 years ago
|
||
Looks like no statistically significant change on Kraken in shell (in fact, the patched run is faster).
Comment 32•12 years ago
|
||
v8 is really noisy. :( I _think_ it's about the same, but hard to tell.
Comment 33•12 years ago
|
||
Jan, we should take this for 17 too, right? Would you mind doing the approval request? I'm not really qualified to do a risk analysis here....
Comment 34•12 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/9157b71a7df1 Oh, and just to make it clear: the pause on the spreadsheet link from comment 10 goes away with this patch.
Assignee: general → bhackett1024
Flags: in-testsuite?
Target Milestone: --- → mozilla18
Comment 35•12 years ago
|
||
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/c9bfb6646f04 for pretty much every single debug test suite hitting "Assertion failure: !isAnyObject() && !isUnknown(), at ../../../js/src/jsinfer.h:162"
Target Milestone: mozilla18 → ---
Comment 36•12 years ago
|
||
The relevant stack bit: 0 libxul.so!js::analyze::ScriptAnalysis::addTypeBarrier [jsinfer.h : 162 + 0x18] eip = 0x027300c8 esp = 0xbfd2ba50 ebp = 0xbfd2bab8 ebx = 0x030dd640 esi = 0x00000020 edi = 0x00000007 eax = 0x00000000 ecx = 0x00c5a32c edx = 0x00000000 efl = 0x00210246 Found by: given as instruction pointer in context 1 libxul.so!TypeConstraintProp<(PropertyAccessKind)1u>::newType [jsinfer.cpp : 990 + 0x1b] eip = 0x0273e449 esp = 0xbfd2bac0 ebp = 0xbfd2bb18 ebx = 0x030dd640 esi = 0x0a38e1ae edi = 0x096f55f0 Found by: call frame info 2 libxul.so!js::types::TypeCompartment::resolvePending [jsinferinlines.h : 1071 + 0x1e] eip = 0x026a8adc esp = 0xbfd2bb20 ebp = 0xbfd2bb78 ebx = 0x030dd640 esi = 0x0af8a6cc edi = 0x0ab65228 Found by: call frame info 3 libxul.so!js::types::TypeSet::addType [jsinferinlines.h : 1390 + 0x13] eip = 0x026a91a1 esp = 0xbfd2bb80 ebp = 0xbfd2bc08 ebx = 0x030dd640 esi = 0x096f55f0 edi = 0x00000000 Found by: call frame info 4 libxul.so!js::types::TypeMonitorResult [jsinfer.cpp : 5396 + 0x12] eip = 0x02728c20 esp = 0xbfd2bc10 ebp = 0xbfd2bc78 ebx = 0x030dd640 esi = 0x0ab60094 edi = 0x096f55f0 Found by: call frame info And another take on frame 0 from a different log: 0 XUL!js::analyze::ScriptAnalysis::addTypeBarrier [jsinferinlines.h : 1414 + 0x0] Line 1414 of jsinferinlines is the JS_ASSERT(!unknownObject()) at the top of TypeSet::getObjectCount().
Comment 37•12 years ago
|
||
A slightly better stack from a debugger: #0 js::types::Type::isObject (this=0x7fff5fbf8a98) at jsinfer.h:162 #1 0x0000000101132bcc in js::analyze::ScriptAnalysis::addTypeBarrier (this=0x148a172a0, cx=0x11965c0e0, pc=0x11803adc8 "5", target=0x147ca70a0, type={data = 32}) at jsinfer.cpp:2706 #2 0x000000010114119d in MarkPropertyAccessUnknown (cx=0x11965c0e0, script={<js::HandleBase<JSScript *>> = {<No data fields>}, ptr = 0x7fff5fbf8c30}, pc=0x11803adc8 "5", target=0x147ca70a0) at jsinfer.cpp:990 #3 0x0000000101158bee in TypeConstraintProp<(PropertyAccessKind)1>::newType (this=0x149656f20, cx=0x11965c0e0, source=0x149651838, type={data = 7}) at jsinfer.cpp:1161 #4 0x000000010108c90d in js::types::TypeCompartment::resolvePending (this=0x1183074a8, cx=0x11965c0e0) at jsinferinlines.h:1071 #5 0x000000010108c0b0 in js::types::TypeSet::addType (this=0x1483abd00, cx=0x11965c0e0, type={data = 5490666561}) at jsinferinlines.h:1390 #6 0x000000010113c5f7 in js::types::TypeMonitorResult (cx=0x11965c0e0, script={<js::HandleBase<JSScript *>> = {<No data fields>}, ptr = 0x7fff5fbfb510}, pc=0x148349b5f "�", rval=@0x7fff5fbfaa20) at jsinfer.cpp:5396 #7 0x000000010118333b in js::types::TypeScript::Monitor (cx=0x11965c0e0, script={<js::HandleBase<JSScript *>> = {<No data fields>}, ptr = 0x7fff5fbfb510}, pc=0x148349b5f "�", rval=@0x7fff5fbfaa20) at jsinferinlines.h:853 This is the 2706 if (barrier->type.isAnyObject() && type.isObject()) line that the patch added. In this case, type.isUnknown() == true.
Comment 38•12 years ago
|
||
Oh, and the point is that isObject() starts with: JS_ASSERT(!isAnyObject() && !isUnknown());
Comment 39•12 years ago
|
||
Attachment #669020 -
Flags: review?(jdemooij)
Comment 40•12 years ago
|
||
Comment on attachment 669020 [details] [diff] [review] Patch on top of the other one to fix the assertion failures, I hope Review of attachment 669020 [details] [diff] [review]: ----------------------------------------------------------------- Nice. I fixed some small style nits (if-statement with multiline conditional always gets { } with opening brace on new line), merged this with the previous patch and pushed to inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/f375fcca196a
Attachment #669020 -
Flags: review?(jdemooij) → review+
Comment 41•12 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #33) > Jan, we should take this for 17 too, right? Would you mind doing the > approval request? I'm not really qualified to do a risk analysis here.... Sure, I will request approval a few days after it lands on m-c (also avoids any confusion with the merge today).
Comment 42•12 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/f375fcca196a
Status: NEW → RESOLVED
Closed: 12 years ago
status-firefox18:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
Comment 43•12 years ago
|
||
This is the combined patch I pushed to m-c. [Approval Request Comment] Bug caused by (feature/regressing bug #): Type Inference (bisecting pointed to bug 773731 but I'm pretty sure it could be triggered before that) User impact if declined: Browser hangs on certain websites like Google Docs Testing completed (on m-c, etc.): On m-c for a few days Risk to taking this patch (and alternatives if risky): Low String or UUID changes made by this patch: None
Attachment #669883 -
Flags: review+
Attachment #669883 -
Flags: approval-mozilla-beta?
Updated•12 years ago
|
status-firefox17:
--- → affected
Updated•12 years ago
|
Attachment #669883 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 45•12 years ago
|
||
Verified fixed for FF 17 and FF 18b7 using Ubuntu x86: Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/17.0 Firefox/17.0(20121119183901) Mozilla/5.0 (X11; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0(20121231071231)
You need to log in
before you can comment on or make changes to this bug.
Description
•