Last Comment Bug 793491 - 4 second hang in ScriptAnalysis::addTypeBarrier opening spreadsheet from Google Drive
: 4 second hang in ScriptAnalysis::addTypeBarrier opening spreadsheet from Goog...
Status: VERIFIED FIXED
[Snappy], sps
: regression
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: Trunk
: x86_64 Linux
: -- normal (vote)
: mozilla18
Assigned To: Brian Hackett (:bhackett)
:
Mentors:
https://docs.google.com/spreadsheet/c...
: 796972 (view as bug list)
Depends on:
Blocks: 773731
  Show dependency treegraph
 
Reported: 2012-09-23 07:50 PDT by Josh Matthews [:jdm]
Modified: 2013-01-03 06:12 PST (History)
11 users (show)
bzbarsky: in‑testsuite?
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
verified
+
verified


Attachments
Shell testcase (29.41 KB, application/x-javascript)
2012-10-03 10:36 PDT, Jan de Mooij [:jandem]
no flags Details
Patch from bhackett (2.21 KB, patch)
2012-10-06 17:37 PDT, Boris Zbarsky [:bz]
jdemooij: review+
Details | Diff | Splinter Review
Patch on top of the other one to fix the assertion failures, I hope (1.77 KB, patch)
2012-10-07 22:35 PDT, Boris Zbarsky [:bz]
jdemooij: review+
Details | Diff | Splinter Review
Patch pushed to m-c (2.36 KB, patch)
2012-10-10 01:33 PDT, Jan de Mooij [:jandem]
jdemooij: review+
akeybl: approval‑mozilla‑beta+
Details | Diff | Splinter Review

Description Josh Matthews [:jdm] 2012-09-23 07:50:14 PDT
http://people.mozilla.com/~bgirard/cleopatra/?report=311978ba7117c24373d0079841fbc00a48863326

This profile seems to show that it's all just poor page design, which sucks. Can someone confirm this for me?
Comment 1 Benoit Girard (:BenWa) 2012-09-23 12:16:28 PDT
Can you reproduce by turning on stackwalking and making sure you're running a nightly build? That should give us a more interesting profile. You can queue a profile save by using the hotkey during a jank.
Comment 2 Josh Matthews [:jdm] 2012-09-23 15:47:03 PDT
I'm running a nightly (also on Linux, which might complicate things). What must I do to turn on stackwalking? Also, the hotkey doesn't work for me - it opens the places library.
Comment 3 Timothy Nikkel (:tnikkel) 2012-10-02 10:23:54 PDT
*** Bug 796972 has been marked as a duplicate of this bug. ***
Comment 4 Benoit Girard (:BenWa) 2012-10-02 10:26:23 PDT
The profiler linux stackwalking is in progress. Either we use another linux profiler or we try this on another platform.

Do you have STR?
Comment 5 Benoit Girard (:BenWa) 2012-10-02 10:31:42 PDT
I reproduce it opening the first google spreadsheet I tried. Is this TI falling down?

http://people.mozilla.com/~bgirard/cleopatra/?report=3640d4eab6abc225f9df62b543ab66074cf5a448
Comment 6 Luke Wagner [:luke] 2012-10-02 10:37:29 PDT
It certainly looks like it.  Normally we don't see so much addTypeBarrier in profiles.

Do you know if this is a recent regression or has TI always died on this test?
Comment 7 Boris Zbarsky [:bz] 2012-10-02 10:47:29 PDT
Can someone link to an actual URI that shows the problem?
Comment 8 Timothy Nikkel (:tnikkel) 2012-10-02 10:49:22 PDT
(In reply to Luke Wagner [:luke] from comment #6)
> Do you know if this is a recent regression or has TI always died on this
> test?

I've had the hang for a long time now, I don't know if the cause has changed over time though.
Comment 9 Timothy Nikkel (:tnikkel) 2012-10-02 10:49:50 PDT
(In reply to Boris Zbarsky (:bz) from comment #7)
> Can someone link to an actual URI that shows the problem?

Just create a new spreadsheet on Google Docs and it should show the problem.
Comment 10 Benoit Girard (:BenWa) 2012-10-02 10:58:59 PDT
Here's a link to make this easier:

https://docs.google.com/spreadsheet/ccc?key=0AvYoHvO43KG_dGItOFM3bTJZdmFxa0dfM1FYQXBOUFE
Comment 11 Benoit Girard (:BenWa) 2012-10-02 11:00:58 PDT
(In reply to Luke Wagner [:luke] from comment #6)
> It certainly looks like it.  Normally we don't see so much addTypeBarrier in
> profiles.
> 
> Do you know if this is a recent regression or has TI always died on this
> test?

That's hard to know since it could also have been google that changed their scripts recently.

Can someone do a regression window to test if TI ever handled this correctly. If not the regression window will be TI landing so I'm flagging this as a regression.
Comment 12 Boris Zbarsky [:bz] 2012-10-02 11:10:13 PDT
Thanks.  So what that's showing is js_Interpret directly calling types::TypeMonitorResult, which calls types::TypeSet::addType, which calls addTypeBarrier.

Unfortunately, Instruments seems determined to not show me source for some reason, so I'm not sure yet what spot in js_Interpret this is.  :(
Comment 13 Boris Zbarsky [:bz] 2012-10-02 11:10:52 PDT
Though it seems to be near a NameOperation call, fwiw.
Comment 14 Boris Zbarsky [:bz] 2012-10-02 11:13:11 PDT
So this looks like this code in addTypeBarrier:

2699        /* Ignore duplicate barriers. */
2700        TypeBarrier *barrier = code.typeBarriers;
2701        while (barrier) {
2702            if (barrier->target == target && barrier->type == type && !barrier->singleton)
2703                return;
2704            barrier = barrier->next;
2705        }

Any idea how to make sense of the "type" argument?
Comment 15 Benoit Girard (:BenWa) 2012-10-02 11:23:03 PDT
(In reply to Benoit Girard (:BenWa) from comment #5)
> I reproduce it opening the first google spreadsheet I tried. Is this TI
> falling down?
> 
> http://people.mozilla.com/~bgirard/cleopatra/
> ?report=3640d4eab6abc225f9df62b543ab66074cf5a448

This profile was collected on Aurora FYI. Here's a profile collected against nightly where we get the stackwalking feature:

http://people.mozilla.com/~bgirard/cleopatra/?report=9a4d82bc602e5110533bb90cad8ac3badd32a242

Pay close attention to the 'Self' count since most sample aren't spent at the leaf.
Comment 16 Boris Zbarsky [:bz] 2012-10-02 11:29:31 PDT
Looks like "type" tests true for isSingleObject.

We're definitely in the NameOperation case in the interpreter.

The op is GETGNAME or GETGPROP.

I tried stopping a few times in this code, and I see things like:

(gdb) pu script->ptr->getName(regs.pc)->d->u1.chars
$9 = 0x115be5550 "prototype"
Comment 17 Boris Zbarsky [:bz] 2012-10-02 11:30:57 PDT
So my first guess is that we have some location that's seeing a huge number of objects with singleton types?
Comment 18 Boris Zbarsky [:bz] 2012-10-02 11:55:56 PDT
Alright.  So I just added printfs to this code to get some data.  Started the browser with the above URI, then quit.

I see, in an opt build, 1,724,184 calls to addTypeBarrier.  Of these, 515 take the early return in the loop above, while the rest make it out of the loop.

Looking at these remaining 1,723,669 places where we exit the loop, I see 686,071 where we have < 100 loop iterations, 674,243 with 100 <= iterations < 200, and 363,355 with >= 200 iterations.

The peak iteration count is 276, and there is only one such call.  There is also only a smattering of calls with other iteration counts of > 253.  But there are 6244 calls with iteration counts of 253, and 6736 calls with iteration counts in the 229-252 range.  6737 calls at 228 iterations.  6738 calls in the 139-227 range, and so forth.

Looking at calls with 253 iterations, they are all, unsurprisingly, for different pc values.

So it looks like there are, on this page, 6000-some different pc values each of which ends up touching several hundred objects that have singleton types.  So each one ends up with a typeset of about 250 types, which requires O(N^2) time to build up because of the traversals in the above loop...
Comment 19 Boris Zbarsky [:bz] 2012-10-02 11:56:50 PDT
The "6000 different pc values" thing is a bit weird, I should say.  ;)
Comment 20 Jan de Mooij [:jandem] 2012-10-03 10:36:32 PDT
Created attachment 667538 [details]
Shell testcase

bz and I looked into this. The problem is that there is a large number of GETGNAME/GETPROP instructions, each with many type barriers. The set of observed types is very small.

The attached shell test demonstrates the problem, it runs 100x faster with --no-ti (350 ms vs 3 ms). Instruments shows 87% of the time is under addTypeBarrier.
Comment 21 Timothy Nikkel (:tnikkel) 2012-10-03 11:40:21 PDT
I got this regression range, not sure if it's helpful
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c918ff2f0994&tochange=ae22909cef5a
Comment 22 Boris Zbarsky [:bz] 2012-10-03 11:51:01 PDT
Maybe from bug 773731?
Comment 23 Timothy Nikkel (:tnikkel) 2012-10-03 15:49:57 PDT
(In reply to Boris Zbarsky (:bz) from comment #22)
> Maybe from bug 773731?

I checked and you are correct.
Comment 24 Boris Zbarsky [:bz] 2012-10-04 06:38:58 PDT
Requesting tracking; this is a pretty significant user-facing perf regression.
Comment 25 Boris Zbarsky [:bz] 2012-10-04 09:08:42 PDT
bhackett, via satphone + glue + tape, says:

The problem is probably that ScriptAnalysis::addSingletonTypeBarrier needs a test similar to that in addTypeBarrier around the BARRIER_OBJECT_LIMIT test, which is designed to avoid this quadratic blowup.  addSingletonTypeBarrier is used when fetching properties from specific objects, so will be used in GETGNAME and occasionally GETPROP.
Comment 26 Boris Zbarsky [:bz] 2012-10-04 09:13:53 PDT
I tried hacking that in, but it doesn't seem to help, because target->getObjectCount() never actually gets large here...
Comment 27 Alex Keybl [:akeybl] 2012-10-05 16:08:34 PDT
How important is keeping bug 773731 in the build given this user regression?
Comment 28 Boris Zbarsky [:bz] 2012-10-06 17:37:54 PDT
Created attachment 668856 [details] [diff] [review]
Patch from bhackett

From far-off lands, via unreliable communications methods, cometh this patch, which makes sense even to me.  ;)
Comment 29 Boris Zbarsky [:bz] 2012-10-06 17:41:31 PDT
On the attached shell testcase, I see times like so:

Pre-patch: ~350ms
Pre-patch no-ti: 3ms

Post-patch: 22ms
Post-patch no-ti: 3ms

So it's not perfect, but way better...
Comment 30 Jan de Mooij [:jandem] 2012-10-07 15:53:11 PDT
Comment on attachment 668856 [details] [diff] [review]
Patch from bhackett

Review of attachment 668856 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, thanks bz! Since AWFY is not updating atm it would be good to check Kraken and V8 in the shell, just to be sure. I can also do that and land the patch first thing tomorrow.
Comment 31 Boris Zbarsky [:bz] 2012-10-07 18:45:13 PDT
Looks like no statistically significant change on Kraken in shell (in fact, the patched run is faster).
Comment 32 Boris Zbarsky [:bz] 2012-10-07 18:53:45 PDT
v8 is really noisy.  :(  I _think_ it's about the same, but hard to tell.
Comment 33 Boris Zbarsky [:bz] 2012-10-07 18:54:29 PDT
Jan, we should take this for 17 too, right?  Would you mind doing the approval request?  I'm not really qualified to do a risk analysis here....
Comment 34 Boris Zbarsky [:bz] 2012-10-07 20:06:46 PDT
https://hg.mozilla.org/integration/mozilla-inbound/rev/9157b71a7df1

Oh, and just to make it clear: the pause on the spreadsheet link from comment 10 goes away with this patch.
Comment 35 Phil Ringnalda (:philor) 2012-10-07 21:06:39 PDT
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/c9bfb6646f04 for pretty much every single debug test suite hitting "Assertion failure: !isAnyObject() && !isUnknown(), at ../../../js/src/jsinfer.h:162"
Comment 36 Boris Zbarsky [:bz] 2012-10-07 22:00:29 PDT
The relevant stack bit:


 0  libxul.so!js::analyze::ScriptAnalysis::addTypeBarrier [jsinfer.h : 162 + 0x18]
    eip = 0x027300c8   esp = 0xbfd2ba50   ebp = 0xbfd2bab8   ebx = 0x030dd640
    esi = 0x00000020   edi = 0x00000007   eax = 0x00000000   ecx = 0x00c5a32c
    edx = 0x00000000   efl = 0x00210246
    Found by: given as instruction pointer in context
 1  libxul.so!TypeConstraintProp<(PropertyAccessKind)1u>::newType [jsinfer.cpp : 990 + 0x1b]
    eip = 0x0273e449   esp = 0xbfd2bac0   ebp = 0xbfd2bb18   ebx = 0x030dd640
    esi = 0x0a38e1ae   edi = 0x096f55f0
    Found by: call frame info
 2  libxul.so!js::types::TypeCompartment::resolvePending [jsinferinlines.h : 1071 + 0x1e]
    eip = 0x026a8adc   esp = 0xbfd2bb20   ebp = 0xbfd2bb78   ebx = 0x030dd640
    esi = 0x0af8a6cc   edi = 0x0ab65228
    Found by: call frame info
 3  libxul.so!js::types::TypeSet::addType [jsinferinlines.h : 1390 + 0x13]
    eip = 0x026a91a1   esp = 0xbfd2bb80   ebp = 0xbfd2bc08   ebx = 0x030dd640
    esi = 0x096f55f0   edi = 0x00000000
    Found by: call frame info
 4  libxul.so!js::types::TypeMonitorResult [jsinfer.cpp : 5396 + 0x12]
    eip = 0x02728c20   esp = 0xbfd2bc10   ebp = 0xbfd2bc78   ebx = 0x030dd640
    esi = 0x0ab60094   edi = 0x096f55f0
    Found by: call frame info

And another take on frame 0 from a different log:

 0  XUL!js::analyze::ScriptAnalysis::addTypeBarrier [jsinferinlines.h : 1414 + 0x0]

Line 1414 of jsinferinlines is the JS_ASSERT(!unknownObject()) at the top of TypeSet::getObjectCount().
Comment 37 Boris Zbarsky [:bz] 2012-10-07 22:02:45 PDT
A slightly better stack from a debugger:

#0  js::types::Type::isObject (this=0x7fff5fbf8a98) at jsinfer.h:162
#1  0x0000000101132bcc in js::analyze::ScriptAnalysis::addTypeBarrier (this=0x148a172a0, cx=0x11965c0e0, pc=0x11803adc8 "5", target=0x147ca70a0, type={data = 32}) at jsinfer.cpp:2706
#2  0x000000010114119d in MarkPropertyAccessUnknown (cx=0x11965c0e0, script={<js::HandleBase<JSScript *>> = {<No data fields>}, ptr = 0x7fff5fbf8c30}, pc=0x11803adc8 "5", target=0x147ca70a0) at jsinfer.cpp:990
#3  0x0000000101158bee in TypeConstraintProp<(PropertyAccessKind)1>::newType (this=0x149656f20, cx=0x11965c0e0, source=0x149651838, type={data = 7}) at jsinfer.cpp:1161
#4  0x000000010108c90d in js::types::TypeCompartment::resolvePending (this=0x1183074a8, cx=0x11965c0e0) at jsinferinlines.h:1071
#5  0x000000010108c0b0 in js::types::TypeSet::addType (this=0x1483abd00, cx=0x11965c0e0, type={data = 5490666561}) at jsinferinlines.h:1390
#6  0x000000010113c5f7 in js::types::TypeMonitorResult (cx=0x11965c0e0, script={<js::HandleBase<JSScript *>> = {<No data fields>}, ptr = 0x7fff5fbfb510}, pc=0x148349b5f "�", rval=@0x7fff5fbfaa20) at jsinfer.cpp:5396
#7  0x000000010118333b in js::types::TypeScript::Monitor (cx=0x11965c0e0, script={<js::HandleBase<JSScript *>> = {<No data fields>}, ptr = 0x7fff5fbfb510}, pc=0x148349b5f "�", rval=@0x7fff5fbfaa20) at jsinferinlines.h:853

This is the

2706                if (barrier->type.isAnyObject() && type.isObject())

line that the patch added.  In this case, type.isUnknown() == true.
Comment 38 Boris Zbarsky [:bz] 2012-10-07 22:11:25 PDT
Oh, and the point is that isObject() starts with:

        JS_ASSERT(!isAnyObject() && !isUnknown());
Comment 39 Boris Zbarsky [:bz] 2012-10-07 22:35:31 PDT
Created attachment 669020 [details] [diff] [review]
Patch on top of the other one to fix the assertion failures, I hope
Comment 40 Jan de Mooij [:jandem] 2012-10-08 01:19:14 PDT
Comment on attachment 669020 [details] [diff] [review]
Patch on top of the other one to fix the assertion failures, I hope

Review of attachment 669020 [details] [diff] [review]:
-----------------------------------------------------------------

Nice. I fixed some small style nits (if-statement with multiline conditional always gets { } with opening brace on new line), merged this with the previous patch and pushed to inbound:

https://hg.mozilla.org/integration/mozilla-inbound/rev/f375fcca196a
Comment 41 Jan de Mooij [:jandem] 2012-10-08 02:58:13 PDT
(In reply to Boris Zbarsky (:bz) from comment #33)
> Jan, we should take this for 17 too, right?  Would you mind doing the
> approval request?  I'm not really qualified to do a risk analysis here....

Sure, I will request approval a few days after it lands on m-c (also avoids any confusion with the merge today).
Comment 42 Ed Morley [:emorley] 2012-10-08 07:38:00 PDT
https://hg.mozilla.org/mozilla-central/rev/f375fcca196a
Comment 43 Jan de Mooij [:jandem] 2012-10-10 01:33:21 PDT
Created attachment 669883 [details] [diff] [review]
Patch pushed to m-c

This is the combined patch I pushed to m-c.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Type Inference (bisecting pointed to bug 773731 but I'm pretty sure it could be triggered before that)
User impact if declined: Browser hangs on certain websites like Google Docs
Testing completed (on m-c, etc.): On m-c for a few days
Risk to taking this patch (and alternatives if risky): Low
String or UUID changes made by this patch: None
Comment 44 Jan de Mooij [:jandem] 2012-10-11 03:22:12 PDT
https://hg.mozilla.org/releases/mozilla-beta/rev/b4c78c09e9b2
Comment 45 Mihai Morar, (:MihaiMorar) 2013-01-03 06:12:05 PST
Verified fixed for FF 17 and FF 18b7 using Ubuntu x86:

Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/17.0 Firefox/17.0(20121119183901)
Mozilla/5.0 (X11; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0(20121231071231)

Note You need to log in before you can comment on or make changes to this bug.