Closed Bug 793491 Opened 12 years ago Closed 12 years ago

4 second hang in ScriptAnalysis::addTypeBarrier opening spreadsheet from Google Drive

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla18
Tracking Status
firefox17 + verified
firefox18 + verified

People

(Reporter: jdm, Assigned: bhackett1024)

References

()

Details

(Keywords: regression, Whiteboard: [Snappy], sps)

Attachments

(4 files)

http://people.mozilla.com/~bgirard/cleopatra/?report=311978ba7117c24373d0079841fbc00a48863326

This profile seems to show that it's all just poor page design, which sucks. Can someone confirm this for me?
Can you reproduce by turning on stackwalking and making sure you're running a nightly build? That should give us a more interesting profile. You can queue a profile save by using the hotkey during a jank.
I'm running a nightly (also on Linux, which might complicate things). What must I do to turn on stackwalking? Also, the hotkey doesn't work for me - it opens the places library.
The profiler linux stackwalking is in progress. Either we use another linux profiler or we try this on another platform.

Do you have STR?
I reproduce it opening the first google spreadsheet I tried. Is this TI falling down?

http://people.mozilla.com/~bgirard/cleopatra/?report=3640d4eab6abc225f9df62b543ab66074cf5a448
Assignee: nobody → general
Component: General → JavaScript Engine
Summary: Huge jank opening spreadsheet from Google Drive → 4 second hang in ScriptAnalysis::addTypeBarrier opening spreadsheet from Google Drive
It certainly looks like it.  Normally we don't see so much addTypeBarrier in profiles.

Do you know if this is a recent regression or has TI always died on this test?
Can someone link to an actual URI that shows the problem?
(In reply to Luke Wagner [:luke] from comment #6)
> Do you know if this is a recent regression or has TI always died on this
> test?

I've had the hang for a long time now, I don't know if the cause has changed over time though.
(In reply to Boris Zbarsky (:bz) from comment #7)
> Can someone link to an actual URI that shows the problem?

Just create a new spreadsheet on Google Docs and it should show the problem.
(In reply to Luke Wagner [:luke] from comment #6)
> It certainly looks like it.  Normally we don't see so much addTypeBarrier in
> profiles.
> 
> Do you know if this is a recent regression or has TI always died on this
> test?

That's hard to know since it could also have been google that changed their scripts recently.

Can someone do a regression window to test if TI ever handled this correctly. If not the regression window will be TI landing so I'm flagging this as a regression.
Thanks.  So what that's showing is js_Interpret directly calling types::TypeMonitorResult, which calls types::TypeSet::addType, which calls addTypeBarrier.

Unfortunately, Instruments seems determined to not show me source for some reason, so I'm not sure yet what spot in js_Interpret this is.  :(
Though it seems to be near a NameOperation call, fwiw.
So this looks like this code in addTypeBarrier:

2699        /* Ignore duplicate barriers. */
2700        TypeBarrier *barrier = code.typeBarriers;
2701        while (barrier) {
2702            if (barrier->target == target && barrier->type == type && !barrier->singleton)
2703                return;
2704            barrier = barrier->next;
2705        }

Any idea how to make sense of the "type" argument?
(In reply to Benoit Girard (:BenWa) from comment #5)
> I reproduce it opening the first google spreadsheet I tried. Is this TI
> falling down?
> 
> http://people.mozilla.com/~bgirard/cleopatra/
> ?report=3640d4eab6abc225f9df62b543ab66074cf5a448

This profile was collected on Aurora FYI. Here's a profile collected against nightly where we get the stackwalking feature:

http://people.mozilla.com/~bgirard/cleopatra/?report=9a4d82bc602e5110533bb90cad8ac3badd32a242

Pay close attention to the 'Self' count since most sample aren't spent at the leaf.
Looks like "type" tests true for isSingleObject.

We're definitely in the NameOperation case in the interpreter.

The op is GETGNAME or GETGPROP.

I tried stopping a few times in this code, and I see things like:

(gdb) pu script->ptr->getName(regs.pc)->d->u1.chars
$9 = 0x115be5550 "prototype"
So my first guess is that we have some location that's seeing a huge number of objects with singleton types?
Alright.  So I just added printfs to this code to get some data.  Started the browser with the above URI, then quit.

I see, in an opt build, 1,724,184 calls to addTypeBarrier.  Of these, 515 take the early return in the loop above, while the rest make it out of the loop.

Looking at these remaining 1,723,669 places where we exit the loop, I see 686,071 where we have < 100 loop iterations, 674,243 with 100 <= iterations < 200, and 363,355 with >= 200 iterations.

The peak iteration count is 276, and there is only one such call.  There is also only a smattering of calls with other iteration counts of > 253.  But there are 6244 calls with iteration counts of 253, and 6736 calls with iteration counts in the 229-252 range.  6737 calls at 228 iterations.  6738 calls in the 139-227 range, and so forth.

Looking at calls with 253 iterations, they are all, unsurprisingly, for different pc values.

So it looks like there are, on this page, 6000-some different pc values each of which ends up touching several hundred objects that have singleton types.  So each one ends up with a typeset of about 250 types, which requires O(N^2) time to build up because of the traversals in the above loop...
The "6000 different pc values" thing is a bit weird, I should say.  ;)
Attached file Shell testcase
bz and I looked into this. The problem is that there is a large number of GETGNAME/GETPROP instructions, each with many type barriers. The set of observed types is very small.

The attached shell test demonstrates the problem, it runs 100x faster with --no-ti (350 ms vs 3 ms). Instruments shows 87% of the time is under addTypeBarrier.
Maybe from bug 773731?
(In reply to Boris Zbarsky (:bz) from comment #22)
> Maybe from bug 773731?

I checked and you are correct.
Blocks: 773731
Requesting tracking; this is a pretty significant user-facing perf regression.
bhackett, via satphone + glue + tape, says:

The problem is probably that ScriptAnalysis::addSingletonTypeBarrier needs a test similar to that in addTypeBarrier around the BARRIER_OBJECT_LIMIT test, which is designed to avoid this quadratic blowup.  addSingletonTypeBarrier is used when fetching properties from specific objects, so will be used in GETGNAME and occasionally GETPROP.
I tried hacking that in, but it doesn't seem to help, because target->getObjectCount() never actually gets large here...
Whiteboard: [Snappy] → [Snappy], sps
How important is keeping bug 773731 in the build given this user regression?
From far-off lands, via unreliable communications methods, cometh this patch, which makes sense even to me.  ;)
Attachment #668856 - Flags: review?(jdemooij)
On the attached shell testcase, I see times like so:

Pre-patch: ~350ms
Pre-patch no-ti: 3ms

Post-patch: 22ms
Post-patch no-ti: 3ms

So it's not perfect, but way better...
Comment on attachment 668856 [details] [diff] [review]
Patch from bhackett

Review of attachment 668856 [details] [diff] [review]:
-----------------------------------------------------------------

Nice, thanks bz! Since AWFY is not updating atm it would be good to check Kraken and V8 in the shell, just to be sure. I can also do that and land the patch first thing tomorrow.
Attachment #668856 - Flags: review?(jdemooij) → review+
Looks like no statistically significant change on Kraken in shell (in fact, the patched run is faster).
v8 is really noisy.  :(  I _think_ it's about the same, but hard to tell.
Jan, we should take this for 17 too, right?  Would you mind doing the approval request?  I'm not really qualified to do a risk analysis here....
https://hg.mozilla.org/integration/mozilla-inbound/rev/9157b71a7df1

Oh, and just to make it clear: the pause on the spreadsheet link from comment 10 goes away with this patch.
Assignee: general → bhackett1024
Flags: in-testsuite?
Target Milestone: --- → mozilla18
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/c9bfb6646f04 for pretty much every single debug test suite hitting "Assertion failure: !isAnyObject() && !isUnknown(), at ../../../js/src/jsinfer.h:162"
Target Milestone: mozilla18 → ---
The relevant stack bit:


 0  libxul.so!js::analyze::ScriptAnalysis::addTypeBarrier [jsinfer.h : 162 + 0x18]
    eip = 0x027300c8   esp = 0xbfd2ba50   ebp = 0xbfd2bab8   ebx = 0x030dd640
    esi = 0x00000020   edi = 0x00000007   eax = 0x00000000   ecx = 0x00c5a32c
    edx = 0x00000000   efl = 0x00210246
    Found by: given as instruction pointer in context
 1  libxul.so!TypeConstraintProp<(PropertyAccessKind)1u>::newType [jsinfer.cpp : 990 + 0x1b]
    eip = 0x0273e449   esp = 0xbfd2bac0   ebp = 0xbfd2bb18   ebx = 0x030dd640
    esi = 0x0a38e1ae   edi = 0x096f55f0
    Found by: call frame info
 2  libxul.so!js::types::TypeCompartment::resolvePending [jsinferinlines.h : 1071 + 0x1e]
    eip = 0x026a8adc   esp = 0xbfd2bb20   ebp = 0xbfd2bb78   ebx = 0x030dd640
    esi = 0x0af8a6cc   edi = 0x0ab65228
    Found by: call frame info
 3  libxul.so!js::types::TypeSet::addType [jsinferinlines.h : 1390 + 0x13]
    eip = 0x026a91a1   esp = 0xbfd2bb80   ebp = 0xbfd2bc08   ebx = 0x030dd640
    esi = 0x096f55f0   edi = 0x00000000
    Found by: call frame info
 4  libxul.so!js::types::TypeMonitorResult [jsinfer.cpp : 5396 + 0x12]
    eip = 0x02728c20   esp = 0xbfd2bc10   ebp = 0xbfd2bc78   ebx = 0x030dd640
    esi = 0x0ab60094   edi = 0x096f55f0
    Found by: call frame info

And another take on frame 0 from a different log:

 0  XUL!js::analyze::ScriptAnalysis::addTypeBarrier [jsinferinlines.h : 1414 + 0x0]

Line 1414 of jsinferinlines is the JS_ASSERT(!unknownObject()) at the top of TypeSet::getObjectCount().
A slightly better stack from a debugger:

#0  js::types::Type::isObject (this=0x7fff5fbf8a98) at jsinfer.h:162
#1  0x0000000101132bcc in js::analyze::ScriptAnalysis::addTypeBarrier (this=0x148a172a0, cx=0x11965c0e0, pc=0x11803adc8 "5", target=0x147ca70a0, type={data = 32}) at jsinfer.cpp:2706
#2  0x000000010114119d in MarkPropertyAccessUnknown (cx=0x11965c0e0, script={<js::HandleBase<JSScript *>> = {<No data fields>}, ptr = 0x7fff5fbf8c30}, pc=0x11803adc8 "5", target=0x147ca70a0) at jsinfer.cpp:990
#3  0x0000000101158bee in TypeConstraintProp<(PropertyAccessKind)1>::newType (this=0x149656f20, cx=0x11965c0e0, source=0x149651838, type={data = 7}) at jsinfer.cpp:1161
#4  0x000000010108c90d in js::types::TypeCompartment::resolvePending (this=0x1183074a8, cx=0x11965c0e0) at jsinferinlines.h:1071
#5  0x000000010108c0b0 in js::types::TypeSet::addType (this=0x1483abd00, cx=0x11965c0e0, type={data = 5490666561}) at jsinferinlines.h:1390
#6  0x000000010113c5f7 in js::types::TypeMonitorResult (cx=0x11965c0e0, script={<js::HandleBase<JSScript *>> = {<No data fields>}, ptr = 0x7fff5fbfb510}, pc=0x148349b5f "�", rval=@0x7fff5fbfaa20) at jsinfer.cpp:5396
#7  0x000000010118333b in js::types::TypeScript::Monitor (cx=0x11965c0e0, script={<js::HandleBase<JSScript *>> = {<No data fields>}, ptr = 0x7fff5fbfb510}, pc=0x148349b5f "�", rval=@0x7fff5fbfaa20) at jsinferinlines.h:853

This is the

2706                if (barrier->type.isAnyObject() && type.isObject())

line that the patch added.  In this case, type.isUnknown() == true.
Oh, and the point is that isObject() starts with:

        JS_ASSERT(!isAnyObject() && !isUnknown());
Comment on attachment 669020 [details] [diff] [review]
Patch on top of the other one to fix the assertion failures, I hope

Review of attachment 669020 [details] [diff] [review]:
-----------------------------------------------------------------

Nice. I fixed some small style nits (if-statement with multiline conditional always gets { } with opening brace on new line), merged this with the previous patch and pushed to inbound:

https://hg.mozilla.org/integration/mozilla-inbound/rev/f375fcca196a
Attachment #669020 - Flags: review?(jdemooij) → review+
(In reply to Boris Zbarsky (:bz) from comment #33)
> Jan, we should take this for 17 too, right?  Would you mind doing the
> approval request?  I'm not really qualified to do a risk analysis here....

Sure, I will request approval a few days after it lands on m-c (also avoids any confusion with the merge today).
https://hg.mozilla.org/mozilla-central/rev/f375fcca196a
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
This is the combined patch I pushed to m-c.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Type Inference (bisecting pointed to bug 773731 but I'm pretty sure it could be triggered before that)
User impact if declined: Browser hangs on certain websites like Google Docs
Testing completed (on m-c, etc.): On m-c for a few days
Risk to taking this patch (and alternatives if risky): Low
String or UUID changes made by this patch: None
Attachment #669883 - Flags: review+
Attachment #669883 - Flags: approval-mozilla-beta?
Attachment #669883 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Verified fixed for FF 17 and FF 18b7 using Ubuntu x86:

Mozilla/5.0 (X11; Linux i686; rv:17.0) Gecko/17.0 Firefox/17.0(20121119183901)
Mozilla/5.0 (X11; Linux i686; rv:18.0) Gecko/20100101 Firefox/18.0(20121231071231)
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: