Closed Bug 596460 Opened 14 years ago Closed 14 years ago

Tinderbox/nightly builds for TM repo have a 9k stack frame for Invoke

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta7+

People

(Reporter: dmandelin, Assigned: dvander)

References

Details

(Whiteboard: [ETA: 9/24][fixed-in-tracemonkey])

Attachments

(1 file, 1 obsolete file)

We're getting a lot of bug reports for sites (e.g., gmail) that don't work because of an over-recursion error. This is happening in the check in js_Interpret, which is checking the cx->stackLimit. 

I found that in the tinderbox/nightly builds, js::Invoke has a 9k stack frame. 9206 bytes. I believe our standard limit is 128k, so we can blow that out without too much trouble with 9k stack frames in the mix.

This does not happen in my local builds. For example, in an opt build with jemalloc made with VS8.0, the stack frame is 40 bytes.

I am guessing that the PGO build somehow is inlining a ton of stuff. It calls js::Interpret through RunScript. js::Interpret has a 1396-byte stack frame in opt build, so it's clear we've got some big stuff in there.

What can we do? I don't know much about PGO builds--are there flags that can prevent certain functions from being inlined, or certain limits from being exceeded?
I /think/ the auto_inline pragma could work here.
9k is a lot per whack, but deeply nesting js::Interpret is (or should be) rare.

Whatever we do to reduce the PGO'ed stack size (not at the cost of too much perf but let's hope we run in JITted code enough that we don't care), could we automatically calibrate cx->stackLimit somehow? A one-size-fits-all setting may be the wrong thing in our modern PGO world. Something like the minimum of a large sanity limit (larger than the current Gecko setting of 512K) and N frames (where N is chosen based on a skinny-frame configuration) might be better.

/be
> 9k is a lot per whack, but deeply nesting js::Interpret is (or should be) rare.

The issue is deep nesting of js::Invoke, no?  At least that's what the summary says, though comment 0 is all about js::Interpret....
(In reply to comment #3)
> > 9k is a lot per whack, but deeply nesting js::Interpret is (or should be) rare.
> 
> The issue is deep nesting of js::Invoke, no?  At least that's what the summary
> says, though comment 0 is all about js::Interpret....

You generally get deep js::Interpret nesting via js::Invoke. Nesting deeply via eval or script injection is rare in my experience.

/be
js::Invoke isn't just calaed by eval or script injection.  it's also called by js_fun_apply and js_fun_call.  Deep nesting (certainly dozens of levels deep) of apply/call chains is not just not "rare", but is _very_ common in the various JS toolkits and in other similar JS code.  It would surprise me not at all if it happens all the time on gmail.
I believe we've had very poor experiences trying to coerce Microsoft's PGO optimizer to do what we want in the past. You can certainly try sprinkling some #pragma optimizes around, but don't expect it to listen to you very closely.
Where are you getting this 9k figure from? The Breakpad symbols from yesterday's nightly (20100914, http://symbols.mozilla.org/firefox/xul.pdb/8C0B2241E59242E0AC345B047AE5D7D22/xul.sym) have the following entry:
FUNC 19d230 64c c js::Invoke(JSContext *,js::CallArgs const &,unsigned int)

which lists a function size of 0x64c == 1612 bytes. Glancing at the line data (the following lines), you can see patterns where the optimizer has "outlined" blocks of code, relegating them to a different page because they're cold (they're omitted from the line data here). You can also see where it's inlined things by looking at the codesize per line, for example, the largest single line blame I see is:
19d426 f2 614 255513
which is:
http://hg.mozilla.org/mozilla-central/file/5588c9796f0b/js/src/jsinterp.cpp#l614

so yeah, RunScript is being inlined there, but it doesn't look like Interpret is. (I haven't actually looked at the disassembly.)
This was from the mozilla-central nightly, FWIW. The TM nightly doesn't look significantly different.
The stack size is a dynamic issue more than a static one: how much those functions allocate stack space as they run, rather than how much code is in the function definition.
(In reply to comment #7)
> Where are you getting this 9k figure from? 

The disassembly for Invoke in the debugger. I set a breakpoint at the top of the function, then go to the disassembly view:

JS_REQUIRES_STACK bool
Invoke(JSContext *cx, const CallArgs &argsRef, uint32 flags)
{
64F09F50  push        ebp  
64F09F51  mov         ebp,esp 
64F09F53  and         esp,0FFFFFFF8h 
64F09F56  mov         eax,23F4h 
64F09F5B  call        _chkstk (650CDDD0h) 

As you may already know (which I found out by single-stepping through the asm), the _chkstk function decrements the stack by eax (with some bounds checking). 0x23f4 == 9204.

When I looked at the disassembly, I also saw that RunScript is clearly inlined, but not necessarily Interpret. So, it's not entirely clear to me why the stack frame is so big or exactly what is inlined.
The _chkstk call itself is a performance issue, we should really try to avoid frames that are larger than a page.

Are there automatic objects which are particularly large? I see the following locals in Invoke and RunScript:

CallArgs
InvokeFrameGuard
JSInterpreterHook
AutoInterpPreparer
mjit::CompileStatus
(In reply to comment #5)
> js::Invoke isn't just calaed by eval or script injection.

I was talking about js::Interpret nesting ("You generally get deep js::Interpret nesting via js::Invoke. Nesting [js::Interpret was the referent of "Nesting" here] deeply via eval or script injection is rare in my experience.")
> it's also called by
> js_fun_apply and js_fun_call.  Deep nesting (certainly dozens of levels deep)
> of apply/call chains is not just not "rare", but is _very_ common in the
> various JS toolkits and in other similar JS code.  It would surprise me not at
> all if it happens all the time on gmail.

I see what the confusion you reported on IRC was about now, but I still think my words were unambiguously about js::Interpret nesting coming from js::Invoke. So we agreed the whole time :-|.

/be
Re: comment 11:

(gdb) p sizeof(CallArgs)
$1 = 16
(gdb) p sizeof(InvokeFrameGuard)
$2 = 40
(gdb) p sizeof(JSInterpreterHook)
$3 = 8
(gdb) p sizeof(AutoInterpPreparer)
$4 = 16
(gdb) p sizeof(mjit::CompileStatus)
$5 = 4

64-bit Mac.

/be
Can we try adding |__declspec(noinline)| [1] to Interpret and/or RunScript to see what happens? A random comment on the web [2] says PGO respects that declaration.

[1] http://msdn.microsoft.com/en-us/library/kxybs02x%28VS.80%29.aspx

[2] http://visual-c.itags.org/visual-c-c++/314722/ (search for noinline)
Yes. You can use NS_NEVER_INLINE, and I'd recommend trying it on Interpret first.
I mean JS_NEVER_INLINE, of course.
Attachment #476012 - Flags: review?(dvander)
Attachment #476012 - Flags: review?(dvander) → review+
Patch landed to TM as:

    http://hg.mozilla.org/tracemonkey/rev/52a659928d22

Now we need to wait to get the tinderbox build and see if the stack frame is smaller and/or the dependent bugs are fixed.
dvander, can you see if this worked?
Just updated my JS preview build, and it doesn't look like it worked from the disassembly under windbg:

0:000:x86> x xul!js::Invoke
648cd4b0 xul!js::Invoke (struct JSContext *, struct js::CallArgs *, unsigned long)
0:000:x86> u 648cd4b0
xul!js::Invoke [e:\builds\moz2_slave\tracemonkey-win32-nightly\build\js\src\jsinterp.cpp @ 495]:
648cd4b0 55              push    ebp
648cd4b1 8bec            mov     ebp,esp
648cd4b3 83e4f8          and     esp,0FFFFFFF8h
648cd4b6 b814240000      mov     eax,2414h
648cd4bb e810693200      call    xul!_chkstk (64bf3dd0)
That was from 

BuildID=20100918043804
SourceRepository=http://hg.mozilla.org/tracemonkey
SourceStamp=0a3530e79cb7

so it should have the inline override.
It did out-of-line the call to Interpret, though, and Invoke is only 695 instructions.

xul!js::Invoke (648cd4b0) [e:\builds\moz2_slave\tracemonkey-win32-nightly\build\js\src\jsinterp.cpp @ 495], 695 instructions
...
  xul!js::Invoke+0x5c5 (648cda75) [e:\builds\moz2_slave\tracemonkey-win32-nightly\build\js\src\jsinterp.cpp @ 592]:
    call to xul!js::Interpret (648ce990) [e:\builds\moz2_slave\tracemonkey-win32-nightly\build\js\src\jsinterp.cpp @ 1920]
TBQH, I don't see anything in the assembly that would get anywhere near 9K, so maybe I'm misunderstanding how chkstk is called.  There's an xor there for the security cookie checking, etc.

The biggest offset from esp that I ever see mentioned in the disassembly is 4Ch, or 76.  Maybe it's getting confused by the trampoline call, but the cleanup from that looks fine.
And in this build js::Interpret *doesn't* have a chkstk call and only allocates 534h = 1332 bytes of stack.  I think I need to look more closely at the security-cookie stuff.
cookie stuff was a red herring.  a closer reading of the Invoke assembly indicates that we're seeing all sorts of stuff partially inlined (though on separate pages in places): there are bits of XPConnect and IPC and suchlike scattered around.

sayrer and I are trying to isolate when that started now. our first suspicion was the merger of mozjs into xul.dll, but  before and after that js::Invoke used a paltry 20 bytes of stack:

xul!js::Invoke [e:\builds\moz2_slave\tracemonkey-win32-nightly\build\js\src\jsinterp.cpp @ 636]:
  636 60128de0 55              push    ebp
  636 60128de1 8bec            mov     ebp,esp
  636 60128de3 83e4f8          and     esp,0FFFFFFF8h
  639 60128de6 8b550c          mov     edx,dword ptr [ebp+0Ch]
  639 60128de9 83ec14          sub     esp,14h

The hunt continues!
cdb and I have determined that the regression range is

http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=e2e1ea2a39ce&tochange=86d684e6ea09

maybe we're looking at the slow-native removal?
Blocks: 596653
blocking2.0: --- → beta7+
Assignee: general → sayrer
Whiteboard: [ETA: 9/24]
It seems from some reading about PGO that the results of

pgomgr /clear the.pgd
pgomgr /summary /detail the.pgc the.pgd

could be illuminating here.

(Ref: http://msdn.microsoft.com/en-us/library/2kw46d8w%28v=VS.80%29.aspx)
The problem is that js::mjit::Compiler is 9KB, and PGO was inlining the call to TryCompile(). I'm going to attach a fix shortly that makes Compiler a little more reasonably sized, and also prevents TryCompile() from being inlined.

Anyway, it looks good now:

xul!js::Invoke [e:\builds\bug598663\tracemonkey\js\src\jsinterp.cpp @ 495]:
101a0a70 55              push    ebp
101a0a71 8bec            mov     ebp,esp
101a0a73 83e4f8          and     esp,0FFFFFFF8h
101a0a76 8b450c          mov     eax,dword ptr [ebp+0Ch]
101a0a79 8b10            mov     edx,dword ptr [eax]
101a0a7b 83ec44          sub     esp,44h
Attached patch fixSplinter Review
It's proving difficult to get the compiler below ~7K so I'm not going to bother.

The assembler itself is about 1KB, and there's two of them. Aside from that, there is an InternalCallSite for every opcode (gotta ask adrake about this), and the IC info structs are large.

The compiler does not re-enter unless something insane happens (like a resolver on a builtin prototype calling into script). In the event that it does become a problem, we can put the compiler on the heap.
Assignee: sayrer → dvander
Attachment #476012 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #478127 - Flags: review?(sstangl)
Attachment #478127 - Flags: review?(sstangl) → review+
http://hg.mozilla.org/tracemonkey/rev/49c198c2da9a
Whiteboard: [ETA: 9/24] → [ETA: 9/24][fixed-in-tracemonkey]
http://hg.mozilla.org/mozilla-central/rev/49c198c2da9a
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
No longer blocks: 597886
Depends on: 756864
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: