Closed
Bug 596460
Opened 14 years ago
Closed 14 years ago
Tinderbox/nightly builds for TM repo have a 9k stack frame for Invoke
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
blocking2.0 | --- | beta7+ |
People
(Reporter: dmandelin, Assigned: dvander)
References
Details
(Whiteboard: [ETA: 9/24][fixed-in-tracemonkey])
Attachments
(1 file, 1 obsolete file)
842 bytes,
patch
|
sstangl
:
review+
|
Details | Diff | Splinter Review |
We're getting a lot of bug reports for sites (e.g., gmail) that don't work because of an over-recursion error. This is happening in the check in js_Interpret, which is checking the cx->stackLimit. I found that in the tinderbox/nightly builds, js::Invoke has a 9k stack frame. 9206 bytes. I believe our standard limit is 128k, so we can blow that out without too much trouble with 9k stack frames in the mix. This does not happen in my local builds. For example, in an opt build with jemalloc made with VS8.0, the stack frame is 40 bytes. I am guessing that the PGO build somehow is inlining a ton of stuff. It calls js::Interpret through RunScript. js::Interpret has a 1396-byte stack frame in opt build, so it's clear we've got some big stuff in there. What can we do? I don't know much about PGO builds--are there flags that can prevent certain functions from being inlined, or certain limits from being exceeded?
Comment 1•14 years ago
|
||
I /think/ the auto_inline pragma could work here.
Comment 2•14 years ago
|
||
9k is a lot per whack, but deeply nesting js::Interpret is (or should be) rare. Whatever we do to reduce the PGO'ed stack size (not at the cost of too much perf but let's hope we run in JITted code enough that we don't care), could we automatically calibrate cx->stackLimit somehow? A one-size-fits-all setting may be the wrong thing in our modern PGO world. Something like the minimum of a large sanity limit (larger than the current Gecko setting of 512K) and N frames (where N is chosen based on a skinny-frame configuration) might be better. /be
Comment 3•14 years ago
|
||
> 9k is a lot per whack, but deeply nesting js::Interpret is (or should be) rare. The issue is deep nesting of js::Invoke, no? At least that's what the summary says, though comment 0 is all about js::Interpret....
Comment 4•14 years ago
|
||
(In reply to comment #3) > > 9k is a lot per whack, but deeply nesting js::Interpret is (or should be) rare. > > The issue is deep nesting of js::Invoke, no? At least that's what the summary > says, though comment 0 is all about js::Interpret.... You generally get deep js::Interpret nesting via js::Invoke. Nesting deeply via eval or script injection is rare in my experience. /be
Comment 5•14 years ago
|
||
js::Invoke isn't just calaed by eval or script injection. it's also called by js_fun_apply and js_fun_call. Deep nesting (certainly dozens of levels deep) of apply/call chains is not just not "rare", but is _very_ common in the various JS toolkits and in other similar JS code. It would surprise me not at all if it happens all the time on gmail.
Comment 6•14 years ago
|
||
I believe we've had very poor experiences trying to coerce Microsoft's PGO optimizer to do what we want in the past. You can certainly try sprinkling some #pragma optimizes around, but don't expect it to listen to you very closely.
Comment 7•14 years ago
|
||
Where are you getting this 9k figure from? The Breakpad symbols from yesterday's nightly (20100914, http://symbols.mozilla.org/firefox/xul.pdb/8C0B2241E59242E0AC345B047AE5D7D22/xul.sym) have the following entry: FUNC 19d230 64c c js::Invoke(JSContext *,js::CallArgs const &,unsigned int) which lists a function size of 0x64c == 1612 bytes. Glancing at the line data (the following lines), you can see patterns where the optimizer has "outlined" blocks of code, relegating them to a different page because they're cold (they're omitted from the line data here). You can also see where it's inlined things by looking at the codesize per line, for example, the largest single line blame I see is: 19d426 f2 614 255513 which is: http://hg.mozilla.org/mozilla-central/file/5588c9796f0b/js/src/jsinterp.cpp#l614 so yeah, RunScript is being inlined there, but it doesn't look like Interpret is. (I haven't actually looked at the disassembly.)
Comment 8•14 years ago
|
||
This was from the mozilla-central nightly, FWIW. The TM nightly doesn't look significantly different.
The stack size is a dynamic issue more than a static one: how much those functions allocate stack space as they run, rather than how much code is in the function definition.
Reporter | ||
Comment 10•14 years ago
|
||
(In reply to comment #7) > Where are you getting this 9k figure from? The disassembly for Invoke in the debugger. I set a breakpoint at the top of the function, then go to the disassembly view: JS_REQUIRES_STACK bool Invoke(JSContext *cx, const CallArgs &argsRef, uint32 flags) { 64F09F50 push ebp 64F09F51 mov ebp,esp 64F09F53 and esp,0FFFFFFF8h 64F09F56 mov eax,23F4h 64F09F5B call _chkstk (650CDDD0h) As you may already know (which I found out by single-stepping through the asm), the _chkstk function decrements the stack by eax (with some bounds checking). 0x23f4 == 9204. When I looked at the disassembly, I also saw that RunScript is clearly inlined, but not necessarily Interpret. So, it's not entirely clear to me why the stack frame is so big or exactly what is inlined.
Comment 11•14 years ago
|
||
The _chkstk call itself is a performance issue, we should really try to avoid frames that are larger than a page. Are there automatic objects which are particularly large? I see the following locals in Invoke and RunScript: CallArgs InvokeFrameGuard JSInterpreterHook AutoInterpPreparer mjit::CompileStatus
Comment 12•14 years ago
|
||
(In reply to comment #5) > js::Invoke isn't just calaed by eval or script injection. I was talking about js::Interpret nesting ("You generally get deep js::Interpret nesting via js::Invoke. Nesting [js::Interpret was the referent of "Nesting" here] deeply via eval or script injection is rare in my experience.") > it's also called by > js_fun_apply and js_fun_call. Deep nesting (certainly dozens of levels deep) > of apply/call chains is not just not "rare", but is _very_ common in the > various JS toolkits and in other similar JS code. It would surprise me not at > all if it happens all the time on gmail. I see what the confusion you reported on IRC was about now, but I still think my words were unambiguously about js::Interpret nesting coming from js::Invoke. So we agreed the whole time :-|. /be
Comment 13•14 years ago
|
||
Re: comment 11: (gdb) p sizeof(CallArgs) $1 = 16 (gdb) p sizeof(InvokeFrameGuard) $2 = 40 (gdb) p sizeof(JSInterpreterHook) $3 = 8 (gdb) p sizeof(AutoInterpPreparer) $4 = 16 (gdb) p sizeof(mjit::CompileStatus) $5 = 4 64-bit Mac. /be
Reporter | ||
Comment 14•14 years ago
|
||
Can we try adding |__declspec(noinline)| [1] to Interpret and/or RunScript to see what happens? A random comment on the web [2] says PGO respects that declaration. [1] http://msdn.microsoft.com/en-us/library/kxybs02x%28VS.80%29.aspx [2] http://visual-c.itags.org/visual-c-c++/314722/ (search for noinline)
Comment 15•14 years ago
|
||
Yes. You can use NS_NEVER_INLINE, and I'd recommend trying it on Interpret first.
Comment 16•14 years ago
|
||
I mean JS_NEVER_INLINE, of course.
Reporter | ||
Comment 17•14 years ago
|
||
Attachment #476012 -
Flags: review?(dvander)
Assignee | ||
Updated•14 years ago
|
Attachment #476012 -
Flags: review?(dvander) → review+
Reporter | ||
Comment 18•14 years ago
|
||
Patch landed to TM as: http://hg.mozilla.org/tracemonkey/rev/52a659928d22 Now we need to wait to get the tinderbox build and see if the stack frame is smaller and/or the dependent bugs are fixed.
Comment 19•14 years ago
|
||
dvander, can you see if this worked?
Just updated my JS preview build, and it doesn't look like it worked from the disassembly under windbg: 0:000:x86> x xul!js::Invoke 648cd4b0 xul!js::Invoke (struct JSContext *, struct js::CallArgs *, unsigned long) 0:000:x86> u 648cd4b0 xul!js::Invoke [e:\builds\moz2_slave\tracemonkey-win32-nightly\build\js\src\jsinterp.cpp @ 495]: 648cd4b0 55 push ebp 648cd4b1 8bec mov ebp,esp 648cd4b3 83e4f8 and esp,0FFFFFFF8h 648cd4b6 b814240000 mov eax,2414h 648cd4bb e810693200 call xul!_chkstk (64bf3dd0)
That was from BuildID=20100918043804 SourceRepository=http://hg.mozilla.org/tracemonkey SourceStamp=0a3530e79cb7 so it should have the inline override.
It did out-of-line the call to Interpret, though, and Invoke is only 695 instructions. xul!js::Invoke (648cd4b0) [e:\builds\moz2_slave\tracemonkey-win32-nightly\build\js\src\jsinterp.cpp @ 495], 695 instructions ... xul!js::Invoke+0x5c5 (648cda75) [e:\builds\moz2_slave\tracemonkey-win32-nightly\build\js\src\jsinterp.cpp @ 592]: call to xul!js::Interpret (648ce990) [e:\builds\moz2_slave\tracemonkey-win32-nightly\build\js\src\jsinterp.cpp @ 1920]
TBQH, I don't see anything in the assembly that would get anywhere near 9K, so maybe I'm misunderstanding how chkstk is called. There's an xor there for the security cookie checking, etc. The biggest offset from esp that I ever see mentioned in the disassembly is 4Ch, or 76. Maybe it's getting confused by the trampoline call, but the cleanup from that looks fine.
And in this build js::Interpret *doesn't* have a chkstk call and only allocates 534h = 1332 bytes of stack. I think I need to look more closely at the security-cookie stuff.
Comment 25•14 years ago
|
||
FYI: http://msdn.microsoft.com/en-us/library/8dbf701c.aspx http://msdn.microsoft.com/library/aa290051.aspx
cookie stuff was a red herring. a closer reading of the Invoke assembly indicates that we're seeing all sorts of stuff partially inlined (though on separate pages in places): there are bits of XPConnect and IPC and suchlike scattered around. sayrer and I are trying to isolate when that started now. our first suspicion was the merger of mozjs into xul.dll, but before and after that js::Invoke used a paltry 20 bytes of stack: xul!js::Invoke [e:\builds\moz2_slave\tracemonkey-win32-nightly\build\js\src\jsinterp.cpp @ 636]: 636 60128de0 55 push ebp 636 60128de1 8bec mov ebp,esp 636 60128de3 83e4f8 and esp,0FFFFFFF8h 639 60128de6 8b550c mov edx,dword ptr [ebp+0Ch] 639 60128de9 83ec14 sub esp,14h The hunt continues!
cdb and I have determined that the regression range is http://hg.mozilla.org/tracemonkey/pushloghtml?fromchange=e2e1ea2a39ce&tochange=86d684e6ea09 maybe we're looking at the slow-native removal?
Updated•14 years ago
|
blocking2.0: --- → beta7+
Updated•14 years ago
|
Assignee: general → sayrer
Updated•14 years ago
|
Whiteboard: [ETA: 9/24]
It seems from some reading about PGO that the results of pgomgr /clear the.pgd pgomgr /summary /detail the.pgc the.pgd could be illuminating here. (Ref: http://msdn.microsoft.com/en-us/library/2kw46d8w%28v=VS.80%29.aspx)
Assignee | ||
Comment 29•14 years ago
|
||
The problem is that js::mjit::Compiler is 9KB, and PGO was inlining the call to TryCompile(). I'm going to attach a fix shortly that makes Compiler a little more reasonably sized, and also prevents TryCompile() from being inlined. Anyway, it looks good now: xul!js::Invoke [e:\builds\bug598663\tracemonkey\js\src\jsinterp.cpp @ 495]: 101a0a70 55 push ebp 101a0a71 8bec mov ebp,esp 101a0a73 83e4f8 and esp,0FFFFFFF8h 101a0a76 8b450c mov eax,dword ptr [ebp+0Ch] 101a0a79 8b10 mov edx,dword ptr [eax] 101a0a7b 83ec44 sub esp,44h
Assignee | ||
Comment 30•14 years ago
|
||
It's proving difficult to get the compiler below ~7K so I'm not going to bother. The assembler itself is about 1KB, and there's two of them. Aside from that, there is an InternalCallSite for every opcode (gotta ask adrake about this), and the IC info structs are large. The compiler does not re-enter unless something insane happens (like a resolver on a builtin prototype calling into script). In the event that it does become a problem, we can put the compiler on the heap.
Assignee: sayrer → dvander
Attachment #476012 -
Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #478127 -
Flags: review?(sstangl)
Updated•14 years ago
|
Attachment #478127 -
Flags: review?(sstangl) → review+
Assignee | ||
Comment 31•14 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/49c198c2da9a
Whiteboard: [ETA: 9/24] → [ETA: 9/24][fixed-in-tracemonkey]
Comment 32•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/49c198c2da9a
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•