Closed
Bug 484693
Opened 15 years ago
Closed 15 years ago
TM: Crash [@ _chkstk]
Categories
(Core :: JavaScript Engine, defect, P1)
Tracking
()
VERIFIED
FIXED
mozilla1.9.2a1
People
(Reporter: gkw, Assigned: graydon)
References
Details
(Keywords: crash, testcase, verified1.9.1, Whiteboard: fixed-in-tracemonkey)
Crash Data
Attachments
(7 files, 4 obsolete files)
253 bytes,
text/html
|
Details | |
1.06 KB,
patch
|
graydon
:
review+
|
Details | Diff | Splinter Review |
997 bytes,
text/plain
|
Details | |
14.44 KB,
patch
|
Details | Diff | Splinter Review | |
2.95 KB,
patch
|
gal
:
review+
|
Details | Diff | Splinter Review |
2.24 KB,
text/plain
|
Details | |
240.05 KB,
image/png
|
Details |
Here's the first jsfunfuzz bug I found from Windows XP, and I'm sure there's a lot more too. It took me a long time to reduce this, but it's something jsfunfuzz on Win keeps hitting. for (var x = 0; x < 100000; ++x) { for (y = 0; y < 3; ++y) {} var a = eval; delete eval; eval = a; var b = toString; delete toString; toString = b; } crashes TM both opt and debug WinXP builds with -j. Being unable to yet use Windbg to get a stack, I resorted to putting the script into a HTML file, and a latest-tracemonkey WinXP build does indeed crash with the signature in http://crash-stats.mozilla.com/report/index/545904af-938a-46c2-840c-35d262090322?p=1 0 js3250.dll _chkstk chkstk.asm:99 1 js3250.dll js_MonitorLoopEdge(JSContext*,unsigned int&) js/src/jstracer.cpp:4255 2 js3250.dll js_Interpret js/src/jsinterp.cpp:3129 I'm turning this security-sensitive just in case an exploit is possible.
Flags: blocking1.9.1?
Comment 1•15 years ago
|
||
2 bugs here. 1) we generate a crazy amount of traces (for every outer loop iterations). Need to find out why we are not blacklisting. 2) after a while something bad happens in the heap. We should fix 2) first, and then eliminate 1).
Comment 2•15 years ago
|
||
CCing nick. I think this is a job for valgrind.
Comment 3•15 years ago
|
||
This works fine on macosx debug (way too many traces, but no crash).
Reporter | ||
Updated•15 years ago
|
Attachment #368801 -
Attachment description: testcase (Warning: Crashes latest-tracemonkey Minefield) → testcase (Warning: Crashes Firefox 3.1 Beta 3 and latest-tracemonkey Minefield on WinXP)
Reporter | ||
Comment 4•15 years ago
|
||
I installed gdb on cygwin to try and get a stack trace, but I don't know why I had no symbols even though I had compiled debug and didn't change the paths. Nonetheless, the stack shows: 0 - js!JS_CompareValues 1 - js_GetSrcNoteOffset 2 - js_Invoke 3 - js_Invoke 4 - js!JS_ExecuteScript I'm not sure if this is entirely accurate though.
Updated•15 years ago
|
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P2
Reporter | ||
Comment 5•15 years ago
|
||
Comment #4 is wrong. Here's the call stack from VS 2008 Express: ===== js3250.dll!_chkstk() Line 99 Asm > js3250.dll!js_MonitorLoopEdge(JSContext * cx=0x052bd9b0, unsigned int & inlineCallCount=0) Line 4255 + 0x15 bytes C++ js3250.dll!js_Interpret(JSContext * cx=0x052bd9b0) Line 3740 + 0x56d bytes C++ js3250.dll!js_Execute(JSContext * cx=0x052bd9b0, JSObject * chain=0x06791b40, JSScript * script=0x078127c8, JSStackFrame * down=0x00000000, unsigned int flags=0, int * result=0x00000000) Line 1595 + 0x9 bytes C++ js3250.dll!JS_EvaluateUCScriptForPrincipals(JSContext * cx=0x052bd9b0, JSObject * obj=0x06791b40, JSPrincipals * principals=0x077ef8cc, const unsigned short * chars=0x07810fe8, unsigned int length=165, const char * filename=0x077ec000, unsigned int lineno=7, int * rval=0x00000000) Line 5112 + 0x19 bytes C++ gklayout.dll!nsJSContext::EvaluateString(const nsAString_internal & aScript={...}, void * aScopeObject=0x06791b40, nsIPrincipal * aPrincipal=0x077ef8c8, const char * aURL=0x077ec000, unsigned int aLineNo=7, unsigned int aVersion=0, nsAString_internal * aRetValue=0x00000000, int * aIsUndefined=0x0012edc4) Line 1603 + 0x42 bytes C++ gklayout.dll!nsScriptLoader::EvaluateScript(nsScriptLoadRequest * aRequest=0x07810f70, const nsString & aScript={...}) Line 671 + 0x7a bytes C++ gklayout.dll!nsScriptLoader::ProcessRequest(nsScriptLoadRequest * aRequest=0x07810f70) Line 585 + 0x13 bytes C++ gklayout.dll!nsScriptLoader::ProcessScriptElement(nsIScriptElement * aElement=0x07810c54) Line 539 + 0x17 bytes C++ gklayout.dll!nsScriptElement::MaybeProcessScript() Line 193 + 0x13 bytes C++ gklayout.dll!nsHTMLScriptElement::MaybeProcessScript() Line 547 + 0xb bytes C++ gklayout.dll!nsHTMLScriptElement::DoneAddingChildren(int aHaveNotified=1) Line 485 C++ gklayout.dll!HTMLContentSink::ProcessSCRIPTEndTag(nsGenericHTMLElement * content=0x07810c30, int aMalformed=0) Line 3134 + 0x12 bytes C++ gklayout.dll!SinkContext::CloseContainer(nsHTMLTag aTag=eHTMLTag_script, int aMalformed=0) Line 1023 + 0x12 bytes C++ gklayout.dll!HTMLContentSink::CloseContainer(nsHTMLTag aTag=eHTMLTag_script) Line 2389 + 0x11 bytes C++ gkparser.dll!CNavDTD::CloseContainer(nsHTMLTag aTag=eHTMLTag_script, int aMalformed=0) Line 2798 + 0x26 bytes C++ gkparser.dll!CNavDTD::HandleEndToken(CToken * aToken=0x077ef2a8) Line 1677 + 0x13 bytes C++ gkparser.dll!CNavDTD::HandleToken(CToken * aToken=0x077ef2a8, nsIParser * aParser=0x077efe90) Line 761 + 0xc bytes C++ gkparser.dll!CNavDTD::BuildModel(nsIParser * aParser=0x077efe90, nsITokenizer * aTokenizer=0x077ff1b8, nsITokenObserver * anObserver=0x00000000, nsIContentSink * aSink=0x077f0014) Line 333 + 0x16 bytes C++ gkparser.dll!nsParser::BuildModel() Line 2384 + 0x24 bytes C++ gkparser.dll!nsParser::ResumeParse(int allowIteration=1, int aIsFinalChunk=0, int aCanInterrupt=1) Line 2257 + 0xe bytes C++ gkparser.dll!nsParser::OnDataAvailable(nsIRequest * request=0x077ec0d8, nsISupports * aContext=0x00000000, nsIInputStream * pIStream=0x077ec8d0, unsigned int sourceOffset=0, unsigned int aLength=253) Line 2910 + 0x17 bytes C++ docshell.dll!nsDocumentOpenInfo::OnDataAvailable(nsIRequest * request=0x077ec0d8, nsISupports * aCtxt=0x00000000, nsIInputStream * inStr=0x077ec8d0, unsigned int sourceOffset=0, unsigned int count=253) Line 306 + 0x30 bytes C++ necko.dll!nsBaseChannel::OnDataAvailable(nsIRequest * request=0x077ec568, nsISupports * ctxt=0x00000000, nsIInputStream * stream=0x077ec8d0, unsigned int offset=0, unsigned int count=253) Line 708 + 0x52 bytes C++ necko.dll!nsInputStreamPump::OnStateTransfer() Line 508 + 0x40 bytes C++ necko.dll!nsInputStreamPump::OnInputStreamReady(nsIAsyncInputStream * stream=0x077ec8d0) Line 398 + 0xb bytes C++ xpcom_core.dll!nsInputStreamReadyEvent::Run() Line 112 C++ xpcom_core.dll!nsThread::ProcessNextEvent(int mayWait=1, int * result=0x0012f848) Line 511 C++ xpcom_core.dll!NS_ProcessNextEvent_P(nsIThread * thread=0x00d7e530, int mayWait=1) Line 230 + 0x16 bytes C++ gkwidget.dll!nsBaseAppShell::Run() Line 170 + 0xc bytes C++ tkitcmps.dll!nsAppStartup::Run() Line 192 + 0x1c bytes C++ xul.dll!XRE_main(int argc=1, char * * argv=0x00d7b650, const nsXREAppData * aAppData=0x00d7bbe8) Line 3220 + 0x25 bytes C++ firefox.exe!NS_internal_main(int argc=1, char * * argv=0x00d7b650) Line 156 + 0x12 bytes C++ firefox.exe!wmain(int argc=1, wchar_t * * argv=0x00d79560) Line 107 + 0xd bytes C++ firefox.exe!__tmainCRTStartup() Line 583 + 0x19 bytes C firefox.exe!wmainCRTStartup() Line 403 C kernel32.dll!7c817067() [Frames below may be incorrect and/or missing, no symbols loaded for kernel32.dll] nspr4.dll!_PR_NT_InitSids() Line 119 + 0x4 bytes C 0d020206() ===== Here's !exploitable output along with the stack. 0:000> !load winext\msec.dll 0:000> !exploitable -v HostMachine\HostUser Executing Processor Architecture is x86 Debuggee is in User Mode Debuggee is a live user mode debugging session on the local machine Event Type: Exception *** WARNING: Unable to verify checksum for C:\objdir-tm\dist\bin\components\docshell.dll *** WARNING: Unable to verify checksum for C:\objdir-tm\dist\bin\components\necko.dll *** WARNING: Unable to verify checksum for C:\objdir-tm\dist\bin\xpcom_core.dll *** WARNING: Unable to verify checksum for C:\objdir-tm\dist\bin\components\gkwidget.dll *** WARNING: Unable to verify checksum for C:\objdir-tm\dist\bin\components\tkitcmps.dll *** WARNING: Unable to verify checksum for C:\objdir-tm\dist\bin\xul.dll Exception Faulting Address: 0x6ff0d7 First Chance Exception Type: STATUS_STACK_OVERFLOW (0xC00000FD) Faulting Instruction:006ff0d7 test dword ptr [eax],eax Basic Block: 006ff0d7 test dword ptr [eax],eax Tainted Input Operands: eax 006ff0d9 jmp js3250!_chkstk+0x14 (006ff0c4) Exception Hash (Major/Minor): 0x7005322b.0x1f0e780d Stack Trace: js3250!_chkstk+0x27 js3250!js_MonitorLoopEdge+0x2d7 js3250!js_Interpret+0x5cb9 js3250!js_Execute+0x2dc js3250!JS_EvaluateUCScriptForPrincipals+0xe5 gklayout!nsJSContext::EvaluateString+0x2ba gklayout!nsScriptLoader::EvaluateScript+0x37a gklayout!nsScriptLoader::ProcessRequest+0xfd gklayout!nsScriptLoader::ProcessScriptElement+0x1031 gklayout!nsScriptElement::MaybeProcessScript+0x149 gklayout!nsHTMLScriptElement::MaybeProcessScript+0x24 gklayout!nsHTMLScriptElement::DoneAddingChildren+0x1f gklayout!HTMLContentSink::ProcessSCRIPTEndTag+0xcf gklayout!SinkContext::CloseContainer+0x31d gklayout!HTMLContentSink::CloseContainer+0xa0 gkparser!CNavDTD::CloseContainer+0x18a gkparser!CNavDTD::HandleEndToken+0x1dd gkparser!CNavDTD::HandleToken+0x4ae gkparser!CNavDTD::BuildModel+0x295 gkparser!nsParser::BuildModel+0xe2 gkparser!nsParser::ResumeParse+0x1bc gkparser!nsParser::OnDataAvailable+0x228 docshell!nsDocumentOpenInfo::OnDataAvailable+0x4c necko!nsBaseChannel::OnDataAvailable+0x6d necko!nsInputStreamPump::OnStateTransfer+0x23d necko!nsInputStreamPump::OnInputStreamReady+0x80 xpcom_core!nsInputStreamReadyEvent::Run+0x4a xpcom_core!nsThread::ProcessNextEvent+0x1fa xpcom_core!NS_ProcessNextEvent_P+0x53 gkwidget!nsBaseAppShell::Run+0x5d tkitcmps!nsAppStartup::Run+0x6b xul!XRE_main+0x3388 firefox!NS_internal_main+0x2b2 firefox!wmain+0x119 firefox!__tmainCRTStartup+0x1a8 firefox!wmainCRTStartup+0xf kernel32!BaseProcessStart+0x23 Instruction Address: 0x6ff0d7 Description: Stack Overflow Short Description: StackOverflow Exploitability Classification: UNKNOWN Recommended Bug Title: Stack Overflow starting at js3250!_chkstk+0x27 (Hash=0x7005322b.0x1f0e780d)
Reporter | ||
Updated•15 years ago
|
Attachment #368861 -
Attachment is obsolete: true
Updated•15 years ago
|
Priority: P2 → P1
Updated•15 years ago
|
Assignee: general → gal
Comment 6•15 years ago
|
||
Reproduced under windows. This might be an alloca overflow.
Comment 7•15 years ago
|
||
Stack overflows should not be exploitable -- anyone know otherwise? /be
Comment 8•15 years ago
|
||
I would like to know whats up here. Then we can lift the security classification.
Comment 9•15 years ago
|
||
Breakpoint 2, js_ExecuteTree (cx=0x30bc90, f=0x30e650, inlineCallCount=@0xbffff210, innermostNestedGuardp=0xbfffedb4) at ../jstracer.cpp:3910 3910 TreeInfo* ti = (TreeInfo*)f->vmprivate; 1: globalObj.dslots[-1] = 260 (gdb) c Continuing. Breakpoint 2, js_ExecuteTree (cx=0x30bc90, f=0x30e1c0, inlineCallCount=@0xbffff210, innermostNestedGuardp=0xbfffedb4) at ../jstracer.cpp:3910 3910 TreeInfo* ti = (TreeInfo*)f->vmprivate; 1: globalObj.dslots[-1] = 260 (gdb) c Continuing. Breakpoint 2, js_ExecuteTree (cx=0x30bc90, f=0x30de60, inlineCallCount=@0xbffff210, innermostNestedGuardp=0xbfffedb4) at ../jstracer.cpp:3910 3910 TreeInfo* ti = (TreeInfo*)f->vmprivate; 1: globalObj.dslots[-1] = 516 (gdb) c Continuing. The global object keeps growing new slots, eventually overflowing alloca. This is 2 bugs in one. The global object shouldn't grow like this. And we shouldn't die. I fix the latter. The former sounds like a graydon bug, since he was poking around in that part of the attic.
Comment 10•15 years ago
|
||
Don't allow the global object to grow so many slots that alloca fails.
Attachment #370338 -
Flags: review?(graydon)
Assignee | ||
Comment 11•15 years ago
|
||
Comment on attachment 370338 [details] [diff] [review] patch Not comfortable with this. We should not be using alloca() period. I know it's fast, but it's also a way to crash relatively easily. This is C++, we can do scoped smart pointers with destructors, there's no need to worry about how hard it is to ensure all exit paths from a function call free(). Also don't know why the global object is growing endlessly, nor why we've got an explosion of traces. Possibly global shape changes. Anyway, I'll post a patch shortly that adds a small smart-pointer class, and axes our alloca() calls.
Attachment #370338 -
Flags: review?(graydon) → review-
Comment 12•15 years ago
|
||
(In reply to comment #11) > We should not be using alloca() period. I second this. alloca never checks for stack overflow. So either we prefix it with custom checks for that (like the rest of code does when checking for too much native stack recursion) or eliminate its usage.
Comment 13•15 years ago
|
||
Isn't alloca for fixed size allocations kosher? We can crash on an overdeep stack without alloca, and stack overflow should not be exploitable or there's an OS bug to get fixed. /be
Comment 14•15 years ago
|
||
I am not a big fan of alloca either. It has bitten us before. There is no way we can tolerate allocation cost within ExecuteTree though. Its already way too fat. We can re-allocate and grow a per-thread area for this if we want to. But not in the hot path.
Comment 15•15 years ago
|
||
(a patch for #14 is somewhere in bugzilla)
Comment 16•15 years ago
|
||
(In reply to comment #13) > and stack overflow should not be exploitable This is simply not true with alloca. An OS may try to leave unreadable memory between stacks from different stacks, but if alloca is script-controllable, an evil script can try to over-allocate so alloca spans over the protected pages and reaches the stack for another thread.
Assignee | ||
Comment 17•15 years ago
|
||
(In reply to comment #13) > Isn't alloca for fixed size allocations kosher? We can crash on an overdeep > stack without alloca, and stack overflow should not be exploitable or there's > an OS bug to get fixed. Depends what you mean by kosher. Stack overflows, in general, are ways of crashing C++ programs. There's nothing to do but "be as conservative as you can", keep the stack shallow. Alloca's no different than stack-local buffers or what have you. They all push you closer to "crash". (In reference to Igor's point, I *believe* chkstk.asm goes page-by-page, making sure to probe all the pages it's advancing over. So if you're on a toolchain that's safety-conscious enough to do so, you're possibly safe. It's still a way of crashing.)
Comment 18•15 years ago
|
||
Yes, of course we should not crash on overdeep stacks! SpiderMonkey has lots of JS_CHECK_STACK_SIZE calls to avoid this. But my point was that a stack overflow, like a null pointer deref, on a *sane* OS, is not exploitable. So why am I looking at a light-orange bug? /be
Comment 19•15 years ago
|
||
#17 just gave us a great reason to stop using alloca here. It has to touch all the pages in between. A pre-malloced buffer doesn't have to do that. Perf win. Security win. I think we are beating a dead horse here. Lets do some patching.
Assignee | ||
Comment 20•15 years ago
|
||
This removes all the alloca() calls aside from the one gal's concerned about on the hot path. I'll post a followup that includes the hot path persistent-buffer approach from bug 475998 shortly. Meanwhile, any thoughts? Is this ok?
Attachment #370501 -
Flags: review?(gal)
Comment 21•15 years ago
|
||
The name is really ugly (jsTmpArray). Otherwise looks good. Waiting for the final piece for r+.
Comment 22•15 years ago
|
||
Comment on attachment 370501 [details] [diff] [review] First cut >+class jsTmpArray JS typenames are all JSFooType spelling, but with C++ we can (should, for internal types) lose the poor-man's JS namespace. Old Unix hackers like me love tmp or /tmp, but Tmp not so much, so TempArray seems best. We have TempValueRooter (with JS prefix but no matter). /be
Comment 23•15 years ago
|
||
(In reply to comment #22) > (From update of attachment 370501 [details] [diff] [review]) > >+class jsTmpArray > > JS typenames are all JSFooType spelling, The readme (now MDC docs?) talks about the obvious jsscalartype alternative style. /be
Updated•15 years ago
|
Assignee: gal → graydon
Assignee | ||
Comment 24•15 years ago
|
||
As predicted, it's ... challenging to compete with alloca on speed! But I think I've got it as close to noise as matters, and I think this is an important adjustment: the ability to be alloca-crashed given overlarge user input is not good. This variant switches to using a JS arena pool for the temp arrays, and incorporates gal's previous work on making the global and interp-state area persist between executions. There was some initial real cost due to misfeatures in the way we're using arena pools; it's easy to make them thrash on malloc, losing all their advantage. I've attempted to fix that a bit in this patch, and it seems that I can make up most-to-all of the speed by such corrections: the patch is faster on linux and slower on mac, but either way within less-than-1% difference. About 0.8% on mac. (In investigating this, I've also noticed that the "reserve then release" idiom we are using to make stack reconstruction "infallible" on exit is ... probably wrong? I don't see any reason why the stack arena pool will *necessarily* succeed in re-supplying you with a chunk of memory you've already released. But I guess that's an issue for a later patch.)
Attachment #370501 -
Attachment is obsolete: true
Attachment #370562 -
Flags: review?(gal)
Attachment #370501 -
Flags: review?(gal)
Comment 25•15 years ago
|
||
The first argument is not entirely correct. The previous 2-liner already stopped any issues related to overlarge user input. This patch is about avoiding alloca in general. I am not sure a 10-15ms perf loss is really the way to go. This patch still has ceilings for the maximum number of slots supported. If we already go with fixed ceilings, we can as well just use fixed size stack allocation, which is safe and unbeatable in performance. Also, why do the maximums change? 1024->512 and 64->32?
Assignee | ||
Comment 26•15 years ago
|
||
(In reply to comment #25) > The first argument is not entirely correct. The previous 2-liner already > stopped any issues related to overlarge user input. This patch is about > avoiding alloca in general. Yes, because alloca in general is a risky call to make. It's open-coded as a single unchecked sp decrement on many toolchains. Do you believe there are no ways to get any of the other allocas to hit large values? > I am not sure a 10-15ms perf loss is really the way to go. This patch still has > ceilings for the maximum number of slots supported. If we already go with fixed > ceilings, we can as well just use fixed size stack allocation, which is safe > and unbeatable in performance. Also, why do the maximums change? 1024->512 and > 64->32? You're mixing issues, though I understand your hesitation to not want to regress performance-wise. Hence why I've been poking around trying to make up the difference elsewhere. I'm really only focused on removing alloca here; it's a bad thing to get in the habit of using. This has nothing much to do with whether there's a fixed cap on the number we pass to a particular alloca. There are lots of allocas floating around jstracer. One was even in a spec-illegal position (inside a function argument list). Currently, for example, all the perf loss is focused in leavetree, in its use of the FOREACH macro, which allocas an array I'm turning into a TempArray. Nobody's even proposing to make this allocation carry a fixed-cap. I don't think it'd even be correct anymore if it did. The maximums should change because the current maximums add up to a MAX_INTERP_STACK_BYTES that's over 11,000 bytes. This is a performance problem: it means every time you currently JS_ARENA_ALLOCATE in js_ExecuteTree you're overflowing the stackPool's arena size (8192) and hitting the malloc slow-path. This is a major cost center in the profiles I'm looking at. I only found it while squeezing cost out of the TempArray's uses of arena pools as well. This is the meaning of the comment above wrt arena pools: "it's easy to make them thrash on malloc".
Comment 27•15 years ago
|
||
My point is if we have ceilings, why use pool allocation? Why are we not using fixed[maxsize] stack allocation in ExecuteTree?
Assignee | ||
Comment 28•15 years ago
|
||
(In reply to comment #27) > My point is if we have ceilings, why use pool allocation? Why are we not using > fixed[maxsize] stack allocation in ExecuteTree? Because all such things chew up stack. Just like alloca. We keep talking past one another.
Comment 29•15 years ago
|
||
I don't think this bug should be security-sensitive. Rob, can you make a final ruling? Thanks. If we do not nest js_ExecuteTree, why not just use fixed-size stack buffers? No risk of perf loss, and no risk of stack overflow beyond any unrelated unchecked recursion leading to overdeep stack. If we need to JS_CHECK_STACK_SIZE somewhere let's do that. I just do not see why we'd malloc (and retain) space that could be LIFO allocated more quickly off the stack. /be
Comment 30•15 years ago
|
||
We do nest js_ExecuteTree in case of LeaveTree from inside an interpreter recursion. Its rare but possible.
Comment 31•15 years ago
|
||
I mean to write, if we do not nest js_ExecuteTree, then it should not use heap instead of stack just in case the stack is overdeep -- the bug in that case, or in any case, is lack of JS_CHECK_STACK_SIZE somewhere. If js_ExecuteTree does nest (tree-call, duh) then the same argument applies but the missing JS_CHECK_STACK_SIZE would seem to belong in js_ExecuteTree or its caller. /be
Comment 32•15 years ago
|
||
Loops don't nest so deeply that we should move small-ish stack arrays to the heap. Again, the problem with unchecked stack growth leading to a fatal (and on any OS we should support unexploitable) crash is a lack of JS_CHECK_STACK_SIZE, *not* an abuse of stack allocation over heap allocation. If the arrays were truly huge, then sure. They aren't. /be
Comment 33•15 years ago
|
||
Simple nested loop don't cause recursion here. They all use the same stack areas. Only deep-bailing and then triggering another tree while js_ExecuteTree is still on the stack causes it.
Comment 34•15 years ago
|
||
Oh right -- so anyway, the deal with stack overflow checking in SpiderMonkey is that it's incumbent on recursive functions to JS_CHECK_STACK_SIZE. So pick the "lowest" recursive function implicated here and use that macro, and we don't need to heap-allocated anything. /be
Assignee | ||
Comment 35•15 years ago
|
||
(In reply to comment #34) > Oh right -- so anyway, the deal with stack overflow checking in SpiderMonkey is > that it's incumbent on recursive functions to JS_CHECK_STACK_SIZE. So pick the > "lowest" recursive function implicated here and use that macro, and we don't > need to heap-allocated anything. No, the deal with stack overflow checking in spidermonkey is to rewrite things so they don't chew up stack in a way that lets users DoS you (or worse: you've conveniently ignored the notion that it *can* be made to land in other thread's stacks, or the heap). You've argued exactly this point to me in the past, recall the acrobatics the parser does to limit its stack. The arrays are multiple pages per nesting. This makes them smell like fine attack targets to me. The whole point of the bug is that someone *did* manage to hit input that already overshoots the stack segment. The arguments so far about it being rare or unlikely or easily contained are pure hand-waving. CHECK_STACK_SIZE'ing all over the code might work; but it might be required in places we don't have safe non-stack-chewing bail-out paths for either. Nobody's tried it yet; and either way it's more fragile than just staying shallow(er) by design. Anyway, I'm tired of fighting this and there are other pressing blockers. If you want to overrule and commit something bandaid-y that leaves alloca or fixed stack-buffers in place, ask another reviewer. I'm not going to r+ it, because I think it's a wrong approach. I've said as much and proposed a number of alternatives. We're just wasting time rehashing this disagreement. (You should still pick up the change to lower the maximums; that's a pure performance win in your existing use of arena pools. You are *already hitting* malloc each time through executetree.)
Assignee | ||
Updated•15 years ago
|
Assignee: graydon → gal
Comment 36•15 years ago
|
||
(In reply to comment #35) > No, the deal with stack overflow checking in spidermonkey is to rewrite things > so they don't chew up stack in a way that lets users DoS you (or worse: you've > conveniently ignored the notion that it *can* be made to land in other thread's > stacks, or the heap). I have done no such thing. The OS grows the stack without exploitable crash or we do not support it. Do you have evidence that any OS we support does otherwise for stack allocations (*not* for alloca)? > You've argued exactly this point to me in the past, > recall the acrobatics the parser does to limit its stack. What do you mean? The parser is highly recursive and can overflow common OS stack ulimits, which is why we added JS_CHECK_STACK_SIZE opportunistically to its recursive functions. > The arrays are multiple pages per nesting. This makes them smell like fine > attack targets to me. Again, what OS will fail to redzone and crash safely? The way to avoid that is JS_CHECK_STACK_SIZE. > The whole point of the bug is that someone *did* manage > to hit input that already overshoots the stack segment. The arguments so far > about it being rare or unlikely or easily contained are pure hand-waving. No, you're not reading what I wrote. The bug is lack of JS_CHECK_STACK_SIZE, not abuse of heap instead of stack for LIFO store. /be
Comment 37•15 years ago
|
||
(In reply to comment #36) > (In reply to comment #35) > Again, what OS will fail to redzone and crash safely? > > The way to avoid that is JS_CHECK_STACK_SIZE. I meant to write *unsafely* -- the question was rhetorical. The next statement was about the way to avoid safe but to-be-avoided stack overflow crashes. Less heat, more light. /be
Comment 38•15 years ago
|
||
I will not apply any patch for this issue until there is consensus.
Comment 39•15 years ago
|
||
Ok, I lied. This is actually not recursive. In case of deep bail we move all the data from the native areas to the interpreter stack/global object, so we can just paint over it. So one per-thread (per trace monitor) heap allocated area is fine.
Comment 40•15 years ago
|
||
Ok then! Graydon, does the separate stack depth checking argument make sense? You are right that if a recursive function has a big enough stack buffer, we may not check stack depth enough to avoid crashing. We set the stack limit to 512K, which is small enough that any slop before the next check is not enough to crash. This would be adjustable if js_ExecuteTree were recursive. But it's not so sorry for the digression. To make it up all I can say is the patch looks (and looked) good, apart from any stack vs. heap confusion. Sorry for that, /be
Assignee | ||
Comment 41•15 years ago
|
||
With respect to:
> Again, what OS will fail to redzone and crash safely?
For future reference, here's what it looks like when I run the attached code:
$ ./stk
write var is at 0xb7d373b4
read var is at 0xb75363c4
read i=0
writing to buf[13]
read i=255
IOW, both normal C++ dynamic-sized stack allocation and, if you switch the ifdef, the use of alloca generate code that (on linux-gcc) happily walks off your thread's stack and into your neighbour's, and doesn't crash. No matter if you're "inside" the legal range of the supposed buffer. No gradual probe-every-page redzone triggering or anything. Hence quite exploitable.
Naturally, also lets you scribble on the heap.
As far as I know OSX does this too, though with slightly different spatial arrangements. The loader randomizes only a little; it's only got so many megs of possible-places to stick the thread stacks. Some day we'll have 64-bit processes, for real :)
Comment 42•15 years ago
|
||
Get a real OS! :-P (typed from a Mac -- you sure it fails too? NeXT infusion notwithstanding? I bet $1 (US rubles) that Windows does better. Our existing stack depth checks and 512K limit seem enough to save us given default settings, but a small enough ulimit could cause a crash too, at least on toy OSes. Bleah. /be
Assignee | ||
Comment 43•15 years ago
|
||
(In reply to comment #40) > This would be adjustable if js_ExecuteTree were recursive. But it's not so > sorry for the digression. To make it up all I can say is the patch looks (and > looked) good, apart from any stack vs. heap confusion. Sorry for that, Keep in mind that the 2nd version of the patch has no mallocs, only arena pools, and it'll only hit slow paths if someone's messing with us. So it's pretty "LIFO" as-is. Anyway, I'm sorry I snitted. Inappropriate of me. It sounds like the whole thing can be done in a set of fixed per-thread buffers. I'll try to write up a variant that does that. (The thing that sort of boggles me is how it's slower on OSX; the arena-pool pointer bump should be very nearly identical cost to an esp bump -- I JS_ALWAYS_INLINE it in the version I'm testing -- and I'm carving out a bunch of very expensive mallocs the existing code does. It *should* be faster.)
Assignee: gal → graydon
Assignee | ||
Comment 44•15 years ago
|
||
Bleh. I think I must have got confused about which flavour I was testing. The affect I was seeing earlier was failure-to-inline in a few cases. Removing those and not getting my testing environment messed up, then we're hitting fast-paths on the arena pool in all cases, and it's inlined so it's really just doing a compare-and-bump on arena pointers. That *might* have still cost fractionally more than alloca (which doesn't even do the compare) but as I said, the patch is paying its way performance-wise by removing the existing mallocs: the *combined* patch runs sunspider faster than unpatched now, on mac as well as linux. (I'll grant, more fixed-size buffers might be fractionally faster too, but then you're stuck needing to insert/rearrange a bunch of code to bail out when we overflow limits on those fixed sizes. This is comparatively uninvasive.) Sufficient penance? I can chuck it at a tryserver to be sure.
Attachment #370562 -
Attachment is obsolete: true
Attachment #370762 -
Flags: review?(gal)
Attachment #370562 -
Flags: review?(gal)
Updated•15 years ago
|
Attachment #370762 -
Flags: review?(gal) → review+
Comment 45•15 years ago
|
||
Comment on attachment 370762 [details] [diff] [review] faster variant The smaller sizes may hurt recursion, increasing them will hit the pool slow path. Maybe an assert to guard that we don't make them so slow that we hit the bad path. And I think I would prefer if we do the non-pool approach eventually. But we can certainly go with this for now.
Assignee | ||
Comment 46•15 years ago
|
||
Confirmed on windows VM, this patch fixes the bug also.
Assignee | ||
Comment 47•15 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/b512be855093
Whiteboard: fixed-in-tracemonkey
Comment 48•15 years ago
|
||
Backed out due to massive failures in jsfunfuzz http://hg.mozilla.org/tracemonkey/rev/a55a5e3c4276
Whiteboard: fixed-in-tracemonkey
Comment 49•15 years ago
|
||
see 486812 for test case
Comment 51•15 years ago
|
||
(In reply to comment #47) > http://hg.mozilla.org/tracemonkey/rev/b512be855093 caused js1_8/regress/regress-477234.js Assertion failure: *(uint64*)&global[STOBJ_NSLOTS(JS_GetGlobalForObject(cx, cx->fp->scopeChain))] == 0xdeadbeefdeadbeefLL, at ../jstracer.cpp:4241;
Comment 52•15 years ago
|
||
Thanks bob. That actually explains the crashes we were seeing quite well. Maybe we should land my minimal patch until this is sorted out.
Comment 53•15 years ago
|
||
also, this changed js1_2/regexp/alphanumeric.js and js1_2/regexp/digit.js opt/shell from a crash to TypeError: (void 0) is undefined.
Assignee | ||
Comment 54•15 years ago
|
||
Sounds like the chunk of code copied from bug 475998. Maybe I misunderstood or mis-copied it.
Assignee | ||
Comment 55•15 years ago
|
||
Previous variant left odd (stale) debugging machinery in place. Caused clobbers to un-owned memory in debug builds, quite random and quite platform-specific. Working on win32 presently. This variant is (I think) free from such flaws; it fixes as many of the cases as I've been able to reproduce here, anyways. Will post changes-to-patch as separate diff momentarily to clarify what changed. Not sure why bc saw other changes; would appreciate a double-check on this patch from bc and jruderman as it is behaving spooky, and I'm not able to reproduce all their specific failures. Also not sure why this variant causes change in trace stats for one test. (Also not sure why global object is growing quickly in the first place. But that's a separate bug.)
Attachment #370762 -
Attachment is obsolete: true
Attachment #371368 -
Flags: review?(gal)
Assignee | ||
Comment 56•15 years ago
|
||
This is just a sub-patch showing the differences between the last 2 variants I uploaded, for review (minus one hunk that simply drifted context, but was otherwise identical).
Comment 57•15 years ago
|
||
(In reply to comment #24) > (In investigating this, I've also noticed that the "reserve then release" > idiom we are using to make stack reconstruction "infallible" on exit is > ... probably wrong? I don't see any reason why the stack arena pool will > *necessarily* succeed in re-supplying you with a chunk of memory you've > already released. But I guess that's an issue for a later patch.) When I introduced that, I read the arena code to make sure it does, even if you free a big chunk then allocate in several smaller chunks. mrbkap agreed at the time that it was safe. If this is no longer true, or I misread, it's definitely a bug.
Comment 58•15 years ago
|
||
The arena code has not changed. We need to agree on how it works and make that requirement stick, with assertions more than docs, but both of course. /be
Assignee | ||
Comment 59•15 years ago
|
||
It looks to me like JS_ArenaRelease calls FreeArenaList on the tail of the pool (after the mark), and that subsequently calls free() on all the tail arenas. Perhaps I'm misreading.
Assignee | ||
Comment 60•15 years ago
|
||
FYI I've set up the fuzzer here and while the former patch certainly provokes a crash within moments, the new one appears to run to completion (whatever its iteration count is).
Updated•15 years ago
|
Attachment #371369 -
Flags: review+
Assignee | ||
Comment 61•15 years ago
|
||
Landed again in http://hg.mozilla.org/tracemonkey/rev/143e997c858e, bounced off a mac tinderbox and was backed out again in http://hg.mozilla.org/tracemonkey/rev/6eca563de031. Also the ARM tinderboxen were (naturally!) disagreeing over the drift in trace-stats. Which as of yesterday, appears on my workstation to be triggerable by leaking 8 bytes of random malloc, unrelated to anything else. Sigh. I'm beginning to despair on this bug. I think I'll land gal's initial patch for now. It fixes the original case minimally, it's true, and I cannot off the top of my head construct an attack that actually abuses the current state of affairs to the point of exploitation. I'm just worried longer-term about stack growth. The remainder of this bug can then be taken off blocking status, have its critical/P1 priority dropped to P2/enhancement or such, and left assigned to me for mopping up after this release cycle.
Assignee | ||
Updated•15 years ago
|
Attachment #370338 -
Flags: review- → review+
Comment 62•15 years ago
|
||
Probably should clone this bug into a followup bug. /be
Assignee | ||
Comment 63•15 years ago
|
||
As you like. Minimal version landed as http://hg.mozilla.org/tracemonkey/rev/6b6158fa172d. Residual work transferred to new bug 487324.
Assignee | ||
Comment 64•15 years ago
|
||
I believe that stuck. Can close when merged to trunk. Sorry for all the fuss.
Whiteboard: fixed-in-tracemonkey
Comment 65•15 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/6b6158fa172d
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Comment 66•15 years ago
|
||
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/8bbaafeb92f4
Keywords: fixed1.9.1
Comment 67•15 years ago
|
||
oops, wrong changeset (the one above was later backed out)
Keywords: fixed1.9.1
Comment 68•15 years ago
|
||
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/b7cabe99bb2c
Keywords: fixed1.9.1
Comment 69•15 years ago
|
||
can't repro on mac. i'll try win later.
Updated•15 years ago
|
Flags: in-testsuite+
Comment 70•15 years ago
|
||
I did a testrun with the latest nightly builds on trunk and 1.9.1 on WinXP and cannot see the crash anymore. But while running the testcase I got the unresponsive script warning, clicked on stop and quit Firefox. Now a process hangs in memory for about 2mins and consumes 100% cpu load. Don't we really stop on processing the script? Shall I file a new bug on that issue?
Target Milestone: --- → mozilla1.9.2a1
Comment 71•15 years ago
|
||
Henrik: can you use shark if you are on a Mac (you'll need XCode and the CHUD tools) to see where the 100% CPU load is being spent? Or similar for other OSes, just need to find the best profiler and attach it to the process. /be
Reporter | ||
Comment 72•15 years ago
|
||
I followed whimboo's STR and got the same "hang for awhile on shutdown" - did a Shark analysis off a Shark latest-tracemonkey build, and here's what I got. It seems to be spending lots of time in js_SearchScope.
Comment 73•15 years ago
|
||
graydon, your favorite bug is back, in a different flavor. See #72
Comment 74•15 years ago
|
||
(In reply to comment #72) > Created an attachment (id=374175) [details] > Shark profile screenshot > > I followed whimboo's STR and got the same "hang for awhile on shutdown" - did a > Shark analysis off a Shark latest-tracemonkey build, and here's what I got. > > It seems to be spending lots of time in js_SearchScope. David or Graydon, is this an already known bug or shall I file a new one?
Assignee | ||
Comment 75•15 years ago
|
||
(In reply to comment #74) > David or Graydon, is this an already known bug or shall I file a new one? Hmm, I'm already hunting a couple in this space this afternoon but feel free to file a fresh one and assign to me. I'll mark as dupe if it happens to be so.
Comment 76•15 years ago
|
||
Henrik, Gary: this looks like bug 489636 (see bug 488967 comment 12). Could file new and see about dup'ing, or just optimize for that (likely, IMHO) outcome by commenting there. But no need to comment more in this bug about 100% CPU in GC. /be
Comment 77•15 years ago
|
||
(In reply to comment #72) > Created an attachment (id=374175) [details] > Shark profile screenshot > > I followed whimboo's STR and got the same "hang for awhile on shutdown" - did a > Shark analysis off a Shark latest-tracemonkey build, and here's what I got. > > It seems to be spending lots of time in js_SearchScope. Shark lies. Need full symbols, should see static functions in the same file whose address "rounds down" to js_SearchScope, the nearest extern function. /be
Comment 78•15 years ago
|
||
Verified fixed with testcase given in comment 0 on trunk and 1.9.1 with the following debug builds: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090422 Minefield/3.6a1pre ID:20090422224452 Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1b4pre) Gecko/20090422 Shiretoko/3.5b4pre ID:20090422122043 (In reply to comment #76) > Henrik, Gary: this looks like bug 489636 (see bug 488967 comment 12). Could Thanks Brendan. Let's watch this bug for now.
Status: RESOLVED → VERIFIED
Keywords: fixed1.9.1 → verified1.9.1
Comment 79•15 years ago
|
||
js1_8_1/trace/trace-test.js http://hg.mozilla.org/tracemonkey/rev/61892f57b46a
Assignee | ||
Updated•15 years ago
|
Attachment #371368 -
Flags: review?(gal)
Updated•15 years ago
|
Flags: wanted1.9.0.x-
Comment 80•15 years ago
|
||
cvsroot/mozilla/js/tests/js1_8_1/trace/trace-test.js,v <-- trace-test.js new revision: 1.14; previous revision: 1.13 /cvsroot/mozilla/js/tests/shell.js,v <-- shell.js
Comment 81•14 years ago
|
||
when this bug is opened, the test should be checked in.
Flags: in-testsuite+ → in-testsuite?
Updated•13 years ago
|
Crash Signature: [@ _chkstk]
Comment 82•11 years ago
|
||
Filter on qa-project-auto-change: Bug in removed tracer code, setting in-testsuite- flag.
Group: core-security
Flags: in-testsuite? → in-testsuite-
You need to log in
before you can comment on or make changes to this bug.
Description
•