Closed
Bug 432729
Opened 17 years ago
Closed 16 years ago
heap corruption (?) causing mochitest failures @js_Interpret
Categories
(Core :: JavaScript Engine, defect)
RESOLVED
INCOMPLETE
People
(Reporter: rcampbell, Unassigned)
Failing mochitests on qm-xserve01 appear to be caused by corrupted heap memory. The error started appearing around 4/29/2008 @ 18:30.
Stack follows:
#0 js_Interpret (cx=0x2d6e62a0) at /builds/slave/trunk_osx/mozilla/js/src/jsinterp.c:5974
#1 0x00240941 in js_Execute (cx=0x2d6e62a0, chain=0x2a1808a0, script=0x2a19d000, down=0x0, flags=0, result=0xbfffe0fc) at /builds/slave/trunk_osx/mozilla/js/src/jsinterp.c:1535
#2 0x0020b603 in JS_EvaluateUCScriptForPrincipals (cx=0x2d6e62a0, obj=0x2a1808a0, principals=0x31128754, chars=0x2ad53008, length=150640, filename=0x2f29afc8 "http://localhost:8888/MochiKit/packed.js", lineno=1, rval=0xbfffe0fc) at /builds/slave/trunk_osx/mozilla/js/src/jsapi.c:4999
#3 0x01487e27 in nsJSContext::EvaluateString (this=0x2d6e6210, aScript=@0x2f29afa0, aScopeObject=0x2a1808a0, aPrincipal=0x31128750, aURL=0x2f29afc8 "http://localhost:8888/MochiKit/packed.js", aLineNo=1, aVersion=0, aRetValue=0x0, aIsUndefined=0xbfffe1f0) at /builds/slave/trunk_osx/mozilla/dom/src/base/nsJSEnvironment.cpp:1530
#4 0x01373942 in nsScriptLoader::EvaluateScript (this=0x338251c0, aRequest=0x2f29af90, aScript=@0x2f29afa0) at /builds/slave/trunk_osx/mozilla/content/base/src/nsScriptLoader.cpp:582
#5 0x01373d80 in nsScriptLoader::ProcessRequest (this=0x338251c0, aRequest=0x2f29af90) at /builds/slave/trunk_osx/mozilla/content/base/src/nsScriptLoader.cpp:496
#6 0x01373e52 in nsScriptLoader::ProcessPendingRequests (this=0x338251c0) at /builds/slave/trunk_osx/mozilla/content/base/src/nsScriptLoader.cpp:629
#7 0x01373f77 in nsScriptLoader::OnStreamComplete (this=0x338251c0, aLoader=0x319f2d80, aContext=0x2f29af90, aStatus=0, aStringLen=150640, aString=0x2a7eb008 "/***\n\n MochiKit.MochiKit 1.4 : PACKED VERSION\n\n THIS FILE IS AUTOMATICALLY GENERATED. If creating patches, please\n diff against the source tree, not this file.\n\n See <http://mochikit.com/"...) at /builds/slave/trunk_osx/mozilla/content/base/src/nsScriptLoader.cpp:804
#8 0x0108d606 in nsStreamLoader::OnStopRequest (this=0x319f2d80, request=0x319f2a8c, ctxt=0x2f29af90, aStatus=0) at /builds/slave/trunk_osx/mozilla/netwerk/base/src/nsStreamLoader.cpp:108
#9 0x010e9ff5 in nsHttpChannel::OnStopRequest (this=0x319f2a60, request=0x2f6ecce0, ctxt=0x2f29af90, status=0) at /builds/slave/trunk_osx/mozilla/netwerk/protocol/http/src/nsHttpChannel.cpp:4443
#10 0x0107547e in nsInputStreamPump::OnStateStop (this=0x2f6ecce0) at /builds/slave/trunk_osx/mozilla/netwerk/base/src/nsInputStreamPump.cpp:576
#11 0x010758d3 in nsInputStreamPump::OnInputStreamReady (this=0x2f6ecce0, stream=0x2f6ecdc8) at /builds/slave/trunk_osx/mozilla/netwerk/base/src/nsInputStreamPump.cpp:401
#12 0x01a65e37 in nsInputStreamReadyEvent::Run (this=0x2f6ecc80) at /builds/slave/trunk_osx/mozilla/xpcom/io/nsStreamUtils.cpp:111
#13 0x018ad974 in nsThread::ProcessNextEvent (this=0x20d10380, mayWait=0, result=0xbfffe4dc) at /builds/slave/trunk_osx/mozilla/xpcom/threads/nsThread.cpp:510
#14 0x01875621 in NS_ProcessPendingEvents_P (thread=0x20d10380, timeout=20) at nsThreadUtils.cpp:180
#15 0x01833c47 in nsBaseAppShell::NativeEventCallback (this=0x25cb5f10) at /builds/slave/trunk_osx/mozilla/widget/src/xpwidgets/nsBaseAppShell.cpp:121
#16 0x01807303 in nsAppShell::ProcessGeckoEvents (aInfo=0x25cb5f10) at /builds/slave/trunk_osx/mozilla/widget/src/cocoa/nsAppShell.mm:302
#17 0x9082a09a in CFRunLoopRunSpecific ()
#18 0x90829b0e in CFRunLoopRunInMode ()
#19 0x92dd8bef in RunCurrentEventLoopInMode ()
#20 0x92dd8234 in ReceiveNextEventCommon ()
#21 0x92dd8154 in BlockUntilNextEventMatchingListInMode ()
#22 0x9327d465 in _DPSNextEvent ()
#23 0x9327d056 in -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] ()
#24 0x93276ddb in -[NSApplication run] ()
#25 0x01806745 in nsAppShell::Run (this=0x25cb5f10) at /builds/slave/trunk_osx/mozilla/widget/src/cocoa/nsAppShell.mm:591
#26 0x016bc7bb in nsAppStartup::Run (this=0x25cd1b50) at /builds/slave/trunk_osx/mozilla/toolkit/components/startup/src/nsAppStartup.cpp:181
#27 0x01011828 in XRE_main (argc=6, argv=0xbffff92c, aAppData=0x20d0b520) at /builds/slave/trunk_osx/mozilla/toolkit/xre/nsAppRunner.cpp:3170
#28 0x00001cc9 in start ()
Flags: blocking1.9?
Comment 1•17 years ago
This was originally discovered while debugging intermittent failures in bug 432471.
Updated•17 years ago
Summary: #0 js_Interpret (cx=0x2d6e62a0) → js interpreter causing mochitest failures: #0 js_Interpret (cx=0x2d6e62a0)
Updated•17 years ago
Summary: js interpreter causing mochitest failures: #0 js_Interpret (cx=0x2d6e62a0) → heap corruption (?) causing mochitest failures @js_Interpret
Comment 2•17 years ago
Both characterdatagetchar and characterdatagetlength run cleanly (without any error output) under valgrind on linux, when run by themselves.
Comment 3•17 years ago
From 432471 comment #16:
===snip===
Igor and I have looked at this for a while. The crash is happening as the result of a JSVAL_VOID being cast to an object, but there also seems to be heap corruption happening nearby (fp->script's first 12-16 bytes are nonsense). Going to try another stab at reproducing here.
===snip===
Likely, the heap corruption is putting the JS interpreter into an unrecoverably bad state, causing the crash at this point, not the other way around.
Comment 4•17 years ago
We should absolutely try to fix this, but given that we don't understand it, I'm not sure we can block.
Flags: blocking1.9? → blocking1.9-
Comment 5 (Reporter)•17 years ago
I just flipped off the debugging info and core dumps on xserve01. I expect it to resume its previous routine of failing in mochitests in the dom level1 suite. We're planning on bringing up another machine but if that's throwing the same error and we're not getting full coverage, I'm not sure the tree should be open anyway.
Disabling the tests in question is something of a poor band-aid.
Adding bc to see if he's seen anything similar in his suites...
Comment 6•17 years ago
Not on Linux so far. I just ran the dom 1 core iframe html tests (from the original dom suite) under valgrind on CentOS 5 x86_64, and apart from jumps depending on uninitialized values (during startup and registration) I didn't see any valgrind errors or crashes related to the dom tests themselves. It might be in the mochikit parts though. I'll try on Mac next and then the mochikit tests locally.
Comment 7•17 years ago
I can't reproduce locally.
Updated•17 years ago
Group: security
Comment 8 (Reporter)•17 years ago
(In reply to comment #7)
> I can't reproduce locally.
Maybe we can walk through setting up qm-xserve03 today. I've got another xserve I'll be bringing up, but could use some extra firepower to try and reproduce this.
Comment 9•17 years ago
Sure. qm-xserve03 would be great.
After a question from brendan yesterday on a different bug, I realized I haven't been testing jemalloc properly.
As I understand it, jemalloc is enabled by default on Linux now, requires --enable-jemalloc on Windows, and is not available on Mac. crowder asked if I was using the jemalloc valgrind hooks, which left me clueless again. Does --with-valgrind enable the appropriate valgrind hooks in general and for jemalloc in particular?
I redid some of my configurations locally, added --enable-jemalloc --with-valgrind to my Linux configuration, and reran the dom1 core iframe xml tests. I found a number of valgrind errors (invalid reads and writes), but because my irc pings went out so late I haven't yet been able to confirm with anyone whether what I am doing is correct.
I need some help determining the appropriate ac options to use for my testing. My failure to do this properly before now has me worried.
Comment 10•17 years ago
PS: testing jemalloc on debug Windows builds is not possible due to bug 429745.
Comment 11•17 years ago
stuart, jason:
Is what I am doing in comment 9 correct?
How hard would it be to get a fix for bug 429745 so I can test windows debug builds with jemalloc?
Can you tell me how to test the js shell with jemalloc?
Comment 12•17 years ago
(In reply to comment #11)
> Is what I am doing in comment 9 correct?
It sounds like you are correctly enabling valgrind support. (jemalloc is
enabled by default, so the --enable-jemalloc option isn't needed.)
> How hard would it be to get a fix for bug 429745 so I can test windows debug
> builds with jemalloc?
Ted Mielczarek did the Windows CRT build integration, so he may have a better
idea how hard fixing this is.
Comment 13•17 years ago
preed has a patch there, but it's ugly. I don't know that I'm really comfortable taking that on the 1.9 branch at all. You could apply it locally for testing, and we can look at getting it into 1.9.next.
Comment 14•17 years ago
(In reply to comment #13)
> preed has a patch there, but it's ugly. I don't know that I'm really
> comfortable taking that on the 1.9 branch at all. You could apply it locally
> for testing, and we can look at getting it into 1.9.next.
(This should probably go in bug 432729, but...) yah, agreed that it's... a lot of change. It would be nice to see it land somewhere, since AIUI, the way we're doing the custom CRT now is... "unsupported" (specifically, calling it msvcrt in the DLL, so we "magically" pick up the symbols, even though the DLL is actually different).
Anyway, I digress: I was going to say, ted: if you want me to redo the patch so that the logic in memory/jemalloc/Makefile.in is something like:
ifneq (,$(MOZ_DEBUG)$(MOZ_NEW_CRT_MODE))
... my patch ...
else
... old style ...
endif
I'd be happy to do that.
I had no delusions that this would make Fx 3.0. It would be nice to make 3.0.0.1 (or 3.0.1, or whatever), but I'm torn because I agree that it can be risky there, too.
It's lamentable, though, that the 1.9.0 branch will effectively have no useful debug builds for 18-24 months.
Comment 15•17 years ago
Comment 16•17 years ago
(In reply to comment #14)
>
> It's lamentable, though, that the 1.9.0 branch will effectively have no useful
> debug builds for 18-24 months.
>
There is no f'ing way we can live without debug builds for 2 years. :-)
Comment 17•17 years ago
Well, we get debug builds, just not debug jemalloc builds, which sucks.
Comment 18•17 years ago
(In reply to comment #17)
> Well, we get debug builds, just not debug jemalloc builds, which sucks.
Right, that. :-p
Or, put another way, we don't have an (easy?) way of getting debug builds of what people are actually using.
Comment 19 (Reporter)•17 years ago
Added another core dump and build to:
http://people.mozilla.org/~rcampbell/xserve01/*27855*
The good news: we got another crash. The bad news is the stack dump is totally different.
Comment 20•17 years ago
Yeah, not a lot useful to be gained from this stack, alas.
Comment 21•16 years ago
Resolving incomplete, since this hasn't happened in ages and no one can reproduce reliably.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → INCOMPLETE
Updated•16 years ago
Group: core-security