Bug 572149
Opened 15 years ago
Closed 15 years ago
segfault while running mochitest with messageManager.loadFrameScript()
Categories
(Core :: JavaScript Engine, defect, P1)
RESOLVED
FIXED
People
(Reporter: jmaher, Assigned: mrbkap)
References
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(1 file: 4.58 KB patch, gal: review+)
I have spent the last couple of days tracking down a few issues related to my mochitest + e10s patch. Two issues remain: a crash of Firefox (Linux and WinNT) in toolkit/content/tests/widgets/test_tree_hier.xul, and an a11y test (which I can work around, but which we should still fix).
I have narrowed down the test_tree_hier.xul issue, and this is how you can reproduce it:
1) download build + tests: http://ftp.mozilla.org/pub/mozilla.org/firefox/tryserver-builds/jmaher@mozilla.com-f62530962001/tryserver-linux-debug/
2) unpack both to the same directory (like tinderbox does)
3) run this command: python mochitest/runtests.py --appname=firefox/firefox-bin --utility-path=bin --extra-profile-file=bin/plugins --certificate-path=certs --close-when-done --console-level=INFO --test-path=toolkit/content/tests/widgets --autorun
4) observe the failure near the end of the test run. As a note, I usually see this assertion "Assertion failure: StackBase(cx->fp) + stackDepth <= cx->regs->sp, at /builds/slave/tryserver-linux-debug/build/js/src/jsinterp.cpp:1221", at least when I get an exit code 6. Other times I get an exit code 11 with no assertion message.
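The two exit codes in step 4 match the numbers of two POSIX signals on Linux, which would be consistent with an assertion calling raise() in one case and an invalid memory access in the other. A minimal sketch, assuming the harness reports the raw signal number as the exit code; describeExitCode is a hypothetical helper and not part of mochitest:

```cpp
#include <cassert>
#include <csignal>
#include <string>

// Hypothetical helper: name the signal an exit code may correspond to.
// Assumes the exit code is the raw signal number, which is an assumption
// about how the test harness reports a crashed process.
std::string describeExitCode(int code) {
    assert(code >= 0);  // exit codes are non-negative
    if (code == SIGABRT) return "SIGABRT";  // abort, e.g. a failed assertion
    if (code == SIGSEGV) return "SIGSEGV";  // invalid memory access
    return "unknown";
}
```

On Linux, describeExitCode(6) names SIGABRT and describeExitCode(11) names SIGSEGV.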
Since I cannot reproduce this locally with a build on my machine, I have to work with builds from the try server (i.e. no symbols). Here is the stack trace that I have found to date:
(gdb) bt
#0 0x0012d422 in __kernel_vsyscall ()
#1 0x0013c230 in raise () from /lib/tls/i686/cmov/libpthread.so.0
#2 0x026df75f in JS_Assert () from /home/joel/mozilla/simple_e10s/firefox/libmozjs.so
#3 0x0263d14e in ?? () from /home/joel/mozilla/simple_e10s/firefox/libmozjs.so
#4 0x0263ac71 in ?? () from /home/joel/mozilla/simple_e10s/firefox/libmozjs.so
#5 0x0263f54e in js_Invoke () from /home/joel/mozilla/simple_e10s/firefox/libmozjs.so
#6 0x0263fa45 in ?? () from /home/joel/mozilla/simple_e10s/firefox/libmozjs.so
#7 0x0259ce54 in JS_CallFunctionValue () from /home/joel/mozilla/simple_e10s/firefox/libmozjs.so
#8 0x00c6ef16 in ?? () from /home/joel/mozilla/simple_e10s/firefox/libxul.so
#9 0x00ca64ae in ?? () from /home/joel/mozilla/simple_e10s/firefox/libxul.so
#10 0x00ca6c78 in ?? () from /home/joel/mozilla/simple_e10s/firefox/libxul.so
#11 0x0182053c in ?? () from /home/joel/mozilla/simple_e10s/firefox/libxul.so
#12 0x0182076d in ?? () from /home/joel/mozilla/simple_e10s/firefox/libxul.so
#13 0x0181903b in ?? () from /home/joel/mozilla/simple_e10s/firefox/libxul.so
#14 0x017abe24 in ?? () from /home/joel/mozilla/simple_e10s/firefox/libxul.so
#15 0x016d0d5a in ?? () from /home/joel/mozilla/simple_e10s/firefox/libxul.so
#16 0x018886e3 in ?? () from /home/joel/mozilla/simple_e10s/firefox/libxul.so
#17 0x018886fb in ?? () from /home/joel/mozilla/simple_e10s/firefox/libxul.so
#18 0x0188875f in ?? () from /home/joel/mozilla/simple_e10s/firefox/libxul.so
#19 0x01583538 in ?? () from /home/joel/mozilla/simple_e10s/firefox/libxul.so
#20 0x012eb151 in ?? () from /home/joel/mozilla/simple_e10s/firefox/libxul.so
#21 0x003e0530 in XRE_main () from /home/joel/mozilla/simple_e10s/firefox/libxul.so
#22 0x08048e42 in ?? ()
#23 0x03114bd6 in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#24 0x080489f1 in ?? ()
(gdb)
I have also spent some time figuring out what is causing the problem. I have narrowed my patch down and found that I get a crash when I call messageManager.loadFrameScript("data:,dump('hello world');", true);
For more reference on how I am doing the messageManager calls, please see the attachment to bug 567417.
In addition, I tried to narrow the test case down (somewhat). I found that in this directory:
http://mxr.mozilla.org/mozilla-central/source/toolkit/content/tests/widgets/
there are 3 tests that need to be run in order:
test_tooltip.xul
test_tree.xul
test_tree_hier.xul
In addition, I have reduced test_tooltip.xul down to:
http://pastebin.mozilla.org/735695
This is odd: if I remove this simple test file, the whole widgets directory completes without crashing. The same happens if I remove the function and code in the file as referenced by the comments.
This is blocking finishing up the mochitest patch in bug 567417 which is in turn blocking landing e10s on m-c.
Comment 1•15 years ago
The crash is a JS assertion. And JS assertions are fatal in debug builds.
Chatted with mrbkap and he says that he's the right person to take a look in here and see what is happening. So, Joel, I've been following this issue and it's still not clear to me from this bug exactly what mrbkap needs to do to reproduce this issue. Can you enumerate the steps for him?
Something like:
1. Apply patch from bug 567417
2. Change tests to use the reduced test cases
3. Run mochitest with certain command line...
4. Profit?
Joel, please correct that as needed ^
Thanks.
Comment 3•15 years ago (Reporter)
actually it is easier than that:
1) apply patch from bug 567417
2) python mochitest/runtests.py --appname=firefox/firefox-bin
--utility-path=bin --extra-profile-file=bin/plugins --certificate-path=certs
--close-when-done --console-level=INFO
--test-path=toolkit/content/tests/widgets --autorun
3) wait for crash (about 5 minutes on my local machine)
The only caveat is that I have not been able to reproduce the crash with a local build, which is why my original steps used my tryserver builds.
Comment 4•15 years ago (Assignee)
Many thanks to Luke who helped me debug this. I'd like to start off with four gdb commands:
(gdb) b jstracer.cpp:8360
(gdb) r
(gdb) cond 1 ((nsStandardURL *)((nsXULDocument *)(((nsGlobalWindow *)((XPCWrappedNative *)(cx->globalObject.fslots[2]))->mIdentity)->mDoc.mRawPtr))->mDocumentURI.mRawPtr)->mSpec.mData[70] == 'i' && ((nsStandardURL *)((nsXULDocument *)(((nsGlobalWindow *)((XPCWrappedNative *)(cx->globalObject.fslots[2]))->mIdentity)->mDoc.mRawPtr))->mDocumentURI.mRawPtr)->mSpec.mData[71] == 'e'
(gdb) ignore 1 542
(Note that the line number in jstracer.cpp was inside a specially crafted if statement to filter out incorrect breaks.)
This bug reproduced reliably for me; the problem was tracking down what happened. The initial diagnosis was that we were in js_Interpret, interpreting JSOP_APPLY while inside an imacro. The commands above broke inside of TraceRecorder::callImacro on the last call before the crash. Stepping from there, we found that we had started recording the imacro for JSOP_APPLY, but as we were about to return to the interpreter to actually record the imacro, we realized that we'd run out of jit cache space and decided to abort. Unfortunately, this realization came *after* we'd already "entered" the imacro (i.e. set fp->imacpc and cx->regs->pc to their imacro equivalents). So we'd try to interpret JSOP_APPLY with pc *not* pointing at an apply. This meant that GET_ARGC returned a random number, so when we looked in the stack past all of the "arguments", we found some random memory and tried to treat it like a JS object. Depending on what happened to be at that memory, you'd crash or assert in various places.
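The failure mode described above can be illustrated with a toy model. This is only a sketch: the opcode values, the operand layout (a 16-bit count stored after the opcode), and the helpers getArgc and decodeApplyAt are assumptions made for illustration, not the engine's real definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opcode values for this toy; not SpiderMonkey's real ones.
enum Op : uint8_t { OP_APPLY = 1, OP_OTHER = 2 };

// Read the 16-bit operand stored after the opcode at pc, in the spirit of
// GET_ARGC. The big-endian layout is an assumption of this sketch.
unsigned getArgc(const std::vector<uint8_t>& code, std::size_t pc) {
    assert(pc + 2 < code.size());  // operand bytes must exist
    return (unsigned(code[pc + 1]) << 8) | code[pc + 2];
}

struct Decoded {
    unsigned argc;
    long calleeIndex;  // stack slot expected to hold the callee; < 0 is out of bounds
};

// Decode a call as if the instruction at pc were an apply: the callee is
// expected below "this" and the arguments, i.e. argc + 2 slots under the top.
Decoded decodeApplyAt(const std::vector<uint8_t>& code, std::size_t pc, long stackDepth) {
    unsigned argc = getArgc(code, pc);
    return {argc, stackDepth - (long(argc) + 2)};
}
```

With the script bytes {OP_APPLY, 0x00, 0x02} and a stack depth of 5, the decoder finds argc 2 and a valid callee slot. If pc has been redirected to other bytecode such as {OP_OTHER, 0x12, 0x34} while the interpreter still believes it is executing an apply, the same decoder reads argc 0x1234 and computes a callee slot far outside the stack.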
The fix is to recognize this very special case by returning a special "recorded imacro but aborted at the same time" status, which allows us to do the right thing.
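The idea of a combined status can be sketched as follows. This is not the attached patch: RecordStatus, Frame, Recorder, and nextOpcode are hypothetical names, and the sketch only shows why the interpreter needs to learn both that recording stopped and that pc now points into the imacro.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical status values; the real patch may name and order them differently.
enum class RecordStatus { Continue, Imacro, ImacroAborted, Aborted };

struct Frame {
    const uint8_t* pc;      // current bytecode position
    const uint8_t* imacpc;  // saved script pc while inside an imacro, else nullptr
};

struct Recorder {
    bool cacheFull;  // stands in for "ran out of jit cache space"

    // Enter the imacro first, then report whether recording survived.
    RecordStatus callImacro(Frame& fp, const uint8_t* imacro) {
        assert(fp.imacpc == nullptr);  // not already inside an imacro
        fp.imacpc = fp.pc;
        fp.pc = imacro;
        return cacheFull ? RecordStatus::ImacroAborted : RecordStatus::Imacro;
    }
};

// Choose the opcode to execute next. Whenever the imacro was entered, the
// opcode must be re-read from pc, whether or not recording continues.
uint8_t nextOpcode(const Frame& fp, uint8_t fetchedOp, RecordStatus status, bool& recording) {
    switch (status) {
      case RecordStatus::Imacro:        recording = true;  return *fp.pc;
      case RecordStatus::ImacroAborted: recording = false; return *fp.pc;
      case RecordStatus::Aborted:       recording = false; return fetchedOp;
      case RecordStatus::Continue:      recording = true;  return fetchedOp;
    }
    return fetchedOp;  // unreachable for valid status values
}
```

Without the ImacroAborted case, an abort at this point would be reported as a plain abort, and the interpreter would keep executing the previously fetched opcode against a pc that no longer points at it.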
Assignee: nobody → mrbkap
Component: IPC → JavaScript Engine
QA Contact: ipc → general
Comment 5•15 years ago (Assignee)
Attachment #451778 -
Flags: review?(gal)
Comment 6•15 years ago
Comment on attachment 451778 [details] [diff] [review]
Proposed fix
Scary.
Attachment #451778 -
Flags: review?(gal) → review+
Comment 7•15 years ago (Reporter)
wow, I had no idea this was such a hairy bug. Awesome debugging and I look forward to seeing this work. I will give this some extra testing on the various scenarios (variations on the above sequence and an a11y testcase) in which I was able to reproduce this problem.
thanks for the fast turnaround!
Comment 8•15 years ago (Assignee)
Whiteboard: fixed-in-tracemonkey
Comment 9•15 years ago (Reporter)
cool! I am doing a run on tryserver with this and my patch and I am not seeing the failing in Md5 runs!
Is there a trace-monkey merge coming up anytime soon? I cannot land my mochitest harness patch with this bug in m-c. Maybe we could disable the offending test, but I don't think that would be a popular idea.
Comment 10•15 years ago (Reporter)
pushed to m-c http://hg.mozilla.org/mozilla-central/rev/1a575a9ad94f
Updated•15 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED