Closed
Bug 600488
Opened 15 years ago
Closed 14 years ago
trace-test -j failures for fennec/android on galaxy s
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: cjones, Assigned: sayrer)
References
Details
Attachments
(8 files, 9 obsolete files)
After working around android fail with patches to js.cpp and trace-test.py, writing a wrapper script for trace-test.py (more android fail), and setting up an android python environment, I have some preliminary test results for trace-tests with just tracejit (-j). Some failures appear to be intermittent, which doesn't seem good. I ran three trials; the common failures were
all 3
----------------------------------------
basic/testComparisons.js
(SIGSEGV)
closures/t034.js
(SIGSEGV)
sunspider/check-date-format-tofte.js
(assertion failure)
And these tests failed in two out of three trials
2/3
----------------------------------------
sunspider/check-3d-cube.js
(SIGSEGV)
sunspider/check-crypto-aes.js
(SIGSEGV)
v8-v5/check-crypto.js
(SIGSEGV)
Will post full results here, setup info somewhere, and maybe -mj results somewhere, tomorrow.
Assignee | ||
Comment 1•15 years ago
|
||
Oh nice, do you have the patch and instructions for running it?
Assignee | ||
Updated•15 years ago
|
Assignee: general → sayrer
Reporter | ||
Comment 2•15 years ago
|
||
Both coming up; last bit I just finished this morning. Here's the segfault in basic/testComparisons.js
(gdb) bt
#0 0x40117708 in ?? ()
#1 0x000e3a14 in ExecuteTrace (cx=0x4023e040, f=0x407c0d68, inlineCallCount=<value optimized out>, innermostNestedGuardp=<value optimized out>, lrp=0x0) at /home/cjones/mozilla/mozilla-central/js/src/jstracer.cpp:6544
#2 js::ExecuteTree (cx=0x4023e040, f=0x407c0d68, inlineCallCount=<value optimized out>, innermostNestedGuardp=<value optimized out>, lrp=0x0) at /home/cjones/mozilla/mozilla-central/js/src/jstracer.cpp:6645
#3 0x000e6eea in VisitStackSlots<js::CheckEntryTypeVisitor> (cx=0xbeb188d4, globalObj=0xbeb187ec, f=<value optimized out>, count=<value optimized out>) at /home/cjones/mozilla/mozilla-central/js/src/jstracer.cpp:1928
#4 VisitSlots<js::CheckEntryTypeVisitor> (cx=0xbeb188d4, globalObj=0xbeb187ec, f=<value optimized out>, count=<value optimized out>) at /home/cjones/mozilla/mozilla-central/js/src/jstracer.cpp:1958
#5 VisitSlots<js::CheckEntryTypeVisitor> (cx=0xbeb188d4, globalObj=0xbeb187ec, f=<value optimized out>, count=<value optimized out>) at /home/cjones/mozilla/mozilla-central/js/src/jstracer.cpp:1985
#6 CheckEntryTypes (cx=0xbeb188d4, globalObj=0xbeb187ec, f=<value optimized out>, count=<value optimized out>) at /home/cjones/mozilla/mozilla-central/js/src/jstracer.cpp:6437
#7 js::FindVMCompatiblePeer (cx=0xbeb188d4, globalObj=0xbeb187ec, f=<value optimized out>, count=<value optimized out>) at /home/cjones/mozilla/mozilla-central/js/src/jstracer.cpp:6461
#8 0x000fddbc in js::MonitorLoopEdge (cx=0x4023e040, inlineCallCount=@0xbeb188d4) at /home/cjones/mozilla/mozilla-central/js/src/jstracer.cpp:7149
#9 0x0016d1a6 in js::Interpret (cx=0x4023e040, entryFrame=<value optimized out>, inlineCallCount=Cannot access memory at address 0xff2fc02c
) at /home/cjones/mozilla/mozilla-central/js/src/jsinterp.cpp:3238
#10 0x000568da in RunScript (cx=0x4023e040, chain=0x40802000, script=0x407d0020, prev=<value optimized out>, flags=0, result=0x0) at /home/cjones/mozilla/mozilla-central/js/src/jsinterp.cpp:484
#11 js::Execute (cx=0x4023e040, chain=0x40802000, script=0x407d0020, prev=<value optimized out>, flags=0, result=0x0) at /home/cjones/mozilla/mozilla-central/js/src/jsinterp.cpp:749
#12 0x0001a61c in JS_ExecuteScript (cx=0x4023e040, obj=0x407e38f4, script=0xbeb1892c, rval=<value optimized out>) at /home/cjones/mozilla/mozilla-central/js/src/jsapi.cpp:4744
#13 0x0000efe0 in Process (cx=0x4023e040, obj=<value optimized out>, filename=<value optimized out>, forceTTY=<value optimized out>) at /home/cjones/mozilla/mozilla-central/js/src/shell/js.cpp:444
#14 0x0000f7f8 in ProcessArgs (cx=0x4023e040, argc=8, argv=0xbeb18c88, envp=<value optimized out>) at /home/cjones/mozilla/mozilla-central/js/src/shell/js.cpp:797
#15 shell (cx=0x4023e040, argc=8, argv=0xbeb18c88, envp=<value optimized out>) at /home/cjones/mozilla/mozilla-central/js/src/shell/js.cpp:5258
#16 0x0000fa7a in main (argc=9, argv=0xbeb18c84, envp=0xbeb18cac) at /home/cjones/mozilla/mozilla-central/js/src/shell/js.cpp:5365
Reporter | ||
Comment 3•15 years ago
|
||
Various hacks for running trace-tests. Patch includes PrintJSStack() from bug 600304 as well.
Comment 4•15 years ago
|
||
Lovely. We are dying on trace. Seeing the assertion would help potentially.
Reporter | ||
Comment 5•15 years ago
|
||
Reporter | ||
Comment 6•15 years ago
|
||
Instructions for running trace-tests on device: https://wiki.mozilla.org/Mobile/Fennec/Android#JS_trace-tests. Please let me know if stuff is missing or broken.
Reporter | ||
Comment 7•15 years ago
|
||
Rob/whoever, I don't know whether fixing one test per bug is preferred (probably?). I posted the stack in comment 2 to prove completeness of my testing env.
Reporter | ||
Comment 8•15 years ago
|
||
Filed bug 600596 for -mj failures.
Reporter | ||
Comment 9•15 years ago
|
||
Filed bug 600639 for debugging support.
Assignee | ||
Comment 10•15 years ago
|
||
testjs.py isn't working for me. I'm getting an ImportError: no module named trace_test
the module search path looks right though.
Assignee | ||
Comment 11•15 years ago
|
||
it works if I rename trace-test/trace-test.py to trace-test/trace_test.py
Assignee | ||
Comment 12•15 years ago
|
||
with TM tip on my Nexus One, I get 1 failure: date-format-tofte. It is not a segfault. It's just a failing test. Anyone with a Galaxy want to try that?
Reporter | ||
Comment 13•15 years ago
|
||
(In reply to comment #11)
> it works if I rename trace-test/trace-test.py to trace-test/trace_test.py
(In reply to comment #3)
> Created attachment 479419 [details] [diff] [review]
> Support for debugging on android
>
> Various hacks for running trace-tests. Patch includes PrintJSStack() from bug
> 600304 as well.
diff --git a/js/src/trace-test/trace-test.py b/js/src/trace-test/trace_test.py
rename from js/src/trace-test/trace-test.py
rename to js/src/trace-test/trace_test.py
--- a/js/src/trace-test/trace-test.py
+++ b/js/src/trace-test/trace_test.py
(In reply to comment #12)
> with TM tip on my Nexus One, I get 1 failure: date-format-tofte. It is not a
> segfault. It's just a failing test. Anyone with a Galaxy want to try that?
I did. See comment 0.
Reporter | ||
Comment 14•15 years ago
|
||
Forgot to attach the full logs, my bad.
Reporter | ||
Comment 15•15 years ago
|
||
Reporter | ||
Comment 16•15 years ago
|
||
Assignee | ||
Comment 17•15 years ago
|
||
(In reply to comment #13)
> (In reply to comment #11)
> > it works if I rename trace-test/trace-test.py to trace-test/trace_test.py
>
> (In reply to comment #3)
> > Created attachment 479419 [details] [diff] [review] [details]
> > Support for debugging on android
> >
> > Various hacks for running trace-tests. Patch includes PrintJSStack() from bug
> > 600304 as well.
> diff --git a/js/src/trace-test/trace-test.py b/js/src/trace-test/trace_test.py
> rename from js/src/trace-test/trace-test.py
> rename to js/src/trace-test/trace_test.py
> --- a/js/src/trace-test/trace-test.py
> +++ b/js/src/trace-test/trace_test.py
Ah, I just applied the patch the wrong way.
>
> (In reply to comment #12)
> > with TM tip on my Nexus One, I get 1 failure: date-format-tofte. It is not a
> > segfault. It's just a failing test. Anyone with a Galaxy want to try that?
>
> I did. See comment 0.
Which TM cset did you use?
OS: Android → All
Assignee | ||
Comment 18•15 years ago
|
||
I built with
changeset: 54633:94e52cdc2157
tag: tip
user: Andreas Gal<gal@uci.edu>
date: Wed Sep 29 12:28:05 2010 -0700
summary: Bug 600580 - TM: set right compartment in _newJSDContext r=gregor
Reporter | ||
Comment 19•15 years ago
|
||
Building from m-c
changeset: 54781:23605bdd871a
tag: qparent
user: Sid Stamm <sstamm@mozilla.com>
date: Thu Sep 23 15:38:47 2010 -0700
summary: Bug 594446 - CSP report-uri should accept relative URIs [r=jst a=blocking2.0: beta7+]
I still get several failures, and unfortunately they're still not consistent. I did three trials again; below are the tests that failed in all three and the ones that failed in two out of three trials.
3/3
-----
basic/testComparisons.js
Exit code: -11
closures/t008.js
Exit code: -11
closures/t034.js
Exit code: -11
sunspider/check-date-format-tofte.js
/data/local/js/trace-test/tests/sunspider/check-date-format-tofte.js:302: Error: Assertion failed
Exit code: 3
v8-v5/check-earley-boyer.js
Exit code: -11
2/3
-----
basic/testApplyCall.js
Exit code: -11
basic/testShiftLeft.js
basic/testShiftRightArithmetic.js
/data/local/js/trace-test/tests/basic/testShiftLeft.js:35: Error: Assertion failed
/data/local/js/trace-test/tests/basic/testShiftRightArithmetic.js:41 Error: Assertion failed
Exit code: 3
closures/name3.js
Exit code: -11
sunspider/check-3d-cube.js
Exit code: -11
v8-v5/check-crypto.js
Exit code: -11
v8-v5/check-regexp.js
Exit code: -4
(We now know check-date-format-tofte to be bogus so it's not worth bothering with.)
Reporter | ||
Comment 20•15 years ago
|
||
Attachment #479563 -
Attachment is obsolete: true
Attachment #479564 -
Attachment is obsolete: true
Attachment #479566 -
Attachment is obsolete: true
Reporter | ||
Comment 21•15 years ago
|
||
Reporter | ||
Comment 22•15 years ago
|
||
Assignee | ||
Comment 23•15 years ago
|
||
is yours similar?
Reporter | ||
Comment 24•15 years ago
|
||
Nearly identical on the important things (--enable-optimize --disable-debug is the default, right?).
Reporter | ||
Comment 25•15 years ago
|
||
And why not, -j passes all tests in a debug build, except for being OOM-killed during the OOM test, and failing the broken date test.
Assignee | ||
Comment 26•15 years ago
|
||
I have a Samsung Vibrant now. I can reproduce this problem using the same build that passes on the Nexus One.
Assignee | ||
Comment 27•15 years ago
|
||
testComparisons will always segfault for me when run alone:
import trace_test
trace_test.main([
JS,
'basic/testComparisons.js',
'--show-cmd',
'--avoid-stdio',
'--jitflags', 'j',
'--write-failure-log', failures_file,
])
Reporter | ||
Comment 28•15 years ago
|
||
Also wfm.
Reporter | ||
Comment 29•15 years ago
|
||
BTW, using --show-output is a little nicer than --write-failure-log.
Passes at -O0.
Reporter | ||
Comment 30•15 years ago
|
||
Playing with compiler flags proved not very fruitful. I've been using basic/testComparisons.js since a segfault is 100% reproducible with -j and --enable-optimize/--disable-debug. This test also fails pretty reliably with -m, giving "Too much recursion". Quick summary is
running -j:
--enable-debug/--disable-optimize: couldn't force a segfault
everything else, including intermediate -O[k] flags, segfaults intermittently
running -m:
--enable-optimize="-O0 -fno-omit-frame-pointer": couldn't force "Too much recursion"
everything else intermittently gives "Too much recursion"
Tried various other little combinations, including --disable/--enable-tracejit/-methodjit and --disable-jemalloc. No interesting changes.
I think I finally found something interesting in gdb though.
Reporter | ||
Comment 31•15 years ago
|
||
Here are some interesting things at the crash
(gdb) x/i $pc
=> 0xeaf14 <js::ExecuteTree(JSContext*, js::TreeFragment*, uintN&, js::VMSideExit**, js::VMSideExit**)+540>: ldr r2, [sp, #12]
(gdb) x/i $lr
0xeaf15 <js::ExecuteTree(JSContext*, js::TreeFragment*, uintN&, js::VMSideExit**, js::VMSideExit**)+540>: ldr r2, [sp, #12]
($lr looks weird ... maybe that's expected?)
Note that we're crashing on |ldr r2, [sp, #12]|.
(gdb) p $sp
$17 = (void *) 0xbebae6a8
(gdb) p/x *(int*)$sp
$20 = 0x0
(gdb) p/x *(int*)($sp + 12)
$21 = 0x407aa928
By the time gdb is involved, we can load $sp no problem, and load the address we segfaulted on. Hmmmmm.....
$ cat /proc/18525/maps
cat /proc/18525/maps
00008000-0019e000 r-xp 00000000 b3:02 7606287 /data/local/js/js
0019e000-001ba000 rwxp 00195000 b3:02 7606287 /data/local/js/js
001ba000-001c6000 rwxp 001ba000 00:00 0 [heap]
[snip]
beb9a000-bebaf000 rwxp befeb000 00:00 0 [stack]
So 0xbebaf000 is the top of the stack. We have 21 pages mapped, 84KiB. (I double checked and
(gdb) p sysconf(0x27) // ed: 0x27 = _SC_PAGESIZE
$22 = 4096)
Note that
bebaf000−bebae6a8 = 958
bebaf000−(bebae6a8+12) = 946
We segfaulted trying to read from within the last 1KiB of the last page in the stack mapping.
This seems very suspicious. If the address we segfaulted on was near $end_of_stack-4096 there would be more to go on. But as things stand, I don't have any concrete hypotheses. That methodjit is dying with "too much recursion" on this test also seems suspicious.
Going to keep this in gdb for a bit longer in case anyone wants something, while I do some other work. Maybe will next try catching and returning from SIGSEGV to see what happens.
Comment 32•15 years ago
|
||
(In reply to comment #31)
> ($lr looks weird ... maybe that's expected?)
It implies that the instruction you're seeing is immediately after a function call, and the function call was (probably) a leaf function that didn't touch LR.
Do you know what the prior call was? Was it the code invocation itself?
I've seen the symptom before, but can't quite place it. (A load failing in the code, but working fine in GDB.)
Reporter | ||
Comment 33•15 years ago
|
||
(In reply to comment #32)
> Do you know what the prior call was? Was it the code invocation itself?
>
I don't know.
Reporter | ||
Comment 34•15 years ago
|
||
If it turns out that there's a system problem causing JIT unfriendliness, we need to blacklist these devices without affecting others. This patch does that, in a gross way. Last resort.
Reporter | ||
Comment 35•15 years ago
|
||
Attachment #480048 -
Attachment is obsolete: true
Comment 36•15 years ago
|
||
Semi-random failures and weirdness w.r.t. the JITs on ARM .. could
this be another manifestation of D-I cache coherence problems proposed
as the cause of bug 596755 ?
Comment 37•15 years ago
|
||
(In reply to comment #36)
> Semi-random failures and weirdness w.r.t. the JITs on ARM .. could
> this be another manifestation of D-I cache coherence problems proposed
> as the cause of bug 596755 ?
It's high on my list of suspicions, though I'd be very surprised to see such problems in NanoJIT.
I wouldn't be surprised to see it in JM (or YARR) for example, but it's very young in comparison, and the trace JIT is pretty mature now.
Reporter | ||
Comment 38•15 years ago
|
||
Ah! That fits what I've seen so far.
Reporter | ||
Comment 39•15 years ago
|
||
The APIs changed, and all the debugging supported has been pushed to tracemonkey.
Attachment #479419 -
Attachment is obsolete: true
Attachment #479421 -
Attachment is obsolete: true
Attachment #479624 -
Attachment is obsolete: true
Attachment #479656 -
Attachment is obsolete: true
Attachment #480049 -
Attachment is obsolete: true
Reporter | ||
Updated•15 years ago
|
Attachment #480264 -
Attachment mime type: text/x-python → text/plain
Comment 40•15 years ago
|
||
(In reply to comment #37)
> I wouldn't be surprised to see it in JM (or YARR) for example, but it's very
> young in comparison, and the trace JIT is pretty mature now.
Well, supposing it's YARR that's failing to do the syncing properly
(although as per https://bugzilla.mozilla.org/show_bug.cgi?id=596755#c14
LukeW reckons YARR _is_ syncing properly) .. "js -j" would still invoke
YARR, wouldn't it? In which case we'd still see the problem.
Comment 41•15 years ago
|
||
There's an easy way to use Valgrind on ARM to check whether there's a
coherence problem. First, run it "normally" on V, perhaps with
--tool=none to speed things up. Check that you can get some kind of
failure or weirdness.
Then, once you have failure/weirdness, rerun but add --smc-check=all.
If the failure/weirdness disappears that's a pretty good sign there's
a coherence problem.
Why this works: on ARM-Linux, dynamically generated code must be
sync'd with the icache using the cacheflush system call. Valgrind
watches these so it knows when it should discard its translations of
the applications's code. If the application fails to do these when it
should, then it will misbehave on V (as well as on the real hardware)
because V will be running out-of-date translations of parts of the app.
With --smc-check=all, V checks each translation is still valid before
running it; if not it re-makes it. This is expensive but provides
transparent I-D cache coherence without any cooperation from the app.
Hence it causes the app to behave as if it always correctly notifies
any new code (via sys_cacheflush) before running it.
Assignee | ||
Comment 42•15 years ago
|
||
(In reply to comment #41)
> There's an easy way to use Valgrind on ARM
I have a dumb question. How do I build Valgrind for ARM/Android? I swear I googled for that this morning.
Reporter | ||
Comment 43•15 years ago
|
||
Julian, I took a crack at building valgrind for android this evening. The result is at http://people.mozilla.com/~cjones/valgrind-android.tar.gz . It appears that this was built with -g, so https://wiki.mozilla.org/Mobile/Fennec/Android#Debugging_with_GDB should work. To try the build on device, run
wget http://people.mozilla.com/~cjones/valgrind-android.tar.gz
tar zxvf valgrind-android.tar.gz
adb push vg /data/local/vg
adb shell
Then from within the |adb shell|, run
/data/local/vg/bin/valgrind /data/local/vg/test
A rooted device isn't needed for this. vg/ does have to be pushed to exactly /data/local/vg or it won't work.
Valgrind doesn't build out of the box. The only bug I can ascribe to valgrind itself is not testing for pthread_rwlock. The rest is just stuff the android folks neglected to copy and paste from glibc. I'm sick of dealing with their broken libc, so I refuse to post the workaround patch publicly; I'd rather get the android folks to fix bionic (I'd be willing to write a patch once we get all the valgrind issues sorted out). Would be happy to e-mail the workaround patch though.
On my i9000, valgrind just spins @ 100% CPU running the test program. Brad also saw this on his N1, so it's likely a problem with android+valgrind and not the i9000. AFAIK, the obstacles to valgrind-on-android are
- home-grown libc (bionic), so need a new suppression file and maybe other stuff. New suppression file should be easy.
- home-grown loader; not sure what issues this might cause
Let me know if I can help more.
Reporter | ||
Comment 44•15 years ago
|
||
Oh right, probably want the source of /data/local/vg/test.
Comment 45•15 years ago
|
||
(In reply to comment #41)
> I have a dumb question. How do I build Valgrind for ARM/Android?
Not a dumb question. I've only tried it on beagleboard (Ubuntu 10.x)
and N900, but not on Android. I don't have an Android device, else I
would make sure it works.
(In reply to comment #44)
> On my i9000, valgrind just spins @ 100% CPU running the test program. Brad
> also saw this on his N1, so it's likely a problem with
> android+valgrind
Can you send the results of running with
"-d -d -v -v --trace-flags=10000000" ?
Overall though this seems like a large diversion from the original
problem. I'm wondering if I can somehow check for coherence loss
by running the tests on a beagleboard instead.
Comment 46•15 years ago
|
||
(In reply to comment #45)
> (In reply to comment #41)
> Can you send the results of running with
> "-d -d -v -v --trace-flags=10000000" ?
attached here
Comment 47•15 years ago
|
||
(In reply to comment #40)
> (In reply to comment #37)
> Well, supposing it's YARR that's failing to do the syncing properly
See bug 596755 comment 16 ,17,18 for some preliminary observations
re MJ PIC/MIC lack-of-syncing.
Comment 48•15 years ago
|
||
(In reply to comment #46)
> Created attachment 480411 [details]
> valgrind trace output
>
> (In reply to comment #45)
> > (In reply to comment #41)
> > Can you send the results of running with
> > "-d -d -v -v --trace-flags=10000000" ?
>
> attached here
I can see where it's hanging, during setup of some file space prior to
starting the app. (in between lines 1791 and 1829 of
coregrind/m_main.c). No idea why though.
Comment 49•14 years ago
|
||
Tracer has been removed.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•