Open Bug 1757798 Opened 2 years ago Updated 2 years ago

Crashing spidermonkey tests do not always fail

Categories

(Core :: JavaScript Engine, defect, P2)

People

(Reporter: sfink, Unassigned)

References

(Blocks 1 open bug)

Details

This is what bug 1718819 was originally filed for, but it's still very much a problem.

This is most commonly seen in green debug SM(cgc) jobs: sometimes a run is reported as successful (all tests report as passing), yet it leaves behind minidumps that show assertion failures of various kinds. It's not currently possible to identify the test that produced a given minidump, but these do not appear to come from expected-crash tests.

No longer blocks: 1706317
Blocks: 1706317
Severity: -- → S3

I did a try push with a set of patches that displays the command line.

The first example log shows:

Command line:
  /builds/worker/workspace/obj-spider/dist/bin/js --dll /builds/worker/fetches/injector/libbreakpadinjector.so -f /builds/worker/checkouts/gecko/js/src/jit-test/lib/prologue.js --ion-eager --more-compartments --ion-offthread-compile=off --selfhosted-xdr-path /tmp/tmpyy8e62v1/shell.xdr --selfhosted-xdr-mode decode -e 'const platform='"'"'linux'"'"'' -e 'const libdir='"'"'/builds/worker/checkouts/gecko/js/src/jit-test/lib/'"'"'' -e 'const scriptdir='"'"'/builds/worker/checkouts/gecko/js/src/jit-test/tests/v8-v5/'"'"'' --module-load-path /builds/worker/checkouts/gecko/js/src/jit-test/modules/ -f /builds/worker/checkouts/gecko/js/src/jit-test/tests/v8-v5/check-splay.js
Crash reason:  SIGABRT
Crash address: 0x3e80000105c

Thread 0  (crashed)
0  libpthread.so.0!__pthread_cond_signal [pthread_cond_signal.c : 94 + 0x11]
1  js!mozilla::detail::ConditionVariableImpl::notify_one() [ConditionVariable_posix.cpp:b57e20efe50c238d4439e2a5107844182e1221a3 : 95 + 0x4]
2  js!js::GlobalHelperThreadState::submitTask(js::GCParallelTask*, js::AutoLockHelperThreadState const&)
3  js!js::GCParallelTask::startOrRunIfIdle(js::AutoLockHelperThreadState&) [GCParallelTask.cpp:b57e20efe50c238d4439e2a5107844182e1221a3 : 66 + 0xa]
4  js!js::gc::GCRuntime::endSweepingSweepGroup(JSFreeOp*, js::SliceBudget&) [Sweeping.cpp:b57e20efe50c238d4439e2a5107844182e1221a3 : 1602 + 0x1b]
...

(this is the main thread).

The matching result lines are:

[task 2022-03-06T19:51:20.962Z] TEST-PASS | js/src/jit-test/tests/v8-v5/check-splay.js | Success (code 0, args "--baseline-eager") [21.2 s]
[task 2022-03-06T19:52:25.993Z] TEST-PASS | js/src/jit-test/tests/v8-v5/check-splay.js | Success (code 0, args "") [86.9 s]
[task 2022-03-06T19:53:29.672Z] TEST-PASS | js/src/jit-test/tests/v8-v5/check-splay.js | Success (code -6, args "--ion-eager --ion-offthread-compile=off --more-compartments") [150.1 s]

So it looks like the crashing one is the 3rd, and it has a very suspicious runtime of 150.1s. This looks like a mishandled 150s timeout? It accurately reports an exit code of -6 but considers it to be a TEST-PASS.
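As a sanity check on that exit code: assuming the harness reports return codes the way Python's `subprocess` module does on POSIX (a negative value -N means the child was terminated by signal N), code -6 lines up with the SIGABRT in the minidump. A minimal sketch of that decoding follows; `describe_exit` is a hypothetical helper for illustration, not something from the jit-test harness.

```python
import signal


def describe_exit(returncode):
    """Describe a subprocess-style return code (hypothetical helper)."""
    if returncode < 0:
        # subprocess convention on POSIX: -N means "terminated by signal N".
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return f"killed by {name} ({-returncode})"
    return f"exited with status {returncode}"


# On Linux, SIGABRT is signal 6, so this is the case from the log above.
print(describe_exit(-6))
print(describe_exit(0))
```

If that assumption holds, a "Success (code -6, ...)" line is self-contradictory: the process did not exit on its own, it was aborted.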

The other three examples in that push are the same test, with about the same duration. The crash stacks vary; in particular, the last one dies in js::CurrentThreadCanAccessRuntime(JSRuntime const*) [Runtime.cpp:b57e20efe50c238d4439e2a5107844182e1221a3 : 789 + 0x5], which has been freaking me out since I took that to mean we're doing an invalid access. But now it looks more like we probably just spend quite a bit of time there, and are randomly timeout-aborted such that we end up dying there fairly often.

This is making me feel better. It's looking like when we time out, we kill the running test, which generates a minidump. Then for some reason we mark it as a pass instead of a timeout.
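To make the suspected ordering problem concrete, here is a rough sketch of a classification that checks for a timeout and for death-by-signal before it ever considers a pass. It is not the actual jit-test harness logic: `RunOutcome`, `classify`, and the result labels are all hypothetical names, and the only assumption carried over from the log is that a negative return code means the process was killed by a signal.

```python
from dataclasses import dataclass


@dataclass
class RunOutcome:
    """Hypothetical record of one test run; not the real harness type."""
    returncode: int
    timed_out: bool
    expect_crash: bool = False


def classify(outcome):
    """Return a placeholder label, checking timeouts before anything else."""
    if outcome.timed_out:
        # A run killed by the harness after the time limit is never a pass,
        # whatever return code the kill happens to produce.
        return "timeout"
    crashed = outcome.returncode < 0  # assumed: -N means killed by signal N
    if crashed != outcome.expect_crash:
        # Either an unexpected crash, or an expected crash that did not occur.
        return "fail"
    if not crashed and outcome.returncode != 0:
        return "fail"
    return "pass"


# The third result line from the log: code -6 after roughly 150 s.
print(classify(RunOutcome(returncode=-6, timed_out=True)))
```

Under this ordering, a run that was aborted at the time limit cannot fall through to the pass branch, which is the behavior the result lines above suggest is currently missing.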

(In reply to Steve Fink [:sfink] [:s:] from comment #2)
Nice, that would explain it!
