Closed
Bug 1285074
Opened 9 years ago
Closed 8 years ago
Hit MOZ_CRASH(js::Mutex::lock: pthread_mutex_lock failed) or Assertion failure: r == 0, at js/src/threading/posix/Mutex.cpp:52
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
mozilla53
People
(Reporter: gkw, Assigned: sstangl)
References
Details
(Keywords: assertion, crash, testcase)
Attachments
(6 files, 6 obsolete files)
47.83 KB,
text/plain
|
Details | |
765.94 KB,
text/plain
|
Details | |
3.01 KB,
patch
|
terrence
:
review+
|
Details | Diff | Splinter Review |
1.91 KB,
patch
|
terrence
:
review+
|
Details | Diff | Splinter Review |
37.28 KB,
text/plain
|
Details | |
13.75 KB,
patch
|
Details | Diff | Splinter Review |
The following testcase crashes on mozilla-central revision 95ffbc4ff635 (build with --enable-debug --enable-more-deterministic --32 --enable-simulator=arm, run with --fuzzing-safe --gc-zeal=14 --arm-sim-icache-checks --ion-eager):
See attachment.
Backtrace:
0 js-dbg-32-dm-clang-armSim-darwin-95ffbc4ff635 0x001cf689 js::Mutex::lock() + 185 (Mutex.cpp:52)
1 js-dbg-32-dm-clang-armSim-darwin-95ffbc4ff635 0x006d02d0 js::jit::Simulator::FlushICache(void*, unsigned long) + 96 (Simulator-arm.cpp:371)
2 js-dbg-32-dm-clang-armSim-darwin-95ffbc4ff635 0x00416f58 js::jit::AutoFlushICache::flush(unsigned long, unsigned long) + 104 (Ion.cpp:3353)
3 js-dbg-32-dm-clang-armSim-darwin-95ffbc4ff635 0x00677329 js::jit::Assembler::RetargetNearBranch(js::jit::Instruction*, int, js::jit::Assembler::Condition, bool) + 233 (Assembler-arm.cpp:2955)
4 js-dbg-32-dm-clang-armSim-darwin-95ffbc4ff635 0x0067704c js::jit::PatchJump(js::jit::CodeLocationJump&, js::jit::CodeLocationLabel, js::jit::ReprotectCode) + 508 (Assembler-arm.cpp:621)
/snip
For detailed crash information, see attachment.
--gc-zeal=14 seems needed, the gczeal(0) function call at the bottom of the testcase is also needed. Locking s-s because of this.
Testcase is fairly intermittent. Replace the path at the top of the testcase to the path of your tree to try and reproduce, ideally at the same revision.
Reporter | ||
Comment 1•9 years ago
|
||
Reporter | ||
Comment 2•9 years ago
|
||
Reporter | ||
Comment 3•9 years ago
|
||
I'm trying to get a bisection window, in the meanwhile setting needinfo? from :terrence and :jonco in case this is related to the recent threading/GC changes, as a start.
Flags: needinfo?(terrence)
Flags: needinfo?(jcoppeard)
Whiteboard: [jsbugmon:update]
Reporter | ||
Comment 4•9 years ago
|
||
I also tried to reduce to a smaller testcase after trying for a few hours, but it gets harder to reproduce.
Reporter | ||
Comment 5•9 years ago
|
||
autoBisect shows this is probably related to the following changeset:
The first bad revision is:
changeset: https://hg.mozilla.org/mozilla-central/rev/73cc7590bc39
user: Nick Fitzgerald
date: Thu Jun 30 13:13:30 2016 -0700
summary: Bug 1283278 - Use js::Mutex instead of PRLock in js/src/jit/arm/Simulator-arm.{h,cpp}; r=terrence
Actually, this is likely Nick's. Nick, is bug 1283278 a likely regressor?
Comment 6•9 years ago
|
||
This means something about taking the lock is buggy. This is likely an existing issue and the new mutex has debug assertions against various errors that weren't present before. I'm adding `perror()` calls to the posix implementation of `js::Mutex` so we can tell for which reason the locking is failing.
Flags: needinfo?(nfitzgerald)
Updated•9 years ago
|
Whiteboard: leave-open
Comment 7•9 years ago
|
||
Attachment #8769286 -
Flags: review?(terrence)
Comment 8•9 years ago
|
||
Comment on attachment 8769286 [details] [diff] [review]
Add `perror` calls when pthread locking fails
Review of attachment 8769286 [details] [diff] [review]:
-----------------------------------------------------------------
This is a ton of copy-paste. Let's instead abstract it.
How about:
#define CHECK_PTHREAD_CALL(c, msg) \
if ((c) != 0) { \
perror((msg)); \
MOZ_CRASH((msg)); \
}
CHECK_PTHREAD_CALL(pthread_mutexattr_init(&attr),
"js::Mutex::Mutex: pthread_mutexattr_init failed");
#undef CHECK_PTHREAD_CALL
Or if a macro feels too dirty, I think you could also make a fairly clean MOZ_ALWAYS_INLINE varargs template to do the same, although a macro feels fine here.
Attachment #8769286 -
Flags: review?(terrence)
Comment 9•9 years ago
|
||
I was going to use a static helper function but MOZ_CRASH needs the literal. Embarassingly enough, I didn't think of a macro :P
Comment 10•9 years ago
|
||
Attachment #8769312 -
Flags: review?(terrence)
Updated•9 years ago
|
Attachment #8769286 -
Attachment is obsolete: true
Comment 11•9 years ago
|
||
Comment on attachment 8769312 [details] [diff] [review]
Add `perror` calls when pthread locking fails
Review of attachment 8769312 [details] [diff] [review]:
-----------------------------------------------------------------
Nice!
::: js/src/threading/posix/Mutex.cpp
@@ +11,5 @@
>
> #include "threading/Mutex.h"
> #include "threading/posix/MutexPlatformData.h"
>
> +#define TRY_CALL_PTHREADS(call, msg) \
Oh, excellent! TRY(!) is definitely the right concept here.
Attachment #8769312 -
Flags: review?(terrence) → review+
Comment 12•9 years ago
|
||
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Comment 13•9 years ago
|
||
Was supposed to be leave-open.
Comment 14•9 years ago
|
||
Gary, could I ask you to reproduce once more with the new patch that's in m-c? It should add a log message giving a little more info right before the crash. Thanks a lot!
Flags: needinfo?(gary)
Reporter | ||
Comment 15•9 years ago
|
||
js::Mutex::lock: pthread_mutex_lock failed: Undefined error: 0
Hit MOZ_CRASH(js::Mutex::lock: pthread_mutex_lock failed) at /Users/skywalker/trees/mozilla-central/js/src/threading/posix/Mutex.cpp:66
Segmentation fault: 11
Tried with m-c rev 679118259e91.
Flags: needinfo?(gary) → needinfo?(nfitzgerald)
Comment 16•9 years ago
|
||
This is crashing in the ARM simulator, so the platform should be ARM right? Do we know whether it's just the simulator itself being buggy? Is there a similar crash in these mutexes when tested on real x86?
Flags: needinfo?(gary)
Hardware: x86 → ARM
Reporter | ||
Comment 17•9 years ago
|
||
(In reply to Daniel Veditz [:dveditz] from comment #16)
> This is crashing in the ARM simulator, so the platform should be ARM right?
> Do we know whether it's just the simulator itself being buggy? Is there a
> similar crash in these mutexes when tested on real x86?
--arm-sim-icache-checks is needed for this testcase to trigger. I was unable to get x86 binaries to crash because they do not support --arm-sim-icache-checks.
However, I am not definitively sure whether there may be a latent bug present, so moving the needinfo? back to :fitzgen.
Flags: needinfo?(gary)
Comment 18•8 years ago
|
||
(In reply to Daniel Veditz [:dveditz] from comment #16)
> This is crashing in the ARM simulator, so the platform should be ARM right?
> Do we know whether it's just the simulator itself being buggy? Is there a
> similar crash in these mutexes when tested on real x86?
This is within the simulator itself.
Unfortunately, I am not familiar with the simulator.
I am honestly pretty confused about this one. On the one hand, pthread_mutex_lock is documented to return 0 or an error number, meaning that if it doesn't return 0, then the call failed. We are getting a non-0 return value, which implies that the call failed. However, perror() is giving conficting information(!) by saying that errno is 0 and that there is no error.
I'm not sure how to proceed.
Flags: needinfo?(nfitzgerald)
Comment 19•8 years ago
|
||
Interesting, apparently pthread_mutex_lock doesn't set errno and only returns the error code directly, so perror() is useless with it.
Comment 20•8 years ago
|
||
The old code assumed that various pthreads calls would set errno, and therefore
we could use perror() to print information about them. That assumption was
incorrect: pthreads calls return the error number directly and do not set errno
at all.
Attachment #8778352 -
Flags: review?(terrence)
Comment 21•8 years ago
|
||
Comment 22•8 years ago
|
||
Comment on attachment 8778352 [details] [diff] [review]
Fix pthreads error reporting for posix js::Mutex
Review of attachment 8778352 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/threading/posix/Mutex.cpp
@@ +41,5 @@
> + break; \
> + } \
> + MOZ_CRASH(msg); \
> + } \
> + }
You can assign to errno, so I think we should instead do:
MOZ_ASSERT(!errno);
errno = result;
perror(msg);
Attachment #8778352 -
Flags: review?(terrence)
Comment 23•8 years ago
|
||
The old code assumed that various pthreads calls would set errno, and therefore
we could use perror() to print information about them. That assumption was
incorrect: pthreads calls return the error number directly and do not set errno
at all. The solution: asign the return value to errno and then call perror();
Attachment #8778390 -
Flags: review?(terrence)
Updated•8 years ago
|
Attachment #8778352 -
Attachment is obsolete: true
Comment 24•8 years ago
|
||
Sorry for the bug spam. This version includes the assert that errno is not
already set.
Attachment #8778392 -
Flags: review?(terrence)
Updated•8 years ago
|
Attachment #8778390 -
Attachment is obsolete: true
Attachment #8778390 -
Flags: review?(terrence)
Comment 25•8 years ago
|
||
Comment 26•8 years ago
|
||
Comment on attachment 8778392 [details] [diff] [review]
Fix pthreads error reporting for posix js::Mutex
Review of attachment 8778392 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks! That's much more future-proof.
::: js/src/threading/posix/Mutex.cpp
@@ +16,5 @@
> #define TRY_CALL_PTHREADS(call, msg) \
> + { \
> + int result = (call); \
> + if (result != 0) { \
> + MOZ_ASSERT(!errno); \
Hurm, thinking more about it: we probably don't check and reset errno anywhere else, so this is probably actually not that safe. I'd say if it hits in the try run just remove the assert and land without.
Attachment #8778392 -
Flags: review?(terrence) → review+
Comment 27•8 years ago
|
||
Updated•8 years ago
|
Group: javascript-core-security
Reporter | ||
Comment 28•8 years ago
|
||
Nick, what's next here? We have other similar signatures crashing m-c rev f7d5008ee2ab at:
Hit MOZ_CRASH(js::Mutex::lock: pthread_mutex_lock failed) at js/src/threading/posix/Mutex.cpp:71
The testcases are also fairly reproducible but hard-to-reduce.
Flags: needinfo?(nfitzgerald)
Reporter | ||
Comment 29•8 years ago
|
||
Also cc'ing Sean since this seems related to ARM.
Keywords: crash
Summary: Assertion failure: r == 0, at js/src/threading/posix/Mutex.cpp:52 → Hit MOZ_CRASH(js::Mutex::lock: pthread_mutex_lock failed) or Assertion failure: r == 0, at js/src/threading/posix/Mutex.cpp:52
Comment 30•8 years ago
|
||
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #28)
> Created attachment 8796890 [details]
> Stack from Crash Reporter
>
> Nick, what's next here? We have other similar signatures crashing m-c rev
> f7d5008ee2ab at:
>
> Hit MOZ_CRASH(js::Mutex::lock: pthread_mutex_lock failed) at
> js/src/threading/posix/Mutex.cpp:71
>
> The testcases are also fairly reproducible but hard-to-reduce.
This is the same as bug 1294242: the arm simulator has a lock acquired, a signal is sent to the same thread, the signal handler attempts to acquire the same lock that is already acquired. This would be a deadlock, but we eagerly detect it and crash with a diagnostic message. This is a bug in the arm simulator. It is not safe to acquire locks in a signal handler.
Flags: needinfo?(nfitzgerald)
Assignee | ||
Comment 31•8 years ago
|
||
To clarify, this is a bug that only exists within the ARM simulator (and possibly the MIPS simulators). Other architectures and ARM-on-hardware do not have this problem, because they do not acquire locks to flush their caches.
Reporter | ||
Comment 32•8 years ago
|
||
Assertion failure: !mEntered, at dist/include/mozilla/Vector.h:472 is yet another new signature.
What's next here, Sean?
Flags: needinfo?(sstangl)
Assignee | ||
Comment 33•8 years ago
|
||
This patch contains a fix for ARM. If it looks good, the same logic needs to be extended to MIPS32 and MIPS64.
There are two ways to avoid double-locking the mutex:
1. Protect ICacheCheckingEnabled-sensitive areas with pthread_sigmask().
2. Disallow the signal handler from manipulating the icache, then detect the manipulation and flush it.
The advantage of #1 is that the cache remains correct. The disadvantage is that it requires memory allocation from signal context, and as far as I know jemalloc isn't signal-safe.
So I went with #2, which requires the signal handler to set a flag, and the client code to detect that flag after it's read all state required for making the assertion. A C++ compiler can technically reorder this code to not be signal safe, but I'm not sure what can be done about that other than declaring it volatile.
Gary, does this patch placate the fuzzers?
Flags: needinfo?(sstangl) → needinfo?(gary)
Assignee | ||
Comment 34•8 years ago
|
||
Waldo pointed out that I can just throw an atomic in there.
Reporter | ||
Comment 35•8 years ago
|
||
Comment on attachment 8809639 [details] [diff] [review]
Test patch, ARM only
Yes, it does, thanks!
(didn't fuzz with this patch, but just checked that the patch does fix the testcase)
Flags: needinfo?(gary)
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → sstangl
Assignee | ||
Comment 36•8 years ago
|
||
Luke, does the interaction with cacheInvalidatedBySignalHandler_ look OK to you?
This patch is about an error between Wasm signal handlers and the ARM simulator.
Requesting f? instead of r? since the final patch needs a similar change to the MIPS32 and MIPS64 simulators.
Attachment #8809639 -
Attachment is obsolete: true
Attachment #8809967 -
Flags: feedback?(luke)
Comment 37•8 years ago
|
||
Comment on attachment 8809967 [details] [diff] [review]
0001-Bug-1285074-Make-ICacheMap-manipulation-Wasm-signal-.patch
Review of attachment 8809967 [details] [diff] [review]:
-----------------------------------------------------------------
Seems like the right fix.
::: js/src/jit/arm/Simulator-arm.h
@@ +362,5 @@
> public:
> static bool ICacheCheckingEnabled;
> static void FlushICache(void* start, size_t size);
>
> + // Wasm rewrites Jitcode from a signal handler, but is prevented from
Actually, it is only non-asm.js JS Ion-compiled code that gets its backedges rewritten by the signal handler (wasm just moves the pc directly).
Attachment #8809967 -
Flags: feedback?(luke) → feedback+
Assignee | ||
Comment 38•8 years ago
|
||
Thanks for the feedback. This is the same patch, with MIPS32 and MIPS64 simulator changes as well. ARM64's simulator does not have an instruction cache.
Attachment #8809967 -
Attachment is obsolete: true
Attachment #8810639 -
Flags: review?(luke)
Updated•8 years ago
|
Attachment #8810639 -
Flags: review?(luke) → review+
Assignee | ||
Updated•8 years ago
|
Keywords: checkin-needed
Comment 39•8 years ago
|
||
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1daa455a877d
Make ICacheMap manipulation signal-safe. r=luke
Keywords: checkin-needed
Comment 40•8 years ago
|
||
sorry had to back this out for spidermonkey bustage like https://treeherder.mozilla.org/logviewer.html#?job_id=39362462&repo=mozilla-inbound
Flags: needinfo?(sstangl)
Comment 41•8 years ago
|
||
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b64acfad92a7
Backed out changeset 1daa455a877d for spidermonkey bustage
Assignee | ||
Comment 42•8 years ago
|
||
Same patch, with reordered #includes to placate `make check-style`.
Attachment #8810639 -
Attachment is obsolete: true
Flags: needinfo?(sstangl)
Assignee | ||
Updated•8 years ago
|
Keywords: checkin-needed
Updated•8 years ago
|
status-firefox51:
--- → affected
status-firefox52:
--- → affected
status-firefox53:
--- → affected
Keywords: leave-open
Whiteboard: leave-open
Target Milestone: mozilla50 → ---
Comment 43•8 years ago
|
||
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/39a7161b9628
Make ICacheMap manipulation signal-safe. r=luke
Keywords: checkin-needed
Comment 44•8 years ago
|
||
bugherder |
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Comment 45•8 years ago
|
||
I guess this will ride the train from 53
You need to log in
before you can comment on or make changes to this bug.
Description
•