Closed Bug 1285074 Opened 3 years ago Closed 3 years ago

Hit MOZ_CRASH(js::Mutex::lock: pthread_mutex_lock failed) or Assertion failure: r == 0, at js/src/threading/posix/Mutex.cpp:52

Categories

(Core :: JavaScript Engine, defect, critical)

ARM
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox53 --- fixed

People

(Reporter: gkw, Assigned: sstangl)

References

(Blocks 2 open bugs)

Details

(Keywords: assertion, crash, testcase)

Attachments

(6 files, 6 obsolete files)

47.83 KB, text/plain
Details
765.94 KB, text/plain
Details
3.01 KB, patch
terrence
: review+
Details | Diff | Splinter Review
1.91 KB, patch
terrence
: review+
Details | Diff | Splinter Review
37.28 KB, text/plain
Details
13.75 KB, patch
Details | Diff | Splinter Review
The following testcase crashes on mozilla-central revision 95ffbc4ff635 (build with --enable-debug --enable-more-deterministic --32 --enable-simulator=arm, run with --fuzzing-safe --gc-zeal=14 --arm-sim-icache-checks --ion-eager):

See attachment.

Backtrace:

0   js-dbg-32-dm-clang-armSim-darwin-95ffbc4ff635	0x001cf689 js::Mutex::lock() + 185 (Mutex.cpp:52)
1   js-dbg-32-dm-clang-armSim-darwin-95ffbc4ff635	0x006d02d0 js::jit::Simulator::FlushICache(void*, unsigned long) + 96 (Simulator-arm.cpp:371)
2   js-dbg-32-dm-clang-armSim-darwin-95ffbc4ff635	0x00416f58 js::jit::AutoFlushICache::flush(unsigned long, unsigned long) + 104 (Ion.cpp:3353)
3   js-dbg-32-dm-clang-armSim-darwin-95ffbc4ff635	0x00677329 js::jit::Assembler::RetargetNearBranch(js::jit::Instruction*, int, js::jit::Assembler::Condition, bool) + 233 (Assembler-arm.cpp:2955)
4   js-dbg-32-dm-clang-armSim-darwin-95ffbc4ff635	0x0067704c js::jit::PatchJump(js::jit::CodeLocationJump&, js::jit::CodeLocationLabel, js::jit::ReprotectCode) + 508 (Assembler-arm.cpp:621)
/snip

For detailed crash information, see attachment.

--gc-zeal=14 seems needed, the gczeal(0) function call at the bottom of the testcase is also needed. Locking s-s because of this.

Testcase is fairly intermittent. Replace the path at the top of the testcase to the path of your tree to try and reproduce, ideally at the same revision.
Attached file Testcase
I'm trying to get a bisection window, in the meanwhile setting needinfo? from :terrence and :jonco in case this is related to the recent threading/GC changes, as a start.
Flags: needinfo?(terrence)
Flags: needinfo?(jcoppeard)
Whiteboard: [jsbugmon:update]
I also tried to reduce to a smaller testcase after trying for a few hours, but it gets harder to reproduce.
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/73cc7590bc39
user:        Nick Fitzgerald
date:        Thu Jun 30 13:13:30 2016 -0700
summary:     Bug 1283278 - Use js::Mutex instead of PRLock in js/src/jit/arm/Simulator-arm.{h,cpp}; r=terrence

Actually, this is likely Nick's. Nick, is bug 1283278 a likely regressor?
Blocks: 1283278
Flags: needinfo?(terrence)
Flags: needinfo?(nfitzgerald)
Flags: needinfo?(jcoppeard)
This means something about taking the lock is buggy. This is likely an existing issue and the new mutex has debug assertions against various errors that weren't present before. I'm adding `perror()` calls to the posix implementation of `js::Mutex` so we can tell for which reason the locking is failing.
Flags: needinfo?(nfitzgerald)
Whiteboard: leave-open
Comment on attachment 8769286 [details] [diff] [review]
Add `perror` calls when pthread locking fails

Review of attachment 8769286 [details] [diff] [review]:
-----------------------------------------------------------------

This is a ton of copy-paste. Let's instead abstract it.

How about:

#define CHECK_PTHREAD_CALL(c, msg) \
  if ((c) != 0) { \
    perror((msg)); \
    MOZ_CRASH((msg)); \
  }

CHECK_PTHREAD_CALL(pthread_mutexattr_init(&attr),
                   "js::Mutex::Mutex: pthread_mutexattr_init failed");

#undef CHECK_PTHREAD_CALL

Or if a macro feels too dirty, I think you could also make a fairly clean MOZ_ALWAYS_INLINE varargs template to do the same, although a macro feels fine here.
Attachment #8769286 - Flags: review?(terrence)
I was going to use a static helper function but MOZ_CRASH needs the literal. Embarassingly enough, I didn't think of a macro :P
Attachment #8769286 - Attachment is obsolete: true
Comment on attachment 8769312 [details] [diff] [review]
Add `perror` calls when pthread locking fails

Review of attachment 8769312 [details] [diff] [review]:
-----------------------------------------------------------------

Nice!

::: js/src/threading/posix/Mutex.cpp
@@ +11,5 @@
>  
>  #include "threading/Mutex.h"
>  #include "threading/posix/MutexPlatformData.h"
>  
> +#define TRY_CALL_PTHREADS(call, msg)            \

Oh, excellent! TRY(!) is definitely the right concept here.
Attachment #8769312 - Flags: review?(terrence) → review+
https://hg.mozilla.org/mozilla-central/rev/48610bba6a85
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Was supposed to be leave-open.
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---
Gary, could I ask you to reproduce once more with the new patch that's in m-c? It should add a log message giving a little more info right before the crash. Thanks a lot!
Flags: needinfo?(gary)
js::Mutex::lock: pthread_mutex_lock failed: Undefined error: 0
Hit MOZ_CRASH(js::Mutex::lock: pthread_mutex_lock failed) at /Users/skywalker/trees/mozilla-central/js/src/threading/posix/Mutex.cpp:66
Segmentation fault: 11

Tried with m-c rev 679118259e91.
Flags: needinfo?(gary) → needinfo?(nfitzgerald)
This is crashing in the ARM simulator, so the platform should be ARM right? Do we know whether it's just the simulator itself being buggy? Is there a similar crash in these mutexes when tested on real x86?
Flags: needinfo?(gary)
Hardware: x86 → ARM
(In reply to Daniel Veditz [:dveditz] from comment #16)
> This is crashing in the ARM simulator, so the platform should be ARM right?
> Do we know whether it's just the simulator itself being buggy? Is there a
> similar crash in these mutexes when tested on real x86?

--arm-sim-icache-checks is needed for this testcase to trigger. I was unable to get x86 binaries to crash because they do not support --arm-sim-icache-checks.

However, I am not definitively sure whether there may be a latent bug present, so moving the needinfo? back to :fitzgen.
Flags: needinfo?(gary)
(In reply to Daniel Veditz [:dveditz] from comment #16)
> This is crashing in the ARM simulator, so the platform should be ARM right?
> Do we know whether it's just the simulator itself being buggy? Is there a
> similar crash in these mutexes when tested on real x86?

This is within the simulator itself.

Unfortunately, I am not familiar with the simulator.

I am honestly pretty confused about this one. On the one hand, pthread_mutex_lock is documented to return 0 or an error number, meaning that if it doesn't return 0, then the call failed. We are getting a non-0 return value, which implies that the call failed. However, perror() is giving conficting information(!) by saying that errno is 0 and that there is no error.

I'm not sure how to proceed.
Flags: needinfo?(nfitzgerald)
Interesting, apparently pthread_mutex_lock doesn't set errno and only returns the error code directly, so perror() is useless with it.
The old code assumed that various pthreads calls would set errno, and therefore
we could use perror() to print information about them. That assumption was
incorrect: pthreads calls return the error number directly and do not set errno
at all.
Attachment #8778352 - Flags: review?(terrence)
Comment on attachment 8778352 [details] [diff] [review]
Fix pthreads error reporting for posix js::Mutex

Review of attachment 8778352 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/threading/posix/Mutex.cpp
@@ +41,5 @@
> +          break;                                                        \
> +      }                                                                 \
> +      MOZ_CRASH(msg);                                                   \
> +    }                                                                   \
> +  }

You can assign to errno, so I think we should instead do:

MOZ_ASSERT(!errno);
errno = result;
perror(msg);
Attachment #8778352 - Flags: review?(terrence)
The old code assumed that various pthreads calls would set errno, and therefore
we could use perror() to print information about them. That assumption was
incorrect: pthreads calls return the error number directly and do not set errno
at all. The solution: asign the return value to errno and then call perror();
Attachment #8778390 - Flags: review?(terrence)
Attachment #8778352 - Attachment is obsolete: true
Sorry for the bug spam. This version includes the assert that errno is not
already set.
Attachment #8778392 - Flags: review?(terrence)
Attachment #8778390 - Attachment is obsolete: true
Attachment #8778390 - Flags: review?(terrence)
Comment on attachment 8778392 [details] [diff] [review]
Fix pthreads error reporting for posix js::Mutex

Review of attachment 8778392 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks! That's much more future-proof.

::: js/src/threading/posix/Mutex.cpp
@@ +16,5 @@
>  #define TRY_CALL_PTHREADS(call, msg)            \
> +  {                                             \
> +    int result = (call);                        \
> +    if (result != 0) {                          \
> +      MOZ_ASSERT(!errno);                       \

Hurm, thinking more about it: we probably don't check and reset errno anywhere else, so this is probably actually not that safe. I'd say if it hits in the try run just remove the assert and land without.
Attachment #8778392 - Flags: review?(terrence) → review+
Group: javascript-core-security
Nick, what's next here? We have other similar signatures crashing m-c rev f7d5008ee2ab at:

Hit MOZ_CRASH(js::Mutex::lock: pthread_mutex_lock failed) at js/src/threading/posix/Mutex.cpp:71

The testcases are also fairly reproducible but hard-to-reduce.
Flags: needinfo?(nfitzgerald)
Also cc'ing Sean since this seems related to ARM.
Keywords: crash
Summary: Assertion failure: r == 0, at js/src/threading/posix/Mutex.cpp:52 → Hit MOZ_CRASH(js::Mutex::lock: pthread_mutex_lock failed) or Assertion failure: r == 0, at js/src/threading/posix/Mutex.cpp:52
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #28)
> Created attachment 8796890 [details]
> Stack from Crash Reporter
> 
> Nick, what's next here? We have other similar signatures crashing m-c rev
> f7d5008ee2ab at:
> 
> Hit MOZ_CRASH(js::Mutex::lock: pthread_mutex_lock failed) at
> js/src/threading/posix/Mutex.cpp:71
> 
> The testcases are also fairly reproducible but hard-to-reduce.

This is the same as bug 1294242: the arm simulator has a lock acquired, a signal is sent to the same thread, the signal handler attempts to acquire the same lock that is already acquired. This would be a deadlock, but we eagerly detect it and crash with a diagnostic message. This is a bug in the arm simulator. It is not safe to acquire locks in a signal handler.
Flags: needinfo?(nfitzgerald)
To clarify, this is a bug that only exists within the ARM simulator (and possibly the MIPS simulators). Other architectures and ARM-on-hardware do not have this problem, because they do not acquire locks to flush their caches.
Assertion failure: !mEntered, at dist/include/mozilla/Vector.h:472 is yet another new signature.

What's next here, Sean?
Flags: needinfo?(sstangl)
Attached patch Test patch, ARM only (obsolete) — Splinter Review
This patch contains a fix for ARM. If it looks good, the same logic needs to be extended to MIPS32 and MIPS64.

There are two ways to avoid double-locking the mutex:
1. Protect ICacheCheckingEnabled-sensitive areas with pthread_sigmask().
2. Disallow the signal handler from manipulating the icache, then detect the manipulation and flush it.

The advantage of #1 is that the cache remains correct. The disadvantage is that it requires memory allocation from signal context, and as far as I know jemalloc isn't signal-safe.

So I went with #2, which requires the signal handler to set a flag, and the client code to detect that flag after it's read all state required for making the assertion. A C++ compiler can technically reorder this code to not be signal safe, but I'm not sure what can be done about that other than declaring it volatile.

Gary, does this patch placate the fuzzers?
Flags: needinfo?(sstangl) → needinfo?(gary)
Waldo pointed out that I can just throw an atomic in there.
Comment on attachment 8809639 [details] [diff] [review]
Test patch, ARM only

Yes, it does, thanks!

(didn't fuzz with this patch, but just checked that the patch does fix the testcase)
Flags: needinfo?(gary)
Assignee: nobody → sstangl
Luke, does the interaction with cacheInvalidatedBySignalHandler_ look OK to you?

This patch is about an error between Wasm signal handlers and the ARM simulator.

Requesting f? instead of r? since the final patch needs a similar change to the MIPS32 and MIPS64 simulators.
Attachment #8809639 - Attachment is obsolete: true
Attachment #8809967 - Flags: feedback?(luke)
Comment on attachment 8809967 [details] [diff] [review]
0001-Bug-1285074-Make-ICacheMap-manipulation-Wasm-signal-.patch

Review of attachment 8809967 [details] [diff] [review]:
-----------------------------------------------------------------

Seems like the right fix.

::: js/src/jit/arm/Simulator-arm.h
@@ +362,5 @@
>    public:
>      static bool ICacheCheckingEnabled;
>      static void FlushICache(void* start, size_t size);
>  
> +    // Wasm rewrites Jitcode from a signal handler, but is prevented from

Actually, it is only non-asm.js JS Ion-compiled code that gets its backedges rewritten by the signal handler (wasm just moves the pc directly).
Attachment #8809967 - Flags: feedback?(luke) → feedback+
Thanks for the feedback. This is the same patch, with MIPS32 and MIPS64 simulator changes as well. ARM64's simulator does not have an instruction cache.
Attachment #8809967 - Attachment is obsolete: true
Attachment #8810639 - Flags: review?(luke)
Attachment #8810639 - Flags: review?(luke) → review+
Keywords: checkin-needed
Pushed by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1daa455a877d
Make ICacheMap manipulation signal-safe. r=luke
Keywords: checkin-needed
sorry had to back this out for spidermonkey bustage like https://treeherder.mozilla.org/logviewer.html#?job_id=39362462&repo=mozilla-inbound
Flags: needinfo?(sstangl)
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b64acfad92a7
Backed out changeset 1daa455a877d for spidermonkey bustage
Same patch, with reordered #includes to placate `make check-style`.
Attachment #8810639 - Attachment is obsolete: true
Flags: needinfo?(sstangl)
Keywords: checkin-needed
Keywords: leave-open
Whiteboard: leave-open
Target Milestone: mozilla50 → ---
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/39a7161b9628
Make ICacheMap manipulation signal-safe. r=luke
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/39a7161b9628
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Depends on: 1319232
Duplicate of this bug: 1294242
Depends on: 1324910
You need to log in before you can comment on or make changes to this bug.