Closed Bug 1353199 Opened 8 years ago Closed 4 years ago

wasm crash with profiling enabled due to ThreadLocalData<> jitTop

Categories

(Core :: JavaScript Engine: JIT, defect, P3)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: sfink, Assigned: bhackett1024)

Details

(Keywords: triage-deferred)

Attachments

(2 files)

Program ./obj-x86_64-pc-linux-gnu/dist/bin/firefox (pid = 22598) received signal 11. Stack: #04: js::CheckThreadLocal::check() const (/home/sfink/src/mozilla/js/src/threading/ProtectedData.cpp:41 (discriminator 1)) #05: js::ProtectedData<js::CheckThreadLocal, unsigned char*>::ref() const (/home/sfink/src/mozilla/js/src/threading/ProtectedData.h:109) #06: JS::ProfilingFrameIterator::ProfilingFrameIterator(JSContext*, JS::ProfilingFrameIterator::RegisterState const&, unsigned int) (/home/sfink/src/mozilla/js/src/vm/Stack.cpp:1787) #07: MergeStacksIntoProfile(ProfileBuffer*, TickSample*, NativeStack&) (/home/sfink/src/mozilla/tools/profiler/core/platform.cpp:535) #08: std::__atomic_base<unsigned int>::fetch_add(unsigned int, std::memory_order) (/usr/include/c++/6.3.1/bits/atomic_base.h:514) #09: Tick(mozilla::BaseAutoLock<ProfilerStateMutex> const&, ProfileBuffer*, TickSample*) (/home/sfink/src/mozilla/tools/profiler/core/platform.cpp:980) #10: SamplerThread::SuspendAndSampleAndResumeThread(mozilla::BaseAutoLock<ProfilerStateMutex> const&, TickSample*) (/home/sfink/src/mozilla/tools/profiler/core/platform-linux-android.cpp:418) #11: SamplerThread::Run() (/home/sfink/src/mozilla/tools/profiler/core/platform.cpp:1607) #12: ThreadEntry(void*) (/home/sfink/src/mozilla/tools/profiler/core/platform-linux-android.cpp:277) #13: start_thread (/usr/src/debug/glibc-2.24-33-ge9e69e4/nptl/pthread_create.c:333) #14: clone (/usr/src/debug////////glibc-2.24-33-ge9e69e4/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:107) JS::ProfilingFrameIterator::iteratorConstruct(const RegisterState& state) does if (activation_->isWasm()) { new (storage()) wasm::ProfilingFrameIterator(*activation_->asWasm(), state); // Set savedPrevJitTop_ to the actual jitTop_ from the runtime. savedPrevJitTop_ = activation_->cx()->jitTop; return; } from the sampler thread, and JSContext::jitTop is ThreadLocalData<>.
I expect the issue here is that the profiler thread is calling SuspendThread() on the thread being sampled and then accessing cx->jitTop from the profiler thread whose TlsContext will be null. I think this problem has come up in other contexts as well and there are ways to silence these assertions; maybe AutoNoteSingleThreadedRegion?
Flags: needinfo?(bhackett1024)
AutoNoteSingleThreadedRegion will work, but we need a better solution for this problem (e.g. many protected data checks are disabled on windows because of this) and I'll write a more comprehensive patch.
For what it's worth, I've landed this fix in bug 1340219 (as well as another use of ANSTR in WasmFrameIterator.cpp) because this could be hiding other browser issues when using a debug-optimized local build. The logic in both cases is the same: the profiler thread has interrupted the main thread and we're in profiler code, so it's legit to manipulate ProtectedData here.
After looking at the profiler code a bit, I think it makes the most sense here to just allow API consumers to disable the protected data checks if they are doing something which the checks don't handle. It would be nice if we could more narrowly specify that one thread is suspending another and is allowed to do anything the second thread would, but specifying this would add a fair bit of complexity to the API --- the only way the API has to specify threads right now is via JSContext, and when the profiler suspends a thread it doesn't necessarily know that thread's JSContext --- and require a new TLS key internally (the suspending thread might not have a JSContext). This patch renames AutoNoteSingleThreadedRegion to AutoDisableProtectedDataChecks and moves it to the public API.
Assignee: nobody → bhackett1024
Flags: needinfo?(bhackett1024)
Attachment #8858324 - Flags: review?(jdemooij)
Disable protected data checks while potentially using a profiling frame iterator from a sampler thread.
Attachment #8858325 - Flags: review?(jseward)
Comment on attachment 8858325 [details] [diff] [review] Use AutoDisableProtectedDataChecks in the profiler Review of attachment 8858325 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/core/platform.cpp @@ +599,5 @@ > + // for the protected data checks used by the JS engine to check requirements > + // about which thread owns which data. Since in such cases the sampled thread > + // is suspended, we can access the sampled thread's data without fear of > + // races. Disable the protected data checks while performing these accesses. > + JS::AutoDisableProtectedDataChecks adpdc; There's a new AutoNoteSingleThreadedRegion in WasmFrameIterator.cpp which, iiuc, can simply be removed with this here.
Attachment #8858324 - Flags: review?(jdemooij) → review+
Attachment #8858325 - Flags: review?(jseward) → review?(n.nethercote)
Comment on attachment 8858325 [details] [diff] [review] Use AutoDisableProtectedDataChecks in the profiler Review of attachment 8858325 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/core/platform.cpp @@ +594,5 @@ > : aBuffer->mGeneration; > uint32_t jsCount = 0; > JS::ProfilingFrameIterator::Frame jsFrames[1000]; > > + // We might not be on the thread we are sampling, which will cause problems We are on the thread we are sampling if and only if aSample.mIsSynchronous is true. So you should be able to make this a Maybe<JS::AutoDisableProtectedDataChecks> and then emplace it if !aSample.mIsSynchronous holds. And the comment will need updating accordingly.
Attachment #8858325 - Flags: review?(n.nethercote) → feedback+
Keywords: triage-deferred
Priority: -- → P3
Is there more to be done here? Is it still relevant?
Flags: needinfo?(bhackett1024)

If the profiler works in debug builds, this shouldn't be needed anymore.

Flags: needinfo?(bhackett1024)
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: