Open Bug 1398137 Opened 7 years ago Updated 3 years ago

Crash under memcpy | DoLULBacktrace

Categories

(Core :: Gecko Profiler, defect, P2)

defect

Tracking

()

REOPENED

People

(Reporter: nbp, Unassigned)

Details

Under a self-built nightly based on a tree from 8 days ago. Turning on the Gecko profiler immediately caused the following backtrace: (rr) bt #0 0x00007f4ae691546f in __memmove_avx_unaligned_erms () at /nix/store/kjwbqnh13dxh6w4pk2gb3ddmhpiaihqg-glibc-2.25/lib/libc.so.6 #1 0x00007f4ad3d6a010 in DoLULBacktrace(PSLockRef, ThreadInfo const&, Registers const&, NativeStack&) (aLock=..., aThreadInfo=..., aRegs=..., aNativeStack=...) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:1212 #2 0x00007f4ad3d6a18f in DoNativeBacktrace(PSLockRef, ThreadInfo const&, Registers const&, NativeStack&) (aLock=..., aThreadInfo=..., aRegs=..., aNativeStack=...) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:1245 #3 0x00007f4ad3d6a2e8 in DoSharedSample(PSLockRef, bool, ThreadInfo&, mozilla::TimeStamp const&, Registers const&, ProfileBuffer::LastSample*, ProfileBuffer&) (aLock=..., aIsSynchronous=false, aThreadInfo=..., aNow=..., aRegs=..., aLS=0x7f4abfd3aec4, aBuffer=...) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:1283 #4 0x00007f4ad3d6a49c in DoPeriodicSample(PSLockRef, ThreadInfo&, mozilla::TimeStamp const&, Registers const&, int64_t, int64_t) (aLock=..., aThreadInfo=..., aNow=..., aRegs=..., aRSSMemory=0, aUSSMemory=0) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:1322 #5 0x00007f4ad3d6b816 in SamplerThread::<lambda(const Registers&)>::operator()(const Registers &) const (__closure=0x7f4a9f8fee40, aRegs=...) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:1980 #6 0x00007f4ad3d75ff8 in Sampler::SuspendAndSampleAndResumeThread<SamplerThread::Run()::<lambda(const Registers&)> >(PSLockRef, const ThreadInfo &, const SamplerThread::<lambda(const Registers&)> &) (this=0x7f4a9c426b30, aLock=..., aThreadInfo=..., aProcessRegs=...) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform-linux-android.cpp:345 #7 0x00007f4ad3d6bb0f in SamplerThread::Run() (this=0x7f4a9c426b30) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:1981 #8 0x00007f4ad3d6bffd in ThreadEntry(void*) (aArg=0x7f4a9c426b30) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform-linux-android.cpp:385 #9 0x00007f4ae7638234 in start_thread () at /nix/store/kjwbqnh13dxh6w4pk2gb3ddmhpiaihqg-glibc-2.25/lib/libpthread.so.0 #10 0x00007f4ae68d575f in clone () at /nix/store/kjwbqnh13dxh6w4pk2gb3ddmhpiaihqg-glibc-2.25/lib/libc.so.6 I captured this profile under rr, the problem seems to be related to the estimation of the stack size (nToCopy == 163840), which includes way too much content. > 1176 uintptr_t start = startRegs.xsp.Value() - rEDZONE_SIZE; // start = 0x7f4abe7fdd60 > > Set by > * 7 Thread 28550.28573 (Compositor) 0x00007f4ad3d6bdf2 > in SigprofHandler (aSignal=27, aInfo=0x7f4abe7fd970, aContext=0x7f4abe7fd840) > at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform-linux-android.cpp:219 > 1186 uintptr_t end = reinterpret_cast<uintptr_t>(aThreadInfo.StackTop()); // end = 0x7f4abeffdeaf > 1192 nToCopy = end - start; // nToCopy = 8388936 > 1193 if (nToCopy > lul::N_STACK_BYTES) > 1194 nToCopy = lul::N_STACK_BYTES; // nToCopy = 163840
Here is the stack from which the stack end was taken from: > Thread 7 hit Hardware watchpoint 5: -location aThreadInfo.mStackTop > > Old value = (void *) 0x7f4abeffdeaf > New value = (void *) 0x0 > 0x00007f4ad3d68471 in ThreadInfo::ThreadInfo (this=0x7f4abfd3ac00, aName=0x7f4abfd9ed68 "Compositor", aThreadId=28573, aIsMainThread=false, aStackTop=0x7f4abeffdeaf) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/ThreadInfo.cpp:34 34 , mLastSample() > (rr) bt > #0 0x00007f4ad3d68471 in ThreadInfo::ThreadInfo(char const*, int, bool, void*) (this=0x7f4abfd3ac00, aName=0x7f4abfd9ed68 "Compositor", aThreadId=28573, aIsMainThread=false, aStackTop=0x7f4abeffdeaf) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/ThreadInfo.cpp:34 > #1 0x00007f4ad3d6cd98 in locked_register_thread(PSLockRef, char const*, void*) (aLock=..., aName=0x7f4abfd9ed68 "Compositor", aStackTop=0x7f4abeffdeaf) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:2146 > #2 0x00007f4ad3d6fcc5 in profiler_register_thread(char const*, void*) (aName=0x7f4abfd9ed68 "Compositor", aGuessStackTop=0x7f4abeffdeaf) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:2987 > #3 0x00007f4acdea11d2 in mozilla::AutoProfilerRegisterThread::AutoProfilerRegisterThread(char const*, mozilla::detail::GuardObjectNotifier&&) (this=0x7f4abeffdeaf, aName=0x7f4abfd9ed68 "Compositor", _notifier=<unknown type in /home/nicolas/mozilla/_build/firefox/bugzil.la/js-startup-cache/final/x64/gcc/dbg/dist/bin/libxul.so, CU 0x6715cc, DIE 0x6f1fae>) at /home/nicolas/mozilla/_build/firefox/bugzil.la/js-startup-cache/final/x64/gcc/dbg/dist/include/GeckoProfiler.h:491 > #4 0x00007f4ace64be55 in base::Thread::ThreadMain() (this=0x7f4abfd9ed30) at /home/nicolas/mozilla/contrib-push/ipc/chromium/src/base/thread.cc:157 > #5 0x00007f4ace6344f3 in ThreadFunc(void*) (closure=0x7f4abfd9ed30) at /home/nicolas/mozilla/contrib-push/ipc/chromium/src/base/platform_thread_posix.cc:38 > #6 0x00007f4ae7638234 in start_thread () at /nix/store/kjwbqnh13dxh6w4pk2gb3ddmhpiaihqg-glibc-2.25/lib/libpthread.so.0 > #7 0x00007f4ae68d575f in clone () at /nix/store/kjwbqnh13dxh6w4pk2gb3ddmhpiaihqg-glibc-2.25/lib/libc.so.6 I suspect that the Compositor thread might have done a long-jump, which causes these awfully long stack space to be reported and limited to the lul::N_STACK_BYTES value.
This seems to be a rr issue and not a Gecko Profiler issue: Here 0x7f4abeffdeaf is the stackTop address (comment 1), and the following print command highlight that the syscall_hook does the equivalent of a long-jump. > #9 0x00007f4ae763e375 in pthread_cond_wait@@GLIBC_2.3.2 () from /nix/store/kjwbqnh13dxh6w4pk2gb3ddmhpiaihqg-glibc-2.25/lib/libpthread.so.0 > (rr) p 0x7f4abeffdeaf - (uintptr_t) $sp > $1081 = 1535 > (rr) f 8 > #8 0x00007f4ae784fc95 in _syscall_hook_trampoline_48_3d_00_f0_ff_ff () from /nix/store/rn9vv9qdw2l5vvy5d9fxbv04w25q930a-rr-4.5.0-master/bin/../lib/rr/librrpreload.so > (rr) p 0x7f4abeffdeaf - (uintptr_t) $sp > $1082 = 8388287 This seems to cause the problem I observed here, as the SigprofHandler is called with the stack of the syscall_hook instead of the stack of the of the original thread. I will check against newer versions of rr, and open a bug against rr if I can still reproduce the issue.
Group: javascript-core-security
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INVALID
For the record, you can work around this by running with rr record -n, which will disable the syscallbuf optimization and avoid being on the wrong stack when you're sampling, but at a fairly substantial cost in performance.

This bug is, in fact, valid, and is, in fact due to LUL. It could very well happen without rr, with a signal handler with an alternate stack, for example.

What happens is that because of the alternate stack that rr sets, the start and the end of the stack are not related. The relevant code is:

    uintptr_t nToCopy = 0;
    if (start < end) {
      nToCopy = end - start;
      if (nToCopy > lul::N_STACK_BYTES) nToCopy = lul::N_STACK_BYTES;
    }
    MOZ_ASSERT(nToCopy <= lul::N_STACK_BYTES);

(https://searchfox.org/mozilla-central/rev/79d73b4aff88dd4a0f06dd3789e1148c49b0de60/tools/profiler/core/platform.cpp#2358-2363)

When you're lucky, rr's stack has a bigger address, so the if is not taken, nToCopy stays 0, and no copy happens.
When you're not lucky, the if is taken, nToCopy is larger than N_STACK_BYTES, and capped to that value... which is larger than the stack size that rr sets up.
So the memcpy that follows goes past that stack space and leads to a crash because there's no page mapped there (or there is a guard page).

Status: RESOLVED → REOPENED
Resolution: INVALID → ---
Severity: normal → S3
Priority: -- → P2
You need to log in before you can comment on or make changes to this bug.