Crash under memcpy | DoLULBacktrace

RESOLVED INVALID

Status

()

Core
Gecko Profiler
RESOLVED INVALID
10 months ago
10 months ago

People

(Reporter: nbp, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Under a self-built nightly based on a tree from 8 days ago.
Turning on the Gecko profiler immediately caused the following backtrace:

(rr) bt
#0  0x00007f4ae691546f in __memmove_avx_unaligned_erms () at /nix/store/kjwbqnh13dxh6w4pk2gb3ddmhpiaihqg-glibc-2.25/lib/libc.so.6
#1  0x00007f4ad3d6a010 in DoLULBacktrace(PSLockRef, ThreadInfo const&, Registers const&, NativeStack&) (aLock=..., aThreadInfo=..., aRegs=..., aNativeStack=...) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:1212
#2  0x00007f4ad3d6a18f in DoNativeBacktrace(PSLockRef, ThreadInfo const&, Registers const&, NativeStack&) (aLock=..., aThreadInfo=..., aRegs=..., aNativeStack=...) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:1245
#3  0x00007f4ad3d6a2e8 in DoSharedSample(PSLockRef, bool, ThreadInfo&, mozilla::TimeStamp const&, Registers const&, ProfileBuffer::LastSample*, ProfileBuffer&) (aLock=..., aIsSynchronous=false, aThreadInfo=..., aNow=..., aRegs=..., aLS=0x7f4abfd3aec4, aBuffer=...) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:1283
#4  0x00007f4ad3d6a49c in DoPeriodicSample(PSLockRef, ThreadInfo&, mozilla::TimeStamp const&, Registers const&, int64_t, int64_t) (aLock=..., aThreadInfo=..., aNow=..., aRegs=..., aRSSMemory=0, aUSSMemory=0) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:1322
#5  0x00007f4ad3d6b816 in SamplerThread::<lambda(const Registers&)>::operator()(const Registers &) const (__closure=0x7f4a9f8fee40, aRegs=...) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:1980
#6  0x00007f4ad3d75ff8 in Sampler::SuspendAndSampleAndResumeThread<SamplerThread::Run()::<lambda(const Registers&)> >(PSLockRef, const ThreadInfo &, const SamplerThread::<lambda(const Registers&)> &) (this=0x7f4a9c426b30, aLock=..., aThreadInfo=..., aProcessRegs=...) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform-linux-android.cpp:345
#7  0x00007f4ad3d6bb0f in SamplerThread::Run() (this=0x7f4a9c426b30) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:1981
#8  0x00007f4ad3d6bffd in ThreadEntry(void*) (aArg=0x7f4a9c426b30) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform-linux-android.cpp:385
#9  0x00007f4ae7638234 in start_thread () at /nix/store/kjwbqnh13dxh6w4pk2gb3ddmhpiaihqg-glibc-2.25/lib/libpthread.so.0
#10 0x00007f4ae68d575f in clone () at /nix/store/kjwbqnh13dxh6w4pk2gb3ddmhpiaihqg-glibc-2.25/lib/libc.so.6

I captured this profile under rr, the problem seems to be related to the estimation of the stack size (nToCopy == 163840), which includes way too much content.

> 1176        uintptr_t start = startRegs.xsp.Value() - rEDZONE_SIZE; // start = 0x7f4abe7fdd60
>
> Set by
>   * 7    Thread 28550.28573 (Compositor) 0x00007f4ad3d6bdf2
>            in SigprofHandler (aSignal=27, aInfo=0x7f4abe7fd970, aContext=0x7f4abe7fd840)
>            at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform-linux-android.cpp:219

> 1186        uintptr_t end = reinterpret_cast<uintptr_t>(aThreadInfo.StackTop()); // end = 0x7f4abeffdeaf

> 1192          nToCopy = end - start;             // nToCopy = 8388936
> 1193          if (nToCopy > lul::N_STACK_BYTES)
> 1194            nToCopy = lul::N_STACK_BYTES;    // nToCopy = 163840
(Reporter)

Comment 1

10 months ago
Here is the stack from which the stack end was taken from:

> Thread 7 hit Hardware watchpoint 5: -location aThreadInfo.mStackTop
> 
> Old value = (void *) 0x7f4abeffdeaf
> New value = (void *) 0x0
> 0x00007f4ad3d68471 in ThreadInfo::ThreadInfo (this=0x7f4abfd3ac00, aName=0x7f4abfd9ed68 "Compositor", aThreadId=28573, aIsMainThread=false, aStackTop=0x7f4abeffdeaf) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/ThreadInfo.cpp:34
34        , mLastSample()
> (rr) bt
> #0  0x00007f4ad3d68471 in ThreadInfo::ThreadInfo(char const*, int, bool, void*) (this=0x7f4abfd3ac00, aName=0x7f4abfd9ed68 "Compositor", aThreadId=28573, aIsMainThread=false, aStackTop=0x7f4abeffdeaf) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/ThreadInfo.cpp:34
> #1  0x00007f4ad3d6cd98 in locked_register_thread(PSLockRef, char const*, void*) (aLock=..., aName=0x7f4abfd9ed68 "Compositor", aStackTop=0x7f4abeffdeaf) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:2146
> #2  0x00007f4ad3d6fcc5 in profiler_register_thread(char const*, void*) (aName=0x7f4abfd9ed68 "Compositor", aGuessStackTop=0x7f4abeffdeaf) at /home/nicolas/mozilla/contrib-push/tools/profiler/core/platform.cpp:2987
> #3  0x00007f4acdea11d2 in mozilla::AutoProfilerRegisterThread::AutoProfilerRegisterThread(char const*, mozilla::detail::GuardObjectNotifier&&) (this=0x7f4abeffdeaf, aName=0x7f4abfd9ed68 "Compositor", _notifier=<unknown type in /home/nicolas/mozilla/_build/firefox/bugzil.la/js-startup-cache/final/x64/gcc/dbg/dist/bin/libxul.so, CU 0x6715cc, DIE 0x6f1fae>) at /home/nicolas/mozilla/_build/firefox/bugzil.la/js-startup-cache/final/x64/gcc/dbg/dist/include/GeckoProfiler.h:491
> #4  0x00007f4ace64be55 in base::Thread::ThreadMain() (this=0x7f4abfd9ed30) at /home/nicolas/mozilla/contrib-push/ipc/chromium/src/base/thread.cc:157
> #5  0x00007f4ace6344f3 in ThreadFunc(void*) (closure=0x7f4abfd9ed30) at /home/nicolas/mozilla/contrib-push/ipc/chromium/src/base/platform_thread_posix.cc:38
> #6  0x00007f4ae7638234 in start_thread () at /nix/store/kjwbqnh13dxh6w4pk2gb3ddmhpiaihqg-glibc-2.25/lib/libpthread.so.0
> #7  0x00007f4ae68d575f in clone () at /nix/store/kjwbqnh13dxh6w4pk2gb3ddmhpiaihqg-glibc-2.25/lib/libc.so.6

I suspect that the Compositor thread might have done a long-jump, which causes these awfully long stack space to be reported and limited to the lul::N_STACK_BYTES value.
(Reporter)

Comment 2

10 months ago
This seems to be a rr issue and not a Gecko Profiler issue:

Here 0x7f4abeffdeaf is the stackTop address (comment 1), and the following print command highlight that the syscall_hook does the equivalent of a long-jump.

> #9  0x00007f4ae763e375 in pthread_cond_wait@@GLIBC_2.3.2 () from /nix/store/kjwbqnh13dxh6w4pk2gb3ddmhpiaihqg-glibc-2.25/lib/libpthread.so.0
> (rr) p 0x7f4abeffdeaf - (uintptr_t) $sp
> $1081 = 1535
> (rr) f 8
> #8  0x00007f4ae784fc95 in _syscall_hook_trampoline_48_3d_00_f0_ff_ff () from /nix/store/rn9vv9qdw2l5vvy5d9fxbv04w25q930a-rr-4.5.0-master/bin/../lib/rr/librrpreload.so
> (rr) p 0x7f4abeffdeaf - (uintptr_t) $sp
> $1082 = 8388287

This seems to cause the problem I observed here, as the SigprofHandler is called with the stack of the syscall_hook instead of the stack of the of the original thread.

I will check against newer versions of rr, and open a bug against rr if I can still reproduce the issue.
Group: javascript-core-security
Status: NEW → RESOLVED
Last Resolved: 10 months ago
Resolution: --- → INVALID
For the record, you can work around this by running with rr record -n, which will disable the syscallbuf optimization and avoid being on the wrong stack when you're sampling, but at a fairly substantial cost in performance.
You need to log in before you can comment on or make changes to this bug.