Closed Bug 1403968 Opened 8 years ago Closed 7 years ago

Perf Profiling slack with debugger causes crash

Categories

(Core :: Gecko Profiler, defect, P1)

57 Branch
Unspecified
macOS
defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: yulia, Assigned: n.nethercote)

References

Details

Crash Data

Attachments

(3 files)

Attached file perf-html crash
steps to reproduce: 1) Open slack.com and join a channel, and start the perf-html addon 2) open the debugger panel 3) select a file from the debugger panel 4) reload slack using cmd-r (this will take some time) 5) once slack has reloaded, try to open the profiler using ctrl-shift-2 sometimes it will crash. not always. crash looked like the attachment
Summary: Perf Profiling slack with debugger causes segfault → Perf Profiling slack with debugger causes crash
https://crash-stats.mozilla.com/report/index/be53341b-5ca2-44fb-a492-bcc870170928#tab-details We're crashing at this assert: > } else if (e.Get().IsJitReturnAddr()) { > numFrames++; > > // A JIT frame may expand to multiple frames due to inlining. > void* pc = e.Get().u.mPtr; > unsigned depth = aUniqueStacks.LookupJITFrameDepth(pc); > if (depth == 0) { > StreamJSFramesOp framesOp(pc, stack); > --> MOZ_RELEASE_ASSERT(aContext); > JS::ForEachProfiledFrame(aContext, pc, framesOp); > aUniqueStacks.AddJITFrameDepth(pc, framesOp.depth());
Well, it looks from this that the assert is just wrong. The |aContext| here is passed in directly from the thread's |mContext| higher up in the call stack. There are no mutations done, so if |aContext| is null, it's because the thread's |mContext| field was null from the start. What thread is this happening on?
Crash Signature: [@ ProfileBuffer::StreamSamplesToJSON const ]
I added this assertion a while back. There are two ways to call StreamSamplesToJSON(). (1) The one from ProfilerBacktrace::StreamJSON() (which goes via StreamSamplesAndMarkers()) explicitly passes in a null aContext. This is ok because that's a "synchronous" sample obtained with profiler_get_backtrace(), which never contains JitReturnAddr entries. (2) The other ones (ThreadInfo::StreamJSON() via StreamSamplesAndMarkers(), and ThreadInfo::FlushSamplesAndMarkers()) pass in an mContext from a ThreadInfo. The latter of these two asserts that mContext is non-null, but it's just a vanilla assert, not a release-assert. So, the failing assertion was meant to capture the distinction between (1) and (2). It's non-obvious because I never managed to restructure this part of the code to capture the distinction more nicely. My assumption here was that the mContext from the ThreadInfo is never null, but it sounds like that's just wrong? How can a ThreadInfo (a) have a null mContext, and (b) also have JIT return addresses in its profile?
Flags: needinfo?(kvijayan)
I wasn't able to reproduce the crash, but if I've understood it correctly this should fix the problem.
Attachment #8914149 - Flags: review?(kvijayan)
Assignee: nobody → n.nethercote
Status: NEW → ASSIGNED
> I wasn't able to reproduce the crash, but if I've understood it correctly > this should fix the problem. Wait, no, it won't. The Maybe<> makes things slightly clearer, but won't fix the crash, because the MOZ_RELEASE_ASSERT is immediately followed by a call to JS::ForEachProfiledFrame(), which repeatedly derefs `aContext`. So the assertion is not incorrect, and if it wasn't there we would have crashed immediately afterwards in JS::ForEachProfiledFrame(). So the question is: how do we end up with a thread that has JIT return addresses in its samples, but has a null mContext? And is that a legitimate case or not?
Comment on attachment 8914149 [details] [diff] [review] Wrap ProfileBuffer::StreamSamplesToJSON()'s aContext argument in a Maybe<> Review of attachment 8914149 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/core/ProfileBufferEntry.cpp @@ +855,5 @@ > unsigned depth = aUniqueStacks.LookupJITFrameDepth(pc); > if (depth == 0) { > StreamJSFramesOp framesOp(pc, stack); > + MOZ_RELEASE_ASSERT(aContext.isSome()); > + JS::ForEachProfiledFrame(*aContext, pc, framesOp); You probably want to check for |if (aContext)| around this.
(In reply to Nicholas Nethercote [:njn] from comment #5) > So the question is: how do we end up with a thread that has JIT return > addresses in its samples, but has a null mContext? And is that a legitimate > case or not? This is the question I had as well. Is this potentially happening after the thread has been de-initialized and the mContext cleared? The STR includes a reload before profiling, so it's a slight hint in that direction.
Flags: needinfo?(kvijayan)
Check this out: http://searchfox.org/mozilla-central/source/tools/profiler/core/platform.cpp#1652 This is where the call to StreamJSON is happening in the call stack. It's in a loop where we iterate over dead threads and stream their samples. So this is definitely a "dead" thread and somewhere something is cleaning up the mContext when it shouldn't be, or we're assuming an mContext when we shouldn't be (dead threads might be specified to have empty mContexts). A simple immediate patch for this would be to simply skip streaming for dead threads. However, this might mean that you lose profiling information that you captured before a reload. Is that something people care about?
Flags: needinfo?(n.nethercote)
Flags: needinfo?(mstange)
ThreadInfo::FlushSamplesAndMarkers is supposed to take care of this problem: It causes us to serialize the profile of a thread to JSON when the thread loses its JSContext. We then hold on to that JSON and insert it into the complete profile the next time a profile is requested.
Flags: needinfo?(mstange)
Thanks for that, Markus. You're exactly right, FlushSamplesAndMarkers _does_ serialize the info before mContext is destroyed, and the serialized info is used here: http://searchfox.org/mozilla-central/source/tools/profiler/core/ThreadInfo.cpp#198 Not coincidentally, the call to |streamSamplesToJSON| in our call stack is the one that occurs exactly AFTER the usage of |aSavedStreamSamples| in the caller |StreamSamplesAndMarkers|. So a NULL check around aContext here: http://searchfox.org/mozilla-central/source/tools/profiler/core/ThreadInfo.cpp#205 Should do the trick...?
Flags: needinfo?(mstange)
Attachment #8914858 - Flags: review?(n.nethercote)
Hmm, no, I don't think this is right. Let's say we have a large enough buffer. We start the profiler at time A, clear the JSContext at time B, and capture the profile at time C. Your patch would mean that the resulting profile has no samples for the time between B and C. The way this is supposed to work is: At B, we stream the samples from between A and B to mSavedStreamedSamples, and reset the buffer [1]. At C, we first insert mSavedStreamedSamples, and then add the stream out the samples that are in the buffer now, which will only be samples that have been captured between B and C. So I think what we need to find out is why the streaming operation at C finds samples in the buffer that need the JS context, even though the buffer is only supposed to contain samples that have been captured after B. [1] http://searchfox.org/mozilla-central/source/tools/profiler/core/ThreadInfo.cpp#279-281 : > // Reset the buffer. Attempting to symbolicate JS samples after mContext has > // gone away will crash. > aBuffer.Reset();
Flags: needinfo?(mstange)
Comment on attachment 8914858 [details] [diff] [review] fix-bug-1403968.patch Review of attachment 8914858 [details] [diff] [review]: ----------------------------------------------------------------- I agree with Markus's analysis.
Attachment #8914858 - Flags: review?(n.nethercote) → review-
One thing to note: this assertion has been present for months now, so this must be an unusual circumstance.
Flags: needinfo?(n.nethercote)
Comment on attachment 8914149 [details] [diff] [review] Wrap ProfileBuffer::StreamSamplesToJSON()'s aContext argument in a Maybe<> Cancelling stale review request.
Attachment #8914149 - Flags: review?(kvijayan)
Priority: -- → P1
Yulia, can you still reproduce this? I changed this code a lot in bug 1429904 and those changes might have fixed this bug.
Flags: needinfo?(ystartsev)
No longer crashes for me :) nice work!
Flags: needinfo?(ystartsev)
Great! Thanks for testing.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: