Closed
Bug 1340161
Opened 7 years ago
Closed 7 years ago
Crash in ThreadInfo::StreamJSON
Categories
(Core :: Gecko Profiler, defect)
Core
Gecko Profiler
Tracking
()
RESOLVED
FIXED
mozilla55
People
(Reporter: jseward, Assigned: n.nethercote)
References
(Blocks 1 open bug)
Details
Attachments
(2 files, 1 obsolete file)
5.11 KB,
text/plain
|
Details | |
8.31 KB,
patch
|
jseward
:
review+
|
Details | Diff | Splinter Review |
STR: x86_64-Linux, F25. --enable-debug build of m-c. Start Firefox, start the profiler. Click on the profiler button and then on "Capture Profile". If that doesn't crash it immediately, close the resulting tab, and repeat. It usually crashes after one or two rounds of this.
Reporter | ||
Comment 1•7 years ago
|
||
Reporter | ||
Comment 2•7 years ago
|
||
The immediate cause of the crash is that ThreadInfo::StreamJSON is called with mPseudoStack being null, but it is dereferenced without being checked. I don't know whether this is expected. In any case, putting a null check-and-if-so-exit at the start of the routine stops it crashing, and the profiler results still look, at least superficially, OK.
Reporter | ||
Comment 3•7 years ago
|
||
Adds missing null check.
Attachment #8838079 -
Flags: review?(mstange)
Comment 4•7 years ago
|
||
I've gone through the places where ThreadInfo is created, and all of them supply a non-null PseudoStack. So I think mPseudoStack being null means that ThreadInfo::SetPendingDelete has been called. However, the caller of ThreadInfo::StreamJSON has this to say: // Note that we intentionally include thread profiles which // have been marked for pending delete. MutexAutoLock lock(info->GetMutex()); info->StreamJSON(aWriter, aSinceTime); So the question is, why does ThreadInfo::StreamJSON need the pseudostack's JSContext, and what are we supposed to supply there if the thread has already gone away? I haven't done more research to answer those questions yet.
Assignee | ||
Comment 5•7 years ago
|
||
There are two places in platform.cpp where we skip a thread if (!info->hasProfile() || info->IsPendingDelete). But in StreamJSON() we only have the !info->hasProfile() test. I think Markus is right that this thread is pending delete, and we should check for that. Patch shortly.
Assignee | ||
Comment 6•7 years ago
|
||
Oh, I missed the bit about the comment that says we deliberately ignore the pending delete. Hmm.
Assignee | ||
Comment 7•7 years ago
|
||
Now I'm wondering why we null out mPseudoStack in ThreadInfo::SetPendingDelete(). That may be a hangover from when PseudoStack was refcounted. platform.cpp is now careful to not destroy any PseudoStacks until the corresponding ThreadInfos are destroyed -- see the two occurrences of |delete tlsPseudoStack.get();|. Julian, can you try removing that |mPseudoStack = nullptr;| line in ThreadInfo::SetPendingDelete()? I would do it but I've been doing basic smoke tests of the profiler for some time without hitting this particular crash.
Flags: needinfo?(jseward)
Assignee | ||
Comment 8•7 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #7) > Now I'm wondering why we null out mPseudoStack in > ThreadInfo::SetPendingDelete(). That may be a hangover from when PseudoStack > was refcounted. platform.cpp is now careful to not destroy any PseudoStacks > until the corresponding ThreadInfos are destroyed -- see the two occurrences > of |delete tlsPseudoStack.get();|. No, I got that wrong too. profiler_unregister_thread() deletes the PseudoStack, and the nulling of mPseudoStack in SetPendingDelete() is required for that to be safe. I wonder if I broke this in bug 1126576.
Flags: needinfo?(jseward)
Reporter | ||
Comment 9•7 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #7 and comment #8) > Julian, can you try removing that |mPseudoStack = nullptr;| line in > ThreadInfo::SetPendingDelete()? > [..] > No, I got that wrong too. profiler_unregister_thread() deletes the PseudoStack, > and the nulling of mPseudoStack in SetPendingDelete() is required for that to be safe. Doing that (and, obviously, removing my "fix") causes it not to crash whilst running (#include <disclaimer.h> re testing), but it does fail at exit, in a way that's consistent with your intuition. See below. > I would do it but I've been doing basic smoke tests of the profiler for some > time without hitting this particular crash. I am a bit surprised to hear that, given that I can reliably crash it within 30 seconds of startup. You are running an assertions-enabled build, yes? Also I am doing this with e10s disabled so as to reduce the number of processes around -- that just causes confusion with GDB. ----------- For some reason GDB failed to catch the SEGV, so no good stack traces :-( Assertion failure: !info->Stack(), at /home/sewardj/MOZ/MC-TALL/tools/profiler/core/platform.cpp:2184 #01: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4c10399] #02: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4c109ea] #03: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4f964c9] #04: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4fa4bf3] #05: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x4837] #06: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x4879] #07: __libc_start_main[/lib64/libc.so.6 +0x20401] #08: _start[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x472a] #09: ??? (???:???) Program /home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container (pid = 10666) received signal 11. Stack: #01: ???[/lib64/libc.so.6 +0x35990] #02: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4c10399] #03: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4c109ea] #04: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4f964c9] #05: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4fa4bf3] #06: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x4837] #07: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x4879] #08: __libc_start_main[/lib64/libc.so.6 +0x20401] #09: _start[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x472a] #10: ??? (???:???) Sleeping for 300 seconds. Type 'gdb /home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container 10666' to attach your debugger to this thread. [10190] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /home/sewardj/MOZ/MC-TALL/xpcom/threads/nsThread.cpp, line 1009 [10190] WARNING: 'NS_FAILED(RemovePermissionChangeObserver())', file /home/sewardj/MOZ/MC-TALL/dom/notification/Notification.cpp, line 669
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → n.nethercote
Assignee | ||
Comment 10•7 years ago
|
||
Comment on attachment 8838079 [details] [diff] [review] bug1340161-1.cset Review of attachment 8838079 [details] [diff] [review]: ----------------------------------------------------------------- This avoids the crash but isn't the right fix -- we lose profiles for recently killed threads, which we want to keep. I'll work on a proper fix on Monday.
Attachment #8838079 -
Flags: review?(mstange) → review-
Assignee | ||
Comment 11•7 years ago
|
||
I still can't reproduce this crash locally. Julian, can you please test with this patch, and then review it if it fixes the crash?
Attachment #8839028 -
Flags: review?(jseward)
Reporter | ||
Comment 12•7 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #11) > Created attachment 8839028 [details] [diff] [review] > Fix PseudoStack handling when profiler threads are marked with > SetPendingDelete() I looked at this but it is now crashing, both with and without the patch, as described in bug 1340193, which is really confusing. I'll continue investigating.
Assignee | ||
Updated•7 years ago
|
Attachment #8838079 -
Attachment is obsolete: true
Assignee | ||
Comment 13•7 years ago
|
||
Comment on attachment 8839028 [details] [diff] [review] Fix PseudoStack handling when profiler threads are marked with SetPendingDelete() I worked out steps to reproduce this crash reliably: - Profile all threads (by putting "," in the Threads box) - Browse a complex site for a bit, e.g. nytimes.com or arstechnica.com. - Capture a profile. I have confirmed that the attached patch fixes the problem. jseward can you please review, or if you're not comfortable doing so, forward the request onto mstange? Thank you.
Flags: needinfo?(jseward)
Reporter | ||
Comment 14•7 years ago
|
||
Comment on attachment 8839028 [details] [diff] [review] Fix PseudoStack handling when profiler threads are marked with SetPendingDelete() Review of attachment 8839028 [details] [diff] [review]: ----------------------------------------------------------------- Looks plausible, overall. ::: tools/profiler/core/ThreadInfo.cpp @@ +47,4 @@ > MOZ_COUNT_DTOR(ThreadInfo); > + > + if (mPendingDelete) { > + delete mPseudoStack; If mPseudoStack is never null, can you wrap it in NotNull? ::: tools/profiler/core/platform.cpp @@ +1720,5 @@ > } > #endif > > // We just destroyed all the ThreadInfos in gRegisteredThreads, so it is safe > + // the delete the PseudoStack. tlsPseudoStack is certain to still be the Comment doesn't make sense. Should it be "safe to delete the PseudoStack" ?
Attachment #8839028 -
Flags: review?(jseward) → review+
Assignee | ||
Comment 15•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/d87a24359ef54e09b0f5c8365de4edd56f9eb9a8 Bug 1340161 - Fix PseudoStack handling when profiler threads are marked with SetPendingDelete(). r=jseward.
Comment 17•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/d87a24359ef5
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(jseward)
Updated•7 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•