Closed Bug 1340161 Opened 7 years ago Closed 7 years ago

Crash in ThreadInfo::StreamJSON

Categories

(Core :: Gecko Profiler, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox54 --- wontfix
firefox55 --- fixed

People

(Reporter: jseward, Assigned: n.nethercote)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

STR: x86_64-Linux, F25.  --enable-debug build of m-c.  

Start Firefox, start the profiler.  Click on the profiler
button and then on "Capture Profile".  If that doesn't 
crash it immediately, close the resulting tab, and repeat.
It usually crashes after one or two rounds of this.
Attached file Crash stack
The immediate cause of the crash is that ThreadInfo::StreamJSON
is called with mPseudoStack being null, but it is dereferenced
without being checked.  I don't know whether this is expected.
In any case, putting a null check-and-if-so-exit at the start of
the routine stops it crashing, and the profiler results still look,
at least superficially, OK.
Blocks: 1329181
Attached patch bug1340161-1.cset (obsolete) — Splinter Review
Adds missing null check.
Attachment #8838079 - Flags: review?(mstange)
I've gone through the places where ThreadInfo is created, and all of them supply a non-null PseudoStack. So I think mPseudoStack being null means that ThreadInfo::SetPendingDelete has been called.

However, the caller of ThreadInfo::StreamJSON has this to say:

          // Note that we intentionally include thread profiles which
          // have been marked for pending delete.

          MutexAutoLock lock(info->GetMutex());

          info->StreamJSON(aWriter, aSinceTime);

So the question is, why does ThreadInfo::StreamJSON need the pseudostack's JSContext, and what are we supposed to supply there if the thread has already gone away?
I haven't done more research to answer those questions yet.
There are two places in platform.cpp where we skip a thread if (!info->hasProfile() || info->IsPendingDelete). But in StreamJSON() we only have the !info->hasProfile() test. I think Markus is right that this thread is pending delete, and we should check for that. Patch shortly.
Oh, I missed the bit about the comment that says we deliberately ignore the pending delete. Hmm.
Now I'm wondering why we null out mPseudoStack in ThreadInfo::SetPendingDelete(). That may be a hangover from when PseudoStack was refcounted. platform.cpp is now careful to not destroy any PseudoStacks until the corresponding ThreadInfos are destroyed -- see the two occurrences of |delete tlsPseudoStack.get();|.

Julian, can you try removing that |mPseudoStack = nullptr;| line in ThreadInfo::SetPendingDelete()? I would do it but I've been doing basic smoke tests of the profiler for some time without hitting this particular crash.
Flags: needinfo?(jseward)
(In reply to Nicholas Nethercote [:njn] from comment #7)
> Now I'm wondering why we null out mPseudoStack in
> ThreadInfo::SetPendingDelete(). That may be a hangover from when PseudoStack
> was refcounted. platform.cpp is now careful to not destroy any PseudoStacks
> until the corresponding ThreadInfos are destroyed -- see the two occurrences
> of |delete tlsPseudoStack.get();|.

No, I got that wrong too. profiler_unregister_thread() deletes the PseudoStack, and the nulling of mPseudoStack in SetPendingDelete() is required for that to be safe.

I wonder if I broke this in bug 1126576.
Flags: needinfo?(jseward)
(In reply to Nicholas Nethercote [:njn] from comment #7 and comment #8)
> Julian, can you try removing that |mPseudoStack = nullptr;| line in
> ThreadInfo::SetPendingDelete()?
> [..]
> No, I got that wrong too. profiler_unregister_thread() deletes the PseudoStack,
> and the nulling of mPseudoStack in SetPendingDelete() is required for that to be safe.

Doing that (and, obviously, removing my "fix") causes it not to crash whilst
running (#include <disclaimer.h> re testing), but it does fail at exit, in a
way that's consistent with your intuition.  See below.

> I would do it but I've been doing basic smoke tests of the profiler for some
> time without hitting this particular crash.

I am a bit surprised to hear that, given that I can reliably crash it within
30 seconds of startup.  You are running an assertions-enabled build, yes?
Also I am doing this with e10s disabled so as to reduce the number of processes
around -- that just causes confusion with GDB.

-----------

For some reason GDB failed to catch the SEGV, so no good stack traces :-(

Assertion failure: !info->Stack(), at /home/sewardj/MOZ/MC-TALL/tools/profiler/core/platform.cpp:2184
#01: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4c10399]
#02: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4c109ea]
#03: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4f964c9]
#04: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4fa4bf3]
#05: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x4837]
#06: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x4879]
#07: __libc_start_main[/lib64/libc.so.6 +0x20401]
#08: _start[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x472a]
#09: ??? (???:???)

Program /home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container (pid = 10666) received signal 11.
Stack:
#01: ???[/lib64/libc.so.6 +0x35990]
#02: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4c10399]
#03: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4c109ea]
#04: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4f964c9]
#05: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4fa4bf3]
#06: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x4837]
#07: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x4879]
#08: __libc_start_main[/lib64/libc.so.6 +0x20401]
#09: _start[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x472a]
#10: ??? (???:???)
Sleeping for 300 seconds.
Type 'gdb /home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container 10666' to attach your debugger to this thread.
[10190] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /home/sewardj/MOZ/MC-TALL/xpcom/threads/nsThread.cpp, line 1009
[10190] WARNING: 'NS_FAILED(RemovePermissionChangeObserver())', file /home/sewardj/MOZ/MC-TALL/dom/notification/Notification.cpp, line 669
Assignee: nobody → n.nethercote
Comment on attachment 8838079 [details] [diff] [review]
bug1340161-1.cset

Review of attachment 8838079 [details] [diff] [review]:
-----------------------------------------------------------------

This avoids the crash but isn't the right fix -- we lose profiles for recently killed threads, which we want to keep.

I'll work on a proper fix on Monday.
Attachment #8838079 - Flags: review?(mstange) → review-
I still can't reproduce this crash locally.

Julian, can you please test with this patch, and then review it if it fixes the
crash?
Attachment #8839028 - Flags: review?(jseward)
(In reply to Nicholas Nethercote [:njn] from comment #11)
> Created attachment 8839028 [details] [diff] [review]
> Fix PseudoStack handling when profiler threads are marked with
> SetPendingDelete()

I looked at this but it is now crashing, both with and without
the patch, as described in bug 1340193, which is really confusing.
I'll continue investigating.
Attachment #8838079 - Attachment is obsolete: true
Comment on attachment 8839028 [details] [diff] [review]
Fix PseudoStack handling when profiler threads are marked with SetPendingDelete()

I worked out steps to reproduce this crash reliably:

- Profile all threads (by putting "," in the Threads box)

- Browse a complex site for a bit, e.g. nytimes.com or arstechnica.com.

- Capture a profile.

I have confirmed that the attached patch fixes the problem. jseward can you please review, or if you're not comfortable doing so, forward the request onto mstange? Thank you.
Flags: needinfo?(jseward)
Comment on attachment 8839028 [details] [diff] [review]
Fix PseudoStack handling when profiler threads are marked with SetPendingDelete()

Review of attachment 8839028 [details] [diff] [review]:
-----------------------------------------------------------------

Looks plausible, overall.

::: tools/profiler/core/ThreadInfo.cpp
@@ +47,4 @@
>    MOZ_COUNT_DTOR(ThreadInfo);
> +
> +  if (mPendingDelete) {
> +    delete mPseudoStack;

If mPseudoStack is never null, can you wrap it in NotNull?

::: tools/profiler/core/platform.cpp
@@ +1720,5 @@
>    }
>  #endif
>  
>    // We just destroyed all the ThreadInfos in gRegisteredThreads, so it is safe
> +  // the delete the PseudoStack. tlsPseudoStack is certain to still be the

Comment doesn't make sense.  Should it be "safe to delete
the PseudoStack" ?
Attachment #8839028 - Flags: review?(jseward) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/d87a24359ef54e09b0f5c8365de4edd56f9eb9a8
Bug 1340161 - Fix PseudoStack handling when profiler threads are marked with SetPendingDelete(). r=jseward.
https://hg.mozilla.org/mozilla-central/rev/d87a24359ef5
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Flags: needinfo?(jseward)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: