Crash in ThreadInfo::StreamJSON

RESOLVED FIXED in Firefox 55

Status

()

Core
Gecko Profiler
RESOLVED FIXED
2 months ago
a month ago

People

(Reporter: jseward, Assigned: njn)

Tracking

(Blocks: 1 bug)

Trunk
mozilla55
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox54 affected, firefox55 fixed)

Details

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Description

2 months ago
STR: x86_64-Linux, F25.  --enable-debug build of m-c.  

Start Firefox, start the profiler.  Click on the profiler
button and then on "Capture Profile".  If that doesn't 
crash it immediately, close the resulting tab, and repeat.
It usually crashes after one or two rounds of this.
(Reporter)

Comment 1

2 months ago
Created attachment 8838068 [details]
Crash stack
(Reporter)

Comment 2

2 months ago
The immediate cause of the crash is that ThreadInfo::StreamJSON
is called with mPseudoStack being null, but it is dereferenced
without being checked.  I don't know whether this is expected.
In any case, putting a null check-and-if-so-exit at the start of
the routine stops it crashing, and the profiler results still look,
at least superficially, OK.
(Reporter)

Updated

2 months ago
Blocks: 1329181
(Reporter)

Comment 3

2 months ago
Created attachment 8838079 [details] [diff] [review]
bug1340161-1.cset

Adds missing null check.
Attachment #8838079 - Flags: review?(mstange)
I've gone through the places where ThreadInfo is created, and all of them supply a non-null PseudoStack. So I think mPseudoStack being null means that ThreadInfo::SetPendingDelete has been called.

However, the caller of ThreadInfo::StreamJSON has this to say:

          // Note that we intentionally include thread profiles which
          // have been marked for pending delete.

          MutexAutoLock lock(info->GetMutex());

          info->StreamJSON(aWriter, aSinceTime);

So the question is, why does ThreadInfo::StreamJSON need the pseudostack's JSContext, and what are we supposed to supply there if the thread has already gone away?
I haven't done more research to answer those questions yet.
(Assignee)

Comment 5

2 months ago
There are two places in platform.cpp where we skip a thread if (!info->hasProfile() || info->IsPendingDelete). But in StreamJSON() we only have the !info->hasProfile() test. I think Markus is right that this thread is pending delete, and we should check for that. Patch shortly.
(Assignee)

Comment 6

2 months ago
Oh, I missed the bit about the comment that says we deliberately ignore the pending delete. Hmm.
(Assignee)

Comment 7

2 months ago
Now I'm wondering why we null out mPseudoStack in ThreadInfo::SetPendingDelete(). That may be a hangover from when PseudoStack was refcounted. platform.cpp is now careful to not destroy any PseudoStacks until the corresponding ThreadInfos are destroyed -- see the two occurrences of |delete tlsPseudoStack.get();|.

Julian, can you try removing that |mPseudoStack = nullptr;| line in ThreadInfo::SetPendingDelete()? I would do it but I've been doing basic smoke tests of the profiler for some time without hitting this particular crash.
Flags: needinfo?(jseward)
(Assignee)

Comment 8

2 months ago
(In reply to Nicholas Nethercote [:njn] from comment #7)
> Now I'm wondering why we null out mPseudoStack in
> ThreadInfo::SetPendingDelete(). That may be a hangover from when PseudoStack
> was refcounted. platform.cpp is now careful to not destroy any PseudoStacks
> until the corresponding ThreadInfos are destroyed -- see the two occurrences
> of |delete tlsPseudoStack.get();|.

No, I got that wrong too. profiler_unregister_thread() deletes the PseudoStack, and the nulling of mPseudoStack in SetPendingDelete() is required for that to be safe.

I wonder if I broke this in bug 1126576.
Flags: needinfo?(jseward)
(Reporter)

Comment 9

2 months ago
(In reply to Nicholas Nethercote [:njn] from comment #7 and comment #8)
> Julian, can you try removing that |mPseudoStack = nullptr;| line in
> ThreadInfo::SetPendingDelete()?
> [..]
> No, I got that wrong too. profiler_unregister_thread() deletes the PseudoStack,
> and the nulling of mPseudoStack in SetPendingDelete() is required for that to be safe.

Doing that (and, obviously, removing my "fix") causes it not to crash whilst
running (#include <disclaimer.h> re testing), but it does fail at exit, in a
way that's consistent with your intuition.  See below.

> I would do it but I've been doing basic smoke tests of the profiler for some
> time without hitting this particular crash.

I am a bit surprised to hear that, given that I can reliably crash it within
30 seconds of startup.  You are running an assertions-enabled build, yes?
Also I am doing this with e10s disabled so as to reduce the number of processes
around -- that just causes confusion with GDB.

-----------

For some reason GDB failed to catch the SEGV, so no good stack traces :-(

Assertion failure: !info->Stack(), at /home/sewardj/MOZ/MC-TALL/tools/profiler/core/platform.cpp:2184
#01: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4c10399]
#02: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4c109ea]
#03: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4f964c9]
#04: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4fa4bf3]
#05: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x4837]
#06: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x4879]
#07: __libc_start_main[/lib64/libc.so.6 +0x20401]
#08: _start[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x472a]
#09: ??? (???:???)

Program /home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container (pid = 10666) received signal 11.
Stack:
#01: ???[/lib64/libc.so.6 +0x35990]
#02: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4c10399]
#03: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4c109ea]
#04: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4f964c9]
#05: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/libxul.so +0x4fa4bf3]
#06: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x4837]
#07: ???[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x4879]
#08: __libc_start_main[/lib64/libc.so.6 +0x20401]
#09: _start[/home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container +0x472a]
#10: ??? (???:???)
Sleeping for 300 seconds.
Type 'gdb /home/sewardj/MOZ/MC-TALL/ff-Og-linux64-dbg/dist/bin/plugin-container 10666' to attach your debugger to this thread.
[10190] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /home/sewardj/MOZ/MC-TALL/xpcom/threads/nsThread.cpp, line 1009
[10190] WARNING: 'NS_FAILED(RemovePermissionChangeObserver())', file /home/sewardj/MOZ/MC-TALL/dom/notification/Notification.cpp, line 669
(Assignee)

Updated

2 months ago
Assignee: nobody → n.nethercote
(Assignee)

Comment 10

2 months ago
Comment on attachment 8838079 [details] [diff] [review]
bug1340161-1.cset

Review of attachment 8838079 [details] [diff] [review]:
-----------------------------------------------------------------

This avoids the crash but isn't the right fix -- we lose profiles for recently killed threads, which we want to keep.

I'll work on a proper fix on Monday.
Attachment #8838079 - Flags: review?(mstange) → review-
(Assignee)

Comment 11

2 months ago
Created attachment 8839028 [details] [diff] [review]
Fix PseudoStack handling when profiler threads are marked with SetPendingDelete()

I still can't reproduce this crash locally.

Julian, can you please test with this patch, and then review it if it fixes the
crash?
Attachment #8839028 - Flags: review?(jseward)
(Reporter)

Comment 12

2 months ago
(In reply to Nicholas Nethercote [:njn] from comment #11)
> Created attachment 8839028 [details] [diff] [review]
> Fix PseudoStack handling when profiler threads are marked with
> SetPendingDelete()

I looked at this but it is now crashing, both with and without
the patch, as described in bug 1340193, which is really confusing.
I'll continue investigating.
(Assignee)

Updated

2 months ago
Attachment #8838079 - Attachment is obsolete: true
(Assignee)

Comment 13

2 months ago
Comment on attachment 8839028 [details] [diff] [review]
Fix PseudoStack handling when profiler threads are marked with SetPendingDelete()

I worked out steps to reproduce this crash reliably:

- Profile all threads (by putting "," in the Threads box)

- Browse a complex site for a bit, e.g. nytimes.com or arstechnica.com.

- Capture a profile.

I have confirmed that the attached patch fixes the problem. jseward can you please review, or if you're not comfortable doing so, forward the request onto mstange? Thank you.
Flags: needinfo?(jseward)
(Reporter)

Comment 14

2 months ago
Comment on attachment 8839028 [details] [diff] [review]
Fix PseudoStack handling when profiler threads are marked with SetPendingDelete()

Review of attachment 8839028 [details] [diff] [review]:
-----------------------------------------------------------------

Looks plausible, overall.

::: tools/profiler/core/ThreadInfo.cpp
@@ +47,4 @@
>    MOZ_COUNT_DTOR(ThreadInfo);
> +
> +  if (mPendingDelete) {
> +    delete mPseudoStack;

If mPseudoStack is never null, can you wrap it in NotNull?

::: tools/profiler/core/platform.cpp
@@ +1720,5 @@
>    }
>  #endif
>  
>    // We just destroyed all the ThreadInfos in gRegisteredThreads, so it is safe
> +  // the delete the PseudoStack. tlsPseudoStack is certain to still be the

Comment doesn't make sense.  Should it be "safe to delete
the PseudoStack" ?
Attachment #8839028 - Flags: review?(jseward) → review+
(Assignee)

Comment 15

2 months ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/d87a24359ef54e09b0f5c8365de4edd56f9eb9a8
Bug 1340161 - Fix PseudoStack handling when profiler threads are marked with SetPendingDelete(). r=jseward.
(Assignee)

Updated

2 months ago
Duplicate of this bug: 1322786

Comment 17

2 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/d87a24359ef5
Status: NEW → RESOLVED
Last Resolved: 2 months ago
status-firefox55: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
(Assignee)

Updated

2 months ago
Flags: needinfo?(jseward)
(Assignee)

Updated

2 months ago
Duplicate of this bug: 1345796

Updated

a month ago
Depends on: 1344486
You need to log in before you can comment on or make changes to this bug.