Profiler hangs when interrupting malloc in JS::ProfilingFrameIterator::extractStack

RESOLVED FIXED in Firefox 53

Status

()

Core
JavaScript Engine
RESOLVED FIXED
a year ago
a year ago

People

(Reporter: azakai, Assigned: jonco)

Tracking

Trunk
mozilla53
x86_64
Linux
Points:
---

Firefox Tracking Flags

(firefox53 fixed)

Details

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

a year ago
STR:

1. Load http://assets.metacade.com/emulators/jaguar-wasm.html , which is the Jaguar game emulated in JSMESS, compiled to wasm. This is a fairly large wasm application.

2. Start the profiler.

The profiler appears to run, it updates its own display, but the frame rate it shows the app as having is either 0 or decreases to 0, and the app does not render. The browser itself is no longer responsive, stopping the profiler is impossible. Other tabs show the spinner forever, even if I close this tab. The only option to shut down the browser.
(Reporter)

Updated

a year ago
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
Version: unspecified → Trunk
I'm able to reproduce: the content process seems hung (spinner) and when I break into it with GDB I think I see why: we're doing a malloc from within the signal handler which tries to take a lock which we've already taken further down the stack (frame #19) because we've interrupted malloc.

Hg indicates that this malloc copy was introduced by bug 1297360.  Kannan/Jon: Could we instead move the duplication of labels stored in Frames to the points at which labels can be freed (presumably, during GC)?
Blocks: 1297360
Component: Developer Tools: Performance Tools (Profiler/Timeline) → JavaScript Engine
Flags: needinfo?(kvijayan)
Flags: needinfo?(jcoppeard)
Product: Firefox → Core
Summary: Profiler hangs browser on Jaguar/JSMESS/wasm → Profiler hangs when interrupting malloc in JS::ProfilingFrameIterator::extractStack
Created attachment 8809071 [details]
backtrace.txt

Backtrace of hang.  I should add that this is wasm, but I see a DuplicateStack() for non-wasm as well.
That should be possible, but might cause longer pause times during GC if profiling is enabled.

During GC of the jitcode map table, the collector can save the to-be-freed labels to a temporary set.  After the scan of the jitcode map table, the GC would then need to scan the profile sample buffer for entries with pointers from the set, and do the duplication then.  Then it frees all the labels in the set.  This should be safe since sampling is suppressed during GC of the profiler jitcodemap tables.
Flags: needinfo?(kvijayan)
(Assignee)

Comment 4

a year ago
Created attachment 8810459 [details] [diff] [review]
bug1316150-hold-jit-strings

How about this: we add a flag to delay freeing of any of these strings until we know the client doesn't have any references to them any more.  This patch applies on top of a backout of the patch in bug 1297360.
Assignee: nobody → jcoppeard
Flags: needinfo?(jcoppeard)
Attachment #8810459 - Flags: review?(kvijayan)
Comment on attachment 8810459 [details] [diff] [review]
bug1316150-hold-jit-strings

Review of attachment 8810459 [details] [diff] [review]:
-----------------------------------------------------------------

That sounds great, thanks Jon!  I think wasm now has the same problem as Ion (labels can be cleared either when profiling is disabled (js::wasm::Code::ensureProfilingState) or when a Code object is destroyed by its owning wasm::Instance).  Could we hoist frameLabelsToFree_ and keepFrameLabels_ bool up to the JSRuntime so wasm could use too?

::: js/src/jit/JitcodeMap.h
@@ +1010,5 @@
>      JitcodeGlobalEntry* freeEntries_;
>      uint32_t rand_;
>      uint32_t skiplistSize_;
> +    bool keepFrameLabels_;
> +    Vector<const char*, 0, SystemAllocPolicy> frameLabelsToFree_;

Perhaps a Vector<UniqueChars>?

::: tools/profiler/core/GeckoSampler.cpp
@@ +763,5 @@
>    }
>    uint32_t jsCount = 0;
>  #ifndef SPS_STANDALONE
>    JS::ProfilingFrameIterator::Frame jsFrames[1000];
> +  mozilla::Maybe<JS::AutoKeepFrameLabels> keepLabels;

This might be my own misunderstanding of how SPS works, but I'd assume that we need to start keeping labels as soon as profiling starts (when const char*s go into the buffer) and stop keeping when profiling stops.  Unfortunately, this would mean never freeing labels for people who run with profiling always-on, which is why I was suggesting the admittedly more complicated scheme that attaches generations to labels (you get stamped with a generation when your module/jitcode dies and you can't be freed until 2 generations  later).
(Assignee)

Comment 6

a year ago
(In reply to Luke Wagner [:luke] from comment #5)

Thanks for the comments, I'll update the patch.

> This might be my own misunderstanding of how SPS works, but I'd assume that
> we need to start keeping labels as soon as profiling starts (when const
> char*s go into the buffer) and stop keeping when profiling stops.

If I'm understanding this correctly this label string is only used here, where it's passed to addDynamicTag which memcpys the contents:

http://searchfox.org/mozilla-central/source/tools/profiler/core/GeckoSampler.cpp#904

This is after the ProfilingFrameIterator loop but still in the same function.
(In reply to Jon Coppeard (:jonco) from comment #6)
Ah hah, thanks for explaining.  I had assumed that the reason for the DuplicateString() was because we wanted to embed string pointers into the circular buffer (so we didn't chew through it so quickly).  But I see now that the problem was just GCs during the stack walk itself.

So in this case, if we are walking the stack and find a wasm module, that module is definitely rooted by the callstack, and so no GC should be able to invalidate its labels, so I think your patch is fine from a wasm pov.
(Assignee)

Updated

a year ago
Attachment #8810459 - Flags: review?(kvijayan)
(Assignee)

Comment 8

a year ago
Created attachment 8811684 [details] [diff] [review]
bug1316150-profiler-gc-hazard

So after digging into this some more, it seems that the gecko profiler doesn't use these label strings after anything that could trigger a GC, so we don't really need to do any of this.  The testing function ReadSPSProfilingStack does but that can be changed.

Here's a patch that fixes that and makes the hazard analysis check that ProfilingFrameIterator::Frame is not used after GC (this is not ideal since we could copy the pointer somewhere but is better than nothing).

This is on top of a backout of bug 1297360 which removes the malloc.
Attachment #8810459 - Attachment is obsolete: true
Attachment #8811684 - Flags: review?(kvijayan)
Comment on attachment 8811684 [details] [diff] [review]
bug1316150-profiler-gc-hazard

Review of attachment 8811684 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/builtin/TestingFunctions.cpp
@@ +1744,3 @@
>            : kind(kind), label(label) {}
>          const char* kind;
> +        char* label;

Since this label is now a duped string, I don't see where it's getting freed.

@@ +1774,5 @@
>                default:
>                  frameKindStr = "unknown";
>              }
>  
> +            char* label = JS_strdup(cx, frames[i].label);

Duped here.
(Assignee)

Comment 10

a year ago
(In reply to Kannan Vijayan [:djvj] from comment #9)
NewString() takes owenership of the malloced chars, but I do need to make this a UniqueChars so that it gets freed if we fail before that point.
(Assignee)

Comment 11

a year ago
Created attachment 8812176 [details] [diff] [review]
bug1316150-profiler-gc-hazard v2

Updated patch.
Attachment #8811684 - Attachment is obsolete: true
Attachment #8811684 - Flags: review?(kvijayan)
Attachment #8812176 - Flags: review?(kvijayan)
Comment on attachment 8812176 [details] [diff] [review]
bug1316150-profiler-gc-hazard v2

Review of attachment 8812176 [details] [diff] [review]:
-----------------------------------------------------------------

Cool.
Attachment #8812176 - Flags: review?(kvijayan) → review+

Comment 13

a year ago
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/f35a745eff4c
Make profiler frame labels a GC hazard and make shell profiling function copy them r=djvj

Comment 14

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/f35a745eff4c
Status: NEW → RESOLVED
Last Resolved: a year ago
status-firefox53: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
(Assignee)

Updated

a year ago
Duplicate of this bug: 1297360
I retested comment 0 and can confirm no more hang.  Thanks guys!
You need to log in before you can comment on or make changes to this bug.