Closed
Bug 1298816
Opened 5 years ago
Closed 4 years ago
JitcodeGlobalTable::markUnconditionally is taking a lot of time when the profiler is running
Categories
(Core :: JavaScript: GC, defect)
Core
JavaScript: GC
Tracking
()
RESOLVED
FIXED
mozilla55
| Tracking | Status | |
|---|---|---|
| firefox55 | --- | fixed |
People
(Reporter: jrmuizel, Assigned: sfink)
References
(Blocks 1 open bug)
Details
(Whiteboard: [qf:p1])
Attachments
(1 file, 2 obsolete files)
|
16.45 KB,
patch
|
jonco
:
review+
|
Details | Diff | Splinter Review |
We have a profiler label in nsJSContext::GarbageCollectNow that tells the profiler that a GC is happening but doesn't give any information about what page is being GC'd. This would be very handy when looking at the source of long GCs.
Comment 1•5 years ago
|
||
GCs are global, across all tabs and chrome JS, about half of the time. It would be neat to somehow figure out how much time is being spent GCing each compartment, and then map compartments to URLs.
| Reporter | ||
Comment 2•5 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #1) > GCs are global, across all tabs and chrome JS, about half of the time. It > would be neat to somehow figure out how much time is being spent GCing each > compartment, and then map compartments to URLs. In what function does the per compartment stuff happen? How easy is it for us to map a compartment to a URL?
Comment 3•5 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #2) > In what function does the per compartment stuff happen? Almost everything the GC does is per-compartment. The way work is split up, it would likely be a bit of a hassle to attribute GC time to each compartment. For instance, first the GC does marking for each compartment, then later it does sweeping for each compartment. > How easy is it for us to map a compartment to a URL? That should not be too hard. Each compartment corresponds to a particular nsGlobalWindow, so something could be figured out from there. It would likely require some kind of callback into DOM.
| Reporter | ||
Comment 4•5 years ago
|
||
So, in this profile http://people.mozilla.org/~bgirard/cleopatra/#report=46a8d3e26e9748c77b4ab7dfe392f684f3cb1a6d I'm spending all of my time in js::jit::JitcodeGlobalTable::markUnconditionally(JSTracer*). Is that associated with a particular compartment or does it loop over the compartments?
Flags: needinfo?(continuation)
Comment 5•5 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #4) > I'm spending all of my time in > js::jit::JitcodeGlobalTable::markUnconditionally(JSTracer*). Is that > associated with a particular compartment or does it loop over the > compartments? I'm not familiar with how JS engine stuff is traced by the GC. Maybe Terrence knows.
Flags: needinfo?(continuation) → needinfo?(terrence)
Comment 6•5 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #4) > So, in this profile > http://people.mozilla.org/~bgirard/cleopatra/ > #report=46a8d3e26e9748c77b4ab7dfe392f684f3cb1a6d > I'm spending all of my time in > js::jit::JitcodeGlobalTable::markUnconditionally(JSTracer*). Is that > associated with a particular compartment or does it loop over the > compartments? Shu added this in bug 1182730 to get around a problem that I don't fully understand.
Flags: needinfo?(terrence) → needinfo?(shu)
Comment 7•5 years ago
|
||
(In reply to Terrence Cole [:terrence] from comment #6) > (In reply to Jeff Muizelaar [:jrmuizel] from comment #4) > > So, in this profile > > http://people.mozilla.org/~bgirard/cleopatra/ > > #report=46a8d3e26e9748c77b4ab7dfe392f684f3cb1a6d > > I'm spending all of my time in > > js::jit::JitcodeGlobalTable::markUnconditionally(JSTracer*). Is that > > associated with a particular compartment or does it loop over the > > compartments? > > Shu added this in bug 1182730 to get around a problem that I don't fully > understand. It's been a while. So :djvj added a global side table of profiling metadata for JitCode so they could be profiled better. This table stores raw pointers to GC things from inside the JS engine. It is marked weakly during major GCs depending on what scripts may have been profiled and are inside the profiler's circular buffer. The problem was that sometimes, there are raw pointers to nursery GC things in the table, so to tenure them properly I mark the whole table unconditionally during minor GCs. This is understandably very slow and I would certainly like to find a better solution. Usually, both of those problems can be solved by read barriers. But this table is consulted from inside the profiler, which is either from a signal handler or a different thread. We don't have any barriers in the engine that are safe for that use case.
Flags: needinfo?(shu)
| Reporter | ||
Comment 8•5 years ago
|
||
Oh, so is js::jit::JitcodeGlobalTable::markUnconditionally only happening while profiling is on?
Flags: needinfo?(shu)
Comment 9•5 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #8) > Oh, so is js::jit::JitcodeGlobalTable::markUnconditionally only happening > while profiling is on? Yeah, it's marking the side table that enables the profiler to symbolicate JS JIT code that it sampled.
Flags: needinfo?(shu)
| Reporter | ||
Comment 10•5 years ago
|
||
(In reply to Shu-yu Guo [:shu] from comment #9) > (In reply to Jeff Muizelaar [:jrmuizel] from comment #8) > > Oh, so is js::jit::JitcodeGlobalTable::markUnconditionally only happening > > while profiling is on? > > Yeah, it's marking the side table that enables the profiler to symbolicate > JS JIT code that it sampled. Oh, well I guess it's a bit of red herring then. Perhaps we could rename it to something more obviously associated with profiling so people who see it in profiles know to ignore it? Also, what determines how much time we spend in that function? Does it depend on the number of scripts or something else?
Flags: needinfo?(shu)
Comment 11•5 years ago
|
||
(In reply to Jeff Muizelaar [:jrmuizel] from comment #10) > (In reply to Shu-yu Guo [:shu] from comment #9) > > (In reply to Jeff Muizelaar [:jrmuizel] from comment #8) > > > Oh, so is js::jit::JitcodeGlobalTable::markUnconditionally only happening > > > while profiling is on? > > > > Yeah, it's marking the side table that enables the profiler to symbolicate > > JS JIT code that it sampled. > > Oh, well I guess it's a bit of red herring then. Perhaps we could rename it > to something more obviously associated with profiling so people who see it > in profiles know to ignore it? > > Also, what determines how much time we spend in that function? Does it > depend on the number of scripts or something else? Roughly O(# of scripts). It's the # of JitCode GC things, so pieces of code that the JIT has generated. For a single script, the JIT may compile multiple JitCodes for the body of the script (due to invalidated assumptions, or better inlining, or other heuristics), as well as JitCodes for fast paths generated for the inline caches.
Flags: needinfo?(shu)
Comment 12•5 years ago
|
||
Leaving notes from my conversation about this with Shu yesterday. The thing that SPS is reading from OMT is the bytecode mapping, not the raw GC pointer. We only reference the GC pointers from SPS during symbolification, which happens on the main thread (e.g. not while GC is running). While the GC is running, SPS may run OMT, but will not touch GC pointers in this case. So it should be perfectly safe to use a barrier to update these pointers instead of walking the entire table.
Comment 13•4 years ago
|
||
Shu, can this be fixed now? With the current focus on Quantum Flow, it would be nice if minor GCs in profiles weren't always misleadingly longer than they would be otherwise.
Flags: needinfo?(shu)
Comment 14•4 years ago
|
||
(In reply to :Ehsan Akhgari from comment #13) > Shu, can this be fixed now? With the current focus on Quantum Flow, it > would be nice if minor GCs in profiles weren't always misleadingly longer > than they would be otherwise. I can investigate, but not for a bit. How high a priority is this for Quantum right now?
Comment 15•4 years ago
|
||
(In reply to Shu-yu Guo [:shu] from comment #14) > (In reply to :Ehsan Akhgari from comment #13) > > Shu, can this be fixed now? With the current focus on Quantum Flow, it > > would be nice if minor GCs in profiles weren't always misleadingly longer > > than they would be otherwise. > > I can investigate, but not for a bit. How high a priority is this for > Quantum right now? We can probably live with it easily for at least a few more weeks since at this stage many of the performance issues being profiles are multi-second long pauses so the cost added by this issue is sort of lost in the sea of existing badness, but as we fix the really bad slowness issues this will become more and more misleading. Do you have a rough sense of how soon you would be able to get to it, and how much work it will be (let's say in human-weeks?) Markus, is there any way to somehow make this more visible in the profiler UI in the mean time so that people can ignore it a bit more effectively than pretty much accidentally?
Flags: needinfo?(mstange)
Comment 17•4 years ago
|
||
Kannan, can you help with this before Shu has some free time?
Flags: needinfo?(kvijayan)
Comment 18•4 years ago
|
||
(In reply to :Ehsan Akhgari from comment #15) > Markus, is there any way to somehow make this more visible in the profiler > UI in the mean time so that people can ignore it a bit more effectively than > pretty much accidentally? We could add pseudo stack frames for them with a category called PROFILER_OVERHEAD, maybe? And then mark those stack frames special in the profiler tree view.
Flags: needinfo?(mstange)
| Assignee | ||
Comment 19•4 years ago
|
||
It seems to work. I haven't compared nursery timings with and without this. I will do so.
Attachment #8853451 -
Flags: review?(jcoppeard)
| Assignee | ||
Updated•4 years ago
|
Assignee: nobody → sphink
Status: NEW → ASSIGNED
| Assignee | ||
Comment 20•4 years ago
|
||
Comment on attachment 8853451 [details] [diff] [review] Add read barrier to jitcode global table to accelerate nursery marking Just realized that if you enable the optimization tracking info, you will need to scan more than this for minor GCs. Canceling review to address.
Attachment #8853451 -
Flags: review?(jcoppeard)
Updated•4 years ago
|
Summary: Include information about what's being GC'd for the profiler → JitcodeGlobalTable::markUnconditionally is taking a lot of time when the profiler is running
| Assignee | ||
Comment 21•4 years ago
|
||
This seems to work, though I was getting some local leaks that are making me nervous. I'd also like to do some timings -- I'm assuming this is faster, but if linked list traversal turns out to be slower than a linear sweep through the entries, it's not guaranteed to be faster even if we're looking at fewer items.
Attachment #8854235 -
Flags: review?(jcoppeard)
| Assignee | ||
Updated•4 years ago
|
Attachment #8853451 -
Attachment is obsolete: true
Comment 22•4 years ago
|
||
Comment on attachment 8854235 [details] [diff] [review] Add read barrier to jitcode global table to accelerate nursery marking Review of attachment 8854235 [details] [diff] [review]: ----------------------------------------------------------------- This change looks fine as long as it does turn out to be a performance improvement. Reading through the comments again I just want to check my understanding of what's going on here. The only nursery pointers I can see are in IonTrackedTypeWithAddendum - the type and the constructor function. Wouldn't we normally put a post write barrier on these? I guess this is complicated by the fact that they're in a vector that can be reallocated though (and we can't directly put a type into the store buffer). So I think this approach is simpler. ::: js/src/jit/Ion.cpp @@ +615,5 @@ > if (trc->runtime()->geckoProfiler().enabled() && > trc->runtime()->hasJitRuntime() && > trc->runtime()->jitRuntime()->hasJitcodeGlobalTable()) > { > + trc->runtime()->jitRuntime()->getJitcodeGlobalTable()->traceForMinorGC(trc); I guess this method should be called TraceJitcodeGlobalTableForMinorGC since that's only situation it's used. ::: js/src/jit/JitcodeMap.cpp @@ +545,5 @@ > JSRuntime* rt) > { > MOZ_ASSERT(!TlsContext.get()->isProfilerSamplingEnabled()); > > + removeFromNurseryList(&entry); When does this get called? If it's only called when sweeping during sweeping the nursery list should be empty anyway. @@ +751,5 @@ > + entry->trace<Unconditionally>(trc); > + JitcodeGlobalEntry* prev = entry; > + entry = entry->nextNursery_; > + if (!prev->mayContainNurseryPointers()) > + removeFromNurseryList(prev); Everything live in the nursery should be promoted. Can we always remove these entries from the list? ::: js/src/jit/JitcodeMap.h @@ +566,5 @@ > base_.init(); > } > > explicit JitcodeGlobalEntry(const IonEntry& ion) > + : JitcodeGlobalEntry() Oh, we can use delegating constructors now, nice. @@ +1057,5 @@ > > private: > MOZ_MUST_USE bool addEntry(const JitcodeGlobalEntry& entry, JSRuntime* rt); > > + JitcodeGlobalEntry* do_lookupInternal(void* ptr); nit: I'm not sure our coding standard supports do_ method names. Maybe call it lookupUnbarriered for extra clarity?
| Assignee | ||
Updated•4 years ago
|
Whiteboard: [qf:p1]
| Assignee | ||
Comment 23•4 years ago
|
||
(In reply to Jon Coppeard (:jonco) from comment #22) > Comment on attachment 8854235 [details] [diff] [review] > Add read barrier to jitcode global table to accelerate nursery marking > > Review of attachment 8854235 [details] [diff] [review]: > ----------------------------------------------------------------- > > This change looks fine as long as it does turn out to be a performance > improvement. I only ran it twice with/without, my machine wasn't totally idle, and I suspect the variance between runs is quite high, but: Without patch: mean median 5%ile 95%ile 2925 2109 232 7870 3097 2258 560 9739 With patch: 2557 1519 460 7802 2866 1259 285 9295 That's the mean and median for all nursery GCs > 100 microseconds. It seems to show a smallish improvement on mean, a largish one on median, which looks great. But the other percentiles don't show much of anything. I suppose I could speculate that this is a common cause of moderately slow minor GCs, but the really long ones are something else. Then the median would shift quite a bit, but the mean would be more determined by the really bad ones that are still there? > Reading through the comments again I just want to check my understanding of > what's going on here. The only nursery pointers I can see are in > IonTrackedTypeWithAddendum - the type and the constructor function. > Wouldn't we normally put a post write barrier on these? I guess this is > complicated by the fact that they're in a vector that can be reallocated > though (and we can't directly put a type into the store buffer). So I think > this approach is simpler. The vector is not a problem; it's a vector of pointers to JitcodeGlobalEntries. Implementing a barrier for the series of Types embedded in the optimization tracking info might be a little tricky, but I haven't tried so it's possible that it's still simpler than what I have here. > ::: js/src/jit/Ion.cpp > @@ +615,5 @@ > > if (trc->runtime()->geckoProfiler().enabled() && > > trc->runtime()->hasJitRuntime() && > > trc->runtime()->jitRuntime()->hasJitcodeGlobalTable()) > > { > > + trc->runtime()->jitRuntime()->getJitcodeGlobalTable()->traceForMinorGC(trc); > > I guess this method should be called TraceJitcodeGlobalTableForMinorGC since > that's only situation it's used. Yeah, the trace() name was driving me crazy, so I fixed it, but I didn't fix this one in the middle. Will do. > ::: js/src/jit/JitcodeMap.cpp > @@ +545,5 @@ > > JSRuntime* rt) > > { > > MOZ_ASSERT(!TlsContext.get()->isProfilerSamplingEnabled()); > > > > + removeFromNurseryList(&entry); > > When does this get called? If it's only called when sweeping during > sweeping the nursery list should be empty anyway. See below for why I don't think it should be empty. But this comment made me realize that I should be calling removeFromNurseryList when an entry expires, too. I'm not sure how much it matters. You would have one of these IonEntries, it would expire (during a GC), you would leave it in the nursery list anyway, on the following minor GCs you'd uselessly mark it, then on the next major GC it would be swept. But I might as well remove it ASAP. > @@ +751,5 @@ > > + entry->trace<Unconditionally>(trc); > > + JitcodeGlobalEntry* prev = entry; > > + entry = entry->nextNursery_; > > + if (!prev->mayContainNurseryPointers()) > > + removeFromNurseryList(prev); > > Everything live in the nursery should be promoted. Can we always remove > these entries from the list? I don't think so. "mayContainNurseryPointers" here means "capable of storing nursery pointers". Maybe I should rename it to capableOfStoringNurseryPointers. The current name is ambiguous. If you do a minor GC, then of course there won't be any nursery pointers afterwards. But after returning to the mutator, I think you can update an existing entry to point to the nursery, and you'd want to trace that one. From TypeInference.cpp: // The only times // when type sets contain nursery references is when a nursery object has its // group dynamically changed to a singleton. In such cases the type set will // need to be traced at the next minor GC. Note that we seem to have Types here, not typesets; I assume the same applies. > @@ +1057,5 @@ > > > > private: > > MOZ_MUST_USE bool addEntry(const JitcodeGlobalEntry& entry, JSRuntime* rt); > > > > + JitcodeGlobalEntry* do_lookupInternal(void* ptr); > > nit: I'm not sure our coding standard supports do_ method names. Maybe call > it lookupUnbarriered for extra clarity? Yeah, ok. We have lookup+lookupInfallible -> lookupInternal -> do_lookupInternal. Too many names for the same thing! I guess it'll be lookup+lookupInfallible -> lookupInternal -> lookupInternalUnbarriered.
Updated•4 years ago
|
Attachment #8854235 -
Flags: review?(jcoppeard)
| Assignee | ||
Comment 24•4 years ago
|
||
Hm, I'm not sure what happened here. I thought I had this up for review. This is a stripped-down version that only worries about nursery scanning IonEntries. It even restricts itself to adding the linked list pointers to the IonEntry struct only for space savings. That might not be the greatest idea, if we are going to later add nursery pointers to other things. Anyway, I'm tempted to land this as-is to see if it helps with the observed perf behavior. But my timings aren't all that conclusive. Let me know what you think.
Attachment #8860574 -
Flags: review?(jcoppeard)
| Assignee | ||
Updated•4 years ago
|
Attachment #8854235 -
Attachment is obsolete: true
Comment 25•4 years ago
|
||
Comment on attachment 8860574 [details] [diff] [review] Add read barrier to jitcode global table to accelerate nursery marking Review of attachment 8860574 [details] [diff] [review]: ----------------------------------------------------------------- This looks good to me, so let's land it and see if it improves things. If not we can investigate a more sophisticated approach.
Attachment #8860574 -
Flags: review?(jcoppeard) → review+
Comment 26•4 years ago
|
||
Pushed by sfink@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/4d09ad5101ac Add read barrier to jitcode global table to accelerate nursery marking, r=jonco
| Assignee | ||
Updated•4 years ago
|
Flags: needinfo?(kvijayan)
Comment 27•4 years ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/4d09ad5101ac
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in
before you can comment on or make changes to this bug.
Description
•