Open Bug 1027305 Opened 10 years ago Updated 2 years ago

Lazily call PCToLineNumber and atomize filenames in SavedStacks

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect

Tracking

()

People

(Reporter: fitzgen, Unassigned)

References

Details

Attachments

(4 files, 3 obsolete files)

This was kind of the original plan for bug 1004110, but it morphed into just memoization of PCToLineNumber.

Initially, we should just save a stack as {script, pc, functionDisplayName, parentFrame, principals}. Once asked for specific location information, we can additionally store {source, line, column} by atomizing the script's filename and calling PCToLineNumber (via SavedStacks::getLocation).

The key to maintaining shared older frame tails is that we continue to hash the saved frame based on script and pc (and function display name, parent, etc) as opposed to hashing the computed location. This way a still-lazy and a reified saved frame will hash the same and we can avoid the shared-tail fragmentation I described in bug 1004110 comment 1. We will have to set the script pointer to null in SavedStacks::sweep and then rehash the saved frame once again, but this is ok because if the script is collected, we won't be saving that same frame again.
Assignee: nobody → nfitzgerald
Status: NEW → ASSIGNED
Attached patch WIP (obsolete) — Splinter Review
This is a work in progress patch.

I am trying to avoid keeping a frame's script alive, and I thought that meant that we couldn't reify a lazy frame whose script was being collected. After talking with shu, we realized that the source doesn't need to be a JSAtom anymore because we aren't hashing it anymore (it can remain a char *), so we don't actually need a cx to reify. The one thing we can't do is insert into the PCToLineNumber memoization table while we are under GC, but it seems ok to just lose the opportunity for memoization in these cases.
Also, shu points out we could keep the ScriptSource alive, which would keep the script's filename alive and allow us to avoid allocation and copying strings under gc.
Ok, so I ended up holding a strong reference to the ScriptSourceObject, which is better than holding the JSScript alive (and therefore also the ScriptSourceObject), but still isn't perfect. Seems like there should be a better way to lazily atomize the filename that doesn't keep the whole source text around, but I can't think of one...

Try push: https://tbpl.mozilla.org/?tree=Try&rev=ff891cee9aa6
Attachment #8443193 - Attachment is obsolete: true
Attachment #8443242 - Flags: review?(shu)
--ion,    debugger, trackingAllocationSites
warmup
measure
[Stats total: 554.4991921386722s, mean: 1.1089983842773445s, stddev: 7%]

--no-ion, debugger, trackingAllocationSites
warmup
measure
[Stats total: 529.9646965332032s, mean: 1.0599293930664064s, stddev: 2%]

--ion,    debugger, no hooks
warmup
measure
[Stats total: 109.69202856445322s, mean: 0.21938405712890646s, stddev: 6%]

--no-ion, debugger, no hooks
warmup
measure
[Stats total: 106.73220117187498s, mean: 0.21346440234374994s, stddev: 3%]
Attached image screenshot of profile
So this patch is only about a ~1.15x improvement over just memoizing PCToLineNumber.

Here is a screenshot of my profiling.

Further ideas in bug 1028418.
(In reply to Nick Fitzgerald [:fitzgen] from comment #5)
> Created attachment 8443764 [details]
> screenshot of profile
> 
> So this patch is only about a ~1.15x improvement over just memoizing
> PCToLineNumber.
> 
> Here is a screenshot of my profiling.
> 
> Further ideas in bug 1028418.

It seems like the complexity and the GC-sensitive subtleties here isn't worth the performance gain. I believe the bit-on-frame on idea in bug 1028418 should be a much bigger win and is possibly easier to reason about.

What do you think about shelving this idea in favor of bug 1028418? We can revisit if that idea doesn't pan out.
Attachment #8443242 - Flags: review?(shu)
(In reply to Shu-yu Guo [:shu] from comment #6)
> (In reply to Nick Fitzgerald [:fitzgen] from comment #5)
> > Created attachment 8443764 [details]
> > screenshot of profile
> > 
> > So this patch is only about a ~1.15x improvement over just memoizing
> > PCToLineNumber.
> > 
> > Here is a screenshot of my profiling.
> > 
> > Further ideas in bug 1028418.
> 
> It seems like the complexity and the GC-sensitive subtleties here isn't
> worth the performance gain. I believe the bit-on-frame on idea in bug
> 1028418 should be a much bigger win and is possibly easier to reason about.
> 
> What do you think about shelving this idea in favor of bug 1028418? We can
> revisit if that idea doesn't pan out.

:bz, what do you think? You were a big proponent of making this lazy.

:shu, what do you think about holding onto the JSScript? It would simplify the GC things a little bit and I think isn't really that much worse than holding onto the source object. Would that change your mind at all?
Flags: needinfo?(shu)
Flags: needinfo?(bzbarsky)
Do you mean holding onto the JSScript strongly? That doesn't seem like a great alternative either given the reload use case.

My intuition is that if there are a lot of shared tails of older frames that we don't need to walk, the # of frames we have to walk should be small enough that doing the walk eagerly is fast enough.
Flags: needinfo?(shu)
(In reply to Shu-yu Guo [:shu] from comment #8)

> My intuition is that if there are a lot of shared tails of older frames that
> we don't need to walk, the # of frames we have to walk should be small
> enough that doing the walk eagerly is fast enough.

That is, if we implement the bit-on-frame idea from bug 1028418 and making it possible to determine that we don't need to walk them.
Nick, it's all a matter of what our actual performance goals are and whether we're hitting them.

For our use cases in bug 857648 the cost of doing a copy of the script filename (just the cost of the _malloc_ call that involved) was not really acceptable because our relevant performance metric there was "as fast as possible".  I was hoping we could get rid of the API we're using there and just use SavedStacks instead, but a minimal requirement for that seems to be that the performance is comparable.  I don't care too much about how that's accomplished (this bug, bug 1028418, or some combination), as long as it's comparable.

Note that right now exceptions thrown from DOM methods do in fact end up holding on to the JSScript, right?

In any case, right now a CaptureCurrentStack() call seems to be about 400ns per stackframe or so plus 400ns initial overhead (that's with me calling it over and over with the same exact stack).  And like I said in bug 1026188 comment 7, a good bit of that is atomizing the filename.  If we're going to use this stuff in non-debug mode we need to reduce the cost significantly, imo.
Flags: needinfo?(bzbarsky)
Saved 100,000 stacks via JS::DescribeStack and via SavedStacks::saveCurrentStack (with this patch), results are below:

DescribeStack: 819 ms
SaveStack: 1516 ms
Attachment #8446179 - Attachment is obsolete: true
Commenting out the object allocations in SavedStacks::insertFrame gives us:

DescribeStack: 828 ms
SaveStack: 1234 ms

... not sure why this is so much slower than JS::DescribeStack.

I have to say I agree with :shu that all this work isn't worth the 1.15x improvement. Will leave this bug open because 1.15x might be worth it in the future, but for now we have bigger fish to fry (like actually shipping memory tools rather than improving perf on memory tools that haven't shipped yet).
Assignee: nfitzgerald → nobody
Status: ASSIGNED → NEW
Note that my primary interest in the performance of SaveStack is because it would be good to use it for exceptions and get rid of DescribeStack, then expose the saved stacks sanely to the console, but we have benchmarks that are gated on exception performance...
In particular, I'd really like to fix bug 857648.
So for what it's worth, I poked a bit at how performance on a per-call basis breaks down for the comment 12 testcase.  Here's what I see:

On my hardware, describeStack() ends up being about 370ns per call.

On my hardware, in a tip build with just the comment 12 patch applied, saveStack() ends up being about 630ns per call.  That 630ns is broken down as follows, based on adding early returns and hacking saveStack to handle a null return:

 50ns to land inside saveStack, construct the CallArgs, call saveCurrentStack (but have it
      just return null immediately), set the rval, push into the array.
240ns to construct the ScriptFrameIter in saveCurrentStack.
170ns to copy-construct another ScriptFrameIter in SavedStacks::insertFrames.
~20ns to do the recursive call to insertFrames() (which finds that iter.done()).
~20ns for the getLocation call.
100ns for the lookupForAdd in getOrCreateSavedFrame.

We never reach the createFrameFromLookup line, since we're getting a consistent hashtable hit here, so the performance of that bit is not relevant.  It also looks like we get cache hits in SavedStacks::getLocation, so we don't end up atomizing more than once.

sizeof(ScriptFrameIter) == 1024, for what it's worth, so that's a lot of data to copy and whatnot...

I tried doing this:

    RootedScript script(cx, iter.script());
    jsbytecode* pc = iter.pc();
    RootedFunction callee(cx, iter.maybeCallee());
    JSCompartment* compartment = iter.compartment();

before doing the recursive insertFrames call, and getting rid of the thisFrame object entirely, and that dropped times from ~630ns to ~440ns per call.

If I also stop hashing the "source" string in SavedFrame::HashPolicy::hash, then I get that time down to being faster than DescribeStack.

But "source" is already a JSAtom*.  Why are we hashing the chars again?  Can't we just hash the pointer itself?  Certainly in the comparison function we're comparing the pointer (after comparing the length for some reason).
Nick, is there some reason we shouldn't try doing this?
Attachment #8446305 - Flags: feedback?(nfitzgerald)
Attachment #8446305 - Attachment is obsolete: true
Attachment #8446305 - Flags: feedback?(nfitzgerald)
Attachment #8446306 - Flags: feedback?(nfitzgerald)
I fixed the JSAtom hashing in my patch for bug 993085, which is still waiting on review, although this patch also changes it so that the source isn't hashed (and the functionDisplayName is hashed as a ptr). Dumb mistake nonetheless :|

When I said I commented out the allocations, I just returned true after the insertFrames recursive call. This also disabled the lookups and all that.

A better benchmark would probably have some more frames on the stack before capturing it...
Comment on attachment 8446306 [details] [diff] [review]
Attempt to not copy-construct iterators and rehash atoms, even compiling

Review of attachment 8446306 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/SavedStacks.cpp
@@ +101,5 @@
>      if (source != lookup.source)
>          return false;
>  
>      JSAtom *functionDisplayName = existing->getFunctionDisplayName();
> +    if (functionDisplayName != lookup.functionDisplayName)

Yeah, I've made this fix (and the hashing fix) in other patches (and maybe the patch in this bug too?) but somehow it hasn't landed yet.

Dumb mistake :|

@@ +480,5 @@
> +    RootedScript script(cx, iter.script());
> +    jsbytecode* pc = iter.pc();
> +    RootedFunction callee(cx, iter.maybeCallee());
> +    // script and callee should keep compartment alive.
> +    JSCompartment* compartment = iter.compartment();

Good catch!
Attachment #8446306 - Flags: feedback?(nfitzgerald) → feedback+
OK, I spun off those bits into bug 1030938, since they're not strictly relevant here.
No longer blocks: 857648
No longer blocks: 1028418
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: