Open Bug 1027305 Opened 10 years ago Updated 8 days ago

Lazily call PCToLineNumber and atomize filenames in SavedStacks

Tracking

()

Status:

NEW

People

(Reporter: fitzgen, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(4 files, 3 obsolete files)

WIP 10 years ago Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] 21.05 KB, patch		Details \| Diff \| Splinter Review
lazy-stacks.patch 10 years ago Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] 22.24 KB, patch		Details \| Diff \| Splinter Review
screenshot of profile 10 years ago Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] 1.47 MB, image/png		Details
JS::DescribeStack vs SavedStacks::saveCurrentStack 10 years ago Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] 2.28 KB, patch		Details \| Diff \| Splinter Review
JS::DescribeStack vs SavedStacks::saveCurrentStack 10 years ago Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8] 2.84 KB, patch		Details \| Diff \| Splinter Review
Attempt to not copy-construct iterators and rehash atoms 10 years ago Boris Zbarsky [:bzbarsky] 4.88 KB, patch		Details \| Diff \| Splinter Review
Attempt to not copy-construct iterators and rehash atoms, even compiling 10 years ago Boris Zbarsky [:bzbarsky] 4.88 KB, patch	fitzgen : feedback+	Details \| Diff \| Splinter Review

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Description

•

10 years ago

This was kind of the original plan for bug 1004110, but it morphed into just memoization of PCToLineNumber. Initially, we should just save a stack as {script, pc, functionDisplayName, parentFrame, principals}. Once asked for specific location information, we can additionally store {source, line, column} by atomizing the script's filename and calling PCToLineNumber (via SavedStacks::getLocation). The key to maintaining shared older frame tails is that we continue to hash the saved frame based on script and pc (and function display name, parent, etc) as opposed to hashing the computed location. This way a still-lazy and a reified saved frame will hash the same and we can avoid the shared-tail fragmentation I described in bug 1004110 comment 1. We will have to set the script pointer to null in SavedStacks::sweep and then rehash the saved frame once again, but this is ok because if the script is collected, we won't be saving that same frame again.

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Updated

•

10 years ago

Assignee: nobody → nfitzgerald

Status: NEW → ASSIGNED

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 1

•

10 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

This is a work in progress patch. I am trying to avoid keeping a frame's script alive, and I thought that meant that we couldn't reify a lazy frame whose script was being collected. After talking with shu, we realized that the source doesn't need to be a JSAtom anymore because we aren't hashing it anymore (it can remain a char *), so we don't actually need a cx to reify. The one thing we can't do is insert into the PCToLineNumber memoization table while we are under GC, but it seems ok to just lose the opportunity for memoization in these cases.

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 2

•

10 years ago

Also, shu points out we could keep the ScriptSource alive, which would keep the script's filename alive and allow us to avoid allocation and copying strings under gc.

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 3

•

10 years ago

Attached patch lazy-stacks.patch — Details — Splinter Review

Ok, so I ended up holding a strong reference to the ScriptSourceObject, which is better than holding the JSScript alive (and therefore also the ScriptSourceObject), but still isn't perfect. Seems like there should be a better way to lazily atomize the filename that doesn't keep the whole source text around, but I can't think of one... Try push: https://tbpl.mozilla.org/?tree=Try&rev=ff891cee9aa6

Attachment #8443193 - Attachment is obsolete: true

Attachment #8443242 - Flags: review?(shu)

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 4

•

10 years ago

--ion, debugger, trackingAllocationSites warmup measure [Stats total: 554.4991921386722s, mean: 1.1089983842773445s, stddev: 7%] --no-ion, debugger, trackingAllocationSites warmup measure [Stats total: 529.9646965332032s, mean: 1.0599293930664064s, stddev: 2%] --ion, debugger, no hooks warmup measure [Stats total: 109.69202856445322s, mean: 0.21938405712890646s, stddev: 6%] --no-ion, debugger, no hooks warmup measure [Stats total: 106.73220117187498s, mean: 0.21346440234374994s, stddev: 3%]

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Updated

•

10 years ago

Blocks: 1028418

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 5

•

10 years ago

Attached image screenshot of profile — Details

So this patch is only about a ~1.15x improvement over just memoizing PCToLineNumber. Here is a screenshot of my profiling. Further ideas in bug 1028418.

Shu-yu Guo [:shu]

Comment 6

•

10 years ago

(In reply to Nick Fitzgerald [:fitzgen] from comment #5) > Created attachment 8443764 [details] > screenshot of profile > > So this patch is only about a ~1.15x improvement over just memoizing > PCToLineNumber. > > Here is a screenshot of my profiling. > > Further ideas in bug 1028418. It seems like the complexity and the GC-sensitive subtleties here isn't worth the performance gain. I believe the bit-on-frame on idea in bug 1028418 should be a much bigger win and is possibly easier to reason about. What do you think about shelving this idea in favor of bug 1028418? We can revisit if that idea doesn't pan out.

Shu-yu Guo [:shu]

Updated

•

10 years ago

Attachment #8443242 - Flags: review?(shu)

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 7

•

10 years ago

(In reply to Shu-yu Guo [:shu] from comment #6) > (In reply to Nick Fitzgerald [:fitzgen] from comment #5) > > Created attachment 8443764 [details] > > screenshot of profile > > > > So this patch is only about a ~1.15x improvement over just memoizing > > PCToLineNumber. > > > > Here is a screenshot of my profiling. > > > > Further ideas in bug 1028418. > > It seems like the complexity and the GC-sensitive subtleties here isn't > worth the performance gain. I believe the bit-on-frame on idea in bug > 1028418 should be a much bigger win and is possibly easier to reason about. > > What do you think about shelving this idea in favor of bug 1028418? We can > revisit if that idea doesn't pan out. :bz, what do you think? You were a big proponent of making this lazy. :shu, what do you think about holding onto the JSScript? It would simplify the GC things a little bit and I think isn't really that much worse than holding onto the source object. Would that change your mind at all?

Flags: needinfo?(shu)

Flags: needinfo?(bzbarsky)

Shu-yu Guo [:shu]

Comment 8

•

10 years ago

Do you mean holding onto the JSScript strongly? That doesn't seem like a great alternative either given the reload use case. My intuition is that if there are a lot of shared tails of older frames that we don't need to walk, the # of frames we have to walk should be small enough that doing the walk eagerly is fast enough.

Flags: needinfo?(shu)

Shu-yu Guo [:shu]

Comment 9

•

10 years ago

(In reply to Shu-yu Guo [:shu] from comment #8) > My intuition is that if there are a lot of shared tails of older frames that > we don't need to walk, the # of frames we have to walk should be small > enough that doing the walk eagerly is fast enough. That is, if we implement the bit-on-frame idea from bug 1028418 and making it possible to determine that we don't need to walk them.

Boris Zbarsky [:bzbarsky]

Comment 10

•

10 years ago

Nick, it's all a matter of what our actual performance goals are and whether we're hitting them. For our use cases in bug 857648 the cost of doing a copy of the script filename (just the cost of the _malloc_ call that involved) was not really acceptable because our relevant performance metric there was "as fast as possible". I was hoping we could get rid of the API we're using there and just use SavedStacks instead, but a minimal requirement for that seems to be that the performance is comparable. I don't care too much about how that's accomplished (this bug, bug 1028418, or some combination), as long as it's comparable. Note that right now exceptions thrown from DOM methods do in fact end up holding on to the JSScript, right? In any case, right now a CaptureCurrentStack() call seems to be about 400ns per stackframe or so plus 400ns initial overhead (that's with me calling it over and over with the same exact stack). And like I said in bug 1026188 comment 7, a good bit of that is atomizing the filename. If we're going to use this stuff in non-debug mode we need to reduce the cost significantly, imo.

Flags: needinfo?(bzbarsky)

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 11

•

10 years ago

Attached patch JS::DescribeStack vs SavedStacks::saveCurrentStack (obsolete) — Details — Splinter Review

Saved 100,000 stacks via JS::DescribeStack and via SavedStacks::saveCurrentStack (with this patch), results are below: DescribeStack: 819 ms SaveStack: 1516 ms

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 12

•

10 years ago

Attached patch JS::DescribeStack vs SavedStacks::saveCurrentStack — Details — Splinter Review

(Now with the benchmark file)

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Updated

•

10 years ago

Attachment #8446179 - Attachment is obsolete: true

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 13

•

10 years ago

Commenting out the object allocations in SavedStacks::insertFrame gives us: DescribeStack: 828 ms SaveStack: 1234 ms ... not sure why this is so much slower than JS::DescribeStack. I have to say I agree with :shu that all this work isn't worth the 1.15x improvement. Will leave this bug open because 1.15x might be worth it in the future, but for now we have bigger fish to fry (like actually shipping memory tools rather than improving perf on memory tools that haven't shipped yet).

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Updated

•

10 years ago

Assignee: nfitzgerald → nobody

Status: ASSIGNED → NEW

Boris Zbarsky [:bzbarsky]

Comment 14

•

10 years ago

Note that my primary interest in the performance of SaveStack is because it would be good to use it for exceptions and get rid of DescribeStack, then expose the saved stacks sanely to the console, but we have benchmarks that are gated on exception performance...

Boris Zbarsky [:bzbarsky]

Comment 15

•

10 years ago

In particular, I'd really like to fix bug 857648.

Boris Zbarsky [:bzbarsky]

Comment 16

•

10 years ago

So for what it's worth, I poked a bit at how performance on a per-call basis breaks down for the comment 12 testcase. Here's what I see: On my hardware, describeStack() ends up being about 370ns per call. On my hardware, in a tip build with just the comment 12 patch applied, saveStack() ends up being about 630ns per call. That 630ns is broken down as follows, based on adding early returns and hacking saveStack to handle a null return: 50ns to land inside saveStack, construct the CallArgs, call saveCurrentStack (but have it just return null immediately), set the rval, push into the array. 240ns to construct the ScriptFrameIter in saveCurrentStack. 170ns to copy-construct another ScriptFrameIter in SavedStacks::insertFrames. ~20ns to do the recursive call to insertFrames() (which finds that iter.done()). ~20ns for the getLocation call. 100ns for the lookupForAdd in getOrCreateSavedFrame. We never reach the createFrameFromLookup line, since we're getting a consistent hashtable hit here, so the performance of that bit is not relevant. It also looks like we get cache hits in SavedStacks::getLocation, so we don't end up atomizing more than once. sizeof(ScriptFrameIter) == 1024, for what it's worth, so that's a lot of data to copy and whatnot... I tried doing this: RootedScript script(cx, iter.script()); jsbytecode* pc = iter.pc(); RootedFunction callee(cx, iter.maybeCallee()); JSCompartment* compartment = iter.compartment(); before doing the recursive insertFrames call, and getting rid of the thisFrame object entirely, and that dropped times from ~630ns to ~440ns per call. If I also stop hashing the "source" string in SavedFrame::HashPolicy::hash, then I get that time down to being faster than DescribeStack. But "source" is already a JSAtom*. Why are we hashing the chars again? Can't we just hash the pointer itself? Certainly in the comparison function we're comparing the pointer (after comparing the length for some reason).

Boris Zbarsky [:bzbarsky]

Comment 17

•

10 years ago

Attached patch Attempt to not copy-construct iterators and rehash atoms (obsolete) — Details — Splinter Review

Nick, is there some reason we shouldn't try doing this?

Attachment #8446305 - Flags: feedback?(nfitzgerald)

Boris Zbarsky [:bzbarsky]

Comment 18

•

10 years ago

Attached patch Attempt to not copy-construct iterators and rehash atoms, even compiling — Details — Splinter Review

Attachment #8446305 - Attachment is obsolete: true

Attachment #8446305 - Flags: feedback?(nfitzgerald)

Attachment #8446306 - Flags: feedback?(nfitzgerald)

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 19

•

10 years ago

I fixed the JSAtom hashing in my patch for bug 993085, which is still waiting on review, although this patch also changes it so that the source isn't hashed (and the functionDisplayName is hashed as a ptr). Dumb mistake nonetheless :| When I said I commented out the allocations, I just returned true after the insertFrames recursive call. This also disabled the lookups and all that. A better benchmark would probably have some more frames on the stack before capturing it...

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Comment 20

•

10 years ago

Comment on attachment 8446306 [details] [diff] [review] Attempt to not copy-construct iterators and rehash atoms, even compiling Review of attachment 8446306 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/vm/SavedStacks.cpp @@ +101,5 @@ > if (source != lookup.source) > return false; > > JSAtom *functionDisplayName = existing->getFunctionDisplayName(); > + if (functionDisplayName != lookup.functionDisplayName) Yeah, I've made this fix (and the hashing fix) in other patches (and maybe the patch in this bug too?) but somehow it hasn't landed yet. Dumb mistake :| @@ +480,5 @@ > + RootedScript script(cx, iter.script()); > + jsbytecode* pc = iter.pc(); > + RootedFunction callee(cx, iter.maybeCallee()); > + // script and callee should keep compartment alive. > + JSCompartment* compartment = iter.compartment(); Good catch!

Attachment #8446306 - Flags: feedback?(nfitzgerald) → feedback+

Boris Zbarsky [:bzbarsky]

Comment 21

•

10 years ago

OK, I spun off those bits into bug 1030938, since they're not strictly relevant here.

Boris Zbarsky [:bzbarsky]

Updated

•

10 years ago

No longer blocks: 857648

Nick Fitzgerald [:fitzgen] [⏰PST; UTC-8]

Reporter

Updated

•

9 years ago

No longer blocks: 1028418

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

Nicolas B. Pierron [:nbp]

Updated

•

8 days ago

Blocks: sm-runtime

Severity: S3 → N/A

Type: defect → enhancement

Priority: -- → P5

You need to log in before you can comment on or make changes to this bug.