Closed Bug 1028418 Opened 10 years ago Closed 9 years ago

Minimize stack walking needed in SavedStacks::saveCurrentStack

Categories

(Core :: JavaScript Engine, defect)

Platform: x86 macOS
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla42
Tracking Status: firefox42 --- fixed

People

(Reporter: fitzgen, Assigned: fitzgen)

References

(Blocks 1 open bug)

Details

Attachments

(10 files, 27 obsolete files)

1.59 MB, image/jpeg
23.23 KB, text/plain
1.32 KB, patch (fitzgen: review+)
916 bytes, patch (fitzgen: review+)
1.05 KB, patch (fitzgen: review+)
13.13 KB, patch (fitzgen: review+)
3.91 KB, patch (fitzgen: review+)
43.97 KB, patch (fitzgen: review+)
4.32 KB, patch (fitzgen: review+)
2.50 KB, patch (fitzgen: review+)
With PCToLineNumber call memoization and lazy filename atomization, tracking
allocation sites in the debugger benchmark[0] takes about 554.5 seconds.

If we only capture the youngest frame and skip the full stack walk, tracking
allocation sites in the debugger benchmark takes about 215.0 seconds.

[0]: https://github.com/jimblandy/benchmark-debugger

If we could avoid walking the whole stack every time, we could potentially get a
speedup of ~2.5x!

Frames at the older end of the stack get walked over and over and over, even
though we already have a tail for them inside SavedStacks. The reason is that
we need to know a frame's parent before we know which SavedFrame instance to get
(or create).

If we had a bit on each stack frame that was initially clear, but which we could
set once we had created a SavedFrame instance for that stack frame, we could
maintain a cache of {AbstractFramePtr, SavedFrame} pairs. When we want to
capture a stack, we'd walk up the stack until we found a frame with our marked
bit. We'd then look up that frame in our cache and return the SavedFrame for
that stack frame, thus avoiding the need to continue walking up the rest of the
stack to even older frames. We would also remove from the cache all pairs whose
frames are younger than the current frame (they are now dangling pointers to
dead frames), then mark each newly walked frame and add its new
{AbstractFramePtr, SavedFrame} pair to the cache.
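
Here is a rough standalone sketch of that algorithm. All of the names below
(Frame, CacheEntry, getOrCreateSavedFrame, and so on) are made up for
illustration; they are not the real SpiderMonkey types or APIs:

  #include <vector>

  struct SavedFrame { SavedFrame* parent; };

  struct Frame {
      bool hasSavedFrameBit = false;  // the proposed "already captured" bit
      Frame* older = nullptr;         // next older frame on the stack
  };

  struct CacheEntry {
      Frame* frame;       // live stack frame the entry was created for
      SavedFrame* saved;  // SavedFrame previously captured for that frame
  };

  // Cache ordered from oldest (front) to youngest (back).
  static std::vector<CacheEntry> cache;

  // Stand-in for SavedStacks' hash-consing of SavedFrame objects.
  static SavedFrame* getOrCreateSavedFrame(Frame*, SavedFrame* parent) {
      return new SavedFrame{parent};
  }

  SavedFrame* saveCurrentStack(Frame* youngest) {
      // 1. Walk up the stack only until we find a frame whose bit is set.
      std::vector<Frame*> unmarked;
      Frame* f = youngest;
      while (f && !f->hasSavedFrameBit) {
          unmarked.push_back(f);
          f = f->older;
      }

      // 2. If we found a marked frame, drop cache entries for frames younger
      //    than it (they point at dead frames) and reuse its cached SavedFrame
      //    as the parent instead of walking the rest of the stack.
      SavedFrame* parent = nullptr;
      if (f) {
          while (!cache.empty() && cache.back().frame != f)
              cache.pop_back();
          if (!cache.empty())
              parent = cache.back().saved;
      }

      // 3. Create SavedFrames for the newly walked frames, oldest first,
      //    marking each frame and caching the new pair as we go.
      for (auto it = unmarked.rbegin(); it != unmarked.rend(); ++it) {
          parent = getOrCreateSavedFrame(*it, parent);
          (*it)->hasSavedFrameBit = true;
          cache.push_back(CacheEntry{*it, parent});
      }
      return parent;  // SavedFrame chain for the youngest frame
  }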

Jim suggests that js::StackFrame already has a flags field whose upper bits are
zeroed, and we might be able to use one of those. He also suggests that for Ion
frames, a bit in the frame descriptor could work for this (and believes there
may be at least one available).
Here's the whiteboard diagram that came out of our Vidyo conversation. It's not very clear taken out of context (and might not even have been that clear *in* context), but for what it's worth:

The diagram shows a stack frame tree: all frames ever pushed during the course of a program's execution. Frames that we have in the cache have been flagged with x's. The cache mapping AbstractFramePtrs of flagged frames to their SavedFrame instances is shown on the right.
The bottom right shows a given function j being called from two different places, to illustrate why the cache is needed: we can't tell which SavedFrame we want to use for a frame without some knowledge of the older frames. The cache tells us when we've reached the portion of the stack that we've traversed previously.
We probably want to handle Ion frames with inlined frames in some clever way.

When running Ion code, we have an Ion frame, and then a chunk of inlined frames depending on the current instruction. If the Ion infrastructure has some pointer that identifies the particular chunk of inlined frames, call that an 'inline descriptor': the <Ion frame base address, inline descriptor> pair is what we want to use to index the walking cache.
Each js::Activation should own its own section of the walking cache. Then, that portion of the cache will get cleaned up automatically when we leave the Activation.
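
Roughly the shape we have in mind, as a sketch (the names FrameKey,
WalkingCache, and Activation below are hypothetical, not the real SpiderMonkey
types):

  #include <cstdint>
  #include <utility>
  #include <vector>

  struct SavedFrame;

  // For ordinary frames the frame's address is enough; for Ion code a single
  // physical frame covers a chunk of inlined (logical) frames, so the key is
  // the <Ion frame base address, inline descriptor> pair.
  struct FrameKey {
      uintptr_t frameBase;         // frame address, or Ion frame base address
      uintptr_t inlineDescriptor;  // identifies the chunk of inlined frames; 0 otherwise
      bool operator==(const FrameKey& rhs) const {
          return frameBase == rhs.frameBase && inlineDescriptor == rhs.inlineDescriptor;
      }
  };

  using WalkingCache = std::vector<std::pair<FrameKey, SavedFrame*>>;

  // Each activation owns its slice of the walking cache, so the slice (and the
  // entries pointing at that activation's frames) goes away automatically when
  // we leave the activation.
  struct Activation {
      WalkingCache walkingCache;
  };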
Blocks: 1059139
Assignee: nobody → nfitzgerald
No longer depends on: 1027305
Attached patch saved-stacks-frame-bit.patch (obsolete) — Splinter Review
Combined patch that applies on the following git commit:

> commit b90d0b0eca25df21d30e84577c5df0904d382158
> Author: Mike Shal <mshal@mozilla.com>
> Date:   Wed Jun 24 16:17:50 2015 -0400
> 
>     No bug - bump mozharness.json to 946d3352e137; a=RyanVM

To repro my lack of symbols, configure and build (I configure like `js/src/configure --with-ccache --enable-threadsafe --enable-stdcxx-compat --disable-shared-js --enable-debug --enable-debug-symbols --disable-optimize` and build with `make`) and then run:

  ./js/src/jit-test/jit_test.py $OBJ_DIR/dist/bin/js debug/Environment-variables.js -g
  (gdb) run
  # hit assertion failure
  (gdb) bt full 5

The backtrace is pretty useless (for me) as I don't get symbols for any of those frames.

Thanks for taking the time to help me out with this!
Flags: needinfo?(bzbarsky)
Talked this over with Nick.  Current hypothesis is that this is Apple gdb vs homebrew gdb....
Flags: needinfo?(bzbarsky)
(In reply to Boris Zbarsky [:bz] from comment #12)
> Talked this over with Nick.  Current hypothesis is that this is Apple gdb vs
> homebrew gdb....

Confirmed this was the problem.
For some reason, these changes result in "leaking" (marking and therefore preventing sweeping of) Debugger objects in the debug/* jit tests.

Not entirely sure what is going on.
Depends on: 1184605
With this patch series applied, I have been running into assertion failures when bailing out of baseline. I was able to narrow down what behavior triggers the assertion failure to simply rematerializing frames and doing nothing with them. I filed bug 1184605 for this, as I am 99% sure this is an existing, latent bug that this patch series just uncovered.
Alright, so we can't use AbstractFramePtr as the key for the cache, because
rematerializing ion frames to get an AbstractFramePtr requires bailing out from
ion. See bug 1184605 comment 6 for more details. We can't bail out every time we
capture a stack! The whole point of this caching is to improve performance, not
degrade it :P

Instead I'm going to investigate caching at a fine granularity when we have a
usable AbstractFramePtr, as these patches do now. When in asm.js, we don't cache
at all, as these patches do now (but we should eventually fix this). When in
ion, we will cache at a coarser granularity: physical frames rather than logical
frames (i.e., we won't cache for inlined frames).

The cache key will be either the AbstractFramePtr's raw bits, or the fp to the
physical ion frame. We don't care which, because we are never dereferencing
these pointers, we just treat them as unique numbers. We know they are unique
because two objects cannot occupy the same location in memory at the same
time. We don't worry about stale entries in the cache because (a) the cache is
ordered, and (b) we only go searching in the cache from older -> younger when we
have something that we know is live and has a live entry in the cache.

The final change required from the existing patch series is to take a bit on the
ion physical frame descriptors, rather than on the RematerializedFrame.
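
Concretely, something like this for the key (just a sketch; the actual patch
may spell it differently):

  #include <cstdint>

  // The raw bits of an AbstractFramePtr, or the physical Ion frame's fp. We
  // never dereference it; we only compare it, treating it as a number that is
  // unique for as long as the frame is live.
  struct FrameCacheKey {
      uintptr_t bits;

      bool operator==(const FrameCacheKey& rhs) const { return bits == rhs.bits; }
      bool operator!=(const FrameCacheKey& rhs) const { return bits != rhs.bits; }
  };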
Depends on: 1184839
Attachment #8626961 - Attachment is obsolete: true
Attached patch rollup.patch (obsolete) — Splinter Review
This rollup patch applies on the following git commit:

commit acf6cee4eaba7976298c54e4c04915d1c78869e8
Merge: 5f95825 7620aed
Author: Ryan VanderMeulen <ryanvm@gmail.com>
Date:   Thu Jul 16 15:15:42 2015 -0400

    Merge b2g-inbound to m-c. a=merge
Attached file benchmarks (obsolete) —
These are some benchmark results from https://github.com/jimblandy/benchmark-debugger and https://bugzilla.mozilla.org/show_bug.cgi?id=1152893#c1. Seeing some pretty worrying TypeErrors in the former when we *don't* have these patches applied. Messing with the allocation sampling probability results in different TypeErrors surfacing. We definitely shouldn't be affecting semantics by observing allocations...

Anyways, the promise benchmark suite seems pretty noisy and I'm not 100% sure how to interpret these results.

bz, you offered your services in https://bugzilla.mozilla.org/show_bug.cgi?id=1152893#c7 and I'd like to take you up on it :)
Flags: needinfo?(bzbarsky)
Attached file benchmark error allocation in a loop (obsolete) —
Stupid microbenchmark capturing a stack with five frames in a loop.
Results of profiling the debugger benchmark tracking allocs P=1.0 w/ `perf record -F 999`:

Overhead  Symbol
   6.46%  js::jit::InlineFrameIterator::findNextFrame
   4.96%  js::jit::JitFrameIterator::checkInvalidation
   4.91%  js::SavedStacks::getOrCreateSavedFrame
   4.85%  js::jit::SnapshotIterator::SnapshotIterator
   4.00%  js::SavedStacks::insertFrames
   3.70%  js::jit::JitFrameIterator::machineState
   3.67%  js::SavedStacks::getLocation
   3.32%  js::jit::JitFrameIterator::script
   3.11%  js::jit::InlineFrameIterator::resetOn
   2.97%  js::TenuringTracer::traverse<JSObject*>
   2.82%  js::jit::SnapshotReader::readAllocationIndex
   2.73%  js::jit::IonScript::getSafepointIndex
   2.54%  js::jit::SnapshotReader::readAllocation
   2.27%  js::SavedFrame::HashPolicy::hash
   2.19%  js::InternalGCMethods<js::DebugScopeObject*>::readBarrier
   1.93%  js::jit::SafepointReader::SafepointReader
   1.68%  mozilla::VectorBase<js::SavedFrame::Lookup, 20ul, js::TempAllocPolicy, js::Vector<js::SavedFrame::Lookup, 20ul,
   1.66%  js::SavedStacks::findCachedSavedFrame
   1.63%  js::jit::JitFrameIterator::ionScript
   1.49%  js::SavedFrame::HashPolicy::match
   1.45%  js::WeakMap<js::PreBarriered<JSObject*>, js::RelocatablePtr<JS::Value>, js::DefaultHasher<js::PreBarriered<JSObj
   1.04%  js::jit::RInstruction::readRecoverData
   0.99%  js::ObjectWeakMap::trace
   0.90%  js::SavedStacks::placeSavedFrameInCache
Ok, I fixed the bug that led to way too many cache misses. Going to post new benchmark results today.
Flags: needinfo?(bzbarsky)
Attached file bench-results.txt
About 4x faster when sampling allocations with P=1

About 2x faster when sampling allocations with P=.05

Doesn't seem to give any benefit to the Promise benchmark; perhaps a small (about 8%?) regression in performance there, but the results aren't easy to interpret at a glance. After actually looking at the benchmark code, this makes sense because the code doesn't seem to ever capture two stacks with any of the same frames live during both captures.

Overall, I think we should land this patch series and then whiteboard additional optimization ideas specifically for async stacks.
Attachment #8635196 - Attachment is obsolete: true
Attachment #8635205 - Attachment is obsolete: true
Flags: needinfo?(bzbarsky)
Attachment #8635179 - Attachment is obsolete: true
Attachment #8636250 - Flags: review+
> the code doesn't seem to ever capture two stacks with any of the same frames live during
> both captures

Well, the async bits are not the same.  The sync bits are the same, right?

Not sure what info you're looking for, exactly...
Flags: needinfo?(bzbarsky)
(In reply to Boris Zbarsky [:bz] from comment #40)
> > the code doesn't seem to ever capture two stacks with any of the same frames live during
> > both captures
> 
> Well, the async bits are not the same.  The sync bits are the same, right?

But they are never captured during the same JS activation, AFAICT, so even though the frame bits will get set, we pop the frames and the next time we push the "same" frame we won't have the bits set. Additionally, the cache has the same lifetime as the JS activation.

> Not sure what info you're looking for, exactly...

Me neither! It just seemed you wanted to be kept in the loop given your comments in the other bug.
Attachment #8635187 - Attachment is obsolete: true
Attachment #8636253 - Flags: review?(shu)
Attachment #8636254 - Flags: review?(shu)
Attachment #8636255 - Flags: review?(shu)
Attachment #8636257 - Flags: review?(shu)
Attachment #8636258 - Flags: review?(shu)
Attachment #8636259 - Flags: review?(shu)
Shu, https://bugzilla.mozilla.org/attachment.cgi?id=8636257&action=diff#a/js/src/vm/Stack.h_sec3 has a nice big overview comment about how this caching works, and should hopefully help with review.
> But they are never captured during the same JS activation, AFAICT

Oh, true.  For exception stacks and Promise that will almost always be the case, right?  The only things that will capture stacks multiple times during an activation are presumably devtools...
(In reply to Boris Zbarsky [:bz] from comment #43)
> > But they are never captured during the same JS activation, AFAICT
> 
> Oh, true.  For exception stacks and Promise that will almost always be the
> case, right?  The only things that will capture stacks multiple times during
> an activation are presumably devtools...

Agreed that the common case will be devtools, but many people also allocate Error objects just to access the stack string. This would benefit them too, if they capture stacks multiple times during a single activation.
Comment on attachment 8636253 [details] [diff] [review]
Part 1: Make the InterpreterFrame::Flags enum typed the way it is used

Review of attachment 8636253 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/Stack.h
@@ +278,5 @@
>  
>  class InterpreterFrame
>  {
>    public:
> +    enum Flags: uint32_t {

Style nit: space before :
Attachment #8636253 - Flags: review?(shu) → review+
Comment on attachment 8636254 [details] [diff] [review]
Part 2: Make the BaselineFrame::Flags enum typed the way it is used

Review of attachment 8636254 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/BaselineFrame.h
@@ +33,5 @@
>  // itself.
>  class BaselineFrame
>  {
>    public:
> +    enum Flags: uint32_t {

Style nit: space before :
Attachment #8636254 - Flags: review?(shu) → review+
Comment on attachment 8636255 [details] [diff] [review]
Part 3: Take a bit on each of interpreter, baseline, and rematerialized frames for marking whether there is a js::SavedFrame for the given frame in the js::SavedStacks cache

Review of attachment 8636255 [details] [diff] [review]:
-----------------------------------------------------------------

Cancelling r? because as it stands FrameIter::hasCachedSavedFrame doesn't work. Maybe this is overwritten in a later patch, I'll come back to this patch in that case.

::: js/src/jit/RematerializedFrame.h
@@ +37,5 @@
>  
> +    // If true, this frame has been on the stack when
> +    // |js::SavedStacks::saveCurrentStack| was called, and so there is a
> +    // |js::SavedFrame| object cached for this frame.
> +    bool hasCachedSavedFrame_;

It'd be nice if this flag was propagated to the bailed out frame. It's really easy and should go in CopyFromRematerializedFrame.

::: js/src/vm/Stack.cpp
@@ +473,5 @@
>  
> +bool
> +FrameIter::hasCachedSavedFrame(JSContext* cx, bool* hasCachedSavedFramep)
> +{
> +    if (isIon() && !ensureHasRematerializedFrame(cx))

This doesn't work per our past discussion. This method should be checking the raw Ion frame when there's no usable AbstractFramePtr, no?
Attachment #8636255 - Flags: review?(shu)
Comment on attachment 8636255 [details] [diff] [review]
Part 3: Take a bit on each of interpreter, baseline, and rematerialized frames for marking whether there is a js::SavedFrame for the given frame in the js::SavedStacks cache

Review of attachment 8636255 [details] [diff] [review]:
-----------------------------------------------------------------

I saw that the incorrect hasCachedSavedFrame is indeed overwritten in a later patch.
Attachment #8636255 - Flags: review+
Comment on attachment 8636257 [details] [diff] [review]
Part 5: Minimize stack walking when capturing SavedFrame stacks with a cache

Review of attachment 8636257 [details] [diff] [review]:
-----------------------------------------------------------------

This is pretty nice -- cleaner than I thought it'd be. There're a lot of comments below but nothing major.

::: js/src/jsgc.cpp
@@ +2610,5 @@
>          Debugger::markAll(&trc);
>          Debugger::markIncomingCrossCompartmentEdges(&trc);
>  
>          for (CompartmentsInZoneIter c(zone); !c.done(); c.next()) {
> +            c->trace(&trc);

Why wasn't this necessary before? SavedStacks couldn't be compacted before? Or is this an existing bug?

::: js/src/vm/SavedFrame.h
@@ +8,5 @@
> +#define vm_SavedFrame_h
> +
> +namespace js {
> +
> +class SavedFrame : public NativeObject {

Disclosure here: I didn't read this class very carefully since I assumed it was moved wholesale from SavedStacks.h

@@ +13,5 @@
> +    friend class SavedStacks;
> +
> +  public:
> +    static const Class          class_;
> +    static void finalize(FreeOp* fop, JSObject* obj);

Style nit: kinda weird to have finalize() up with the static fields. Move it below the set of static fields?

::: js/src/vm/SavedStacks.cpp
@@ +82,5 @@
> +    if (!frames->growByUninitialized(1)) {
> +        ReportOutOfMemory(cx);
> +        return false;
> +    }
> +    new (&frames->back()) Entry(key, pc, savedFrame);

I recall you implementing emplaceBack for this case. :)
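
I.e., something like this (sketch, assuming emplaceBack has the usual fallible
Vector shape):

  // Constructs the Entry in place and fails the same way growByUninitialized
  // does on OOM.
  if (!frames->emplaceBack(key, pc, savedFrame)) {
      ReportOutOfMemory(cx);
      return false;
  }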

@@ +104,5 @@
> +
> +    Maybe<Key> maybeKey = getKey(frameIter);
> +    MOZ_ASSERT(maybeKey.isSome());
> +
> +    Key key(maybeKey.ref());

Style nit: use *maybeKey instead of .ref()

@@ +120,5 @@
> +
> +    if (!frame) {
> +        frames->clear();
> +        return;
> +    }

Since we know frameIter.hasCachedSavedFrame(), is the only scenario where we would not find a matching frame when frames->length() == 1 and there's a pc mismatch? If so, would be nice to assert the tighter conditions.

@@ +127,5 @@
> +
> +    if (frame->compartment() != cx->compartment()) {
> +        frame.set(nullptr);
> +        numberStillValid--;
> +    }

I don't know enough about the SavedStack API to say if this is right. It's designed to never hand out frames in compartments we haven't entered?

@@ +1041,5 @@
> +        // The bit set means that the next older parent (frame, pc) pair *must*
> +        // be in the cache.
> +        if (maxFrameCount == 0 &&
> +            !iter.isAsmJS() &&
> +            (iter.hasUsableAbstractFramePtr() || iter.isPhysicalIonFrame()))

Along with the suggestion below to make hasCachedSavedFrame() to just return false for asm frames, I would remove the iter checks.

@@ +1052,5 @@
>          if (!stackChain->growByUninitialized(1)) {
>              ReportOutOfMemory(cx);
>              return false;
>          }
>          new (&stackChain->back()) SavedFrame::Lookup(

emplaceBack here too.

@@ +1084,5 @@
> +                return false;
> +            cache->find(cx, iter, &cachedFrame);
> +            if (cachedFrame)
> +                break;
> +        }

If I'm reading this right, the youngest frame is never looked up in the cache. Is this because it's always expected to fail due to pc mismatch? ISTM it doesn't hurt to try to find even the youngest frame.

@@ +1111,5 @@
>              return false;
> +
> +        if (maxFrameCount == 0 && lookup->cacheKey && parentFrame != cachedFrame) {
> +            auto* cache = lookup->activation->savedFrameCache(cx);
> +            if (!cache || !cache->insert(cx, lookup->cacheKey.ref(), lookup->pc, parentFrame))

Style nit: as above, use *lookup->cacheKey instead of .ref()

::: js/src/vm/Stack-inl.h
@@ +1017,5 @@
>  
> +inline bool
> +FrameIter::hasCachedSavedFrame() const
> +{
> +    MOZ_ASSERT(!isAsmJS());

I don't think hasCachedSavedFrame needs to be so restrictive on the precondition. It could just return false for isAsmJS, given that getKey can already return Nothing().
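
I.e., at the top of the method, something like (sketch):

  // Instead of asserting, report "no cached SavedFrame" for asm.js frames,
  // mirroring how getKey() can already return Nothing().
  if (isAsmJS())
      return false;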

::: js/src/vm/Stack.h
@@ +1171,5 @@
> +//
> +// We have a SavedFrameCache for each activation to minimize the number of
> +// entries that must be scanned through, and to avoid the headaches of
> +// maintaining a cache for each compartment and invalidating stale cache entries
> +// in the presence of cross-compartment calls.

Great comment and example.

@@ +1172,5 @@
> +// We have a SavedFrameCache for each activation to minimize the number of
> +// entries that must be scanned through, and to avoid the headaches of
> +// maintaining a cache for each compartment and invalidating stale cache entries
> +// in the presence of cross-compartment calls.
> +class SavedFrameCache : public JS::StaticTraceable

I think we should rename this. SavedStacks is already a cache for SavedFrames. Maybe LiveSavedFrameCache? SavedFrameActivationCache?

@@ +1180,5 @@
> +
> +  private:
> +    struct Entry
> +    {
> +        Key                         key;

This is not really the key -- it's just the frame pointer. The logical key is both the pointer and the pc. I would just rename to framePtr or something.

@@ +1185,5 @@
> +        jsbytecode*                 pc;
> +        RelocatablePtr<SavedFrame*> savedFrame;
> +
> +        Entry(Key& key, jsbytecode* pc, SavedFrame* savedFrame)
> +            : key(key)

Style nit: initializer list ':' is half-indented in SM.

@@ +1201,5 @@
> +  public:
> +    explicit SavedFrameCache() : frames(nullptr) { }
> +
> +    SavedFrameCache(SavedFrameCache&& rhs)
> +        : frames(rhs.frames)

Style nit: initializer list ':' is half-indented in SM.

@@ +1205,5 @@
> +        : frames(rhs.frames)
> +    {
> +        MOZ_ASSERT(this != &rhs, "self-move disallowed");
> +        rhs.frames = nullptr;
> +        rhs.~SavedFrameCache();

Why explicitly call the destructor here?
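
Nulling out rhs.frames already leaves rhs safely destructible, so the usual
pattern would just be (sketch):

  SavedFrameCache(SavedFrameCache&& rhs)
    : frames(rhs.frames)
  {
      MOZ_ASSERT(this != &rhs, "self-move disallowed");
      // rhs is left empty; its destructor runs normally later, sees
      // frames == nullptr, and has nothing to delete.
      rhs.frames = nullptr;
  }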

@@ +1220,5 @@
> +    bool init(JSContext* cx) {
> +        frames = js_new<EntryVector>();
> +        if (!frames)
> +            JS_ReportOutOfMemory(cx);
> +        return initialized();

Would be easier to read if we just |return true| here.

@@ +1366,5 @@
>      JSString* asyncCause() {
>          return asyncCause_;
>      }
>  
> +    inline SavedFrameCache* savedFrameCache(JSContext* cx);

Since this is fallible, I'd rename to getSavedFrameCache.
Attachment #8636257 - Flags: review?(shu) → review+
Attachment #8636258 - Flags: review?(shu) → review+
Attachment #8636259 - Flags: review?(shu) → review+
Attachment #8636250 - Attachment is obsolete: true
Attachment #8639495 - Flags: review+
Attachment #8636259 - Attachment is obsolete: true
Attachment #8639503 - Flags: review+
Fixed the assertion failures, and got a green try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e2a28fc2e85a

Relanding.
Flags: needinfo?(nfitzgerald)