Last Comment Bug 783147 - Collect telemetry to measure the effectiveness of bug 780960
: Collect telemetry to measure the effectiveness of bug 780960
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: JavaScript Engine (show other bugs)
: unspecified
: All All
: -- normal (vote)
: mozilla17
Assigned To: Bill McCloskey (:billm)
: general
Mentors:
Depends on:
Blocks: 780960
  Show dependency treegraph
 
Reported: 2012-08-15 19:25 PDT by Bill McCloskey (:billm)
Modified: 2012-09-19 01:33 PDT (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch to implement Tarjan's algorithm (4.85 KB, patch)
2012-08-15 19:26 PDT, Bill McCloskey (:billm)
no flags Details | Diff | Review
telemetry patch (12.45 KB, patch)
2012-08-15 19:27 PDT, Bill McCloskey (:billm)
continuation: review+
Details | Diff | Review
patch to implement Tarjan's algorithm, v2 (4.86 KB, patch)
2012-08-15 19:30 PDT, Bill McCloskey (:billm)
continuation: review+
Details | Diff | Review

Description Bill McCloskey (:billm) 2012-08-15 19:25:35 PDT
Sweeping by compartments would be a fair amount of work, so we should only do it if it significant reduces sweeping times. This patch tries to measure the effect by computing the strongly connected components of the graph of compartment dependencies. Then it measures the total time spent sweeping each compartment, and it records the SCC that takes the longest time to sweep. We should be able to compare this against the total time to do all the sweeping. (Note that this patch measures sweeping a little differently than GC_SWEEP_MS, because it excludes some aspects of sweeping.)

I ran Gregor's membench test with this patch. The sweep time went up to ~180ms, but the worst SCC never took more than 15ms to sweep. So it seems pretty effective at first blush.
Comment 1 Bill McCloskey (:billm) 2012-08-15 19:26:24 PDT
Created attachment 652294 [details] [diff] [review]
patch to implement Tarjan's algorithm

This first patch implements Tarjan's algorithm to find the SCCs.
Comment 2 Bill McCloskey (:billm) 2012-08-15 19:27:09 PDT
Created attachment 652295 [details] [diff] [review]
telemetry patch

This patch collects the sweep times and stores them in a telemetry histogram.
Comment 3 Bill McCloskey (:billm) 2012-08-15 19:30:21 PDT
Created attachment 652296 [details] [diff] [review]
patch to implement Tarjan's algorithm, v2

Oops. qref!
Comment 4 David Mandelin [:dmandelin] 2012-08-16 12:23:33 PDT
Tarjan is everywhere, huh? :-) I didn't even know that algorithm existed--I only knew what Wikipedia calls "Kosaraju's algorithm".
Comment 5 Andrew McCreight [:mccr8] 2012-08-16 12:25:58 PDT
I actually implemented the Cheriyan-Mehlhorn-Gabow algorithm, which is similar, in bug 653191, for some prototype CC work that never landed (because it wasn't any faster).  I'll have to re-read up on this stuff...
Comment 6 Bill McCloskey (:billm) 2012-08-16 12:28:11 PDT
(In reply to David Mandelin [:dmandelin] from comment #4)
> Tarjan is everywhere, huh? :-) I didn't even know that algorithm existed--I
> only knew what Wikipedia calls "Kosaraju's algorithm".

Yeah, me too. Tarjan's algorithm is nice because it works in a single pass. Eventually we'll need to run this "online" (keeping track of edge changes between slices) and I think Tarjan will be easier to adapt.
Comment 7 Andrew McCreight [:mccr8] 2012-08-16 14:46:10 PDT
If you want to find out about various SCC algorithms and how to implement them efficiently for large graphs, I recommend the paper "Engineering DFS-Based Graph Algorithms" by Mehlhorn, Naher and Sanders.  I have a copy sitting around if you want to borrow it.
Comment 8 Bill McCloskey (:billm) 2012-08-16 14:54:49 PDT
That sounds interesting, but hopefully our graphs aren't *that* big :-).
Comment 9 Andrew McCreight [:mccr8] 2012-08-16 17:57:25 PDT
Comment on attachment 652296 [details] [diff] [review]
patch to implement Tarjan's algorithm, v2

Review of attachment 652296 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jscompartment.h
@@ +291,5 @@
>       */
>      size_t                       gcMallocAndFreeBytes;
>      size_t                       gcTriggerMallocAndFreeBytes;
>  
> +    unsigned                     index;

"index" is too generic of a name for a field of JSCompartment.  Maybe sccIndex.

::: js/src/jsgc.cpp
@@ +3471,5 @@
>      rt->gcIncrementalState = state;
>  }
>  #endif
>  
> +class PartitionCompartments

Should this be PartitionUnmarkedCompartments or something?  You are only considering edges where the target is white or gray.

@@ +3479,5 @@
> +
> +    static const Node Undefined = Node(-1);
> +
> +    JSRuntime *runtime;
> +    NodeVector indexes, lowlinks, stack, sccs, onstack;

nit: a better capitalization would be lowLinks and onStack maybe?

@@ +3480,5 @@
> +    static const Node Undefined = Node(-1);
> +
> +    JSRuntime *runtime;
> +    NodeVector indexes, lowlinks, stack, sccs, onstack;
> +    Node clock, nextSCC;

nit: maybe |currIndex| instead of |clock|

@@ +3493,5 @@
> +    void partition();
> +    unsigned getSCC(JSCompartment *comp) { return failed() ? 0 : sccs[comp->index]; }
> +};
> +
> +const PartitionCompartments::Node PartitionCompartments::Undefined;

Does this declaration do anything?

@@ +3514,5 @@
> +        runtime->compartments[v]->index = v;
> +        indexes.infallibleAppend(Undefined);
> +        lowlinks.infallibleAppend(Undefined);
> +        sccs.infallibleAppend(Undefined);
> +        onstack.infallibleAppend(false);

onstack is a vector of nodes, but you are putting bools on it.

@@ +3528,5 @@
> +        fail();
> +        return;
> +    }
> +
> +    indexes[v] = clock;

I assume vectors bounds checked such that they will at least assert if you go outside their current length via indexing? If not there needs to be some assert here about v being valid.

@@ +3537,5 @@
> +
> +    JSCompartment *comp = runtime->compartments[v];
> +
> +    for (WrapperMap::Enum e(comp->crossCompartmentWrappers); !e.empty(); e.popFront()) {
> +        Cell *other = e.front().key.wrapped;

Wasn't there some weirdness with CCWs to strings? Does that matter here? If I'm misremembering that just ignore me...

@@ +3540,5 @@
> +    for (WrapperMap::Enum e(comp->crossCompartmentWrappers); !e.empty(); e.popFront()) {
> +        Cell *other = e.front().key.wrapped;
> +        if (other->isMarked(BLACK) && !other->isMarked(GRAY))
> +            continue;
> +        Node w = other->compartment()->index;

the blank line would maybe make more sense before the w and not after, but maybe that's just me.

@@ +3554,5 @@
> +    if (lowlinks[v] == indexes[v]) {
> +        Node w;
> +        do {
> +            w = stack.back();
> +            stack.popBack();

You can use popCopy() instead of back() + popBack().
Comment 10 Andrew McCreight [:mccr8] 2012-08-16 18:26:28 PDT
Comment on attachment 652295 [details] [diff] [review]
telemetry patch

Review of attachment 652295 [details] [diff] [review]:
-----------------------------------------------------------------

Pretty cool.

::: js/src/gc/Statistics.cpp
@@ +353,5 @@
>      ss.appendNumber("Compartments Collected", "%d", "", collectedCount);
>      ss.appendNumber("Total Compartments", "%d", "", compartmentCount);
>      ss.appendNumber("MMU (20ms)", "%d", "%", int(mmu20 * 100));
>      ss.appendNumber("MMU (50ms)", "%d", "%", int(mmu50 * 100));
> +    ss.appendDecimal("SCC Sweep Total", "ms", t(sccTotal));

As I say later, this could probably be "Compartment Sweep Total".  Well, you actually already compute and report that elsewhere, don't you?  Is this number going to be different?

::: js/src/jsfriendapi.h
@@ +84,5 @@
>      JS_TELEMETRY_GC_MMU_50,
>      JS_TELEMETRY_GC_RESET,
>      JS_TELEMETRY_GC_INCREMENTAL_DISABLED,
> +    JS_TELEMETRY_GC_NON_INCREMENTAL,
> +    JS_TELEMETRY_GC_SCC_SWEEP_TOTAL_MS,

I complain about the TOTAL name elsewhere.

::: toolkit/components/telemetry/TelemetryHistograms.h
@@ +61,5 @@
>  HISTOGRAM(GC_MMU_50, 1, 100, 20, LINEAR, "Minimum percentage of time spent outside GC over any 50ms window")
>  HISTOGRAM_BOOLEAN(GC_RESET, "Was an incremental GC canceled?")
>  HISTOGRAM_BOOLEAN(GC_INCREMENTAL_DISABLED, "Is incremental GC permanently disabled?")
>  HISTOGRAM_BOOLEAN(GC_NON_INCREMENTAL, "Was the GC non-incremental?")
> +HISTOGRAM(GC_SCC_SWEEP_TOTAL_MS, 1, 500, 50, LINEAR, "Time spent marking compartment SCCs (ms)")

The description should be sweeping, not marking, for both of these. Also, isn't the total just the total time spent sweeping compartments? No need to mention SCCs in either the histogram name or the description.  GC_COMPARTMENT_SWEEP_TOTAL_MS?

500ms seems like an optimistic upper limit, but you have better data, with your GC addon, than me.
Comment 11 Bill McCloskey (:billm) 2012-08-16 23:23:52 PDT
You're right about string wrappers--they should be skipped.

Instead of some of the renamings, I'd rather just add comments. The extra declaration for Undefined is required by GCC for some weird reason that I don't understand.

There is a difference between compartment sweep time and SCC sweep time. The latter includes the time to do foreground sweeping of objects/scripts/strings in the compartment while the former does not. I agree that SCC sweep time is not a very good name, but I don't think we'll find anything sufficiently short that really conveys the right idea. So I chose to give it an overly technical name so that no one would mistake it for something else :-).
Comment 12 Andrew McCreight [:mccr8] 2012-08-17 07:01:37 PDT
(In reply to Bill McCloskey (:billm) from comment #11)
> Instead of some of the renamings, I'd rather just add comments.

Sounds reasonable to me.

> The extra
> declaration for Undefined is required by GCC for some weird reason that I
> don't understand.

Fun.

> There is a difference between compartment sweep time and SCC sweep time. The
> latter includes the time to do foreground sweeping of
> objects/scripts/strings in the compartment while the former does not. I
> agree that SCC sweep time is not a very good name, but I don't think we'll
> find anything sufficiently short that really conveys the right idea. So I
> chose to give it an overly technical name so that no one would mistake it
> for something else :-).

Ah, ok.  I didn't realize.
Comment 14 Bill McCloskey (:billm) 2012-08-17 13:01:23 PDT
I backed this out because it showed the same weird plugin failure as on my try run.
https://hg.mozilla.org/integration/mozilla-inbound/rev/bbb5e18e6c0a
Comment 15 Bill McCloskey (:billm) 2012-08-17 18:47:15 PDT
The plugin failure happened again, so it wasn't my fault!
https://hg.mozilla.org/integration/mozilla-inbound/rev/c694e2557b1e
Comment 16 Ryan VanderMeulen [:RyanVM] 2012-08-18 04:26:09 PDT
https://hg.mozilla.org/mozilla-central/rev/c694e2557b1e

Note You need to log in before you can comment on or make changes to this bug.