Closed Opened 8 years ago Closed 8 years ago

# [jsdbg2] Debugger.Memory allocation log and related functions should support sampling, not just full logging

Not set
normal

RESOLVED FIXED
mozilla35

## Attachments

### (2 files, 7 obsolete files)

 18.57 KB, patch fitzgen : review+ Details | Diff | Splinter Review 905 bytes, patch jimb : review+ Details | Diff | Splinter Review
Debugger.Memory can log every allocation (see Debugger.Memory.prototype.drainAllocationsLog), and we want to add the ability to merely count allocations. However, for very allocation-intensive workloads, this will probably be too much information. Debugger.Memory should allow its client to request that only a sampling of the allocations be recorded.

The simplest behavior to implement would be to sample every Nth allocation, where N is chosen by Debugger.Memory's user. However, this approach can be fooled by harmonics: the process being observed may happen to have periodic behavior whose period is an even multiple or divisor of N, causing those behaviors to be oversampled. An alternative is to give each event a likelihood P of being sampled.
Blocks: 1059139
We probably want to have Debugger.Memory.prototype.takeCensus take an options object as an argument, akin to TestingFunctions.cpp's 'evaluate':

https://hg.mozilla.org/mozilla-central/file/f7a27a866c47/js/src/shell/js.cpp#l4374

because we know we're going to want to add other arguments to takeCensus to configure what we need.
> The simplest behavior to implement would be to sample every Nth allocation,
> where N is chosen by Debugger.Memory's user. However, this approach can be
> fooled by harmonics: the process being observed may happen to have periodic
> behavior whose period is an even multiple or divisor of N, causing those
> behaviors to be oversampled. An alternative is to give each event a
> likelihood P of being sampled.

Another approach came to my mind is sampling along size. For example:

void *alloc(size_t size) {
...
static size_t cumulative = 0;
cumulative += size;
if (cumulative >= SAMPLE_SIZE) {
sample(cumulative / SAMPLE_SIZE);
cumulative %= SAMPLE_SIZE;
}
...
}

So that the probability of an object being sampled is proportional to its size. By choosing SAMPLE_SIZE carefully, e.g. a prime, the problem of aliasing can be somewhat eased.

> because we know we're going to want to add other arguments to takeCensus to
> configure what we need.

For parameters to sampling, perhaps we also need a way to pass them when starting allocation tracking?

By the way, it would be great if we can have 1) allocation and deallocation events and 2) timestamps of events also recorded. I'm not sure but it looks like that timestamps can be useful.

By having (sampled) allocation events, we can show how fast a function or call path is allocating. This can help identifying most GC-demanding codes and people may find that a design problem or consider rewriting them in ArrayBuffers.

Having deallocation events enables us to calculate the peak memory usage of every function and call path. Again, I'm not sure but looks useful.
Assignee: nobody → nfitzgerald
Status: NEW → ASSIGNED
(In reply to Ting-Yuan Huang from comment #2)
> Having deallocation events enables us to calculate the peak memory usage of
> every function and call path.

I'm not sure that de-allocation events would enable peak memory usage per function/call-path calculations.

The GC is free to reclaim an object anytime after it becomes unreachable, there is nothing that stipulates it must do so immediately once an object is no longer reachable. To get the level of granularity where we could reconstruct peak memory usage on a per function basis, we would have to instrument updates of references to know exactly when a given object becomes garbage. However, one of a tracing GC's main advantages over refcounting is how individual updates of references have little to no overhead. In practice, when an object is reclaimed by the GC, it has probably been unreachable for some time, because collection is batched.
Depends on: 1065623
Attached patch sampling-allocations.patch (obsolete) — Splinter Review
WIP that seems to work (but hasn't been tested or anything yet).

TODO:

* Use our own seed for the PRNG, rather than sharing with Math.random and friends, so we can

* Add a shell testing function to set the seed, so we can write deterministic tests.

* Write a follow up patch to combine onLogAllocationSite and slowPathOnLogAllocationSite and just accept the only Debugger instance that can be tracking allocation sites for the current compartment.

* Measure and verify whether allocationSamplingProbability = 0.1 is about 1/10 the overhead of allocationSamplingProbability = 1.0. If it is greater than 1/10 overhead, we might want to stop generating a random number on every allocation and instead only generate one per allocation we actually sample and then use that to determine how many samples we need to skip before we take another sample.
(In reply to Nick Fitzgerald [:fitzgen] from comment #3)
> The GC is free to reclaim an object anytime after it becomes unreachable,
> there is nothing that stipulates it must do so immediately once an object is
> no longer reachable. To get the level of granularity where we could
> reconstruct peak memory usage on a per function basis, we would have to
> instrument updates of references to know exactly when a given object becomes
> garbage. However, one of a tracing GC's main advantages over refcounting is
> how individual updates of references have little to no overhead. In
> practice, when an object is reclaimed by the GC, it has probably been
> unreachable for some time, because collection is batched.

Sorry, I should have describe it more clearly. The peak is defined over memory usages where the delayed-free nature of of a mark-sweep collector is already counted in. I'm not quite sure but it seems that the free event can be logged in the finalizer of objects which are sampled. In this way, the memory usage, which depends on the underlying garbage collector, is reflected instead of the real-time size of the set reachable. Please find the example below.

> I'm not sure that de-allocation events would enable peak memory usage per
> function/call-path calculations.

Memory usages (who holds how many bytes at where) can be reconstructed by full allocation/free history and approximated by sampled ones.

For example, bytes are allocated (event A) and later get collected (event C):

A: (0x1000, 32bytes), A: (0x2000, 32bytes), (lose ref to 0x1000), A(0x3000, 64bytes), (lose ref to 0x2000), C: (0x1000), C: (0x2000), A: (0x1000)

Then the peak defined in the above is 128bytes, although the peak of the real-time reachable set is 64 bytes.
> although the peak of the real-time reachable set is 64 bytes.

Sorry, it's 96.
Comment on attachment 8487624 [details] [diff] [review]
sampling-allocations.patch

Review of attachment 8487624 [details] [diff] [review]:
-----------------------------------------------------------------

Just one nit.

::: js/src/vm/DebuggerMemory.cpp
@@ +247,5 @@
>
> +/* static */ bool
> +DebuggerMemory::getAllocationSamplingProbability(JSContext *cx, unsigned argc, Value *vp)
> +{
> +    THIS_DEBUGGER_MEMORY(cx, argc, vp, "(set allocationSamplingProbability)", args, memory);

I think the string here should say "get" rather than "set".
(In reply to Ting-Yuan Huang from comment #5)
> (In reply to Nick Fitzgerald [:fitzgen] from comment #3)
> > I'm not sure that de-allocation events would enable peak memory usage per
> > function/call-path calculations.
>
> Memory usages (who holds how many bytes at where) can be reconstructed by
> full allocation/free history and approximated by sampled ones.
>
> For example, bytes are allocated (event A) and later get collected (event C):
>
> A: (0x1000, 32bytes), A: (0x2000, 32bytes), (lose ref to 0x1000), A(0x3000,
> 64bytes), (lose ref to 0x2000), C: (0x1000), C: (0x2000), A: (0x1000)
>
> Then the peak defined in the above is 128bytes, although the peak of the
> real-time reachable set is 64 bytes.

If you're not looking for the live, reachable size (which really requires something like Debugger.Memory.prototype.takeCensus), I think you might be able to use a couple of observer notifications that are already implemented:

https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Internals/GC/Statistics_API
(In reply to Ting-Yuan Huang from comment #5)
> Sorry, I should have describe it more clearly. The peak is defined over
> memory usages where the delayed-free nature of of a mark-sweep collector is
> already counted in. I'm not quite sure but it seems that the free event can
> be logged in the finalizer of objects which are sampled. In this way, the
> memory usage, which depends on the underlying garbage collector, is
> reflected instead of the real-time size of the set reachable. Please find
> the example below.

SpiderMonkey actually segregates objects with finalizers from objects without, because there are valuable optimizations available with the latter group:

- Our generational GC can allocate in the nursery only finalizer-free JSObjects, because after collecting the nursery and tenuring any live objects, we simply reset the nursery's allocation pointer back to the beginning, and declare everything in it to be free space, without touching the individual dead objects at all.

- Even in the tenured heap, objects without finalizers can be allocated in arenas where the GC's sweep phase is done in a worker thread, to reduce latency.

In both cases, we could get counts (by counting tenuring from the nursery, and receiving reports from the worker thread when sweeping is complete), but it would take some work.

The fellow who implemented much of Firefox's existing memory accounting machinery (like about:memory), Nicolas Nethercote (njn), has told us that using running counters to track total memory consumption usually doesn't work out too well, because the counters always become inaccurate over time, for some reason or another. Hence, about:memory actually iterates over everything and produces a fresh count each time.

I think our counters will be good enough for identifying hot allocation sites, but if we want to track ongoing total usage, I think we should use other means.
Attached patch sampling-allocations.patch (obsolete) — Splinter Review
Second WIP.

Stopped generating a random number on every allocation and instead calculate how many allocations to skip until we take the next sample. Verified that when P=0.1 we are in fact only adding about 10% of the overhead that P=1 adds. Yay!

Still need to:

* Use our own seed for the PRNG, rather than sharing with Math.random and friends, so we can

* Add a shell testing function to set the seed, so we can

* Write deterministic tests

* Write a follow up patch to combine onLogAllocationSite and slowPathOnLogAllocationSite and just accept the only Debugger instance that can be tracking allocation sites for the current compartment.
Attachment #8487624 - Attachment is obsolete: true
Blocks: 1066361
Comment on attachment 8488272 [details] [diff] [review]
sampling-allocations.patch

Review of attachment 8488272 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/SavedStacks.cpp
@@ +772,5 @@
> +        // expensive), we calculate the number of allocations to skip before
> +        // taking the next sample.
> +        double notSamplingProb = 1.0 - allocationTrackingDbg->allocationSamplingProbability;
> +        stacks.allocationSkipCount = floor(log(math_random_no_outparam(cx)) /
> +                                           log(notSamplingProb));

What is the distribution function here?  I afraid the mean value here is not a equivalence of the specified probability.  Maybe you should use one of exponential or normal or other distribution.  They are easy to be found on the wikipedia or google.
(In reply to Nick Fitzgerald [:fitzgen] from comment #8)
> (In reply to Ting-Yuan Huang from comment #5)
> > (In reply to Nick Fitzgerald [:fitzgen] from comment #3)
> > > I'm not sure that de-allocation events would enable peak memory usage per
> > > function/call-path calculations.
> >
> > Memory usages (who holds how many bytes at where) can be reconstructed by
> > full allocation/free history and approximated by sampled ones.
> >
> > For example, bytes are allocated (event A) and later get collected (event C):
> >
> > A: (0x1000, 32bytes), A: (0x2000, 32bytes), (lose ref to 0x1000), A(0x3000,
> > 64bytes), (lose ref to 0x2000), C: (0x1000), C: (0x2000), A: (0x1000)
> >
> > Then the peak defined in the above is 128bytes, although the peak of the
> > real-time reachable set is 64 bytes.
>
> If you're not looking for the live, reachable size (which really requires
> something like Debugger.Memory.prototype.takeCensus), I think you might be
> able to use a couple of observer notifications that are already implemented:
>
>
> https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/
> Internals/GC/Statistics_API

We need to know not only the size of objects being freed, but also what objects are freed.  With that, we account memory usage for per-function.
(In reply to Thinker Li [:sinker] from comment #11)
> Comment on attachment 8488272 [details] [diff] [review]
> sampling-allocations.patch
>
> Review of attachment 8488272 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: js/src/vm/SavedStacks.cpp
> @@ +772,5 @@
> > +        // expensive), we calculate the number of allocations to skip before
> > +        // taking the next sample.
> > +        double notSamplingProb = 1.0 - allocationTrackingDbg->allocationSamplingProbability;
> > +        stacks.allocationSkipCount = floor(log(math_random_no_outparam(cx)) /
> > +                                           log(notSamplingProb));
>
> What is the distribution function here?  I afraid the mean value here is not
> a equivalence of the specified probability.  Maybe you should use one of
> exponential or normal or other distribution.  They are easy to be found on

It's been a little while since I've done much stats, but:

P = the probability we sample any given event.

~P = 1-P, the probability we don't sample a given event.

(~P)^n = the probability we don't sample the next n events.

let X = random between 0 and 1.

log base ~P of X = n, aka the number of events we should skip until we take the next sample.

This should still give every event the same probability P of being sampled and should result in the same distribution as taking a random number every time to determine whether to sample the current event.

I should probably add this to the comment describing what we are doing :)
(In reply to Thinker Li [:sinker] from comment #12)
> We need to know not only the size of objects being freed, but also what
> objects are freed.  With that, we account memory usage for per-function.

We could get accurate free counts by disabling generational GC and off-thread sweeping, or by adding instrumentation as I suggest in comment 9.

But we could also get accurate usage counts with a census, and that would give us retained sizes, which are much more meaningful.
Attached patch sampling-allocations.patch (obsolete) — Splinter Review
Ok this has deterministic tests now! Woo!

Try push: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=a61ecc077771

Next up: that patch cleaning up onLogAllocationSite.
Attachment #8488272 - Attachment is obsolete: true
Attachment #8488877 - Flags: review?(jimb)
Part 2 to clean up Debugger::onLogAllocationSite stuff.

Try push: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=c58199fec699
Attachment #8488884 - Flags: review?(jimb)
(In reply to Nick Fitzgerald [:fitzgen] from comment #13)
> (In reply to Thinker Li [:sinker] from comment #11)
> > Comment on attachment 8488272 [details] [diff] [review]
> > sampling-allocations.patch
> >
> > Review of attachment 8488272 [details] [diff] [review]:
> > -----------------------------------------------------------------
> >
> > ::: js/src/vm/SavedStacks.cpp
> > @@ +772,5 @@
> > > +        // expensive), we calculate the number of allocations to skip before
> > > +        // taking the next sample.
> > > +        double notSamplingProb = 1.0 - allocationTrackingDbg->allocationSamplingProbability;
> > > +        stacks.allocationSkipCount = floor(log(math_random_no_outparam(cx)) /
> > > +                                           log(notSamplingProb));
> >
> > What is the distribution function here?  I afraid the mean value here is not
> > a equivalence of the specified probability.  Maybe you should use one of
> > exponential or normal or other distribution.  They are easy to be found on
> > the wikipedia or google.
>
> It's been a little while since I've done much stats, but:
>
> P = the probability we sample any given event.
>
> ~P = 1-P, the probability we don't sample a given event.
>
> (~P)^n = the probability we don't sample the next n events.
>
> let X = random between 0 and 1.
>
> log base ~P of X = n, aka the number of events we should skip until we take
> the next sample.

Hey! What you got is an probability density function (pdf), it it not a right random number generator.
What you describe is exactly exponential distribution.  A right random number generator of exponential distribution could be found in wikipedia.(http://en.wikipedia.org/wiki/Exponential_distribution#Generating_exponential_variates)

So, please check |Probability density function| section of the page, you would know exactly what I am talking.
For every event having an independent possibility of being sampled, it is a classical example of Poisson Process, a discrete version.  The possibility of sampling/or not sampling in N events is what described as exponential distribution.  So, Poisson and Exponentail Distribution are talking the same thing, but focus on different attributes.

It is not easy to understand, so I suggest to use an existing distribution function; Exponential, Normal Distribution, ... etc.  Use their generator functions, instead of inventing by yourself.
(In reply to Thinker Li [:sinker] from comment #18)
> For every event having an independent possibility of being sampled, it is a
> classical example of Poisson Process, a discrete version.  The possibility
> of sampling/or not sampling in N events is what described as exponential
> distribution.  So, Poisson and Exponentail Distribution are talking the same
> thing, but focus on different attributes.

Since you're worried, now I'm worried. But I believe the following JS models what we're doing in the C++, and the plots sure look the same:

$cat samples.js function plotHist(h) { for (var i = 0; i < h.length; i++) { if (typeof h[i] != 'number') h[i] = 0; bar = '' + i; while (bar.length < 6) bar += ' '; bar += '|'; for (var c = 0; c < h[i]; c+=10) bar += '='; print(bar); } } function countRuns(n, sampler, hist) { // var log = ''; var run = 0; while (n-- > 0) { var take = sampler(); // log += take ? '*' : '-'; if (take) { if (typeof hist[run] != 'number') hist[run] = 0; hist[run]++; run = 0; } else { run++; } } // print(log); } function makeDirectSampler(p) { return function () { return Math.random() < p; }; } function makeCountingSampler(p) { var logNotP = Math.log(1-p); function chooseSkip() { return Math.floor(Math.log(Math.random()) / logNotP); } var skip = chooseSkip(); return function () { if (skip > 0) { skip--; return false; } skip = chooseSkip(); return true; } } var hist; hist = []; countRuns(100000, makeDirectSampler(.1), hist); plotHist(hist); print('\n\n\n'); hist = []; countRuns(100000, makeCountingSampler(.1), hist); plotHist(hist);$ ./mem/js/src/obj~/js/src/js -f samples.js
0     |===============================================================================================
1     |========================================================================================
2     |===================================================================================
3     |============================================================================
4     |=====================================================================
5     |===========================================================
6     |=======================================================
7     |===============================================
8     |=========================================
9     |========================================
10    |====================================
11    |===================================
12    |===========================
13    |==========================
14    |=========================
15    |===================
16    |===================
17    |================
18    |===============
19    |================
20    |===========
21    |==========
22    |===========
23    |=========
24    |=========
25    |======
26    |=======
27    |=======
28    |=======
29    |=====
30    |====
31    |====
32    |====
33    |====
34    |====
35    |===
36    |==
37    |===
38    |==
39    |=
40    |=
41    |==
42    |==
43    |==
44    |==
45    |==
46    |=
47    |==
48    |=
49    |=
50    |=
51    |=
52    |=
53    |=
54    |=
55    |=
56    |=
57    |=
58    |=
59    |=
60    |=
61    |=
62    |=
63    |
64    |=
65    |=
66    |=
67    |=
68    |=
69    |=
70    |
71    |=
72    |=
73    |
74    |
75    |
76    |=
77    |=
78    |
79    |=

0     |===================================================================================================
1     |=========================================================================================
2     |================================================================================
3     |==============================================================================
4     |===================================================================
5     |=============================================================
6     |================================================
7     |=============================================
8     |==========================================
9     |=======================================
10    |=================================
11    |================================
12    |===========================
13    |==========================
14    |==========================
15    |======================
16    |=====================
17    |=================
18    |================
19    |===============
20    |=============
21    |==============
22    |===========
23    |=========
24    |==========
25    |========
26    |======
27    |=======
28    |======
29    |=====
30    |=====
31    |===
32    |====
33    |====
34    |===
35    |==
36    |==
37    |==
38    |==
39    |===
40    |==
41    |=
42    |==
43    |==
44    |==
45    |=
46    |=
47    |=
48    |=
49    |=
50    |=
51    |=
52    |=
53    |=
54    |=
55    |=
56    |=
57    |=
58    |=
59    |=
60    |=
61    |=
62    |
63    |=
64    |
65    |
66    |=
67    |
68    |
69    |
70    |
71    |
72    |
73    |
74    |=
75    |=
76    |
77    |
78    |
79    |
80    |=
81    |
82    |
83    |
84    |
85    |
86    |
87    |
88    |
89    |
90    |
91    |=
92    |
93    |
94    |
95    |
96    |
97    |
98    |
99    |
100   |
101   |
102   |
103   |
104   |
105   |
106   |
107   |
108   |
109   |
110   |
111   |
112   |
113   |
114   |=
\$
see rule 43 of http://www.mathsisfun.com/calculus/integration-rules.html

1. the mean value of your RNG == integral(log(X,~P)|X=0~1) == (ln(1)-1)/ln(~P)
2. (ln(1)-1)/ln(~P) != 1/P   right?! (P == 1 - ~P)

Since P is your sample-rate, the mean value of your RNG should be 1/P.  That means if P == 0.2, the mean value of your RNG should be 5.   If P == 0.1, the mean value should be 10.  Obviously, your RNG doesn't comply with this rule.
(In reply to Thinker Li [:sinker] from comment #20)
> Since P is your sample-rate, the mean value of your RNG should be 1/P.  That
> means if P == 0.2, the mean value of your RNG should be 5.   If P == 0.1,
> the mean value should be 10.

I edited my script above to measure random run lengths. Their average is indeed 1/P, for both the trivial P sampler and the skip-count-generating sampler.

What we want isn't an exponential distribution, but the discrete version of that, a geometric distribution: http://en.wikipedia.org/wiki/Geometric_distribution

The formula we're using to produce random numbers with a geometric distribution, given a random variable uniformly distributed across [0, 1], appears towards the bottom of this section: http://en.wikipedia.org/wiki/Geometric_distribution#Related_distributions

The exponential distribution is the continuous analogue of the geometric
distribution. If X is an exponentially distributed random variable with
parameter λ, then

Y = floor(X),

[...] is a geometrically distributed random variable with parameter p = 1 −
e^−λ (thus λ = −ln(1 − p)) and taking values in the set {0, 1, 2, ...}. This
can be used to generate geometrically distributed pseudorandom numbers by
first generating exponentially distributed pseudorandom numbers from a
uniform pseudorandom number generator: then *** floor( \ln(U) / \ln(1-p) )
is geometrically distributed with parameter p, if U is uniformly distributed
in [0,1]. ***

The bit in *** stars *** is what Nick and I believe we're implementing.

I think the source of confusion is that the parameter λ of the exponential distribution is not the same as the parameter p of the geometric distribution. I think that's the discrepancy you're describing in comment 20. But it looks to me as if we're using the technique described there correctly, given that P is the probability of success in any particular Bernoulli trial.

While I do want to be sure we're getting this right, in the end, the we really need only:

- to have some sort of influence over the fraction of events sampled; and
- to not sample with any interesting regularity that might make samples non-representative.

There are many ways to choose skip counts that meet these criteria. The only reason to choose this one is that it's equivalent to doing a Bernoulli trial on each allocation, which is easy to explain in documentation.

("But we want the discrete version of that" is the programmers' equivalent of "But what about the angular momentum?" It's something anyone can say, even if they know nothing about the situation, that stops everyone in their tracks for a moment...)
Comment on attachment 8488877 [details] [diff] [review]
sampling-allocations.patch

Review of attachment 8488877 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like to have SavedStack hold onto the probability itself, because it will make the allocation metadata callback so much simpler. Details in the comments.

::: js/src/jit-test/tests/debug/Memory-allocationSamplingProbability-02.js
@@ +16,5 @@
> +    for (var i = 0; i < 100; i++) {
> +      objs.push(new Object);
> +    }
> +  );
> +}

We don't really want to be tracking the alloctaions of the eval itself. It would be neater to do:

root.eval('' + function makeSomeAllocations() {
for (...) {
...
}
});

and then call root.makeSomeAllocations() below.

@@ +30,5 @@
> +
> +measure(0.0, 0);
> +measure(1.0, 101);
> +measure(0.1, 9);
> +measure(0.5, 53);

We don't really have guarantees that the allocations we request are the only ones that occur; the platform could conceivably make allocations of its own, whenever it pleases. So I think the test needs a comment here, something like:

These are the sample counts that were correct when this test was last updated; changes to SpiderMonkey may occasionally cause changes here. Anything that is within a plausible range for the given sampling probability is fine.

::: js/src/vm/Debugger.cpp
@@ +345,5 @@
>  Debugger::Debugger(JSContext *cx, JSObject *dbg)
>    : object(dbg), uncaughtExceptionHook(nullptr), enabled(true), trackingAllocationSites(false),
> +    allocationSamplingProbability(1.0), allocationsLogLength(0),
> +    maxAllocationsLogLength(DEFAULT_MAX_ALLOCATIONS_LOG_LENGTH), frames(cx->runtime()), scripts(cx),
> +    sources(cx), objects(cx), environments(cx)

Let's make this initialization list do one member per line. Scanning these things is for the birds.

::: js/src/vm/DebuggerMemory.cpp
@@ +259,5 @@
> +    THIS_DEBUGGER_MEMORY(cx, argc, vp, "(set allocationSamplingProbability)", args, memory);
> +    if (!args.requireAtLeast(cx, "(set allocationSamplingProbability)", 1))
> +        return false;
> +
> +    double percentage;

"Percentage" should denote a number between 0 and 100; let's call this "probability".

::: js/src/vm/SavedStacks.cpp
@@ +8,5 @@
>  #include "vm/SavedStacks.h"
>
>  #include "mozilla/Attributes.h"
>
> +#include <cmath>

I think we just want to use <math.h> here.

@@ +775,5 @@
> +        // P = the probability we sample any given event.
> +        //
> +        // ~P = 1-P, the probability we don't sample a given event.
> +        //
> +        // (~P)^n = the probability we don't sample the next n events.

I think we should say "the probability that we skip at least the next n events." It sounds like this is supposed to be the probability that we skip n and then sample, but that's not true.

@@ +779,5 @@
> +        // (~P)^n = the probability we don't sample the next n events.
> +        //
> +        // let X = random between 0 and 1.
> +        //
> +        // log base ~P of X = n, aka the number of events we should skip until we take the next sample.

You may want to spell out: floor(log base ~P of X).

Let's add this, as a second sentence to this paragraph:

Any value for X less than (~P)^n yields a skip count greater than n, so the likelihood of a skip count greater than n is (~P)^n, as required.

::: js/src/vm/SavedStacks.h
@@ +125,5 @@
> +
> +  public:
> +    // This only public so that we can set it in jit-tests to make them
> +    // deterministic. Don't mess with this!
> +    uint64_t            rngState;

Would it work to make this private, but expose a public void setRNGState(uint64_t)?

I think we want to store the sampling probability here, as well. Then the object metadata callback won't need to look for the Debugger that is doing the sampling. It does mean that the setter has to iterate over debuggees, and Debugger::addDebuggeeGlobal needs to set the new debuggee's probability; this would be done right where we set the callback. js::Debugger will still need its own copy of the probability, so that the value is retained even if we remove all debuggees.
Attachment #8488877 - Flags: review?(jimb)
Comment on attachment 8488884 [details] [diff] [review]
cleanup-onLogAllocationSite.patch

Review of attachment 8488884 [details] [diff] [review]:
-----------------------------------------------------------------

I think this patch doesn't work any more if the revisions I requested for part one are made.

::: js/src/vm/Debugger.h
@@ +457,5 @@
>      static inline JSTrapStatus onExceptionUnwind(JSContext *cx, MutableHandleValue vp);
>      static inline void onNewScript(JSContext *cx, HandleScript script,
>                                     GlobalObject *compileAndGoGlobal);
>      static inline void onNewGlobalObject(JSContext *cx, Handle<GlobalObject *> global);
> +    static bool onLogAllocationSite(JSContext *cx, HandleSavedFrame frame, Debugger *dbg);

Is there a reason to make this a static member function of Debugger that takes a Debugger *, rather than having it be a this-ful member function?
Attachment #8488884 - Flags: review?(jimb)
After checking comment 21, your RNG is really in Geometric distribution as you mentioned.  Excellent!  But, don't use the description mentioned in comment 13 as a code document.  It is wrong.
(In reply to Thinker Li [:sinker] from comment #24)
> After checking comment 21, your RNG is really in Geometric distribution as
> you mentioned.  Excellent!  But, don't use the description mentioned in
> comment 13 as a code document.  It is wrong.

Thank you for checking this out. It is not an area where I have much experience.

In the patch review in comment 22, I suggest changing the description to read as below. Does this look right? The only difference I see from Wikipedia is the use of "log base B of N" notation instead of "log B / log N"; when I expressed it that way to fitzgen originally, it was because I wanted to suggest the ~P's being multiplied together as we skip events.

---

P = the probability we sample any given event.

~P = 1-P, the probability we don't sample a given event.

(~P)^n = the probability that we skip at least the next n events.

let X = random between 0 and 1.

floor(log base ~P of X) = n, aka the number of events we should skip until we take the next sample. Any value for X less than (~P)^n yields a skip count greater than n, so the likelihood of a skip count greater than n is (~P)^n, as required.
(In reply to Jim Blandy :jimb from comment #25)
>... the use of "log base B of N" notation instead of "log B / log
> N";

Err, log base B of N instead of "log N / log B"...
Attached patch sampling-allocations.patch (obsolete) — Splinter Review
Updated based on comments!

Try push: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=58bb2758e94b
Attachment #8488877 - Attachment is obsolete: true
Attachment #8488884 - Attachment is obsolete: true
Attachment #8489628 - Flags: review?(jimb)
Comment on attachment 8489628 [details] [diff] [review]
sampling-allocations.patch

Review of attachment 8489628 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great! One comment:

::: js/src/builtin/TestingFunctions.cpp
@@ +2040,5 @@
>  "  to count only things of that kind. If kind is the string 'specific',\n"
>  "  then you can provide an extra argument with some specific traceable\n"
>  "  thing to count.\n"),
>
> +    JS_FN_HELP("setSavedStacksRNGState", SetSavedStacksRNGState, 0, 0,

I think this should be "1, 0,", since it does take one argument.
Attachment #8489628 - Flags: review?(jimb) → review+
Attached patch sampling-allocations.patch (obsolete) — Splinter Review
Fixed arity of the testing function.

Carrying over jimb's r+.
Attachment #8489628 - Attachment is obsolete: true
Attachment #8490221 - Flags: review+
Keywords: checkin-needed
https://hg.mozilla.org/integration/mozilla-inbound/rev/b78a99aae4ce
Keywords: checkin-needed
sorry had to back this out for causing a Android 4 Debug Bustage with test failures like https://tbpl.mozilla.org/php/getParsedLog.php?id=48270773&tree=Mozilla-Inbound
try push with a call to perror() to see what's up: https://tbpl.mozilla.org/?tree=Try&rev=76f62f127fb4
Frustratingly, checking all of the android options in try chooser doesn't get me a try push on Android 4.0, just 2.3 and 4.2.

Tomcat, any idea how I can do try pushes for Android 4.0 Panda?
Flags: needinfo?(cbook)
Blocks: 1068171
(In reply to Nick Fitzgerald [:fitzgen] from comment #33)
> Frustratingly, checking all of the android options in try chooser doesn't
> get me a try push on Android 4.0, just 2.3 and 4.2.
>
> Tomcat, any idea how I can do try pushes for Android 4.0 Panda?

hm i got the android 4 debug tests when retrigger the android 2.3 debug runs i think. Maybe Ryan know the answer here since i guess he has more experience with try
Flags: needinfo?(cbook) → needinfo?(ryanvm)
(In reply to Nick Fitzgerald [:fitzgen] from comment #33)
> Frustratingly, checking all of the android options in try chooser doesn't
> get me a try push on Android 4.0, just 2.3 and 4.2.
>
> Tomcat, any idea how I can do try pushes for Android 4.0 Panda?

Android 4.0 Jit tests are hidden by default because they're permafail.
https://tbpl.mozilla.org/?tree=Try&rev=76f62f127fb4&showall=1 shows it.

Otherwise, Android builds are version-independent. They just show on the 2.3 line because we had to stuff them somewhere :)
Flags: needinfo?(ryanvm)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #35)
> Android 4.0 Jit tests are hidden by default because they're permafail.
> https://tbpl.mozilla.org/?tree=Try&rev=76f62f127fb4&showall=1 shows it.

LOL, I bet the problem Nick ran into is exactly why.
(In reply to Jim Blandy :jimb from comment #36)
> (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #35)
> > Android 4.0 Jit tests are hidden by default because they're permafail.
> > https://tbpl.mozilla.org/?tree=Try&rev=76f62f127fb4&showall=1 shows it.
>
> LOL, I bet the problem Nick ran into is exactly why.

So if Android 4.0 JIT tests are already perma-orange and hidden, why was my patch backed out for bad Android 4.0 jit tests?

How do I move this forward from here?
Flags: needinfo?(cbook)
Flags: needinfo?(ryanvm)
Not sure I follow your question:

(In reply to Carsten Book [:Tomcat] from comment #31)
> sorry had to back this out for causing a Android 4 Debug Bustage with test
> failures like
> https://tbpl.mozilla.org/php/getParsedLog.php?id=48270773&tree=Mozilla-
> Inbound

Android 4.0 Panda mozilla-inbound debug test plain-reftest-2 on 2014-09-17 01:49:55 PDT for push b78a99aae4ce

"plain-reftest-2"
Flags: needinfo?(ryanvm)
Flags: needinfo?(cbook)
Attached patch sampling-allocations.patch (obsolete) — Splinter Review
Carrying over jimb's r+.

This should fix the android issues: https://tbpl.mozilla.org/?tree=Try&rev=4f56abc0d241
Attachment #8490221 - Attachment is obsolete: true
Attachment #8493404 - Flags: review+
Adding one small change jimb asked for on irc.
Attachment #8493404 - Attachment is obsolete: true
Attachment #8493443 - Flags: review+
Keywords: checkin-needed
I'm confused - reftest-2 was failing, but your Try run didn't include it? *NOT* jsreftest, jit-test, etc. *REFTEST*

I've gone ahead and manually triggered a run for you. Go ahead and re-set checkin-needed if it passes.
Keywords: checkin-needed
Looks good. I'll land this soonish.
Keywords: checkin-needed
https://hg.mozilla.org/integration/mozilla-inbound/rev/0da1dc4f6c95
Flags: in-testsuite+
Keywords: checkin-needed
> perror("FITZGEN: error opening /dev/urandom: ");

I think you forgot to remove this?
Attachment #8493818 - Flags: review?(jimb)
(In reply to Simon Lindholm from comment #44)
> > perror("FITZGEN: error opening /dev/urandom: ");
>
> I think you forgot to remove this?

Thanks
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #41)
> I'm confused - reftest-2 was failing, but your Try run didn't include it?
> *NOT* jsreftest, jit-test, etc. *REFTEST*
>
> I've gone ahead and manually triggered a run for you. Go ahead and re-set
> checkin-needed if it passes.

My bad. Thanks for triggering a run for me.
https://hg.mozilla.org/mozilla-central/rev/0da1dc4f6c95
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla35
Attachment #8493818 - Flags: review?(jimb) → review+
Keywords: checkin-needed
https://hg.mozilla.org/integration/mozilla-inbound/rev/e600bf456142
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/e600bf456142`
Depends on: 1140806
You need to log in before you can comment on or make changes to this bug.