Closed Bug 1464509 Opened Last year Closed 11 months ago

Add profiler counter support for capturing totals and instance counts for the Gecko Profiler

Categories

(Core :: Gecko Profiler, enhancement, P1)

58 Branch
enhancement

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: jesup, Assigned: jesup)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

Attachments

(11 files, 46 obsolete files)

1.38 KB, patch
mstange
: review+
Details | Diff | Splinter Review
1.16 KB, patch
Details | Diff | Splinter Review
9.82 KB, patch
glandium
: review+
Details | Diff | Splinter Review
1.77 KB, patch
mstange
: review+
Details | Diff | Splinter Review
1.21 KB, patch
mstange
: review+
Details | Diff | Splinter Review
23.63 KB, patch
mstange
: review+
Details | Diff | Splinter Review
6.40 KB, patch
mstange
: review+
Details | Diff | Splinter Review
17.71 KB, patch
mstange
: review+
Details | Diff | Splinter Review
1.09 KB, patch
ted
: review+
Details | Diff | Splinter Review
6.41 KB, patch
mstange
: review+
Details | Diff | Splinter Review
2.04 KB, patch
mstange
: review+
Details | Diff | Splinter Review
For monitoring a number of things in the Gecko Profiler, we'd like to have counters for accumulating values and for counting the number of events in a given period, specifically per-thread but also possibly per-process.  These can be used to monitor things like number of IO's, memory in-use and number of allocations per unit time and many other things.

Memory in-use is an example where per-process makes far more sense than recording per-thread.  IO probably makes more sense per-thread, though it may not be critical.

One prime use of these would be to automatically dump their data into the profile at each profiling tick.  per-thread would be grabbed when the thread was, likely via a map of threads->counter, though that map would have to be locked, so we may want to store per-thread counters in private-thread data or attached to the (sigh) nsThread.
Not yet hooked up to the sampling code.
Blocks: 1420437
Priority: -- → P1
Status: NEW → ASSIGNED
I would be interested in this feature to monitor the number of back-edges/entry for each JS function / script.  This implies that we can create counter for a dynamically allocated JS function, would that be possible?

Having this feature would help determine, within the profiler, whether a JS profile which is using all compiler tiers is a workload issue (well optimized code, but too much data to crunch), or a JIT compiler issue (not well optimized JIT code).

Counting the number of back-edges and entry in each functions, would help us determine the equivalent of the number of iterations between 2 samples taken at a specific sample rate.  Each JS executions tiers (interpreter, baseline, ionmonkey) should have different numbers of iterations per milli-seconds.  Doing a ratio of IonMonkey's speed (#i_ion/sec) over Baseline's speed (#i_baseline/sec), gives us the measured speed-up of Ion compared to Baseline (speed-up = (#i_ion/sec) / (#i_baseline/sec) ).

These speed-up could then be compared against the expected speed-up, and help us determine if there is room for optimization within Ion. Note, having less than 1 speed-up should clearly highlight a JIT bug, on a given JS function, which would make such profiles much more actionable for the JIT team.
It should be possible to make a dynamic version of this; the macros at current create objects on first use (or for a specific case, create static objects).
I added number-of-occurences-only counters, as well as running-total-and-heatmap, plus a class for dynamically allocating (and deleting/removing) number-of-occurences-only counters, specifically for the use-cases mentioned by nbp
Attachment #8980758 - Attachment is obsolete: true
Adds all the ProfileBuffer and JSON changes.  Moves RSS/USS to the new Process-level of the JSON structure.  Note that perf-html does not yet understand this JSON structure.  Note: compiles clean.  Not tested beyond a smoke-test.
Attachment #8985993 - Attachment is obsolete: true
revised; supports count-only counters (no heatmaps), and dynamically created counters.  Nicolas, would it help to have an alternate constructor for ProfilerCountOnly that takes an *aCounter arg, so you can use your own atomic that you can ++ directly without indirecting through the object ptr?  I.e. myJitObject->mCounter++, vs myJitObject->mCounterObj->mCounter++?  (note that I've exposed as public in ProfilerCounterOnly the mCounter field).  I know you need this be be Very Low Overhead to increment.
Attachment #8986534 - Flags: feedback?(nicolas.b.pierron)
Attachment #8986534 - Flags: feedback?(mstange)
Attachment #8985999 - Attachment is obsolete: true
Also adds a "process" level of encapsulation to the JSON and moves RSS/USS
to that as well as the counters.
Attachment #8986535 - Flags: feedback?(mstange)
Attachment #8985995 - Attachment is obsolete: true
Comment on attachment 8986534 [details] [diff] [review]
P1: WIP patch for per-process profiler counters

Review of attachment 8986534 [details] [diff] [review]:
-----------------------------------------------------------------

This sounds good to me.

While I might have concerns about the retained memory impact of these structures, I think I can use the ProfilerCountOnly structure as a detached structure which is indexed by the JSScript pointers (to avoid increasing the size of the JSScript).

However, I do not know how the GeckoProfiler.h file can be used within SpiderMonkey. We have a vm/GeckoProfiler.h, but this is a different one which might be API compatible.
Attachment #8986534 - Flags: feedback?(nicolas.b.pierron) → feedback+
> However, I do not know how the GeckoProfiler.h file can be used within
> SpiderMonkey. We have a vm/GeckoProfiler.h, but this is a different one
> which might be API compatible.

I'll check vm/GeckoProfiler.  This code could be transplanted virtually anywhere, though, and it has minimal dependencies.
Moved Counts to a separate include file so we can include it from js/src/vm/GeckoProfiler.h.  Still need to resolve the standalone build issues (and thus the tweak to the spidermonkey_style file).
Attachment #8986961 - Flags: feedback?(nicolas.b.pierron)
Attachment #8986961 - Flags: feedback?(felash)
Attachment #8986534 - Attachment is obsolete: true
Attachment #8986534 - Flags: feedback?(mstange)
I don't think we should be including tools/profiler/public/GeckoProfiler.h in SpiderMonkey code. How we usually do this kind of thing is with a callback that we can set when we create the JSContext. Maybe we can move some data structure into SM headers.
Comment on attachment 8986961 [details] [diff] [review]
P1: WIP patch for per-process profiler counters

Review of attachment 8986961 [details] [diff] [review]:
-----------------------------------------------------------------

Sounds like you wanted to upload a new version but uploaded the previous version of the patch.

::: tools/profiler/public/GeckoProfiler.h
@@ +634,5 @@
> +    }
> +  }
> +
> +  const char *mLabel;
> +  const char *mCategory; // nsCString?

nsCString is not available within SpiderMonkey.
Attachment #8986961 - Flags: feedback?(nicolas.b.pierron)
Comment on attachment 8986961 [details] [diff] [review]
P1: WIP patch for per-process profiler counters

Review of attachment 8986961 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/platform.cpp
@@ +292,5 @@
> +  PS_GET(const nsTArray<BaseProfilerCount*>&, Counters)
> +
> +  static void AppendCounter(PSLockRef, BaseProfilerCount* aCounter)
> +  {
> +    // we don't own the counter; they may be stored in static pointers

I'm a bit concerned about this... How will we ensure the pointer is still valid when we'll want to use it?

::: tools/profiler/public/GeckoProfiler.h
@@ +641,5 @@
> +  // pointers because we want to be able to use statics and increment them
> +  // directly.  Otherwise we could just have them inline, and not need the
> +  // constructor args.
> +  mozilla::Atomic<int64_t, mozilla::MemoryOrdering::Relaxed> *mCounter;
> +  mozilla::Atomic<uint64_t, mozilla::MemoryOrdering::Relaxed> *mNumber; // may be null

I miss some comments explaining what all these variables are for. Especially `mNumber`.
Attachment #8986961 - Flags: feedback?(felash)
Comment on attachment 8986961 [details] [diff] [review]
P1: WIP patch for per-process profiler counters

Review of attachment 8986961 [details] [diff] [review]:
-----------------------------------------------------------------

Note that I'm far from a C++ expert so I may make very dumb remarks.

But I think there are many indirections and pointers and it could be made less complicated.
Also I'm not convinced about the introduced leak.

::: tools/profiler/public/GeckoProfiler.h
@@ +616,5 @@
> +
> +  void Sample(int64_t &aCounter, uint64_t &aNumber)
> +  {
> +    aCounter = *mCounter;
> +    aNumber = mNumber ? *mNumber : 0;

Wondering if we shouldn't reset the counter after sampling (possibly in another method)? At least this is how I thought this thing would work.

@@ +631,5 @@
> +    (*mCounter) += aNumber;
> +    if (mNumber) {
> +      (*mNumber)++;
> +    }
> +  }

Why is it that different than the "++" operator above?

@@ +639,5 @@
> +  const char *mDescription;
> +  // We're ok with these being un-ordered in race conditions.  These are
> +  // pointers because we want to be able to use statics and increment them
> +  // directly.  Otherwise we could just have them inline, and not need the
> +  // constructor args.

I don't completely buy this argument. Couldn't we have a static variable of this class for this use case?

@@ +661,5 @@
> +  {
> +    profiler_remove_sampled_counter(this);
> +  }
> +
> +  mozilla::Atomic<int64_t, mozilla::MemoryOrdering::Relaxed> mCounter;

This looks more complicated than it could be. What's the purpose of having 2 different classes here? Also the second class "ProfilerCounterOnly" looks more simple than the base class, maybe the opposite would be more understandable? And could save the "if" in "Add"...

@@ +735,5 @@
> +  } while (0)
> +
> +#define AUTO_PROFILER_STATIC_COUNT(label, count)       \
> +  do { \
> +    profiler_number_ ## label++;  /* do this first*/ \

I'm not sure we need this "number" actually; if we need a counter that increments by 1, we could as well have 2 counters instead.

I think you designed this specially for memory or IO workloads, but this may be too specific for some other workloads.
> > +  static void AppendCounter(PSLockRef, BaseProfilerCount* aCounter)
> > +  {
> > +    // we don't own the counter; they may be stored in static pointers
> 
> I'm a bit concerned about this... How will we ensure the pointer is still
> valid when we'll want to use it?

One variation has the counters stored in statics (i.e. they can't go away): see PROFILER_DEFINE_COUNT()

Another has them stored in a subclass as a member: see ProfilerCountOnly.

> > +  // pointers because we want to be able to use statics and increment them
> > +  // directly.  Otherwise we could just have them inline, and not need the
> > +  // constructor args.
> > +  mozilla::Atomic<int64_t, mozilla::MemoryOrdering::Relaxed> *mCounter;
> > +  mozilla::Atomic<uint64_t, mozilla::MemoryOrdering::Relaxed> *mNumber; // may be null
> 
> I miss some comments explaining what all these variables are for. Especially
> `mNumber`.

I'll add them.  mNumber is the (optional) count of how many times we've called Add() (or used AUTO_PROFILER_*); this is used for heatmaps.  mCounter is the running total of the numbers you passed to Add()/etc
> > +  void Sample(int64_t &aCounter, uint64_t &aNumber)
> > +  {
> > +    aCounter = *mCounter;
> > +    aNumber = mNumber ? *mNumber : 0;
> 
> Wondering if we shouldn't reset the counter after sampling (possibly in
> another method)? At least this is how I thought this thing would work.

That would cause major problems with using this for things like the amount of memory allocated - to know the total, you need to know the initial value when the profiler was started (from the expensive jemalloc_stats() call), and then the delta from that point (from the counter).  If you zero it, since we overwrite values in the ProfilerBuffer, we can't calculate the total.  

The processing in perf-html can calculate deltas as needed easily.

> @@ +631,5 @@
> > +    (*mCounter) += aNumber;
> > +    if (mNumber) {
> > +      (*mNumber)++;
> > +    }
> > +  }
> 
> Why is it that different than the "++" operator above?

it shouldn't be... merge error.  Now it's Add(1); return *this;.

> 
> @@ +639,5 @@
> > +  const char *mDescription;
> > +  // We're ok with these being un-ordered in race conditions.  These are
> > +  // pointers because we want to be able to use statics and increment them
> > +  // directly.  Otherwise we could just have them inline, and not need the
> > +  // constructor args.
> 
> I don't completely buy this argument. Couldn't we have a static variable of
> this class for this use case?

So, we try very hard to avoid static constructors; they slow startup.  We even have stuff in the build infrastructure that flags increases in the number of static constructors.  (Some can't really be avoided, or cause problems if they are.)  A static Atomic<> has a trivial (not cared about) constructor; just init to 0.  For example, this is why we have the StaticRefPtr<> template class, because a static RefPtr<> adds static constructors.

If froyd and others are ok with it, we could allow these to be all-static -- but I doubt they will.

> @@ +661,5 @@
> > +  {
> > +    profiler_remove_sampled_counter(this);
> > +  }
> > +
> > +  mozilla::Atomic<int64_t, mozilla::MemoryOrdering::Relaxed> mCounter;
> 
> This looks more complicated than it could be. What's the purpose of having 2
> different classes here? Also the second class "ProfilerCounterOnly" looks
> more simple than the base class, maybe the opposite would be more
> understandable? And could save the "if" in "Add"...

That's partly because I started with counter+heatmap as the only version, but for nbp's perf-sensitive usecase I needed a count-only version.  It's reasonable to consider inverting the class structure now that I know this API works for him; I'll look at doing that.  (I hesitated to redo it more broadly until I got his feedback).

> @@ +735,5 @@
> > +  } while (0)
> > +
> > +#define AUTO_PROFILER_STATIC_COUNT(label, count)       \
> > +  do { \
> > +    profiler_number_ ## label++;  /* do this first*/ \
> 
> I'm not sure we need this "number" actually; if we need a counter that
> increments by 1, we could as well have 2 counters instead.

This would complicate providing a "default" view of the data, as the 2 counters would be apparently disconnected and it wouldn't be clear if we should show one of them as a heatmap.

> I think you designed this specially for memory or IO workloads, but this may
> be too specific for some other workloads.

Well, there are a lot of possible users of this sort of counter/heatmap setup; at all-hands I found that Tarek had implemented something very similar for supporting his about:performance monitoring; we may well end up merging the guts of the two (unless this causes some hot-path perf overhead, which it may very well for cases like nbp's, which is *very* perf-sensitive, and where he wants to provide and increment his own Atomic<> to avoid even an extra ptr deref.
(In reply to Randell Jesup [:jesup] from comment #17)
> Well, there are a lot of possible users of this sort of counter/heatmap
> setup; at all-hands I found that Tarek had implemented something very
> similar for supporting his about:performance monitoring; we may well end up
> merging the guts of the two (unless this causes some hot-path perf overhead,
> which it may very well for cases like nbp's, which is *very* perf-sensitive,
> and where he wants to provide and increment his own Atomic<> to avoid even
> an extra ptr deref.

For perf reasons, if there is any pointer dereferences, I will inline it in the assembly. So the only slow down would be in the interpreted code and in tight loops of JIT-ed code.

One concern would be a memory issue, as for each counter we would have to allocate strings to fill the description with the file name, function name and line number. However, we can iterate on it later on.
I wonder if collecting counts of IPC messages could help identify issues like the one Ehsan describes here:

https://groups.google.com/d/msg/mozilla.dev.platform/qeDQ6_ssTa0/eyMHzQEuCAAJ
Attachment #8986961 - Attachment is obsolete: true
Also moves RSS/USS to top level of the process's JSON
Attachment #8986535 - Attachment is obsolete: true
Attachment #8986535 - Flags: feedback?(mstange)
Attachment #8991436 - Flags: review?(mstange)
Attachment #8991436 - Attachment description: P1: WIP patch for per-process profiler counters → P1: per-process profiler counters
Attachment #8991439 - Attachment description: P2: adjust spidermonkey style for ProfilerCounts - may not want to check this in → P2: adjust spidermonkey style for ProfilerCounts
Attachment #8991439 - Flags: review?(sphink)
Comment on attachment 8991440 [details] [diff] [review]
P3: add process info from threads to top level of the JSON for each process

Review of attachment 8991440 [details] [diff] [review]:
-----------------------------------------------------------------

Note I'll remove the process-type and pid from each thread later once perf-html is ready for it
Attachment #8991440 - Flags: review?(mstange)
Attachment #8991441 - Flags: review?(mstange)
Attachment #8991441 - Attachment description: P4: WIP patch for dumping counters to the ProfileBuffer and JSON → P4: ProfileBuffer and JSON output changes for Counters
Attachment #8991442 - Attachment is obsolete: true
another patch will add the dynamic install/removal of the memory replace when we start/stop profiling
Attachment #8991733 - Flags: review?(mh+mozilla)
Attachment #8991443 - Attachment is obsolete: true
Attachment #8991444 - Flags: review?(mstange)
Comment on attachment 8991445 [details] [diff] [review]
P8 - remove per-thread process info

Review of attachment 8991445 [details] [diff] [review]:
-----------------------------------------------------------------

This will have to land after perf-html knows about process-level processtype/pid values
Attachment #8991445 - Flags: review?(mstange)
Comment on attachment 8991439 [details] [diff] [review]
P2: adjust spidermonkey style for ProfilerCounts

Review of attachment 8991439 [details] [diff] [review]:
-----------------------------------------------------------------

This seems like it makes tools/profiler/public be a place to put shared include files that are usable from anywhere. Or rather -- because I don't think that is the case -- the specific two files tools/profiler/public/{GeckoProfiler.h,ProfilerCounts.h}.

I would want to check with mstange before making that official. I'm fine in principal with adding an exception to check_spidermonkey_style.py if we indeed want that exception, but I don't think I'm the one to decide whether we do.

I haven't really looked at what's going on here to know how necessary it is to do this. Registering a counter address should be infrequent enough to do through a registered function pointer. I guess adding to a counter needs to be direct and is complicated enough that you need a header for it?

To me, it seems pretty messy to have tools/profiler/public contain a mix of include-from-anywhere things like this plus insanely-Gecko-specific things like ProfilerMarkerPayload.h.

mozglue/misc/AutoProfilerLabel.h is a mozglue-only thing, which is weird in its own way, but it is the directory I would expect ProfilerCounts.h to be in. Oh... except it can't be, because it uses mfbt/Atomics.h. :-(

I believe this will break the standalone js packaging job. You'll need to add ProfilerCounts.h to js/src/make-source-package.sh (there are various other things from tools/ already in it. Too many, in fact; I should really get rid of some of those dependencies.)

Like I said, I'd prefer the leave the policy decision of where this lives to mstange. I'll still include some meandering thoughts that he's probably already aware of:

 - it seems like mozglue won't work, due to the Atomics dependency
 - mfbt/profiler/ProfilerCounts.h would be ok, though then tools/profiler/public/GeckoProfiler.h is a misfit
 - mfbt/profiler/ is sort of a misuse of mfbt/, but it doesn't seem too bad to me
 - GeckoProfiler.h depends on everything. Could it be split up?
 - it would be nice to be able to profile the non-libxul.so JS shell
 - I can't tell why GeckoProfiler.h #includes js/RootingAPI.h
 - it looks like the portion of GeckoProfiler.h that needs nsIURI.h is pretty minimal. I don't know what needs nscore.h. But if those parts could be split out, then the rest of GeckoProfiler.h could be moved to js/.
   - nscore.h is probably for AUTO_PROFILER_LABEL_DYNAMIC_NSCSTRING, AUTO_PROFILER_LABEL_DYNAMIC_LOSSY_NSSTRING
Attachment #8991439 - Flags: review?(sphink) → review?(mstange)
(In reply to Steve Fink [:sfink] [:s:] from comment #32)
> mozglue/misc/AutoProfilerLabel.h is a mozglue-only thing, which is weird in
> its own way, but it is the directory I would expect ProfilerCounts.h to be
> in. Oh... except it can't be, because it uses mfbt/Atomics.h. :-(

There is no problem using mfbt in mozglue. Heck, mozglue contains mfbt.
Oh, it does? My mental image of the world is roughly mozglue -> mfbt -> js -> gecko (with most things not needing the js layer). Is mfbt -> mozglue -> js -> gecko accurate?

mozglue/misc/ or mozglue/profiler/ is seeming like a pretty good place for this to me, then.
> Is mfbt -> mozglue -> js -> gecko accurate?

Yes.
Comment on attachment 8991733 [details] [diff] [review]
P6 - Modify jemalloc to allow collection of memory counter info

Review of attachment 8991733 [details] [diff] [review]:
-----------------------------------------------------------------

I don't see why you need this. If you have something that hooks into the memory allocator, then you can have whatever you're planning to have call jemalloc_profiler_count instead call a function that has access to the count as kept by the hooked stuff directly.
Attachment #8991733 - Flags: review?(mh+mozilla) → review-
stripped down to just adding dynamic replacement.  May still have a problem with realloc asserting after removing a replacement.
Attachment #8992078 - Flags: review?(mh+mozilla)
Attachment #8991444 - Attachment is obsolete: true
Attachment #8991444 - Flags: review?(mstange)
Attachment #8991732 - Attachment is obsolete: true
Attachment #8991733 - Attachment is obsolete: true
Comment on attachment 8992079 [details] [diff] [review]
P6 - Add memory replacer with counter to profiler

Review of attachment 8992079 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/moz.configure
@@ +39,5 @@
>  set_define('MOZ_GECKO_PROFILER', gecko_profiler_define)
> +#imply_option('--enable-jemalloc', gecko_profiler_define)
> +#imply_option('--enable-replace-malloc', gecko_profiler_define)
> +imply_option('--enable-jemalloc', True)
> +imply_option('--enable-replace-malloc', True)

I need to fix this; when I try the commented-out ones, it barfs on me for some odd reason
perhaps we should (instead of this patch) use a separate define for MOZ_PROFILER_MEMORY that is turned off if jemalloc is off, instead of forcing jemalloc on - glandium?  The idea would be to ifdef the code for the profiler to hook jemalloc and report it in profiles.
Attachment #8993026 - Flags: review?(ted)
works well now, it appears, with multiple enables and disables.   You have to replace moz_create_arena_with_params or it won't use the moz_arena_* funcs even if they're in the table....
Attachment #8993029 - Flags: review?(mh+mozilla)
Attachment #8992079 - Attachment is obsolete: true
Attachment #8992079 - Flags: review?(mh+mozilla)
Comment on attachment 8992078 [details] [diff] [review]
P5 - Modify jemalloc to allow dynamic replacement

Review of attachment 8992078 [details] [diff] [review]:
-----------------------------------------------------------------

::: memory/build/malloc_decls.h
@@ +48,5 @@
>  MALLOC_DECL(malloc_good_size, size_t, size_t)
>  #endif
>  #if MALLOC_FUNCS & MALLOC_FUNCS_JEMALLOC
>  MALLOC_DECL(jemalloc_stats, void, jemalloc_stats_t*)
> +MALLOC_DECL(jemalloc_replace_dynamic, void, jemalloc_init_func)

This shouldn't be exposed when replace-malloc is disabled. I'm sure this breaks those builds (e.g. asan builds). Also, by defining this here, it's part of the replace-malloc-API itself, so it's in malloc_table_t, which doesn't actually make sense (and is why you need that forward declaration in mozjemalloc.cpp).

It seems the right place for this function declaration is replace-malloc.h.

::: memory/build/mozjemalloc.cpp
@@ +4809,5 @@
> +  // Set this *before* calling replace_init, otherwise if replace_init calls
> +  // malloc() we'll get an infinite loop.
> +  gReplaceMallocInitialized = true;
> +  if (replace_init_func) {
> +    (*replace_init_func)(&gReplaceMallocTable, &gReplaceMallocBridge);

This is going to be racy. ISTR we discussed this on irc: turning the "initialized" state bool into an atomic pointer to a malloc_table_t would seem better.
Attachment #8992078 - Flags: review?(mh+mozilla)
Comment on attachment 8993026 [details] [diff] [review]
configure changes to ensure jemalloc is used if the profiler is on

Review of attachment 8993026 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/moz.configure
@@ +37,5 @@
>  
>  set_config('MOZ_GECKO_PROFILER', gecko_profiler_define)
>  set_define('MOZ_GECKO_PROFILER', gecko_profiler_define)
> +imply_option('--enable-jemalloc', gecko_profiler_define, reason="target")
> +imply_option('--enable-replace-malloc', gecko_profiler_define, reason="target")

This unconditionally enables replace-malloc, which is not wanted. Especially, this is going to enable it to ride the trains, which it is not meant to.

It would be better to ifdef the profiler code that requires it.
Attachment #8993026 - Flags: review?(ted)
Comment on attachment 8993029 [details] [diff] [review]
P6 - Add memory replacer with counter to profiler

Review of attachment 8993029 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/memory_hooks.cpp
@@ +52,5 @@
> +class ProfilerBridge final : public ReplaceMallocBridge
> +{
> +};
> +
> +static ProfilerBridge* gProfilerBridge;

You don't need to make this a pointer. You actually don't even need it at all if your bridge doesn't provide anything. Just don't touch the ReplaceMallocBridge ** you're handed in init.

@@ +59,5 @@
> +// Utilities
> +//---------------------------------------------------------------------------
> +
> +#ifndef DISALLOW_COPY_AND_ASSIGN
> +#define DISALLOW_COPY_AND_ASSIGN(T) \

we don't have something like this in MFBT?

@@ +80,5 @@
> +//
> +// It would be nice if we could use the InfallibleAllocPolicy from mozalloc,
> +// but we cannot use mozalloc.
> +//
> +class InfallibleAllocPolicyWithNew

This is largely the same code as in DMD. It seems like this should be shared. I'll note that Infallible is a weird name for a class that has fallible functions. (I'd expect the maybe functions to *also* be infallible in such a class)

@@ +201,5 @@
> +/* static */ void
> +InfallibleAllocPolicyWithNew::ExitOnFailure(const void* aP)
> +{
> +  if (!aP) {
> +    MOZ_CRASH("DMD out of memory; aborting");

Copied straight out of DMD.cpp :)

@@ +220,5 @@
> +// MutexBase implements the platform-specific parts of a mutex.
> +
> +#ifdef XP_WIN
> +
> +class MutexBase

You're not using this class, afaics.

@@ +259,5 @@
> +#ifdef XP_WIN
> +
> +#define DMD_TLS_INDEX_TYPE              DWORD
> +#define DMD_CREATE_TLS_INDEX(i_)        do {                                  \
> +                                          (i_) = TlsAlloc();                  \

mozilla/ThreadLocal.h has a type for TlsAlloc and pthread_key_t based thread local storage.

@@ +441,5 @@
> +static void
> +ResetBernoulli()
> +{
> +  new (gBernoulli) FastBernoulliTrial(0.003, 0x8e26eeee166bc8ca,
> +                                             0x56820f304a9c9ae0);

I realize that (almost?) everything until here is copied from DMD. Please share code the code you use, and remove the code you don't use. Because it doesn't seem like you're using the FastBernoulliTrial or the StringTable.

::: tools/profiler/moz.build
@@ +5,5 @@
>  # file, You can obtain one at http://mozilla.org/MPL/2.0/.
>  
> +@template
> +def ReplaceMallocStatic(name):
> +    DEFINES['MOZ_REPLACE_MALLOC_PREFIX'] = name.replace('-', '_')

No need to use a template here. Just define MOZ_REPLACE_MALLOC_PREFIX directly.
Attachment #8993029 - Flags: review?(mh+mozilla)
How does this look like to start?
Attachment #8993374 - Flags: feedback?(mh+mozilla)
compiles but fails in link - jemalloc_replace_dynamic undefined.  I'm sure it's due to not being defined in malloc_decls.h -- one of the many includes of that.  A class system implemented with #include.....  Just need to figure out which set of magic incantations was exporting the symbol and do that by hand.
Attachment #8993573 - Flags: feedback?(mh+mozilla)
Attachment #8993374 - Attachment is obsolete: true
Attachment #8993374 - Flags: feedback?(mh+mozilla)
* * *
Bug 1464509: WIP interdiffs for response to reviews
Attachment #8993678 - Flags: review?(mh+mozilla)
Attachment #8992078 - Attachment is obsolete: true
Attachment #8993029 - Attachment is obsolete: true
Attachment #8992080 - Attachment is obsolete: true
Attachment #8992080 - Flags: review?(mstange)
small changes to get a green try across all platforms
Attachment #8993957 - Flags: review?(mh+mozilla)
Attachment #8993678 - Attachment is obsolete: true
Attachment #8993678 - Flags: review?(mh+mozilla)
Attachment #8993679 - Attachment is obsolete: true
Attachment #8993679 - Flags: review?(mstange)
made it simply not collect memory profile info if jemalloc is off
Attachment #8993959 - Flags: review?(ted)
Attachment #8993026 - Attachment is obsolete: true
Comment on attachment 8991439 [details] [diff] [review]
P2: adjust spidermonkey style for ProfilerCounts

Review of attachment 8991439 [details] [diff] [review]:
-----------------------------------------------------------------

I agree with sfink here - incnluding tools/profiler/public/GeckoProfiler.h in Spidermonkey is not the right way to go. The include file should either go into Spidermonkey or into mozglue.
This is admittedly a bit messy, but it's what we have for now. Bug 1437245 is promising to move everything into mozglue, but until we have that, we have to do a bit of a dance.

You can look at mozglue/misc/AutoProfilerLabel.h and at the patch for bug 1476793 to see what we do for e.g. pushing/popping profiler labels: During profiler startup, we register callback functions or a class pointer with the modules that can't include GeckoProfiler.h (those being mozglue and Spidermonkey), and duplicate a few of the convenience macros. Please take a similar approach with the counters.
Attachment #8991439 - Flags: review?(mstange) → review-
Comment on attachment 8991436 [details] [diff] [review]
P1: per-process profiler counters

Review of attachment 8991436 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/platform.cpp
@@ +3284,5 @@
>    return RacyFeatures::IsActiveWithFeature(aFeature);
>  }
>  
>  void
> +profiler_add_sampled_counter(BaseProfilerCount *aCounter)

This happens in multiple places, but I'm only commenting on it once: Mozilla style is to have the asterisk hug the type name, not the variable name.

::: tools/profiler/public/ProfilerCounts.h
@@ +25,5 @@
> +
> +typedef mozilla::Atomic<int64_t, mozilla::MemoryOrdering::Relaxed> ProfilerAtomicSigned;
> +typedef mozilla::Atomic<uint64_t, mozilla::MemoryOrdering::Relaxed> ProfilerAtomicUnsigned;
> +
> +// Counter support

Please add a comment around here that explains what a "counter" is - e.g. the fact that it stores both the "count" and a "number", which are the accumulated value and the number of "incremental adjustments" / "accumulations" that have been made to this value. And please also mention one or two examples.

@@ +63,5 @@
> +      (*mNumber)++;
> +    }
> +  }
> +
> +  const char *mLabel;

Please mention that all of these pointers point to values in static storage: string literals + atomics that are defined as globals.

@@ +106,5 @@
> +  const char profiler_category_ ## label[] = category; \
> +  const char profiler_description_ ## label[] = description; \
> +  BaseProfilerCount* AutoCount_ ## label;
> +
> +// This just counts touches of the count

Does it? The definition of this macro looks identical to the definition of the previous macro.

@@ +119,5 @@
> +  ProfilerAtomicSigned profiler_count_ ## label(0);           \
> +  ProfilerAtomicUnsigned profiler_number_ ## label(0);              \
> +  BaseProfilerCount AutoCount_ ## label(#label, \
> +                                       &profiler_count_ ## label,        \
> +                                       &profiler_number_ ## label,       \

Isn't this a static initializer? Where would this macro be used? Inside a function or at the file level?

@@ +124,5 @@
> +                                       category,     \
> +                                       description);
> +
> +// If we didn't care about static initializers, we could avoid the need for
> +// a ptr.  Since these aren't UniquePtrs, they will leak on shutdown.

It's unclear which pointer this comment is referring to.

@@ +131,5 @@
> +
> +// XXX do this without the if() and without the possible race to set (leak)
> +// Best would be something that didn't instantiate until profiling was turned on.
> +// We may want to keep counts, however, before it's turned on - if not, we could
> +// put the atomics in the BaseProfilerCount object.  If we need both forms (or

I don't understand the problem here, and the comments are rather confusing. Are the macros below used by anything?
Attachment #8991436 - Flags: review?(mstange)
Comment on attachment 8991440 [details] [diff] [review]
P3: add process info from threads to top level of the JSON for each process

Review of attachment 8991440 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/platform.cpp
@@ +1848,5 @@
>  
> +  // Write per-process data
> +  aWriter.StringProperty("processType",
> +                         XRE_ChildProcessTypeToString(XRE_GetProcessType()));
> +  aWriter.IntProperty("pid", static_cast<int64_t>(getpid()));

Let's add these values to the process's "meta" object. It already has the "processType" field there. (which is currently using the numeric value instead of the string, so please just change it to use the string)
Attachment #8991445 - Flags: review?(mstange) → review+
Comment on attachment 8993957 [details] [diff] [review]
P5 - Modify jemalloc to allow dynamic replacement

Review of attachment 8993957 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/vm/GeckoProfiler.h
@@ -8,5 @@
>  #define vm_GeckoProfiler_h
>  
>  #include "mozilla/DebugOnly.h"
>  #include "mozilla/GuardObjects.h"
> -#include "mozilla/ProfilerCounts.h"

This doesn't seem related.

::: memory/build/mozjemalloc.cpp
@@ +4861,5 @@
> +#include "malloc_decls.h"
> +  }
> +  if (table->moz_arena_malloc ==
> +        MozJemalloc::moz_arena_malloc &&
> +      table->malloc != MozJemalloc::malloc) {

I guess this is a fair change, but can you separate this one out in a separate patch/bug?

::: memory/build/mozjemalloc_types.h
@@ +56,5 @@
>  
> +struct malloc_table;
> +typedef struct malloc_table malloc_table_t;
> +struct ReplaceMallocBridge;
> +typedef malloc_table_t* (*jemalloc_init_func)(malloc_table_t const *, struct ReplaceMallocBridge**);

You shouldn't need these here anymore.

::: memory/build/mozmemory.h
@@ +66,5 @@
>  #define moz_create_arena() moz_create_arena_with_params(NULL)
>  #endif
>  
> +#ifdef MOZ_REPLACE_MALLOC
> +MOZ_JEMALLOC_API void jemalloc_replace_dynamic(jemalloc_init_func);

nor this, now it's defined in replace_malloc.h

::: memory/build/mozmemory_wrap.h
@@ +38,5 @@
>  //   - jemalloc_purge_freed_pages
>  //   - jemalloc_free_dirty_pages
>  //   - jemalloc_thread_local_arena
>  //   - jemalloc_ptr_info
> +//   - jemalloc_replace_dynamic

likewise

::: memory/build/replace_malloc.h
@@ +18,5 @@
>  //
>  // An initialization function is called before any malloc replacement
>  // function, and has the following declaration:
>  //
> +//   malloc_table_t replace_init(malloc_table_t const*, ReplaceMallocBridge**)

Can we avoid a signature change for these functions? I'd rather preserve the ABI for the replace malloc libraries.
Attachment #8993957 - Flags: review?(mh+mozilla)
Attachment #8993573 - Flags: feedback?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #58)
> I guess this is a fair change, but can you separate this one out in a
> separate patch/bug?

In fact, I think all these jemalloc/replace_malloc changes should go to a separate bug this one would depend on.
> In fact, I think all these jemalloc/replace_malloc changes should go to a
> separate bug this one would depend on.

Sure.

>> +//   malloc_table_t replace_init(malloc_table_t const*, ReplaceMallocBridge**)
>
>Can we avoid a signature change for these functions? I'd rather preserve the ABI for the replace malloc libraries.

Well, that means that the table must be allocated by jemalloc, and we need to do a bunch of "did it actually replace anything" checks (via a copy of the passed-in table to compare) - this patch simplified that code.  But the cleanup/simplification isn't required; I think we can allocate in jemalloc and pass it down, compare, etc.
Depends on: 1480430
Updated for API changes now in bug 1480430
Attachment #8997023 - Flags: review?(mh+mozilla)
Attachment #8993958 - Attachment is obsolete: true
Attachment #8993958 - Flags: review?(mstange)
Attachment #8993957 - Attachment is obsolete: true
Attachment #8993573 - Attachment is obsolete: true
Attachment #8991439 - Attachment is obsolete: true
Removed all the pthread/TLS code -- we don't need it since we don't allocate during interception (yet); we can re-add it when we add stack capture.  When we do, we'll need to beware a bug in DMD's impl that can be triggeed by pthread_setspecific() allocating a 'level2' structure; using an Atomic<bool> with compareExchange(false, true) to lock around the setspecific, and default to blocking if that fails solves the bug on try (the profiler gtest tests trip on this).  Also removed the bernoulli stuff for similar reasons; it can be re-added as needed.
Attachment #8997560 - Flags: review?(mh+mozilla)
Attachment #8997023 - Attachment is obsolete: true
Attachment #8997023 - Flags: review?(mh+mozilla)
updated from earlier comments, and cleaned up API some
Attachment #8998421 - Flags: review?(mstange)
Attachment #8991436 - Attachment is obsolete: true
cleanup to installation API, and switch from static/macros to allocated subclass of BaseProfilerCount
Attachment #8998422 - Flags: review?(mh+mozilla)
Attachment #8997560 - Attachment is obsolete: true
Attachment #8997560 - Flags: review?(mh+mozilla)
Attachment #8993680 - Attachment is obsolete: true
Attachment #8993680 - Flags: review?(mstange)
Comment on attachment 8998422 [details] [diff] [review]
P6 - Add memory replacer with counter to profiler

Review of attachment 8998422 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/memory_hooks.cpp
@@ +251,5 @@
> +  }
> +  size_t actualSize = gMallocTable.malloc_usable_size(aPtr);
> +  if (actualSize > 0) {
> +    // this never allocates
> +    sCounter->Add(actualSize);

Glandium: my only question is if this is low-enough overhead (in memory accesses)  sCounter is a UniquePtr<BaseProfilerCountTotal>, and Add(count) will do (*mCounter) += count; if (mNumber) { (*mNumber)++; }
Re comment 67
Flags: needinfo?(mh+mozilla)
Comment on attachment 8998422 [details] [diff] [review]
P6 - Add memory replacer with counter to profiler

Review of attachment 8998422 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/memory_hooks.cpp
@@ +32,5 @@
> +#include "mozilla/JSONWriter.h"
> +#include "mozilla/MemoryReporting.h"
> +#include "mozilla/ProfilerCounts.h"
> +
> +#if defined(MOZ_REPLACE_MALLOC) && defined(MOZ_PROFILER_MEMORY)

Shouldn't this file simply be skipped when both are undefined?

@@ +44,5 @@
> +class ProfilerBridge final : public ReplaceMallocBridge
> +{
> +};
> +
> +static ProfilerBridge* gProfilerBridge;

As said in previous review, you don't need this.

@@ +66,5 @@
> +//
> +// It would be nice if we could use the InfallibleAllocPolicy from mozalloc,
> +// but we cannot use mozalloc.
> +//
> +class InfallibleAllocPolicyWithNew

You actually don't use a lot of the methods in this class, especially the maybe_* ones.

@@ +188,5 @@
> +/* static */ void
> +InfallibleAllocPolicyWithNew::ExitOnFailure(const void* aP)
> +{
> +  if (!aP) {
> +    MOZ_CRASH("DMD out of memory; aborting");

not DMD

@@ +207,5 @@
> +// MutexBase implements the platform-specific parts of a mutex.
> +
> +#ifdef XP_WIN
> +
> +class MutexBase

You're still not using this class.

@@ +251,5 @@
> +  }
> +  size_t actualSize = gMallocTable.malloc_usable_size(aPtr);
> +  if (actualSize > 0) {
> +    // this never allocates
> +    sCounter->Add(actualSize);

2 indirections and an atomic? that sounds suboptimal, but it's also not always on, so...

@@ +339,5 @@
> +  FreeCallback(aPtr);
> +  gMallocTable.free(aPtr);
> +}
> +
> +void *

you should make those static too.

@@ +397,5 @@
> +void
> +replace_init(malloc_table_t * aMallocTable, ReplaceMallocBridge** aBridge)
> +{
> +  if (mozilla::profiler::Init(aMallocTable)) {
> +    // Can't use MALLOC_FUNCS_ARENA_ALLOC or it won't use the arena funcs

You changed that in the other bug, though.

::: tools/profiler/moz.build
@@ +11,2 @@
>  if CONFIG['MOZ_GECKO_PROFILER']:
> +    ReplaceMallocStatic('profiler')

You don't need to add a template here. Just add the define directly. It's a template in memory/replace for subdirectories of memory/replace/ to use.
Attachment #8998422 - Flags: review?(mh+mozilla)
Flags: needinfo?(mh+mozilla)
Attachment #8993959 - Attachment is obsolete: true
Attachment #8993959 - Flags: review?(ted)
the allocation policy is no longer needed at all with the removal of the bridge alloc.
Attachment #8998954 - Flags: review?(mh+mozilla)
Attachment #8998422 - Attachment is obsolete: true
moved ifdefs from memory_hooks.cpp to platform.cpp, so we don't have to compile hooks if they aren't used
Attachment #8998957 - Flags: review?(mstange)
Attachment #8998423 - Attachment is obsolete: true
Attachment #8998423 - Flags: review?(mstange)
forgot to include one change requested
Attachment #8998975 - Flags: review?(mh+mozilla)
Attachment #8998954 - Attachment is obsolete: true
Attachment #8998954 - Flags: review?(mh+mozilla)
Comment on attachment 8998421 [details] [diff] [review]
P1: Add per-process profiler counters

Review of attachment 8998421 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/public/ProfilerCounts.h
@@ +31,5 @@
> +// 1) a simple counter which can be added to or subtracted from.  This could track the number
> +// of objects of a type, the number of calls to something (reflow, JIT, etc).
> +// 2) a combined counter which has the above, plus a number-of-calls counter that is
> +// incremented by 1 for each call to modify the count.  This provides an optional source for
> +// a 'heatmap' of access.  This can be used (for example) to track the amount of memory

I'm not sure what heatmap really means here, could you clarify what you mean by this a bit?

@@ +35,5 @@
> +// a 'heatmap' of access.  This can be used (for example) to track the amount of memory
> +// allocated, and provide a heatmap of memory operations (allocs/frees).
> +//
> +// Counters are sampled by the profiler once per sample-period.  At this time, all counters
> +// are global to the process.  In the future, there might be more versions with per-thread

I think it would be worth referencing a Bug # here.

@@ +97,5 @@
> +
> +  // These typically are static strings (for example if you use the macros
> +  // below)
> +  const char* mLabel;
> +  const char* mCategory;

Is there any enforcement that mCategory matches the existing category work that Markus has done?

@@ +108,5 @@
> +  // don't have to be - their lifetime must be longer than the use of them
> +  // by the profiler (see profiler_add/remove_sampled_counter()).  If you're
> +  // using a lot of these, they probably should be allocated (see classn
> +  // ProfilerCountOnly below).
> +  ProfilerAtomicSigned* mCounter;

I think it would be nice to document what the mCounter and mNumber represent, it's not immediately obvious what the difference between the two is, although I've not yet looked at how the other patches which presumably show how the counters are used.

@@ +122,5 @@
> +                  const char* aCategory,
> +                  const char* aDescription)
> +    : BaseProfilerCount(aLabel, &mCounter, nullptr, aCategory, aDescription)
> +  {
> +    // Assume we're in libxul

Is there a debug assertion we could throw in here to ensure this is true?

@@ +143,5 @@
> +                       const char* aCategory,
> +                       const char* aDescription)
> +    : BaseProfilerCount(aLabel, &mCounter, &mNumber, aCategory, aDescription)
> +  {
> +    // Assume we're in libxul

Ditto on the assertion.
Comment on attachment 8991441 [details] [diff] [review]
P4: ProfileBuffer and JSON output changes for Counters

Review of attachment 8991441 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/ProfileBufferEntry.cpp
@@ +1192,5 @@
> +  // Because this is a format entirely internal to the Profiler, any parsing
> +  // error indicates a bug in the ProfileBuffer writing or the parser itself,
> +  // or possibly flaky hardware.
> +
> +  EntryGetter e(*this);

My general preference is on more verbose names to help the reader, so something like `entries` would be easier to read below.

@@ +1213,5 @@
> +  // Build the map of counters and populate it
> +  nsDataHashtable<nsUint64HashKey, CounterMap> counters;
> +
> +  while (e.Has()) {
> +    // skip all non-Counters, including if we start in the middle of a counter

Perhaps you could embed this comment in an else block like you did with the "skip counter sample" comment below. It's a little confusing where it is right now, as the if check is looking for the opposite case.

@@ +1256,5 @@
> +    }
> +    e.Next();
> +  }
> +  // we have a map of a map of counter entries; dump them to JSON
> +  if (counters.Count() == 0) {

I'm not understanding what is going on here, or how we get into this state.
Comment on attachment 8991441 [details] [diff] [review]
P4: ProfileBuffer and JSON output changes for Counters

Review of attachment 8991441 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/ProfileBufferEntry.cpp
@@ +1192,5 @@
> +  // Because this is a format entirely internal to the Profiler, any parsing
> +  // error indicates a bug in the ProfileBuffer writing or the parser itself,
> +  // or possibly flaky hardware.
> +
> +  EntryGetter e(*this);

> My general preference is on more verbose names to help the reader, so something like `entries` would be easier to read below.

This is just following the pattern used in the rest of the file, to avoid two different ways to reference the same thing

@@ +1213,5 @@
> +  // Build the map of counters and populate it
> +  nsDataHashtable<nsUint64HashKey, CounterMap> counters;
> +
> +  while (e.Has()) {
> +    // skip all non-Counters, including if we start in the middle of a counter

> Perhaps you could embed this comment in an else block like you did with the "skip counter sample" comment below. It's a little confusing where it is right now, as the if 
> check is looking for the opposite case

I'll reword instead

@@ +1256,5 @@
> +    }
> +    e.Next();
> +  }
> +  // we have a map of a map of counter entries; dump them to JSON
> +  if (counters.Count() == 0) {

> I'm not understanding what is going on here, or how we get into this state.

If we didn't find any counters, no point in writing an empty "counters" section
Comment on attachment 8998952 [details] [diff] [review]
configure changes to ensure jemalloc is used if the profiler is on

Review of attachment 8998952 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/moz.configure/memory.configure
@@ +25,5 @@
>  
>  set_config('MOZ_MEMORY', jemalloc)
>  set_define('MOZ_MEMORY', jemalloc)
> +set_config('MOZ_PROFILER_MEMORY', jemalloc)
> +set_define('MOZ_PROFILER_MEMORY', jemalloc)

I'm not sure what this buys you over just using the `MOZ_MEMORY` define directly. Is there a specific reason you need a different define?
Comment on attachment 8998975 [details] [diff] [review]
P6 - Add memory replacer with counter to profiler

Review of attachment 8998975 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/memory_hooks.cpp
@@ +49,5 @@
> +// Utilities
> +//---------------------------------------------------------------------------
> +
> +#ifndef DISALLOW_COPY_AND_ASSIGN
> +#define DISALLOW_COPY_AND_ASSIGN(T) \

Is this used?
Comment on attachment 8998975 [details] [diff] [review]
P6 - Add memory replacer with counter to profiler

Review of attachment 8998975 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/memory_hooks.cpp
@@ +28,5 @@
> +#include "nscore.h"
> +
> +#include "mozilla/Assertions.h"
> +#include "mozilla/IntegerPrintfMacros.h"
> +#include "mozilla/JSONWriter.h"

Are these two includes used? Of the system headers further up, are all of them necessary?

@@ +259,5 @@
> +void
> +install_memory_counter(bool aInstall)
> +{
> +  if (!sCounter) {
> +    sCounter.reset(new ProfilerCounterTotal("malloc", "Memory", "Amount of allocated memory"));

sCounter = MakeUnique<ProfilerCounterTotal>(...);
Comment on attachment 8998957 [details] [diff] [review]
P7 - Install/remove memory replacement with counter on profiler start/stop

Review of attachment 8998957 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/platform.cpp
@@ +161,5 @@
>  
> +#if defined(MOZ_REPLACE_MALLOC) && defined(MOZ_PROFILER_MEMORY)
> +namespace mozilla {
> +namespace profiler {
> +extern void install_memory_counter(bool aInstall);

I would prefer this declaration to go in a dedicated header file, in the patch that adds the definition for it. Why is it extern?
Comment on attachment 8998421 [details] [diff] [review]
P1: Add per-process profiler counters

Review of attachment 8998421 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/platform.cpp
@@ +320,5 @@
>    // Info on all the registered threads.
>    // ThreadIds in mRegisteredThreads are unique.
>    nsTArray<UniquePtr<RegisteredThread>> mRegisteredThreads;
>  
> +  // All active counters

Maybe make this "Non-owning pointers to all active counters"?

::: tools/profiler/public/ProfilerCounts.h
@@ +53,5 @@
> +// To use without statics/macros:
> +//
> +// UniquePtr<ProfilerCounter> myCounter;
> +// ...
> +// myCounter.reset(new ProfilerCounter("mything", "JIT", "Some JIT byte count"));

I'd prefer
myCounter = MakeUnique<ProfilerCounter>(...);

@@ +55,5 @@
> +// UniquePtr<ProfilerCounter> myCounter;
> +// ...
> +// myCounter.reset(new ProfilerCounter("mything", "JIT", "Some JIT byte count"));
> +// ...
> +// void foo() { ... myCounter->Add(number_of_bytes_generated0; ... }

Could you add a brief explanation of why one would want to use the macro version over the non-macro version? (macros declare a static and don't need initialization)

@@ +92,5 @@
> +    (*mCounter) += aNumber;
> +    if (mNumber) {
> +      (*mNumber)++;
> +    }
> +  }

Can you declare Add and operator++ on the subclasses? That should save you a pointer dereference and a null check, in case BaseProfilerCount is used through one of the ProfilerCounter* subclasses. And the direct users of BaseProfilerCount don't seem to call these methods - they manipulate their own atomics manually.

@@ +106,5 @@
> +  // constructor args.
> +  // These are normally static globals (using the macros below), though they
> +  // don't have to be - their lifetime must be longer than the use of them
> +  // by the profiler (see profiler_add/remove_sampled_counter()).  If you're
> +  // using a lot of these, they probably should be allocated (see classn

typo: classn

and I think "allocated at runtime" would be clearer

@@ +188,5 @@
> +                                       category,     \
> +                                       description);
> +
> +// If we didn't care about static initializers, we could avoid the need for
> +// a ptr to the BaseProfilerCount object.

Same question as in the last review: Are the macros below used by anything? What scenario should they be used in?
Comment on attachment 8991441 [details] [diff] [review]
P4: ProfileBuffer and JSON output changes for Counters

Review of attachment 8991441 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/ProfileBufferEntry.cpp
@@ +1179,5 @@
> +{
> +  nsTArray<double> mTimes;
> +  nsTArray<uint64_t> mNumbers;
> +  nsTArray<int64_t> mCounts;
> +};

I think both the creation and the traversal of this structure would look more natural if this were expressed as "array of structs" and not as "struct of arrays".

@@ +1207,5 @@
> +  // CounterID
> +  // Time
> +  // ( CounterKey Count Number? )*
> +  //
> +  // And the JSON:

Unfinished comment

@@ +1215,5 @@
> +
> +  while (e.Has()) {
> +    // skip all non-Counters, including if we start in the middle of a counter
> +    if (e.Get().IsCounterId()) {
> +      uint64_t id = e.Get().u.mUint64;

Why not use .mPtr here, and make this an actual void*?

You can use nsVoidPtrHashKey.

@@ +1227,5 @@
> +        e.Next();
> +        while (e.Has() && e.Get().IsCounterKey()) {
> +          uint64_t key;
> +          uint64_t number;
> +          int64_t count;

Let's declare each variable in the line where you initialize it.

@@ +1245,5 @@
> +          data.mTimes.AppendElement(time);
> +          data.mNumbers.AppendElement(number);
> +          data.mCounts.AppendElement(count);
> +          if (e.Get().IsCount()) {
> +            e.Next();

What Count entry would we be skipping here? Please add a comment.

::: tools/profiler/core/ProfileBufferEntry.h
@@ +460,5 @@
> +//     /* more counters */
> +//   ],
> +//   "memory":
> +//   {
> +//     "initial_heap": 12345678,

Do we need this? It's also not implemented at the moment.

::: tools/profiler/core/platform.cpp
@@ +296,5 @@
>      // we don't own the counter; they may be stored in static pointers
>      sInstance->mCounters.AppendElement(aCounter);
>    }
>  
> +  static void AppendCounterLocked(BaseProfilerCount* aCounter)

This method looks unused.

@@ +2188,5 @@
> +        for (auto& counter : counters) {
> +          // create Buffer entries for each counter
> +          buffer.AddEntry(ProfileBufferEntry::CounterId(reinterpret_cast<uint64_t>(counter)));
> +          buffer.AddEntry(ProfileBufferEntry::Time(delta.ToMilliseconds()));
> +          // XXX support keyed maps of counts

Please mention that, for now, we will store these counts as a single "sample group" with the id (or CounterKey) zero. And that in the future, we may have multiple sample groups with custom IDs / counter keys.

@@ +2190,5 @@
> +          buffer.AddEntry(ProfileBufferEntry::CounterId(reinterpret_cast<uint64_t>(counter)));
> +          buffer.AddEntry(ProfileBufferEntry::Time(delta.ToMilliseconds()));
> +          // XXX support keyed maps of counts
> +          uint64_t number;
> +          int64_t count;

Let's reverse these lines.

@@ +2230,5 @@
>              DoPeriodicSample(lock, *registeredThread, *profiledThreadData, now,
> +                             aRegs);
> +            // only report these once per sample-time
> +            rssMemory = 0;
> +            ussMemory = 0;

What effect does setting these to zero have?
Comment on attachment 8998975 [details] [diff] [review]
P6 - Add memory replacer with counter to profiler

Review of attachment 8998975 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/memory_hooks.cpp
@@ +32,5 @@
> +#include "mozilla/JSONWriter.h"
> +#include "mozilla/MemoryReporting.h"
> +#include "mozilla/ProfilerCounts.h"
> +
> +#if defined(MOZ_REPLACE_MALLOC) && defined(MOZ_PROFILER_MEMORY)

You're only building this file when both are defined, so you don't need this check.

@@ +40,5 @@
> +
> +namespace mozilla {
> +namespace profiler {
> +
> +class ProfilerBridge final : public ReplaceMallocBridge

You don't need this.

@@ +221,5 @@
> +void
> +replace_init(malloc_table_t * aMallocTable, ReplaceMallocBridge** aBridge)
> +{
> +  if (mozilla::profiler::Init(aMallocTable)) {
> +    // Can't use MALLOC_FUNCS_ARENA_ALLOC or it won't use the arena funcs

You changed that in bug 1480430, so this comment doesn't apply.
Attachment #8998975 - Flags: review?(mh+mozilla)
> >  set_config('MOZ_MEMORY', jemalloc)
> >  set_define('MOZ_MEMORY', jemalloc)
> > +set_config('MOZ_PROFILER_MEMORY', jemalloc)
> > +set_define('MOZ_PROFILER_MEMORY', jemalloc)
> 
> I'm not sure what this buys you over just using the `MOZ_MEMORY` define
> directly. Is there a specific reason you need a different define?

MOZ_MEMORY tells us if we're using jemalloc.  MOZ_PROFILER_MEMORY tells if we're going to hook jemalloc when the profiler is enabled.  (perhaps overkill, but useful until we have the ability to turn it on and off in the extension's config UI)
Flags: needinfo?(ted)
Attachment #9008994 - Flags: review?(mstange)
Attachment #8998421 - Attachment is obsolete: true
Attachment #8991440 - Attachment is obsolete: true
Attachment #8991441 - Attachment is obsolete: true
Attachment #8998975 - Attachment is obsolete: true
Attachment #8998957 - Attachment is obsolete: true
Attachment #8998932 - Attachment is obsolete: true
(In reply to Randell Jesup [:jesup] from comment #85)
> > >  set_config('MOZ_MEMORY', jemalloc)
> > >  set_define('MOZ_MEMORY', jemalloc)
> > > +set_config('MOZ_PROFILER_MEMORY', jemalloc)
> > > +set_define('MOZ_PROFILER_MEMORY', jemalloc)
> > 
> > I'm not sure what this buys you over just using the `MOZ_MEMORY` define
> > directly. Is there a specific reason you need a different define?
> 
> MOZ_MEMORY tells us if we're using jemalloc.  MOZ_PROFILER_MEMORY tells if
> we're going to hook jemalloc when the profiler is enabled.  (perhaps
> overkill, but useful until we have the ability to turn it on and off in the
> extension's config UI)

Except it doesn't, since it has the exact same value as MOZ_MEMORY. What tells if we're going to hook jemalloc with your patch is the combination of MOZ_PROFILER_MEMORY and MOZ_REPLACE_MALLOC. You /could/ make MOZ_PROFILER_MEMORY actually mean the combination of both, but that seems overkill.
Attachment #9008998 - Flags: review?(mh+mozilla) → review+
Comment on attachment 8998952 [details] [diff] [review]
configure changes to ensure jemalloc is used if the profiler is on

Review of attachment 8998952 [details] [diff] [review]:
-----------------------------------------------------------------

See comment 92.
Attachment #8998952 - Flags: review?(ted)
> > MOZ_MEMORY tells us if we're using jemalloc.  MOZ_PROFILER_MEMORY tells if
> > we're going to hook jemalloc when the profiler is enabled.  (perhaps
> > overkill, but useful until we have the ability to turn it on and off in the
> > extension's config UI)
> 
> Except it doesn't, since it has the exact same value as MOZ_MEMORY. What
> tells if we're going to hook jemalloc with your patch is the combination of
> MOZ_PROFILER_MEMORY and MOZ_REPLACE_MALLOC. You /could/ make
> MOZ_PROFILER_MEMORY actually mean the combination of both, but that seems
> overkill.

You're correct, I haven't added a enable/disable-profiler-memory-hook (or whatever); but wanted to set it up to allow that.
made it a separate option.  Tested builds with --disable-jemalloc and with --disable-memory-profiling (separately)
Attachment #9009128 - Flags: review?(ted)
Attachment #8998952 - Attachment is obsolete: true
Comment on attachment 9008994 [details] [diff] [review]
P1: Add per-process profiler counters

Review of attachment 9008994 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/platform.cpp
@@ +298,5 @@
> +  PS_GET(const nsTArray<BaseProfilerCount*>&, Counters)
> +
> +  static void AppendCounter(PSLockRef, BaseProfilerCount* aCounter)
> +  {
> +    // we don't own the counter; they may be stored in static pointers

"static objects" maybe? The pointer isn't what's static

@@ +3321,5 @@
> +void
> +profiler_add_sampled_counter(BaseProfilerCount* aCounter)
> +{
> +  DEBUG_LOG("profiler_add_sampled_counter(%s)", aCounter->mLabel);
> +  PSAutoLock lock(gPSMutex);

We'll need to watch out for this lock to show up in profiles as we add more counters.

::: tools/profiler/public/ProfilerCounts.h
@@ +39,5 @@
> +// are global to the process.  In the future, there might be more versions with per-thread
> +// or other discriminators.
> +//
> +// Typical usage:
> +// Note: the macros use statics, and will be slightly faster/smaller, and

This sentence is comparing something that hasn't been introduced yet. Let's preface it with an introductory sentence like "There are two ways to use counters: With manual calls to the counter APIs, or using macros."

@@ +102,5 @@
> +    Add(1);
> +    return *this;
> +  }
> +
> +  void Add(int64_t aNumber)

This Add method is still present on the base class. Do we need it / does anything call it?
Attachment #9008994 - Flags: review?(mstange) → review+
Comment on attachment 9008995 [details] [diff] [review]
P3: add process info from threads to top level of the JSON for each process

Review of attachment 9008995 [details] [diff] [review]:
-----------------------------------------------------------------

I already reviewed this in comment 57 and the patch is still the same.
Attachment #9008995 - Flags: review?(mstange)
Comment on attachment 9008994 [details] [diff] [review]
P1: Add per-process profiler counters

Review of attachment 9008994 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/public/ProfilerCounts.h
@@ +233,5 @@
> +                                       category,     \
> +                                       description);
> +
> +// If we didn't care about static initializers, we could avoid the need for
> +// a ptr to the BaseProfilerCount object.

You still haven't addressed my question about these macros.
Attachment #9008994 - Flags: review+
Comment on attachment 9008997 [details] [diff] [review]
Dump profiler counters to the ProfileBuffer and JSON

Review of attachment 9008997 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/ProfileBufferEntry.cpp
@@ +1203,5 @@
> +
> +typedef nsDataHashtable<nsUint64HashKey, CounterKeyedSamples> CounterMap;
> +
> +void
> +ProfileBuffer::StreamCountersToJSON(SpliceableJSONWriter& aWriter,

None of my previous comments about this method's contents have been addressed. (There were four of them.)

::: tools/profiler/core/ProfileBufferEntry.h
@@ +465,5 @@
> +//     /* more counters */
> +//   ],
> +//   "memory":
> +//   {
> +//     "initial_heap": 12345678,

You didn't answer my question about this field.
Attachment #9008997 - Flags: review?(mstange)
Attachment #9009000 - Flags: review?(mstange) → review+
Attachment #9008999 - Flags: review?(mstange) → review+
(In reply to Markus Stange [:mstange] from comment #96)
> Comment on attachment 9008994 [details] [diff] [review]
> P1: Add per-process profiler counters
> 
> Review of attachment 9008994 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: tools/profiler/core/platform.cpp
> @@ +298,5 @@
> > +  PS_GET(const nsTArray<BaseProfilerCount*>&, Counters)
> > +
> > +  static void AppendCounter(PSLockRef, BaseProfilerCount* aCounter)
> > +  {
> > +    // we don't own the counter; they may be stored in static pointers
> 
> "static objects" maybe? The pointer isn't what's static

Sure.

> 
> @@ +3321,5 @@
> > +void
> > +profiler_add_sampled_counter(BaseProfilerCount* aCounter)
> > +{
> > +  DEBUG_LOG("profiler_add_sampled_counter(%s)", aCounter->mLabel);
> > +  PSAutoLock lock(gPSMutex);
> 
> We'll need to watch out for this lock to show up in profiles as we add more
> counters.

Sure - though we only grab it when adding/removing them.  keyed (by thread/etc) counters will have more overhead since I think I'll need some more locking there unless I get tricky


> > +// Typical usage:
> > +// Note: the macros use statics, and will be slightly faster/smaller, and
> 
> This sentence is comparing something that hasn't been introduced yet. Let's
> preface it with an introductory sentence like "There are two ways to use
> counters: With manual calls to the counter APIs, or using macros."

Cool, will do

> 
> @@ +102,5 @@
> > +    Add(1);
> > +    return *this;
> > +  }
> > +
> > +  void Add(int64_t aNumber)
> 
> This Add method is still present on the base class. Do we need it / does
> anything call it?

Well -- perhaps not.  But the linker should dead-code remove it if nothing can access it.  But yeah, I can remove it.  Originally I didn't have any subclasses, I added them partway through writing and extending this.
(In reply to Markus Stange [:mstange] from comment #99)
> Comment on attachment 9008997 [details] [diff] [review]
> Dump profiler counters to the ProfileBuffer and JSON
> 
> Review of attachment 9008997 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: tools/profiler/core/ProfileBufferEntry.cpp
> @@ +1203,5 @@
> > +
> > +typedef nsDataHashtable<nsUint64HashKey, CounterKeyedSamples> CounterMap;
> > +
> > +void
> > +ProfileBuffer::StreamCountersToJSON(SpliceableJSONWriter& aWriter,
> 
> None of my previous comments about this method's contents have been
> addressed. (There were four of them.)

Aha.  Skipped that somehow (or fubarred a merge as I flipped around through my stack of patches updating them).

> 
> ::: tools/profiler/core/ProfileBufferEntry.h
> @@ +465,5 @@
> > +//     /* more counters */
> > +//   ],
> > +//   "memory":
> > +//   {
> > +//     "initial_heap": 12345678,
> 
> You didn't answer my question about this field.

Right, I meant to reply here on that - I absolutely want this; I have a follow-on to ping jemalloc_stats to get an initial value, but I need to modify the buffer code to deal with that value going out-of-scope in the circular buffer, so I punted in filling it in initially.
Attachment #9009128 - Flags: review?(ted) → review+
Flags: needinfo?(ted)
Comment on attachment 9009128 [details] [diff] [review]
configure changes to ensure jemalloc is used if the profiler is on

Review of attachment 9009128 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/moz.configure/memory.configure
@@ +12,5 @@
>  
>  option('--enable-jemalloc', env='MOZ_MEMORY',
>         help='Replace memory allocator with jemalloc')
>  
> +option('--enable-memory-profiling', env='MOZ_PROFILER_MEMORY',

Do we really want an option for that? If we do (and I don't think so), this should go to toolkit/moz.configure because it enables gecko code.
Attachment #9009128 - Flags: review+
Attachment #9008997 - Attachment is obsolete: true
> > +// XXX do this without the if() and without the possible race to set (leak)
> > +// Best would be something that didn't instantiate until profiling was turned on.
> > +// We may want to keep counts, however, before it's turned on - if not, we could
> > +// put the atomics in the BaseProfilerCount object.  If we need both forms (or
> 
> I don't understand the problem here, and the comments are rather confusing.
> Are the macros below used by anything?

Nothing currently uses them; they're consistent with the macros used for other AUTO_PROFILER_* macros, and may well be used by performance-critical counters since they're slightly lower overhead than the subclasses (I believe).  Also, since they're rooted at the file level as statics, there's no risk of their being destroyed while still in use.  Earlier versions of memory_hooks.cpp used the macros, but I decided to add subclasses and use them there (and might go back to the macros if it shows up in profiles).  If you prefer I could remove them.
Attachment #9009331 - Flags: review?(mstange) → review+
Examining the JSON output, it's too large - clean up JSON, move to deltas for counters (easier to read anyways), fix RSS/USS reporting (type mismatch, and missing [] for entries)
Attachment #9013000 - Flags: review?(mstange)
restore unused rss/uss schema fields until perf-htmli.io is updated
Attachment #9013005 - Flags: review?(mstange)
Attachment #9013000 - Attachment is obsolete: true
Attachment #9013000 - Flags: review?(mstange)
Comment on attachment 9013005 [details] [diff] [review]
Clean up ProfileBuffer counter JSON output and move to delta; fix RSS memory reporting

Review of attachment 9013005 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/ProfileBufferEntry.cpp
@@ +219,5 @@
>      mJSONWriter.DoubleElement(aValue);
>    }
>  
>    void StringElement(uint32_t aIndex, const char* aValue) {
> +    MOZ_ASSERT(mStrings);

Let's make these two asserts MOZ_RELEASE_ASSERTS. I don't think the null check should have noticeable overhead over actually writing the strings.

@@ +766,5 @@
>      STACK = 0,
>      TIME = 1,
>      RESPONSIVENESS = 2,
>      RSS = 3,
> +    USS = 4,

no need for the additional comma

@@ +1338,5 @@
>        }
>  
>        aWriter.StartArrayProperty("data");
> +      uint64_t last_number = 0;
> +      int64_t last_count = 0;

Please call these previousNumber and previousCount. (I'm channeling njn here, who had a similar request in bug 1345032 comment 14.)

@@ +1343,2 @@
>        for (size_t i = 0; i < size; i++) {
> +        // Encode as deltas, and only encode if different than the last sample

last -> previous here too

@@ +1386,5 @@
>    }
>  
>    aWriter.StartArrayProperty("data");
> +  int64_t last_rss = 0;
> +  int64_t last_uss = 0;

previousRss / previousUss
Attachment #9013005 - Flags: review?(mstange) → review+
(In reply to Randell Jesup [:jesup] from comment #101)
> > ::: tools/profiler/core/ProfileBufferEntry.h
> > @@ +465,5 @@
> > > +//     /* more counters */
> > > +//   ],
> > > +//   "memory":
> > > +//   {
> > > +//     "initial_heap": 12345678,
> > 
> > You didn't answer my question about this field.
> 
> Right, I meant to reply here on that - I absolutely want this;

What's its purpose? Won't your first sample have the heap size at the beginning of the profile? Or what's the meaning of "initial"?
Attached patch p1 interdiff (obsolete) — Splinter Review
merged patch to follow for final r+
Attachment #9013300 - Flags: feedback?(mstange)
Attachment #9008994 - Attachment is obsolete: true
> >    void StringElement(uint32_t aIndex, const char* aValue) {
> > +    MOZ_ASSERT(mStrings);
> 
> Let's make these two asserts MOZ_RELEASE_ASSERTS. I don't think the null
> check should have noticeable overhead over actually writing the strings.

Sure - though null-check MOZ_RELEASE_ASSERTs don't add much as a rule, as a nullptr here will immediately safely crash with a clear (and safe) signature.  OTOH this won't affect anything important; just a minor code bloat.

> @@ +766,5 @@
> >      STACK = 0,
> >      TIME = 1,
> >      RESPONSIVENESS = 2,
> >      RSS = 3,
> > +    USS = 4,
> 
> no need for the additional comma

Habit - avoids blame-spam and "oops I forgot to add the comma".  Of course, I didn't mean to change this line anyways

> 
> @@ +1338,5 @@
> >        }
> >  
> >        aWriter.StartArrayProperty("data");
> > +      uint64_t last_number = 0;
> > +      int64_t last_count = 0;
> 
> Please call these previousNumber and previousCount. (I'm channeling njn
> here, who had a similar request in bug 1345032 comment 14.)

Sure, reasonable.
> > > > +//   "memory":
> > > > +//   {
> > > > +//     "initial_heap": 12345678,
> > > 
> > > You didn't answer my question about this field.
> > 
> > Right, I meant to reply here on that - I absolutely want this;
> 
> What's its purpose? Won't your first sample have the heap size at the
> beginning of the profile? Or what's the meaning of "initial"?

No, we don't know it: we don't know the amount of memory used at the start of the profile. The plan is to get this from jemalloc_stats() when the profiler is turned on (it's expensive) -- the problem is that as soon as the buffer starts wrapping, you'd need to update that in some way (cheapest would be to look at the "retired" entries in the buffer, and add any memory counters to the saved initial heapsize).  Since the primary interest for debugging is a graph showing changes in memory use during the profile, I've left that for later.
Attachment #9013300 - Flags: feedback?(mstange) → feedback+
Comment on attachment 9013301 [details] [diff] [review]
P1: Add per-process profiler counters

Review of attachment 9013301 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!
Attachment #9013301 - Flags: review?(mstange) → review+
Attachment #9013300 - Attachment is obsolete: true
> > +option('--enable-memory-profiling', env='MOZ_PROFILER_MEMORY',
> 
> Do we really want an option for that? If we do (and I don't think so), this
> should go to toolkit/moz.configure because it enables gecko code.

We can drop this; we can always add it if need be.  Or make it controlled by a flag to the profiler.
Attachment #9009128 - Attachment is obsolete: true
Attachment #9009330 - Attachment is obsolete: true
Attachment #9014203 - Flags: review?(ted) → review+
I noticed a odd behavior on the Main Process (only!), where samples would be repeated in a pattern at a fraction of an ms apart.  I finally tracked it down the DuplicateLastSample(), which by default duplicates all entries in the buffer in the range.  I added a warning that adding Kinds to the buffer entries should update this, though we should consider something less error-prone.  I wasn't sure if responsiveness should be duplicated; I'm sure RSS/USS shouldn't be.  One thing that needs further investigation is that this problem only appeared to happen in the main process - we should check that sleeping-thread-duplication isn't failing on Content processes
Attachment #9015038 - Flags: review?(mstange)
Comment on attachment 9015038 [details] [diff] [review]
Don't allow DuplicateLastSample to duplicate counters or memory use values

Review of attachment 9015038 [details] [diff] [review]:
-----------------------------------------------------------------

::: tools/profiler/core/ProfileBufferEntry.cpp
@@ +1338,5 @@
>        }
>  
>        aWriter.StartArrayProperty("data");
>        uint64_t previous_number = 0;
>        int64_t previous_count = 0;

No underscores please. This is coming from a previous patch so feel free to fix it in that patch.

@@ +1343,5 @@
>        for (size_t i = 0; i < size; i++) {
>          // Encode as deltas, and only encode if different than the last sample
>          if (i == 0 || samples[i].mNumber != previous_number || samples[i].mCount != previous_count) {
> +          MOZ_ASSERT(i == 0 ||
> +                     samples[i].mTime >= samples[i-1].mTime);

spaces around the minus, please

@@ +1533,5 @@
> +      case ProfileBufferEntry::Kind::CounterId:
> +      case ProfileBufferEntry::Kind::CounterKey:
> +      case ProfileBufferEntry::Kind::Number:
> +      case ProfileBufferEntry::Kind::Count:
> +      //case ProfileBufferEntry::Kind::Responsiveness: ?

I think you're right, Responsiveness should also be skipped and not duplicated.

::: tools/profiler/public/ProfilerCounts.h
@@ +98,5 @@
>  
>      aCounter = *mCounter;
>      aNumber = mNumber ? *mNumber : 0;
> +#ifdef DEBUG
> +    MOZ_ASSERT(!mNumber|| aNumber >= mPrevNumber);

The "!mNumber|| " part seems unnecessary - if the number isn't recorded, then mPrevNumber and aNumber will always be zero, and "aNumber >= mPrevNumber" will be true.

@@ +126,5 @@
>    ProfilerAtomicUnsigned* mNumber; // may be null
>  
>  #ifdef DEBUG
>    uint32_t mCanary;
> +  uint64_t mPrevNumber;

Please add a comment that describes what value is stored in this member.
Attachment #9015038 - Flags: review?(mstange) → review+
Comment on attachment 9015038 [details] [diff] [review]
Don't allow DuplicateLastSample to duplicate counters or memory use values

Review of attachment 9015038 [details] [diff] [review]:
-----------------------------------------------------------------

Actually I'm going to remove the r+ because I want to see the updated patch, to make sure my comments have been addressed.
Attachment #9015038 - Flags: review+
Attachment #9015038 - Attachment is obsolete: true
Attachment #9015097 - Flags: review?(mstange) → review+
In final testing (and with a separate patch that adds a memory contention counter), I found it was generating a sequence instead of an array for multiple counters.  Simple fix.
Attachment #9015399 - Flags: review?(mstange)
Comment on attachment 9015399 [details] [diff] [review]
Counters are an array

Review of attachment 9015399 [details] [diff] [review]:
-----------------------------------------------------------------

oops
Attachment #9015399 - Flags: review?(mstange) → review+
Pushed by rjesup@wgate.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1ea62adf0902
Add per-process profiler counters r=mstange
https://hg.mozilla.org/integration/mozilla-inbound/rev/eed77f6052d9
Dump profiler counters to the ProfileBuffer and JSON r=mstange
https://hg.mozilla.org/integration/mozilla-inbound/rev/6110df22039b
Add memory replacer with counters to the Gecko profiler r=glandium
https://hg.mozilla.org/integration/mozilla-inbound/rev/e6e614fca202
configure changes to ensure jemalloc is used if the profiler is on r=ted
https://hg.mozilla.org/integration/mozilla-inbound/rev/906bd0577c31
Install/remove memory replacement with counter on profiler start/stop r=mstange
Depends on: 1500015
Depends on: 1520587
You need to log in before you can comment on or make changes to this bug.