Closed Bug 1366294 Opened 7 years ago Closed 7 years ago

Refactor TelemetryHistogram internal storage

Categories

(Toolkit :: Telemetry, enhancement, P1)

enhancement
Points:
2

Tracking

()

RESOLVED FIXED
mozilla56
Performance Impact low
Tracking Status
firefox56 --- fixed

People

(Reporter: gfritzsche, Assigned: chutten)

References

(Blocks 1 open bug)

Details

Attachments

(13 files, 23 obsolete files)

59 bytes, text/x-review-board-request
Dexter
: review+
Details
59 bytes, text/x-review-board-request
gfritzsche
: review+
Details
59 bytes, text/x-review-board-request
chutten
: review+
Details
59 bytes, text/x-review-board-request
chutten
: review+
Details
59 bytes, text/x-review-board-request
chutten
: review+
Details
59 bytes, text/x-review-board-request
gfritzsche
: review+
Details
59 bytes, text/x-review-board-request
gfritzsche
: review+
Details
59 bytes, text/x-review-board-request
gfritzsche
: review+
Details
59 bytes, text/x-review-board-request
gfritzsche
: review+
Details
59 bytes, text/x-review-board-request
gfritzsche
: review+
Details
59 bytes, text/x-review-board-request
gfritzsche
: review+
Details
59 bytes, text/x-review-board-request
gfritzsche
: review+
Details
59 bytes, text/x-review-board-request
gfritzsche
: review+
Details
We're currently sinking enough time into dealing with the convoluted TelemetryHistogram storage/lookup paths (bug 1350765, bug 1361661) to see if we can't quickly clean them up.

Much of the complexity comes from inconsistently using 3 lookups in different places:
(1) HistogramID lookups, which is our Telemetry enum identifier for a HistogramInfo
(2) Histogram names lookups, which directly map to a HistogramID
(3) name-based lookups for base::StatisticsRecorder, where the names encode HistogramID, subsession, process & key

(2) & (3) have odd and not clearly separated code paths.

As we own our histogram.h/.cc fork, we should be able to:
- kill `StatisticsRecorder`
- kill the use of names on `Histogram`
- cleanly refactor TelemetryHistogram to use mainly these items for storage/lookup:
  - a O(1) table for (HistogramID, ProcessID, isSubsession) -> Histogram*
  - (potentially a separate one for keyed histograms)
  - a hashtable of histogramName -> HistogramID
Still lots of TODOs left, but this compiles. Chris, do you want to take a first high-level look? The most interesting part is the storage handling (globals + access functions) at the start of TelemetryHistogram.cpp.
Attachment #8870397 - Flags: feedback?(chutten)
Assignee: nobody → gfritzsche
Status: NEW → ASSIGNED
Forgot to update the patch with the last changes.
Attachment #8870413 - Flags: feedback?(chutten)
Attachment #8870397 - Attachment is obsolete: true
Attachment #8870397 - Flags: feedback?(chutten)
Comment on attachment 8870413 [details] [diff] [review]
(WIP) Refactor TelemetryHistogram internal storage

Review of attachment 8870413 [details] [diff] [review]:
-----------------------------------------------------------------

Wow there's a lot commented out.

Overall the 3-dimensional array for histograms and immediately string -> id converting looks like it ought to do the trick.

The approach seems good, but we'll see how the details shake out.

::: ipc/chromium/src/base/histogram.cc
@@ -608,5 @@
> -    output->append("-");
> -  output->append("O");
> -  while (0 < x_remainder--)
> -    output->append(" ");
> -}

I think I'd prefer these "just remove some unused junk" patch pieces to be in a separate patch.

@@ +853,4 @@
>  }
>  
>  // static
>  void StatisticsRecorder::GetHistograms(Histograms* output) {

Why are you keeping these? Are we unable to remove the type fully?

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +169,5 @@
> +                                    JS::Handle<JSObject*> obj);
> +
> +  const HistogramID mId;
> +  const HistogramInfo& mHistogramInfo;
> +  // TODO: Why does this have separate locking?

Maybe we didn't ensure that we serialized calls to SetRecordingEnabled/IsRecordingEnabled. But now we do.

@@ +207,2 @@
>  
> +// TODO: what do we need this for?

To short-circuit internal_IdentifyCorruptHistograms

On Beta 54 we've seen about 2650 TOTAL_COUNT_HIGH_ERRORS, 72 TOTAL_COUNT_LOW_ERRORS, 20 BUCKET_ORDER_ERRORS, 588 RANGE_CHECKSUM_ERRORS

...so we do see them, from time to time. Not sure if histogram inconsistency checking is slow enough we need this.

@@ +268,5 @@
> +
> +  const HistogramInfo& info = gHistogramsInfo[histogramId];
> +  internal_CreateHistogramInstance(info, &h);
> +  MOZ_ASSERT(h);
> +  return h;

...shouldn't you be setting gHistogramStorage[histogramId][...][...] = h here?

@@ +286,5 @@
> +  }
> +
> +  const HistogramInfo& info = gHistogramsInfo[histogramId];
> +  kh = new KeyedHistogram(histogramId, info);
> +

Ditto here. Need to fill the cache somewhere.

@@ +349,2 @@
>  // Note: this is completely unrelated to mozilla::IsEmpty.
> +// TODO: unused?

I'm just gonna skip most of these "unused?" comments. They're mostly used in other functions that are commented out TODO

@@ +372,5 @@
>                                     uint32_t *aCount, char*** aHistograms)
>  {
>    nsTArray<char*> collection;
>  
> +  for (const auto & h : gHistogramsInfo) {

Only just noticed this now. Maybe should be gHistogramInfos.

@@ +552,5 @@
>  
>  nsresult
> +internal_HistogramAdd(Histogram& histogram,
> +                      const HistogramInfo& info,
> +                      ProcessID processId,

ProcessID and SessionType are unused

@@ +608,5 @@
>      bool corrupt = (check != Histogram::NO_INCONSISTENCIES);
>  
>      if (corrupt) {
> +      mozilla::Telemetry::HistogramID corruptID = HistogramCount;
> +      // TODO: Should we rename these to match the TELEMETRY_* pattern?

Yes.

Also: is there a benefit to these being exponential? We only ever accumulate 1 to them. I wonder if these would be better off as counts, or as a keyed scalar or something.

@@ +715,1 @@
>    return !gCorruptHistograms[id];

Will need [process][sessiontype], I guess?

@@ +755,2 @@
>    Histogram* h;
> +  nsresult rv = internal_CreateHistogramInstance(mHistogramInfo, &h);

Such a better name

@@ +928,5 @@
>  void
>  internal_SetHistogramRecordingEnabled(mozilla::Telemetry::HistogramID aID, bool aEnabled)
>  {
> +  if (gHistogramsInfo[aID].keyed) {
> +    KeyedHistogram* keyed = internal_GetKeyedHistogramById(aID, ProcessID::Parent);

It'll have to be for whichever process corresponds to XRE_GetProcessType

@@ +950,5 @@
>    MOZ_ASSERT(false, "Telemetry::SetHistogramRecordingEnabled(...) id not found");
>  }
>  
>  bool
> +internal_IsRecordingEnabledForHistogramID(HistogramID id)

Can probably shorten to just internal_IsRecordingEnabled(id)

@@ +1014,4 @@
>  }
>  
>  void
> +internal_Accumulate(mozilla::Telemetry::HistogramID aId,

We have that lovely `using` directive way up top but never avail ourselves of it, do we...

@@ +1079,5 @@
>    }
> +
> +#if !defined(MOZ_WIDGET_ANDROID)
> +  if (Histogram* h = internal_GetHistogramById(aId, aProcessType, SessionType::Subsession)) {
> +    internal_HistogramAdd(*h, gHistogramsInfo[aId], aProcessType, SessionType::Subsession, aSample);

Rather than have these #if !defined(MOZ_WIDGET_ANDROID) blocks sprinkled throughout, could we have internal_HistogramAdd(*h, gHistogramsInfo[aId], aProcessType, SessionType::All, aSample); ?

@@ +1510,5 @@
>  
>    JS::CallArgs args = JS::CallArgsFromVp(argc, vp);
>    return NS_SUCCEEDED(keyed->GetJSKeys(cx, args));
> +
> +  return true;

Nope. returns NS_SUCCEEDED in the line above.

@@ +1648,5 @@
>  // calls" from "calls to another function in this interface" will lead
>  // to deadlocking and/or races.  See comments at the top of the file
>  // for further (important!) details.
>  
>  // Create and destroy the singleton StatisticsRecorder object.

I guess we can omit these entirely.

@@ +1687,5 @@
>    // their values in Histograms.json.
>    // We add static asserts here for those values to match so that future changes
>    // don't go unnoticed.
> +  // TODO: Compare explicitly with gHistogramsInfo[<histogram id>].bucketCount here
> +  // once we can make gHistogramsInfo constexpr (requires VS2015).

I think we can file this bug now.

@@ +1907,3 @@
>    {
>      StaticMutexAutoLock locker(gTelemetryHistogramMutex);
> +    h = internal_GetHistogramByName(name, ProcessID::Parent, SessionType::Session);

Again, XRE_GetProcessType... and throughout the next few I guess.
Attachment #8870413 - Flags: feedback?(chutten) → feedback+
(In reply to Chris H-C :chutten from comment #3)
> Wow there's a lot commented out.
> 
> Overall the 3-dimensional array for histograms and immediately string -> id
> converting looks like it ought to do the trick.
> 
> The approach seems good, but we'll see how the details shake out.

Mostly its the JS functions and the serializations commented out.
They are an immense refactoring time sink for a first hack, but don't do anything unusual storage-wise that i'm worried about.

> ::: ipc/chromium/src/base/histogram.cc
> @@ -608,5 @@
> > -    output->append("-");
> > -  output->append("O");
> > -  while (0 < x_remainder--)
> > -    output->append(" ");
> > -}
> 
> I think I'd prefer these "just remove some unused junk" patch pieces to be
> in a separate patch.

I'll look into breaking this up a bit later. For a WIP it's just slowing down things too much.

Cheers for filling some of the blanks on the rest, this helps.
Blocks: 1369171
The Chromium IPC histogram code used the StatisticsRecorder object for storage.
This is keyed by histogram name, which doesn't match our storage reality anymore.
Instead we use a name to refer to a set of histogram instances that record data from different processes, as well as separating session and subsession data.
Consequently we need to rewrite this storage, which means StatisticsRecorder is not used anymore.
Attachment #8873912 - Flags: review?(chutten)
Comment on attachment 8873912 [details] [diff] [review]
Part 1 - Remove base::StatisticsRecorder

Review of attachment 8873912 [details] [diff] [review]:
-----------------------------------------------------------------

I was never fond of the StatisticsRecorder as it was this odd piece of internal chromium code that we had to have super-special init/de-init code for with some of the highest comment:code ratios in Telemetry.

This patch won't even compile, as the uses of StatisticsRecorder remain in place throughout histogram.cc Is this intentional?
- A histogram name identifies a set of histogram instances, for which storage and lookup will be handled in TelemetryHistogram.cpp.
  So we remove the names from histogram code.
- Various unused macros in the header are removed.
- Remaining traces of StatisticsRecorder are removed from the Histogram class code.
- Some unused methods are dropped that were about printing histograms to ASCII etc.
Attachment #8873925 - Flags: review?(chutten)
I had most tests passing this, but some smaller change now makes test_TelemetryHistograms.js hang, ouch.
Attachment #8873925 - Attachment is obsolete: true
Attachment #8873925 - Flags: review?(chutten)
- A histogram name identifies a set of histogram instances, for which storage and lookup will be handled in TelemetryHistogram.cpp.
  So we remove the names from histogram code.
- Various unused macros in the header are removed.
- Remaining traces of StatisticsRecorder are removed from the Histogram class code.
- Some unused methods are dropped that were about printing histograms to ASCII etc.
Attachment #8873927 - Flags: review?(chutten)
Attachment #8873926 - Attachment is obsolete: true
I had most tests passing this, but some smaller change now makes test_TelemetryHistograms.js hang, ouch.
Attachment #8870413 - Attachment is obsolete: true
- A histogram name identifies a set of histogram instances, for which storage and lookup will be handled in TelemetryHistogram.cpp.
  So we remove the names from histogram code.
- Various unused macros in the header are removed.
- Remaining traces of StatisticsRecorder are removed from the Histogram class code.
- Some unused methods are dropped that were about printing histograms to ASCII etc.
Attachment #8873929 - Flags: review?(chutten)
Attachment #8873927 - Attachment is obsolete: true
Attachment #8873927 - Flags: review?(chutten)
Previously we used the base::StatisticsRecorder object for storage by name.
This is keyed by histogram name, which doesn't match our storage reality anymore.
Instead we use a name to refer to a set of histogram instances that record data from different processes, as well as separating session and subsession data.

In this re-write, we instead introduce the following lookup paths (managed in TelemetryHistogram.cpp):
- Main storage:
  - (histogramId, processId, sessionOrSubsession) -> Histogram*
  - (histogramId, processId) -> KeyedHistogram* (this handles subsessions internally)
- Lookup:
  - (histogramName) -> histogramId

This is wrapped with a few lookup functions.

This also allows us to keep HistogramIDs in the JS histogram instances now, instead of pointers to Histogram instances.
That means Histogram instance life-time management is now properly contained inside TelemetryHistogram.cpp.
Attachment #8873930 - Flags: feedback?(chutten)
Attachment #8873928 - Attachment is obsolete: true
(In reply to Georg Fritzsche [:gfritzsche] from comment #13)
> Created attachment 8873930 [details] [diff] [review]
> Part 3 - (WIP) Refactor TelemetryHistogram storage

While i believe that the big changes are in place, the current main open points are:
- we need to change the snapshotting functions to return serialized data in the format: {"main": {...}, ...}
  (need process types in it and should match scalar & events approach)
- histogram corruption detection needs to be re-enabled (or we rip it and take it to a follow-up?)
- the memory reporter needs fixing
- the "can histogram record" functionality needs to be adjusted to the current process types
- child process support needs some testing/fixing
- the locking/threading comments probably need some updating
Comment on attachment 8873929 [details] [diff] [review]
Part 2 - Cleanup Chromium Histogram code

Review of attachment 8873929 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/chromium/src/base/histogram.cc
@@ +187,5 @@
>      const int kCommonRaceBasedCountMismatch = 1;
>      if (delta > 0) {
> +      // TODO: Remove these in a separate patch, including their use in in TelemetryHistogram.cpp.
> +      // TelemetryHistogram.cpp says that we use other histograms for this.
> +      // UMA_HISTOGRAM_COUNTS("Histogram.InconsistentCountHigh", delta);

Specifically in IdentifyCorruptHistograms we count the number of COUNT_{HIGH,LOW}_ERROR that this function returns.

Since we don't have UMA, we don't send either of these so they can be safely removed.
Attachment #8873929 - Flags: review?(chutten)
Comment on attachment 8873912 [details] [diff] [review]
Part 1 - Remove base::StatisticsRecorder

Review of attachment 8873912 [details] [diff] [review]:
-----------------------------------------------------------------

I was never fond of the StatisticsRecorder as it was this odd piece of internal chromium code that we had to have super-special init/de-init code for with some of the highest comment:code ratios in Telemetry.

This patch won't even compile, as the uses of StatisticsRecorder remain in place throughout histogram.cc Is this intentional?
Attachment #8873912 - Flags: review?(chutten)
(In reply to Chris H-C :chutten from comment #16)
> This patch won't even compile, as the uses of StatisticsRecorder remain in
> place throughout histogram.cc Is this intentional?

Part 2 takes care of this, it became time-consuming to untangle the changes.
Alernatively i can just fold part 1 & 2 together.
Comment on attachment 8873930 [details] [diff] [review]
Part 3 - (WIP) Refactor TelemetryHistogram storage

Review of attachment 8873930 [details] [diff] [review]:
-----------------------------------------------------------------

I suppose the biggest question I have is: Why pass the ID instead of the Histogram through JS?

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +106,3 @@
>  namespace {
>  
> +typedef nsDataHashtable<nsCStringHashKey, HistogramID> StringToHistogramIdMap;

Do typedefs end in "Type" or not?

@@ +168,5 @@
> +                                    JS::Handle<JSObject*> obj);
> +
> +  const HistogramID mId;
> +  const HistogramInfo& mHistogramInfo;
> +  // TODO: Why does this have separate locking?

Replace with: TODO: since locking improved, can we remove this?

@@ +192,3 @@
>  bool gCanRecordExtended = false;
> + 
> +// TODO: Should we wrap the storage into a class for clearer access control?

Access control seems pretty clear to me with everything hidden back here in the cpp.

Me I'd be itching to remove it for encapsulation and comprehension's sake. But it's really straightforward in use, so... I dunno.

@@ +204,5 @@
> + 
> +// To simplify logic below we use a single histogram instance for all expired histograms.
> +Histogram* gExpiredHistogram = nullptr;
> + 
> +// TODO: what do we need this for?

For counting the types and counts of corrupt histograms (and refusing to reflect them, I think). As long as I've been here we've never taken any action based on these figures.

@@ +268,5 @@
> +    return h;
> +  }
> +
> +  const HistogramInfo& info = gHistogramInfos[histogramId];
> +  mozilla::Unused << internal_CreateHistogramInstance(info, &h);

If we're going to skip the nsresult check and assert that it succeeds here, we should ASSERT within internal_CreteHistogramInstance that the provided histograminfo is correct and stop returning nsresult. (Hell, we could return h)

@@ +566,5 @@
>  nsresult
> +internal_HistogramAdd(Histogram& histogram,
> +                      const HistogramInfo& info,
> +                      ProcessID processId,
> +                      SessionType sessionType,

You aren't using processId or sessionType here.

@@ +954,2 @@
>  {
> +  // TODO: Does this need to respect the current process type?

RecordingEnabled was per-process, so each process would init and set/unset and query independently.

This ended up being integral to my record_in_processes work.

You're right that creating all those Histogram objects just so they can provide a "IsRecordingEnabled()? No? Okay" value is a waste, but I'm not sure how much of one it is.

@@ +983,5 @@
> +    KeyedHistogram* keyed = internal_GetKeyedHistogramById(id, ProcessID::Parent, /* instantiate = */ false);
> +    return keyed && keyed->IsRecordingEnabled();
> +  }
> +
> +  Histogram *h = internal_GetHistogramById(id, ProcessID::Parent, SessionType::Session, /* instantiate = */ false);

Since you're only querying the session histogram, you need only set it on the session histogram above.

@@ +1027,5 @@
>    }
> +
> +  Histogram *h = internal_GetHistogramById(aId, ProcessID::Parent, SessionType::Session);
> +  MOZ_ASSERT(h);
> +  internal_HistogramAdd(*h, gHistogramInfos[aId], ProcessID::Parent, SessionType::Session, aSample);

It seems weird that we get the parent,session Histogram but then need to tell it again when we're adding that it is the parent,session Histogram

Makes half a case for a histogram wrapper that knows these things and only instantiates the inner Histogram when necessary... but that's one too many layers of indirection for the moment.

@@ +1375,5 @@
>    }
> +
> +  // TODO: delete in finalizer
> +  JSHistogramData* data = new JSHistogramData{id};
> +  JS_SetPrivate(obj, data);

How the id is better than the histogram instance itself

@@ +1484,5 @@
>    default:
>      MOZ_CRASH("unhandled reflection status");
>    }
> +
> +  return true;

Unreachable, no?

@@ -1826,5 @@
>  #endif
>  
> -  mozilla::PodArrayZero(gCorruptHistograms);
> -
> -  // Create registered keyed histograms

TelemetrySession expects there to be KeyedHistogram instances returned for all registered keyed histograms.

@@ +1973,5 @@
> +    if (NS_FAILED(rv)) {
> +      return NS_ERROR_FAILURE;
> +    }
> +
> +    // TODO: This might change behavior. Make this a follow-up bug?

internal_GetHistogramEnumId returned NS_ERROR_FAILURE if it was keyed, so this is valid.

@@ +1995,5 @@
> +    if (NS_FAILED(rv)) {
> +      return NS_ERROR_FAILURE;
> +    }
> +
> +    // TODO: This might change behavior. Make this a follow-up bug?

Pretty clear-cut that this is expected. non-keyed hgrams weren't in gKeyedHistograms, after all.

@@ +2038,5 @@
>      includeGPUProcess = gpm->AttemptedGPUProcess();
>    }
>  
> +#if !defined(MOZ_WIDGET_ANDROID)
> +  SessionType sessionType = SessionType(!!subsession);

subsession is already a bool. Why are you coercing it a second time?

@@ +2058,3 @@
>        }
> +
> +      mozilla::DebugOnly<Histogram*> h = nullptr;

Should this whole thing be #ifndef NDEBUG or something?

::: toolkit/components/telemetry/tests/unit/test_TelemetryHistograms.js
@@ +853,5 @@
>    h.clear(true);
>    snapshot = Telemetry.histogramSnapshots;
>    subsession = Telemetry.snapshotSubsessionHistograms();
> +  Assert.ok(COUNT in snapshot);
> +  Assert.ok(!(COUNT in subsession));

This is changed behaviour through your defaulting only FLAG histograms instead of also defaulting COUNT histograms. This should be called out.
Attachment #8873930 - Flags: feedback?(chutten)
(In reply to Chris H-C :chutten from comment #18)
> Comment on attachment 8873930 [details] [diff] [review]
> Part 3 - (WIP) Refactor TelemetryHistogram storage
> 
> Review of attachment 8873930 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I suppose the biggest question I have is: Why pass the ID instead of the
> Histogram through JS?

Two reasons:
(1) We should pass the HistogramID around so we can have both the C++ and the JS API use the same functions.
    Making Histogram instances aware of their HistogramID would be just a band-aid and becomes entangled again.
(2) Currently the life-time of Histogram instances is restricted by JS(Keyed)Histogram instances.
    This is effectively because we leak shared ownership of Histogram instances out to the JS world, with its different life-time rules.
    Because of (1), this solution seems much cleaner than ref-counting etc.
    This allows us to actually destroy Histogram instances now on subsession resets in this patch.

> @@ +192,3 @@
> >  bool gCanRecordExtended = false;
> > + 
> > +// TODO: Should we wrap the storage into a class for clearer access control?
> 
> Access control seems pretty clear to me with everything hidden back here in
> the cpp.
> 
> Me I'd be itching to remove it for encapsulation and comprehension's sake.
> But it's really straightforward in use, so... I dunno.

Yeah, i think its hard to reason about access without searching the whole file.
Whether a class is the "right" tool or not, this would make access rules very clear.

> @@ +204,5 @@
> > + 
> > +// To simplify logic below we use a single histogram instance for all expired histograms.
> > +Histogram* gExpiredHistogram = nullptr;
> > + 
> > +// TODO: what do we need this for?
> 
> For counting the types and counts of corrupt histograms (and refusing to
> reflect them, I think). As long as I've been here we've never taken any
> action based on these figures.

If we haven't done anything ever with them... how about we drop this for now and re-recreate it in a follow-up bug if needed.

> @@ +954,2 @@
> >  {
> > +  // TODO: Does this need to respect the current process type?
> 
> RecordingEnabled was per-process, so each process would init and set/unset
> and query independently.
> 
> This ended up being integral to my record_in_processes work.
> 
> You're right that creating all those Histogram objects just so they can
> provide a "IsRecordingEnabled()? No? Okay" value is a waste, but I'm not
> sure how much of one it is.

How about a simple lookup table of (HistogramID -> bool) ?
That would be an easy-to-reason-about solution that does exactly what we need.
See comment 17 & comment 19.
Before i wrap up the patches, do you have input/requests on any of these?
Flags: needinfo?(chutten)
(In reply to Chris H-C :chutten from comment #18)
> ::: toolkit/components/telemetry/TelemetryHistogram.cpp
> @@ +106,3 @@
> >  namespace {
> >  
> > +typedef nsDataHashtable<nsCStringHashKey, HistogramID> StringToHistogramIdMap;
> 
> Do typedefs end in "Type" or not?

Usually not, i think thats redundant mostly over the language rules (position/usage makes it clear).
We are currently not consistent about this.

> @@ +192,3 @@
> >  bool gCanRecordExtended = false;
> > + 
> > +// TODO: Should we wrap the storage into a class for clearer access control?
> 
> Access control seems pretty clear to me with everything hidden back here in
> the cpp.
> 
> Me I'd be itching to remove it for encapsulation and comprehension's sake.
> But it's really straightforward in use, so... I dunno.

We can move that to a follow-up.

> @@ +204,5 @@
> > + 
> > +// To simplify logic below we use a single histogram instance for all expired histograms.
> > +Histogram* gExpiredHistogram = nullptr;
> > + 
> > +// TODO: what do we need this for?
> 
> For counting the types and counts of corrupt histograms (and refusing to
> reflect them, I think). As long as I've been here we've never taken any
> action based on these figures.

I'm dropping this and filing a follow-up to consider re-adding this.
Without active monitoring or steps to act on corruption this is effectively useless.

> @@ +566,5 @@
> >  nsresult
> > +internal_HistogramAdd(Histogram& histogram,
> > +                      const HistogramInfo& info,
> > +                      ProcessID processId,
> > +                      SessionType sessionType,
> 
> You aren't using processId or sessionType here.
> 
> @@ +954,2 @@
> >  {
> > +  // TODO: Does this need to respect the current process type?
> 
> RecordingEnabled was per-process, so each process would init and set/unset
> and query independently.
> 
> This ended up being integral to my record_in_processes work.
> 
> You're right that creating all those Histogram objects just so they can
> provide a "IsRecordingEnabled()? No? Okay" value is a waste, but I'm not
> sure how much of one it is.

It seems much cleaner & readable to use a single (HistogramID -> bool) table to track this.
This kills some complexity here, so i'm doing this.

> @@ +1027,5 @@
> >    }
> > +
> > +  Histogram *h = internal_GetHistogramById(aId, ProcessID::Parent, SessionType::Session);
> > +  MOZ_ASSERT(h);
> > +  internal_HistogramAdd(*h, gHistogramInfos[aId], ProcessID::Parent, SessionType::Session, aSample);
> 
> It seems weird that we get the parent,session Histogram but then need to
> tell it again when we're adding that it is the parent,session Histogram
> 
> Makes half a case for a histogram wrapper that knows these things and only
> instantiates the inner Histogram when necessary... but that's one too many
> layers of indirection for the moment.

Lets take this to future bugs.

> @@ -1826,5 @@
> >  #endif
> >  
> > -  mozilla::PodArrayZero(gCorruptHistograms);
> > -
> > -  // Create registered keyed histograms
> 
> TelemetrySession expects there to be KeyedHistogram instances returned for
> all registered keyed histograms.

I don't think thats a requirement. We don't send out empty keyed histograms, so at the most this requires changes in TelemetrySession and its tests.

> @@ +2058,3 @@
> >        }
> > +
> > +      mozilla::DebugOnly<Histogram*> h = nullptr;
> 
> Should this whole thing be #ifndef NDEBUG or something?

The next line has side-effects: instantiating a histogram if it didn't exist yet.
Blocks: 1364043
The Chromium IPC histogram code used the StatisticsRecorder object for storage.
This is keyed by histogram name, which doesn't match our storage reality anymore.
Instead we use a name to refer to a set of histogram instances that record data from different processes, as well as separating session and subsession data.
Consequently we need to rewrite this storage, which means StatisticsRecorder is not used anymore.
Attachment #8874768 - Flags: review?(chutten)
Attachment #8873912 - Attachment is obsolete: true
- A histogram name identifies a set of histogram instances, for which storage and lookup will be handled in TelemetryHistogram.cpp.
  So we remove the names from histogram code.
- Various unused macros in the header are removed.
- Remaining traces of StatisticsRecorder are removed from the Histogram class code.
- Some unused methods are dropped that were about printing histograms to ASCII etc.
Attachment #8874769 - Flags: review?(chutten)
Attachment #8873929 - Attachment is obsolete: true
Previously we used the base::StatisticsRecorder object for storage by name.
This is keyed by histogram name, which doesn't match our storage reality anymore.
Instead we use a name to refer to a set of histogram instances that record data from different processes, as well as separating session and subsession data.

In this re-write, we instead introduce the following lookup paths (managed in TelemetryHistogram.cpp):
- Main storage:
  - (histogramId, processId, sessionOrSubsession) -> Histogram*
  - (histogramId, processId) -> KeyedHistogram* (this handles subsessions internally)
- Lookup:
  - (histogramName) -> histogramId

This is wrapped with a few lookup functions.

This also allows us to keep HistogramIDs in the JS histogram instances now, instead of pointers to Histogram instances.
That means Histogram instance life-time management is now properly contained inside TelemetryHistogram.cpp.
Attachment #8874770 - Flags: review?(chutten)
Attachment #8873930 - Attachment is obsolete: true
Next steps from here that i can think of:

Follow-up bugs on:
- tracking histogram corruption
- wrapping histogram storage in a class (probably depends on bug 1362953)

Separate patches on this bug for:
- updating code to the recent recording_enabled changes
- fixing TelemetrySession fallout
- fixing child process handling (C++ code & test_ChildHistogram.js)
- making memory reporting work again

Other actions:
- call out the potential change to count histograms for consumers
  (or figure out what the exact behavior was and reproduce that)
Blocks: 1370563
Blocks: 1370565
Comment on attachment 8874768 [details] [diff] [review]
Part 1 - Remove base::StatisticsRecorder

Review of attachment 8874768 [details] [diff] [review]:
-----------------------------------------------------------------

Doesn't look like this changed?
Attachment #8874768 - Flags: review?(chutten) → review+
Comment on attachment 8874769 [details] [diff] [review]
Part 2 - Cleanup Chromium Histogram code

Review of attachment 8874769 [details] [diff] [review]:
-----------------------------------------------------------------

Looks the same as last time, but also removes the UMA-sent corruption counters.
Attachment #8874769 - Flags: review?(chutten) → review+
Comment on attachment 8874770 [details] [diff] [review]
Part 3 - Refactor TelemetryHistogram storage

Review of attachment 8874770 [details] [diff] [review]:
-----------------------------------------------------------------

Does this fix the test hang?

::: toolkit/components/telemetry/TelemetryHistogram.cpp
@@ +186,2 @@
>  bool gCanRecordExtended = false;
> + 

There's some EOL whitespace throughout the file :(

@@ +201,5 @@
> +// This tracks whether recording is enabled for specific histograms.
> +// To utilize C++ initialization rules, we invert the meaning to "disabled".
> +bool gHistogramRecordingDisabled[HistogramCount] = {};
> + 
> +// This is for gHistogramInfoss, gHistogramStringTable

Typo: gHistogramInfoss

@@ +301,5 @@
> +}
> +
> +// Clear a histogram from storage.
> +void
> +internal_ClearHistogramById(HistogramID histogramId, ProcessID processId, SessionType sessionType)

Hrm. Apparently we can clear histograms that have their recording disabled. 

Shouldn't be a problem, but it is interesting..

@@ +347,2 @@
>  {
> +  gHistogramRecordingDisabled[aID] = !aEnabled;

no MOZ_ASSERT(internal_IsHistogramEnumId(id)); ?

@@ +605,2 @@
>  {
> +  // Only count & flag histograms are serialized when they are empty.

You reference "count" here but only deal with "flag" in the code.

@@ +1075,5 @@
>        return true;
>      }
>    }
>  
> +  // TODO: This should be locked.

Why isn't it?

@@ +1956,3 @@
>  #endif
> +
> +    // TODO: support multiple processes.

lol.
Attachment #8874770 - Flags: review?(chutten)
Flags: needinfo?(chutten)
(In reply to Chris H-C :chutten from comment #28)
> Comment on attachment 8874770 [details] [diff] [review]
> Part 3 - Refactor TelemetryHistogram storage
> 
> Review of attachment 8874770 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Does this fix the test hang?

Yes, that's why i updated it. It's an instance of bug 1341084, comment 4.

What are the outstanding issues to fix for getting this patch reviewed?
Flags: needinfo?(chutten)
FWIW, comment 25 should be clear about what patch 3 does not aim to address.
(In reply to Georg Fritzsche [:gfritzsche] from comment #29)
> (In reply to Chris H-C :chutten from comment #28)
> > Comment on attachment 8874770 [details] [diff] [review]
> > Part 3 - Refactor TelemetryHistogram storage
> > 
> > Review of attachment 8874770 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > Does this fix the test hang?
> 
> Yes, that's why i updated it. It's an instance of bug 1341084, comment 4.
> 
> What are the outstanding issues to fix for getting this patch reviewed?

Typo, lack of a MOZ_ASSERT, misleading comment, explaining why you removed a lock
Flags: needinfo?(chutten)
Adressed review feedback.
Attachment #8875759 - Flags: review?(chutten)
Attachment #8874770 - Attachment is obsolete: true
Attachment #8875759 - Flags: review?(chutten) → review+
Transitioning this to Chris for the remaining work of comment 25, so i can get to bug 1302681.
Assignee: gfritzsche → chutten
Blocks: 1373240
This is a blocker or dupe for some other QF bugs; this is assumed to have broader impact and will make other optimizations possible.
Whiteboard: [qf]
Whiteboard: [qf] → [qf:p3]
Attachment #8874768 - Attachment is obsolete: true
Attachment #8874769 - Attachment is obsolete: true
Attachment #8875759 - Attachment is obsolete: true
Comment on attachment 8880940 [details]
bug 1366294 - Part 13 - Fix telemetry-using tests.

https://reviewboard.mozilla.org/r/152304/#review157364
Attachment #8880940 - Flags: review?(chutten) → review+
Comment on attachment 8880941 [details]
Bug 1366294 - Part 2 - Cleanup Chromium Histogram code.

https://reviewboard.mozilla.org/r/152306/#review157366
Attachment #8880941 - Flags: review?(chutten) → review+
Comment on attachment 8880942 [details]
Bug 1366294 - Part 3 - Refactor TelemetryHistogram storage.

https://reviewboard.mozilla.org/r/152308/#review157368
Attachment #8880942 - Flags: review?(chutten) → review+
State as of press time: mostly there

There are some weird failures in test_TelemetrySession. Specifically, it complains that GC_REASON_2 isn't in the subsession snapshots, which by all rights it ought to be.

That being said, all of these patches are reviewable as posted. They shouldn't require much further fiddling or organization.

I'll pick up from here next week.
Comment on attachment 8880943 [details]
bug 1366294 - Part 4 - Small cleanups to previous patches.

https://reviewboard.mozilla.org/r/152310/#review157598

::: toolkit/components/telemetry/TelemetryHistogram.cpp:524
(Diff revision 1)
>    // Check if we are allowed to record the data.
>    bool canRecordDataset = CanRecordDataset(gHistogramInfos[id].dataset,
>                                             internal_CanRecordBase(),
>                                             internal_CanRecordExtended());
> -  if (!canRecordDataset || !internal_IsRecordingEnabled(id)) {
> +  if (!canRecordDataset ||
> +    (aProcessType == ProcessID::Parent && !internal_IsRecordingEnabled(id))) {

What is the check against `ProcessID::Parent` for?
Can we add a comment for this?

::: toolkit/components/telemetry/TelemetryHistogram.cpp:693
(Diff revision 1)
>  {
>    bool canRecordDataset = CanRecordDataset(mHistogramInfo.dataset,
>                                             internal_CanRecordBase(),
>                                             internal_CanRecordExtended());
> -  if (!canRecordDataset || !internal_IsRecordingEnabled(mId)) {
> +  if (!canRecordDataset ||
> +    (aProcessType == ProcessID::Parent && !internal_IsRecordingEnabled(mId))) {

Same here.
Attachment #8880943 - Flags: review?(gfritzsche)
Comment on attachment 8880944 [details]
bug 1366294 - Part 5 - Send appropriately-structured Telemetry data to JS

https://reviewboard.mozilla.org/r/152312/#review157600

Great to see this snapshot handling cleaned up!

::: toolkit/components/telemetry/TelemetryHistogram.cpp:1945
(Diff revision 1)
> +    if (!JS_DefineProperty(cx, root_obj, GetNameForProcessID(ProcessID(process)),
> +                           processObject, JSPROP_ENUMERATE)) {

Should we add properties for processes that we haven't recorded any data for?

::: toolkit/components/telemetry/TelemetrySession.jsm:929
(Diff revision 1)
> -    for (let name of registered) {
> -      for (let suffix of Object.values(HISTOGRAM_SUFFIXES)) {
> +    Object.entries(hls).forEach(([process, histograms]) => {
> +      ret[process] = {};

You can use: `for (let [key, value] of Object.entries(hls)) ...`

::: toolkit/components/telemetry/TelemetrySession.jsm:932
(Diff revision 1)
>      let ret = {};
>  
> -    for (let name of registered) {
> -      for (let suffix of Object.values(HISTOGRAM_SUFFIXES)) {
> -        if (name + suffix in hls) {
> -          if (!(suffix in ret)) {
> +    Object.entries(hls).forEach(([process, histograms]) => {
> +      ret[process] = {};
> +      Object.entries(histograms)
> +            .filter(([name, value]) => registered.includes(name))

Do we need this?
We don't record any addon histograms anymore, removed (all?) the internal histogram.cc additions and stopped registering odd histogram names for different internal storage uses.

::: toolkit/components/telemetry/TelemetrySession.jsm:933
(Diff revision 1)
>  
> -    for (let name of registered) {
> -      for (let suffix of Object.values(HISTOGRAM_SUFFIXES)) {
> -        if (name + suffix in hls) {
> -          if (!(suffix in ret)) {
> -            ret[suffix] = {};
> +    Object.entries(hls).forEach(([process, histograms]) => {
> +      ret[process] = {};
> +      Object.entries(histograms)
> +            .filter(([name, value]) => registered.includes(name))
> +            .forEach(([name, value]) => ret[process][name] = this.packHistogram(value));

Can we use `for ([...] of ...)`?

::: toolkit/components/telemetry/TelemetrySession.jsm:1302
(Diff revision 1)
>      let histograms = protect(() => this.getHistograms(isSubsession, clearSubsession), {});
>      let keyedHistograms = protect(() => this.getKeyedHistograms(isSubsession, clearSubsession), {});
>      let scalars = protect(() => this.getScalars(isSubsession, clearSubsession), {});
>      let keyedScalars = protect(() => this.getScalars(isSubsession, clearSubsession, true), {});
>      let events = protect(() => this.getEvents(isSubsession, clearSubsession))
>  
> -    payloadObj.histograms = histograms[HISTOGRAM_SUFFIXES.PARENT] || {};
> -    payloadObj.keyedHistograms = keyedHistograms[HISTOGRAM_SUFFIXES.PARENT] || {};
> +    payloadObj.histograms = histograms.parent || {};
> +    payloadObj.keyedHistograms = keyedHistograms.parent || {};
>      payloadObj.processes = {

This whole payload building is becoming very repetitive, only the "parent" handling sticks out a bit.
We should file a follow-up to make this more generic.
Attachment #8880944 - Flags: review?(gfritzsche)
Comment on attachment 8880945 [details]
bug 1366294 - Part 6 - Update test_TelemetryHistograms for new snapshots format

https://reviewboard.mozilla.org/r/152314/#review157606

::: toolkit/components/telemetry/tests/unit/test_TelemetryHistograms.js:442
(Diff revision 1)
> -  do_check_eq(Telemetry.histogramSnapshots["__expired__"], undefined);
> -  do_check_eq(Telemetry.histogramSnapshots[test_expired_id], undefined);
> +  do_check_eq(Telemetry.histogramSnapshots.parent["__expired__"], undefined);
> +  do_check_eq(Telemetry.histogramSnapshots.parent[test_expired_id], undefined);

I think we can improve this and test all processes for `!(expired in process)`.

::: toolkit/components/telemetry/tests/unit/test_TelemetryHistograms.js:779
(Diff revision 1)
>  add_task(function* test_histogramSnapshots() {
>    let keyed = Telemetry.getKeyedHistogramById("TELEMETRY_TEST_KEYED_COUNT");
>    keyed.add("a", 1);
>  
>    // Check that keyed histograms are not returned
> -  Assert.ok(!("TELEMETRY_TEST_KEYED_COUNT#a" in Telemetry.histogramSnapshots));
> +  Assert.ok(!("TELEMETRY_TEST_KEYED_COUNT#a" in Telemetry.histogramSnapshots.parent));

I think we need to test for "TELEMETRY_TEST_KEYED_COUNT" now.
Attachment #8880945 - Flags: review?(gfritzsche) → review+
Comment on attachment 8880946 [details]
bug 1366294 - Part 7 - Use keyed histogram snaphots to build payloads.

https://reviewboard.mozilla.org/r/152316/#review157608

::: toolkit/components/telemetry/TelemetryHistogram.cpp:2035
(Diff revision 1)
> +    if (!JS_DefineProperty(cx, obj, GetNameForProcessID(ProcessID(process)),
> +                           processObject, JSPROP_ENUMERATE)) {

Do we need to include all processes, even if we didn't record any data for them?

::: toolkit/components/telemetry/TelemetryHistogram.cpp:2045
(Diff revision 1)
> +      const HistogramInfo& info = gHistogramInfos[id];
> +      if (!info.keyed) {
> -      continue;
> +        continue;
> -    }
> +      }
>  
> -    // TODO: This won't get us data from different processes.
> +      if (!CanRecordInProcess(info.record_in_processes, ProcessID(process)) ||

What is the `CanRecordInProcess()` check needed for?
Shouldn't we just include all histograms in the snapshot that contain data?
(i.e. all that return non-null from `internal_GetKeyedHistogramById()`?)

::: toolkit/components/telemetry/TelemetrySession.jsm:933
(Diff revision 1)
>      let registered =
>        Telemetry.registeredKeyedHistograms(this.getDatasetType(), []);

Having a quick glance at DXR, i wonder if we still need `registeredHistograms()` & `registeredKeyedHistograms()` now.
Worth filing a follow-up?

::: toolkit/components/telemetry/TelemetrySession.jsm:940
(Diff revision 1)
> -    for (let id of registered) {
> -      for (let suffix of Object.values(HISTOGRAM_SUFFIXES)) {
> +    let khs = subsession ? Telemetry.snapshotSubsessionKeyedHistograms(clearSubsession)
> +                         : Telemetry.keyedHistogramSnapshots;

This is so much better now...

::: toolkit/components/telemetry/TelemetrySession.jsm:944
(Diff revision 1)
> -                                     : keyed.subsessionSnapshot();
> -        } else {
> -          snapshot = keyed.snapshot();
> -        }
>  
> -        let keys = Object.keys(snapshot);
> +    Object.entries(khs).forEach(([process, histograms]) => {

`for ([...] of Object.entries(khs))` ?

::: toolkit/components/telemetry/TelemetrySession.jsm:947
(Diff revision 1)
> -        }
>  
> -        let keys = Object.keys(snapshot);
> +    Object.entries(khs).forEach(([process, histograms]) => {
> +      ret[process] = {};
> +      Object.entries(histograms)
> +            .filter(([name, value]) => registered.includes(name))

Is this still needed?

::: toolkit/components/telemetry/TelemetrySession.jsm:948
(Diff revision 1)
>  
> -        let keys = Object.keys(snapshot);
> +    Object.entries(khs).forEach(([process, histograms]) => {
> +      ret[process] = {};
> +      Object.entries(histograms)
> +            .filter(([name, value]) => registered.includes(name))
> +            .forEach(([name, value]) => {

`for ([...] of ...)` ?

::: toolkit/components/telemetry/TelemetrySession.jsm:955
(Diff revision 1)
> +              Object.entries(value)
> +                    .forEach(([key, hgram]) => ret[process][name][key] = this.packHistogram(hgram));

`for ([...] of ...)` ?
Attachment #8880946 - Flags: review?(gfritzsche)
Comment on attachment 8880947 [details]
bug 1366294 - Part 8 - Include a message on telemetry testfails.

https://reviewboard.mozilla.org/r/152318/#review157616

::: toolkit/components/telemetry/tests/unit/test_TelemetrySession.js:787
(Diff revision 1)
>      for (let id of Object.keys(classic)) {
>        if (!registeredIds.has(id)) {
>          continue;
>        }
>  
> -      Assert.ok(id in subsession);
> +      Assert.ok(id in subsession, message + ' ' + id);

Nit: Maybe add this as " ($id)" to the message?
Attachment #8880947 - Flags: review?(gfritzsche) → review+
Blocks: 1376600
Blocks: 1376605
(In reply to Georg Fritzsche [:gfritzsche] from comment #48)
> Comment on attachment 8880943 [details]
> bug 1366294 - Part 4 - Small cleanups to previous patches.
> 
> https://reviewboard.mozilla.org/r/152310/#review157598
> 
> ::: toolkit/components/telemetry/TelemetryHistogram.cpp:524
> (Diff revision 1)
> >    // Check if we are allowed to record the data.
> >    bool canRecordDataset = CanRecordDataset(gHistogramInfos[id].dataset,
> >                                             internal_CanRecordBase(),
> >                                             internal_CanRecordExtended());
> > -  if (!canRecordDataset || !internal_IsRecordingEnabled(id)) {
> > +  if (!canRecordDataset ||
> > +    (aProcessType == ProcessID::Parent && !internal_IsRecordingEnabled(id))) {
> 
> What is the check against `ProcessID::Parent` for?
> Can we add a comment for this?

If we skip it then we fail to record data in child process histograms if they aren't allowed to be recorded in the Parent. So for anything without all or main in record_in_processes will never record any data.

This didn't come up before because we checked the histogram with the suffix if it was allowed to be recorded (which always returned true for non-parent hgrams when asked on the parent process). The new pattern is to check the id which maps to _all_ of the histograms now.


(In reply to Georg Fritzsche [:gfritzsche] from comment #49)
> Comment on attachment 8880944 [details]
> bug 1366294 - Part 5 - Send appropriately-structured Telemetry data to JS
> 
> https://reviewboard.mozilla.org/r/152312/#review157600
> 
> Great to see this snapshot handling cleaned up!
> 
> ::: toolkit/components/telemetry/TelemetryHistogram.cpp:1945
> (Diff revision 1)
> > +    if (!JS_DefineProperty(cx, root_obj, GetNameForProcessID(ProcessID(process)),
> > +                           processObject, JSPROP_ENUMERATE)) {
> 
> Should we add properties for processes that we haven't recorded any data for?

I don't see the harm in it.
 
> ::: toolkit/components/telemetry/TelemetrySession.jsm:932
> (Diff revision 1)
> >      let ret = {};
> >  
> > -    for (let name of registered) {
> > -      for (let suffix of Object.values(HISTOGRAM_SUFFIXES)) {
> > -        if (name + suffix in hls) {
> > -          if (!(suffix in ret)) {
> > +    Object.entries(hls).forEach(([process, histograms]) => {
> > +      ret[process] = {};
> > +      Object.entries(histograms)
> > +            .filter(([name, value]) => registered.includes(name))
> 
> Do we need this?
> We don't record any addon histograms anymore, removed (all?) the internal
> histogram.cc additions and stopped registering odd histogram names for
> different internal storage uses.

"registered" histograms are any non-expired histograms in gHistogramInfos. The code just above this also has provision for filtering out TELEMETRY_TEST*... but if that's all we're removing these days then we might be done with this.

I've filed bug 1376600 to track.

> ::: toolkit/components/telemetry/TelemetrySession.jsm:1302
> (Diff revision 1)
> >      let histograms = protect(() => this.getHistograms(isSubsession, clearSubsession), {});
> >      let keyedHistograms = protect(() => this.getKeyedHistograms(isSubsession, clearSubsession), {});
> >      let scalars = protect(() => this.getScalars(isSubsession, clearSubsession), {});
> >      let keyedScalars = protect(() => this.getScalars(isSubsession, clearSubsession, true), {});
> >      let events = protect(() => this.getEvents(isSubsession, clearSubsession))
> >  
> > -    payloadObj.histograms = histograms[HISTOGRAM_SUFFIXES.PARENT] || {};
> > -    payloadObj.keyedHistograms = keyedHistograms[HISTOGRAM_SUFFIXES.PARENT] || {};
> > +    payloadObj.histograms = histograms.parent || {};
> > +    payloadObj.keyedHistograms = keyedHistograms.parent || {};
> >      payloadObj.processes = {
> 
> This whole payload building is becoming very repetitive, only the "parent"
> handling sticks out a bit.
> We should file a follow-up to make this more generic.

bug 1376605
Comment on attachment 8880943 [details]
bug 1366294 - Part 4 - Small cleanups to previous patches.

https://reviewboard.mozilla.org/r/152310/#review159886

::: toolkit/components/telemetry/TelemetryHistogram.cpp:524
(Diff revision 2)
>    // Check if we are allowed to record the data.
>    bool canRecordDataset = CanRecordDataset(gHistogramInfos[id].dataset,
>                                             internal_CanRecordBase(),
>                                             internal_CanRecordExtended());
> -  if (!canRecordDataset || !internal_IsRecordingEnabled(id)) {
> +  if (!canRecordDataset ||
> +    (aProcessType == ProcessID::Parent && !internal_IsRecordingEnabled(id))) {

Please add a comment explaining the parent check.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:693
(Diff revision 2)
>  {
>    bool canRecordDataset = CanRecordDataset(mHistogramInfo.dataset,
>                                             internal_CanRecordBase(),
>                                             internal_CanRecordExtended());
> -  if (!canRecordDataset || !internal_IsRecordingEnabled(mId)) {
> +  if (!canRecordDataset ||
> +    (aProcessType == ProcessID::Parent && !internal_IsRecordingEnabled(mId))) {

Please add a comment explaining the parent check.
Attachment #8880943 - Flags: review?(gfritzsche) → review+
Comment on attachment 8880944 [details]
bug 1366294 - Part 5 - Send appropriately-structured Telemetry data to JS

https://reviewboard.mozilla.org/r/152312/#review159888

::: toolkit/components/telemetry/TelemetrySession.jsm:928
(Diff revision 2)
>  
>      let hls = subsession ? Telemetry.snapshotSubsessionHistograms(clearSubsession)
>                           : Telemetry.histogramSnapshots;
>      let ret = {};
>  
> -    for (let name of registered) {
> +    Object.entries(hls).forEach(([process, histograms]) => {

`for (let [process, histograms] of ...)`
Attachment #8880944 - Flags: review?(gfritzsche) → review+
Comment on attachment 8883317 [details]
bug 1366294 - Part 9 - Fix Windows Build.

https://reviewboard.mozilla.org/r/154212/#review159890
Attachment #8883317 - Flags: review?(gfritzsche) → review+
Comment on attachment 8880946 [details]
bug 1366294 - Part 7 - Use keyed histogram snaphots to build payloads.

https://reviewboard.mozilla.org/r/152316/#review159892

::: toolkit/components/telemetry/TelemetryHistogram.cpp:2049
(Diff revision 2)
> +      const HistogramInfo& info = gHistogramInfos[id];
> +      if (!info.keyed) {
> -      continue;
> +        continue;
> -    }
> +      }
>  
> -    // TODO: This won't get us data from different processes.
> +      if (!CanRecordInProcess(info.record_in_processes, ProcessID(process)) ||

Still open question:
What is the `CanRecordInProcess()` check needed for?
Shouldn't we just include all histograms in the snapshot that contain data?
(i.e. all that return non-null from `internal_GetKeyedHistogramById()`?)

::: toolkit/components/telemetry/TelemetrySession.jsm:943
(Diff revision 2)
> -                                     : keyed.subsessionSnapshot();
> -        } else {
> -          snapshot = keyed.snapshot();
> -        }
>  
> -        let keys = Object.keys(snapshot);
> +    Object.entries(khs).forEach(([process, histograms]) => {

`for ([process, histograms] of ...)`

::: toolkit/components/telemetry/TelemetrySession.jsm:954
(Diff revision 2)
> +              Object.entries(value)
> +                    .forEach(([key, hgram]) => ret[process][name][key] = this.packHistogram(hgram));

`for ([key, hgram] of ...)`
Attachment #8880946 - Flags: review?(gfritzsche)
Comment on attachment 8883318 [details]
bug 1366294 - Part 10 - Nail down count histogram semantics.

https://reviewboard.mozilla.org/r/154214/#review159900

::: toolkit/components/telemetry/TelemetryHistogram.cpp:956
(Diff revision 1)
> -    // Nothing to do here.
> -    return;
> -  }
> -
>    // Now reset the histograms instances for all processes.
> -  for (uint32_t sessionIdx = 0; sessionIdx < sessionTypes.Length(); ++sessionIdx) {
> +  for (SessionType sessionType : sessionTypes) {

Oh, that works with nsTArray? Nice!

::: toolkit/components/telemetry/TelemetryHistogram.cpp:1939
(Diff revision 1)
> -      Histogram* h = internal_GetHistogramById(id, ProcessID(process), sessionType, /* instantiate = */ false);
> +      bool shouldInstantiate =
> +        info.histogramType == nsITelemetry::HISTOGRAM_FLAG;

This is much better than the explicit instantiation loop.

::: toolkit/components/telemetry/tests/unit/test_TelemetrySession.js:897
(Diff revision 1)
>    // histograms got reset.
>    classic = TelemetrySession.getPayload();
>    subsession = TelemetrySession.getPayload("environment-change");
>  
>    Assert.ok(COUNT_ID in classic.histograms);
> -  Assert.ok(COUNT_ID in subsession.histograms);
> +  Assert.ok(!(COUNT_ID in subsession.histograms));

Do we have comparable coverage for flag histograms somewhere?
If not, we should add it.
Attachment #8883318 - Flags: review?(gfritzsche) → review+
Comment on attachment 8880947 [details]
bug 1366294 - Part 8 - Include a message on telemetry testfails.

https://reviewboard.mozilla.org/r/152318/#review157616

> Nit: Maybe add this as " ($id)" to the message?

My understanding of template strings suggest that the embedded expressions are evaluated at the point of declaration, not use. So I'd have to use a tagging function within the closure that knows id, which wouldn't make sense.

...or are you just suggesting to put parens around the id? I can do that.
Comment on attachment 8880946 [details]
bug 1366294 - Part 7 - Use keyed histogram snaphots to build payloads.

https://reviewboard.mozilla.org/r/152316/#review157608

> What is the `CanRecordInProcess()` check needed for?
> Shouldn't we just include all histograms in the snapshot that contain data?
> (i.e. all that return non-null from `internal_GetKeyedHistogramById()`?)

It echose CreateHistogramSnapshots, above.
Comment on attachment 8883318 [details]
bug 1366294 - Part 10 - Nail down count histogram semantics.

https://reviewboard.mozilla.org/r/154214/#review159900

> Do we have comparable coverage for flag histograms somewhere?
> If not, we should add it.

Yup, TELEMETRY_TEST_FLAG is tested to always be present, to have the correct default, and to have good child-process semantics.
Comment on attachment 8880947 [details]
bug 1366294 - Part 8 - Include a message on telemetry testfails.

https://reviewboard.mozilla.org/r/152318/#review157616

> My understanding of template strings suggest that the embedded expressions are evaluated at the point of declaration, not use. So I'd have to use a tagging function within the closure that knows id, which wouldn't make sense.
> 
> ...or are you just suggesting to put parens around the id? I can do that.

I was mostly suggesting parentheses here ("Some message HISTOGRAM" is a bit odd to read).
Or this could use, say, "Some message: HISTOGRAM".
Comment on attachment 8880946 [details]
bug 1366294 - Part 7 - Use keyed histogram snaphots to build payloads.

https://reviewboard.mozilla.org/r/152316/#review160302

::: toolkit/components/telemetry/TelemetryHistogram.cpp:524
(Diff revisions 2 - 3)
> +  // If `histogram` is a non-parent-process histogram, then recording-enabled
> +  // has been checked in its owner process.

How does that work with `SetHistogramRecordingEnabled(FOO, false)`?

If i disable recording for a histogram in the parent, will we still record child process data for it?
Attachment #8880946 - Flags: review?(gfritzsche)
Comment on attachment 8880946 [details]
bug 1366294 - Part 7 - Use keyed histogram snaphots to build payloads.

https://reviewboard.mozilla.org/r/152316/#review160302

> How does that work with `SetHistogramRecordingEnabled(FOO, false)`?
> 
> If i disable recording for a histogram in the parent, will we still record child process data for it?

Yup. That's exactly how it works.
Comment on attachment 8880946 [details]
bug 1366294 - Part 7 - Use keyed histogram snaphots to build payloads.

https://reviewboard.mozilla.org/r/152316/#review160302

> Yup. That's exactly how it works.

Ok, talking through this, this is limited to usage for gfx scroll frame histograms, all of which are only recorded in the parent.
So this doesn't change any behavior.
Lets call this behavior (`SetHistogramRecordingEnabled()` only affects the current process) out in Telemetry.h though.
Comment on attachment 8880946 [details]
bug 1366294 - Part 7 - Use keyed histogram snaphots to build payloads.

https://reviewboard.mozilla.org/r/152316/#review160308
Attachment #8880946 - Flags: review+
Comment on attachment 8884323 [details]
bug 1366294 - Part 11 - Fix OSX Build.

https://reviewboard.mozilla.org/r/155252/#review160312
Attachment #8884323 - Flags: review?(gfritzsche) → review+
Current status: Tracking down a leak of 8 nsStringBuffers

According to the field of orange on try[1] and my local testing, we're leaking nsStringBuffers.
STR:
1) Build debug
2) ./mach test devtools/client/canvasdebugger/test/browser_profiling-webgl.js

I have tracked it down (through selective revert) to having started with one or all of the first three patches (they're indivisible).

I've tried to enable refcount tracing per [2], but have run into an assertfail [3] which basically amounts to "Someone memmoved a refptr? WTF?".

This happens when trying to start the browser with refcnt tracing enabled regardless of the presence or absence of any patches from this bug. (commandline: MOZ_DISABLE_CONTENT_SANDBOX=1 XPCOM_MEM_REFCNT_LOG=`pwd`/xpcom_mem_refcnt_log XPCOM_MEM_LOG_CLASSES=nsStringBuffer XPCOM_MEM_LOG_OBJECTS=1240-1260 ./mach run )

This is all on my Linux x64 box, fwiw.

[1]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=452a381d7a43
[2]: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Refcount_tracing_and_balancing
[3]: http://searchfox.org/mozilla-central/source/xpcom/base/nsTraceRefcnt.cpp#1039
At least I _think_ this means we're leaking 8 nsStringBuffers. I could be reading this wrong:

== BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, default process 9671

     |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
     |                                      | Per-Inst   Leaked|   Total      Rem|
   0 |TOTAL                                 |       31     4584| 2332262      573|
1254 |nsStringBuffer                        |        8     4584|  153114      573|

According to https://developer.mozilla.org/en-US/docs/Mozilla/Performance/BloatView this might actually be 573 objects. Oops.
Yes, you are leaking 573 of them, and they are 8 bytes each. I can take a look into the failures you are seeing.
Depends on: 1350729
It looks like bug 1350729 was already filed on the refcount logging problem.
Comment on attachment 8886670 [details]
bug 1366294 - Part 12 - Clean up after onesself.

https://reviewboard.mozilla.org/r/157456/#review162844

::: toolkit/components/telemetry/TelemetryHistogram.cpp:1634
(Diff revision 1)
>    StaticMutexAutoLock locker(gTelemetryHistogramMutex);
>    gCanRecordBase = false;
>    gCanRecordExtended = false;
>    gNameToHistogramIDMap.Clear();
>    gInitDone = false;
> +  for (size_t i = 0; i < HistogramCount; ++i) {

Readability nit: Add empty line before, add a comment about storage cleanup.
Attachment #8886670 - Flags: review?(gfritzsche) → review+
The latest try run[1] shows a few failures in telemetry-related tests on e10s only. Specifically:

browser/components/sessionstore/test/browser_sessionStorage_size.js (it tries appending "#content" to get the content amount. That's not gonna work anymore)
dom/ipc/tests/browser_remote_navigation_delay_telemetry.js (ditto. I didn't realize so many tests were using this backdoor)
dom/base/test/browser_use_counters.js (triple ditto)
toolkit/components/osfile/tests/xpcshell/test_telemetry.js (uses histogramSnapshots which now returns a process-aware structure.)
toolkit/components/terminator/tests/xpcshell/test_terminator_reload.js (ditto histogramSnapshots)

A quick searchfox suggests that these are exactly the tests that depend on this sort of "inside baseball" knowledge of Telemetry client implementation. So fixing these ought to be the last of it.

[1]: https://treeherder.mozilla.org/#/jobs?repo=try&revision=784369a4305338d8b4feaad5a1e966415c10a9f6&selectedJob=114465394
Blocks: 1381470
Attachment #8880941 - Attachment is obsolete: true
Attachment #8880942 - Attachment is obsolete: true
Attachment #8880943 - Attachment is obsolete: true
Attachment #8880944 - Attachment is obsolete: true
Attachment #8880945 - Attachment is obsolete: true
Attachment #8880946 - Attachment is obsolete: true
Attachment #8880947 - Attachment is obsolete: true
Attachment #8883317 - Attachment is obsolete: true
Attachment #8883318 - Attachment is obsolete: true
Attachment #8884323 - Attachment is obsolete: true
Comment on attachment 8880940 [details]
bug 1366294 - Part 13 - Fix telemetry-using tests.

https://reviewboard.mozilla.org/r/152304/#review162942
Attachment #8880940 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8887512 [details]
Bug 1366294 - Part 1 - Remove base::StatisticsRecorder.

https://reviewboard.mozilla.org/r/158356/#review163616
Attachment #8887512 - Flags: review?(chutten) → review+
Comment on attachment 8887513 [details]
Bug 1366294 - Part 2 - Cleanup Chromium Histogram code.

https://reviewboard.mozilla.org/r/158358/#review163620
Attachment #8887513 - Flags: review?(chutten) → review+
Comment on attachment 8887514 [details]
Bug 1366294 - Part 3 - Refactor TelemetryHistogram storage.

https://reviewboard.mozilla.org/r/158360/#review163622
Attachment #8887514 - Flags: review?(chutten) → review+
Comment on attachment 8887515 [details]
bug 1366294 - Part 4 - Small cleanups to previous patches.

https://reviewboard.mozilla.org/r/158362/#review163634
Attachment #8887515 - Flags: review?(gfritzsche) → review+
Comment on attachment 8887516 [details]
bug 1366294 - Part 5 - Send appropriately-structured Telemetry data to JS

https://reviewboard.mozilla.org/r/158364/#review163636
Attachment #8887516 - Flags: review?(gfritzsche) → review+
Comment on attachment 8887517 [details]
bug 1366294 - Part 6 - Update test_TelemetryHistograms for new snapshots format

https://reviewboard.mozilla.org/r/158366/#review163638
Attachment #8887517 - Flags: review?(gfritzsche) → review+
Comment on attachment 8887518 [details]
bug 1366294 - Part 7 - Use keyed histogram snaphots to build payloads.

https://reviewboard.mozilla.org/r/158368/#review163640
Attachment #8887518 - Flags: review?(gfritzsche) → review+
Comment on attachment 8887519 [details]
bug 1366294 - Part 8 - Include a message on telemetry testfails.

https://reviewboard.mozilla.org/r/158370/#review163642
Attachment #8887519 - Flags: review?(gfritzsche) → review+
Comment on attachment 8887520 [details]
bug 1366294 - Part 9 - Fix Windows Build.

https://reviewboard.mozilla.org/r/158372/#review163644
Attachment #8887520 - Flags: review?(gfritzsche) → review+
Comment on attachment 8887521 [details]
bug 1366294 - Part 10 - Nail down count histogram semantics.

https://reviewboard.mozilla.org/r/158374/#review163646
Attachment #8887521 - Flags: review?(gfritzsche) → review+
Comment on attachment 8887522 [details]
bug 1366294 - Part 11 - Fix OSX Build.

https://reviewboard.mozilla.org/r/158376/#review163648
Attachment #8887522 - Flags: review?(gfritzsche) → review+
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/57400dd449ad
Part 1 - Remove base::StatisticsRecorder. r=chutten
https://hg.mozilla.org/integration/mozilla-inbound/rev/96b594816a21
Part 2 - Cleanup Chromium Histogram code. r=chutten
https://hg.mozilla.org/integration/mozilla-inbound/rev/175c3ccb0015
Part 3 - Refactor TelemetryHistogram storage. r=chutten
https://hg.mozilla.org/integration/mozilla-inbound/rev/1cc736607aab
Part 4 - Small cleanups to previous patches. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/e7ce6a06c131
Part 5 - Send appropriately-structured Telemetry data to JS r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/476daf9a5846
Part 6 - Update test_TelemetryHistograms for new snapshots format r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/c387459a798b
Part 7 - Use keyed histogram snaphots to build payloads. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/72635bc1ba25
Part 8 - Include a message on telemetry testfails. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/beb5f00c4b19
Part 9 - Fix Windows Build. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/14f9f9521d4f
Part 10 - Nail down count histogram semantics. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/aee317b1445d
Part 11 - Fix OSX Build. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/b722d638f6ed
Part 12 - Clean up after onesself. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/f2d79c13799a
Part 13 - Fix telemetry-using tests. r=Dexter
I now have a working asan build that should help me nail these down.

It looks as though KeyedHistogram needs some manual memory management as well, beyond what I already introduced in Part 11.
Flags: needinfo?(chutten)
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e3825e35e2e2
Part 1 - Remove base::StatisticsRecorder. r=chutten
https://hg.mozilla.org/integration/mozilla-inbound/rev/df6adc4c95f2
Part 2 - Cleanup Chromium Histogram code. r=chutten
https://hg.mozilla.org/integration/mozilla-inbound/rev/ed0027bb1187
Part 3 - Refactor TelemetryHistogram storage. r=chutten
https://hg.mozilla.org/integration/mozilla-inbound/rev/8c0cbcd04cc4
Part 4 - Small cleanups to previous patches. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/f80a1985bca7
Part 5 - Send appropriately-structured Telemetry data to JS r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/fb911884876f
Part 6 - Update test_TelemetryHistograms for new snapshots format r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/ccf043ec6160
Part 7 - Use keyed histogram snaphots to build payloads. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/43fb00c0f96d
Part 8 - Include a message on telemetry testfails. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/0ed44ecf9062
Part 9 - Fix Windows Build. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/09043ee913d4
Part 10 - Nail down count histogram semantics. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/140948f5b955
Part 11 - Fix OSX Build. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/fb40775d1451
Part 12 - Clean up after onesself. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/e7a3ac603f7c
Part 13 - Fix telemetry-using tests. r=Dexter
Backed out for crashing various tests with [@ TelemetryHistogram::DeInitializeGlobalState()], e.g. in bc's browser_permission_dismiss.js:

https://hg.mozilla.org/integration/mozilla-inbound/rev/4d75ebaf1ba3bf009f5e5972c2cf381bfe38f255
https://hg.mozilla.org/integration/mozilla-inbound/rev/caad54a2c451b62f8b14d4074359994b2b753139
https://hg.mozilla.org/integration/mozilla-inbound/rev/76372cf9a757ceebff471c698c8e352fc32e6b40
https://hg.mozilla.org/integration/mozilla-inbound/rev/542a7e1b45c6a92c89d55dd6635a82233e985adf
https://hg.mozilla.org/integration/mozilla-inbound/rev/0941c910f0535bba633821fa8f19d5b00242085d
https://hg.mozilla.org/integration/mozilla-inbound/rev/b79682d0b9357d73f7dfb6016623ca4b5709d480
https://hg.mozilla.org/integration/mozilla-inbound/rev/1f566354cea1c40981e78db8986e52d31e22c876
https://hg.mozilla.org/integration/mozilla-inbound/rev/dbb8be04d8ce79bc43dae791a46ed81ebcda016c
https://hg.mozilla.org/integration/mozilla-inbound/rev/6da4ecf79a4b61cc0e10457b59541ac8fb26a39f
https://hg.mozilla.org/integration/mozilla-inbound/rev/9f869d8dc3e36c3e8376e9651d3304e95809814f
https://hg.mozilla.org/integration/mozilla-inbound/rev/724d5d0cef066e4716af07ba8f60ac65703c33cc
https://hg.mozilla.org/integration/mozilla-inbound/rev/e87d51f223d9bf9ff03a6dfcef7e949945c1a098
https://hg.mozilla.org/integration/mozilla-inbound/rev/2c1ee57abd493421609a89e40aeb44702859ecc1

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=e7a3ac603f7c6389c60eb70e6a595b2e24858b7f&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=runnable&filter-resultStatus=usercancel
Failure log example: https://treeherder.mozilla.org/logviewer.html#?job_id=116369582&repo=mozilla-inbound

[task 2017-07-21T13:13:01.518840Z] 13:13:01     INFO - TEST-START | Shutdown
[task 2017-07-21T13:13:01.519126Z] 13:13:01     INFO - Browser Chrome Test Summary
[task 2017-07-21T13:13:01.519342Z] 13:13:01     INFO - Passed:  312
[task 2017-07-21T13:13:01.519565Z] 13:13:01     INFO - Failed:  0
[task 2017-07-21T13:13:01.520879Z] 13:13:01     INFO - Todo:    0
[task 2017-07-21T13:13:01.524665Z] 13:13:01     INFO - Mode:    non-e10s
[task 2017-07-21T13:13:01.528113Z] 13:13:01     INFO - *** End BrowserChrome Test Results ***
[task 2017-07-21T13:13:02.316446Z] 13:13:02     INFO - GECKO(1593) | *** UTM:SVC TimerManager:registerTimer called after profile-before-change notification. Ignoring timer registration for id: telemetry_modules_ping
[task 2017-07-21T13:13:02.903959Z] 13:13:02     INFO - GECKO(1593) | ExceptionHandler::GenerateDump cloned child 1762
[task 2017-07-21T13:13:02.907332Z] 13:13:02     INFO - GECKO(1593) | ExceptionHandler::SendContinueSignalToChild sent continue signal to child
[task 2017-07-21T13:13:02.909417Z] 13:13:02     INFO - GECKO(1593) | ExceptionHandler::WaitForContinueSignal waiting for continue signal...
[task 2017-07-21T13:13:03.210350Z] 13:13:03     INFO - TEST-INFO | Main app process: exit 11
[task 2017-07-21T13:13:03.210729Z] 13:13:03     INFO - Buffered messages finished
[task 2017-07-21T13:13:03.216292Z] 13:13:03    ERROR - TEST-UNEXPECTED-FAIL | browser/base/content/test/webextensions/browser_update_interactive_noprompt.js | application terminated with exit code 11
[task 2017-07-21T13:13:03.219312Z] 13:13:03     INFO - runtests.py | Application ran for: 0:01:02.347616
[task 2017-07-21T13:13:03.224977Z] 13:13:03     INFO - zombiecheck | Reading PID log: /tmp/tmplgf_C6pidlog
[task 2017-07-21T13:13:03.227263Z] 13:13:03     INFO - ==> process 1593 launched child process 1612
[task 2017-07-21T13:13:03.229222Z] 13:13:03     INFO - zombiecheck | Checking for orphan process with PID: 1612
[task 2017-07-21T13:13:03.233707Z] 13:13:03     INFO - mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/dk_NrXRkQH-HIkZ3fkY3Vg/artifacts/public/build/target.crashreporter-symbols.zip
[task 2017-07-21T13:13:16.698407Z] 13:13:16     INFO - mozcrash Copy/paste: /usr/local/bin/linux64-minidump_stackwalk /tmp/tmpJdoaB0.mozrunner/minidumps/2eee755c-06eb-edbe-b9db-757401a84d05.dmp /tmp/tmpSFY1gk
[task 2017-07-21T13:13:33.611602Z] 13:13:33     INFO - mozcrash Saved minidump as /home/worker/workspace/build/blobber_upload_dir/2eee755c-06eb-edbe-b9db-757401a84d05.dmp
[task 2017-07-21T13:13:33.614105Z] 13:13:33     INFO - mozcrash Saved app info as /home/worker/workspace/build/blobber_upload_dir/2eee755c-06eb-edbe-b9db-757401a84d05.extra
[task 2017-07-21T13:13:33.750534Z] 13:13:33     INFO - PROCESS-CRASH | browser/base/content/test/webextensions/browser_update_interactive_noprompt.js | application crashed [@ TelemetryHistogram::DeInitializeGlobalState]
[task 2017-07-21T13:13:33.753377Z] 13:13:33     INFO - Crash dump filename: /tmp/tmpJdoaB0.mozrunner/minidumps/2eee755c-06eb-edbe-b9db-757401a84d05.dmp
[task 2017-07-21T13:13:33.756128Z] 13:13:33     INFO - Operating system: Linux
[task 2017-07-21T13:13:33.757909Z] 13:13:33     INFO -                   0.0.0 Linux 3.13.0-100-generic #147-Ubuntu SMP Tue Oct 18 16:48:51 UTC 2016 x86_64
[task 2017-07-21T13:13:33.759499Z] 13:13:33     INFO - CPU: amd64
[task 2017-07-21T13:13:33.761228Z] 13:13:33     INFO -      family 6 model 45 stepping 7
[task 2017-07-21T13:13:33.764226Z] 13:13:33     INFO -      1 CPU
[task 2017-07-21T13:13:33.765871Z] 13:13:33     INFO - 
[task 2017-07-21T13:13:33.767496Z] 13:13:33     INFO - GPU: UNKNOWN
[task 2017-07-21T13:13:33.769354Z] 13:13:33     INFO - 
[task 2017-07-21T13:13:33.772236Z] 13:13:33     INFO - Crash reason:  SIGSEGV
[task 2017-07-21T13:13:33.773909Z] 13:13:33     INFO - Crash address: 0x0
[task 2017-07-21T13:13:33.776154Z] 13:13:33     INFO - Process uptime: not available
[task 2017-07-21T13:13:33.777792Z] 13:13:33     INFO - 
[task 2017-07-21T13:13:33.779448Z] 13:13:33     INFO - Thread 0 (crashed)
[task 2017-07-21T13:13:33.783030Z] 13:13:33     INFO -  0  libxul.so!TelemetryHistogram::DeInitializeGlobalState [TelemetryHistogram.cpp:e7a3ac603f7c : 649 + 0x10]
[task 2017-07-21T13:13:33.784802Z] 13:13:33     INFO -     rax = 0xe5e5e5e5e5e5e5e5   rdx = 0x0000000000000020
[task 2017-07-21T13:13:33.786482Z] 13:13:33     INFO -     rcx = 0x00007f4958ed5e00   rbx = 0x00007ffc52d76a80
[task 2017-07-21T13:13:33.788134Z] 13:13:33     INFO -     rsi = 0x00007f49592d4640   rdi = 0x00007f4979b5a200
[task 2017-07-21T13:13:33.790098Z] 13:13:33     INFO -     rbp = 0x00007ffc52d76ae0   rsp = 0x00007ffc52d76a60
[task 2017-07-21T13:13:33.792268Z] 13:13:33     INFO -      r8 = 0x0000000000000000    r9 = 0x00007f49957003e0
[task 2017-07-21T13:13:33.794026Z] 13:13:33     INFO -     r10 = 0x00007f4952a00ad0   r11 = 0x00007f49957003f0
[task 2017-07-21T13:13:33.796279Z] 13:13:33     INFO -     r12 = 0x00007f49592d4640   r13 = 0x00007f498b0850f0
[task 2017-07-21T13:13:33.798027Z] 13:13:33     INFO -     r14 = 0x00007f498b099ce0   r15 = 0x0000000000000001
[task 2017-07-21T13:13:33.800244Z] 13:13:33     INFO -     rip = 0x00007f4988994540
[task 2017-07-21T13:13:33.801932Z] 13:13:33     INFO -     Found by: given as instruction pointer in context
[task 2017-07-21T13:13:33.804351Z] 13:13:33     INFO -  1  libxul.so!TelemetryImpl::ShutdownTelemetry [Telemetry.cpp:e7a3ac603f7c : 1577 + 0x5]
[task 2017-07-21T13:13:33.808200Z] 13:13:33     INFO -     rbx = 0x00007f49956bed40   rbp = 0x00007ffc52d76b00
[task 2017-07-21T13:13:33.810261Z] 13:13:33     INFO -     rsp = 0x00007ffc52d76af0   r12 = 0x00007f49956bed70
[task 2017-07-21T13:13:33.812022Z] 13:13:33     INFO -     r13 = 0x00007f49956b3378   r14 = 0x00007f498308f1a8
[task 2017-07-21T13:13:33.813715Z] 13:13:33     INFO -     r15 = 0x0000000000000045   rip = 0x00007f498898bfd6
[task 2017-07-21T13:13:33.815503Z] 13:13:33     INFO -     Found by: call frame info
[task 2017-07-21T13:13:33.817286Z] 13:13:33     INFO -  2  libxul.so!nsTArray_Impl<nsAutoPtr<nsComponentManagerImpl::KnownModule>, nsTArrayInfallibleAllocator>::RemoveElementsAt [nsComponentManager.h:e7a3ac603f7c : 229 + 0x2]
[task 2017-07-21T13:13:33.820282Z] 13:13:33     INFO -     rbx = 0x00007f49830ce4c0   rbp = 0x00007ffc52d76b50
[task 2017-07-21T13:13:33.821921Z] 13:13:33     INFO -     rsp = 0x00007ffc52d76b10   r12 = 0x00007f498308f230
[task 2017-07-21T13:13:33.823601Z] 13:13:33     INFO -     r13 = 0x00007f49956b3378   r14 = 0x00007f498308f1a8
[task 2017-07-21T13:13:33.826936Z] 13:13:33     INFO -     r15 = 0x0000000000000045   rip = 0x00007f49857845ac
[task 2017-07-21T13:13:33.828653Z] 13:13:33     INFO -     Found by: call frame info
[task 2017-07-21T13:13:33.830513Z] 13:13:33     INFO -  3  libxul.so!nsComponentManagerImpl::Shutdown [nsTArray.h:e7a3ac603f7c : 1738 + 0x5]
[task 2017-07-21T13:13:33.832213Z] 13:13:33     INFO -     rbx = 0x00007f49956b3290   rbp = 0x00007ffc52d76b70
[task 2017-07-21T13:13:33.834038Z] 13:13:33     INFO -     rsp = 0x00007ffc52d76b60   r12 = 0x00007ffc52d76ba0
[task 2017-07-21T13:13:33.836261Z] 13:13:33     INFO -     r13 = 0x00007ffc52d76b8f   r14 = 0x00007f495152f920
[task 2017-07-21T13:13:33.838114Z] 13:13:33     INFO -     r15 = 0x00007f498574a6b0   rip = 0x00007f4985784e89
[task 2017-07-21T13:13:33.840160Z] 13:13:33     INFO -     Found by: call frame info
[task 2017-07-21T13:13:33.844229Z] 13:13:33     INFO -  4  libxul.so!mozilla::ShutdownXPCOM [XPCOMInit.cpp:e7a3ac603f7c : 1007 + 0x5]
[task 2017-07-21T13:13:33.845899Z] 13:13:33     INFO -     rbx = 0x00007ffc52d76bb0   rbp = 0x00007ffc52d76bf0
[task 2017-07-21T13:13:33.847650Z] 13:13:33     INFO -     rsp = 0x00007ffc52d76b80   r12 = 0x00007ffc52d76ba0
[task 2017-07-21T13:13:33.849322Z] 13:13:33     INFO -     r13 = 0x00007ffc52d76b8f   r14 = 0x00007f495152f920
[task 2017-07-21T13:13:33.851187Z] 13:13:33     INFO -     r15 = 0x00007f498574a6b0   rip = 0x00007f49857b8afd
[task 2017-07-21T13:13:33.852815Z] 13:13:33     INFO -     Found by: call frame info
[task 2017-07-21T13:13:33.854488Z] 13:13:33     INFO -  5  libxul.so!ScopedXPCOMStartup::~ScopedXPCOMStartup [nsAppRunner.cpp:e7a3ac603f7c : 1464 + 0x8]
[task 2017-07-21T13:13:33.856325Z] 13:13:33     INFO -     rbx = 0x00007f49956bd788   rbp = 0x00007ffc52d76c20
[task 2017-07-21T13:13:33.859157Z] 13:13:33     INFO -     rsp = 0x00007ffc52d76c00   r12 = 0x00007ffc52d76cc0
[task 2017-07-21T13:13:33.860825Z] 13:13:33     INFO -     r13 = 0x0000000000000000   r14 = 0x0000000000000000
[task 2017-07-21T13:13:33.862555Z] 13:13:33     INFO -     r15 = 0x00007f499566bd80   rip = 0x00007f4988a0ef82
[task 2017-07-21T13:13:33.867230Z] 13:13:33     INFO -     Found by: call frame info
[task 2017-07-21T13:13:33.868926Z] 13:13:33     INFO -  6  libxul.so!XREMain::XRE_main [UniquePtr.h:e7a3ac603f7c : 528 + 0x8]
[task 2017-07-21T13:13:33.870562Z] 13:13:33     INFO -     rbx = 0x00007f49956bd788   rbp = 0x00007ffc52d76cb0
[task 2017-07-21T13:13:33.872412Z] 13:13:33     INFO -     rsp = 0x00007ffc52d76c30   r12 = 0x00007ffc52d76cc0
[task 2017-07-21T13:13:33.874102Z] 13:13:33     INFO -     r13 = 0x0000000000000000   r14 = 0x0000000000000000
[task 2017-07-21T13:13:33.875813Z] 13:13:33     INFO -     r15 = 0x00007f499566bd80   rip = 0x00007f4988a14fc3
[task 2017-07-21T13:13:33.877397Z] 13:13:33     INFO -     Found by: call frame info
[task 2017-07-21T13:13:33.879053Z] 13:13:33     INFO -  7  libxul.so!XRE_main [nsAppRunner.cpp:e7a3ac603f7c : 4855 + 0x5]
[task 2017-07-21T13:13:33.880865Z] 13:13:33     INFO -     rbx = 0x0000000000000005   rbp = 0x00007ffc52d76e80
[task 2017-07-21T13:13:33.884172Z] 13:13:33     INFO -     rsp = 0x00007ffc52d76cc0   r12 = 0x00007ffc52d76cc0
[task 2017-07-21T13:13:33.885810Z] 13:13:33     INFO -     r13 = 0x00007ffc52d77ff8   r14 = 0x00007ffc52d76ea0
[task 2017-07-21T13:13:33.887458Z] 13:13:33     INFO -     r15 = 0x00007ffc52d76ea0   rip = 0x00007f4988a153a8
[task 2017-07-21T13:13:33.889133Z] 13:13:33     INFO -     Found by: call frame info
[task 2017-07-21T13:13:33.890891Z] 13:13:33     INFO -  8  firefox!do_main [nsBrowserApp.cpp:e7a3ac603f7c : 236 + 0xe]
[task 2017-07-21T13:13:33.894698Z] 13:13:33     INFO -     rbx = 0x0000000000000005   rbp = 0x00007ffc52d77ed0
[task 2017-07-21T13:13:33.896376Z] 13:13:33     INFO -     rsp = 0x00007ffc52d76e90   r12 = 0x00007ffc52d78028
[task 2017-07-21T13:13:33.898691Z] 13:13:33     INFO -     r13 = 0x0000000000000000   r14 = 0x00007ffc52d77ff8
[task 2017-07-21T13:13:33.900360Z] 13:13:33     INFO -     r15 = 0x00007ffc52d76ea0   rip = 0x0000000000406557
[task 2017-07-21T13:13:33.904173Z] 13:13:33     INFO -     Found by: call frame info
[task 2017-07-21T13:13:33.905804Z] 13:13:33     INFO -  9  firefox!main [nsBrowserApp.cpp:e7a3ac603f7c : 309 + 0xe]
[task 2017-07-21T13:13:33.907467Z] 13:13:33     INFO -     rbx = 0x00007ffc52d77ff8   rbp = 0x00007ffc52d77f10
[task 2017-07-21T13:13:33.909158Z] 13:13:33     INFO -     rsp = 0x00007ffc52d77ee0   r12 = 0x0000000000000005
[task 2017-07-21T13:13:33.910851Z] 13:13:33     INFO -     r13 = 0x00007ffc52d78028   r14 = 0x000023116bf2628c
[task 2017-07-21T13:13:33.912649Z] 13:13:33     INFO -     r15 = 0x0000000000000000   rip = 0x0000000000405c34
[task 2017-07-21T13:13:33.916206Z] 13:13:33     INFO -     Found by: call frame info
[task 2017-07-21T13:13:33.917795Z] 13:13:33     INFO - 10  libc-2.23.so + 0x20830
[task 2017-07-21T13:13:33.919396Z] 13:13:33     INFO -     rbx = 0x0000000000000000   rbp = 0x0000000000420e40
[task 2017-07-21T13:13:33.921086Z] 13:13:33     INFO -     rsp = 0x00007ffc52d77f20   r12 = 0x0000000000405f1c
[task 2017-07-21T13:13:33.922678Z] 13:13:33     INFO -     r13 = 0x00007ffc52d77ff0   r14 = 0x0000000000000000
[task 2017-07-21T13:13:33.924445Z] 13:13:33     INFO -     r15 = 0x0000000000000000   rip = 0x00007f499584d830
[task 2017-07-21T13:13:33.928340Z] 13:13:33     INFO -     Found by: call frame info
[task 2017-07-21T13:13:33.930332Z] 13:13:33     INFO - 11  firefox + 0x5bc0
[task 2017-07-21T13:13:33.932020Z] 13:13:33     INFO -     rsp = 0x00007ffc52d77f40   rip = 0x0000000000405bc0
[task 2017-07-21T13:13:33.933979Z] 13:13:33     INFO -     Found by: stack scanning
[task 2017-07-21T13:13:33.936186Z] 13:13:33     INFO - 12  firefox + 0x5f1c
[task 2017-07-21T13:13:33.937896Z] 13:13:33     INFO -     rsp = 0x00007ffc52d77f58   rip = 0x0000000000405f1c
[task 2017-07-21T13:13:33.940252Z] 13:13:33     INFO -     Found by: stack scanning
[task 2017-07-21T13:13:33.941923Z] 13:13:33     INFO - 13  firefox + 0x5bc0
[task 2017-07-21T13:13:33.943669Z] 13:13:33     INFO -     rsp = 0x00007ffc52d77fa8   rip = 0x0000000000405bc0
[task 2017-07-21T13:13:33.948248Z] 13:13:33     INFO -     Found by: stack scanning
[task 2017-07-21T13:13:33.949930Z] 13:13:33     INFO - 14  firefox + 0x20e30
[task 2017-07-21T13:13:33.951608Z] 13:13:33     INFO -     rsp = 0x00007ffc52d77fb0   rip = 0x0000000000420e30
[task 2017-07-21T13:13:33.953233Z] 13:13:33     INFO -     Found by: stack scanning
[task 2017-07-21T13:13:33.955011Z] 13:13:33     INFO - 15  firefox + 0x5f1c
[task 2017-07-21T13:13:33.956681Z] 13:13:33     INFO -     rsp = 0x00007ffc52d77fc8   rip = 0x0000000000405f1c
[task 2017-07-21T13:13:33.958269Z] 13:13:33     INFO -     Found by: stack scanning
[task 2017-07-21T13:13:33.960189Z] 13:13:33     INFO - 16  firefox!_start + 0x29
[task 2017-07-21T13:13:33.962913Z] 13:13:33     INFO -     rsp = 0x00007ffc52d77fe0   rip = 0x0000000000405f45
[task 2017-07-21T13:13:33.964493Z] 13:13:33     INFO -     Found by: stack scanning
Flags: needinfo?(chutten)
Nailing these down one-by-one. Turns out that keyed histograms could be expired, too. Glad it was caught by tests.

Then there was a group of tests that only failed because they cleared histograms after using them.

Then there was the GTests which don't run on a generic ./mach test setup.

I'm hopeful this short run will be better https://treeherder.mozilla.org/#/jobs?repo=try&revision=29c1923d7bd2b1795328cffca041cf9de957c8be

Then I'll give it another push to inbound. Third time's the charm, right?
Flags: needinfo?(chutten)
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/51f2cc05efde
Part 1 - Remove base::StatisticsRecorder. r=chutten
https://hg.mozilla.org/integration/mozilla-inbound/rev/d51abac894b8
Part 2 - Cleanup Chromium Histogram code. r=chutten
https://hg.mozilla.org/integration/mozilla-inbound/rev/9332d032c50d
Part 3 - Refactor TelemetryHistogram storage. r=chutten
https://hg.mozilla.org/integration/mozilla-inbound/rev/b376e9bcdfba
Part 4 - Small cleanups to previous patches. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/2c1ac79386ec
Part 5 - Send appropriately-structured Telemetry data to JS r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/208fd1dccdd8
Part 6 - Update test_TelemetryHistograms for new snapshots format r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/4a83b37bcc22
Part 7 - Use keyed histogram snaphots to build payloads. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/0aabd0830512
Part 8 - Include a message on telemetry testfails. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/517dfa79fb92
Part 9 - Fix Windows Build. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/b8baeb0f3f70
Part 10 - Nail down count histogram semantics. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/da02984a8e1f
Part 11 - Fix OSX Build. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/bd42fa11936e
Part 12 - Clean up after onesself. r=gfritzsche
https://hg.mozilla.org/integration/mozilla-inbound/rev/7b77ed78841f
Part 13 - Fix telemetry-using tests. r=Dexter
Blocks: 1383210
when this landed we see an improvmeent in our AWSY (memory) tests:
== Change summary for alert #8187 (as of July 24 2017 13:52 UTC) ==

Improvements:

  3%  Explicit Memory summary windows7-32 pgo      294,441,177.62 -> 284,339,985.39
  3%  Explicit Memory summary windows7-32 opt      292,889,092.76 -> 283,150,102.48
  3%  Explicit Memory summary windows10-64 pgo     375,935,551.08 -> 365,518,702.32
  3%  Explicit Memory summary windows10-64 opt     374,667,020.61 -> 364,865,660.94

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=8187
(In reply to Joel Maher ( :jmaher) (UTC-8) from comment #145)
> when this landed we see an improvmeent in our AWSY (memory) tests:

This is expected, intentional, and lovely.
Performance Impact: --- → P3
Whiteboard: [qf:p3]
You need to log in before you can comment on or make changes to this bug.