Closed Bug 1373240 Opened 3 years ago Closed 2 years ago

Further refactor TelemetryHistogram storage to be kinder to the binary, child processes

Categories

(Toolkit :: Telemetry, enhancement, P1)

enhancement
Points:
2

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: froydnj, Assigned: chutten)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

internal_GetSubsessionHistogram has:

  static Histogram* subsession[mozilla::Telemetry::HistogramCount] = {};
  static Histogram* subsessionContent[mozilla::Telemetry::HistogramCount] = {};
  static Histogram* subsessionGPU[mozilla::Telemetry::HistogramCount] = {};
  static Histogram* subsessionExtension[mozilla::Telemetry::HistogramCount] = {};

These arrays take up ~50KB of space in the binary.  My initial thought was that any given process should only need one of these arrays, so we're wasting a good bit of space here.  But--being unfamiliar with the code here--it's entirely possible that the *parent* process actually needs all of these and the content/gpu/extension/file/whatever process only needs one?

In any event, could we consider allocating these dynamically, so we could trim down the binary size and not waste space when we don't need it?
Storage is being refactored in bug 1366294, so these are going away.

That being said, there's nothing in the current patchset that prevents a similar sort of allocation. They are indeed parent-only (all child processes dispatch their accumulations via IPC, so all TelemetryHistogram storage is superfluous in those processes).

I'll do my best to keep that in mind while working on the next few patches in the bug.
To keep the scope of bug 1366294 limited, we could just follow up in this bug on trimming down size.
Depends on: 1366294
Priority: -- → P3
Points: --- → 2
Assignee: nobody → chutten
Priority: P3 → P1
Summary: internal_GetSubsessionHistogram's subsession cache arrays are too big → Further refactor TelemetryHistogram storage to be kinder to the binary, child processes
Priority: P1 → P2
Status: NEW → ASSIGNED
Priority: P2 → P1
Comment on attachment 8902389 [details]
bug 1373240 - Use somewhat-more-dynamically-allocated histogram storage

https://reviewboard.mozilla.org/r/173960/#review179190

::: toolkit/components/telemetry/TelemetryHistogram.cpp:229
(Diff revision 1)
> -Histogram* gHistogramStorage[HistogramCount][uint32_t(ProcessID::Count)][uint32_t(SessionType::Count)] = {};
> +unordered_map<HistogramID, unordered_map<ProcessID, unordered_map<SessionType,
> +shared_ptr<Histogram>>>>
> +gHistogramStorage = {};

Maybe a single `unordered_map` keyed by some (HistogramID, ProcessID, SessionType) tuple/struct would be better?  I think it probably winds up with less overhead.

Also, any reason why you're not using `nsClassHashtable` or similar here?

::: toolkit/components/telemetry/TelemetryHistogram.cpp:234
(Diff revision 1)
> -KeyedHistogram* gKeyedHistogramStorage[HistogramCount][uint32_t(ProcessID::Count)] = {};
> +unordered_map<HistogramID, unordered_map<ProcessID,
> +shared_ptr<KeyedHistogram>>>
> +gKeyedHistogramStorage = {};

Same argument here for not nesting `unordered_map` types.
Comment on attachment 8902389 [details]
bug 1373240 - Use somewhat-more-dynamically-allocated histogram storage

https://reviewboard.mozilla.org/r/173960/#review179190

> Maybe a single `unordered_map` keyed by some (HistogramID, ProcessID, SessionType) tuple/struct would be better?  I think it probably winds up with less overhead.
> 
> Also, any reason why you're not using `nsClassHashtable` or similar here?

Ah, but then I'd actually have to think about the hash value beyond just casting to size_t :D

You're right, of course. It would also take care of the Warning C4503 from MSVC I got on try.

As for using nsClassHashtable, there's no reason except for my unfamiliarity with it. Shall I presume https://developer.mozilla.org/en-US/docs/Detailed_XPCOM_hashtable_guide is still a decent guide for messing about with hashtables?
Comment on attachment 8902389 [details]
bug 1373240 - Use somewhat-more-dynamically-allocated histogram storage

https://reviewboard.mozilla.org/r/173960/#review179378

Hey Chris, I think this looks good Telemetry-wise, with the nits below addressed. I'm holding off the r+ because I'd love to take another look at this after the nsClassHashtable change (assuming that's happening). Let me know if that is not happening or froydnj will take care of that part of the review and I'll gladly r+.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:238
(Diff revision 1)
>  // KeyedHistogram keeps track of session & subsession histograms internally.
> -KeyedHistogram* gKeyedHistogramStorage[HistogramCount][uint32_t(ProcessID::Count)] = {};
> +unordered_map<HistogramID, unordered_map<ProcessID,
> +shared_ptr<KeyedHistogram>>>
> +gKeyedHistogramStorage = {};
> +//Histogram* gHistogramStorage[HistogramCount][uint32_t(ProcessID::Count)][uint32_t(SessionType::Count)] = {};
> +//KeyedHistogram* gKeyedHistogramStorage[HistogramCount][uint32_t(ProcessID::Count)] = {};

nit: let's remove these commented code lines

::: toolkit/components/telemetry/TelemetryHistogram.cpp:315
(Diff revision 1)
> +                                    Histogram* aHistogram)
> +{
> +  MOZ_ASSERT(XRE_IsParentProcess());
> +  if (gExpiredHistogram && aHistogram == gExpiredHistogram.get()) {
> +    gHistogramStorage[aHistogramId][aProcessId][aSessionType] =
> +    gExpiredHistogram;

nit: let's indent this with 2 spaces

```
  gKeyedHistogramStorage[aHistogramId][aProcessId] =
    gExpiredHistogram;
```

::: toolkit/components/telemetry/TelemetryHistogram.cpp:319
(Diff revision 1)
> +    gHistogramStorage[aHistogramId][aProcessId][aSessionType] =
> +    gExpiredHistogram;
> +    return;
> +  }
> +  gHistogramStorage[aHistogramId][aProcessId][aSessionType] =
> +  shared_ptr<Histogram>(aHistogram);

nit: shouldn't this be indented with 2 spaces too?

::: toolkit/components/telemetry/TelemetryHistogram.cpp:347
(Diff revision 1)
> +                                         ProcessID aProcessId,
> +                                         KeyedHistogram* aKeyedHistogram)
> +{
> +  MOZ_ASSERT(XRE_IsParentProcess());
> +  gKeyedHistogramStorage[aHistogramId][aProcessId] =
> +  shared_ptr<KeyedHistogram>(aKeyedHistogram);

nit: should this be indented with +2 spaces too?

::: toolkit/components/telemetry/TelemetryHistogram.cpp:1788
(Diff revision 1)
>    gCanRecordBase = false;
>    gCanRecordExtended = false;
>    gNameToHistogramIDMap.Clear();
>    gInitDone = false;
>  
>    // FactoryGet `new`s Histograms for us, but requires us to manually delete.

Is this comment still relevant?
Comment on attachment 8902389 [details]
bug 1373240 - Use somewhat-more-dynamically-allocated histogram storage

https://reviewboard.mozilla.org/r/173960/#review179904

::: toolkit/components/telemetry/TelemetryHistogram.cpp:251
(Diff revision 3)
>  
> +uint64_t internal_StorageKey(HistogramID aHistogramId,
> +                             ProcessID aProcessId,
> +                             SessionType aSessionType = SessionType::Subsession)
> +{
> +  static_assert(uint32_t(SessionType::Count) < (1 << 17),

Does this win us that much compared to having a nested maps as you had in the first iteration of this?

If the answer is *yes*, the I would be happier if we could add a comment here explaining why this is better than the other option and describing a bit what's going on here.
Comment on attachment 8902389 [details]
bug 1373240 - Use somewhat-more-dynamically-allocated histogram storage

https://reviewboard.mozilla.org/r/173960/#review179904

> Does this win us that much compared to having a nested maps as you had in the first iteration of this?
> 
> If the answer is *yes*, the I would be happier if we could add a comment here explaining why this is better than the other option and describing a bit what's going on here.

Oh, I just realized I'm wrong and this addresses Nathan's suggestion :) Let's drop this.
Comment on attachment 8902389 [details]
bug 1373240 - Use somewhat-more-dynamically-allocated histogram storage

https://reviewboard.mozilla.org/r/173960/#review179908
Attachment #8902389 - Flags: review?(alessio.placitelli) → review+
Attachment #8902389 - Flags: feedback?(nfroyd)
Comment on attachment 8902389 [details]
bug 1373240 - Use somewhat-more-dynamically-allocated histogram storage

https://reviewboard.mozilla.org/r/173960/#review180474

::: toolkit/components/telemetry/TelemetryHistogram.cpp:199
(Diff revision 3)
> -Histogram* gHistogramStorage[HistogramCount][uint32_t(ProcessID::Count)][uint32_t(SessionType::Count)] = {};
> +nsClassHashtable<nsUint64HashKey, Histogram> gHistogramStorage;
>  // Keyed histograms internally map string keys to individual Histogram instances.
>  // KeyedHistogram keeps track of session & subsession histograms internally.
> -KeyedHistogram* gKeyedHistogramStorage[HistogramCount][uint32_t(ProcessID::Count)] = {};
> +nsClassHashtable<nsUint64HashKey, KeyedHistogram> gKeyedHistogramStorage;

We just spend some time to avoid hashtable lookups in bug 1366294.
This lookup path is i think the most critical in our histogram code.

How does the `nsClassHashtable<uint64>` lookup performance compare to a flat array?

If there is any difference, why can't we use dynamically allocated arrays here?
In childs we wouldn't even use the buffer. In the parent the code would look basically identical to now.
Flags: needinfo?(chutten)
To be fair, the hashtable lookups we were avoiding were string-keyed. Constructing and hashing a 64-bit key is likely trivial in comparison.

That being said, it is almost certainly slower than the pointer arithmetic it replaced. Adding is fast, after all. And there's likely to be a lot of code between subsequent calls into the data structure, so any fun caching tricks are probably not something we can count on making up the difference.

So the question then becomes: are some handful of instructions for the key assembly and hashing worth the couple of kilobytes it saves?

I argue the convenience and readability of nsClassHashtable is worth it compared to

gHistogramStorage = new Histogram***[HistogramCount];
for (uint i = 0; i < HistogramCount; ++i) {
  gHistogramStorage[i] = new Histogram**[ProcessCount];
  for (uint j = 0; j < ProcessCount; ++j) {
    gHistogramStorage[i][j] = new Histogram*[SessionCount];
  }
}

(and again in reverse in DeInitialize)

:froydnj, do you have any comments about dynamic allocation patterns in mozilla-central and what is typical for high-performance lookups?
Flags: needinfo?(chutten) → needinfo?(nfroyd)
(In reply to Chris H-C :chutten from comment #13)
> To be fair, the hashtable lookups we were avoiding were string-keyed.
> Constructing and hashing a 64-bit key is likely trivial in comparison.

Especially if StatisticsRecorder was using `map` instead of `unordered_map`.

> :froydnj, do you have any comments about dynamic allocation patterns in
> mozilla-central and what is typical for high-performance lookups?

That's a pretty broad topic!

I will say that hashtable lookups have been considered hot enough to attempt to remove them from some DOM and HTML parser codepaths by putting small array-backed caches in front of them.  That's not really feasible here; a thread-safe cache would be a bit tedious, and maybe not even that fast.

I'd try measuring to see if you can discern a difference.  The hashtable has a shot at being more compact than your giant lookup array, which probably helps performance some?

One caveat is that resizing the hashtable while getting histograms may fail and therefore crash the program.  It may be better from that standpoint to allocate all your memory upfront in some sort of flat, non-resizeable data structure.

> I argue the convenience and readability of nsClassHashtable is worth it
> compared to
> 
> gHistogramStorage = new Histogram***[HistogramCount];
> for (uint i = 0; i < HistogramCount; ++i) {
>   gHistogramStorage[i] = new Histogram**[ProcessCount];
>   for (uint j = 0; j < ProcessCount; ++j) {
>     gHistogramStorage[i][j] = new Histogram*[SessionCount];
>   }
> }
> 
> (and again in reverse in DeInitialize)

If you did decide to dynamically allocate a array-based structure, please don't do it this way.  Please do something more like:

gHistogramStorage = new Histogram*[HistogramCount * ProcessCount * SessionCount];

and then perform the multi-dimensional array arithmetic yourself, rather than allocating a bunch of arrays.  You could have a single function for computing the index for a given (histogram, process, session) tuple, much like you do now for hashtables.

Not sure if all that is helpful or not.
Flags: needinfo?(nfroyd)
Comment on attachment 8902389 [details]
bug 1373240 - Use somewhat-more-dynamically-allocated histogram storage

https://reviewboard.mozilla.org/r/173960/#review180542

::: toolkit/components/telemetry/TelemetryHistogram.cpp:332
(Diff revision 3)
> -  Histogram* h = gHistogramStorage[histogramId][uint32_t(processId)][uint32_t(sessionType)];
> +  Histogram* h = internal_GetHistogramFromStorage(histogramId,
> +                                                  processId,
> +                                                  sessionType);
>    if (h || !instantiate) {
>      return h;
>    }
>  
>    const HistogramInfo& info = gHistogramInfos[histogramId];
>    const int bucketsOffset = gExponentialBucketLowerBoundIndex[histogramId];
>    h = internal_CreateHistogramInstance(info, bucketsOffset);
>    MOZ_ASSERT(h);
> -  gHistogramStorage[histogramId][uint32_t(processId)][uint32_t(sessionType)] = h;
> +  internal_SetHistogramInStorage(histogramId, processId, sessionType, h);

This only matters for the first get of the histogram, but it'd be worth thinking about how to design the API so that you actually return some sort of pointer or wrapper for the Histogram slot in your storage, something like:

```
  auto entry = GetHistogramEntryFromStorage(...);
  if (entry) {
    return *entry;
  }
  if (!instantiate) {
    return nullptr;
  }
  ...
  Histogram *h = ...;
  entry.Store(h);
```

which would avoid the second lookup for the store.  (The compiler probably does this for you already with the array scheme you have, fwiw.)  `nsBaseHashtable` (and therefore `nsClassHashtable` has such an API:

http://dxr.mozilla.org/mozilla-central/source/xpcom/ds/nsBaseHashtable.h#298

and it's been really useful in eliminating extraneous hashtable lookups.
Comment on attachment 8902389 [details]
bug 1373240 - Use somewhat-more-dynamically-allocated histogram storage

Commented in mozreview; not sure from the preceding discussion whether you'll use a hashtable or some kind of array, but this code looks OK.
Attachment #8902389 - Flags: feedback?(nfroyd) → feedback+
Arrays for the Quantum Array gods!
Comment on attachment 8902389 [details]
bug 1373240 - Use somewhat-more-dynamically-allocated histogram storage

https://reviewboard.mozilla.org/r/173960/#review182274

This looks good, just some smaller comments below.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:254
(Diff revision 4)
> +size_t internal_HistogramStorageIndex(HistogramID aHistogramId,
> +                                      ProcessID aProcessId,
> +                                      SessionType aSessionType)
> +{

It looks like hiding this in a class will improve readability / arguing about this.
Do we have a bug on that as a follow-up?

::: toolkit/components/telemetry/TelemetryHistogram.cpp:286
(Diff revision 4)
> +{
> +  // Histograms are stored only in the parent process
> +  MOZ_ASSERT(XRE_IsParentProcess());
> +
> +  size_t index = internal_HistogramStorageIndex(aHistogramId, aProcessId, aSessionType);
> +  gHistogramStorage[index] = aHistogram;

This is a potential footgun for overwriting the storage.
Lets assert that this is currently set to `nullptr`.

Ditto for `SetKeyedHistogramInStorage()` below.

::: toolkit/components/telemetry/TelemetryHistogram.cpp:1739
(Diff revision 4)
>    gNameToHistogramIDMap.Clear();
>    gInitDone = false;
>  
>    // FactoryGet `new`s Histograms for us, but requires us to manually delete.
> -  for (size_t i = 0; i < HistogramCount; ++i) {
> -    for (uint32_t process = 0; process < static_cast<uint32_t>(ProcessID::Count); ++process) {
> +  if (XRE_IsParentProcess()) {
> +    for (size_t i = 0; i < HistogramCount * size_t(ProcessID::Count) * size_t(SessionType::Count); ++i) {

You can just do:
```
for (Histogram* h : gHistogramStorage) {
  ...
}
...
```

::: toolkit/components/telemetry/TelemetryHistogram.cpp:1743
(Diff revision 4)
> -      for (uint32_t session = 0; session <
> -        static_cast<uint32_t>(SessionType::Count); ++session) {
> -        if (gHistogramStorage[i][process][session] == gExpiredHistogram) {
> -          continue;
> -        }
> +      }
> -        delete gHistogramStorage[i][process][session];
> +      if (gHistogramStorage[i] && gHistogramStorage[i] != gExpiredHistogram) {

You can skip the first check, calling `delete nullptr` is fine (for standard deallocation functions or if the user-provided one handles it).
Attachment #8902389 - Flags: review?(gfritzsche)
Blocks: 1397749
Comment on attachment 8902389 [details]
bug 1373240 - Use somewhat-more-dynamically-allocated histogram storage

https://reviewboard.mozilla.org/r/173960/#review182274

> It looks like hiding this in a class will improve readability / arguing about this.
> Do we have a bug on that as a follow-up?

Filed bug 1397749 to hold the investigation, discussion, implementation.

> You can just do:
> ```
> for (Histogram* h : gHistogramStorage) {
>   ...
> }
> ...
> ```

Not if I want to clear both arrays at the same time I can't. But to be fair, clearing both arrays in a single loop doesn't aid comprehension and probably isn't a particularly good thing from a perf standpoint either. I'll split.

> You can skip the first check, calling `delete nullptr` is fine (for standard deallocation functions or if the user-provided one handles it).

I was worried more about the possibility that gExpiredHistogram was null (no expired histograms in the session) and thus the vast majority of the storage would be coded as "expired" rather than empty.

Turns out, upon actually thinking about it, that it doesn't matter since there is no actual consequence in the code as written... but there you go. Brains will be brains.
Comment on attachment 8902389 [details]
bug 1373240 - Use somewhat-more-dynamically-allocated histogram storage

https://reviewboard.mozilla.org/r/173960/#review182274

> Not if I want to clear both arrays at the same time I can't. But to be fair, clearing both arrays in a single loop doesn't aid comprehension and probably isn't a particularly good thing from a perf standpoint either. I'll split.

Annnnd, actually, since these arrays aren't actually arrays and are instead pointers, I can't use new-style loops.
(In reply to Chris H-C :chutten from comment #21)
> > Not if I want to clear both arrays at the same time I can't. But to be fair, clearing both arrays in a single loop doesn't aid comprehension and probably isn't a particularly good thing from a perf standpoint either. I'll split.
> 
> Annnnd, actually, since these arrays aren't actually arrays and are instead
> pointers, I can't use new-style loops.

Ah, right. So, that is an argument for an `nsTArray`.
Comment on attachment 8902389 [details]
bug 1373240 - Use somewhat-more-dynamically-allocated histogram storage

https://reviewboard.mozilla.org/r/173960/#review182454

This looks ok.
If it's easy to switch i would much prefer a `nsTArray` as storage here as it enables the range-based for loop.
Attachment #8902389 - Flags: review?(gfritzsche) → review+
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0181bf16af4f
Use somewhat-more-dynamically-allocated histogram storage r=Dexter,gfritzsche
Landing commits as-is. Rewriting to use an nsTArray<(Keyed)Histogram*> was proving to be not quite as trivial as I'd hoped. I can file a follow-up if you'd like?
Backed out for leaks detected by asan:

https://hg.mozilla.org/integration/autoland/rev/a8d468af2cd355a1b01e566dcfc2ca5d70363dc0

Push with failreures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=0181bf16af4f45ddab48bf2e7d3b0410a17c0d08&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=129570283&repo=autoland

[task 2017-09-08T14:25:52.707272Z] 14:25:52     INFO - TEST-INFO | LeakSanitizer | This can be done in testing/mozbase/mozrunner/mozrunner/utils.py
[task 2017-09-08T14:25:52.708677Z] 14:25:52    ERROR - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at base::LinearHistogram::FactoryGet, GetHistogram, GetHistogram, internal_Accumulate
[task 2017-09-08T14:25:52.718082Z] 14:25:52    ERROR - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at allocate, allocate, _M_allocate, std::vector
[task 2017-09-08T14:25:52.721065Z] 14:25:52    ERROR - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at allocate, allocate, _M_allocate, _M_create_storage
[task 2017-09-08T14:25:52.726955Z] 14:25:52    ERROR - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at base::Histogram::FactoryGet, GetHistogram, GetHistogram, internal_Accumulate
Flags: needinfo?(chutten)
Dang. There was some cruft left in the patch from when expired histograms were accounted for in a separate hashset.
Flags: needinfo?(chutten)
(In reply to Chris H-C :chutten from comment #26)
> Landing commits as-is. Rewriting to use an nsTArray<(Keyed)Histogram*> was
> proving to be not quite as trivial as I'd hoped. I can file a follow-up if
> you'd like?

That sounds good. Or possibly take it to the existing follow-up bug (about wrapping storage in a class).
Comment on attachment 8902389 [details]
bug 1373240 - Use somewhat-more-dynamically-allocated histogram storage

https://reviewboard.mozilla.org/r/173960/#review182820
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5e511ebfe505
Use somewhat-more-dynamically-allocated histogram storage r=Dexter,gfritzsche
https://hg.mozilla.org/mozilla-central/rev/5e511ebfe505
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in before you can comment on or make changes to this bug.