Closed Bug 1475557 Opened 6 years ago Closed 6 years ago

Investigate detailed memory usage of Telemetry in content processes

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: janerik, Assigned: janerik)

References

Details

In order to come up with a plan on how to reduce memory usage, we first need to know what is consuming it.
This bug is to track the investigation.
Assignee: nobody → jrediger
Blocks: 1475556
Priority: -- → P1
Depends on: 1475988
As a first step I implemented a more detailed view of allocated memory.
With the patches from bug 1475988 I now get these memory reports (from about:memory, post-processed for easy text-only viewing):

Main Process (pid 79146): 466072 (100.00%)
  telemetry: 466072 (100.00%)
    impl: 464 (0.10%)
    histogram: 405968 (87.10%)
      shallow: 131072 (28.12%)
      data: 274896 (58.98%)
    scalar: 8832 (1.89%)
      shallow: 8192 (1.76%)
      data: 640 (0.14%)
    WebRTC: 0 (0.00%)
    PrivateSQL: 0 (0.00%)
    SanitizedSQL: 0 (0.00%)
    IOObserver: 576 (0.12%)
    event: 50232 (10.78%)
      data: 50232 (10.78%)

Web Content (pid 79148): 156592 (100.00%)
  telemetry: 156592 (100.00%)
    impl: 464 (0.30%)
    histogram: 131072 (83.70%)
      shallow: 131072 (83.70%)
      data: 0 (0.00%)
    scalar: 8192 (5.23%)
      shallow: 8192 (5.23%)
      data: 0 (0.00%)
    WebRTC: 0 (0.00%)
    PrivateSQL: 0 (0.00%)
    SanitizedSQL: 0 (0.00%)
    event: 16864 (10.77%)
      data: 16864 (10.77%)

WebExtensions (pid 79149): 156592 (100.00%)
  telemetry: 156592 (100.00%)
    impl: 464 (0.30%)
    histogram: 131072 (83.70%)
      shallow: 131072 (83.70%)
      data: 0 (0.00%)
    scalar: 8192 (5.23%)
      shallow: 8192 (5.23%)
      data: 0 (0.00%)
    WebRTC: 0 (0.00%)
    PrivateSQL: 0 (0.00%)
    SanitizedSQL: 0 (0.00%)
    event: 16864 (10.77%)
      data: 16864 (10.77%)
The histogram/shallow memory is coming from |gNameToHistogramIDMap|[1].
We allocate a large hashtable of String->Integer mappings (the strings are not allocated/copied, they come from static memory).
This hashtable is completely static/immutable after initialization. We might be able to actually generate such a table at compile time (e.g. Perfect Hash Table) to avoid allocation. This however requires a bit more compile-time magic (as the list of valid histogram names depends on some defines, e.g. XP_UNIX).

Similar is true for scalar/shallow. We have a String->ID map there as well, but that one is mutable due to dynamic scalars.

I have not yet looked into event/data. I suspect this memory is used by actual storage and dynamic event info.


[1]: https://searchfox.org/mozilla-central/rev/6f86cc3479f80ace97f62634e2c82a483d1ede40/toolkit/components/telemetry/TelemetryHistogram.cpp#1977-1988
Event data:

gEventRecords: 0
gEventNameIDMap: 8192
gEventNameIDMap Keys: 8288
gCategoryNames: 384
gEnabledCategories: 0
As a PoC I replaced the huge dynamic gNameToHistogramIDMap with a perfect hash table generated at compile time using gperf[1].
For now this stripped all cpp_guards, as those complicate build-time generation a bit.

The numbers before this change (Histogram only)

Main Process: 422024
  telemetry: 422024
    histogram: 361920
      shallow: 131072
      data: 230848
Web Content: 156592
  telemetry: 156592
    histogram: 131072
      shallow: 131072
      data: 0

After swapping for the PHF:

Main Process: 296936
  telemetry: 296936
    histogram: 236832
      shallow: 0
      data: 236832
Web Content: 25520
  telemetry: 25520
    histogram: 0
      shallow: 0
      data: 0

-> All dynamic allocated Histogram data in the content process eliminated, in the Main process we reduced it by ~125Kb (slight increase for the storage of slots, but far bigger reduce of the dynamic map)

[1]: https://www.gnu.org/software/gperf/
Using the patch from bug 1463569 and the tools from heapgraph[1] I get similar numbers as reported in bug 1470339.
For TelemetryController most seems to be in the actual implementation[2] as expected.
Next step is to break that down into its parts and see what is actually needed in a Content process (same for the other JSM files)


[1]: https://github.com/amccreight/heapgraph/tree/master/g
[2]: https://searchfox.org/mozilla-central/rev/c296d5b2391c8b37374b118180b64cca66c0aa16/toolkit/components/telemetry/TelemetryController.jsm#268
Depends on: 1477213
No longer depends on: 1477213
Document describing the different ways to analyse the memory reports: https://docs.google.com/document/d/18C7c-KAdkmldkt3j1zQujOUdz9irso6yt8NCGbZlDSA/
Proposal/Ideas to refactor current Telemetry modules to reduce memory usage: https://docs.google.com/document/d/10m2MS19J1ybCW-ruxlc7B9HeBCJ-6ykRLAeNt0lrrj4/edit#
The proposal is quickly expanding with some action items listed (some are already submitted as patches).
I opened additional bugs for further investigation/implementation of the different parts (and linked all to the metabug).

The investigation is therefore done for now with follow-up work lined up.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.