Reduce the overhead of telemetry in content processes
Categories
(Toolkit :: Telemetry, enhancement, P3)
Tracking
()
People
(Reporter: erahm, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [overhead:100k])
Comment 1•7 years ago
|
||
| Reporter | ||
Comment 2•7 years ago
|
||
Comment 3•7 years ago
|
||
Comment 4•7 years ago
|
||
Updated•7 years ago
|
Comment 5•7 years ago
|
||
Comment 6•7 years ago
|
||
| Reporter | ||
Comment 7•7 years ago
|
||
Comment 8•7 years ago
|
||
Comment 9•7 years ago
|
||
| Reporter | ||
Comment 10•7 years ago
|
||
Comment 11•6 years ago
|
||
I spent a little time investigating whether expired metrics are a memory burden. They shouldn't be, conceptually, since all operations on them are ignored... but there are a lot of them.
There are over 300 expired Histograms, about 30 expired scalars, and 0 expired events. So let's look deeper into the cost of maintaining expired Histograms.
To error on unknown Histograms but no-op on expired histograms we need to keep IDs and Strings for expired histograms around. So no savings there unless we're willing to change semantics (for instance, if we didn't error on any unknown or expired histogram, or only errored on unknown histograms in debug mode we could drop a bunch of strings from the table). But we do have some data structures proportional in size to the number of histogram IDs there are, expired and non.
We have
- gHistogramStorage and gKeyedHistogramStorage which are pointer arrays for speedy access. Both are of size HistogramCount * ProcessTypeCount and are allocated on the heap. They're only built in the parent process, though.
- gHistogramInfos and gHistogramBucketLowerBoundIndex are const arrays that could have expired info removed. gHistogramInfos is an array of POD structs HistogramCount in length and about... 344b apiece? gHistogramBucketLowerBoundIndex is an int16_t[HistogramCount].
- gHistogramRecordingDisabled is a bool[HistogramCount] statically allocated in each process.
Summing it all up, the extra 300 expired histograms are costing us 230kbit in the parent process heap, 110kbit readonly shared, and 2kbit per process. In a 10-process world that's 360kbit. In a 100-process world, 540kbit. In a 1000-process world, 2.3mbit.
To me this doesn't appear to be worth the effort (Too much squeeze, not enough juice), but I'm documenting it here in case it does turn out to be worth it.
(( The plan was to sort the histograms by expiry and not allocate anything from before the current version. Expiry checks become aId >= UnexpiredHistogramCount instead of strcmp, and all the storage mentioned above would be of length UnexpiredHistogramCount. ))
Comment 12•6 years ago
|
||
(In reply to Chris H-C :chutten from comment #11)
Summing it all up, the extra 300 expired histograms are costing us 230kbit in the parent process heap, 110kbit readonly shared, and 2kbit per process. In a 10-process world that's 360kbit. In a 100-process world, 540kbit. In a 1000-process world, 2.3mbit.
To me this doesn't appear to be worth the effort (Too much squeeze, not enough juice), but I'm documenting it here in case it does turn out to be worth it.
How much work is it to make this overhead go away? Saving 200kb of memory (even if only in the parent process), and reducing read-only tables by 110kb seems totally worth it,.
Comment 13•6 years ago
|
||
One thing I just noticed is that, if we have to preserve semantics, we can't actually remove the expired histograms' HistogramInfo from gHistogramInfos because we still need to know that they're expired, not just unknown. If we're willing to treat both expired and unknown histograms referenced by name as though they're expired (no error, just no-ops) then we can increase our savings and actually make this worth our while.
Because then not only do we get the gHistogramInfos savings I erroneously mentioned earlier, we also save their names from gHistogramStringTable, so that's sum(len(expired histogram names)) * sizeof(char) from read-only tables. Let's say ~25 chars per name, 300 names: 60kbit
And a bit of other savings I missed before is that HistogramInfo would no longer need expiration_offset (a uint32_t) so that's 32bit * 1500 (so 48kbit). We also don't need the version strings in the string table, so that's another sum(len(unique version strings)) * sizeof(char). (both from the read-only tables). Let's say 35 unique version strings of about 2 characters plus \0: 840bit
Totalled up that adds another ~110kbit in the read-only tables we could save.
As for the work required, well...
It requires making gen_histogram_data perform the expiry checks and omitting expired data. Which is the easy part.
But then it gets tricky. We need to dive deep into TelemetryHistogram to change how some things are handled. We assume if you have a HistogramID that's good enough to get you an index into certain storages, and that will no longer be universally true.
I'm not sure how tricky that'd be.
I think it might be worth going into it if we changed our semantics so that unknown histograms referenced by their string name are handled the same way as expired histograms.
ni?gfritzsche for his opinions on changing the semantics as described.
Comment 14•5 years ago
|
||
Is this question still current?
Comment 15•5 years ago
|
||
Yes. Do you have opinions on changing getHistogramById(id) to always return some sentinel Histogram instance, even if the id is utter nonsense?
Comment 16•5 years ago
|
||
I don't think i have good context for this right now, but i'm wondering how this works out for error handling, when specifying an invalid histogram id.
Getting feedback on a proposed solution from Firefox JS engineers on how that would affect them seems important.
Comment 17•5 years ago
|
||
That's exactly the sort of wisdom I was hoping for.
Alrighty, this is prioritized and questions are answered. To the backlog with you, bug!
Updated•3 years ago
|
Description
•