Consider reading ProfileBuffer once to generate the JSON profile
Categories
(Core :: Gecko Profiler, task, P2)
Tracking
firefox95: fixed
People
(Reporter: mozbugz, Assigned: mozbugz)
References
(Blocks 1 open bug)
Attachments
(10 files)
10 × 48 bytes, text/x-phabricator-request, including:
Bug 1577658 - Factor DoStreamSamplesToJSON out of ProfileBuffer::StreamSamplesToJSON - r?canaltinova
Currently the profiler reads the whole ProfileBuffer contents multiple times: once for each section of the output JSON.
This means we read the whole multi-megabyte buffer twice per profiled thread (samples + markers) and a few more times for paused ranges, overheads, and counters.
The advantage is that we only deal with one writer with one writing "head".
It would be interesting to see if doing the opposite (one reader, multiple writers) would be faster.
Intuitively:
- The multi-megabyte buffer probably doesn't fit in any CPU cache, so reading it once should help (good)
- Managing multiple writers needs more work (bad)
- Each writer only goes forward, and there are a limited number of them, so I think each writing head will have its own little cache area, and therefore adding bytes to each one in small bursts should be cache-friendly (good, but quite hand-wavey!)
- Parallelization opportunities: The reader could quickly determine for each entry which type it is and (if needed) which threadId it relates to, and let the relevant writer thread handle the data payload (good, if the overhead of sending the workload is not too big).
Input welcome.
(Realistically, I don't think this is of high-enough priority compared to much of the other overhead improvements needed, but I'm capturing the idea in this bug, so I don't keep thinking about it, and others can chime in as well.)
Comment 1 • 5 years ago
The only concern I have is that by writing to multiple writers and then concatenating them all into one big JSON, we might use more memory.
Otherwise this sounds good!
Assignee
Comment 2 • 3 years ago
(In reply to Julien Wajsberg [:julienw] from comment #1)
> The only concern I have is that by writing to multiple writers and then concatenating them all into one big JSON, we might use more memory.
Actually, the profiler's SpliceableJSONWriter uses a list of chunks, so when concatenating big JSON strings they are "spliced" in by just moving their unique pointers into the list; no extra memory is used (apart from the pointer list itself).
(Sorry I didn't reply earlier, maybe I didn't know what to say then?)
Assignee
Comment 3 • 3 years ago
Since we're collecting more and more data (more markers, more threads, more processes!), this is getting more urgent.
Assignee
Comment 4 • 3 years ago
Previously, DeserializeAfterKindAndStream would take a JSON writer and a thread id, using the thread id (if specified) to only output markers from that thread to the given writer.
Now, DeserializeAfterKindAndStream takes a lambda, which is called with the thread id found in the marker; the lambda can either return null (nothing to output) or a pointer to a writer, in which case the marker is read and output to that writer.
This makes DeserializeAfterKindAndStream more flexible, and will allow handling markers from different threads, each possibly being output to different writers.
Also, for simplicity, the entry is now always fully read, so the caller has nothing left to skip. The returned bool is therefore unnecessary and has been removed.
Assignee
Comment 5 • 3 years ago
When it's constructed, UniqueStacks can now take a ProfilerCodeAddressService pointer, instead of having to add it separately.
It will be even more useful in a later patch, when constructing it as a struct member in one shot.
Depends on D128433
Assignee
Comment 6 • 3 years ago
The start of StreamJSON is about setting up the UniqueStacks, so it can be factored out.
In a later patch, this will be used to prepare the UniqueStacks for each thread in one loop, before processing the profile buffer.
Depends on D128434
Assignee
Comment 7 • 3 years ago
While processing the profile buffer, the EntryGetter's iterator is used to read some "modern" entries, but then the EntryGetter is left to skip these entries again until the next "legacy" entry (with e.Next()).
Now, after reading modern entries, the EntryGetter's iterator is updated to directly skip the already-read modern entries.
In a later patch this will in fact be necessary; otherwise we would process this modern data twice!
Depends on D128437
Assignee
Comment 8 • 3 years ago
The body of StreamSamplesAndMarkers has been factored out into a templated function, with some lambdas to cover the buffer-reading and streaming of samples and markers.
In a later patch, DoStreamSamplesAndMarkers will be used to also stream already-processed samples and markers.
Depends on D128438
Assignee
Comment 9 • 3 years ago
StreamTraceLoggerJSON doesn't rely on ProfiledThreadData, so it can just be a file-static function.
The code hasn't changed at all; the function only needed to be moved up in the file, before its first use.
Depends on D128439
Assignee
Comment 10 • 3 years ago
This new function will be used a second time in a later patch.
Depends on D128440
Assignee
Comment 11 • 3 years ago
ThreadStreamingContext contains all the information needed to output samples and markers relevant to one thread.
ProcessStreamingContext contains a list of ThreadStreamingContexts for all profiled threads. This way, when reading the profile buffer, it will be possible to send each sample & marker to the relevant JSON writer.
Depends on D128441
Assignee
Comment 12 • 3 years ago
This refactoring is a bit more complex (to prepare for the next patch).
The body of StreamSamplesToJSON was moved to a function that takes a lambda, which will provide streaming parameters relevant to the thread id of each read sample.
For now, it's only used by StreamSamplesToJSON, so in this case the lambda only checks if the entry thread id is the expected one, and it provides the JSON writer and other things needed to write these samples.
One extra complexity is that the ProfileSample sample that lived outside the loop has been removed, because in later patches this code will handle samples from different threads, which cannot all share one sample object. Instead, the information that must be carried over lives in the streaming parameters (mPreviousStackState and mPreviousStack), one copy per thread. The overall logic is the same, apart from where this out-of-the-loop information lives.
And another difference is that old samples (before aSinceTime) are fully processed, and then discarded just before being output, because we need their information to be in the thread's UniqueStacks, in case it's used by a later same-as-before sample.
Depends on D128442
Assignee
Comment 13 • 3 years ago
Finally it all comes together!
Factored-out functions from previous patches are now used by the new functions, which work with streaming contexts, in order to prepare stacks and other things first, then read the profile buffer just once to extract all samples and markers, and finally output this data in the expected JSON format.
One trick is that when working on "legacy" entries (mostly related to samples), we need to handle "modern" marker entries as we encounter them; this happens in EntryGetter, and also when reading compact stacks and same-samples.
Depends on D128443
Comment 14 • 3 years ago
Comment 15 • 3 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/e34e1bf0af09
https://hg.mozilla.org/mozilla-central/rev/a2a8751250c1
https://hg.mozilla.org/mozilla-central/rev/622312b0321a
https://hg.mozilla.org/mozilla-central/rev/d3046c6ab51e
https://hg.mozilla.org/mozilla-central/rev/40a54e6db0db
https://hg.mozilla.org/mozilla-central/rev/6c17f87fab6e
https://hg.mozilla.org/mozilla-central/rev/ba7fdaf84afb
https://hg.mozilla.org/mozilla-central/rev/b9eb9afeae64
https://hg.mozilla.org/mozilla-central/rev/d4ba9282fd84
https://hg.mozilla.org/mozilla-central/rev/98084f195152