Closed Bug 1577658 Opened 2 years ago Closed 2 months ago

Consider reading ProfileBuffer once to generate the JSON profile

Categories

(Core :: Gecko Profiler, task, P2)

Tracking

RESOLVED FIXED
95 Branch
Tracking Status
firefox95 --- fixed

People

(Reporter: gerald, Assigned: gerald)

References

(Blocks 1 open bug)

Details

Attachments

(10 files)


Currently the profiler reads the whole ProfileBuffer contents multiple times: once for each section of the output JSON.
This means we read the whole multi-megabyte buffer twice per profiled thread (samples + markers) and a few more times for paused ranges, overheads, and counters.

The advantage is that we only deal with one writer with one writing "head".

It would be interesting to see if doing the opposite (one reader, multiple writers) would be faster.

Intuitively:

  • The multi-megabyte buffer probably doesn't fit in any CPU cache, so reading it once should help (good)
  • Managing multiple writers needs more work (bad)
  • Each writer only goes forward, and there are a limited number of them, so I think each writing head will have its own little cache area, and therefore adding bytes to each one in small bursts should be cache-friendly (good, but quite hand-wavey!)
  • Parallelization opportunities: The reader could quickly determine for each entry which type it is and (if needed) which threadId it relates to, and let the relevant writer thread handle the data payload (good, if the overhead of sending the workload is not too big).
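The one-reader/multiple-writers idea above can be sketched in a few lines. This is a hypothetical, heavily simplified illustration (the `Entry` type and `DispatchOnce` function are made up for this sketch, not the real profiler API): one forward pass over a flat buffer, appending each entry's payload to a per-thread forward-only writing head.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: one pass over a flat entry buffer, dispatching each
// entry's payload to a per-thread "writer" (here just a string), so the
// buffer is only read once. Illustrative only, not the real profiler code.
struct Entry {
  int threadId;
  std::string payload;
};

// Each thread id gets its own forward-only, append-only writing head.
std::map<int, std::string> DispatchOnce(const std::vector<Entry>& buffer) {
  std::map<int, std::string> writers;
  for (const Entry& e : buffer) {
    writers[e.threadId] += e.payload;  // small appends, cache-friendly per head
  }
  return writers;
}
```

The point of the sketch is the shape of the loop: the multi-megabyte buffer is traversed once, and only the small per-writer tails need to stay hot in cache.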

Input welcome.

(Realistically, I don't think this is of high-enough priority compared to much of the other overhead improvements needed, but I'm capturing the idea in this bug, so I don't keep thinking about it, and others can chime in as well.)

The only concern I have is that by writing to more writers, and then concatenating them all into one big JSON, we might use more memory.

Otherwise this sounds good!

See Also: → 1586618
Depends on: 1601087
No longer depends on: 1601087

(In reply to Julien Wajsberg [:julienw] from comment #1)

The only concern I have is that by writing to more writers, and then concatenating them all in one big json, we might have more memory usage.

Actually, the profiler's SpliceableJSONWriter uses a list of chunks, so when concatenating big JSON strings, they're "spliced" in by just moving their unique pointers into the list; there is no extra memory used (apart from the pointer list itself).
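The splicing idea can be illustrated with a toy chunked writer. This is a hypothetical simplification (the `ChunkedWriter` type here is invented for the sketch, not the real SpliceableJSONWriter): concatenation transfers ownership of the chunk pointers instead of copying any bytes.

```cpp
#include <cassert>
#include <list>
#include <memory>
#include <string>
#include <utility>

// Hypothetical sketch: a writer that keeps a list of owned string chunks.
// "Splicing" another writer's output moves its chunk pointers into this
// writer's list; no byte copies happen until the final assembly.
struct ChunkedWriter {
  std::list<std::unique_ptr<std::string>> chunks;

  void Write(std::string s) {
    chunks.push_back(std::make_unique<std::string>(std::move(s)));
  }

  // Take ownership of the other writer's chunks in O(1); no data is copied.
  void Splice(ChunkedWriter&& other) {
    chunks.splice(chunks.end(), other.chunks);
  }

  // Only here, at the very end, are the chunks concatenated.
  std::string Final() const {
    std::string out;
    for (const auto& c : chunks) out += *c;
    return out;
  }
};
```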

(Sorry I didn't reply earlier, maybe I didn't know what to say then?)

Since we're collecting more and more data (more markers, more threads, more processes!), this is getting more urgent.

Assignee: nobody → gsquelart
Priority: P3 → P2
Depends on: 1735273

Previously, DeserializeAfterKindAndStream would take a JSON writer and a thread id, using the thread id (if specified) to only output markers from that thread to the given writer.

Now, DeserializeAfterKindAndStream takes a lambda, which will be called with the thread id found in the marker; that lambda can either return null (nothing to output) or a pointer to a writer, in which case the marker is read and output to the given writer.
This makes DeserializeAfterKindAndStream more flexible, and will allow handling markers from different threads, each possibly being output to different writers.

Also, for simplicity the entry is now always fully read, so there is no need for the caller to do anything. The bool return value is therefore unnecessary and has been removed.
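The new callback shape described above can be sketched like this. Everything here is illustrative (`Writer`, `WriterSelector`, and `DeserializeMarkerSketch` are stand-in names, not the real signatures): the deserializer asks the callback which writer, if any, wants the marker it just found.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical sketch: instead of a fixed (writer, threadId) pair, the
// deserializer calls back with the marker's thread id and gets either a
// writer to stream into, or null to mean "fully read the entry but output
// nothing". Simplified stand-ins for the real types.
using Writer = std::string;
using WriterSelector = std::function<Writer*(int aThreadId)>;

void DeserializeMarkerSketch(int markerThreadId, const std::string& payload,
                             const WriterSelector& aSelect) {
  // The entry is always fully consumed; output happens only if a writer
  // is returned for this thread id.
  if (Writer* w = aSelect(markerThreadId)) {
    *w += payload;
  }
}
```

This is what makes the later multi-writer patch possible: the same callback can route different thread ids to different writers.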

When it's constructed, UniqueStacks can now take a ProfilerCodeAddressService pointer, instead of having to add it separately.
It will be even more useful in a later patch, when constructing it as a struct member in one shot.

Depends on D128433

The start of StreamJSON is about setting up the UniqueStacks; it can be factored out.
In a later patch, this will be used to prepare the UniqueStacks for each thread in one loop, before processing the profile buffer.

Depends on D128434

While processing the profile buffer, the EntryGetter's iterator is used to read some "modern" entries, but then the EntryGetter is left to skip these entries again until the next "legacy" entry (with e.Next()).

Now, after reading modern entries, the EntryGetter's iterator is updated to directly skip the already-read modern entries.
And in a later patch this will in fact be necessary; otherwise we would process this modern data twice!
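The iterator fix can be pictured with a toy getter. This is a hypothetical simplification (`GetterSketch` and its members are invented for the sketch, though `RestartAfter` mirrors the patch title): after a block of "modern" entries has been consumed elsewhere, the getter jumps directly past them instead of re-skipping each one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical sketch of the EntryGetter fix: an index-based getter that can
// be restarted directly after a run of entries that some other reader has
// already consumed, rather than stepping over them one Next() at a time.
struct GetterSketch {
  const std::vector<int>& entries;
  std::size_t index = 0;

  bool Has() const { return index < entries.size(); }
  int Get() const { return entries[index]; }
  void Next() { ++index; }

  // Jump straight past a block of already-consumed entries.
  void RestartAfter(std::size_t afterIndex) { index = afterIndex; }
};
```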

Depends on D128437

The body of StreamSamplesAndMarkers has been factored out into a templated function, with some lambdas to cover the buffer-reading and streaming of samples and markers.
In a later patch, DoStreamSamplesAndMarkers will be used to also stream already-processed samples and markers.

Depends on D128438

StreamTraceLoggerJSON doesn't rely on ProfiledThreadData, so it can just be a file-static function.
The code hasn't changed at all; the function only needed to be moved up in the file, before its first use.

Depends on D128439

This new function will be used a second time in a later patch.

Depends on D128440

ThreadStreamingContext contains all the information needed to output samples and markers relevant to one thread.

ProcessStreamingContext contains a list of ThreadStreamingContexts for all profiled threads. This way, when reading the profile buffer, it will be possible to send each sample & marker to the relevant JSON writer.
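The two-level context described above can be sketched as plain structs. All fields here are illustrative stand-ins (strings in place of the real JSON writers), not the actual Gecko Profiler classes:

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical sketch: everything needed to stream one thread's output...
struct ThreadStreamingContextSketch {
  int threadId;
  std::string samplesJSON;  // stands in for the samples writer
  std::string markersJSON;  // stands in for the markers writer
};

// ...and a per-process list used to route each buffer entry to its thread.
struct ProcessStreamingContextSketch {
  std::vector<ThreadStreamingContextSketch> threads;

  ThreadStreamingContextSketch* LookupThread(int aThreadId) {
    for (auto& t : threads) {
      if (t.threadId == aThreadId) return &t;
    }
    return nullptr;  // thread not profiled: the entry is skipped
  }
};
```

During the single buffer pass, each sample or marker entry would do one lookup and then append to the matching thread's writer.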

Depends on D128441

This refactoring is a bit more complex (to prepare for the next patch).

The body of StreamSamplesToJSON was moved to a function that takes a lambda, which will provide streaming parameters relevant to the thread id of each read sample.
For now, it's only used by StreamSamplesToJSON, so in this case the lambda only checks if the entry thread id is the expected one, and it provides the JSON writer and other things needed to write these samples.

One extra complexity is that the ProfileSample object that used to live outside the loop has been removed, because in later patches this code will handle samples from different threads, which cannot all share that one sample object. Instead, the information that must be carried over lives in the streaming parameters (mPreviousStackState and mPreviousStack), so that each thread has its own. But the overall logic is the same, apart from where this out-of-the-loop information lives.

And another difference is that old samples (before aSinceTime) are fully processed, and then discarded just before being output, because we need their information to be in the thread's UniqueStacks, in case it's used by a later same-as-before sample.
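The per-thread carry-over state can be illustrated with a toy version. This is a hypothetical simplification (only the `mPreviousStack` name comes from the description; the rest is invented for the sketch): a "same as before" sample must reuse *its own thread's* previous stack, which is why the state cannot stay loop-local once threads interleave.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical sketch: streaming parameters owned by one thread, carrying
// the previously-seen stack so "same as before" samples can reuse it.
struct StreamingParametersSketch {
  std::optional<std::string> mPreviousStack;  // last stack for this thread
  std::string out;                            // stands in for the writer
};

void StreamSampleSketch(StreamingParametersSketch& p,
                        std::optional<std::string> stack) {
  if (!stack) {
    // "Same as before" sample: fall back to this thread's previous stack.
    stack = p.mPreviousStack;
  }
  if (stack) {
    p.out += *stack + ";";
    p.mPreviousStack = stack;
  }
}
```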

Depends on D128442

Finally it all comes together!
Factored-out functions from previous patches are now used by the new functions, which work with streaming contexts, in order to prepare stacks and other things first, then read the profile buffer just once to extract all samples and markers, and finally output this data in the expected JSON format.

One trick is that when working on "legacy" entries (mostly related to samples), we need to handle "modern" marker entries as we encounter them; this happens in EntryGetter, and also when reading compact stacks and same-samples.

Depends on D128443

Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e34e1bf0af09
DeserializeAfterKindAndStream takes a lambda: (tid) -> writer - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/a2a8751250c1
UniqueStacks constructor can accept a ProfilerCodeAddressService - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/622312b0321a
Factor PrepareUniqueStacks out of ProfiledThreadData::StreamJSON - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/d3046c6ab51e
EntryGetter::RestartAfter - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/40a54e6db0db
Factor out DoStreamSamplesAndMarkers to abstract sample and marker data output - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/6c17f87fab6e
Make ProfiledThreadData::StreamTraceLoggerJSON a static non-member function - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/ba7fdaf84afb
Factor StreamTablesAndTraceLogger out of ProfiledThreadData::StreamJSON - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/b9eb9afeae64
Add classes containing process and thread streaming contexts - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/d4ba9282fd84
Factor DoStreamSamplesToJSON out of ProfileBuffer::StreamSamplesToJSON - r=canaltinova
https://hg.mozilla.org/integration/autoland/rev/98084f195152
Read samples&markers from all threads at once - r=canaltinova
Regressions: 1737376