Closed Bug 1581049 Opened 5 years ago Closed 5 years ago

Optimize stack-sample handling in profiler

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla71

Tracking Flags:

Tracking

Status

firefox71

---

fixed

People

(Reporter: mozbugz, Assigned: mozbugz)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

Bug 1581049 - ModuloBuffer::Iterator::ReadInto(ModuloBuffer::Iterator&) - r?gregtatum 5 years ago Gerald Squelart (he/him) (not at Mozilla since 2022-09-15) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1581049 - BlocksRingBuffer (de)serializer uses optimized inter-ModuloBuffer copier - r?gregtatum 5 years ago Gerald Squelart (he/him) (not at Mozilla since 2022-09-15) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1581049 - Gather stack sample data into fewer entries - r?gregtatum 5 years ago Gerald Squelart (he/him) (not at Mozilla since 2022-09-15) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1581049 - Use AutoCStrings in StreamSamplesToJSON to avoid mallocs - r?gregtatum 5 years ago Gerald Squelart (he/him) (not at Mozilla since 2022-09-15) 47 bytes, text/x-phabricator-request		Details \| Review

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Description

•

5 years ago

Following the previous patches (especially bug 1576551 and bug 1576555), there is a significant drop in profiler performance when storing the JSON profile, and during some other operations handling the stack samples.
This make capturing less-than-trivial profiles much slower, so slow that it sometimes fails to make it to the front-end. This needs fixing, before we can land all these patches.

Profiling the profiler showed a massive amount of time spent just skipping entries!

This got worse with the new ProfileBuffer storage, because:

As detailed in bug 1577658, creating the JSON output involves reading the full buffer many times, at least twice per thread, and a few more times for the other types of data. So most of the each reading pass ignores most of the buffer!
Entries now have variable sizes. So instead of "just" reading the ProfileBufferEntry::Kind byte and then always skipping 9 bytes, we now need to read the entry size (usually 1 byte) and (also usually) the full legacy entry, before skipping a variable number of bytes.
Stack samples may contain hundreds to thousands of entries, which need to be skipped one-by-one in most cases.

Bug 1577658 should help with the above, but it is too much work to implement right now, and it is yet uncertain whether we will actually gain speed in the end, because we would be shifting the burden towards writing many parts of the JSON output at the same time.

So as a first step, a smaller optimization that should make a big difference, would be to group all data entries from a sample into one new type of entry. This means that when we're skipping samples, we will only skip one entry instead of hundreds/thousands.

And I will explore other simple optimizations based on further profiling-the-profiler investigation...

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Comment 1

•

5 years ago

Attached file Bug 1581049 - ModuloBuffer::Iterator::ReadInto(ModuloBuffer::Iterator&) - r?gregtatum — Details

Some objects are copied byte-by-byte to/from ModuloBuffers.
E.g., serialized BlocksRingBuffers, or duplicate stacks. (And more to come.)

Iterator::ReadInto(Iterator&) optimizes these copies by using the minimum
number of memcpys possible.

Depends on D44435

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Comment 2

•

5 years ago

Attached file Bug 1581049 - BlocksRingBuffer (de)serializer uses optimized inter-ModuloBuffer copier - r?gregtatum — Details

Instead of copying BlocksRingBuffer data byte-by-byte (using iterators byte
dereferencers), we can now use ModuloBuffer::Iterator::ReadInto(Iterator&) to
copy them using a small number of memcpys.

Depends on D45838

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Comment 3

•

5 years ago

Attached file Bug 1581049 - Gather stack sample data into fewer entries - r?gregtatum — Details

Stack samples may contain up to hundreds or thousands of entries, which need to
be copied (during sampling and duplicating), and also mostly skipped when
creating the JSON output for other threads or other types of profile data.

This patch gathers all sample legacy entries (apart from the thread id and the
timestamp) into a single non-legacy entry, which is much faster to copy or
skip. The preceding timestamp now has a distinct Kind (TimeBeforeCompactStack)
to know whether to handle legacy entries of the new CompactStack entry kind.

Depends on D45839

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Comment 4

•

5 years ago

Attached file Bug 1581049 - Use AutoCStrings in StreamSamplesToJSON to avoid mallocs - r?gregtatum — Details

Profiling the profiler showed a lot of time spent allocating memory for the
frame-address strings, almost half the time of StreamSamplesToJSON!
Using an "Auto" string will prevent these allocations in most cases.

Also removed the printf("0x%llx"), instead just appending "0x" and an hex-
formatted number with AppendInt() (which handles 32- and 64-bit numbers
separately, so there is no more needs to do a double-cast to avoid sign
extensions -- There is still a double cast, but a no-op one to fix the type to
either uint32_t or uint64_t).

Using an Auto string for frame labels as well.

Depends on D45840

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Updated

•

5 years ago

Blocks: 1577656

Pulsebot

Comment 5

•

5 years ago

Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b30942f9db54
ModuloBuffer::Iterator::ReadInto(ModuloBuffer::Iterator&) - r=gregtatum
https://hg.mozilla.org/integration/autoland/rev/b4b6bbb18cc1
BlocksRingBuffer (de)serializer uses optimized inter-ModuloBuffer copier - r=gregtatum
https://hg.mozilla.org/integration/autoland/rev/be1ec4577d37
Gather stack sample data into fewer entries - r=gregtatum
https://hg.mozilla.org/integration/autoland/rev/27afea20c396
Use AutoCStrings in StreamSamplesToJSON to avoid mallocs - r=gregtatum

Bogdan Tara[:bogdan_tara | bogdant]

Comment 6

•

5 years ago

Backed out 21 changesets (Bug 1578327, Bug 1575448, Bug 1580091, Bug 1576554, Bug 1581049, Bug 1576551, Bug 1578329, Bug 1576555) for platform.cpp related bustage

Push with failure: https://treeherder.mozilla.org/#/jobs?repo=autoland&fromchange=27afea20c396eaa2731fa6e705471b446c6683e6&tochange=9c10aac0356842219e95d1f6deb0bf7e7baed334&selectedJob=266985776

Backout link: https://hg.mozilla.org/integration/autoland/rev/9c10aac0356842219e95d1f6deb0bf7e7baed334

Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=266985776&repo=autoland&lineNumber=53923

Pulsebot

Comment 7

•

5 years ago

Pushed by gsquelart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5829fae53029
ModuloBuffer::Iterator::ReadInto(ModuloBuffer::Iterator&) - r=gregtatum
https://hg.mozilla.org/integration/autoland/rev/257f8a17217f
BlocksRingBuffer (de)serializer uses optimized inter-ModuloBuffer copier - r=gregtatum
https://hg.mozilla.org/integration/autoland/rev/1ceccc3c86a4
Gather stack sample data into fewer entries - r=gregtatum
https://hg.mozilla.org/integration/autoland/rev/4001594f15dd
Use AutoCStrings in StreamSamplesToJSON to avoid mallocs - r=gregtatum

Alexandru Michis [:malexandru]

Comment 8

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/5829fae53029
https://hg.mozilla.org/mozilla-central/rev/257f8a17217f
https://hg.mozilla.org/mozilla-central/rev/1ceccc3c86a4
https://hg.mozilla.org/mozilla-central/rev/4001594f15dd

Status: NEW → RESOLVED

Closed: 5 years ago

status-firefox71: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla71

Julian Descottes [:jdescottes]

Updated

•

5 years ago

Regressions: 1582357

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Updated

•

5 years ago

Regressions: 1593112

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Optimize stack-sample handling in profiler

Categories

(Core :: Gecko Profiler, task, P2)

Tracking

()

People

(Reporter: mozbugz, Assigned: mozbugz)

References

(Blocks 1 open bug)

Details

Crash Data

Security

(public)

User Story

Attachments

(4 files)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Updated

Updated

Attachment

General

Description

File Name

Content Type