Closed Bug 1663382 Opened 5 years ago Closed 4 years ago

Optimize deserialization of ProfilerStringView and others, to avoid copying non-split contents

Categories

(Core :: Gecko Profiler, task, P3)

task

Tracking

()

RESOLVED FIXED
94 Branch
Tracking Status
firefox94 --- fixed

People

(Reporter: mozbugz, Assigned: mozbugz)

References

(Blocks 2 open bugs)

Details

Attachments

(4 files)

In bug 1646266, ProfilerStringView is used to capture strings when adding markers, the original strings are only referenced by ProfilerStringView and are immediately added stored into the profiler buffer.
In case of non-literal strings, the string content is serialized, and later when the profile is captured, these strings are fully deserialized and copied into an owning ProfilerStringView, to be output to JSON.
This copy may be necessary because the string could be split between separate buffer chunks.

However in the majority of cases, strings should fit in only one chunk, and therefore the ProfilerStringView could just reference them.

(The copy will still be necessary when the string is split, because JSONWriter expects single-segment strings -- this could be a later optimization, but with lower return on value.)

This will require the buffer to somehow expose chunk boundaries.
Idea: ProfileBufferEntryReader could provide an equivalent of ReadBytes() that, instead of memcpying, would return one or two SpanOfConstBytes. Then the deserializer for ProfilerStringView could just keep the span into the buffer in the case of a single span, otherwise copy the full contents into an owned buffer in the case of two spans.

To extend the original idea: We could expose a pair of spans into the chunk(s), so that other code could avoid copies when dealing with non-split sections.

Summary: Optimize deserialization of ProfilerStringView to avoid copying non-split string contents → Optimize deserialization of ProfilerStringView and others, to avoid copying non-split contents

Working on it, as it could also be useful in the new Rust marker API (to reduce copies).

Assignee: nobody → gsquelart

Try: https://treeherder.mozilla.org/jobs?repo=try&group_state=expanded&revision=bf1bb2bec6c8eb9f33f86146ac37cdba726a0cc7

Some nice numbers, from two startup+reddit runs:
Total number of string deserializations, and average time for each: 3,345,098 142ns
Of which:

  • Copies in allocated buffers: 109(!) 2,248ns
  • Reference to in-buffer strings: 3,344,989 35ns

So 99.997% of strings were efficiently referenced (instead of being duplicated outside the profiler buffer), which was 64 times faster, and saved an estimated 7.5 seconds of processing overall.
I think it's a keeper! 🎉

Because string contents could be split in two separate chunks, the default ProfilerStringView deserializer needed to concatenate it together in an off-chunk buffer.
But now thanks to ProfileBufferEntryReader::ReadSpans, it is possible to know if the contents are in a single memory area inside one chunk (which should be the vast majority of cases), in which case the ProfilerStringView can just reference it using its internal std::string_view, which saves managing a separate buffer and copying data into it.

Depends on D124429

Attachment #9239310 - Attachment description: Bug 1663382 - ProfileBufferEntryReader::{Peek,Read}Spans returns pointers to entry bytes - r?canaltinova → Bug 1663382 - ProfileBufferEntryReader::{Peek,Read}Spans return pointers to entry bytes - r?canaltinova
Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/0b3495e23e77 ProfileBufferEntryReader::{Peek,Read}Spans return pointers to entry bytes - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/bac8ae20a8af Optimize Deserializer<ProfilerStringView> to reference in-buffer data when possible - r=canaltinova

It's due to the alignment for char16_t being 2, and this patch can point into the profile buffer at an odd address, leading to a misaligned read.
I'll add a check for that...

Flags: needinfo?(gsquelart)

Quick update:

I tried to check that the alignment of the data matched alignof(CHAR) (Try), but that still failed.

So then I tried to only enable the optimization for 8-bit char strings (Try), but that also failed, and always failing in a UTF16-to-UTF8 conversion routine even though 16-bit strings should not be optimized, this is very strange!

To be debugged...

That was bad patching on my part in comment 9, I've got it working now. Try

Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/2fd6d02ad407 ProfileBufferEntryReader::{Peek,Read}Spans return pointers to entry bytes - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/f3a453eb5700 Optimize Deserializer<ProfilerStringView> to reference in-buffer data when possible - r=canaltinova

Backed out for bc failures on browser_test_marker_network_serviceworker_synthetized_response.js

Backout link: https://hg.mozilla.org/integration/autoland/rev/7376fc1438acc88737eb1c7611694fd992b4401f
Log link: https://treeherder.mozilla.org/logviewer?job_id=350721705&repo=autoland&lineNumber=23740

Please also check failures on browser_test_marker_network_serviceworker_cache_first.js -> https://treeherder.mozilla.org/logviewer?job_id=350725843&repo=autoland&lineNumber=8215

Flags: needinfo?(gsquelart)

I've found the issue:
The string inside the buffer is just the content without null-terminator.
Before this patch, the string was always deserialized with an added '\0'. But with this patch a deserialized ProfilerStringView may now point directly at the non-terminated string.
And I see it's sometimes used as if it was terminated, e.g., in NS_ConvertUTF16toUTF8(aName.Data()). The correct call here should have been NS_ConvertUTF16toUTF8(aName.Data(), aName.Length())).

I think the Data() member function is too dangerous, I will remove it, in favor of something that returns a Span or similar, so it's obvious that there may not be a string terminator.

Flags: needinfo?(gsquelart)

ProfilerStringView::Data() would return a pointer to the start of the string, but there may not be a null terminator at the end!
To reduce the likelihood of misuses, that function is now more clearly named DataWithoutTerminator(). Removing it completely would have made some uses more difficult and less efficient.

Also fixed a few mistakes, where Data() was passed alone to NS_ConvertUTF16toUTF8, potentially leading to some longer-than-expected string in the JSON profiles.
It was not an issue until now, because deserialized string would always be terminated when copied out of the profile buffer, but a following patch will add optimized code where the non-terminated string inside the buffer will be directly pointed at.

Add NS_ConvertUTF16toUTF8::NS_ConvertUTF16toUTF8(const Span<const char16_t>) and NS_ConvertUTF16toUTF8::NS_ConvertUTF8toUTF16(const Span<const char>) explicit constructors.
This is consistent with NS_ConvertASCIItoUTF16 that already had one.
More importantly, other constructors were calling AppendUTF{16,8}To{8,16} functions taking a Span, so for cases where the caller already has a Span it's most efficient to have constructors accepting that Span directly.

Attachment #9240212 - Attachment description: Bug 1663382 - Rename ProfilerStringView::Data() to DataWithoutTerminator(), fix misuses - r?florian → Bug 1663382 - Remove dangerous ProfilerStringView::Data() - r?florian
Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/90800fa33cb5 Add NS_Convert... constructors taking a Span - r=florian https://hg.mozilla.org/integration/autoland/rev/6622e2a4c66c Remove dangerous ProfilerStringView::Data() - r=florian https://hg.mozilla.org/integration/autoland/rev/26c563a519c1 ProfileBufferEntryReader::{Peek,Read}Spans return pointers to entry bytes - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/d433afaaccb6 Optimize Deserializer<ProfilerStringView> to reference in-buffer data when possible - r=canaltinova

(In reply to Gerald Squelart [:gerald] (he/him) from comment #3)

So 99.997% of strings were efficiently referenced (instead of being duplicated outside the profiler buffer)

Since I had to modify the code to exclude non-aligned 16-bit strings, I reran a quick test, and luckily the impact is small: 99.75% of all strings are still referenced instead of being copied. So this is still a very useful optimization.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: