Inconsistent values for BufferRangeStart and SamplePositionInBuffer in MergeStacks
Categories
(Core :: Gecko Profiler, defect, P1)
Tracking
()
People
(Reporter: jandem, Assigned: mozbugz)
References
(Blocks 1 open bug, Regressed 1 open bug, Regression)
Details
(Keywords: regression)
Attachments
(1 file)
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta-
|
Details | Review |
In MergeStacks, if I add the following line right before the SetJSContextProfilerSampleBufferRangeStart
call near the end:
fprintf(stderr, "%lu, %lu\n", samplePosInBuffer ? *samplePosInBuffer : 0, bufferRangeStart);
I get very inconsistent results: the values should be somewhat similar as that's what the JS engine depends on, but in practice the first one (*samplePosInBuffer
) is much smaller (factor 1000 or so) than bufferRangeStart
. Both values keep increasing.
Noticed this while tracking down confusing JS profiler results in bug 1684224.
Reporter | ||
Comment 1•4 years ago
|
||
This is on Linux x64, recent central tip.
Reporter | ||
Updated•4 years ago
|
Comment 2•4 years ago
|
||
Thanks for looking into this. If you need fresh profiles from a try build or nightly, feel free to ping me :)
Comment 3•4 years ago
|
||
Context: The samplePosInBuffer
value is stored on the JitcodeGlobalEntry
so that the JS engine can keep the entry alive as long as any sample that points to that entry is still in the profiler buffer. Once all samples that use an entry are evicted from the buffer, i.e. once bufferRangeStart
moves past the last sample that refers to a given JitcodeGlobalEntry
, the JS engine can free that entry.
In the past, positions in buffer space were measured in "buffer entries". I'm not sure what unit it's measured in today.
Assignee | ||
Comment 4•4 years ago
|
||
I think I know what's happening:
In the SamplerThread loop, for each thread:
- In the global profiler buffer, we record the initial sample entry, and its position is passed through DoPeriodicSample -> DoSharedSample -> ProfileBufferCollector -> MergeStacks.
- The thread is suspended to perform the actual sampling. During this time we cannot allocate memory, which the global buffer could do, so we use a pre-allocated local buffer to record the sample data, and the
ProfileBufferCollector
reports theBufferRangeStart()
for that local buffer.
The two buffers have their own incrementing indices, that's why they don't seem related at all.
So I think what we need to do, is to report the range start of the global buffer from ProfileBufferCollector
. The entry is in bytes; I don't think the unit really matters, as long as both values are in the same unit, and in same "arena", which is the problem here.
This fix seems easy enough, but I'll need to make sure that no-one is using ProfileBufferCollector::BufferRangeStart()
to try and read the local buffer; at first glance, the API should prevent such misuse.
Thank you Jan for the report. This issue would have started when the local sampling buffer was introduced in bug 1576555, landed in Firefox 71!
Updated•4 years ago
|
Assignee | ||
Comment 5•4 years ago
|
||
ProfileBufferCollector::SamplePositionInBuffer() and BufferRangeStart() need to provide indices in the same main buffer, because they will be used to discard old data (at some previous SamplePositionInBuffer
) once the BufferRangeStart
indicates that it is not referenced by the profiler anymore.
Because the periodic sampler uses a local buffer (to avoid allocations and locks), we need to record the special location from the main profiler buffer in ProfileBufferCollector.
Reporter | ||
Comment 6•4 years ago
|
||
I tested the patch and I can confirm the numbers now match what I'd expect. Also, JIT frames now show up properly when profiling the URL in bug 1684224 for 30-60 seconds, and the data makes sense. Without the patch, the profile is missing frames and time is attributed incorrectly to other frames and execution modes.
Thanks for fixing this so quickly. This should make profiling of JS JIT frames a lot more reliable.
Comment 8•4 years ago
|
||
bugherder |
Comment 9•4 years ago
|
||
The patch landed in nightly and beta is affected.
:gerald, is this bug important enough to require an uplift?
If not please set status_beta
to wontfix
.
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 10•4 years ago
|
||
Comment on attachment 9201961 [details]
Bug 1691504 - ProfileBufferCollector now provides the main profiler buffer's range start - r?canaltinova
Beta/Release Uplift Approval Request
- User impact if declined: Users of the Firefox Profiler would keep getting some incorrect Javascript stack traces in profiles.
- Is this code covered by automated tests?: Unknown
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): - "Is this code covered by automated tests" -> "Unknown" because: "Yes" the code is covered by tests, but "no" its results are not explicitly checked; Tests at least show that the fix doesn't have detectable negative effects (i.e., the fixed code doesn't crash). The fixed results were tested manually.
- This change is only in the profiler, which is only used by few users.
- The change is only one number (from the profiler to the JS engine) that was incorrect, such that JS data was discarded too soon. If still incorrect but the other way, the worst it could do is use more memory while profiling.
- String changes made/needed:
Comment 11•4 years ago
|
||
Comment on attachment 9201961 [details]
Bug 1691504 - ProfileBufferCollector now provides the main profiler buffer's range start - r?canaltinova
We are out of beta and this feels a bit risky just before RC for an S3 impacting very few users, I prefer to let it ride the trains. Thanks.
Updated•4 years ago
|
Description
•