Closed Bug 1691504 Opened 4 years ago Closed 4 years ago

Inconsistent values for BufferRangeStart and SamplePositionInBuffer in MergeStacks

Categories

(Core :: Gecko Profiler, defect, P1)

defect

Tracking

()

RESOLVED FIXED
87 Branch
Tracking Status
firefox-esr78 --- wontfix
firefox85 --- wontfix
firefox86 --- wontfix
firefox87 --- fixed

People

(Reporter: jandem, Assigned: mozbugz)

References

(Blocks 1 open bug, Regressed 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(1 file)

In MergeStacks, if I add the following line right before the SetJSContextProfilerSampleBufferRangeStart call near the end:

    fprintf(stderr, "%lu, %lu\n", samplePosInBuffer ? *samplePosInBuffer : 0, bufferRangeStart);

I get very inconsistent results: the values should be somewhat similar as that's what the JS engine depends on, but in practice the first one (*samplePosInBuffer) is much smaller (factor 1000 or so) than bufferRangeStart. Both values keep increasing.

Noticed this while tracking down confusing JS profiler results in bug 1684224.

Flags: needinfo?(gsquelart)

This is on Linux x64, recent central tip.

Summary: Inconsistent values for BufferRangeStart and SamePositionInBuffer in MergeStacks → Inconsistent values for BufferRangeStart and SamplePositionInBuffer in MergeStacks

Thanks for looking into this. If you need fresh profiles from a try build or nightly, feel free to ping me :)

Context: The samplePosInBuffer value is stored on the JitcodeGlobalEntry so that the JS engine can keep the entry alive as long as any sample that points to that entry is still in the profiler buffer. Once all samples that use an entry are evicted from the buffer, i.e. once bufferRangeStart moves past the last sample that refers to a given JitcodeGlobalEntry, the JS engine can free that entry.
In the past, positions in buffer space were measured in "buffer entries". I'm not sure what unit it's measured in today.

I think I know what's happening:
In the SamplerThread loop, for each thread:

The two buffers have their own incrementing indices, that's why they don't seem related at all.

So I think what we need to do, is to report the range start of the global buffer from ProfileBufferCollector. The entry is in bytes; I don't think the unit really matters, as long as both values are in the same unit, and in same "arena", which is the problem here.
This fix seems easy enough, but I'll need to make sure that no-one is using ProfileBufferCollector::BufferRangeStart() to try and read the local buffer; at first glance, the API should prevent such misuse.

Thank you Jan for the report. This issue would have started when the local sampling buffer was introduced in bug 1576555, landed in Firefox 71!

Assignee: nobody → gsquelart
Severity: normal → S3
Type: task → defect
Flags: needinfo?(gsquelart)
Keywords: regression
Priority: -- → P1
Regressed by: 1576555
Has Regression Range: --- → yes

ProfileBufferCollector::SamplePositionInBuffer() and BufferRangeStart() need to provide indices in the same main buffer, because they will be used to discard old data (at some previous SamplePositionInBuffer) once the BufferRangeStart indicates that it is not referenced by the profiler anymore.

Because the periodic sampler uses a local buffer (to avoid allocations and locks), we need to record the special location from the main profiler buffer in ProfileBufferCollector.

I tested the patch and I can confirm the numbers now match what I'd expect. Also, JIT frames now show up properly when profiling the URL in bug 1684224 for 30-60 seconds, and the data makes sense. Without the patch, the profile is missing frames and time is attributed incorrectly to other frames and execution modes.

Thanks for fixing this so quickly. This should make profiling of JS JIT frames a lot more reliable.

Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4471d0996a4d ProfileBufferCollector now provides the main profiler buffer's range start - r=canaltinova
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 87 Branch

The patch landed in nightly and beta is affected.
:gerald, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(gsquelart)

Comment on attachment 9201961 [details]
Bug 1691504 - ProfileBufferCollector now provides the main profiler buffer's range start - r?canaltinova

Beta/Release Uplift Approval Request

  • User impact if declined: Users of the Firefox Profiler would keep getting some incorrect Javascript stack traces in profiles.
  • Is this code covered by automated tests?: Unknown
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): - "Is this code covered by automated tests" -> "Unknown" because: "Yes" the code is covered by tests, but "no" its results are not explicitly checked; Tests at least show that the fix doesn't have detectable negative effects (i.e., the fixed code doesn't crash). The fixed results were tested manually.
  • This change is only in the profiler, which is only used by few users.
  • The change is only one number (from the profiler to the JS engine) that was incorrect, such that JS data was discarded too soon. If still incorrect but the other way, the worst it could do is use more memory while profiling.
  • String changes made/needed:
Flags: needinfo?(gsquelart)
Attachment #9201961 - Flags: approval-mozilla-beta?

Comment on attachment 9201961 [details]
Bug 1691504 - ProfileBufferCollector now provides the main profiler buffer's range start - r?canaltinova

We are out of beta and this feels a bit risky just before RC for an S3 impacting very few users, I prefer to let it ride the trains. Thanks.

Attachment #9201961 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
Regressions: 1699681
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: