Closed Bug 1820813 Opened 1 year ago Closed 1 year ago

Markers for UserTiming `measure` seems to be mispositioned

Categories

(Core :: Gecko Profiler, defect, P1)

defect

Tracking

()

RESOLVED FIXED
113 Branch
Tracking Status
firefox113 --- fixed

People

(Reporter: julienw, Assigned: canova)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sp3])

Attachments

(3 files)

Attached file performancetiming.html

STR:

  1. Open the attachment.
  2. Start the profiler.
  3. Click the button.
  4. Capture the profiler
  5. Zoom in near the start or the end of the measure marker.

=> notice that the measure marker edges are not aligned with the mark markers.

Here is an example of such a profile: https://share.firefox.dev/3YmTFQx
Same profile zoomed in towards the start: https://share.firefox.dev/3F36mJb
There's a 727µs difference in this case so I don't think this is just some rounding errors.

I believe we need to fix this for the speedometer 3 work.

Whiteboard: [sp3]

We compute the timestamps used in the marker in https://searchfox.org/mozilla-central/rev/3002762e41363de8ee9ca80196d55e79651bcb6b/dom/performance/Performance.cpp#630-633

In the testcase, I tried 2 ways of setting the marker: once with previous marks, once with actual timestamps. It looks like they both agree, but they're both misaligned with the performance mark markers.

See Also: → 1820826

With this example, I have 2 marks but the resulting measure is 0 duration.

https://share.firefox.dev/3ypeMqN

Padenot helped me investigate.

Here is how the performance API gets the timestamp in performance.mark:
https://searchfox.org/mozilla-central/rev/416f31a51174620f04fc994d248b664b54517699/dom/performance/PerformanceMark.cpp#53-55

(Similarly in performance.measure when we don't specify the end: https://searchfox.org/mozilla-central/rev/416f31a51174620f04fc994d248b664b54517699/dom/performance/Performance.cpp#514)

We use the same code as for performance.now():
https://searchfox.org/mozilla-central/rev/416f31a51174620f04fc994d248b664b54517699/dom/performance/Performance.cpp#118-131

Especially we run this code:

  return nsRFPService::ReduceTimePrecisionAsMSecs(
      rawTime, GetRandomTimelineSeed(), mRTPCallerType);

As the name implies, these timestamps are clamped to a millisecond precision.

This is also used for normal timestamps retrieved uisng performance.now directly, which explains why we get the same behavior when passing timestamps directly in performance.measure.

Then in performance.measure, we use the timestamps from the marks (if marks are passed) or timestamps (if timestamps are passed). They've already reduced precisions!

Now why don't the mark markers have the same issue? Because for these markers we don't use the clamped timestamp and instead retrieve our own (see bug 1820826). That's why the mark markers and the measure markers are not aligned.

How to fix?

I can see 2 different solutions.

Solution 1, easiest:
In PerformanceMark we keep both unclamped and clamped values, so that we can use one or another depending on the context. The unclamped value wouldn't be exposed to JavaScript ever.

Solution 2, more difficult because it involves more changes:
PerformanceMark keeps only the unclamped value, and clamps it whenever we use it from JavaScript. Probably the same would happen for PerformanceMeasure. We'll need to make sure that it's not possible to find the unclamped values from JavaScript.

Hey Olli, would you have a preference between these 2 solutions? I think the solution 1 is easiest and safest in this case. What do you think?

Flags: needinfo?(smaug)

Hey Tom, I see that performance.now() has different results on Chrome than on Firefox: the precision is 10 times better on Chrome. Is that what you'd expect? Do you want me to file a separate bug?

Flags: needinfo?(tom)

Quick note:

return nsRFPService::ReduceTimePrecisionAsMSecs(
rawTime, GetRandomTimelineSeed(), mRTPCallerType);

As the name implies, these timestamps are clamped to a millisecond precision.

The name is actually referring to the input value. In this case it's saying that the input rawTime is in milliseconds. If it's Seconds or Microseconds there's separate function calls for those. The time reduction can be to one of several different precisions (one of which is a millisecond) depending on page circumstances and prefs.


I'm a little unclear about what the underlying issue is. Is the problem that we are reducing a timestamp that has already had its precision reduced? Or is it that we are performing mathematical operations (e.g. subtraction or addition) upon timestamps whose values have been reduced (or maybe only one whose value has been reduced, and one that hasn't?)

Both of these situations can result in bugs, but double-reduction bugs generally occur very infrequently because it happens with values on the boundaries of floating point precision. So I'm inclined to think it's the operation one.

Precision Reduction should not be thought of as clamping. It's like that, but it can also round unexpected values in unexpected directions. For example, under the right circumstances, a timestamp of 14.0002 ms can be transformed to 15 ms. And 14.9998 ms could be rounded to 14 ms. This is due to jitter, which is designed to make timing attacks more difficult to perform. For this reason, it's generally the best idea to retain unreduced timestamps and reduce them immediately before exposure to web JS. The NowUnclamped() function exists for this purpose.


Our timestamp precision varies under the circumstances. If you have COOP+COEP enabled, then you will get timestamps with a precision of 20 microseconds. If you don't, you get 1 ms. If you have RFP enabled you get 16.6667 ms. At this time there hasn't been a reason to increase the precision of non-isolated websites from 1ms. Doing so would decrease the difficulty of performing timing attacks that can reveal cross-origin resources.

Flags: needinfo?(tom)

Thanks for the thoughtful comment Tom, this is very useful! I was pinging you just to know whether the reduction was like you expected, especially that Chrome is doing something different and I was wondering if we could have webcompat issues because of that. Maybe performance experts use performance.now() in their tools and get less precision in Firefox, but that may also be good enough?

The underlying issue itself is that we're using these reduced precision values to position a profiler marker. The fix is that we should use the unreduced timestamps instead. My question is: how do we do that? But that's more a question to :smaug :-)

(I think tjr answered here already)

Flags: needinfo?(smaug)

(In reply to Julien Wajsberg [:julienw] from comment #6)

Thanks for the thoughtful comment Tom, this is very useful! I was pinging you just to know whether the reduction was like you expected, especially that Chrome is doing something different and I was wondering if we could have webcompat issues because of that. Maybe performance experts use performance.now() in their tools and get less precision in Firefox, but that may also be good enough?

The complaints we've gotten in the past have been from people doing very timing-sensitive work, mostly "I'm running DOOM in the browser" type stuff. For these applications, the fact that we have an easy answer (now, at the time we didn't) to enable COOP/COEP and get the increased precision is good. (We also haven't gotten complaints to that effect in a bit, that I've seen anyway.)

The underlying issue itself is that we're using these reduced precision values to position a profiler marker. The fix is that we should use the unreduced timestamps instead. My question is: how do we do that? But that's more a question to :smaug :-)

I'm not sure what 'position a profile marker' means. Do you mean the Firefox profiler is reading PerformanceMark entries? The Firefox Profiler is privileged code and shouldn't use reduced timestamps for anything. I'm not sure at this moment how to make that happen, since I think the Profiler is implemented as a WebExtension...? But it's got to be privileged in some way, right? Maybe you can point me at where it's getting its timestamps?

I think I described everything in comment 1 and comment 3, please tell me if anything is missing and I'd be happy to give some more precisions!

Okay, I think I understand it better. I think the fix for this is a bit more involved. I think that, the Firefox Profiler shouldn't use clamped values ever, anywhere. This would mean that Profiler performance markers wouldn't line up with (Web) performance API markers, but I think that's okay (?) because the Profiler wouldn't be showing or comparing the vales from the Web API, it will only be showing markers added via profiler_add_marker. That call happens in the same control flow as adding Web API performance markers, so they look like they're the same, but conceptually we only show markers added via profiler_add_marker and those Markers are separate from the markers added in the peformance API.

Here's a place we're using a clamped timestamp for a profiler marker that's independent from the PerformanceMark/PerformanceMeasure objects. Here's a place we're using clamped timestamps and I don't think this goes into the profiler, but it might be relying on precision/behavior that we don't have... Fortunately I think that example I gave is the only other example of profiler_add_marker relying on a clamped timestamp - every other one I spot-checked because I thought it might was either using Timestamp::Now() (which is not fiddled) or getting the timestamp internally.

That's not correct, I had gotten confused thinking that CreationTimestamp was clamped, but it is not.

I think Bug 1820826 arises in more cases than described. profiler_add_marker is using an internal (accurate) timestamp but no matter what is passed to the PerformanceMarker ctor, the Marker's startTime will be clamped. It will either come from Performance::Now() (clamped) or it will come from another Mark's startTime (which eventually will have to be Performance::Now()).

I think both of these approach is going to be complicated, but that Option 2 is going to be a better approach. Depending on the urgency of this need, I may be able to help prototype it.

This would mean that Profiler performance markers wouldn't line up with (Web) performance API markers, but I think that's okay (?) because the Profiler wouldn't be showing or comparing the vales from the Web API

I definitely use web performance markers for timing against other things, and being incorrect about them would be bad IMO. I would go as far as saying we should remove them if they are lying, as perf metrics shouldn't provide misleading information.

(In reply to Greg Tatum [:gregtatum] from comment #11)

This would mean that Profiler performance markers wouldn't line up with (Web) performance API markers, but I think that's okay (?) because the Profiler wouldn't be showing or comparing the vales from the Web API

I definitely use web performance markers for timing against other things, and being incorrect about them would be bad IMO. I would go as far as saying we should remove them if they are lying, as perf metrics shouldn't provide misleading information.

Can you explain more about how you compare things? Are you comparing the values obtained from the performance API in the devtools console vs these profile markers?

(sorry for the brevity, I'm on my phone)
I think option 1 is just about adding another variable to the object that wouldn't be clamped, of course not exposing it to JS. Would that be difficult?
If option 2 is more desirable though, I think this involves more changes. Especially making sure we're not exposing unclamped values by mistake.

@greg this bug is about making the profiler markers more precise. What Tom says is that the profiler markers' timestamps wouldn't be equal to the performance timing entries, because profiler markers would be precise while performance timing entries would have reduced precision. I think this is what we want.

Nazim, do you think you would be able to look at this soon? This is important to be able to focus precisely on the work happening in specific tests.

Flags: needinfo?(canaltinova)

What Tom says is that the profiler markers' timestamps wouldn't be equal to the performance timing entries, because profiler markers would be precise while performance timing entries would have reduced precision. I think this is what we want.

Oooo... Thanks for the clarification. Yes I agree with this then. The profiler should definitely be the most precise measurements possible. Sorry for adding to the noise of this conversation.

I will try to take a look at it this week after my assembly view related task. Keeping needinfo for the reminder.

Severity: -- → S3
Flags: needinfo?(canaltinova)
Priority: -- → P1
Assignee: nobody → canaltinova
Status: NEW → ASSIGNED
Pushed by canaltinova@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/d5b12497c86b
Keep an unclamped startTime inside PerformanceMark and use it for profiler markers r=sefeng
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: