Closed Bug 1925191 Opened 10 months ago Closed 3 months ago

Crash in [@ mozilla::baseprofiler::SpliceableJSONWriter::TimeDoubleMsProperty ]

Categories

(Core :: Gecko Profiler, defect, P2)

Unspecified
Windows
defect

Tracking

()

VERIFIED FIXED
140 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox-esr128 --- wontfix
firefox131 --- wontfix
firefox132 --- wontfix
firefox133 --- wontfix
firefox134 --- wontfix
firefox138 --- wontfix
firefox139 --- wontfix
firefox140 --- fixed

People

(Reporter: mayankleoboy1, Assigned: jlink)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

I got a crash while profiling some code today.

https://crash-stats.mozilla.org/report/index/989683f2-4ce6-430d-b7f2-f6bc30241017#tab-bugzilla

Steps:
Start the profiler
Go to https://photoshop.adobe.com/discover
Let the profiler run for 2-3 seconds
If a popup appears to allow "storage" by the app, deny it.
Continue to profile till something loads.
Stop profiling and capture the profile.

AR (actual result): The Adobe app tab crashes.
ER (expected result): The tab does not crash.

The steps reproduce the crash reliably.

This bug has been marked as a regression. Setting status flag for Nightly to affected.

Setting Bug 1841352 as the regressor, please correct if needed.

Flags: needinfo?(jlink)
Regressed by: 1841352

I have been able to reproduce this crash in the debugger following the above instructions. My call stack is a little bit different (and my code is a bit out of date), but it's a crash and I suspect that it's the same or related. I'll dig in a little bit and also try reverting my change to see if it is related.

Flags: needinfo?(jlink)

Small adjustment to the above comment - what I've got in the debugger does have the same call stack as the crash reported.

Set release status flags based on info from the regressing bug 1841352

I've looked at this for a bit and I think I have a pretty good idea what's going on. Here is a quick explanation of what's happening:

The crash (actually a MOZ_RELEASE_ASSERT) comes from attempting to add a marker to the profile whose start (and end) are more than ~INT64_MAX nanoseconds (roughly 292 years) in the future.

We create that profile marker as a result of a call to performance.measure(`${e}-${t}`, { start: performance.timeOrigin, duration: performance.now() }) (coming from here).

The code that creates the profile marker assumes that the timestamps it receives are relative to performance.timeOrigin and thus adds the navigation start value to them in order to get absolute timestamps. In this particular case, the timestamps are not relative to performance.timeOrigin but rather are already absolute, so adding in the navigation start time pushes them way into the future.
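
As a rough worked example of that double counting (the numbers below are illustrative, not taken from the crash report):

    // Illustrative numbers only, not from the crash report.
    // performance.timeOrigin is an absolute time in milliseconds since the Unix epoch,
    // i.e. on the order of 1.7e12 ms.
    const timeOriginMs = 1729000000000;

    // The marker code interprets this value as an offset from navigation start,
    // so re-adding navigation start places the marker roughly this far in the future:
    const yearsInFuture = timeOriginMs / (1000 * 60 * 60 * 24 * 365);
    console.log(yearsInFuture.toFixed(1)); // ≈ 54.8 years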

The MDN entry for performance.timeOrigin made me initially think that all timestamps are supposed to be relative to timeOrigin, but the entry for DOMHighResTimeStamp says that the timestamps "... can be used to describe a discrete point in time or a time interval (the difference in time between two discrete points in time). The starting time can be either a specific time determined by the script for a site or app, or the time origin."

This makes me think that mapping a user-provided DOMHighResTimeStamp to an absolute point in time is not possible (without knowing how it was calculated) so I'm not sure that we can reliably create a profiler marker from all performance.measure() calls. On the other hand, in the (common?) case where the user does not provide any timestamps we know that they are relative to timeOrigin so we can confidently handle that case correctly.

My thought right now is that the best way to go is to only create profile markers in response to performance.measure() calls in the case where the user doesn't provide any timestamps. That would ensure that we only put correct markers in the profile but would mean that some calls to performance.measure() don't end up in the profile, which could be confusing. Another option would be to continue to treat all timestamps as relative to timeOrigin and just decide not to create markers if they end up too far in the future.

We could also try something hacky where, if the timestamp is less than, say, half of the navigation start time, we treat it as relative to the navigation start time but otherwise we treat it as absolute. That should work here, and probably for any timestamps that are absolute or relative to timeOrigin, but would produce (probably subtly) incorrect markers in the profile for performance.measure() calls that use timestamps that are relative to something else, which would also be confusing.
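
A rough sketch of that heuristic (JavaScript used only for readability; the actual marker code is C++ in Gecko, and navigationStartMs here is a hypothetical input):

    // Hypothetical sketch, not the actual Gecko code.
    // navigationStartMs: absolute navigation start time in milliseconds (assumed given).
    function toAbsoluteMs(userTimestampMs, navigationStartMs) {
      if (userTimestampMs < navigationStartMs / 2) {
        // Small values look like offsets from timeOrigin / navigation start.
        return navigationStartMs + userTimestampMs;
      }
      // Large values look like they are already absolute timestamps.
      return userTimestampMs;
    }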

I'd be interested in hearing any feedback on which option(s) are desirable or to know if I've made a mistake in my interpretations of the specifications.

Oh, also I should add that this crash did, in fact, start happening only after my fix for timestamp comparisons that was cited as the regressor, but the behavior was incorrect before that (it just didn't get caught by anything).

The reason for this is that, prior to my change, timestamp comparisons/subtractions were broken for timestamps that differ by more than ~INT64_MAX ticks. With those fixed, subtracting two timestamps that are far apart correctly results in a TimeDuration that is capped at the maximum, which is then interpreted as "infinite" when converted to a double representation (seconds, milliseconds, etc.), and that fails the comparison in the MOZ_RELEASE_ASSERT.

Flags: needinfo?(canaltinova)

Hey Justin, so sorry for the late response. I finally got around to this.

It looks like this has been fixed on Adobe's side, so it's not reproducible there anymore (their code was incorrect, see my message below for details), but I managed to create a minimal test case:

data:text/html,<html><script>performance.measure('testing', { start: performance.timeOrigin, end: performance.now() });</script></html>

The STR are the same as in the initial message, but use this instead of Adobe's website.

I did some investigation:

I'm confused as to why Adobe decided to use performance.timeOrigin as their start time. Looking at the spec, it doesn't mention anything about the timeOrigin while computing the start time: https://w3c.github.io/user-timing/#dom-performance-measure
In this link, see the list item #3.1:

If startOrMeasureOptions is a PerformanceMeasureOptions object, and if its start member exists, let start time be the value returned by running the convert a mark to a timestamp algorithm passing in startOrMeasureOptions's start.

"convert a mark to a timestamp" algorithm is here: https://w3c.github.io/user-timing/#dfn-convert-a-mark-to-a-timestamp
And it doesn't say anything about the timeOrigin either. #3.1 mentions the negative-value error, but otherwise it just takes the value without doing anything to it (#3.2).

Also looking at the DOMHighResTimeStamp spec, it says:

The `DOMHighResTimeStamp` type is used to store a time value in milliseconds, measured relative from the time origin, global monotonic clock, or a time value that represents a duration between two `DOMHighResTimeStamp`s.

So I'm pretty sure that this code is invalid; they should've passed 0 there instead. That's probably why we can't reproduce this issue anymore: they fixed the bug on their website. BUT it doesn't mean that we should crash the browser when web devs make a mistake :)
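
For reference, a spec-conforming version of what their code seems to have intended would look something like this (name and structure are illustrative only):

    // Measure from the time origin (navigation start) up to "now":
    performance.measure('page-load-so-far', {
      start: 0,                    // offset from performance.timeOrigin
      duration: performance.now(), // elapsed ms since the time origin
    });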

I also tested this issue on macOS and can't reproduce this.

@justin, so my questions are:

  1. Why is this crash only reproducible on Windows and not on macOS or Linux?
  2. Can we do whatever those platforms do on Windows too?

Some answers to your message also:

My thought right now is that the best way to go is to only create profile markers in response to performance.measure() calls in the case where the user doesn't provide any timestamps.

I don't think this is the best solution, as we will lose a lot of markers. I would prefer to solve it another way if we can. Can we ignore the marker if the values are unrealistic? Or do what the other platforms are doing?

Another option would be to continue to treat all timestamps as relative to timeOrigin and just decide not to create markers if they end up too far in the future.

This could be an easier solution which doesn't break most of the markers. I think we can go with this solution if we can't find a better one.

I will capture a pernosco session to see how Linux handles this and why we don't just crash there.

Flags: needinfo?(canaltinova) → needinfo?(jlink)
OS: Unspecified → Windows

Forgot to share the pernosco session, here's the one I captured: https://pernos.co/debug/rsQNWqWuQPyMl5uPZZWMZw/index.html
This is a session where it didn't crash.

This line is the execution of that performance.measure marker in pernosco.

The start value of that marker: 1732796586918495600
INT64_MAX: 9223372036854775807

So it's still correct that 1732796586918495600 < 9223372036854775807. I'm not so sure which values we are getting while crashing on Windows. I should take a look at that next.

As a quick update here, Nazim and I have discussed this offline. Based on that discussion, I think that the approach to take is to presume that all timestamps used in a performance.measure() call are relative to performance.timeOrigin (the default and probably most common behavior). If any timestamp is "too far" in the future, then we will just not create a profiler marker for that performance.measure() call. All other behavior will remain the same.
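
Conceptually the check looks something like this (a hedged sketch in JavaScript for readability; the actual change lives in Gecko's C++ marker code, and the names and threshold below are placeholders):

    // Hypothetical sketch only; names and threshold are placeholders.
    const MAX_FUTURE_SLACK_MS = 24 * 60 * 60 * 1000; // placeholder: one day

    function maybeAddMeasureMarker(startMs, endMs, addMarker) {
      // startMs/endMs are interpreted as offsets from performance.timeOrigin.
      const farFuture = performance.now() + MAX_FUTURE_SLACK_MS;
      if (startMs > farFuture || endMs > farFuture) {
        return; // implausibly far in the future: skip the marker instead of asserting
      }
      addMarker(startMs, endMs);
    }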

Also I should note that the repro case that Nazim cites above uses performance.now() as the end parameter whereas, in the original problem, performance.now() is passed as the duration parameter. In that case, the timestamps, if interpreted as relative to navigation start, are far in the future, but not so far that they trigger the error/crash during serialization to JSON.
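
To make the difference concrete (both calls are illustrative; per the User Timing spec, when duration is given the end time is computed as start + duration):

    // Nazim's repro: start is an absolute epoch-scale value, end is a small offset.
    performance.measure('a', { start: performance.timeOrigin, end: performance.now() });

    // Original Adobe call: end is computed as start + duration, so both the start
    // and the end of the measure are epoch-scale values that land far in the future
    // once the profiler re-adds navigation start.
    performance.measure('b', { start: performance.timeOrigin, duration: performance.now() });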

Flags: needinfo?(jlink)
Assignee: nobody → jlink
Status: NEW → ASSIGNED
Attachment #9445957 - Attachment description: WIP: Bug 1925191: Don't create profiler markers for performance.measure() calls with timestamps that are too far in the → Bug 1925191: Don't create profiler markers for performance.measure() calls with timestamps that are too far in the future r=mstange
Severity: -- → S3
Priority: -- → P2

There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:jlink, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit BugBot documentation.

Flags: needinfo?(mstange.moz)
Flags: needinfo?(jlink)

This is ready to land, as far as I know.

Flags: needinfo?(mstange.moz)

Ah yes, I just wanted to rebase locally to make sure that it would still build and I hadn't done that yet. Let me try that right now and then I can land it.

Flags: needinfo?(jlink)
Pushed by jlink@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/aa0dac3555d5
Moved the adding of profiler markers from performance.measure() calls into its own function r=mstange
https://hg.mozilla.org/integration/autoland/rev/d6c8516f5c40
Don't create profiler markers for performance.measure() calls with timestamps that are too far in the future r=mstange
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 140 Branch

This is fixed for me on the latest nightly. Thanks for the fix!

Status: RESOLVED → VERIFIED