Closed Bug 1925191 Opened 10 months ago Closed 3 months ago

Crash in [@ mozilla::baseprofiler::SpliceableJSONWriter::TimeDoubleMsProperty ]

Categories

(Core :: Gecko Profiler, defect, P2)

Unspecified
Windows
defect

Tracking

()

VERIFIED FIXED
140 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox-esr128 --- wontfix
firefox131 --- wontfix
firefox132 --- wontfix
firefox133 --- wontfix
firefox134 --- wontfix
firefox138 --- wontfix
firefox139 --- wontfix
firefox140 --- fixed

People

(Reporter: mayankleoboy1, Assigned: jlink)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

I got a crash while profiling some code today.

https://crash-stats.mozilla.org/report/index/989683f2-4ce6-430d-b7f2-f6bc30241017#tab-bugzilla

Steps:
Start the profiler
Go to https://photoshop.adobe.com/discover
Let the profiler run for 2-3 seconds
If a popup appears to allow "storage" by the app, deny it.
Continue to profile till something loads.
Stop profiling and capture the profile.

AR (actual result): The Adobe app tab crashes.
ER (expected result): The tab does not crash.

The steps reproduce the crash reliably.

This bug has been marked as a regression. Setting status flag for Nightly to affected.

Setting Bug 1841352 as the regressor, please correct if needed.

Flags: needinfo?(jlink)
Regressed by: 1841352

I have been able to reproduce this crash in the debugger following the above instructions. My call stack is a little bit different (and my code is a bit out of date), but it's a crash and I suspect that it's the same or related. I'll dig in a little bit and also try reverting my change to see if it is related.

Flags: needinfo?(jlink)

Small adjustment to the above comment - what I've got in the debugger does have the same call stack as the crash reported.

Set release status flags based on info from the regressing bug 1841352

I've looked at this for a bit and I think I have a pretty good idea what's going on. Here is a quick explanation of what's happening:

The crash (actually a MOZ_RELEASE_ASSERT) comes from attempting to add a marker to the profile whose start (and end) are more than ~INT64_MAX nanoseconds (roughly 292 years) in the future.

We create that profile marker as a result of a call to performance.measure(`${e}-${t}`, { start: performance.timeOrigin, duration: performance.now() }) (coming from here).

The code that creates the profile marker assumes that the timestamps it receives are relative to performance.timeOrigin and thus adds the navigation start value to them in order to get absolute timestamps. In this particular case, the timestamps are not relative to performance.timeOrigin but rather are already absolute, so adding in the navigation start time pushes them way into the future.
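
As a rough worked example of that double counting (the numbers below are illustrative, not taken from the crash report):

    // Illustrative numbers only, not from the crash report.
    // performance.timeOrigin is an absolute time in milliseconds since the Unix epoch,
    // i.e. on the order of 1.7e12 ms.
    const timeOriginMs = 1729000000000;

    // The marker code interprets this value as an offset from navigation start,
    // so re-adding navigation start places the marker roughly this far in the future:
    const yearsInFuture = timeOriginMs / (1000 * 60 * 60 * 24 * 365);
    console.log(yearsInFuture.toFixed(1)); // ≈ 54.8 years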

The MDN entry for performance.timeOrigin made me initially think that all timestamps are supposed to be relative to timeOrigin, but the entry for DOMHighResTimeStamp says that the timestamps "... can be used to describe a discrete point in time or a time interval (the difference in time between two discrete points in time). The starting time can be either a specific time determined by the script for a site or app, or the time origin."

This makes me think that mapping a user-provided DOMHighResTimeStamp to an absolute point in time is not possible (without knowing how it was calculated) so I'm not sure that we can reliably create a profiler marker from all performance.measure() calls. On the other hand, in the (common?) case where the user does not provide any timestamps we know that they are relative to timeOrigin so we can confidently handle that case correctly.

My thought right now is that the best way to go is to only create profile markers in response to performance.measure() calls in the case where the user doesn't provide any timestamps. That would ensure that we only put correct markers in the profile but would mean that some calls to performance.measure() don't end up in the profile, which could be confusing. Another option would be to continue to treat all timestamps as relative to timeOrigin and just decide not to create markers if they end up too far in the future.

We could also try something hacky where, if the timestamp is less than, say, half of the navigation start time, we treat it as relative to the navigation start time but otherwise we treat it as absolute. That should work here, and probably for any timestamps that are absolute or relative to timeOrigin, but would produce (probably subtly) incorrect markers in the profile for performance.measure() calls that use timestamps that are relative to something else, which would also be confusing.
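
A rough sketch of that heuristic (JavaScript used only for readability; the actual marker code is C++ in Gecko, and navigationStartMs here is a hypothetical input):

    // Hypothetical sketch, not the actual Gecko code.
    // navigationStartMs: absolute navigation start time in milliseconds (assumed given).
    function toAbsoluteMs(userTimestampMs, navigationStartMs) {
      if (userTimestampMs < navigationStartMs / 2) {
        // Small values look like offsets from timeOrigin / navigation start.
        return navigationStartMs + userTimestampMs;
      }
      // Large values look like they are already absolute timestamps.
      return userTimestampMs;
    }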

I'd be interested in hearing any feedback on which option(s) are desirable or to know if I've made a mistake in my interpretations of the specifications.

Oh, also I should add that this crash did, in fact, start happening only after my fix for timestamp comparisons that was cited as the regressor, but the behavior was incorrect before that (it just didn't get caught by anything).

The reason for this is that, prior to my change, timestamp comparisons/subtractions were broken for timestamps that differ by more than ~INT64_MAX ticks. With those fixed, subtracting two timestamps that are far apart correctly results in a TimeDuration that is capped at the maximum, which is then interpreted as "infinite" when converted to a double representation (seconds, milliseconds, etc.), and that fails the comparison in the MOZ_RELEASE_ASSERT.

Flags: needinfo?(canaltinova)

Hey Justin, so sorry for the late response. I finally got around to this.

It looks like this has been fixed on Adobe's side, so it's not reproducible there anymore (their code was incorrect, see my message below for details), but I managed to create a minimal test case:

data:text/html,<html><script>performance.measure('testing', { start: performance.timeOrigin, end: performance.now() });</script></html>

The STR are the same as in the initial message, but use this instead of Adobe's website.

I did some investigation:

I'm confused as to why Adobe decided to use performance.timeOrigin as their start time. Looking at the spec, it doesn't mention anything about the timeOrigin while computing the start time: https://w3c.github.io/user-timing/#dom-performance-measure
In this link, see the list item #3.1:

If startOrMeasureOptions is a PerformanceMeasureOptions object, and if its start member exists, let start time be the value returned by running the convert a mark to a timestamp algorithm passing in startOrMeasureOptions's start.

"convert a mark to a timestamp" algorithm is here: https://w3c.github.io/user-timing/#dfn-convert-a-mark-to-a-timestamp
And it doesn't say anything about the timeOrigin either. #3.1 mentions the negative-value error, but otherwise it just takes the value without doing anything to it (#3.2).

Also looking at the DOMHighResTimeStamp spec, it says:

The `DOMHighResTimeStamp` type is used to store a time value in milliseconds, measured relative from the time origin, global monotonic clock, or a time value that represents a duration between two `DOMHighResTimeStamp`s.

So I'm pretty sure that this code is invalid; they should've passed 0 there instead. That's probably why we can't reproduce this issue anymore: they fixed the bug on their website. BUT it doesn't mean that we should crash the browser when web devs make a mistake :)
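
For reference, a spec-conforming version of what their code seems to have intended would look something like this (name and structure are illustrative only):

    // Measure from the time origin (navigation start) up to "now":
    performance.measure('page-load-so-far', {
      start: 0,                    // offset from performance.timeOrigin
      duration: performance.now(), // elapsed ms since the time origin
    });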

I also tested this issue on macOS and can't reproduce this.

@justin, so my questions are:

  1. Why is this crash only reproducible on Windows and not on macOS or Linux?
  2. Can we do whatever those platforms do on Windows too?

Some answers to your message also:

My thought right now is that the best way to go is to only create profile markers in response to performance.measure() calls in the case where the user doesn't provide any timestamps.

I don't think this is the best solution, as we will lose a lot of markers. I would prefer to solve it another way if we can. Can we ignore the marker if the values are unrealistic? Or do what the other platforms are doing?

Another option would be to continue to treat all timestamps as relative to timeOrigin and just decide not to create markers if they end up too far in the future.

This could be an easier solution which doesn't break most of the markers. I think we can go with this solution if we can't find a better one.

I will capture a pernosco session to see how Linux handles this and why we don't just crash there.

Flags: needinfo?(canaltinova) → needinfo?(jlink)
OS: Unspecified → Windows

Forgot to share the pernosco session, here's the one I captured: https://pernos.co/debug/rsQNWqWuQPyMl5uPZZWMZw/index.html
This is a session where it didn't crash.

This line is the execution of that performance.measure marker in pernosco.

The start value of that marker: 1732796586918495600
INT64_MAX: 9223372036854775807

So it's still correct that 1732796586918495600 < 9223372036854775807. I'm not so sure which values we are getting while crashing on Windows. I should take a look at that next.

As a quick update here, Nazim and I have discussed this offline. Based on that discussion, I think that the approach to take is to presume that all timestamps used in a performance.measure() call are relative to performance.timeOrigin (the default and probably most common behavior). If any timestamp is "too far" in the future, then we will just not create a profiler marker for that performance.measure() call. All other behavior will remain the same.
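
Conceptually the check looks something like this (a hedged sketch in JavaScript for readability; the actual change lives in Gecko's C++ marker code, and the names and threshold below are placeholders):

    // Hypothetical sketch only; names and threshold are placeholders.
    const MAX_FUTURE_SLACK_MS = 24 * 60 * 60 * 1000; // placeholder: one day

    function maybeAddMeasureMarker(startMs, endMs, addMarker) {
      // startMs/endMs are interpreted as offsets from performance.timeOrigin.
      const farFuture = performance.now() + MAX_FUTURE_SLACK_MS;
      if (startMs > farFuture || endMs > farFuture) {
        return; // implausibly far in the future: skip the marker instead of asserting
      }
      addMarker(startMs, endMs);
    }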

Also I should note that the repro case that Nazim cites above uses performance.now() as the end parameter whereas, in the original problem, performance.now() is passed as the duration parameter. In that case, the timestamps, if interpreted as relative to navigation start, are far in the future, but not so far that they trigger the error/crash during serialization to JSON.
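
To make the difference concrete (both calls are illustrative; per the User Timing spec, when duration is given the end time is computed as start + duration):

    // Nazim's repro: start is an absolute epoch-scale value, end is a small offset.
    performance.measure('a', { start: performance.timeOrigin, end: performance.now() });

    // Original Adobe call: end is computed as start + duration, so both the start
    // and the end of the measure are epoch-scale values that land far in the future
    // once the profiler re-adds navigation start.
    performance.measure('b', { start: performance.timeOrigin, duration: performance.now() });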

Flags: needinfo?(jlink)
Assignee: nobody → jlink
Status: NEW → ASSIGNED
Attachment #9445957 - Attachment description: WIP: Bug 1925191: Don't create profiler markers for performance.measure() calls with timestamps that are too far in the → Bug 1925191: Don't create profiler markers for performance.measure() calls with timestamps that are too far in the future r=mstange
Severity: -- → S3
Priority: -- → P2

There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:jlink, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit BugBot documentation.

Flags: needinfo?(mstange.moz)
Flags: needinfo?(jlink)

This is ready to land, as far as I know.

Flags: needinfo?(mstange.moz)

Ah yes, I just wanted to rebase locally to make sure that it would still build and I hadn't done that yet. Let me try that right now and then I can land it.

Flags: needinfo?(jlink)
Pushed by jlink@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/aa0dac3555d5
Moved the adding of profiler markers from performance.measure() calls into its own function r=mstange
https://hg.mozilla.org/integration/autoland/rev/d6c8516f5c40
Don't create profiler markers for performance.measure() calls with timestamps that are too far in the future r=mstange
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 140 Branch

This is fixed for me on the latest nightly. Thanks for the fix!

Status: RESOLVED → VERIFIED