If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Crash in LongestPhaseSelfTime

RESOLVED FIXED in Firefox 56

Status

()

Core
JavaScript: GC
--
critical
RESOLVED FIXED
4 months ago
5 days ago

People

(Reporter: calixte, Assigned: jonco)

Tracking

(Blocks: 1 bug, {crash, regression})

55 Branch
mozilla57
Unspecified
Windows 10
crash, regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox53 unaffected, firefox54 unaffected, firefox55 wontfix, firefox56 fixed, firefox57 fixed)

Details

(Whiteboard: [clouseau][tbird crash], crash signature)

Attachments

(4 attachments)

(Reporter)

Description

4 months ago
This bug was filed from the Socorro interface and is 
report bp-6ff83aa6-587f-4952-81c6-a67f80170519.
=============================================================

There is 1 crash in nightly 55 with buildid 20170518030213. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1361458.

[1] https://hg.mozilla.org/mozilla-central/rev?node=b2eb05d5fad2fea928f47d3e6a329024c0aaf70e
Flags: needinfo?(jcoppeard)
(Assignee)

Updated

4 months ago
See Also: → bug 1367055
(Assignee)

Comment 1

4 months ago
This shouldn't be possible.

I'm wondering if it's worth changing the way we store the times for each phase to only store self-time and not the sum of time under this phase.  That would make this go away although it wouldn't fix whatever is causing this.
49 crashes across 53 installations in the last 7 days, on 55.
56 is affected as well, updating the status flag.
status-firefox56: --- → affected
Let's check crashstats next week, once it has some more exposure on Beta.
Flags: needinfo?(miket)
(Assignee)

Comment 5

3 months ago
Created attachment 8879090 [details] [diff] [review]
bug1366217-add-stats-assertions

Not a fix, but adds some more assertions that:

1) Time increases monotonically
2) The current phase is only changed from the main thread

I made the former a release assert since the failure here is a release check with infrequent failures.  The latter will fail in testing if this is in fact a problem.
Assignee: nobody → jcoppeard
Flags: needinfo?(jcoppeard)
Attachment #8879090 - Flags: review?(sphink)
(Assignee)

Updated

3 months ago
Keywords: leave-open
Attachment #8879090 - Flags: review?(sphink) → review+

Comment 6

3 months ago
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/123b7b9082b2
Add assertions to GC stats code that time increases monotonically r=sfink
Flags: needinfo?(miket)

Comment 7

3 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/123b7b9082b2
(Reporter)

Comment 8

3 months ago
There is 1 crash ([1]) in nightly 56 with signature 'js::gcstats::Statistics::recordPhaseBegin'. The backtrace shows that the crash is due to a just added assertion.


[1] https://crash-stats.mozilla.com/report/index/df348caa-2951-4c18-8e12-599f80170622
Crash Signature: [@ LongestPhaseSelfTime] → [@ LongestPhaseSelfTime] [@ js::gcstats::Statistics::recordPhaseBegin ]
[@ js::gcstats::Statistics::endPhase] shows up in crash-stats as another crash after the assertion was added.
Crash Signature: [@ LongestPhaseSelfTime] [@ js::gcstats::Statistics::recordPhaseBegin ] → [@ LongestPhaseSelfTime] [@ js::gcstats::Statistics::recordPhaseBegin ] [@ js::gcstats::Statistics::endPhase]

Updated

3 months ago
Crash Signature: [@ LongestPhaseSelfTime] [@ js::gcstats::Statistics::recordPhaseBegin ] [@ js::gcstats::Statistics::endPhase] → [@ LongestPhaseSelfTime] [@ js::gcstats::Statistics::recordPhaseBegin ] [@ js::gcstats::Statistics::endPhase] [@ js::gcstats::Statistics::recordPhaseEnd]
(Assignee)

Comment 10

3 months ago
Created attachment 8884006 [details] [diff] [review]
bug1366217-stats-time

These asserts are failing rarely on Android.

The patch bodges the now time on Android if it is before the previous time and sets the aborted flag. 

I also did some cleanup around the aborted flag, making it so we don't report any telemetry if this flag is set (rather than only some telemetry).  We also checked this flag in endSlice() after it could have been cleared by endGC() so I moved the clearing to the end.
Attachment #8884006 - Flags: review?(sphink)
Comment on attachment 8884006 [details] [diff] [review]
bug1366217-stats-time

Review of attachment 8884006 [details] [diff] [review]:
-----------------------------------------------------------------

I really ought to figure out a good way to test that aborted flag. Thanks.
Attachment #8884006 - Flags: review?(sphink) → review+

Comment 12

3 months ago
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b07957a01b98
Disregard GC stats if we get an inconsistent timestamp on Android r=sfink

Comment 13

2 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/b07957a01b98
Jon, is there anything useful to glean from crashstats in the past week?

The only crash with a comment says "Opening Google Docs": https://crash-stats.mozilla.com/report/index/34f7db77-2960-41ab-8105-885ae0170709
Flags: needinfo?(jcoppeard)
(Assignee)

Comment 15

2 months ago
Created attachment 8887418 [details] [diff] [review]
bug1366217-add-more-assertions

I'm not hopeful, but I added a few more assertions to the stats code in the hope of tracking this down, removed some dead code and applied the Android time leniency change to recording parallal tasks.
Attachment #8887418 - Flags: review?(sphink)
Comment on attachment 8887418 [details] [diff] [review]
bug1366217-add-more-assertions

Review of attachment 8887418 [details] [diff] [review]:
-----------------------------------------------------------------

Man, this thing is infuriating. And I thought one of the main points of TimeStamp was to have monotonic time. It's tempting to inject consistency checks into eg TimeStamp::Now() or something. (But it'd require a TLS read/write, so maybe not doable in a place where it would get wide enough testing.)
Attachment #8887418 - Flags: review?(sphink) → review+

Comment 17

2 months ago
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9b93d7a06157
Add more assertions to help track down LongestPhaseSelfTime crashes r=sfink

Comment 18

2 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/9b93d7a06157
There's been one new hit in bug 1367055 since the latest diagnostic patches landed. Not sure if it'll help, though, since it was on an opt run.
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=117054484
(Assignee)

Updated

2 months ago
Crash Signature: [@ LongestPhaseSelfTime] [@ js::gcstats::Statistics::recordPhaseBegin ] [@ js::gcstats::Statistics::endPhase] [@ js::gcstats::Statistics::recordPhaseEnd] → [@ LongestPhaseSelfTime] [@ LongestPhaseSelfTimeInMajorGC ] [@ js::gcstats::Statistics::recordPhaseBegin ] [@ js::gcstats::Statistics::endPhase] [@ js::gcstats::Statistics::recordPhaseEnd]
(Assignee)

Updated

2 months ago
Duplicate of this bug: 1385130
(Assignee)

Comment 21

2 months ago
Created attachment 8893302 [details] [diff] [review]
bug1366217-stats-assert

Crash stats shows that we're getting inconsistent timestamps on Windows too and that the checks in LongestPhaseSelfTime are still failing.

Here's a patch to make the checks assert in debug mode only and to ignore the timestamps in release mode.  It's better that we don't crash and just skip collecting some telemetry data.
Flags: needinfo?(jcoppeard)
Attachment #8893302 - Flags: review?(sphink)
Comment on attachment 8893302 [details] [diff] [review]
bug1366217-stats-assert

Review of attachment 8893302 [details] [diff] [review]:
-----------------------------------------------------------------

This is definitely good. I suppose to make it complete, we ought to have telemetry for how often this happens. I don't think it's anywhere near the rate at which it could explain our 1.4% massively long GC_MAX_PAUSE_MS, but if we had the measurement we could rule it out. And it would be nice to see the OS breakdown. That could be another bug.
Attachment #8893302 - Flags: review?(sphink) → review+
(Assignee)

Updated

2 months ago
See Also: → bug 1387368

Comment 23

2 months ago
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/95846f969c25
Discard GC stats if timing inconsistencies are found r=sfink
https://hg.mozilla.org/mozilla-central/rev/95846f969c25
status-firefox55: affected → wontfix
status-firefox57: --- → affected
#6 crash for Firefox 56.0b1. Needs uplift to beta
https://crash-stats.mozilla.com/topcrashers/?product=Firefox&version=56.0b1
Whiteboard: [clouseau] → [clouseau][tbird crash]

Comment 26

a month ago
now in 56.0b the new asserts are getting hit a couple of times more often:
https://crash-stats.mozilla.com/signature/?signature=TimeSince&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=moz_crash_reason&_sort=-date&page=1#reports
Crash Signature: [@ LongestPhaseSelfTime] [@ LongestPhaseSelfTimeInMajorGC ] [@ js::gcstats::Statistics::recordPhaseBegin ] [@ js::gcstats::Statistics::endPhase] [@ js::gcstats::Statistics::recordPhaseEnd] → [@ LongestPhaseSelfTime] [@ LongestPhaseSelfTimeInMajorGC ] [@ js::gcstats::Statistics::recordPhaseBegin ] [@ js::gcstats::Statistics::endPhase] [@ js::gcstats::Statistics::recordPhaseEnd] [@ TimeSince ]
(Assignee)

Comment 27

20 days ago
Comment on attachment 8893302 [details] [diff] [review]
bug1366217-stats-assert

Approval Request Comment
[Feature/Bug causing the regression]: Previous release assertions added by the first patch in this bug are failing.
[User impact if declined]: Possible crashes.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: This patch relaxes release mode assertions into debug mode ones and has been on m-c for 27 days.
[String changes made/needed]: None.
Attachment #8893302 - Flags: approval-mozilla-beta?
Blocks: 1396527
Comment on attachment 8893302 [details] [diff] [review]
bug1366217-stats-assert

This should help with a top crash, and it's been on Nightly for a while. Let's try uplift for beta 10.
Attachment #8893302 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 29

14 days ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/83a250a0a345
status-firefox56: affected → fixed
(Assignee)

Comment 30

5 days ago
This should be fixed since the final patch landed.
Status: NEW → RESOLVED
Last Resolved: 5 days ago
Resolution: --- → FIXED
status-firefox57: affected → fixed
status-firefox-esr52: --- → unaffected
Keywords: leave-open
Target Milestone: --- → mozilla57
You need to log in before you can comment on or make changes to this bug.