Closed Bug 1523956 Opened 6 years ago Closed 6 years ago

Crash in RtlAcquireSRWLockExclusive | mozilla::detail::MutexImpl::lock | mozilla::gfx::VsyncSource::Display::NotifyVsync

Categories

(Core :: Graphics, defect, P2)

66 Branch
All
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla67
Tracking Status
firefox-esr60 --- unaffected
firefox65 --- unaffected
firefox66 + fixed
firefox67 --- fixed

People

(Reporter: philipp, Assigned: kats)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(1 file)

This bug is for crash report bp-a1b74f48-d272-42a8-b228-991990190130.

Top 10 frames of crashing thread:

0 ntdll.dll RtlAcquireSRWLockExclusive 
1 mozglue.dll mozilla::detail::MutexImpl::lock mozglue/misc/Mutex_windows.cpp:22
2 xul.dll mozilla::gfx::VsyncSource::Display::NotifyVsync gfx/thebes/VsyncSource.cpp:69
3 xul.dll void D3DVsyncSource::D3DVsyncDisplay::VBlankLoop gfx/thebes/gfxWindowsPlatform.cpp:1880
4 xul.dll nsresult mozilla::detail::RunnableMethodImpl< xpcom/threads/nsThreadUtils.h:1171
5 xul.dll bool MessageLoop::DeferOrRunPendingTask ipc/chromium/src/base/message_loop.cc:450
6 xul.dll MessageLoop::DoWork ipc/chromium/src/base/message_loop.cc:523
7 xul.dll base::MessagePumpDefault::Run ipc/chromium/src/base/message_pump_default.cc:35
8 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:308
9 xul.dll base::Thread::ThreadMain ipc/chromium/src/base/thread.cc:192

these browser crash signatures are regressing (at least in volume) in firefox 66. there isn't too much data for this yet since 66 started into the beta cycle just yesterday but one crash comment mentioned it took place while livestreaming on a video site.

Tracking for 66 to keep an eye on it, since there are significant crashes in beta 3.

Assigning to myself to keep an eye on it

Assignee: nobody → jbonisteel
Priority: -- → P2

It looks as if all of these crashes happened during startup, to there are no comments or URLs.

Is there anyone who can investigate further? This is a bit worrying for 66 release.

Flags: needinfo?(jbonisteel)

(In reply to Liz Henry (:lizzard) (use needinfo) from comment #4)

Is there anyone who can investigate further? This is a bit worrying for 66 release.

Will figure out who could look at this

Flags: needinfo?(jbonisteel)
Assignee: jbonisteel → jmuizelaar

So it looks like we're passing in garbage to the mutex lock in RefreshTimerVsyncDispatcher::NotifyVsync(). I guess something is not initialized here.

Bug 1503339 touched some vsync code in 66, might be related.

Most likely because mVsyncDispatcher is null

Assignee: jmuizelaar → kats

This was all for stuff that's now disabled on 66 so if necessary we could just back things out from 66?

So I think one possibility here is if thread A calls MoveListenersToNewSource and acquires the lock, and then thread B calls NotifyVsync and blocks on the same lock and then after thread A releases the lock, thread B runs the function while mRefreshTimerVsyncDispatcher is nullptr. It's calling a nonvirtual function so presumably it could result in the crash seen here.

The data I'm seeing is consistent with my theory in comment 10. Specifically:

  • the codepath in question is invoked via ReinitFrameRate() which should generally run only on startup. The crashes are mostly startup crashes.

  • ReinitFrameRate should only run if we're deciding to flip the frame rate from default to 30fps or vice-versa, which in turn should only happen for beta users if they update from a 65 beta into an early 66 beta (in which case we'd go from default -> 30fps) or from an early 66 beta to a late 66 beta (in which case we'd go from 30fps back to default). The 1:1 install:crash ratio supports this, as do the number of crashes steadily declining from b3 to b6 (since users are progressively less likely to update from 65 beta to a higher-numbered 66 beta).

I think a null check should be all that's needed to fix this. Building it locally now.

Pushed by kgupta@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/987fe1073099 Add a null check against mRefreshTimerVsyncDispatcher getting nulled concurrently. r=Gijs
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67

Comment on attachment 9043710 [details]
Bug 1523956 - Add a null check against mRefreshTimerVsyncDispatcher getting nulled concurrently. r?Gijs

Beta/Release Uplift Approval Request

Feature/Bug causing the regression

Bug 1503339

User impact if declined

Possible crash due to race condition on low-end devices when we switch framerates

Is this code covered by automated tests?

No

Has the fix been verified in Nightly?

No

Needs manual test from QE?

No

If yes, steps to reproduce

List of other uplifts needed

none

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

Low risk fix, null pointer check. Main risk is that it doesn't actually fix the crash in which case we're no worse off than before.

String changes made/needed

none

Attachment #9043710 - Flags: approval-mozilla-beta?

The crashes seem to have gone away after unsetting EARLY_BETA_OR_EARLIER (and thus, presumably, turning off performance.adjust_to_machine. Do we need this on 66 anyway?

It's safer to put it in, in case somebody fiddles with the prefs manually. But could also skip it if you're uneasy about it.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #18)

It's safer to put it in, in case somebody fiddles with the prefs manually.

Yes. Similar code will still run if users manually alter layout.frame_rate in about:config .

Comment on attachment 9043710 [details]
Bug 1523956 - Add a null check against mRefreshTimerVsyncDispatcher getting nulled concurrently. r?Gijs

add a null check to prevent crash, approved for 66.0b10

Attachment #9043710 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: