Closed Bug 1754867 Opened 2 years ago Closed 2 years ago

Nullptr crash due to reentrancy in ~AutoTaskDispatcher

Categories

(Core :: XPCOM, defect, P2)


Tracking


RESOLVED FIXED
99 Branch
Tracking Status
firefox99 --- fixed

People

(Reporter: bwc, Assigned: bwc)

References

Details

Crash Data

Attachments

(1 file)

See Also: → 1748333
Crash Signature: AutoTaskDispatcher::DispatchTasksFor → [@ AutoTaskDispatcher::DispatchTasksFor ]
Crash Signature: [@ AutoTaskDispatcher::DispatchTasksFor ] → [@ mozilla::AutoTaskDispatcher::DispatchTasksFor ]

There are a few things we could blame here:

  1. This code results in ~AutoTaskDispatcher going reentrant and calling mTailDispatcher.DispatchTasksFor, because Maybe::reset invokes the d'tor first, and then marks itself as containing Nothing:

https://searchfox.org/mozilla-central/rev/9f61d854547cedbde0773b2893e4f925352be3b3/xpcom/threads/AbstractThread.cpp#196
https://searchfox.org/mozilla-central/rev/9f61d854547cedbde0773b2893e4f925352be3b3/mfbt/Maybe.h#639,642

This is pretty inconsistent with the way things like unique_ptr work, which is basically how we're using mTailDispatcher. I think that this is probably something we need to fix, one way or another, even if there are other things we should be fixing here.

  2. It is probably not ideal that we are setting up a situation where reentrancy could occur in the first place. It is harder to say what we could do to fix this, though.
Summary: Fuzzing nullptr crash due to reentrancy in ~AutoTaskDispatcher → Nullptr crash due to reentrancy in ~AutoTaskDispatcher

The core of the problem here is that Maybe::reset invokes the d'tor first, and then clears the value, whereas unique_ptr::reset does the opposite.
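To make the ordering hazard concrete, here's a minimal standalone sketch (not the actual MFBT/XPCOM code; Owner, Task, emplace, reset, and dispatchTasks are all made-up stand-ins) of a value whose destructor re-enters its owner while a Maybe-style reset() has destroyed the value but not yet cleared the "is some" flag:

```cpp
#include <cstdio>
#include <new>

struct Owner;

struct Task {
  Owner* mOwner;
  explicit Task(Owner* aOwner) : mOwner(aOwner) {}
  ~Task();  // re-enters the owner, mimicking tail dispatch from a destructor
};

// Stand-in for Maybe<Task>, with the ordering this bug is about:
// reset() runs the destructor first and clears the flag second.
struct Owner {
  alignas(Task) unsigned char mStorage[sizeof(Task)];
  bool mIsSome = false;

  void emplace() {
    new (mStorage) Task(this);
    mIsSome = true;
  }

  void reset() {
    if (mIsSome) {
      // The destructor runs while mIsSome is still true...
      reinterpret_cast<Task*>(mStorage)->~Task();
      mIsSome = false;
    }
  }

  void dispatchTasks() {
    if (mIsSome) {
      // ...so a reentrant call lands here and touches an object that is
      // mid-destruction. In the real code, this is where the crash happens.
      std::printf("re-entered while the value still looks alive\n");
    }
  }
};

Task::~Task() { mOwner->dispatchTasks(); }

int main() {
  Owner o;
  o.emplace();
  o.reset();  // the reentrant call observes a "live" value
  return 0;
}
```

A unique_ptr-style reset() does the opposite: it is specified to store the new (null) pointer first and only then delete the old object, so a reentrant observer sees the already-cleared state instead of a value that is mid-destruction.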

Assignee: nobody → docfaraday
Status: NEW → ASSIGNED

WebRTC jobs look OK, although this change could affect pretty much any feature.

Crash Signature: [@ mozilla::AutoTaskDispatcher::DispatchTasksFor ] → [@ mozilla::AutoTaskDispatcher::DispatchTasksFor ] [@ mozilla::AutoTaskDispatcher::DispatchTasksFor(mozilla::AbstractThread*)]

Seeing a lot more failures in test-windows7-32-qr/opt-mochitest-browser-chrome-fis-e10s-5 (including several novel-looking failures at the end of js/xpconnect/tests/browser):

With changes:
https://treeherder.mozilla.org/jobs?repo=try&selectedTaskRun=O9J8VfX1SMO2rix7BmGq0g.0&revision=d742e72b4e1f3a6c9583801f45b180f5caf86670

Without changes:
https://treeherder.mozilla.org/jobs?repo=try&duplicate_jobs=visible&revision=2ba69bced06adce1d6968022f34fb6ea473f37a2&selectedTaskRun=TUnn-v3iTM-DvwkRIns2pQ.0

Here's a try push focusing on js/xpconnect/tests/browser. Maybe I can get better logging of this problem on some other platform.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c9572c441295de22e708579498b95715e7f228d7

Yeah, that only seems to happen on windows7 jobs, and it does not appear that we have any windows7 debug jobs even with --full...


Any idea why this patch might be causing a new hang-on-finish in the xpconnect bc tests?

https://treeherder.mozilla.org/jobs?repo=try&revision=c9572c441295de22e708579498b95715e7f228d7&selectedTaskRun=dhoRVCkFTESygEZfZopjLw.0

Flags: needinfo?(nika)

I'm not sure why that would be happening. I do notice that the log has the line "[task 2022-02-14T23:20:25.127Z] 23:20:25 INFO - GECKO(3384) | [Exiting due to channel error." (https://treeherder.mozilla.org/logviewer?job_id=367873398&repo=try&lineNumber=1955), which suggests that the parent process crashed in some way and lost its connection to a child process unexpectedly (child processes log that when the parent process disappears unexpectedly and they quick-exit), though that doesn't line up super well with the application hanging.

Given that this failure is occurring on opt builds, we wouldn't have deadlock detection running, so one of the most likely options here is that we're encountering a deadlock somewhere in the parent process. Unfortunately, I don't really have any idea where that would be.
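(For illustration only; this is not Gecko's actual deadlock detector, and CheckedMutex is a made-up name. The point is just that debug-only bookkeeping can turn a silent self-deadlock into a loud assertion before the thread blocks:)

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical sketch: a non-recursive mutex with debug-only owner
// tracking. An opt build re-acquiring it on the same thread blocks
// forever; a debug build asserts first. (The bookkeeping is not fully
// thread-safe; this is illustration, not production code.)
class CheckedMutex {
  std::mutex mMutex;
#ifdef DEBUG
  std::thread::id mOwner{};  // default-constructed id means "nobody holds it"
#endif
 public:
  void lock() {
#ifdef DEBUG
    assert(mOwner != std::this_thread::get_id() &&
           "re-entrant lock on this thread: would deadlock");
#endif
    mMutex.lock();  // in an opt build, a self-deadlock just hangs here
#ifdef DEBUG
    mOwner = std::this_thread::get_id();
#endif
  }
  void unlock() {
#ifdef DEBUG
    mOwner = std::thread::id{};
#endif
    mMutex.unlock();
  }
};
```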

Flags: needinfo?(nika)

Hmm. The work in progress over in bug 1755318 makes this happen far more often on try:

https://treeherder.mozilla.org/jobs?repo=try&revision=46df533e014f3fda1f17c0d5f5cc0b447e658f52

I think this will have to block bug 1755318.

Blocks: 1755318
Severity: S3 → S2
Priority: P3 → P2

It seems that the issue does not reproduce when I hack win7 opt to be more in line with a debug build:

https://treeherder.mozilla.org/jobs?repo=try&revision=a7a34bddcf240948fa697573bcc7a8c25da1bc59

It is possible that disabling assertions prevents the issue, but when assertions are left enabled, we hit assertions and crash before we even run any tests.

It may be that I need to selectively enable just the debugging features I want (e.g., just deadlock detection).

Actually, there's another possibility; maybe the rebase I just did fixed the issue. Here's a try push for that:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=427fca1b66223e6904e0603f2884b4fdabeb6599

Edit: Well, that's a lucky break. The rebase has either stopped the issue or made it so uncommon that it doesn't show up in 20 runs.

Pushed by bcampen@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/68192b01f87d
Use unique_ptr instead of Maybe for mTailDispatcher to avoid reentrancy. r=nika
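(The patch isn't reproduced here, but going by the commit message, the shape of the fix is roughly the following; SomeOwner and ClearTailDispatcher are hypothetical names, and std::unique_ptr stands in for whichever unique-pointer type the tree actually uses:)

```cpp
#include <memory>

class AutoTaskDispatcher { /* ... */ };

class SomeOwner {
  // Before (roughly): Maybe<AutoTaskDispatcher> mTailDispatcher;
  // After: a unique pointer. unique_ptr::reset() is specified to install
  // the new (null) pointer *before* deleting the old object, so a
  // reentrant call from ~AutoTaskDispatcher sees "no dispatcher" rather
  // than a half-destroyed one.
  std::unique_ptr<AutoTaskDispatcher> mTailDispatcher;

 public:
  void ClearTailDispatcher() { mTailDispatcher.reset(); }
};
```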
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 99 Branch
Component: WebRTC: Audio/Video → XPCOM
Blocks: 1752959
No longer blocks: 1752959
See Also: → 1752959
