Closed Bug 1712357 Opened 4 years ago Closed 4 years ago

Intermittent org.mozilla.geckoview.test.NavigationDelegateTest.onNewSession_childShouldLoad | application crashed [@ mozilla::dom::ContentParent::Pid() const]

Tracking

()

Status:

RESOLVED FIXED

Milestone:

91 Branch

Tracking Flags:

Tracking

Status

firefox-esr78

---

unaffected

firefox88

---

wontfix

firefox89

---

wontfix

firefox90

fixed

firefox91

fixed

People

(Reporter: mccr8, Assigned: mccr8)

References

Details

(4 keywords, Whiteboard: [post-critsmash-triage][adv-main90+r])

Crash Data

Attachments

(1 file)

Bug 1712357 - Don't adjust the priority of ContentParents that are going away. 4 years ago Andrew McCreight [:mccr8] 48 bytes, text/x-phabricator-request	RyanVM : approval-mozilla-beta+	Details \| Review

Andrew McCreight [:mccr8]

Assignee

Description

•

4 years ago

As seen in bug 1710153, this UAF has continued. In bug 1709931 I landed a patch to ensure that the timer is shut down when the particular process manager is destroyed, but looking at the stack again I think the issue is that the ContentParent we're calling Pid() on has been destroyed.

One possible reason for this is that we failed to get the observer service. I'm not sure why it would be intermittently unavailable, but this is a unit test, and the priority manager might start up early, so I guess it could happen? It is interesting that the failures seem to be entirely in debug builds, even though we seem to also run this test with opt. If it is some startup race, maybe we lose more often with a slower debug build?

Andrew McCreight [:mccr8]

Assignee

Comment 1

•

4 years ago

I'll try to confirm this hypothesis by asserting that the observer service exists in both the priority manager and the place in ContentParent where the observation notice is sent. I can work around this particular failure by making a direct call instead of using the observer service, though maybe there are other things that will need to be fixed if the issue is ContentParent failing to send ipc:content-shutdown.

Andrew McCreight [:mccr8]

Assignee

Comment 2

•

4 years ago

I added asserts for the observer service, but the crash still happens: https://treeherder.mozilla.org/logviewer?job_id=340470750&repo=try&lineNumber=8096

I'll try taking a more direct approach and crash if we destroy a ContentParent while it is still referenced.

Andrew McCreight [:mccr8]

Assignee

Comment 3

•

4 years ago

I think I've figured out what is going wrong here. I added some instrumentation to the process priority manager to track the content parent ids we've seen in ProcessPriorityManagerImpl::ObserveContentParentDestroyed, then asserts if we see one of those again later in GetParticularProcessPriorityManager. In a try run, that seemed to fail 100% of the time, with this stack:

0 ProcessPriorityManagerImpl::GetParticularProcessPriorityManager(mozilla::dom::ContentParent*) [ProcessPriorityManager.cpp : 405 + 0x29]
1 mozilla::ProcessPriorityManager::ActivityChanged(mozilla::dom::BrowserParent*, bool) [ProcessPriorityManager.cpp: 925 + 0x16]
2 mozilla::dom::BrowserParent::DestroyInternal() [BrowserParent.cpp: 603 + 0x5]
3 mozilla::dom::BrowserParent::Destroy() [BrowserParent.cpp: 628 + 0x8]
4 mozilla::dom::BrowserBridgeParent::Destroy() [BrowserBridgeParent.cpp: 132 + 0x10]
5 mozilla::ipc::IProtocol::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) [ProtocolUtils.cpp: 623 + 0xc]
6 mozilla::dom::PBrowserBridgeParent::OnMessageReceived(IPC::Message const&) [PBrowserBridgeParent.cpp: 321 + 0x8]

So, the problem is that we tell the priority manager that a content parent is going away, which destroys the particular process priority manager (PPPM). Later, we destroy the browser parent. This makes the browser parent tell the priority manager that the browser parent is no longer active, which makes GetParticularProcessPriorityManager create a new PPPM for the content parent that has already been destroyed. This new PPPM has a priority of UNKNOWN, so we do a ScheduleResetPriority() to turn the priority to BACKGROUND, which creates a timer.

I'm not sure what goes wrong after that, but I expect there's some kind of race between destroying the ContentParent, running the timer, and canceling the timer. We must usually cancel all timers before we run the PPPM timer, as I can't see how we'd win the race otherwise.

I added the call to ActivityChanged to BrowserParent::Deactivated() in bug 1704545, so maybe that's the regressor? Though I'm not sure why the crash didn't show up earlier.

One fix would be to change ActivityChanged to not create a new PPPM if aIsActive is false. The goal of the call is really just to tell the priority manager to stop considering the BrowserParent when calculating the priority, but if the priority manager has no record of the process, clearly it is already not doing so.

Andrew McCreight [:mccr8]

Assignee

Comment 4

•

4 years ago

I guess the easier thing to do is to just never create a new PPPM in GetParticularProcessPriorityManager if ContentParent::IsDead() returns true.

Andrew McCreight [:mccr8]

Assignee

Comment 5

•

4 years ago

Attached file Bug 1712357 - Don't adjust the priority of ContentParents that are going away. — Details

Andrew McCreight [:mccr8]

Assignee

Comment 6

•

4 years ago

I looked in crash stats for at crashes that contain some of the frames here and I didn't see any evidence of UAF poison values.

Andrew McCreight [:mccr8]

Assignee

Comment 7

•

4 years ago

https://hg.mozilla.org/integration/autoland/rev/671001812b0fb13cb699697ad14207525e58101a

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 8

•

4 years ago

Don't adjust the priority of ContentParents that are going away. r=kmag
https://hg.mozilla.org/mozilla-central/rev/671001812b0f

Group: dom-core-security → core-security-release

Status: NEW → RESOLVED

Closed: 4 years ago

status-firefox91: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 91 Branch

Brindusa Tot, Desktop QA

Updated

•

4 years ago

Flags: qe-verify-

Whiteboard: [post-critsmash-triage]

Andrew McCreight [:mccr8]

Assignee

Updated

•

4 years ago

Blocks: 1710153

Andrew McCreight [:mccr8]

Assignee

Updated

•

4 years ago

status-firefox90: --- → affected

Andrew McCreight [:mccr8]

Assignee

Comment 9

•

4 years ago

Comment on attachment 9224481 [details]
Bug 1712357 - Don't adjust the priority of ContentParents that are going away.

Beta/Release Uplift Approval Request

User impact if declined: There's no evidence that this is affecting anybody in practice, but maybe it could cause a security issue on Windows, so I think it would be good to fix just in case.
Is this code covered by automated tests?: Yes
Has the fix been verified in Nightly?: Yes
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): It just stops us from running code in a situation where we really don't want to run it.
String changes made/needed: none

Attachment #9224481 - Flags: approval-mozilla-beta?

Ryan VanderMeulen [:RyanVM]

Updated

•

4 years ago

status-firefox88: --- → wontfix

status-firefox89: --- → wontfix

status-firefox-esr78: --- → wontfix

Andrew McCreight [:mccr8]

Assignee

Updated

•

4 years ago

status-firefox-esr78: wontfix → unaffected

Ryan VanderMeulen [:RyanVM]

Comment 10

•

4 years ago

Comment on attachment 9224481 [details]
Bug 1712357 - Don't adjust the priority of ContentParents that are going away.

Approved for 90.0b4.

Attachment #9224481 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ryan VanderMeulen [:RyanVM]

Comment 11

•

4 years ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/621b344d28c8

status-firefox90: affected → fixed

tracking-firefox90: --- → +

tracking-firefox91: --- → +

Andrew McCreight [:mccr8]

Assignee

Comment 12

•

4 years ago

It looks like there were at least 9 crashes like this on Fenix in the wild, down to the poison values. None after the 20210527092758 build.

bp-9ddecfe2-3f48-48bc-9346-5267f0210603

Tom Ritter [:tjr]

Updated

•

4 years ago

Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main90+r]

Daniel Veditz [:dveditz]

Updated

•

3 years ago

Group: core-security-release

You need to log in before you can comment on or make changes to this bug.