Closed Bug 1516425 Opened 5 years ago Closed 5 years ago

Crash in mozilla::layout::GetLayerManager

Categories

(Core :: Graphics: Layers, defect, P1)

65 Branch
x86
Windows
defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox-esr60 --- unaffected
firefox64 --- unaffected
firefox65 --- wontfix
firefox66 + wontfix
firefox67 + fixed
firefox68 --- fixed

People

(Reporter: philipp, Assigned: rhunt)

References

Details

(4 keywords, Whiteboard: [post-critsmash-triage][adv-main67+])

Crash Data

Attachments

(3 files)

This bug was filed from the Socorro interface and is
report bp-fd6e33b9-bd7e-4df3-8073-e02640181226.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll static struct already_AddRefed<mozilla::layers::LayerManager> mozilla::layout::GetLayerManager layout/ipc/RenderFrame.cpp:33
1 xul.dll mozilla::layout::RenderFrame::AttachLayerManager layout/ipc/RenderFrame.cpp:106
2 xul.dll mozilla::layout::RenderFrame::OwnerContentChanged layout/ipc/RenderFrame.cpp:124
3 xul.dll void nsFrameLoader::SetOwnerContent dom/base/nsFrameLoader.cpp:1748
4 xul.dll nsFrameLoader::StartDestroy dom/base/nsFrameLoader.cpp:1582
5 xul.dll mozilla::dom::XULFrameElement::DestroyContent dom/xul/XULFrameElement.cpp:158
6 xul.dll mozilla::dom::FragmentOrElement::DestroyContent dom/base/FragmentOrElement.cpp:1217
7 xul.dll nsDocument::Destroy dom/base/nsDocument.cpp:7608
8 xul.dll nsDocumentViewer::Destroy layout/base/nsDocumentViewer.cpp:1804
9 xul.dll nsDocShell::Destroy docshell/base/nsDocShell.cpp:5046

=============================================================

this crash signature is increasing during the 65.0b cycle - it's mostly hitting 32bit builds on windows and looking security sensitive...
Huh, interesting that this first appeared between b2 and b3 (i.e. during the soft freeze at the end of the 65 nightly cycle). Ryan, are you aware of anything which landed late in the 65 nightly cycle which might be related to this?
Flags: needinfo?(rhunt)
This is likely caused by bug 1503655. I don't recall it landing during the soft-freeze, so it might be a latent issue that took time to surface.

I'll take a look at this crash.
Assignee: nobody → rhunt
Flags: needinfo?(rhunt)
Keywords: sec-high
This is weird. The crash is happening during the destruction of a windowless browser at shutdown. RenderFrame is crashing when accessing its pointer to the frame loader, but it is contained in a RefPtr so it should still be alive.

I guess maybe it's possible that the RenderFrame is junk somehow. The RenderFrame is stored directly in the TabParent, so that would imply that the TabParent is freed as well. I haven't been able to rule this out yet, because destruction is a bit hairy.
Group: layout-core-security → gfx-core-security

Hi Ryan, we've got about a week left in the Fx65 cycle. Did you have a chance to look into this more?

Flags: needinfo?(rhunt)

It seems likely that the renderframe itself has been deleted and written over with e5e5's

Interesting/unusual that most of the crashes are on about:blank (not all - and other about:'s show up (support, etc))

(In reply to Randell Jesup [:jesup] from comment #5)

It seems likely that the renderframe itself has been deleted and written
over with e5e5's

That implies that TabParent has been freed as well. I have not been able to figure out how that can be the case yet.. TabParent is refcounted, and nsFrameLoader doesn't hold a ref, but TabParent::ActorDestroy should always be called before delete (as IPDL delete clears a ref) which is what clears the ref in nsFrameLoader.

More interesting things (from a sampling of reports):

  1. They're all in XPCOM shutdown
  2. It's while an observer is running
  3. The native call is WindowlessBrowser::Close()

If there's crash reports not like that, I've missed them. It could be nice to know what script is running here.

Keeping ni?.

Still not sure what's going on. But I think this would be fairly easy to paper over if it's important enough. I wrote a quick patch to do it, if it passes try I'll put it up here.

Flags: needinfo?(rhunt)

This patch would paper over this crash as this call to RenderFrame::SetOwnerContent isn't necessary in this situation and can be detected.

I've done a small review of the code after this point in destruction, and it doesn't look like we'll dereference the garbage TabParent after this point. Not an expert here, so it could be possible.

I think solving the source of the problem is much better, but if we can't figure it out and this is urgent enough this might be an option.

Too late for a patch to go into the Fx65 release at this point, but we'll want to keep an eye on the crash rates post-release to see if we need to wallpaper it in a dot release.

It seems odd that the crash rate was higher in beta 65 than it is in release 65.
rhunt what do you think, is there anything you want to try here?

Flags: needinfo?(rhunt)

I actually cleaned up the ownership model for RenderFrame in bug 1500257. We no longer call RenderFrame:OwnerContentChanged inside of nsFrameLoader::SetOwnerContent, so this crash stack isn't possible anymore.

I'm hopeful that this might resolve the issue, but there's always the chance this just shifts the crash to elsewhere.

Flags: needinfo?(rhunt)

Oh, nice. That looks like a lot for beta uplift though so better to let that ride with 67.
Do you think it is worth it to try your papering-over for 66?

Flags: needinfo?(rhunt)

Yeah, looking at the crash statistics it seems worth a shot. Here's a more neutral version of the patch to paper over the crash.

Flags: needinfo?(rhunt)
Attachment #9044045 - Flags: review?(aosmond)
Attachment #9044045 - Flags: review?(aosmond) → review+
Priority: -- → P3

Comment on attachment 9044045 [details] [diff] [review]
owner-content-shutdown.patch

Security Approval Request

How easily could an exploit be constructed based on the patch?

Not easily. It shouldn't be obvious there could be a lifetime issue here as it looks like we're just avoiding some unnecessary graphics operations on tab close.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

No

Which older supported branches are affected by this flaw?

None

If not all supported branches, which bug introduced the flaw?

Bug 1503655

Do you have backports for the affected branches?

No

If not, how different, hard to create, and risky will they be?

How likely is this patch to cause regressions; how much testing does it need?

Not likely, we're avoiding a no-op operation during tab close when there's a chance the TabParent could be bogus.

Attachment #9044045 - Flags: sec-approval?

(In reply to Ryan Hunt [:rhunt] from comment #15)

Which older supported branches are affected by this flaw?

None

The status flags say that Firefox 66 is affected and 67 is marked as "?" which means we aren't sure (?). Based on comment 11, it looks like 65 is affected as well. Can you clarify?

Flags: needinfo?(rhunt)

Ah yes, 65 is release which means that's affected. I was only thinking about ESR.

So,
Release, 65, is affected
Beta, 66, is affected
Central, 67, is not affected - fixed by patch that is probably too big to be uplifted
ESR, 60, is not affected - before the regression was introduced

Flags: needinfo?(rhunt)
Comment on attachment 9044045 [details] [diff] [review]
owner-content-shutdown.patch

Giving sec-approval+ and adding beta approval.
Attachment #9044045 - Flags: sec-approval?
Attachment #9044045 - Flags: sec-approval+
Attachment #9044045 - Flags: approval-mozilla-beta+
Blocks: 1503655
Group: gfx-core-security → core-security-release
Status: NEW → RESOLVED
Closed: 5 years ago
Depends on: 1500257
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
Depends on: 1530165
Flags: qe-verify-
Whiteboard: [post-critsmash-triage]

It looks like the paper over patch just shifted the crash signature in Beta.

Crash Signature: [@ mozilla::layout::GetLayerManager] → [@ mozilla::layout::GetLayerManager] [@ mozilla::dom::TabParent::Destroy]
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Ryan, are you still working on this one?

Flags: needinfo?(rhunt)

No I haven't taken a look further at this since trying to paper over it. If needed, I can try and write something upliftable from bug 1500257.

Flags: needinfo?(rhunt)

I'm a little worried about this crash volume - it's high for beta and could turn into a much higher volume on release.

Flags: needinfo?(rhunt)

I've been taking another deep look at this today. I'm confused on the introduction of this crash.

Looking at the crash-stats data [1], it looks like it first spiked in beta 65, but I can't find any crashes in nightly 65 and only two crashes in nightly 66. There should also be crashes for the first half of nightly 67 before bug 1500257 landed and removed this crash signature.

So maybe this crash only affects beta/release somehow?

If that's the case, I don't think any of the patches in bug 1500257 will have solved the underlying issue here. The problem is that the TabParent is being freed but there's still a dangling pointer in nsFrameLoader. The relevant patch in bug 1500257 would prevent us from crashing in GetLayerManager, but we would still crash later in TabParent::Destroy. Which is what we saw with the patch to paper over the crash.

This would be worrisome, because it would mean that we'll still have this issue when nightly 67 becomes beta 67.

[1] https://crash-stats.mozilla.com/signature/?product=Firefox&signature=mozilla%3A%3Alayout%3A%3AGetLayerManager&date=%3E%3D2018-09-06T05%3A00%3A00.000Z&date=%3C2019-03-06T05%3A00%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_columns=startup_crash&_sort=-date&page=1#summary

I think we can solve this issue by making nsFrameLoader hold a strong reference to the TabParent. I think this could cause a cycle though, so I believe we'd have to make TabParent participate in cycle collection.

I hacked together a patch to do this here [1]. The try run might blow up, I haven't done this before.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=554c4b408d65f711e5fb665b9146c1744c4bf927

Flags: needinfo?(rhunt)

Looks green. I did retriggers for all the orange tests to make sure they weren't permafailing.

Currently TabParent is refcounted, but nsFrameLoader hold a weak pointer. The pointer
should be cleared out when the TabParent is destroyed, but that's a bit of a footgun
and it's not obvious that we always do this correctly.

Because TabParent holds a reference to a nsFrameLoader and the frame element (which
contains a nsFrameLoader), I think this means we need to cycle collect TabParent.

Comment on attachment 9049320 [details]
Bug 1516425 - Hold a strong reference to TabParent from nsFrameLoader and make TabParent cycle collected. r?smaug

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Not trivially. Someone could infer that there is a lifetime issue from the fact that we're making a weak reference strong, but it's not clear what triggers the issue (I don't even know).
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Unknown
  • Which older supported branches are affected by this flaw?: 65, 66
  • If not all supported branches, which bug introduced the flaw?: Unsure, I originally thought bug 1503655, but I'm not sure anymore.
  • Do you have backports for the affected branches?: Yes
  • If not, how different, hard to create, and risky will they be?: I'm assuming this patch can be rebased for the older branches easily, but if not it should be easy to backport.
  • How likely is this patch to cause regressions; how much testing does it need?: I think it's unlikely but possible. This patch adds a strong reference and then makes a class cycle collected to prevent the ref from causing a memory leak. This code is old, so unexpected things can happen.
Attachment #9049320 - Flags: sec-approval?

I think this looks a bit too scary for the last moment now that I look at the patch tbh.

We're too late in the cycle for this as we're out of betas and making release candidates. This can check in on April 2, two weeks into the 67 cycle.

Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][checkin on 4/2/19]
Attachment #9049320 - Flags: sec-approval? → sec-approval+

(In reply to Liz Henry (:lizzard) (use needinfo) from comment #30)

I think this looks a bit too scary for the last moment now that I look at the patch tbh.

and looking at this some more, we've already taken one patch. Do we have to take this now because of the crash rate with the first patch?

Flags: needinfo?(lhenry)

No, because the first patch just kinda moved the crash to a new signature. It isn't really worse than it was originally. So, I just talked with Ryan and he will wait till early April to land the new patch on 68 in m-c, then request uplift to beta 67.

Flags: needinfo?(lhenry)

Ryan, want to land now?

Flags: needinfo?(rhunt)
Priority: P3 → P1
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: mozilla67 → mozilla68

Please nominate this for Beta approval when you get a chance. It grafts cleanly as-landed.

Flags: needinfo?(rhunt)
Whiteboard: [post-critsmash-triage][checkin on 4/2/19] → [post-critsmash-triage]

Comment on attachment 9049320 [details]
Bug 1516425 - Hold a strong reference to TabParent from nsFrameLoader and make TabParent cycle collected. r?smaug

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: Bug 1503655
  • User impact if declined: UAF crash during shutdown
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It adds a strong reference to simplify lifetimes.
  • String changes made/needed:
Flags: needinfo?(rhunt)
Attachment #9049320 - Flags: approval-mozilla-beta?

Comment on attachment 9049320 [details]
Bug 1516425 - Hold a strong reference to TabParent from nsFrameLoader and make TabParent cycle collected. r?smaug

Uplift approved for 67 beta, thanks.

Attachment #9049320 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Whiteboard: [post-critsmash-triage] → [post-critsmash-triage][adv-main67+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: