Crash in mozilla::layout::GetLayerManager
Categories
(Core :: Graphics: Layers, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox64 | --- | unaffected |
firefox65 | --- | wontfix |
firefox66 | + | wontfix |
firefox67 | + | fixed |
firefox68 | --- | fixed |
People
(Reporter: philipp, Assigned: rhunt)
References
Details
(4 keywords, Whiteboard: [post-critsmash-triage][adv-main67+])
Crash Data
Attachments
(3 files)
1.34 KB,
patch
|
Details | Diff | Splinter Review | |
985 bytes,
patch
|
aosmond
:
review+
abillings
:
approval-mozilla-beta+
abillings
:
sec-approval+
|
Details | Diff | Splinter Review |
47 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta+
abillings
:
sec-approval+
|
Details | Review |
This bug was filed from the Socorro interface and is report bp-fd6e33b9-bd7e-4df3-8073-e02640181226. ============================================================= Top 10 frames of crashing thread: 0 xul.dll static struct already_AddRefed<mozilla::layers::LayerManager> mozilla::layout::GetLayerManager layout/ipc/RenderFrame.cpp:33 1 xul.dll mozilla::layout::RenderFrame::AttachLayerManager layout/ipc/RenderFrame.cpp:106 2 xul.dll mozilla::layout::RenderFrame::OwnerContentChanged layout/ipc/RenderFrame.cpp:124 3 xul.dll void nsFrameLoader::SetOwnerContent dom/base/nsFrameLoader.cpp:1748 4 xul.dll nsFrameLoader::StartDestroy dom/base/nsFrameLoader.cpp:1582 5 xul.dll mozilla::dom::XULFrameElement::DestroyContent dom/xul/XULFrameElement.cpp:158 6 xul.dll mozilla::dom::FragmentOrElement::DestroyContent dom/base/FragmentOrElement.cpp:1217 7 xul.dll nsDocument::Destroy dom/base/nsDocument.cpp:7608 8 xul.dll nsDocumentViewer::Destroy layout/base/nsDocumentViewer.cpp:1804 9 xul.dll nsDocShell::Destroy docshell/base/nsDocShell.cpp:5046 ============================================================= this crash signature is increasing during the 65.0b cycle - it's mostly hitting 32bit builds on windows and looking security sensitive...
Comment 1•5 years ago
|
||
Huh, interesting that this first appeared between b2 and b3 (i.e. during the soft freeze at the end of the 65 nightly cycle). Ryan, are you aware of anything which landed late in the 65 nightly cycle which might be related to this?
Assignee | ||
Comment 2•5 years ago
|
||
This is likely caused by bug 1503655. I don't recall it landing during the soft-freeze, so it might be a latent issue that took time to surface. I'll take a look at this crash.
Assignee | ||
Comment 3•5 years ago
|
||
This is weird. The crash is happening during the destruction of a windowless browser at shutdown. RenderFrame is crashing when accessing its pointer to the frame loader, but it is contained in a RefPtr so it should still be alive. I guess maybe it's possible that the RenderFrame is junk somehow. The RenderFrame is stored directly in the TabParent, so that would imply that the TabParent is freed as well. I haven't been able to rule this out yet, because destruction is a bit hairy.
Updated•5 years ago
|
Updated•5 years ago
|
Comment 4•5 years ago
|
||
Hi Ryan, we've got about a week left in the Fx65 cycle. Did you have a chance to look into this more?
Comment 5•5 years ago
|
||
It seems likely that the renderframe itself has been deleted and written over with e5e5's
Comment 6•5 years ago
|
||
Interesting/unusual that most of the crashes are on about:blank (not all - and other about:'s show up (support, etc))
Assignee | ||
Comment 7•5 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #5)
It seems likely that the renderframe itself has been deleted and written
over with e5e5's
That implies that TabParent has been freed as well. I have not been able to figure out how that can be the case yet.. TabParent is refcounted, and nsFrameLoader doesn't hold a ref, but TabParent::ActorDestroy should always be called before delete (as IPDL delete clears a ref) which is what clears the ref in nsFrameLoader.
More interesting things (from a sampling of reports):
- They're all in XPCOM shutdown
- It's while an observer is running
- The native call is WindowlessBrowser::Close()
If there's crash reports not like that, I've missed them. It could be nice to know what script is running here.
Keeping ni?.
Assignee | ||
Comment 8•5 years ago
|
||
Still not sure what's going on. But I think this would be fairly easy to paper over if it's important enough. I wrote a quick patch to do it, if it passes try I'll put it up here.
Assignee | ||
Comment 9•5 years ago
|
||
This patch would paper over this crash as this call to RenderFrame::SetOwnerContent isn't necessary in this situation and can be detected.
I've done a small review of the code after this point in destruction, and it doesn't look like we'll dereference the garbage TabParent after this point. Not an expert here, so it could be possible.
I think solving the source of the problem is much better, but if we can't figure it out and this is urgent enough this might be an option.
Comment 10•5 years ago
|
||
Too late for a patch to go into the Fx65 release at this point, but we'll want to keep an eye on the crash rates post-release to see if we need to wallpaper it in a dot release.
Comment 11•5 years ago
|
||
It seems odd that the crash rate was higher in beta 65 than it is in release 65.
rhunt what do you think, is there anything you want to try here?
Assignee | ||
Comment 12•5 years ago
|
||
I actually cleaned up the ownership model for RenderFrame in bug 1500257. We no longer call RenderFrame:OwnerContentChanged inside of nsFrameLoader::SetOwnerContent, so this crash stack isn't possible anymore.
I'm hopeful that this might resolve the issue, but there's always the chance this just shifts the crash to elsewhere.
Comment 13•5 years ago
|
||
Oh, nice. That looks like a lot for beta uplift though so better to let that ride with 67.
Do you think it is worth it to try your papering-over for 66?
Assignee | ||
Comment 14•5 years ago
|
||
Yeah, looking at the crash statistics it seems worth a shot. Here's a more neutral version of the patch to paper over the crash.
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 15•5 years ago
|
||
Comment on attachment 9044045 [details] [diff] [review]
owner-content-shutdown.patch
Security Approval Request
How easily could an exploit be constructed based on the patch?
Not easily. It shouldn't be obvious there could be a lifetime issue here as it looks like we're just avoiding some unnecessary graphics operations on tab close.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?
No
Which older supported branches are affected by this flaw?
None
If not all supported branches, which bug introduced the flaw?
Do you have backports for the affected branches?
No
If not, how different, hard to create, and risky will they be?
How likely is this patch to cause regressions; how much testing does it need?
Not likely, we're avoiding a no-op operation during tab close when there's a chance the TabParent could be bogus.
Comment 16•5 years ago
|
||
(In reply to Ryan Hunt [:rhunt] from comment #15)
Which older supported branches are affected by this flaw?
None
The status flags say that Firefox 66 is affected and 67 is marked as "?" which means we aren't sure (?). Based on comment 11, it looks like 65 is affected as well. Can you clarify?
Assignee | ||
Comment 17•5 years ago
|
||
Ah yes, 65 is release which means that's affected. I was only thinking about ESR.
So,
Release, 65, is affected
Beta, 66, is affected
Central, 67, is not affected - fixed by patch that is probably too big to be uplifted
ESR, 60, is not affected - before the regression was introduced
Updated•5 years ago
|
Comment 18•5 years ago
|
||
Comment on attachment 9044045 [details] [diff] [review] owner-content-shutdown.patch Giving sec-approval+ and adding beta approval.
Comment 19•5 years ago
|
||
uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/820b728e1ebe
Also, calling this fixed for 67 via bug 1500257.
Updated•5 years ago
|
Assignee | ||
Comment 21•5 years ago
|
||
It looks like the paper over patch just shifted the crash signature in Beta.
Reporter | ||
Updated•5 years ago
|
Assignee | ||
Comment 23•5 years ago
|
||
No I haven't taken a look further at this since trying to paper over it. If needed, I can try and write something upliftable from bug 1500257.
Comment 24•5 years ago
|
||
I'm a little worried about this crash volume - it's high for beta and could turn into a much higher volume on release.
Assignee | ||
Comment 25•5 years ago
|
||
I've been taking another deep look at this today. I'm confused on the introduction of this crash.
Looking at the crash-stats data [1], it looks like it first spiked in beta 65, but I can't find any crashes in nightly 65 and only two crashes in nightly 66. There should also be crashes for the first half of nightly 67 before bug 1500257 landed and removed this crash signature.
So maybe this crash only affects beta/release somehow?
If that's the case, I don't think any of the patches in bug 1500257 will have solved the underlying issue here. The problem is that the TabParent is being freed but there's still a dangling pointer in nsFrameLoader. The relevant patch in bug 1500257 would prevent us from crashing in GetLayerManager
, but we would still crash later in TabParent::Destroy
. Which is what we saw with the patch to paper over the crash.
This would be worrisome, because it would mean that we'll still have this issue when nightly 67 becomes beta 67.
Assignee | ||
Comment 26•5 years ago
|
||
I think we can solve this issue by making nsFrameLoader hold a strong reference to the TabParent. I think this could cause a cycle though, so I believe we'd have to make TabParent participate in cycle collection.
I hacked together a patch to do this here [1]. The try run might blow up, I haven't done this before.
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=554c4b408d65f711e5fb665b9146c1744c4bf927
Assignee | ||
Comment 27•5 years ago
|
||
Looks green. I did retriggers for all the orange tests to make sure they weren't permafailing.
Assignee | ||
Comment 28•5 years ago
|
||
Currently TabParent is refcounted, but nsFrameLoader hold a weak pointer. The pointer
should be cleared out when the TabParent is destroyed, but that's a bit of a footgun
and it's not obvious that we always do this correctly.
Because TabParent holds a reference to a nsFrameLoader and the frame element (which
contains a nsFrameLoader), I think this means we need to cycle collect TabParent.
Assignee | ||
Comment 29•5 years ago
|
||
Comment on attachment 9049320 [details]
Bug 1516425 - Hold a strong reference to TabParent from nsFrameLoader and make TabParent cycle collected. r?smaug
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Not trivially. Someone could infer that there is a lifetime issue from the fact that we're making a weak reference strong, but it's not clear what triggers the issue (I don't even know).
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Unknown
- Which older supported branches are affected by this flaw?: 65, 66
- If not all supported branches, which bug introduced the flaw?: Unsure, I originally thought bug 1503655, but I'm not sure anymore.
- Do you have backports for the affected branches?: Yes
- If not, how different, hard to create, and risky will they be?: I'm assuming this patch can be rebased for the older branches easily, but if not it should be easy to backport.
- How likely is this patch to cause regressions; how much testing does it need?: I think it's unlikely but possible. This patch adds a strong reference and then makes a class cycle collected to prevent the ref from causing a memory leak. This code is old, so unexpected things can happen.
Comment 30•5 years ago
|
||
I think this looks a bit too scary for the last moment now that I look at the patch tbh.
Comment 31•5 years ago
|
||
We're too late in the cycle for this as we're out of betas and making release candidates. This can check in on April 2, two weeks into the 67 cycle.
Updated•5 years ago
|
Comment 32•5 years ago
|
||
(In reply to Liz Henry (:lizzard) (use needinfo) from comment #30)
I think this looks a bit too scary for the last moment now that I look at the patch tbh.
and looking at this some more, we've already taken one patch. Do we have to take this now because of the crash rate with the first patch?
Comment 33•5 years ago
|
||
No, because the first patch just kinda moved the crash to a new signature. It isn't really worse than it was originally. So, I just talked with Ryan and he will wait till early April to land the new patch on 68 in m-c, then request uplift to beta 67.
Updated•5 years ago
|
Assignee | ||
Comment 35•5 years ago
|
||
Yup, pushed this earlier today.
https://hg.mozilla.org/integration/mozilla-inbound/rev/56c9e5b7f7ae99d43c874cc03e96e65ba1cfe6f2
Comment 36•5 years ago
|
||
Comment 37•5 years ago
|
||
Please nominate this for Beta approval when you get a chance. It grafts cleanly as-landed.
Assignee | ||
Comment 38•5 years ago
|
||
Comment on attachment 9049320 [details]
Bug 1516425 - Hold a strong reference to TabParent from nsFrameLoader and make TabParent cycle collected. r?smaug
Beta/Release Uplift Approval Request
- Feature/Bug causing the regression: Bug 1503655
- User impact if declined: UAF crash during shutdown
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): It adds a strong reference to simplify lifetimes.
- String changes made/needed:
Comment 39•5 years ago
|
||
Comment on attachment 9049320 [details]
Bug 1516425 - Hold a strong reference to TabParent from nsFrameLoader and make TabParent cycle collected. r?smaug
Uplift approved for 67 beta, thanks.
Comment 40•5 years ago
|
||
uplift |
Updated•5 years ago
|
Updated•4 years ago
|
Description
•