Closed Bug 1354933 Opened 7 years ago Closed 4 years ago

assert frame pointer is null in view destructor

Categories

(Core :: Web Painting, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: tnikkel, Assigned: tnikkel)

References

Details

Attachments

(5 files, 1 obsolete file)

      No description provided.
Attached patch patch (obsolete) — Splinter Review
We shouldn't be hitting this, but just in case.
Attachment #8856283 - Flags: review?(mats)
If nsView::mFrame is non-null in ~nsView() then I don't think we can assume
anything about its liveness, or even if the shell is alive, so this might
introduce an UAF?

Given that we have this:
http://searchfox.org/mozilla-central/rev/eace920a0372051a11d8d275bd9b8f14f3024ecd/layout/generic/nsFrame.cpp#786
we should be able to MOZ_RELEASE_ASSERT(!mFrame) instead, no?
(In reply to Mats Palmgren (:mats) from comment #2)
> If nsView::mFrame is non-null in ~nsView() then I don't think we can assume
> anything about its liveness, or even if the shell is alive, so this might
> introduce an UAF?

If we make the following assumptions:
1) nsFrame::DestroyFrom is called for every frame before it goes away.
2) for a given nsView* v, if v->mFrame is not null then v->mFrame->GetView() == v.
Then we can assume that nsView::mFrame is always a valid pointer (or null) as long as it was valid when we set it. No?

> Given that we have this:
> http://searchfox.org/mozilla-central/rev/
> eace920a0372051a11d8d275bd9b8f14f3024ecd/layout/generic/nsFrame.cpp#786
> we should be able to MOZ_RELEASE_ASSERT(!mFrame) instead, no?

That handles the case where the frame goes away before the view. But what if the view goes away before the frame? Non-root views are effectively owned by the frame, so their lifetime should be strictly a subset of the lifetime of the associated frame. But root views are special in this regard. For example, the root view exists before the root frame. It is created here

https://dxr.mozilla.org/mozilla-central/rev/2a3ecdb7d1ea814708021fee6735b3aedcf03e48/layout/base/nsDocumentViewer.cpp#2457

and then later when the root frame is constructed it hooks into the root view.

https://dxr.mozilla.org/mozilla-central/rev/2a3ecdb7d1ea814708021fee6735b3aedcf03e48/layout/base/nsCSSFrameConstructor.cpp#2737
(In reply to Timothy Nikkel (:tnikkel) from comment #3)
> If we make the following assumptions:
> 1) nsFrame::DestroyFrom is called for every frame before it goes away.

Yes, I'm quite confident that's true today, but we've had bugs in the past
where frames leaked and weren't Destroyed properly.

> 2) for a given nsView* v, if v->mFrame is not null then v->mFrame->GetView()
> == v.

That should also hold, but I'm slightly less confident about this one.
(just my gut feeling, fwiw)

> Then we can assume that nsView::mFrame is always a valid pointer (or null)
> as long as it was valid when we set it. No?

Well, we have the "detached view" business for nsSubDocumentFrames...
which I suspect is buggy (or at least trips some other bug) based on
crash reports.

But yes, as long as the nsView is alive then the above should hold.
If nsView::Destroy() is called twice on the same view, then not so much.

> > we should be able to MOZ_RELEASE_ASSERT(!mFrame) instead, no?
> 
> That handles the case where the frame goes away before the view. But what if
> the view goes away before the frame?

I think it does the opposite of what you said - it detects the case where
the view goes away before the frame.  If the frame is Destroy'ed first,
then mFrame is null due to the line I linked above.

Presumably you want to add this to catch some bug and in that case I'm
a bit nervous about what mFrame might point to given that nsView *isn't*
allocated from the pres arena.  We do poison it in nsView::Destroy() but
that only lasts until the memory is reallocated for something else.

I don't think we should support calling nsView::Destroy() before at least
unhooking its frame, so I think it's better to try to catch any such
instances with a MOZ_RELEASE_ASSERT(!mFrame) rather than try to handle it.
It would also safely crash any attempt to double-Destroy a view (assuming
mFrame is some non-null value).
We could also add assertions in nsView::SetFrame (mFrame should be null)
and in nsIFrame::SetView (that NS_FRAME_HAS_VIEW isn't set).
Okay, let's assert its null and go from there.
Attachment #8856283 - Attachment is obsolete: true
Attachment #8856283 - Flags: review?(mats)
Attachment #8858744 - Flags: review?(mats)
Attachment #8858744 - Flags: review?(mats) → review+
Pushed by tnikkel@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2c6ecadd506b
Assert that the frame pointer is null in the nsView destructor. r=mats
https://hg.mozilla.org/mozilla-central/rev/2c6ecadd506b
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Summary: clear view pointer of frame in view destructor → assert frame pointer is null in view destructor
We had two crashes on this assert already
https://crash-stats.mozilla.com/report/index/f92762ae-89e0-4683-bf97-53c310170420
https://crash-stats.mozilla.com/report/index/1be00e09-ad17-4466-9ecd-700440170420

Looks like two different users (at least different processors). One of them appears to have no addons installed except ones that come with an fresh install. Uptime was 50 minutes and 2 hours.

Both of them are from the nsDocumentViewer destructor. We are destroying the view manager and we start by destroying the root view, which destroys its children. We assert when we are destroying a child of the root view. The root view could also have a non-null frame, we just hit the fatal assert for a child view first before getting to the assert for the root view.

If the nsDocumentViewer has a presshell then it should be calling destroy on that presshell, which should be destroying the root frame, which should destroy all the frames, which should not only be clearing the frame pointer on all the views, it should be destroying the views, and destroying the root view would clear the mRootView pointer on the viewmanager.

I'll back this out since it won't provide any more useful data. And then think about how to figure out what is happening here.
back out
https://hg.mozilla.org/integration/mozilla-inbound/rev/5ba7199ef8388481eff4f519c84ec2f61619bd01
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla55 → ---
This crash is slight different
https://crash-stats.mozilla.com/report/index/1e2823ef-b727-40ad-ba3c-8b4610170421
We're flushing in some presshell, and we call nsHideViewer::Run() for a subdocument frame. That calls nsDocumentViewer::Hide which also destroys the presshell first, and then drops its viewmanager reference. This time we hit the assert on the root view though.
Hmm, actually there is something interesting about the crashes coming from ~nsDocumentViewer. The release of the viewmanager that triggers its destruction is from ~nsDocumentViewer, not from nsDocumentViewer::Destroy which also clears the mViewManager pointer. So if we called Destroy (and it completed successfully) then the view manager would already be dead here. This is assuming that the stack is accurate; nsDocumentViewer::Destroy seems far too big to inline, and it's called from many more places.

If Destroy didn't complete successfully then it must have hit one of the two printing related early exits at the start of the function. If we didn't call Destroy from ~nsDocumentViewer (because there was no presshell) then we must have called destroy on the presshell before this (we must have had a presshell because we have frames). I checked all the places where mPresShell is changed to/from null and it looks like they all destroy the presshell.

The one crash coming from nsDocumentViewer::Hide must definitely call destroy on the presshell because we null check the presshell, and call DestroyPresShell directly. So that one doesn't have many leads to go on.
Attachment #8858744 - Attachment description: patch v2 → assert frame is null in view destructor
None of this hit on a full try run. And are the only ways I could see the observed asserts happening.
Attachment #8862322 - Flags: review?(mats)
Comment on attachment 8862320 [details] [diff] [review]
annotate when we hit the assert if it's print related

Could use a better commit message :-)
Attachment #8862320 - Flags: review?(mats) → review+
Comment on attachment 8862322 [details] [diff] [review]
add a bunch of asserts for document teardown

Could use a better commit message :-)

>layout/base/PresShell.cpp
> PresShell::Destroy()
>+      (!mViewManager->GetRootView()->GetFrame() && !mViewManager->GetRootView()->GetFirstChild()));

wrap after the && please

>layout/base/nsDocumentViewer.cpp
>+  bool mPresShellDestroyed;
>+  bool mDestroyWasFull;

Perhaps we should add "// use only for assertions" on these?

>+  bool vmThinksWeHaveARootFrame =
>+    mViewManager && mViewManager->GetRootView() && mViewManager->GetRootView()->GetFrame();
>+  bool psThinksWeHaveARootFrame = mPresShell && mPresShell->GetRootFrame();
>+  MOZ_RELEASE_ASSERT(vmThinksWeHaveARootFrame == psThinksWeHaveARootFrame);

Can't we make this stronger by asserting that 
mViewManager->GetRootView()->GetFrame() == mPresShell->GetRootFrame() ?

>   NS_ASSERTION(!mPresShell && !mPresContext,
>                "User did not call nsIContentViewer::Destroy");
>+  MOZ_RELEASE_ASSERT(!mPresShell && !mPresContext);

Does this really hold?  We have bug 827183 open on that assertion.

>+  MOZ_RELEASE_ASSERT(!mViewManager || !mViewManager->GetRootView() ||
>+    (!mViewManager->GetRootView()->GetFrame() && !mViewManager->GetRootView()->GetFirstChild()));

wrap after the && please

> nsDocumentViewer::DestroyPresShell()
> {
>+  bool vmThinksWeHaveARootFrame =
>+    mViewManager && mViewManager->GetRootView() && mViewManager->GetRootView()->GetFrame();
>+  bool psThinksWeHaveARootFrame = mPresShell && mPresShell->GetRootFrame();
>+  MOZ_RELEASE_ASSERT(vmThinksWeHaveARootFrame == psThinksWeHaveARootFrame);

As noted above, can we make this stronger too?

>+  MOZ_RELEASE_ASSERT(!mViewManager || !mViewManager->GetRootView() ||
>+    (!mViewManager->GetRootView()->GetFrame() && !mViewManager->GetRootView()->GetFirstChild()));

wrap after the && please

r+ on the condition that bug 827183 is already fixed somehow
Flags: needinfo?(tnikkel)
Attachment #8862322 - Flags: review?(mats) → review+
(In reply to Mats Palmgren (:mats) from comment #17)
> >+  MOZ_RELEASE_ASSERT(!mPresShell && !mPresContext);
> 
> Does this really hold?  We have bug 827183 open on that assertion.

Ah! I didn't know about that bug. We don't hit the assert on a full platform full try run, but I will try to reproduce with the testcases in that bug. I might remove that particular assert before landing even if I'm not able to reproduce since we know it was possible to reproduce at one time. If the other asserts turn up nothing we can return to this. But first I'll look at bug 827183.
Flags: needinfo?(tnikkel)
(In reply to Timothy Nikkel (:tnikkel) from comment #18)
> (In reply to Mats Palmgren (:mats) from comment #17)
> > >+  MOZ_RELEASE_ASSERT(!mPresShell && !mPresContext);
> > 
> > Does this really hold?  We have bug 827183 open on that assertion.
> 
> Ah! I didn't know about that bug. We don't hit the assert on a full platform
> full try run, but I will try to reproduce with the testcases in that bug. I
> might remove that particular assert before landing even if I'm not able to
> reproduce since we know it was possible to reproduce at one time. If the
> other asserts turn up nothing we can return to this. But first I'll look at
> bug 827183.

This does indeed still reproduce. We should fix that! I'm guessing it could be causing some of these crashes.

Also, sorry about the lack of commit msg, I just wanted to get it up for review last night.
Keywords: leave-open
(In reply to Mats Palmgren (:mats) from comment #17)
> Can't we make this stronger by asserting that 
> mViewManager->GetRootView()->GetFrame() == mPresShell->GetRootFrame() ?

Yes! That's much nicer.

Changed the patch according to the rest of your review comments.
Pushed by tnikkel@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/497dbf087ad5
Add annotation if the view is for a printing document if the view has a frame in its destructor. r=mats
(In reply to Pulsebot from comment #21)
> Pushed by tnikkel@gmail.com:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/497dbf087ad5
> Add annotation if the view is for a printing document if the view has a
> frame in its destructor. r=mats

Just gonna land this patch for now because the other asserts (if we hit them) would interfere with getting proper data from this patch.
(In reply to Timothy Nikkel (:tnikkel) from comment #19)
> > Ah! I didn't know about that bug. We don't hit the assert on a full platform
> > full try run, but I will try to reproduce with the testcases in that bug. I
> > might remove that particular assert before landing even if I'm not able to
> > reproduce since we know it was possible to reproduce at one time. If the
> > other asserts turn up nothing we can return to this. But first I'll look at
> > bug 827183.
> 
> This does indeed still reproduce. We should fix that! I'm guessing it could
> be causing some of these crashes.

Hmm, probably not. It seems we handle this okay.
One crash so far and it was marked as print related
https://crash-stats.mozilla.com/report/index/bae3c446-b330-4978-a341-569c90170515
This one has nsPagePrintTimer on the crash stack, and the annotation has printRelated as true as expected
https://crash-stats.mozilla.com/report/index/ccf5f861-82b2-4fc4-b5b1-0f4210170515
https://crash-stats.mozilla.com/report/index/4da1b1a1-f20a-4c79-8ed6-2d2990170601

printRelated is true. This one happens in a build with bug 1354443 (the first such crash), I was hoping that bug would fix this.
Pushed by tnikkel@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/43af422173e3
Add some asserts for document viewer teardown. r=mats
in ~nsDocumentViewer the Destroy call does not full complete in this crash, the only two reasons for that are print related.
https://crash-stats.mozilla.com/report/index/67c82ae9-2fb4-41c9-baa3-ace410170615
This one is interesting
https://crash-stats.mozilla.com/report/index/d1a881f7-a92b-4bf3-a4b2-105200170618

We hit the assert in the nsView destructor that we have no child views (of a root view). But we will look we went down the full destroy path in the document viewer, so it should have torn down all child views when it destroyed the frame tree. None of the other asserts are hit before we get here, which is the interesting part.
Depends on: 1375440
Comment on attachment 8880547 [details] [diff] [review]
backed_out_changeset_43af422173e3

Okay, let's back this out of beta. We're not going to get any more information by leaving this in but we will cause crashes. I still want to know how we hit the assert in comment 32 without triggering any other asserts but I don't have any good plan of attack for that now.

Approval Request Comment
[Feature/Bug causing the regression]: back out of diagnostic asserts
[User impact if declined]: crashes when the diagnostic asserts hit
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: n/a
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: removal of diagnostic asserts
[String changes made/needed]: none
Attachment #8880547 - Flags: approval-mozilla-beta?
This data point could be interesting:
(62.07% in signature vs 00.14% overall) GFX_ERROR "ClientLayerManager::BeginTransaction with IPC channel down. GPU process may have died." = true [100.0% vs 45.45% if adapter_driver_version = 10.18.10.3740]
Comment on attachment 8880547 [details] [diff] [review]
backed_out_changeset_43af422173e3

backout some asserts that fire on beta, let's get this in 55.0b5
Attachment #8880547 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment on attachment 8880547 [details] [diff] [review]
backed_out_changeset_43af422173e3

Landed on Beta, but leaving the status unchanged since "fixed" would be a bit misleading :)
https://hg.mozilla.org/releases/mozilla-beta/rev/f018f0849a04
Attachment #8880547 - Flags: checkin+
MOZ_RELEASE_ASSERT(!GetFirstChild()):

bp-870249c7-900d-4c15-b5ff-01adb0170625

Randomly happened, no STR, stylo enabled.
See Also: → 1379159
We're not learning anything from further instances of hitting these asserts lets remove them. And we're hitting them fairly often (bug 1379159).

Approval Request Comment
[Feature/Bug causing the regression]: back out of diagnostic asserts
[User impact if declined]: crashes when the diagnostic asserts hit
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: n/a
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: removal of diagnostic asserts
[String changes made/needed]: none
Attachment #8884398 - Flags: approval-mozilla-beta?
Comment on attachment 8884398 [details] [diff] [review]
backed_out_changeset_497dbf087ad5

remove assertions, beta55+
Attachment #8884398 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Component: Layout: View Rendering → Layout: Web Painting

The leave-open keyword is there and there is no activity for 6 months.
:tnikkel, maybe it's time to close this bug?

Flags: needinfo?(tnikkel)
Pushed by tnikkel@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/14009f721da2
Assert that the frame pointer is null in the nsView destructor. r=mats

Thanks for the reminder. We've fixed some bugs in this area, maybe we can land this now. We'll have to watch crash stats after this lands.

Flags: needinfo?(tnikkel)

The leave-open keyword is there and there is no activity for 6 months.
:tnikkel, maybe it's time to close this bug?

Flags: needinfo?(tnikkel)

The leave-open keyword is there and there is no activity for 6 months.
:tnikkel, maybe it's time to close this bug?

Flags: needinfo?(tnikkel)

The leave-open keyword is there and there is no activity for 6 months.
:tnikkel, maybe it's time to close this bug?

Flags: needinfo?(tnikkel)

I guess we don't see these crashes anymore?

Status: REOPENED → RESOLVED
Closed: 7 years ago4 years ago
Flags: needinfo?(tnikkel)
Resolution: --- → FIXED
Keywords: leave-open
You need to log in before you can comment on or make changes to this bug.