Closed
Bug 1354933
Opened 8 years ago
Closed 4 years ago
assert frame pointer is null in view destructor
Categories
(Core :: Web Painting, defect)
Core
Web Painting
Tracking
()
RESOLVED
FIXED
People
(Reporter: tnikkel, Assigned: tnikkel)
References
Details
Attachments
(5 files, 1 obsolete file)
707 bytes,
patch
|
MatsPalmgren_bugz
:
review+
|
Details | Diff | Splinter Review |
11.94 KB,
patch
|
MatsPalmgren_bugz
:
review+
|
Details | Diff | Splinter Review |
7.14 KB,
patch
|
MatsPalmgren_bugz
:
review+
|
Details | Diff | Splinter Review |
7.18 KB,
patch
|
jcristau
:
approval-mozilla-beta+
RyanVM
:
checkin+
|
Details | Diff | Splinter Review |
12.25 KB,
patch
|
jcristau
:
approval-mozilla-beta+
RyanVM
:
checkin+
|
Details | Diff | Splinter Review |
No description provided.
Assignee | ||
Comment 1•8 years ago
|
||
We shouldn't be hitting this, but just in case.
Attachment #8856283 -
Flags: review?(mats)
Comment 2•8 years ago
|
||
If nsView::mFrame is non-null in ~nsView() then I don't think we can assume
anything about its liveness, or even if the shell is alive, so this might
introduce an UAF?
Given that we have this:
http://searchfox.org/mozilla-central/rev/eace920a0372051a11d8d275bd9b8f14f3024ecd/layout/generic/nsFrame.cpp#786
we should be able to MOZ_RELEASE_ASSERT(!mFrame) instead, no?
Assignee | ||
Comment 3•8 years ago
|
||
(In reply to Mats Palmgren (:mats) from comment #2)
> If nsView::mFrame is non-null in ~nsView() then I don't think we can assume
> anything about its liveness, or even if the shell is alive, so this might
> introduce an UAF?
If we make the following assumptions:
1) nsFrame::DestroyFrom is called for every frame before it goes away.
2) for a given nsView* v, if v->mFrame is not null then v->mFrame->GetView() == v.
Then we can assume that nsView::mFrame is always a valid pointer (or null) as long as it was valid when we set it. No?
> Given that we have this:
> http://searchfox.org/mozilla-central/rev/
> eace920a0372051a11d8d275bd9b8f14f3024ecd/layout/generic/nsFrame.cpp#786
> we should be able to MOZ_RELEASE_ASSERT(!mFrame) instead, no?
That handles the case where the frame goes away before the view. But what if the view goes away before the frame? Non-root views are effectively owned by the frame, so their lifetime should be strictly a subset of the lifetime of the associated frame. But root views are special in this regard. For example, the root view exists before the root frame. It is created here
https://dxr.mozilla.org/mozilla-central/rev/2a3ecdb7d1ea814708021fee6735b3aedcf03e48/layout/base/nsDocumentViewer.cpp#2457
and then later when the root frame is constructed it hooks into the root view.
https://dxr.mozilla.org/mozilla-central/rev/2a3ecdb7d1ea814708021fee6735b3aedcf03e48/layout/base/nsCSSFrameConstructor.cpp#2737
Comment 4•8 years ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #3)
> If we make the following assumptions:
> 1) nsFrame::DestroyFrom is called for every frame before it goes away.
Yes, I'm quite confident that's true today, but we've had bugs in the past
where frames leaked and weren't Destroyed properly.
> 2) for a given nsView* v, if v->mFrame is not null then v->mFrame->GetView()
> == v.
That should also hold, but I'm slightly less confident about this one.
(just my gut feeling, fwiw)
> Then we can assume that nsView::mFrame is always a valid pointer (or null)
> as long as it was valid when we set it. No?
Well, we have the "detached view" business for nsSubDocumentFrames...
which I suspect is buggy (or at least trips some other bug) based on
crash reports.
But yes, as long as the nsView is alive then the above should hold.
If nsView::Destroy() is called twice on the same view, then not so much.
> > we should be able to MOZ_RELEASE_ASSERT(!mFrame) instead, no?
>
> That handles the case where the frame goes away before the view. But what if
> the view goes away before the frame?
I think it does the opposite of what you said - it detects the case where
the view goes away before the frame. If the frame is Destroy'ed first,
then mFrame is null due to the line I linked above.
Presumably you want to add this to catch some bug and in that case I'm
a bit nervous about what mFrame might point to given that nsView *isn't*
allocated from the pres arena. We do poison it in nsView::Destroy() but
that only lasts until the memory is reallocated for something else.
I don't think we should support calling nsView::Destroy() before at least
unhooking its frame, so I think it's better to try to catch any such
instances with a MOZ_RELEASE_ASSERT(!mFrame) rather than try to handle it.
It would also safely crash any attempt to double-Destroy a view (assuming
mFrame is some non-null value).
Comment 5•8 years ago
|
||
We could also add assertions in nsView::SetFrame (mFrame should be null)
and in nsIFrame::SetView (that NS_FRAME_HAS_VIEW isn't set).
Assignee | ||
Comment 6•8 years ago
|
||
Okay, let's assert its null and go from there.
Attachment #8856283 -
Attachment is obsolete: true
Attachment #8856283 -
Flags: review?(mats)
Attachment #8858744 -
Flags: review?(mats)
Updated•8 years ago
|
Attachment #8858744 -
Flags: review?(mats) → review+
Pushed by tnikkel@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2c6ecadd506b
Assert that the frame pointer is null in the nsView destructor. r=mats
Comment 8•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Assignee | ||
Updated•8 years ago
|
Summary: clear view pointer of frame in view destructor → assert frame pointer is null in view destructor
Assignee | ||
Comment 9•8 years ago
|
||
We had two crashes on this assert already
https://crash-stats.mozilla.com/report/index/f92762ae-89e0-4683-bf97-53c310170420
https://crash-stats.mozilla.com/report/index/1be00e09-ad17-4466-9ecd-700440170420
Looks like two different users (at least different processors). One of them appears to have no addons installed except ones that come with an fresh install. Uptime was 50 minutes and 2 hours.
Both of them are from the nsDocumentViewer destructor. We are destroying the view manager and we start by destroying the root view, which destroys its children. We assert when we are destroying a child of the root view. The root view could also have a non-null frame, we just hit the fatal assert for a child view first before getting to the assert for the root view.
If the nsDocumentViewer has a presshell then it should be calling destroy on that presshell, which should be destroying the root frame, which should destroy all the frames, which should not only be clearing the frame pointer on all the views, it should be destroying the views, and destroying the root view would clear the mRootView pointer on the viewmanager.
I'll back this out since it won't provide any more useful data. And then think about how to figure out what is happening here.
Assignee | ||
Comment 10•8 years ago
|
||
back out
https://hg.mozilla.org/integration/mozilla-inbound/rev/5ba7199ef8388481eff4f519c84ec2f61619bd01
Status: RESOLVED → REOPENED
status-firefox55:
fixed → ---
Resolution: FIXED → ---
Target Milestone: mozilla55 → ---
Assignee | ||
Comment 11•8 years ago
|
||
This crash is slight different
https://crash-stats.mozilla.com/report/index/1e2823ef-b727-40ad-ba3c-8b4610170421
We're flushing in some presshell, and we call nsHideViewer::Run() for a subdocument frame. That calls nsDocumentViewer::Hide which also destroys the presshell first, and then drops its viewmanager reference. This time we hit the assert on the root view though.
Assignee | ||
Comment 12•8 years ago
|
||
back out made it to central
https://hg.mozilla.org/mozilla-central/rev/5ba7199ef838
Assignee | ||
Comment 13•8 years ago
|
||
Hmm, actually there is something interesting about the crashes coming from ~nsDocumentViewer. The release of the viewmanager that triggers its destruction is from ~nsDocumentViewer, not from nsDocumentViewer::Destroy which also clears the mViewManager pointer. So if we called Destroy (and it completed successfully) then the view manager would already be dead here. This is assuming that the stack is accurate; nsDocumentViewer::Destroy seems far too big to inline, and it's called from many more places.
If Destroy didn't complete successfully then it must have hit one of the two printing related early exits at the start of the function. If we didn't call Destroy from ~nsDocumentViewer (because there was no presshell) then we must have called destroy on the presshell before this (we must have had a presshell because we have frames). I checked all the places where mPresShell is changed to/from null and it looks like they all destroy the presshell.
The one crash coming from nsDocumentViewer::Hide must definitely call destroy on the presshell because we null check the presshell, and call DestroyPresShell directly. So that one doesn't have many leads to go on.
Assignee | ||
Updated•8 years ago
|
Attachment #8858744 -
Attachment description: patch v2 → assert frame is null in view destructor
Assignee | ||
Comment 14•8 years ago
|
||
Attachment #8862320 -
Flags: review?(mats)
Assignee | ||
Comment 15•8 years ago
|
||
None of this hit on a full try run. And are the only ways I could see the observed asserts happening.
Attachment #8862322 -
Flags: review?(mats)
Comment 16•8 years ago
|
||
Comment on attachment 8862320 [details] [diff] [review]
annotate when we hit the assert if it's print related
Could use a better commit message :-)
Attachment #8862320 -
Flags: review?(mats) → review+
Comment 17•8 years ago
|
||
Comment on attachment 8862322 [details] [diff] [review]
add a bunch of asserts for document teardown
Could use a better commit message :-)
>layout/base/PresShell.cpp
> PresShell::Destroy()
>+ (!mViewManager->GetRootView()->GetFrame() && !mViewManager->GetRootView()->GetFirstChild()));
wrap after the && please
>layout/base/nsDocumentViewer.cpp
>+ bool mPresShellDestroyed;
>+ bool mDestroyWasFull;
Perhaps we should add "// use only for assertions" on these?
>+ bool vmThinksWeHaveARootFrame =
>+ mViewManager && mViewManager->GetRootView() && mViewManager->GetRootView()->GetFrame();
>+ bool psThinksWeHaveARootFrame = mPresShell && mPresShell->GetRootFrame();
>+ MOZ_RELEASE_ASSERT(vmThinksWeHaveARootFrame == psThinksWeHaveARootFrame);
Can't we make this stronger by asserting that
mViewManager->GetRootView()->GetFrame() == mPresShell->GetRootFrame() ?
> NS_ASSERTION(!mPresShell && !mPresContext,
> "User did not call nsIContentViewer::Destroy");
>+ MOZ_RELEASE_ASSERT(!mPresShell && !mPresContext);
Does this really hold? We have bug 827183 open on that assertion.
>+ MOZ_RELEASE_ASSERT(!mViewManager || !mViewManager->GetRootView() ||
>+ (!mViewManager->GetRootView()->GetFrame() && !mViewManager->GetRootView()->GetFirstChild()));
wrap after the && please
> nsDocumentViewer::DestroyPresShell()
> {
>+ bool vmThinksWeHaveARootFrame =
>+ mViewManager && mViewManager->GetRootView() && mViewManager->GetRootView()->GetFrame();
>+ bool psThinksWeHaveARootFrame = mPresShell && mPresShell->GetRootFrame();
>+ MOZ_RELEASE_ASSERT(vmThinksWeHaveARootFrame == psThinksWeHaveARootFrame);
As noted above, can we make this stronger too?
>+ MOZ_RELEASE_ASSERT(!mViewManager || !mViewManager->GetRootView() ||
>+ (!mViewManager->GetRootView()->GetFrame() && !mViewManager->GetRootView()->GetFirstChild()));
wrap after the && please
r+ on the condition that bug 827183 is already fixed somehow
Flags: needinfo?(tnikkel)
Attachment #8862322 -
Flags: review?(mats) → review+
Assignee | ||
Comment 18•8 years ago
|
||
(In reply to Mats Palmgren (:mats) from comment #17)
> >+ MOZ_RELEASE_ASSERT(!mPresShell && !mPresContext);
>
> Does this really hold? We have bug 827183 open on that assertion.
Ah! I didn't know about that bug. We don't hit the assert on a full platform full try run, but I will try to reproduce with the testcases in that bug. I might remove that particular assert before landing even if I'm not able to reproduce since we know it was possible to reproduce at one time. If the other asserts turn up nothing we can return to this. But first I'll look at bug 827183.
Flags: needinfo?(tnikkel)
Assignee | ||
Comment 19•8 years ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #18)
> (In reply to Mats Palmgren (:mats) from comment #17)
> > >+ MOZ_RELEASE_ASSERT(!mPresShell && !mPresContext);
> >
> > Does this really hold? We have bug 827183 open on that assertion.
>
> Ah! I didn't know about that bug. We don't hit the assert on a full platform
> full try run, but I will try to reproduce with the testcases in that bug. I
> might remove that particular assert before landing even if I'm not able to
> reproduce since we know it was possible to reproduce at one time. If the
> other asserts turn up nothing we can return to this. But first I'll look at
> bug 827183.
This does indeed still reproduce. We should fix that! I'm guessing it could be causing some of these crashes.
Also, sorry about the lack of commit msg, I just wanted to get it up for review last night.
Assignee | ||
Updated•8 years ago
|
Keywords: leave-open
Assignee | ||
Comment 20•8 years ago
|
||
(In reply to Mats Palmgren (:mats) from comment #17)
> Can't we make this stronger by asserting that
> mViewManager->GetRootView()->GetFrame() == mPresShell->GetRootFrame() ?
Yes! That's much nicer.
Changed the patch according to the rest of your review comments.
Comment 21•8 years ago
|
||
Pushed by tnikkel@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/497dbf087ad5
Add annotation if the view is for a printing document if the view has a frame in its destructor. r=mats
Assignee | ||
Comment 22•8 years ago
|
||
(In reply to Pulsebot from comment #21)
> Pushed by tnikkel@gmail.com:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/497dbf087ad5
> Add annotation if the view is for a printing document if the view has a
> frame in its destructor. r=mats
Just gonna land this patch for now because the other asserts (if we hit them) would interfere with getting proper data from this patch.
Assignee | ||
Comment 23•8 years ago
|
||
(In reply to Timothy Nikkel (:tnikkel) from comment #19)
> > Ah! I didn't know about that bug. We don't hit the assert on a full platform
> > full try run, but I will try to reproduce with the testcases in that bug. I
> > might remove that particular assert before landing even if I'm not able to
> > reproduce since we know it was possible to reproduce at one time. If the
> > other asserts turn up nothing we can return to this. But first I'll look at
> > bug 827183.
>
> This does indeed still reproduce. We should fix that! I'm guessing it could
> be causing some of these crashes.
Hmm, probably not. It seems we handle this okay.
Comment 24•8 years ago
|
||
bugherder |
Assignee | ||
Comment 25•8 years ago
|
||
One crash so far and it was marked as print related
https://crash-stats.mozilla.com/report/index/bae3c446-b330-4978-a341-569c90170515
Assignee | ||
Comment 26•8 years ago
|
||
This one has nsPagePrintTimer on the crash stack, and the annotation has printRelated as true as expected
https://crash-stats.mozilla.com/report/index/ccf5f861-82b2-4fc4-b5b1-0f4210170515
Assignee | ||
Comment 28•8 years ago
|
||
https://crash-stats.mozilla.com/report/index/4da1b1a1-f20a-4c79-8ed6-2d2990170601
printRelated is true. This one happens in a build with bug 1354443 (the first such crash), I was hoping that bug would fix this.
Comment 29•8 years ago
|
||
Pushed by tnikkel@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/43af422173e3
Add some asserts for document viewer teardown. r=mats
Comment 30•8 years ago
|
||
bugherder |
Assignee | ||
Comment 31•8 years ago
|
||
in ~nsDocumentViewer the Destroy call does not full complete in this crash, the only two reasons for that are print related.
https://crash-stats.mozilla.com/report/index/67c82ae9-2fb4-41c9-baa3-ace410170615
Assignee | ||
Comment 32•8 years ago
|
||
This one is interesting
https://crash-stats.mozilla.com/report/index/d1a881f7-a92b-4bf3-a4b2-105200170618
We hit the assert in the nsView destructor that we have no child views (of a root view). But we will look we went down the full destroy path in the document viewer, so it should have torn down all child views when it destroyed the frame tree. None of the other asserts are hit before we get here, which is the interesting part.
Assignee | ||
Comment 33•8 years ago
|
||
Assignee | ||
Comment 34•8 years ago
|
||
Comment on attachment 8880547 [details] [diff] [review]
backed_out_changeset_43af422173e3
Okay, let's back this out of beta. We're not going to get any more information by leaving this in but we will cause crashes. I still want to know how we hit the assert in comment 32 without triggering any other asserts but I don't have any good plan of attack for that now.
Approval Request Comment
[Feature/Bug causing the regression]: back out of diagnostic asserts
[User impact if declined]: crashes when the diagnostic asserts hit
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: n/a
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: removal of diagnostic asserts
[String changes made/needed]: none
Attachment #8880547 -
Flags: approval-mozilla-beta?
Comment 35•8 years ago
|
||
This data point could be interesting:
(62.07% in signature vs 00.14% overall) GFX_ERROR "ClientLayerManager::BeginTransaction with IPC channel down. GPU process may have died." = true [100.0% vs 45.45% if adapter_driver_version = 10.18.10.3740]
Comment 36•8 years ago
|
||
Comment on attachment 8880547 [details] [diff] [review]
backed_out_changeset_43af422173e3
backout some asserts that fire on beta, let's get this in 55.0b5
Attachment #8880547 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 37•8 years ago
|
||
uplift |
Comment on attachment 8880547 [details] [diff] [review]
backed_out_changeset_43af422173e3
Landed on Beta, but leaving the status unchanged since "fixed" would be a bit misleading :)
https://hg.mozilla.org/releases/mozilla-beta/rev/f018f0849a04
Attachment #8880547 -
Flags: checkin+
Comment 38•8 years ago
|
||
MOZ_RELEASE_ASSERT(!GetFirstChild()):
bp-870249c7-900d-4c15-b5ff-01adb0170625
Randomly happened, no STR, stylo enabled.
Assignee | ||
Comment 39•8 years ago
|
||
We're not learning anything from further instances of hitting these asserts lets remove them. And we're hitting them fairly often (bug 1379159).
Approval Request Comment
[Feature/Bug causing the regression]: back out of diagnostic asserts
[User impact if declined]: crashes when the diagnostic asserts hit
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: n/a
[Needs manual test from QE? If yes, steps to reproduce]: no
[List of other uplifts needed for the feature/fix]: none
[Is the change risky?]: no
[Why is the change risky/not risky?]: removal of diagnostic asserts
[String changes made/needed]: none
Attachment #8884398 -
Flags: approval-mozilla-beta?
Comment 40•8 years ago
|
||
Comment on attachment 8884398 [details] [diff] [review]
backed_out_changeset_497dbf087ad5
remove assertions, beta55+
Attachment #8884398 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 41•8 years ago
|
||
uplift |
Comment on attachment 8884398 [details] [diff] [review]
backed_out_changeset_497dbf087ad5
https://hg.mozilla.org/releases/mozilla-beta/rev/e777857c44d2
Same story as comment 37.
Attachment #8884398 -
Flags: checkin+
Assignee | ||
Comment 42•7 years ago
|
||
Comment 43•7 years ago
|
||
backout bugherder |
Updated•6 years ago
|
Component: Layout: View Rendering → Layout: Web Painting
Comment 44•6 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:tnikkel, maybe it's time to close this bug?
Flags: needinfo?(tnikkel)
Comment 45•6 years ago
|
||
Pushed by tnikkel@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/14009f721da2
Assert that the frame pointer is null in the nsView destructor. r=mats
Assignee | ||
Comment 46•6 years ago
|
||
Thanks for the reminder. We've fixed some bugs in this area, maybe we can land this now. We'll have to watch crash stats after this lands.
Flags: needinfo?(tnikkel)
Comment 47•6 years ago
|
||
bugherder |
Comment 48•5 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:tnikkel, maybe it's time to close this bug?
Flags: needinfo?(tnikkel)
Comment 49•5 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:tnikkel, maybe it's time to close this bug?
Flags: needinfo?(tnikkel)
Comment 50•4 years ago
|
||
The leave-open keyword is there and there is no activity for 6 months.
:tnikkel, maybe it's time to close this bug?
Flags: needinfo?(tnikkel)
Assignee | ||
Comment 51•4 years ago
|
||
I guess we don't see these crashes anymore?
Status: REOPENED → RESOLVED
Closed: 8 years ago → 4 years ago
Flags: needinfo?(tnikkel)
Resolution: --- → FIXED
Assignee | ||
Updated•4 years ago
|
Keywords: leave-open
You need to log in
before you can comment on or make changes to this bug.
Description
•