Closed Bug 1261175 Opened 8 years ago Closed 7 years ago

crash in nsPresContext::GetRootPresContext or nsPresContext::GetParentPresContext (from nsRefreshDriver::IsWaitingForPaint())

Categories

(Core :: Layout, defect, P3)

x86
Windows NT
defect

Tracking

()

RESOLVED FIXED
mozilla59
Tracking Status
e10s + ---
firefox46 + wontfix
firefox47 + wontfix
firefox48 + wontfix
firefox49 + wontfix
firefox-esr45 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr52 59+ fixed
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 + fixed
firefox59 + fixed

People

(Reporter: bugs, Assigned: mattwoodrow)

References

(Depends on 1 open bug)

Details

(4 keywords, Whiteboard: [adv-esr52.7+][adv-main58+])

Crash Data

Attachments

(3 files, 1 obsolete file)

+++ This bug was initially created as a clone of Bug #1241217 +++

Based on bug 724355 comment 17, we have reason to believe that bug 1241217 has morphed into this one after fixing bug 1241651.

This crash is currently #35 in the e10s experiment crash list - 

https://crash-stats.mozilla.org/search/?ActiveExperiment=e10s-beta46-noapz%40experiments.mozilla.org&ActiveExperimentBranch=experiment-no-addons&_facets=signature&_columns=signature&_columns=product&_columns=build_id&_columns=platform#facet-signature

More crash reports can be found on bug 724355.
Whiteboard: [adv-main45+][adv-esr38.7+][post-critsmash-triage][fixed by bug 1241651]
[Tracking Requested - why for this release]: Potential UAF. Also e10s blocker.
Crash Signature: [@ nsPresContext::GetParentPresContext] → [@ nsPresContext::GetRootPresContext]
tracking-e10s: --- → ?
Depends on: 1260531
Bug 1260531 probably not related. mats debunked my theory.
No longer depends on: 1260531
In the long term I think we should simplify the nsPresContext hierarchy by simply
adding a mParent pointer and NOT going through the viewmanager / view tree since it
has proven to be rather fragile.  I made a patch for this in bug 984690 which seemed
to work at the time IIRC.  (I'm sorry I didn't follow through on that proposal).

That would also support another (neglected) goal which is to remove viewmanager/views.

I'm not sure it will fix these crashes, but unless someone has a better idea it might
be worth a shot...
(In reply to Mats Palmgren (:mats) from comment #3)
> In the long term I think we should simplify the nsPresContext hierarchy by
> simply
> adding a mParent pointer and NOT going through the viewmanager / view tree
> since it
> has proven to be rather fragile.  I made a patch for this in bug 984690
> which seemed
> to work at the time IIRC.  (I'm sorry I didn't follow through on that
> proposal).
> 
> That would also support another (neglected) goal which is to remove
> viewmanager/views.
> 
> I'm not sure it will fix these crashes, but unless someone has a better idea
> it might
> be worth a shot...

Mats, you are the chosen one! Please get 984690 rebased and tested. I like the idea of removing the dependency on the View system from the refresh driver.
Depends on: 984690
Posted a patch in bug 1261230 for another guess.

Also, am I crazy, or is nothing keeping the stashed views alive? I'll write a patch for this too (unless I'm missing something).
Group: core-security, core-security-release → layout-core-security
not top e10s crash, so e10s+ and P3
Priority: -- → P3
Flags: needinfo?(tnikkel)
Too late for 46 uplift, but since this is a sec-critical crash we may want to try to fix it before we ship e10s.   This crash signature is also showing up on 45.0.1. Brad are you sure this shouldn't block?
Flags: needinfo?(blassey.bugs)
We took a fix for bug 1261230 in FF47. Let's see if that affects the frequency of crashes with this signature.
Depends on: 1261230
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #7)
> Too late for 46 uplift, but since this is a sec-critical crash we may want
> to try to fix it before we ship e10s.   This crash signature is also showing
> up on 45.0.1. Brad are you sure this shouldn't block?

Did I say somewhere this shouldn't block? And what are you asking about blocking?
Flags: needinfo?(blassey.bugs)
We came across this today in the CritSmash triage, where we're trying to make sure sec-critical bugs get quick attention.  I was asking because this was marked as e10s:m9+ and then changed to e10s:+ which I interpreted as, "this used to block e10s going to release, but now it doesn't".  Rather than just throw it back into e10s:? for triage, it seemed faster to ask directly.  While a bug being rated sec-critical doesn't necessarily block e10s on release,  we wanted to double check that you considered the rating.
I don't think this was ever m9+. Jet nominated on 3/31 and I marked it + on 4/4. We're not taking security rating into account for e10s tracking purposes, under the assumption that they'll be handled with the normal security bug prioritization and tracking process.
A decent chance bug 1265577 fixes this.
Depends on: 1265577
Bug 1261230 appears to have landed in 47.0b4, and bug 1265577 in 47.0b5,
but we still crash in 47.0b5 with this signature:
bp-f6d7dbfb-eb64-49ef-ae99-2435f2160514
Shows up in 48.0a2 as well, with an e5e5 UAF crash:
https://crash-stats.mozilla.com/report/index/f1403679-2fe0-49f3-ac75-ee6482160511
Same call sequence as bug 1241217 as best I can tell, which means I think this is basically the same bug - i.e. per comment 0, i think it was never really fixed, perhaps just knocked down in frequency.  (I haven't delved that deeply into the specifics, but I saw a 48 crash (and a 47) in the first page of reports.  Most were from older versions (much), which implies the frequency has gone down).
Critsmash triage: can we get a patch for this? This is a sec-critical that is lurking about.
(In reply to Al Billings [:abillings] from comment #16)
> can we get a patch for this?

I spent a _HUGE_ amount of time trying to fix this bug recently. My work finally got uplifted last week, which finally gave us the chance to see if my work had any effect. Unfortunately it seems that it did not. I used up all my best guesses already in trying to get this fixed. I'll circle back to this and try again, but with ~1 month turn around time to be able to tell if any patch for this bug has any effect it is a very hard road to walk.

Anybody who has ideas for this is very welcome to come forward!
Marking fix optional for 49/50, and assuming this still affects 51. 
Jet, shall we just count this as sec-critical backlog or do you think it needs further investigation?
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #18)
> Jet, shall we just count this as sec-critical backlog or do you think it
> needs further investigation?

I think we should put it on the sec-critical backlog, _and_ continue to investigate. We want to see this fixed ASAP, though our attempts have not been conclusive. If a reproducible case comes up, this one goes right back on the front burner.
Flags: needinfo?(bugs)
Crash Signature: [@ nsPresContext::GetRootPresContext] → [@ nsPresContext::GetRootPresContext] [@ nsPresContext::GetRootPresContext()]
Still lots of crashes, mostly in 50b and 48.  (Oddly far fewer in 49, though non-zero)
Randell, if you still have the recent crash-stat query urls for this, could you paste them here.
Flags: needinfo?(rjesup)
I see a lot of crashes in 50, and a decent number in 51, but only a few in 52.
I wonder if we have several issues around here and if bug 1308459 helped in some case, but not in all.
We get a lot less usage in Aurora than beta and release, so lots in 50, fair number in 51, a few in 52 is  a pretty typical signature, no?
Crash Signature: [@ nsPresContext::GetRootPresContext] [@ nsPresContext::GetRootPresContext()] → [@ nsPresContext::GetRootPresContext] [@ nsPresContext::GetRootPresContext()] [@ nsPresContext::GetParentPresContext]
Summary: crash in nsPresContext::GetRootPresContext (from nsRefreshDriver::IsWaitingForPaint()) → crash in nsPresContext::GetRootPresContext or nsPresContext::GetParentPresContext (from nsRefreshDriver::IsWaitingForPaint())
The number or crashes for https://crash-stats.mozilla.com/signature/?signature=nsPresContext%3A%3AGetParentPresContext is low, but the number of crashes for https://crash-stats.mozilla.com/signature/?signature=nsPresContext%3A%3AGetRootPresContext is increasing slightly.

Jet, could somebody be assigned to this bug to investigate this as per comment 19?
Flags: needinfo?(bugs)
I just spent a bunch of time going through the latest crash-stats URLs and comments associated with the nsPresContext::GetRootPresContext crashes, hoping to find a way to reproduce. No luck. I did notice that there's an unusually high number of PDF related URLs. (By no means all - there are crashes of the Google homepage and Facebook homepage, for example.) There also seemed to be quite a high ratio of mentions of printing. Loading various URLs, navigating during/after load, navigating during/after printing to file, no crashes. :/

I'd also note that memory pressure does not seem to be an issue.
(In reply to Mats Palmgren (:mats) from comment #3)
> In the long term I think we should simplify the nsPresContext hierarchy by
> simply
> adding a mParent pointer and NOT going through the viewmanager / view tree

The stacks I'm seeing are stacks like:

https://crash-stats.mozilla.com/report/index/9f871728-4316-4009-a0f6-cfbde2161001

Those looks as likely to be caused by nsRefreshDriver's mPresContext raw pointer pointing at invalid memory as they are to be caused by the pc->GetParentPresContext() call in nsPresContext::GetRootPresContext() returning something invalid. Was there a reason to suspect the latter?
Flags: needinfo?(mats)
nsRefreshDriver::mPresContext is a mozilla::WeakPtr<nsPresContext>, not a raw pointer,
so when the nsPresContext is deleted this member should become null.  The only potential
problem I see with that is that mozilla::WeakPtr isn't thread safe:
http://searchfox.org/mozilla-central/source/mfbt/WeakPtr.h#17
but, as far as I know, destruction of nsPresContext and nsRefreshDriver::Tick are
only supposed to happen on the main thread, so that shouldn't be a problem.
Flags: needinfo?(mats)
(In reply to Jonathan Watt [:jwatt] from comment #30)
> (In reply to Mats Palmgren (:mats) from comment #3)
> > In the long term I think we should simplify the nsPresContext hierarchy by
> > simply
> > adding a mParent pointer and NOT going through the viewmanager / view tree
> 
> The stacks I'm seeing are stacks like:
> 
> https://crash-stats.mozilla.com/report/index/9f871728-4316-4009-a0f6-
> cfbde2161001

In addition to what Mats said, the crashing line in that report is inside a for loop, so we may have successfully iterated to a parent prescontext 1 or more times before encountering the crash, which would implicate one of the pointers we follow in GetParentPresContext being invalid.
(In reply to Mats Palmgren (:mats) from comment #31)
> nsRefreshDriver::mPresContext is a mozilla::WeakPtr<nsPresContext>, not a
> raw pointer

Ah, it looks like that's a somewhat recent change. If you follow the nsRefreshDriver.cpp link from the crash report I gave and then switch to the .h file (with the same rev), the pointer is a raw pointer.
(In reply to Jonathan Watt [:jwatt] from comment #33)
> (In reply to Mats Palmgren (:mats) from comment #31)
> > nsRefreshDriver::mPresContext is a mozilla::WeakPtr<nsPresContext>, not a
> > raw pointer
> 
> Ah, it looks like that's a somewhat recent change. If you follow the
> nsRefreshDriver.cpp link from the crash report I gave and then switch to the
> .h file (with the same rev), the pointer is a raw pointer.

The weak ptr change in 52, which is on beta right now, and we see plenty of crashes there, so it didn't fix it.
time is very short for beta/52.  Any chance of wallpaper, if not a full fix?
Flags: needinfo?(jwatt)
If I understand correctly, nobody has a solution to this right now that doesn't involve major refactoring.
Flags: needinfo?(jwatt)
Flags: needinfo?(tnikkel)
Flags: needinfo?(bugs)
(In reply to Jonathan Watt [:jwatt] from comment #36)
> If I understand correctly, nobody has a solution to this right now that
> doesn't involve major refactoring.

Sorry to bug you once again, Jet.
If this is truly a sec-critical (and I'm happy to be proven otherwise), we must consider the refactoring as an option.
Do you think you could discuss the required work in your team and come up with a rough time frame?
Flags: needinfo?(bugs)
Please don't cancel my needinfos (which I set and was asking myself for info).
Flags: needinfo?(tnikkel)
(In reply to Frederik Braun [:freddyb] from comment #37)
> (In reply to Jonathan Watt [:jwatt] from comment #36)
> > If I understand correctly, nobody has a solution to this right now that
> > doesn't involve major refactoring.
> 
> Sorry to bug you once again, Jet.
> If this is truly a sec-critical (and I'm happy to be proven otherwise), we
> must consider the refactoring as an option.
> Do you think you could discuss the required work in your team and come up
> with a rough time frame?

Our biggest issue continues to be a lack of reproducible steps. GetRootPresContext() is a *very* hot code path, and any changes here (refactoring, etc.) come with significant risk. We're willing to take on that risk if there's a reasonable chance that we'll actually fix this bug. That gets us back to reproducible steps :(

re: the sec-critical designation, this one should stay on the backlog since the crash signature is a potential UAF, but we can't reproduce. I think bug 984690 is still how we'd refactor this code and we may still do that independent of this bug. I would still love to get repro steps to be sure we actually fixed it.
Mass wontfix for bugs affecting firefox 52.
Still happening at volume; 1200+ crashes in the last week.
Would it be useful to get some URLs that have caused the crash? The volume seems quite high indeed.
Flags: needinfo?(rjesup)
Flags: needinfo?(rjesup)
The (few) comments still seem to involve printing often
(In reply to Randell Jesup [:jesup] from comment #43)
> The (few) comments still seem to involve printing often

Then bug 1354443 might be quite relevant.
I'll mark bug 1354443 as a dependency since it shouldn't be too hard to fix bug 1354443 and then we can see how many of these crashes remain after fixing it.
Depends on: 1354443
Looks like its dependent (bug 1354443) just got resolved. Does that turn this issue into something more actionable, Timothy?
Unfortunately it doesn't look like the fixes for bug 1354443 helped. There are lots of crashes still being reported for 55.0b13. (Fixing that bug also doesn't make this bug any more actionable.)
[Tracking Requested - why for this release]:
Because this is a perennial top-UAF crasher that hasn't gotten traction.

Poking this bug -- this is the #1 UAF crash (by a bunch) on 57/58:
https://crash-stats.mozilla.com/signature/?product=Firefox&version=58.0a1&version=57.0b&version=57.0a1&address=~e5e5&signature=nsPresContext%3A%3AGetRootPresContext&date=%3E%3D2017-10-02T14%3A00%3A00.000Z&date=%3C2017-10-09T14%3A00%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports
>250 crashes just in the last week on 57 and 58

only 4 of 266 crashes *don't* have IsWaitingForPaint() as the calling function:

https://crash-stats.mozilla.com/signature/?product=Firefox&version=58.0a1&version=57.0b&version=57.0a1&address=~e5e5&proto_signature=%21~IsWaitingForPaint&signature=nsPresContext%3A%3AGetRootPresContext&date=%3E%3D2017-10-02T14%3A00%3A00.000Z&date=%3C2017-10-09T14%3A00%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1

Those are called from APZCCallbackHelper::ApplyCallbackTransform() (3) and nsCSSFrameConstructor::BeginUpdate() (1)

Just an observation: like most frequent crashes, this has a weekend dip.  However, the dip seems much larger than most.  Might be a (small) clue.
Flags: needinfo?(milan)
So, a pres context that the refresh driver has as a weak pointer to, is going away while the refresh driver still tries to use it?  Timothy, anything we can do here to get more information or to wallpaper it?  The volume is not huge, so if we can get rid of UAF and replace it with a release assert or some such, it would be an improvement.
Flags: needinfo?(tnikkel)
Flags: needinfo?(milan)
Flags: needinfo?(tnikkel)
So this would say that the #1 and #3 UAFs in 57/58 both probably have the same root cause...
Attached image Struct offsets —
So, I think this is because nsIPresShell::mViewManager is dangling somewhere.

The crash address is 0xE5E5E5ED, while we're accessing [eax+8], and eax is 0xE5E5E5E5.

eax looks to be 'view' here, so that suggests that 'viewManager' is a valid address, but is deleted, so looking up mRootView (0x14) returned 0xE5E5E5E5.

That would suggest that 'shell' is valid, and points to valid memory, but mViewManager (0x14) is dangling.
Some follow-on thoughts:

nsViewManager::~nsViewManager - Should we runtime assert that mPresShell is null, rather than just setting it to nullptr. If it's not nullptr, the presshell will have a dangling pointer to us.

nsIPresShell::mViewManager - comment says docViewer owns it, but we don’t always have a doc viewer (nsPrintEngine!)

nsPrintEngine::ReflowPrintObject - make !mPresContext assertion a runtime one. If we’ve called this twice without destroying the presentation, then we’ll drop the reference to the nsViewManager without calling Destroy on the shell. Possible that the shell then outlives the viewmanager. Should we call DestroyPresentation just to be safe?

nsDocumentViewer::InitInternal calls MakeWindow, which writes to mViewManager. Again, if we already had one (and a pres shell), then this can destroy the old view manager without Destroy being called on the old pres shell). Could this be called multiple times? Looks like Init is exposed to script?

Tim, Mats, does that all seem sane?

Still not sure what the actual problem is, but if that is all correct, some runtime asserts could probably catch it earlier (and in a non-UAF way).
Flags: needinfo?(mats)
(In reply to Matt Woodrow (:mattwoodrow) from comment #52)
> nsIPresShell::mViewManager - comment says docViewer owns it, but we don’t
> always have a doc viewer (nsPrintEngine!)

The nsPrintObject would own it then, it looks like.

> nsDocumentViewer::InitInternal calls MakeWindow, which writes to
> mViewManager. Again, if we already had one (and a pres shell), then this can
> destroy the old view manager without Destroy being called on the old pres
> shell). Could this be called multiple times? Looks like Init is exposed to
> script?

I don't think Init is callable by script, nsIContentViewer::Init has [noscript] on it.

The other ones sound reasonable.

My only guess is one of the nsDocumentViewer or relevant functions in our printing code can re-enter and mess things up. We've had that problem before, and fixed the ones we found, but no guarantee we know about them all. I had some hacky patches to detect this that I resurrected. I'll see if they can be made landable.
(In reply to Timothy Nikkel (:tnikkel) from comment #53)
> (In reply to Matt Woodrow (:mattwoodrow) from comment #52)
> > nsIPresShell::mViewManager - comment says docViewer owns it, but we don’t
> > always have a doc viewer (nsPrintEngine!)
> 
> The nsPrintObject would own it then, it looks like.

Indeed, just a confusing comments for something with fairly complex ownership requirements.

> 
> > nsDocumentViewer::InitInternal calls MakeWindow, which writes to
> > mViewManager. Again, if we already had one (and a pres shell), then this can
> > destroy the old view manager without Destroy being called on the old pres
> > shell). Could this be called multiple times? Looks like Init is exposed to
> > script?
> 
> I don't think Init is callable by script, nsIContentViewer::Init has
> [noscript] on it.

Ah right, good catch. There's still quite a few c++ callers to Init, I didn't audit them all to see if re-entry was possible.
Attached patch viewmanager-assert — — Splinter Review
I think this should be sufficient (if I have the cause correct) to prevent UAF, and it'll give us a stack to the actual bug.
Attachment #8917584 - Flags: review?(tnikkel)
Attachment #8917584 - Flags: review?(tnikkel) → review+
Comment on attachment 8917584 [details] [diff] [review]
viewmanager-assert

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

Not easily. It'll probably make it easier to roughly figure out what triggers the bug (destroying the view manager before the pres shell), but you'd still have to audit the code to figure out how to make that happen (which we've failed to do thus far).

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

No.

Which older supported branches are affected by this flaw?

All.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

No, but they should be identical.

How likely is this patch to cause regressions; how much testing does it need?

Fairly unlikely, just adding a new assertion.
Attachment #8917584 - Flags: sec-approval?
Comment on attachment 8917584 [details] [diff] [review]
viewmanager-assert

sec-approval+ for trunk.
We'll want a patch nominated for Beta and ESR52 as well, to hopefully land after this is on trunk.
Attachment #8917584 - Flags: sec-approval? → sec-approval+
It's more than a diagnostic - it makes us preemptively crash, safely, to avoid the UAF (per comment 55).

Perhaps we should keep this bug focused on the UAF, for saner semantics of tracking flags as we uplift it, and file a followup on investigating & fixing the (anticipated) MOZ_RELEASE_ASSERT crashes?   mattwoodrow, what do you think?  If you agree, go ahead and close this as FIXED, I think (since the patch has been merged to central).
Flags: needinfo?(matt.woodrow)
The added assertion seems correct to me.
Flags: needinfo?(mats)
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(matt.woodrow)
Resolution: --- → FIXED
Comment on attachment 8917584 [details] [diff] [review]
viewmanager-assert

Approval Request Comment
[Feature/Bug causing the regression]: Unknown, long standing sec bug.
[User impact if declined]: Potentially exploitable UAF bug.
[Is this code covered by automated tests?]: No, we have no reproducible test cases yet.
[Has the fix been verified in Nightly?]: No, again, no known way to reproduce.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: Just adds a release assert to verify our ownership requirements, should turn the UAF crash into a safe assertion crash (along with the stack that will help us fix the underlying problem).
[String changes made/needed]: None.
Attachment #8917584 - Flags: approval-mozilla-esr52?
Attachment #8917584 - Flags: approval-mozilla-beta?
Assignee: nobody → matt.woodrow
Target Milestone: --- → mozilla58
Comment on attachment 8917584 [details] [diff] [review]
viewmanager-assert

Sec-high, Beta57+, ESR52.5+
Attachment #8917584 - Flags: approval-mozilla-esr52?
Attachment #8917584 - Flags: approval-mozilla-esr52+
Attachment #8917584 - Flags: approval-mozilla-beta?
Attachment #8917584 - Flags: approval-mozilla-beta+
Group: layout-core-security → core-security-release
FYI, there are already quite a few crashes (77) on trunk hitting this assert, e.g.
bp-e757c47e-ee7c-421e-8d3a-b6bc90171018

Should we open a new bug to handle that or reopen this one?
Flags: needinfo?(matt.woodrow)
http://searchfox.org/mozilla-central/source/layout/base/nsDocumentViewer.cpp#1636

That code is wrong isn't it?

We call Destroy() from the nsDocumentViewer dtor, but if we take that early return then we don't call DestroyPresShell().
Flags: needinfo?(matt.woodrow)
the [@ nsViewManager::~nsViewManager] signature is the #6 top content crash in 57.0b10
(In reply to Matt Woodrow (:mattwoodrow) from comment #66)
> http://searchfox.org/mozilla-central/source/layout/base/nsDocumentViewer.
> cpp#1636
> 
> That code is wrong isn't it?
> 
> We call Destroy() from the nsDocumentViewer dtor, but if we take that early
> return then we don't call DestroyPresShell().

Yeah, pretty much, same with the next early return after that.
Crash Signature: [@ nsPresContext::GetRootPresContext] [@ nsPresContext::GetRootPresContext()] [@ nsPresContext::GetParentPresContext] → [@ nsPresContext::GetRootPresContext] [@ nsPresContext::GetRootPresContext()] [@ nsPresContext::GetParentPresContext] [@ nsViewManager::~nsViewManager]
Attached patch printing-fix (obsolete) — — Splinter Review
I still haven't found a way to reproduce this, so somewhat guessing here.

I *think* the destroy refcount early return is ok, since it looks like we manually call Destroy in this cases where we increment it (as well as calling Destroy from the dtor). Added a release assert to catch the case where we hit the dtor with it still set though.
Attachment #8923998 - Flags: review?(tnikkel)
Attachment #8923998 - Flags: review?(jwatt)
Whiteboard: [adv-main57+][adv-esr52.5+]
Comment on attachment 8923998 [details] [diff] [review]
printing-fix

Review of attachment 8923998 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like Bob to take a look at this too.
Attachment #8923998 - Flags: review?(jwatt)
Attachment #8923998 - Flags: review?(bobowencode)
Attachment #8923998 - Flags: review+
Comment on attachment 8923998 [details] [diff] [review]
printing-fix

Review of attachment 8923998 [details] [diff] [review]:
-----------------------------------------------------------------

My main concern here is why is that check there in the first place.
Obviously the early return is wrong when called from ~nsDocumentViewer, but what about other callers of this?

Clearly the setting of mPrt->mDocWasToBeDestroyed isn't doing anything either way.

I also don't quite understand how mPrintEngine isn't null when this is called from ~nsDocumentViewer, as it looks like it should always have a strong reference to it.
I don't have good answers to these questions unfortunately.

I'm not even entirely sure that this is the bug. All we know is that we're calling the view manager destructor while we still have a pointer to a pres shell (which should mean it has a pointer back, which is now dangling).

That would suggest that that we missed calling Destroy() on the pres shell, and this patch fixes the only way I can see that it might be possible.

I can't reproduce this crash at all, and really don't know this code, so it's a bit of a guess.

Not running Destroy() when called from the dtor seems clearly like a problem, and the destroy ref count should already prevent it running if Destroy is called manually.

I'm not sure what CheckBeforeDestroy was trying to do by keeping things alive (without holding a reference).
CheckBeforeDestroy was there before nsPrintEngine::mDocViewerPrint became a strong reference.
Which I guess was not strong originally because of circular dependencies.
Also having a strong reference doesn't stop someone else from calling nsDocumentViewer::Destroy.

The mDestroyRefCount only seems to cover part of the print process (and looks pretty unreliable as someone else can decrement it via nsDocumentViewer::Destroy who hasn't incremented it, but that's a different issue).

Perhaps instead in ~nsDocumentViewer, if mPrintEngine is not null, we should call Destroy on it and null it out, before calling our own Destroy?
That would rule out the issue when called from ~nsDocumentViewer, but leave the existing behaviour when called from somewhere else.
Flags: needinfo?(matt.woodrow)
Reopening for now as we expect to take more patches here.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch printing-fix-v2 — — Splinter Review
Flags: needinfo?(matt.woodrow)
Attachment #8927985 - Flags: review?(bobowencode)
Attachment #8927985 - Flags: review?(bobowencode) → review+
Comment on attachment 8927985 [details] [diff] [review]
printing-fix-v2

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

Not easily, we still haven't figured out how to reproduce this.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

Not particularly, same as above.

Which older supported branches are affected by this flaw?

All.

If not all supported branches, which bug introduced the flaw?

Unknown, very old bug though.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

Not yet, but it's old code so should be the same for all branches.

How likely is this patch to cause regressions; how much testing does it need?

Fairly low, but we don't have automated testing for this code, so it would be preferable to ride the trains.
Attachment #8927985 - Flags: sec-approval?
Sec-approval+ for checkin on November 28, two weeks into the new cycle. Please do not check in before that date.

We'll want this on other branches so branch patches should be made and nominated at that time as well.
Whiteboard: [adv-main57+][adv-esr52.5+] → [adv-main57+][adv-esr52.5+][checkin on 11/28]
Attachment #8927985 - Flags: sec-approval? → sec-approval+
Attachment #8923998 - Attachment is obsolete: true
Attachment #8923998 - Flags: review?(tnikkel)
Attachment #8923998 - Flags: review?(bobowencode)
https://hg.mozilla.org/integration/mozilla-inbound/rev/1f2a0a7ae4b37bd0073c6f7e0a63fe9b71a498de

I'm resetting the Fx58 status flag under the assumption that we'll eventually want to uplift this to Beta as well. We'll probably want to consider doing the same for ESR52 also.
Whiteboard: [adv-main57+][adv-esr52.5+][checkin on 11/28] → [adv-main57+][adv-esr52.5+]
Target Milestone: mozilla58 → ---
https://hg.mozilla.org/mozilla-central/rev/1f2a0a7ae4b3
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla59
Flags: needinfo?(bugs)
We've hit that release assert on mDestroyRefCount twice since this landed.
Given that nsPagePrintTimer holds a strong reference, the only thing I can think of is that somehow you can get back into printing before the old nsPagePrintTimer has died and you hit the mPrintEngine->CheckBeforeDestroy() as it dies and mDestroyRefCount never gets decremented.

Other than try to change entirely how this all works, maybe we should just move the decrement to another function (like the increment), which ~nsPagePrintTimer calls instead of Destroy.
This also gets rid of the potential problem of other things calling Destroy and decrementing mDestroyRefCount.

jwatt - what do you think?
Flags: needinfo?(tnikkel) → needinfo?(jwatt)
I'd actually like to do a much bigger rewrite, but for now that sounds good to me.
Flags: needinfo?(jwatt) → needinfo?(bobowencode)
Depends on: 1426087
Flags: needinfo?(bobowencode)
Should we uplift the last patch to Beta for Fx58 still?
Flags: needinfo?(matt.woodrow)
I'm not sure if Bob wanted to land another fix here first or not.
Flags: needinfo?(matt.woodrow) → needinfo?(bobowencode)
(In reply to Matt Woodrow (:mattwoodrow) from comment #83)
> I'm not sure if Bob wanted to land another fix here first or not.

I filed bug 1426087 for that, I'm not sure if uplifting highlights the issue here any more, but even if we just did this bug at least it makes the crash safe.
Flags: needinfo?(bobowencode)
Sounds like we can move forward with the uplift request then.
Flags: needinfo?(matt.woodrow)
Blocks: 1353228
Comment on attachment 8927985 [details] [diff] [review]
printing-fix-v2

Approval Request Comment
[Feature/Bug causing the regression]: Unknown, very old bug.
[User impact if declined]: Potential UAF with printing.
[Is this code covered by automated tests?]: No.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: No known STR, so no.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: Very low risk.
[String changes made/needed]: None.
Flags: needinfo?(matt.woodrow)
Attachment #8927985 - Flags: approval-mozilla-beta?
Comment on attachment 8927985 [details] [diff] [review]
printing-fix-v2

Sec-high fix. Beta58+.
Attachment #8927985 - Flags: approval-mozilla-release+
Attachment #8927985 - Flags: approval-mozilla-beta?
Attachment #8927985 - Flags: approval-mozilla-beta+
i've filed bug 1431009 for a new crash signature with the MOZ_RELEASE_ASSERT added here.
Comment on attachment 8927985 [details] [diff] [review]
printing-fix-v2

This patch appears to be working well on 58+. Let's close the loop and get it on ESR52 as well. See comment 86 for details.
Attachment #8927985 - Flags: approval-mozilla-esr52?
Comment on attachment 8927985 [details] [diff] [review]
printing-fix-v2

Looks like we landed a diagnostic patch for this crash in the ESR that went out with the 57 release  (52.5.0) and now we think this will fix the issue. Let's uplift for 52.7.0esr.
Attachment #8927985 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Al, just want to make sure you see this in case it needs mentioning in security advisories.
Flags: needinfo?(abillings)
I hate it when we fix a bug multiple times in the same bug. All the flags say this had advisories back in 57. 

Then we fixed it (again, sort of) in 58 but it got no advisory there because it was already marked as fixed in 57 so none of the queries brought it up (because they check whether the previous version was affected). 

Luckily, this is an internal report. Ideally, we would have fixed it on mainline Firefox (58) and ESR at the *same* time...
Flags: needinfo?(abillings)
Whiteboard: [adv-main57+][adv-esr52.5+]
Yes, that's why I just pinged you for this, because it didn't seem like the flags were going to show this to you for 59. We should definitely open new bugs for security sensitive issues rather than re-open and land more patches in the same bug. I will mention that in the channel meetings and relman's team meeting as something to watch out for.
Whiteboard: [adv-esr52.7+][adv-main58+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: