1261175 - crash in nsPresContext::GetRootPresContext or nsPresContext::GetParentPresContext (from nsRefreshDriver::IsWaitingForPaint())

Reporter

Description

•

8 years ago

+++ This bug was initially created as a clone of Bug #1241217 +++

Based on bug 724355 comment 17, we have reason to believe that bug 1241217 has morphed into this one after fixing bug 1241651.

This crash is currently #35 in the e10s experiment crash list - 

https://crash-stats.mozilla.org/search/?ActiveExperiment=e10s-beta46-noapz%40experiments.mozilla.org&ActiveExperimentBranch=experiment-no-addons&_facets=signature&_columns=signature&_columns=product&_columns=build_id&_columns=platform#facet-signature

More crash reports can be found on bug 724355.

Jet Villegas (inactive)

Reporter

Updated

•

8 years ago

Whiteboard: [adv-main45+][adv-esr38.7+][post-critsmash-triage][fixed by bug 1241651]

Jet Villegas (inactive)

Reporter

Comment 1

•

8 years ago

[Tracking Requested - why for this release]: Potential UAF. Also e10s blocker.

Crash Signature: [@ nsPresContext::GetParentPresContext] → [@ nsPresContext::GetRootPresContext]

status-firefox46: --- → affected

status-firefox47: --- → affected

tracking-e10s: --- → ?

tracking-firefox46: --- → ?

Depends on: 1260531

Timothy Nikkel (:tnikkel)

Comment 2

•

8 years ago

Bug 1260531 probably not related. mats debunked my theory.

No longer depends on: 1260531

Mats Palmgren (inactive)

Comment 3

•

8 years ago

In the long term I think we should simplify the nsPresContext hierarchy by simply
adding a mParent pointer and NOT going through the viewmanager / view tree since it
has proven to be rather fragile.  I made a patch for this in bug 984690 which seemed
to work at the time IIRC.  (I'm sorry I didn't follow through on that proposal).

That would also support another (neglected) goal which is to remove viewmanager/views.

I'm not sure it will fix these crashes, but unless someone has a better idea it might
be worth a shot...

Jet Villegas (inactive)

Reporter

Comment 4

•

8 years ago

(In reply to Mats Palmgren (:mats) from comment #3)
> In the long term I think we should simplify the nsPresContext hierarchy by
> simply
> adding a mParent pointer and NOT going through the viewmanager / view tree
> since it
> has proven to be rather fragile.  I made a patch for this in bug 984690
> which seemed
> to work at the time IIRC.  (I'm sorry I didn't follow through on that
> proposal).
> 
> That would also support another (neglected) goal which is to remove
> viewmanager/views.
> 
> I'm not sure it will fix these crashes, but unless someone has a better idea
> it might
> be worth a shot...

Mats, you are the chosen one! Please get 984690 rebased and tested. I like the idea of removing the dependency on the View system from the refresh driver.

Depends on: 984690

Timothy Nikkel (:tnikkel)

Comment 5

•

8 years ago

Posted a patch in bug 1261230 for another guess.

Also, am I crazy, or is nothing keeping the stashed views alive? I'll write a patch for this too (unless I'm missing something).

Daniel Veditz [:dveditz]

Updated

•

8 years ago

Group: core-security, core-security-release → layout-core-security

Brad Lassey [:blassey] (use needinfo?)

Comment 6

•

8 years ago

not top e10s crash, so e10s+ and P3

tracking-e10s: ? → +

Priority: -- → P3

Timothy Nikkel (:tnikkel)

Updated

•

8 years ago

Flags: needinfo?(tnikkel)

Liz Henry (:lizzard) (relman/hg->git project)

Comment 7

•

8 years ago

Too late for 46 uplift, but since this is a sec-critical crash we may want to try to fix it before we ship e10s.   This crash signature is also showing up on 45.0.1. Brad are you sure this shouldn't block?

status-firefox46: affected → wontfix

tracking-firefox46: ? → +

tracking-firefox47: --- → +

tracking-firefox48: --- → +

Flags: needinfo?(blassey.bugs)

Jet Villegas (inactive)

Reporter

Comment 8

•

8 years ago

We took a fix for bug 1261230 in FF47. Let's see if that affects the frequency of crashes with this signature.

Depends on: 1261230

Brad Lassey [:blassey] (use needinfo?)

Comment 9

•

8 years ago

(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #7)
> Too late for 46 uplift, but since this is a sec-critical crash we may want
> to try to fix it before we ship e10s.   This crash signature is also showing
> up on 45.0.1. Brad are you sure this shouldn't block?

Did I say somewhere this shouldn't block? And what are you asking about blocking?

Flags: needinfo?(blassey.bugs)

Liz Henry (:lizzard) (relman/hg->git project)

Comment 10

•

8 years ago

We came across this today in the CritSmash triage, where we're trying to make sure sec-critical bugs get quick attention.  I was asking because this was marked as e10s:m9+ and then changed to e10s:+ which I interpreted as, "this used to block e10s going to release, but now it doesn't".  Rather than just throw it back into e10s:? for triage, it seemed faster to ask directly.  While a bug being rated sec-critical doesn't necessarily block e10s on release,  we wanted to double check that you considered the rating.

Brad Lassey [:blassey] (use needinfo?)

Comment 11

•

8 years ago

I don't think this was ever m9+. Jet nominated on 3/31 and I marked it + on 4/4. We're not taking security rating into account for e10s tracking purposes, under the assumption that they'll be handled with the normal security bug prioritization and tracking process.

Timothy Nikkel (:tnikkel)

Comment 12

•

8 years ago

A decent chance bug 1265577 fixes this.

Depends on: 1265577

Jim Mathies [:jimm]

Comment 13

•

8 years ago

Current crash stats report for the content process only crash:

https://crash-stats.mozilla.com/signature/?product=Firefox&dom_ipc_enabled=%21__null__&ipc_channel_error=%210x7&process_type=content&version=47.0b1&signature=nsPresContext%3A%3AGetRootPresContext

Mats Palmgren (inactive)

Comment 14

•

8 years ago

Bug 1261230 appears to have landed in 47.0b4, and bug 1265577 in 47.0b5,
but we still crash in 47.0b5 with this signature:
bp-f6d7dbfb-eb64-49ef-ae99-2435f2160514

Randell Jesup [:jesup] (needinfo me)

Comment 15

•

8 years ago

Shows up in 48.0a2 as well, with an e5e5 UAF crash:
https://crash-stats.mozilla.com/report/index/f1403679-2fe0-49f3-ac75-ee6482160511
Same call sequence as bug 1241217 as best I can tell, which means I think this is basically the same bug - i.e. per comment 0, i think it was never really fixed, perhaps just knocked down in frequency.  (I haven't delved that deeply into the specifics, but I saw a 48 crash (and a 47) in the first page of reports.  Most were from older versions (much), which implies the frequency has gone down).

Al Billings [:abillings - ex-MoCo]

Comment 16

•

8 years ago

Critsmash triage: can we get a patch for this? This is a sec-critical that is lurking about.

status-firefox49: --- → affected

tracking-firefox49: --- → +

Timothy Nikkel (:tnikkel)

Comment 17

•

8 years ago

(In reply to Al Billings [:abillings] from comment #16)
> can we get a patch for this?

I spent a _HUGE_ amount of time trying to fix this bug recently. My work finally got uplifted last week, which finally gave us the chance to see if my work had any effect. Unfortunately it seems that it did not. I used up all my best guesses already in trying to get this fixed. I'll circle back to this and try again, but with ~1 month turn around time to be able to tell if any patch for this bug has any effect it is a very hard road to walk.

Anybody who has ideas for this is very welcome to come forward!

Daniel Veditz [:dveditz]

Updated

•

8 years ago

Keywords: sec-critical → sec-high

Jim Mathies [:jimm]

Updated

•

8 years ago

status-firefox47: affected → wontfix

Liz Henry (:lizzard) (relman/hg->git project)

Comment 18

•

8 years ago

Marking fix optional for 49/50, and assuming this still affects 51. 
Jet, shall we just count this as sec-critical backlog or do you think it needs further investigation?

status-firefox48: affected → fix-optional

status-firefox49: affected → fix-optional

status-firefox50: --- → affected

status-firefox51: --- → affected

Flags: needinfo?(bugs)

Jet Villegas (inactive)

Reporter

Comment 19

•

8 years ago

(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #18)
> Jet, shall we just count this as sec-critical backlog or do you think it
> needs further investigation?

I think we should put it on the sec-critical backlog, _and_ continue to investigate. We want to see this fixed ASAP, though our attempts have not been conclusive. If a reproducible case comes up, this one goes right back on the front burner.

Flags: needinfo?(bugs)

Sylvestre Ledru [:Sylvestre]

Comment 20

•

8 years ago

too late for 48

status-firefox48: fix-optional → wontfix

Matt Woodrow (:mattwoodrow)

Assignee

Updated

•

8 years ago

Crash Signature: [@ nsPresContext::GetRootPresContext] → [@ nsPresContext::GetRootPresContext] [@ nsPresContext::GetRootPresContext()]

Randell Jesup [:jesup] (needinfo me)

Comment 21

•

8 years ago

Still lots of crashes, mostly in 50b and 48.  (Oddly far fewer in 49, though non-zero)

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 22

•

8 years ago

Randell, if you still have the recent crash-stat query urls for this, could you paste them here.

Flags: needinfo?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Comment 23

•

8 years ago

https://crash-stats.mozilla.com/signature/?signature=nsPresContext%3A%3AGetRootPresContext&date=%3E%3D2016-09-01T00%3A00%3A00.000Z&date=%3C2016-10-05T00%3A00%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_sort=-date&page=1#summary

For example... All I did was click on a date in the signature graph (https://addons.mozilla.org/en-US/firefox/addon/bugzilla-socorro-lens/?src=api) and then expand the date range

Flags: needinfo?(rjesup)

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Updated

•

8 years ago

status-firefox49: fix-optional → wontfix

status-firefox50: affected → wontfix

status-firefox52: --- → affected

status-firefox53: --- → affected

Andrew McCreight [:mccr8]

Comment 24

•

8 years ago

I see a lot of crashes in 50, and a decent number in 51, but only a few in 52.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 25

•

8 years ago

I wonder if we have several issues around here and if bug 1308459 helped in some case, but not in all.

Randell Jesup [:jesup] (needinfo me)

Comment 26

•

8 years ago

We get a lot less usage in Aurora than beta and release, so lots in 50, fair number in 51, a few in 52 is  a pretty typical signature, no?

Mats Palmgren (inactive)

Updated

•

8 years ago

Crash Signature: [@ nsPresContext::GetRootPresContext] [@ nsPresContext::GetRootPresContext()] → [@ nsPresContext::GetRootPresContext] [@ nsPresContext::GetRootPresContext()] [@ nsPresContext::GetParentPresContext]

Summary: crash in nsPresContext::GetRootPresContext (from nsRefreshDriver::IsWaitingForPaint()) → crash in nsPresContext::GetRootPresContext or nsPresContext::GetParentPresContext (from nsRefreshDriver::IsWaitingForPaint())

Stephanie Ouillon [:arroway] (needinfo me)

Comment 28

•

8 years ago

The number or crashes for https://crash-stats.mozilla.com/signature/?signature=nsPresContext%3A%3AGetParentPresContext is low, but the number of crashes for https://crash-stats.mozilla.com/signature/?signature=nsPresContext%3A%3AGetRootPresContext is increasing slightly.

Jet, could somebody be assigned to this bug to investigate this as per comment 19?

Flags: needinfo?(bugs)

Jonathan Watt [:jwatt]

Comment 29

•

8 years ago

I just spent a bunch of time going through the latest crash-stats URLs and comments associated with the nsPresContext::GetRootPresContext crashes, hoping to find a way to reproduce. No luck. I did notice that there's an unusually high number of PDF related URLs. (By no means all - there are crashes of the Google homepage and Facebook homepage, for example.) There also seemed to be quite a high ratio of mentions of printing. Loading various URLs, navigating during/after load, navigating during/after printing to file, no crashes. :/

I'd also note that memory pressure does not seem to be an issue.

Jonathan Watt [:jwatt]

Comment 30

•

8 years ago

(In reply to Mats Palmgren (:mats) from comment #3)
> In the long term I think we should simplify the nsPresContext hierarchy by
> simply
> adding a mParent pointer and NOT going through the viewmanager / view tree

The stacks I'm seeing are stacks like:

https://crash-stats.mozilla.com/report/index/9f871728-4316-4009-a0f6-cfbde2161001

Those looks as likely to be caused by nsRefreshDriver's mPresContext raw pointer pointing at invalid memory as they are to be caused by the pc->GetParentPresContext() call in nsPresContext::GetRootPresContext() returning something invalid. Was there a reason to suspect the latter?

Flags: needinfo?(mats)

Mats Palmgren (inactive)

Comment 31

•

8 years ago

nsRefreshDriver::mPresContext is a mozilla::WeakPtr<nsPresContext>, not a raw pointer,
so when the nsPresContext is deleted this member should become null.  The only potential
problem I see with that is that mozilla::WeakPtr isn't thread safe:
http://searchfox.org/mozilla-central/source/mfbt/WeakPtr.h#17
but, as far as I know, destruction of nsPresContext and nsRefreshDriver::Tick are
only supposed to happen on the main thread, so that shouldn't be a problem.

Flags: needinfo?(mats)

Timothy Nikkel (:tnikkel)

Comment 32

•

8 years ago

(In reply to Jonathan Watt [:jwatt] from comment #30)
> (In reply to Mats Palmgren (:mats) from comment #3)
> > In the long term I think we should simplify the nsPresContext hierarchy by
> > simply
> > adding a mParent pointer and NOT going through the viewmanager / view tree
> 
> The stacks I'm seeing are stacks like:
> 
> https://crash-stats.mozilla.com/report/index/9f871728-4316-4009-a0f6-
> cfbde2161001

In addition to what Mats said, the crashing line in that report is inside a for loop, so we may have successfully iterated to a parent prescontext 1 or more times before encountering the crash, which would implicate one of the pointers we follow in GetParentPresContext being invalid.

Jonathan Watt [:jwatt]

Comment 33

•

7 years ago

(In reply to Mats Palmgren (:mats) from comment #31)
> nsRefreshDriver::mPresContext is a mozilla::WeakPtr<nsPresContext>, not a
> raw pointer

Ah, it looks like that's a somewhat recent change. If you follow the nsRefreshDriver.cpp link from the crash report I gave and then switch to the .h file (with the same rev), the pointer is a raw pointer.

Timothy Nikkel (:tnikkel)

Comment 34

•

7 years ago

(In reply to Jonathan Watt [:jwatt] from comment #33)
> (In reply to Mats Palmgren (:mats) from comment #31)
> > nsRefreshDriver::mPresContext is a mozilla::WeakPtr<nsPresContext>, not a
> > raw pointer
> 
> Ah, it looks like that's a somewhat recent change. If you follow the
> nsRefreshDriver.cpp link from the crash report I gave and then switch to the
> .h file (with the same rev), the pointer is a raw pointer.

The weak ptr change in 52, which is on beta right now, and we see plenty of crashes there, so it didn't fix it.

Randell Jesup [:jesup] (needinfo me)

Comment 35

•

7 years ago

time is very short for beta/52.  Any chance of wallpaper, if not a full fix?

Flags: needinfo?(jwatt)

Jonathan Watt [:jwatt]

Comment 36

•

7 years ago

If I understand correctly, nobody has a solution to this right now that doesn't involve major refactoring.

Flags: needinfo?(jwatt)

Frederik Braun [:freddy]

Updated

•

7 years ago

Flags: needinfo?(tnikkel)

Flags: needinfo?(bugs)

Frederik Braun [:freddy]

Comment 37

•

7 years ago

(In reply to Jonathan Watt [:jwatt] from comment #36)
> If I understand correctly, nobody has a solution to this right now that
> doesn't involve major refactoring.

Sorry to bug you once again, Jet.
If this is truly a sec-critical (and I'm happy to be proven otherwise), we must consider the refactoring as an option.
Do you think you could discuss the required work in your team and come up with a rough time frame?

Flags: needinfo?(bugs)

Timothy Nikkel (:tnikkel)

Comment 38

•

7 years ago

Please don't cancel my needinfos (which I set and was asking myself for info).

Flags: needinfo?(tnikkel)

Jet Villegas (inactive)

Reporter

Comment 39

•

7 years ago

(In reply to Frederik Braun [:freddyb] from comment #37)
> (In reply to Jonathan Watt [:jwatt] from comment #36)
> > If I understand correctly, nobody has a solution to this right now that
> > doesn't involve major refactoring.
> 
> Sorry to bug you once again, Jet.
> If this is truly a sec-critical (and I'm happy to be proven otherwise), we
> must consider the refactoring as an option.
> Do you think you could discuss the required work in your team and come up
> with a rough time frame?

Our biggest issue continues to be a lack of reproducible steps. GetRootPresContext() is a *very* hot code path, and any changes here (refactoring, etc.) come with significant risk. We're willing to take on that risk if there's a reasonable chance that we'll actually fix this bug. That gets us back to reproducible steps :(

re: the sec-critical designation, this one should stay on the backlog since the crash signature is a potential UAF, but we can't reproduce. I think bug 984690 is still how we'd refactor this code and we may still do that independent of this bug. I would still love to get repro steps to be sure we actually fixed it.

Julien Cristau [:jcristau]

Comment 40

•

7 years ago

Mass wontfix for bugs affecting firefox 52.

status-firefox52: affected → wontfix

Randell Jesup [:jesup] (needinfo me)

Comment 41

•

7 years ago

Still happening at volume; 1200+ crashes in the last week.

status-firefox51: affected → wontfix

status-firefox54: --- → affected

status-firefox55: --- → affected

status-firefox-esr45: --- → affected

status-firefox-esr52: --- → affected

Frederik Braun [:freddy]

Comment 42

•

7 years ago

Would it be useful to get some URLs that have caused the crash? The volume seems quite high indeed.

Flags: needinfo?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Updated

•

7 years ago

Flags: needinfo?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Comment 43

•

7 years ago

The (few) comments still seem to involve printing often

Timothy Nikkel (:tnikkel)

Comment 44

•

7 years ago

(In reply to Randell Jesup [:jesup] from comment #43)
> The (few) comments still seem to involve printing often

Then bug 1354443 might be quite relevant.

Timothy Nikkel (:tnikkel)

Comment 45

•

7 years ago

I'll mark bug 1354443 as a dependency since it shouldn't be too hard to fix bug 1354443 and then we can see how many of these crashes remain after fixing it.

Depends on: 1354443

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Keywords: testcase-wanted

Frederik Braun [:freddy]

Comment 46

•

7 years ago

Looks like its dependent (bug 1354443) just got resolved. Does that turn this issue into something more actionable, Timothy?

Jonathan Watt [:jwatt]

Comment 47

•

7 years ago

Unfortunately it doesn't look like the fixes for bug 1354443 helped. There are lots of crashes still being reported for 55.0b13. (Fixing that bug also doesn't make this bug any more actionable.)

Randell Jesup [:jesup] (needinfo me)

Comment 48

•

7 years ago

[Tracking Requested - why for this release]:
Because this is a perennial top-UAF crasher that hasn't gotten traction.

Poking this bug -- this is the #1 UAF crash (by a bunch) on 57/58:
https://crash-stats.mozilla.com/signature/?product=Firefox&version=58.0a1&version=57.0b&version=57.0a1&address=~e5e5&signature=nsPresContext%3A%3AGetRootPresContext&date=%3E%3D2017-10-02T14%3A00%3A00.000Z&date=%3C2017-10-09T14%3A00%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports
>250 crashes just in the last week on 57 and 58

only 4 of 266 crashes *don't* have IsWaitingForPaint() as the calling function:

https://crash-stats.mozilla.com/signature/?product=Firefox&version=58.0a1&version=57.0b&version=57.0a1&address=~e5e5&proto_signature=%21~IsWaitingForPaint&signature=nsPresContext%3A%3AGetRootPresContext&date=%3E%3D2017-10-02T14%3A00%3A00.000Z&date=%3C2017-10-09T14%3A00%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1

Those are called from APZCCallbackHelper::ApplyCallbackTransform() (3) and nsCSSFrameConstructor::BeginUpdate() (1)

Just an observation: like most frequent crashes, this has a weekend dip.  However, the dip seems much larger than most.  Might be a (small) clue.

status-firefox56: --- → affected

status-firefox57: --- → affected

status-firefox58: --- → affected

tracking-firefox58: --- → ?

Flags: needinfo?(milan)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 49

•

7 years ago

So, a pres context that the refresh driver has as a weak pointer to, is going away while the refresh driver still tries to use it?  Timothy, anything we can do here to get more information or to wallpaper it?  The volume is not huge, so if we can get rid of UAF and replace it with a release assert or some such, it would be an improvement.

Flags: needinfo?(tnikkel)

Flags: needinfo?(milan)

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

7 years ago

Flags: needinfo?(tnikkel)

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

7 years ago

Updated

•

7 years ago

Comment 50

•

7 years ago

So this would say that the #1 and #3 UAFs in 57/58 both probably have the same root cause...

Matt Woodrow (:mattwoodrow)

Assignee

Comment 51

•

7 years ago

Attached image Struct offsets — Details

So, I think this is because nsIPresShell::mViewManager is dangling somewhere.

The crash address is 0xE5E5E5ED, while we're accessing [eax+8], and eax is 0xE5E5E5E5.

eax looks to be 'view' here, so that suggests that 'viewManager' is a valid address, but is deleted, so looking up mRootView (0x14) returned 0xE5E5E5E5.

That would suggest that 'shell' is valid, and points to valid memory, but mViewManager (0x14) is dangling.

Matt Woodrow (:mattwoodrow)

Assignee

Comment 52

•

7 years ago

Some follow-on thoughts:

nsViewManager::~nsViewManager - Should we runtime assert that mPresShell is null, rather than just setting it to nullptr. If it's not nullptr, the presshell will have a dangling pointer to us.

nsIPresShell::mViewManager - comment says docViewer owns it, but we don’t always have a doc viewer (nsPrintEngine!)

nsPrintEngine::ReflowPrintObject - make !mPresContext assertion a runtime one. If we’ve called this twice without destroying the presentation, then we’ll drop the reference to the nsViewManager without calling Destroy on the shell. Possible that the shell then outlives the viewmanager. Should we call DestroyPresentation just to be safe?

nsDocumentViewer::InitInternal calls MakeWindow, which writes to mViewManager. Again, if we already had one (and a pres shell), then this can destroy the old view manager without Destroy being called on the old pres shell). Could this be called multiple times? Looks like Init is exposed to script?

Tim, Mats, does that all seem sane?

Still not sure what the actual problem is, but if that is all correct, some runtime asserts could probably catch it earlier (and in a non-UAF way).

Flags: needinfo?(mats)

Timothy Nikkel (:tnikkel)

Comment 53

•

7 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #52)
> nsIPresShell::mViewManager - comment says docViewer owns it, but we don’t
> always have a doc viewer (nsPrintEngine!)

The nsPrintObject would own it then, it looks like.

> nsDocumentViewer::InitInternal calls MakeWindow, which writes to
> mViewManager. Again, if we already had one (and a pres shell), then this can
> destroy the old view manager without Destroy being called on the old pres
> shell). Could this be called multiple times? Looks like Init is exposed to
> script?

I don't think Init is callable by script, nsIContentViewer::Init has [noscript] on it.

The other ones sound reasonable.

My only guess is one of the nsDocumentViewer or relevant functions in our printing code can re-enter and mess things up. We've had that problem before, and fixed the ones we found, but no guarantee we know about them all. I had some hacky patches to detect this that I resurrected. I'll see if they can be made landable.

Matt Woodrow (:mattwoodrow)

Assignee

Comment 54

•

7 years ago

(In reply to Timothy Nikkel (:tnikkel) from comment #53)
> (In reply to Matt Woodrow (:mattwoodrow) from comment #52)
> > nsIPresShell::mViewManager - comment says docViewer owns it, but we don’t
> > always have a doc viewer (nsPrintEngine!)
> 
> The nsPrintObject would own it then, it looks like.

Indeed, just a confusing comments for something with fairly complex ownership requirements.

> 
> > nsDocumentViewer::InitInternal calls MakeWindow, which writes to
> > mViewManager. Again, if we already had one (and a pres shell), then this can
> > destroy the old view manager without Destroy being called on the old pres
> > shell). Could this be called multiple times? Looks like Init is exposed to
> > script?
> 
> I don't think Init is callable by script, nsIContentViewer::Init has
> [noscript] on it.

Ah right, good catch. There's still quite a few c++ callers to Init, I didn't audit them all to see if re-entry was possible.

Matt Woodrow (:mattwoodrow)

Assignee

Comment 55

•

7 years ago

Attached patch viewmanager-assert — Details — Splinter Review

I think this should be sufficient (if I have the cause correct) to prevent UAF, and it'll give us a stack to the actual bug.

Attachment #8917584 - Flags: review?(tnikkel)

Timothy Nikkel (:tnikkel)

Updated

•

7 years ago

Attachment #8917584 - Flags: review?(tnikkel) → review+

Matt Woodrow (:mattwoodrow)

Assignee

Comment 56

•

7 years ago

Comment on attachment 8917584 [details] [diff] [review]
viewmanager-assert

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

Not easily. It'll probably make it easier to roughly figure out what triggers the bug (destroying the view manager before the pres shell), but you'd still have to audit the code to figure out how to make that happen (which we've failed to do thus far).

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

No.

Which older supported branches are affected by this flaw?

All.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

No, but they should be identical.

How likely is this patch to cause regressions; how much testing does it need?

Fairly unlikely, just adding a new assertion.

Attachment #8917584 - Flags: sec-approval?

Al Billings [:abillings - ex-MoCo]

Comment 57

•

7 years ago

Comment on attachment 8917584 [details] [diff] [review]
viewmanager-assert

sec-approval+ for trunk.
We'll want a patch nominated for Beta and ESR52 as well, to hopefully land after this is on trunk.

Attachment #8917584 - Flags: sec-approval? → sec-approval+

Al Billings [:abillings - ex-MoCo]

Updated

•

7 years ago

tracking-firefox57: --- → +

tracking-firefox58: ? → +

tracking-firefox-esr52: --- → 57+

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Comment 58

•

7 years ago

https://hg.mozilla.org/mozilla-central/rev/8281ed36bd49

I'm assuming leave-open was implied since the patch seems diagnostic in nature?

status-firefox53: affected → wontfix

status-firefox54: affected → wontfix

status-firefox55: affected → wontfix

status-firefox56: affected → wontfix

status-firefox-esr45: affected → wontfix

Daniel Holbert [:dholbert]

Comment 59

•

7 years ago

It's more than a diagnostic - it makes us preemptively crash, safely, to avoid the UAF (per comment 55).

Perhaps we should keep this bug focused on the UAF, for saner semantics of tracking flags as we uplift it, and file a followup on investigating & fixing the (anticipated) MOZ_RELEASE_ASSERT crashes?   mattwoodrow, what do you think?  If you agree, go ahead and close this as FIXED, I think (since the patch has been merged to central).

Flags: needinfo?(matt.woodrow)

Mats Palmgren (inactive)

Comment 60

•

7 years ago

The added assertion seems correct to me.

Flags: needinfo?(mats)

Matt Woodrow (:mattwoodrow)

Assignee

Updated

•

7 years ago

Status: NEW → RESOLVED

Closed: 7 years ago

Flags: needinfo?(matt.woodrow)

Resolution: --- → FIXED

Matt Woodrow (:mattwoodrow)

Assignee

Comment 61

•

7 years ago

Comment on attachment 8917584 [details] [diff] [review]
viewmanager-assert

Approval Request Comment
[Feature/Bug causing the regression]: Unknown, long standing sec bug.
[User impact if declined]: Potentially exploitable UAF bug.
[Is this code covered by automated tests?]: No, we have no reproducible test cases yet.
[Has the fix been verified in Nightly?]: No, again, no known way to reproduce.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: Just adds a release assert to verify our ownership requirements, should turn the UAF crash into a safe assertion crash (along with the stack that will help us fix the underlying problem).
[String changes made/needed]: None.

Attachment #8917584 - Flags: approval-mozilla-esr52?

Attachment #8917584 - Flags: approval-mozilla-beta?

Sylvestre Ledru [:Sylvestre]

Updated

•

7 years ago

Assignee: nobody → matt.woodrow

Julien Cristau [:jcristau]

Updated

•

7 years ago

status-firefox58: affected → fixed

Target Milestone: --- → mozilla58

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 62

•

7 years ago

Comment on attachment 8917584 [details] [diff] [review]
viewmanager-assert

Sec-high, Beta57+, ESR52.5+

Attachment #8917584 - Flags: approval-mozilla-esr52?

Attachment #8917584 - Flags: approval-mozilla-esr52+

Attachment #8917584 - Flags: approval-mozilla-beta?

Attachment #8917584 - Flags: approval-mozilla-beta+

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Comment 63

•

7 years ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/04f51b690e8c

status-firefox57: affected → fixed

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Comment 64

•

7 years ago

uplift

https://hg.mozilla.org/releases/mozilla-esr52/rev/77a2d4610275

status-firefox-esr52: affected → fixed

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Group: layout-core-security → core-security-release

Mats Palmgren (inactive)

Comment 65

•

7 years ago

FYI, there are already quite a few crashes (77) on trunk hitting this assert, e.g.
bp-e757c47e-ee7c-421e-8d3a-b6bc90171018

Should we open a new bug to handle that or reopen this one?

Flags: needinfo?(matt.woodrow)

Matt Woodrow (:mattwoodrow)

Assignee

Comment 66

•

7 years ago

http://searchfox.org/mozilla-central/source/layout/base/nsDocumentViewer.cpp#1636

That code is wrong isn't it?

We call Destroy() from the nsDocumentViewer dtor, but if we take that early return then we don't call DestroyPresShell().

Flags: needinfo?(matt.woodrow)

[:philipp]

Comment 67

•

7 years ago

the [@ nsViewManager::~nsViewManager] signature is the #6 top content crash in 57.0b10

Timothy Nikkel (:tnikkel)

Comment 68

•

7 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #66)
> http://searchfox.org/mozilla-central/source/layout/base/nsDocumentViewer.
> cpp#1636
> 
> That code is wrong isn't it?
> 
> We call Destroy() from the nsDocumentViewer dtor, but if we take that early
> return then we don't call DestroyPresShell().

Yeah, pretty much, same with the next early return after that.

David Bolter [:davidb] (NeedInfo me for attention)

Updated

•

7 years ago

Crash Signature: [@ nsPresContext::GetRootPresContext] [@ nsPresContext::GetRootPresContext()] [@ nsPresContext::GetParentPresContext] → [@ nsPresContext::GetRootPresContext] [@ nsPresContext::GetRootPresContext()] [@ nsPresContext::GetParentPresContext] [@ nsViewManager::~nsViewManager]

Matt Woodrow (:mattwoodrow)

Assignee

Comment 69

•

7 years ago

Attached patch printing-fix (obsolete) — Details — Splinter Review

I still haven't found a way to reproduce this, so somewhat guessing here.

I *think* the destroy refcount early return is ok, since it looks like we manually call Destroy in this cases where we increment it (as well as calling Destroy from the dtor). Added a release assert to catch the case where we hit the dtor with it still set though.

Attachment #8923998 - Flags: review?(tnikkel)

Matt Woodrow (:mattwoodrow)

Assignee

Updated

•

7 years ago

Attachment #8923998 - Flags: review?(jwatt)

Al Billings [:abillings - ex-MoCo]

Updated

•

7 years ago

Whiteboard: [adv-main57+][adv-esr52.5+]

Jonathan Watt [:jwatt]

Comment 70

•

7 years ago

Comment on attachment 8923998 [details] [diff] [review]
printing-fix

Review of attachment 8923998 [details] [diff] [review]:
-----------------------------------------------------------------

I'd like Bob to take a look at this too.

Attachment #8923998 - Flags: review?(jwatt)

Attachment #8923998 - Flags: review?(bobowencode)

Attachment #8923998 - Flags: review+

Bob Owen (:bobowen)

Comment 71

•

7 years ago

Comment on attachment 8923998 [details] [diff] [review]
printing-fix

Review of attachment 8923998 [details] [diff] [review]:
-----------------------------------------------------------------

My main concern here is why is that check there in the first place.
Obviously the early return is wrong when called from ~nsDocumentViewer, but what about other callers of this?

Clearly the setting of mPrt->mDocWasToBeDestroyed isn't doing anything either way.

I also don't quite understand how mPrintEngine isn't null when this is called from ~nsDocumentViewer, as it looks like it should always have a strong reference to it.

Matt Woodrow (:mattwoodrow)

Assignee

Comment 72

•

7 years ago

I don't have good answers to these questions unfortunately.

I'm not even entirely sure that this is the bug. All we know is that we're calling the view manager destructor while we still have a pointer to a pres shell (which should mean it has a pointer back, which is now dangling).

That would suggest that that we missed calling Destroy() on the pres shell, and this patch fixes the only way I can see that it might be possible.

I can't reproduce this crash at all, and really don't know this code, so it's a bit of a guess.

Not running Destroy() when called from the dtor seems clearly like a problem, and the destroy ref count should already prevent it running if Destroy is called manually.

I'm not sure what CheckBeforeDestroy was trying to do by keeping things alive (without holding a reference).

Bob Owen (:bobowen)

Comment 73

•

7 years ago

CheckBeforeDestroy was there before nsPrintEngine::mDocViewerPrint became a strong reference.
Which I guess was not strong originally because of circular dependencies.
Also having a strong reference doesn't stop someone else from calling nsDocumentViewer::Destroy.

The mDestroyRefCount only seems to cover part of the print process (and looks pretty unreliable as someone else can decrement it via nsDocumentViewer::Destroy who hasn't incremented it, but that's a different issue).

Perhaps instead in ~nsDocumentViewer, if mPrintEngine is not null, we should call Destroy on it and null it out, before calling our own Destroy?
That would rule out the issue when called from ~nsDocumentViewer, but leave the existing behaviour when called from somewhere else.

Bob Owen (:bobowen)

Updated

•

7 years ago

Flags: needinfo?(matt.woodrow)

Jet Villegas (inactive)

Reporter

Comment 74

•

7 years ago

Reopening for now as we expect to take more patches here.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Matt Woodrow (:mattwoodrow)

Assignee

Comment 75

•

7 years ago

Attached patch printing-fix-v2 — Details — Splinter Review

Flags: needinfo?(matt.woodrow)

Attachment #8927985 - Flags: review?(bobowencode)

Bob Owen (:bobowen)

Updated

•

7 years ago

Attachment #8927985 - Flags: review?(bobowencode) → review+

Matt Woodrow (:mattwoodrow)

Assignee

Comment 76

•

7 years ago

Comment on attachment 8927985 [details] [diff] [review]
printing-fix-v2

[Security approval request comment]
How easily could an exploit be constructed based on the patch?

Not easily, we still haven't figured out how to reproduce this.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?

Not particularly, same as above.

Which older supported branches are affected by this flaw?

All.

If not all supported branches, which bug introduced the flaw?

Unknown, very old bug though.

Do you have backports for the affected branches? If not, how different, hard to create, and risky will they be?

Not yet, but it's old code so should be the same for all branches.

How likely is this patch to cause regressions; how much testing does it need?

Fairly low, but we don't have automated testing for this code, so it would be preferable to ride the trains.

Attachment #8927985 - Flags: sec-approval?

Al Billings [:abillings - ex-MoCo]

Comment 77

•

7 years ago

Sec-approval+ for checkin on November 28, two weeks into the new cycle. Please do not check in before that date.

We'll want this on other branches so branch patches should be made and nominated at that time as well.

Whiteboard: [adv-main57+][adv-esr52.5+] → [adv-main57+][adv-esr52.5+][checkin on 11/28]

Al Billings [:abillings - ex-MoCo]

Updated

•

7 years ago

Attachment #8927985 - Flags: sec-approval? → sec-approval+

Bob Owen (:bobowen)

Updated

•

7 years ago

Attachment #8923998 - Attachment is obsolete: true

Attachment #8923998 - Flags: review?(tnikkel)

Attachment #8923998 - Flags: review?(bobowencode)

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Comment 78

•

7 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/1f2a0a7ae4b37bd0073c6f7e0a63fe9b71a498de

I'm resetting the Fx58 status flag under the assumption that we'll eventually want to uplift this to Beta as well. We'll probably want to consider doing the same for ESR52 also.

status-firefox58: fixed → affected

status-firefox59: --- → affected

tracking-firefox59: --- → ?

Whiteboard: [adv-main57+][adv-esr52.5+][checkin on 11/28] → [adv-main57+][adv-esr52.5+]

Target Milestone: mozilla58 → ---

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 79

•

7 years ago

https://hg.mozilla.org/mozilla-central/rev/1f2a0a7ae4b3

Status: REOPENED → RESOLVED

Closed: 7 years ago → 7 years ago

status-firefox59: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla59

Julien Cristau [:jcristau]

Updated

•

7 years ago

tracking-firefox59: ? → +

Jet Villegas (inactive)

Reporter

Updated

•

7 years ago

Flags: needinfo?(bugs)

Bob Owen (:bobowen)

Comment 80

•

7 years ago

We've hit that release assert on mDestroyRefCount twice since this landed.
Given that nsPagePrintTimer holds a strong reference, the only thing I can think of is that somehow you can get back into printing before the old nsPagePrintTimer has died and you hit the mPrintEngine->CheckBeforeDestroy() as it dies and mDestroyRefCount never gets decremented.

Other than try to change entirely how this all works, maybe we should just move the decrement to another function (like the increment), which ~nsPagePrintTimer calls instead of Destroy.
This also gets rid of the potential problem of other things calling Destroy and decrementing mDestroyRefCount.

jwatt - what do you think?

Flags: needinfo?(tnikkel) → needinfo?(jwatt)

Jonathan Watt [:jwatt]

Comment 81

•

7 years ago

I'd actually like to do a much bigger rewrite, but for now that sounds good to me.

Flags: needinfo?(jwatt) → needinfo?(bobowencode)

Bob Owen (:bobowen)

Updated

•

7 years ago

Depends on: 1426087

Bob Owen (:bobowen)

Updated

•

7 years ago

Flags: needinfo?(bobowencode)

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Comment 82

•

7 years ago

Should we uplift the last patch to Beta for Fx58 still?

Flags: needinfo?(matt.woodrow)

Matt Woodrow (:mattwoodrow)

Assignee

Comment 83

•

7 years ago

I'm not sure if Bob wanted to land another fix here first or not.

Flags: needinfo?(matt.woodrow) → needinfo?(bobowencode)

Bob Owen (:bobowen)

Comment 84

•

7 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #83)
> I'm not sure if Bob wanted to land another fix here first or not.

I filed bug 1426087 for that, I'm not sure if uplifting highlights the issue here any more, but even if we just did this bug at least it makes the crash safe.

Flags: needinfo?(bobowencode)

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Comment 85

•

7 years ago

Sounds like we can move forward with the uplift request then.

Flags: needinfo?(matt.woodrow)

Jet Villegas (inactive)

Reporter

Updated

•

7 years ago

Blocks: 1353228

Matt Woodrow (:mattwoodrow)

Assignee

Comment 86

•

7 years ago

Comment on attachment 8927985 [details] [diff] [review]
printing-fix-v2

Approval Request Comment
[Feature/Bug causing the regression]: Unknown, very old bug.
[User impact if declined]: Potential UAF with printing.
[Is this code covered by automated tests?]: No.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: No known STR, so no.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: Very low risk.
[String changes made/needed]: None.

Flags: needinfo?(matt.woodrow)

Attachment #8927985 - Flags: approval-mozilla-beta?

Gerry Chang [:gchang]

Comment 87

•

7 years ago

Comment on attachment 8927985 [details] [diff] [review]
printing-fix-v2

Sec-high fix. Beta58+.

Attachment #8927985 - Flags: approval-mozilla-release+

Attachment #8927985 - Flags: approval-mozilla-beta?

Attachment #8927985 - Flags: approval-mozilla-beta+

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 88

•

7 years ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/d9c057e41607 (FIREFOX_58b_RELBRANCH)
https://hg.mozilla.org/releases/mozilla-release/rev/75912c7725910ed540e53348c4620add73f36000

status-firefox58: affected → fixed

[:philipp]

Comment 89

•

7 years ago

i've filed bug 1431009 for a new crash signature with the MOZ_RELEASE_ASSERT added here.

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Comment 90

•

7 years ago

Comment on attachment 8927985 [details] [diff] [review]
printing-fix-v2

This patch appears to be working well on 58+. Let's close the loop and get it on ESR52 as well. See comment 86 for details.

Attachment #8927985 - Flags: approval-mozilla-esr52?

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Updated

•

7 years ago

status-firefox-esr52: fixed → affected

Liz Henry (:lizzard) (relman/hg->git project)

Comment 91

•

7 years ago

Comment on attachment 8927985 [details] [diff] [review]
printing-fix-v2

Looks like we landed a diagnostic patch for this crash in the ESR that went out with the 57 release  (52.5.0) and now we think this will fix the issue. Let's uplift for 52.7.0esr.

Attachment #8927985 - Flags: approval-mozilla-esr52? → approval-mozilla-esr52+

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

7 years ago

tracking-firefox-esr52: 57+ → 59+

Liz Henry (:lizzard) (relman/hg->git project)

Comment 92

•

7 years ago

Al, just want to make sure you see this in case it needs mentioning in security advisories.

Flags: needinfo?(abillings)

Al Billings [:abillings - ex-MoCo]

Comment 93

•

7 years ago

I hate it when we fix a bug multiple times in the same bug. All the flags say this had advisories back in 57. 

Then we fixed it (again, sort of) in 58 but it got no advisory there because it was already marked as fixed in 57 so none of the queries brought it up (because they check whether the previous version was affected). 

Luckily, this is an internal report. Ideally, we would have fixed it on mainline Firefox (58) and ESR at the *same* time...

status-firefox57: fixed → wontfix

tracking-firefox57: + → ---

Flags: needinfo?(abillings)

Whiteboard: [adv-main57+][adv-esr52.5+]

Liz Henry (:lizzard) (relman/hg->git project)

Comment 94

•

7 years ago

Yes, that's why I just pinged you for this, because it didn't seem like the flags were going to show this to you for 59. We should definitely open new bugs for security sensitive issues rather than re-open and land more patches in the same bug. I will mention that in the channel meetings and relman's team meeting as something to watch out for.

Ryan VanderMeulen [:RyanVM][PTO June 24-28]

Comment 95

•

7 years ago

uplift

https://hg.mozilla.org/releases/mozilla-esr52/rev/f0ec180993d2

status-firefox-esr52: affected → fixed

Al Billings [:abillings - ex-MoCo]

Updated

•

6 years ago

Whiteboard: [adv-esr52.7+][adv-main58+]

Daniel Veditz [:dveditz]

Updated

•

6 years ago

Group: core-security-release

Struct offsets 7 years ago Matt Woodrow (:mattwoodrow) 279.47 KB, image/png		Details
viewmanager-assert 7 years ago Matt Woodrow (:mattwoodrow) 1011 bytes, patch	tnikkel : review+ ritu : approval-mozilla-beta+ ritu : approval-mozilla-esr52+ abillings : sec-approval+	Details \| Diff \| Splinter Review
printing-fix 7 years ago Matt Woodrow (:mattwoodrow) 8.57 KB, patch	jwatt : review+	Details \| Diff \| Splinter Review
printing-fix-v2 7 years ago Matt Woodrow (:mattwoodrow) 1.00 KB, patch	bobowen : review+ gchang : approval-mozilla-beta+ gchang : approval-mozilla-release+ lizzard : approval-mozilla-esr52+ abillings : sec-approval+	Details \| Diff \| Splinter Review