1328129 - [e10s] Crash in nsLayoutUtils::GetDisplayRootFrame

Reporter

Description

•

7 years ago

This bug was filed from the Socorro interface and is 
report bp-6705fc6e-1fad-43b4-8acd-919662161227.
=============================================================
Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	nsLayoutUtils::GetReferenceFrame(nsIFrame*) 	layout/base/nsLayoutUtils.cpp:7149
1 	xul.dll 	mozilla::ScrollFrameHelper::GetScrolledRect() 	layout/generic/nsGfxScrollFrame.cpp:5764
2 	xul.dll 	mozilla::ScrollFrameHelper::GetScrollRange(int, int) 	layout/generic/nsGfxScrollFrame.cpp:3841
3 	xul.dll 	mozilla::ScrollFrameHelper::WantAsyncScroll() 	layout/generic/nsGfxScrollFrame.cpp:1312
4 	xul.dll 	nsLayoutUtils::GetNearestScrollableFrame(nsIFrame*, unsigned int) 	layout/base/nsLayoutUtils.cpp:2089
5 	xul.dll 	nsLayoutUtils::ExpireDisplayPortOnAsyncScrollableAncestor(nsIFrame*) 	layout/base/nsLayoutUtils.cpp:3355
6 	xul.dll 	RemoveDisplayPortCallback(nsITimer*, void*) 	layout/generic/nsGfxScrollFrame.cpp:2537
7 	xul.dll 	nsTimerImpl::Fire() 	xpcom/threads/nsTimerImpl.cpp:521
8 	xul.dll 	nsTimerEvent::Run() 	xpcom/threads/TimerThread.cpp:286
9 	xul.dll 	nsThread::ProcessNextEvent(bool, bool*) 	xpcom/threads/nsThread.cpp:1067
10 	xul.dll 	mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:96
11 	xul.dll 	mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) 	ipc/glue/MessagePump.cpp:301
12 	xul.dll 	_SEH_epilog4

crashes in the content process on all versions of windows are occurring with greater frequency on the beta channel since 51.0b6 - overall it's still rather low volume though. i'm not sure if there is really something in our code that caused the regression then (as there were also a handful of crashes on 51.0a1 & 51.0a2 before that), but here would be the pushlog between beta 4 and beta 6: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FIREFOX_51_0b4_RELEASE&tochange=FIREFOX_51_0b6_RELEASE

(marking the bug as security sensitive as a precaution, since many reports have a crashing address indicating a uaf situation)

Ryan VanderMeulen [:RyanVM]

Comment 1

•

7 years ago

Dan, I see that Jet is on PTO until next week. Any chance you could get the ball rolling in investigating this (or pass it off to someone else)?

Flags: needinfo?(dholbert)

Daniel Holbert [:dholbert]

Comment 2

•

7 years ago

This is in Async Panning & Zooming code (stacklevels 1-6), with a backtrace going up to a timer that was added back in bug 990916 (mDisplayPortExpiryTimer, which is what calls RemoveDisplayPortCallback).  kats, looks like you added this timer -- could you take a look?

Spitballing, my initial guess is that we're somehow firing this timer after frames have been destroyed -- we're missing a case where we need to clean it up (or we're cleaning it up too late). It looks like right now, we cancel the timer in ~ScrollFrameHelper (which will get invoked when the nsHTMLScrollFrame / nsXULScrollFrame destructors are invoked, since the ScrollFrameHelper is a member-variable).  Maybe that's not soon enough, though -- maybe all of the ~ScrollFrameHelper cleanup should really move to ScrollFrameHelper::Destroy?  Not sure.

(I'm also skeptical that the pushlog linked in comment 0 is related to this crash, given that we had a few instances of this crash before then -- nothing there looks particularly tied to this backtrace, IMO.)

Component: Layout → Panning and Zooming

Flags: needinfo?(dholbert) → needinfo?(bugmail)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 3

•

7 years ago

I agree with what Daniel said in comment 2. Timothy, do you know offhand if there are scenarios where a ScrollFrameHelper's mOuter pointer might be pointing to garbage before the ScrollFrameHelper's destructor is run?

If not, then I'll move the cleanup to ScrollFrameHelper::Destroy and see if that fixes the problem.

Assignee: nobody → bugmail

Flags: needinfo?(bugmail) → needinfo?(tnikkel)

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Group: core-security → layout-core-security

Keywords: csectype-uaf, sec-high

Timothy Nikkel (:tnikkel)

Comment 4

•

7 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #3)
> I agree with what Daniel said in comment 2. Timothy, do you know offhand if
> there are scenarios where a ScrollFrameHelper's mOuter pointer might be
> pointing to garbage before the ScrollFrameHelper's destructor is run?

Nope.

> If not, then I'll move the cleanup to ScrollFrameHelper::Destroy and see if
> that fixes the problem.

We can try this (and it seems like a good idea anyway) but I'm not convinced the problem is the mOuter pointer. In RemoveDisplayPortCallback we have

  nsLayoutUtils::RemoveDisplayPort(helper->mOuter->GetContent());
  nsLayoutUtils::ExpireDisplayPortOnAsyncScrollableAncestor(helper->mOuter);

We crash inside the ExpireDisplayPortOnAsyncScrollableAncestor call. If mOuter was a bad pointer we should crash on the previous line (I searched crashstats for crashes in RemoveDisplayPort, GetContent, DeleteProperty, all don't exist are too low volume to be this crash). Further, the first thing we do in ExpireDisplayPortOnAsyncScrollableAncestor is call nsLayoutUtils::GetCrossDocParentFrame on the passed in frame. So when we crash later on we are looking at a proper ancestor of the scroll frame that is getting its displayport removed.

If the frame tree inside one document was messed up to cause this I would expect we'd have a lot more crashes. So a guess would be that the frames for a parent document have gone away but that didn't clear the parent view pointer on the root view in the child document. So when we try to get to the parent document we walk a bad pointer.

Flags: needinfo?(tnikkel)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 5

•

7 years ago

Good points. I looked for other crashes with "nsLayoutUtils" in the signature and 0xffffffffe5e5e5ed as the crash address (this showed up as the address for a large portion of the crashes associated with this bug). I found two other matching signatures, nsLayoutUtils::GetDisplayRootFrame (e.g. [1]) and nsLayoutUtils::GetCrossDocParentFrame (e.g. [2]). The former seems to *also* be triggered from the displayport expiry timer. The latter seems to have a few different crash stacks including stuff like imagelib (e.g. [3]).

So from all this I'm assuming the root cause is somewhere else but the APZ code seems to hit it most frequently because it often exercises codepaths that find the scrollable ancestor or walk up the frame tree across documents.

[1] https://crash-stats.mozilla.com/report/index/9fa9d46c-b149-49c7-ad75-29b5f2170105
[2] https://crash-stats.mozilla.com/report/index/e3088b73-db7c-4eef-895f-43f372161222
[3] https://crash-stats.mozilla.com/report/index/7320adac-0e36-40e0-bfcc-b2da82161231

Crash Signature: [@ nsLayoutUtils::GetReferenceFrame] → [@ nsLayoutUtils::GetReferenceFrame] [@ nsLayoutUtils::GetDisplayRootFrame]

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 6

•

7 years ago

Attached patch Move cancellation of mDisplayPortExpiryTimer to ScrollFrameHelper::Destroy() (obsolete) — Details — Splinter Review

Seems unlikely that this will fix the bug but we can land it. Although maybe I should land it in a separate, non-security bug that this one depends on?

Attachment #8824434 - Flags: review?(tnikkel)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 7

•

7 years ago

Also for the record I did a try push with the above patch, https://treeherder.mozilla.org/#/jobs?repo=try&revision=8eda13d57bff997dbde8b51bc3818aea64ab1698

Timothy Nikkel (:tnikkel)

Comment 8

•

7 years ago

Comment on attachment 8824434 [details] [diff] [review]
Move cancellation of mDisplayPortExpiryTimer to ScrollFrameHelper::Destroy()

If we think that we should move the mDisplayPortExpiryTimer cancel from the destructor to destroy, then shouldn't we also move everything else in the destructor? If one is a bad idea, then the others are too.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #6)
> Seems unlikely that this will fix the bug but we can land it. Although maybe
> I should land it in a separate, non-security bug that this one depends on?

Yeah, seems like a good idea.

Attachment #8824434 - Flags: review?(tnikkel) → review+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 9

•

7 years ago

I moved the patch to bug 1329663. I didn't link it to this bug in case malicious actors are looking for patches related to security bugs, and also because I'm unconvinced that will even help with this bug.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

7 years ago

Attachment #8824434 - Attachment is obsolete: true

Randell Jesup [:jesup] (needinfo me)

Comment 10

•

7 years ago

Kats - can we close this one now?  The bug the patch landed on is marked fixed on 52 and 53.  Does this affect 51, as noted here?  If so, we should mark it fix-optional for 51 (and try to uplift on a point release), make an argument to block the release on this, or simply wontfix for 51.  It is a sec-high.

Flags: needinfo?(bugmail)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 11

•

7 years ago

I'm not convinced that the patch in bug 1329663 will actually fix this. We don't have enough crash-stats data to confirm or refute it right now. I think this bug does affect 51 (because there are crash reports on that version) but I don't think we should block the release on it because we don't really have any idea what's causing this or how to fix it. Any attempt to fix it "for reals" will probably involve a multi-step process where we land some sort of corruption canary to investigate what's going on, and then we can fix it properly. I'm going to mark this fix-optional for 51 for now.

status-firefox51: affected → fix-optional

Flags: needinfo?(bugmail)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 12

•

7 years ago

Thus far there are zero reports on 52 and 53, and considering it's nearing the end of the cycle I'm inclined to think that it doesn't affect those branches. Marking them as unaffected, but we can update that if we start seeing crashes on those branches.

status-firefox52: ? → unaffected

status-firefox53: ? → unaffected

Timothy Nikkel (:tnikkel)

Comment 13

•

7 years ago

It could just be low frequency that it doesn't appear anywhere before beta, or the people on those channels don't do the things that trigger this crash. For similar crashes where it looks like the frame tree is messed up at cross document boundaries they also didn't show up on aurora/nightly (or were very low frequency).

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 14

•

7 years ago

Agreed that it might just be low frequency. However 51 had 4 crashes on nightly and 4 crashes on aurora - so I would expect 52 and 53 to have comparable numbers. Also 51's crash rate went up in b6 - if there was something that was uplifted to 51 in b6 that caused the spike, that means that uplifted patch should already have been in 52 or 53, so I would definitely expect to see crashes there. But yeah, we should wait until 52 has some beta time to see if the crashes show up there.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 15

•

7 years ago

No crashes so far on 52 beta builds.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 16

•

7 years ago

There are still no crashes for the GetReferenceFrame signature on 52 beta, but there are crashes with the GetDisplayRootFrame signature, so morphing this by bug to cover those. The stack still originates in the display port removal callback.

Example crash report on 52b3: https://crash-stats.mozilla.com/report/index/db3bb95f-b96e-4801-b94b-6e6772170204

status-firefox51: fix-optional → wontfix

status-firefox52: unaffected → affected

Summary: [e10s] Crash in nsLayoutUtils::GetReferenceFrame → [e10s] Crash in nsLayoutUtils::GetDisplayRootFrame

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 17

•

7 years ago

I think the AsyncPanZoomEnabled check at [1] is not really needed, we can probably remove it. That might help, since that call is part of the crash stack. Also we can add a guard against frame being null there - there is an assert already but maybe in release builds we can encounter that condition? I'll write a patch on Monday.

[1] http://searchfox.org/mozilla-central/rev/b1aadb3572eaf7d2c70e19a2ba5413809d9ac698/layout/base/nsLayoutUtils.cpp#3399

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 18

•

7 years ago

Attached patch Patch (obsolete) — Details — Splinter Review

Attachment #8834170 - Flags: review?(tnikkel)

Timothy Nikkel (:tnikkel)

Comment 19

•

7 years ago

Comment on attachment 8834170 [details] [diff] [review]
Patch

Looks good.

But is this actually going to fix the bug? Or are we just going to crash later on in processing the RemoveDisplayPortCallback? If we're not sure we should land this in a separate bug to prevent this bug from getting confusing (not resolved but with patches landed).

Attachment #8834170 - Flags: review?(tnikkel) → review+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 20

•

7 years ago

(In reply to Timothy Nikkel (:tnikkel) from comment #19)
> But is this actually going to fix the bug?

I don't know.

> Or are we just going to crash
> later on in processing the RemoveDisplayPortCallback?

I don't think that should happen. The rest of RemoveDisplayPortCallback (after the call to ExpireDisplayPortOnAsyncScrollableAncestor) doesn't do anything really, just uses helper->mOuter which we've already determined should be non-null.

> If we're not sure we
> should land this in a separate bug to prevent this bug from getting
> confusing (not resolved but with patches landed).

Yeah that's fair. I'll split it into another bug.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 21

•

7 years ago

Comment on attachment 8834170 [details] [diff] [review]
Patch

I moved the patch to bug 1337388.

Attachment #8834170 - Attachment is obsolete: true

Frederik Braun [:freddy]

Comment 22

•

7 years ago

kats, it looks like we hoped your bug 1337388 might fix this issue.
Can you take a look how things worked out?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 23

•

7 years ago

There hasn't yet been a beta build with the fix. Need to wait more.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 24

•

7 years ago

52.0b6 has the fix and so far hasn't shown any of the GetDisplayRootFrame crashes. So that's looking good. The previous betas had significant numbers of the crash, so if b6 has zero I think we can safely say it is fixed.

52.0b1 	70 	34.83 %
52.0b2 	45 	22.39 %
52.0b3 	40 	19.90 %
52.0b4 	29 	14.43 %
52.0b5 	17 	8.46 %
52.0b6  (zero so far)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 25

•

7 years ago

b6 and b7 do still have some crashes with the @GetDisplayRootFrame signature, but they no longer go through the APZ code, and they have a different crash address. For example:

https://crash-stats.mozilla.com/report/index/9534c54b-e4c0-49dd-a86d-569742170221
https://crash-stats.mozilla.com/report/index/ddebb794-10ef-477b-be6c-2dde62170215

It looks like the underlying root problem (the cross-doc parent pointer is bad) is still there, but since we've avoided the displayport-expiry code from hitting it, it's getting hit from elsewhere instead (albeit at a much lower frequency).

Consider the root problem is that the cross-doc parent pointer is bad, this seems like a layout problem.

Assignee: bugmail → nobody

Component: Panning and Zooming → Layout

Frederik Braun [:freddy]

Comment 26

•

7 years ago

Jet, can you find someone to continue investigating?

Flags: needinfo?(bugs)

Jet Villegas (inactive)

Comment 27

•

7 years ago

(In reply to Frederik Braun [:freddyb] from comment #26)
> Jet, can you find someone to continue investigating?

This smells like raciness at document and/or window destruction, and we're traversing Layout frames that have gone away. 

Matt: do you have ideas for bulletproofing this?

Flags: needinfo?(bugs) → needinfo?(matt.woodrow)

Matt Woodrow (:mattwoodrow)

Comment 28

•

7 years ago

This is probably easiest for Timothy to keep looking at, as he knows View lifetimes well.

I had a look at some of the crash dumps though:

In the initial crash from comment 0, it appears that nsIFrame::GetView (within an inlined GetCrossDocParentFrame call) is returning a deleted pointer.

nsIFrame should be poisoned, so if we're returning a valid pointer from GetView(), then I assume the root frame must still be alive.

We call GetParent() on the returned view, which returns 0xe5e5e5e5 (the jemalloc poison value), and then we crash trying to call GetParent() on that.

This is similar to what tnikkel suggested in comment 4, but it seems that we destroyed the root view and didn't remove it from the frame?

For the first crash in comment 25:

We're crashing on the f->PresContext()->FrameManager()->GetRootFrame() call, looks like the PresShell has been deleted (and poisoned) so the return of FrameManager() is 0xe5e5e500.

I think we skip PresShell arena poisoning when we tear down the entire PresShell? So I guess 'f' is dead too here (but not poisoned) along with the PresContext/PresShell/FrameManager, but it takes a few derefs before we hit memory that's been poisoned.

I guess this could be similar to above, except we manage to deref our way through the dead nsViews and return a valid-looking nsIFrame from GetCrossDocParentFrame.

For the second crash in comment 25:

Crashing on f->HasAnyStateBits(NS_FRAME_IS_POPUP) because f is garbage (0xb4b4b2b3).

Again, looks like we returned something invalid from GetCrossDocParentFrame.


Any ideas Tim?

Flags: needinfo?(matt.woodrow) → needinfo?(tnikkel)

Matt Woodrow (:mattwoodrow)

Comment 29

•

7 years ago

Also, should nsView be in the PresShell arena and be poisoned when we release it?

Timothy Nikkel (:tnikkel)

Comment 30

•

7 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #29)
> Also, should nsView be in the PresShell arena and be poisoned when we
> release it?

We do poison views when we destroy them

https://dxr.mozilla.org/mozilla-central/rev/e03e0c60462c775c7558a1dc9d5cf2076c3cd1f9/view/nsView.cpp#184

but it can only be so effective because they aren't allocated in an arena so the allocator can do whatever with that memory after we release it.

We could allocate views in an arena but it may not help because all the views that are a problem here seem to be root views, and hence they would likely only be going away with the presshell, so if we skip poisoning like you say when tearing down the whole presshell that wouldn't do any good.


I'm working on a few patches that make it impossible to hit certain kinds of problems with nsViews. I don't know if we even hit these problems in practice and if we do if it would help with these crashes though. My hope is that eventually one of the speculative patches will fix these crashes.

Flags: needinfo?(tnikkel)

Julien Cristau [:jcristau] (back April 22)

Comment 31

•

7 years ago

Too late for firefox 52, mass-wontfix.

status-firefox52: affected → wontfix

Randell Jesup [:jesup] (needinfo me)

Comment 32

•

7 years ago

Crashes in 53b8
Given status, fix-optional fro 53 (and likely won't be).  10 53bN crashes in the last week

status-firefox53: unaffected → fix-optional

status-firefox54: --- → ?

status-firefox55: --- → ?

Frederik Braun [:freddy]

Comment 33

•

7 years ago

(In reply to Timothy Nikkel (:tnikkel) from comment #30)
> ...
> I'm working on a few patches that make it impossible to hit certain kinds of
> problems with nsViews. I don't know if we even hit these problems in
> practice and if we do if it would help with these crashes though. My hope is
> that eventually one of the speculative patches will fix these crashes.

Are these patches happening here or in another bug?
I'm asking because this bug has been inactive for almost 2 weeks.

Flags: needinfo?(tnikkel)

Timothy Nikkel (:tnikkel)

Comment 34

•

7 years ago

(In reply to Frederik Braun [:freddyb] from comment #33)
> (In reply to Timothy Nikkel (:tnikkel) from comment #30)
> > ...
> > I'm working on a few patches that make it impossible to hit certain kinds of
> > problems with nsViews. I don't know if we even hit these problems in
> > practice and if we do if it would help with these crashes though. My hope is
> > that eventually one of the speculative patches will fix these crashes.
> 
> Are these patches happening here or in another bug?
> I'm asking because this bug has been inactive for almost 2 weeks.

They don't have a bug yet, they are very early. I've been swamped, and am going to be continue to be swamped. I have had zero time to work on them.

Flags: needinfo?(tnikkel)

Daniel Veditz [:dveditz]

Updated

•

7 years ago

Keywords: testcase-wanted

Stephanie Ouillon [:arroway] (needinfo me)

Comment 35

•

7 years ago

Matt, kats, can one of you take over this sec-high bug from Tim?

Flags: needinfo?(matt.woodrow)

Flags: needinfo?(bugmail)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 36

•

7 years ago

At this point I don't have any ideas on how to fix this or even what to do here.

Flags: needinfo?(bugmail)

Matt Woodrow (:mattwoodrow)

Comment 37

•

7 years ago

Neither do I sorry.

Flags: needinfo?(matt.woodrow)

Stephanie Ouillon [:arroway] (needinfo me)

Updated

•

7 years ago

Whiteboard: [sec-triage-backlog]

Randell Jesup [:jesup] (needinfo me)

Comment 38

•

7 years ago

Still happening - sec-high UAFs with no action

status-firefox53: fix-optional → wontfix

status-firefox54: ? → wontfix

status-firefox55: ? → wontfix

status-firefox56: --- → affected

status-firefox57: --- → affected

status-firefox58: --- → verified disabled

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

7 years ago

Comment 39

•

7 years ago

:jesup - verified disabled for 58 was an off by one error?

Flags: needinfo?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Comment 40

•

7 years ago

(In reply to Milan Sreckovic [:milan] from comment #39)
> :jesup - verified disabled for 58 was an off by one error?

Dangers of the "page down" button on a dropdown...

status-firefox58: verified disabled → affected

Flags: needinfo?(rjesup)

Jet Villegas (inactive)

Updated

•

7 years ago

Priority: -- → P2

Wennie

Comment 41

•

7 years ago

Hi Jet:

I have assigned these security bugs to you to reassign them to appropriate developers in your team to investigate and fix them.

Thanks!

Wennie

Assignee: nobody → bugs

Jim Mathies [:jimm]

Updated

•

7 years ago

status-firefox56: affected → wontfix

status-firefox57: affected → wontfix

Randell Jesup [:jesup] (needinfo me)

Comment 42

•

6 years ago

Jet - can you reassign this?  Thanks

Flags: needinfo?(bugs)

Jet Villegas (inactive)

Comment 43

•

6 years ago

(In reply to Randell Jesup [:jesup] from comment #42)
> Jet - can you reassign this?  Thanks

I no longer see crashes for nsLayoutUtils::GetReferenceFrame, but still see low-volume crashiness for nsLayoutUtils::GetDisplayRootFrame

We need STR or a URL here.

Flags: needinfo?(bugs)

Randell Jesup [:jesup] (needinfo me)

Comment 44

•

6 years ago

A few possible URLs, though I suspect it's more "what ads were there" or "timing", or the crash itself isn't given to the actual url that caused it.

5 from http://aray.derpacif.com/contable/main.php but all on 50.0, comment "NO PUEDO TRABAJAR"; code is  nsIFrame* parent = GetCrossDocParentFrame(f);

https://www.youtube.com/watch?v=eACohWVwTOc  (seen twice on the same system a few minutes apart, 59b3)

pretty much all over the map on URLs, even a couple of localhost:8080 or 7070's.  We even have about:newtab listed a few times (including 59b10, on  f = f->PresContext()->FrameManager()->GetRootFrame();), which tells me this is one of the many reports where URL is not very useful, if at all.

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

status-firefox58: affected → wontfix

status-firefox59: --- → wontfix

status-firefox60: --- → affected

status-firefox-esr52: --- → affected

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

status-firefox60: affected → wontfix

status-firefox61: --- → fix-optional

status-firefox62: --- → affected

status-firefox-esr60: --- → affected

Julien Cristau [:jcristau] (back April 22)

Updated

•

6 years ago

status-firefox61: fix-optional → wontfix

status-firefox62: affected → fix-optional

status-firefox63: --- → affected

status-firefox-esr52: affected → wontfix

Sean Voisen (:svoisen)

Comment 45

•

6 years ago

:mattwoodrow – I still see crashes for 61 in nsLayoutUtils::GetDisplayRootFrame. But given the history of investigation here, is it fair to mark this as stalled for now?

Flags: needinfo?(matt.woodrow)

Sean Voisen (:svoisen)

Updated

•

6 years ago

Keywords: stalled

Matt Woodrow (:mattwoodrow)

Comment 46

•

6 years ago

I agree that stalled is the right status for this bug.

Flags: needinfo?(matt.woodrow)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 47

•

6 years ago

So I debugged bp-641c7e92-e5ae-41b1-b841-2e7be0180824 a little bit.  It looks like f (and aFrame, if I believe the debugger) are 0x0000000007BA2020 while aFrame->mComputedStyle.mRawPtr is 0x000000BD0BAD9C08, and dereferencing the latter is crashing.  That makes it seem (at least assuming I believe f == aFrame) like there's a bad frame pointer in the pres shell's mDirtyRoots list.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 48

•

6 years ago

I also debugged bp-1be7f209-3d8e-4801-a823-6ba550180826.  The debugger again showed f == aFrame (and it again seemed plausible, in that the frame coming from the stack wasn't recently touched in ways that would be sure to crash, like calling virtual functions).  So I looked at the disassembly in more detail.  The argument aFrame is passed to the function in RCX, but then maintained during the function in RSI.  Then f is maintained in RBX.  The return value is returned in RAX.  I read much of the function (though not the full inlining of IsPopup() and GetCrossDocParentFrame()) and everything seems consistent with that.  So I'm reasonably confident that seeing RBX and RSI as the same value means we're crashing the first time through the loop, and means GetDisplayRootFrame was passed a bad input rather than reaching that bad frame by traversing across GetCrossDocParentFrame.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 49

•

6 years ago

Anyway, the crashes I looked at feel like "this is the first time we're touching this data structure in a bit, and it's corrupt" type crashes.  Such crashes could be real memory corruption bugs, or could be bad RAM.  It's hard to know, though when they're low frequency like this, bad RAM is plausible.

I suspect (given the earlier analysis) also that the crashes we're seeing at these signatures aren't the same as what we were seeing when these bugs were filed -- that's why it's really important to categorize crash bugs by more than just the signature.

Sean Voisen (:svoisen)

Updated

•

6 years ago

Assignee: bugs → svoisen

Liz Henry (:lizzard) (relman/hg->git project)

Comment 50

•

6 years ago

Wontfix for 62/63.

status-firefox62: fix-optional → wontfix

status-firefox63: affected → wontfix

status-firefox64: --- → affected

status-firefox65: --- → affected

Ryan VanderMeulen [:RyanVM]

Updated

•

5 years ago

status-firefox64: affected → wontfix

status-firefox65: affected → wontfix

status-firefox-esr60: affected → wontfix

Daniel Holbert [:dholbert]

Updated

•

3 years ago

Assignee: sean → nobody

BMO Automation

Updated

•

2 years ago

Severity: critical → S2

Timothy Nikkel (:tnikkel)

Comment 51

•

2 years ago

Low crash volume, decreasing severity, -> S3.

Severity: S2 → S3

BugBot [:suhaib / :marco/ :calixte]

Comment 52

•

2 years ago

The severity field for this bug is set to S3. However, the bug is flagged with the sec-high keyword.
:dholbert, could you consider increasing the severity of this security bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(dholbert)

Daniel Holbert [:dholbert]

Comment 53

•

2 years ago

•

Edited

(In reply to Release mgmt bot [:suhaib / :marco/ :calixte] from comment #52)

:dholbert, could you consider increasing the severity of this security bug?

Comment 49 has the conclusions from the most thorough investigation that we've got, where the working theory was that this could easily just be cases of bad RAM.

So: given that (and given that earlier investigations here stalled out as well), I don't think we should consider this high-severity, and we might want to drop the security rating, unless we get something to disprove the "bad RAM" theory (e.g. a substantial increase in crash volume, or reliable STR, or a testcase that can reliably reproduce this).

Also, side note: some of these are likely similar to what tnikkel described in bug 1708808 comment 6; i.e. it looks like we've got some corruption in the frame tree from something that went wrong earlier (or possibly just from bad RAM, as noted above) and we're doing a whole-tree walk which makes us trip over it. e.g. bp-32b3dbe2-66f8-4133-9cf4-562300220802 seems to be a case of that, where we're in a recursive call to MaybeCreateDisplayPortInFirstScrollFrameEncountered (which walks the frame's child list at the end, which potentially results in a whole-frametree walk).

Flags: needinfo?(dholbert)

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Comment 54

•

4 months ago

While this is still happening at a low rate (56 in 6 months, mostly on old ESR versions) we're working to close unactionable stalled bugs and until we have a reproducer this isn't actionable and may not be valid

Status: NEW → RESOLVED

Closed: 4 months ago

Resolution: --- → INCOMPLETE

BugBot (nomail) [:suhaib / :marco/ :calixte]

Comment 55

•

4 months ago

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit BugBot documentation.

Keywords: stalled

Daniel Veditz [:dveditz]

Updated

•

2 months ago

Group: layout-core-security

Move cancellation of mDisplayPortExpiryTimer to ScrollFrameHelper::Destroy() 7 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 1.76 KB, patch	tnikkel : review+	Details \| Diff \| Splinter Review
Patch 7 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 1.38 KB, patch	tnikkel : review+	Details \| Diff \| Splinter Review