Intermittent browser_dbg_variables-view-popup-07.js | [@ nsView::DoResetWidgetBounds(bool, bool)] or [@ nsViewManager::ProcessPendingUpdatesForView(nsView*, bool)]

RESOLVED FIXED in Firefox 29, Firefox OS v1.3

Status

()

Core
Layout: View Rendering
--
critical
RESOLVED FIXED
4 years ago
2 years ago

People

(Reporter: Tomcat, Assigned: mats)

Tracking

(5 keywords)

Trunk
mozilla30
x86
Mac OS X
crash, csectype-uaf, intermittent-failure, mlk, sec-critical
Points:
---
Bug Flags:
in-testsuite -

Firefox Tracking Flags

(firefox27 wontfix, firefox28 wontfix, firefox29 fixed, firefox30+ fixed, firefox-esr24 unaffected, b2g-v1.2 unaffected, b2g-v1.3 fixed, b2g-v1.3T fixed, b2g-v1.4 fixed)

Details

(Whiteboard: [adv-main29+], crash signature, URL)

Attachments

(3 attachments, 3 obsolete attachments)

(Reporter)

Description

4 years ago
Rev5 MacOSX Mountain Lion 10.8 fx-team debug test mochitest-browser-chrome on 2013-12-04 19:55:34 PST for push bf901d895201

slave: talos-mtnlion-r5-061

https://tbpl.mozilla.org/php/getParsedLog.php?id=31472407&tree=Fx-Team

Summary

TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-07.js | application terminated with exit code 256

PROCESS-CRASH | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-07.js | application crashed [@ nsView::DoResetWidgetBounds(bool, bool)]

also leaked 

TEST-UNEXPECTED-FAIL | leakcheck | 2968 bytes leaked (CompositorChild, CondVar, Mutex, PCompositorChild, PLayerTransactionChild, ...) but not sure if this belongs to the tracking bugs


20:36:06  WARNING -  PROCESS-CRASH | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-07.js | application crashed [@ nsView::DoResetWidgetBounds(bool, bool)]
20:36:06     INFO -  Crash dump filename: /var/folders/22/cg9zv2f53yd1fn8byjsck83r00000w/T/tmp0zxQG7/minidumps/DA3C4403-9249-4042-930A-83DE146F429D.dmp
20:36:06     INFO -  Operating system: Mac OS X
20:36:06     INFO -                    10.8.0 12A269
20:36:06     INFO -  CPU: amd64
20:36:06     INFO -       family 6 model 42 stepping 7
20:36:06     INFO -       8 CPUs
20:36:06     INFO -  Crash reason:  EXC_BAD_ACCESS / 0x0000000d
20:36:06     INFO -  Crash address: 0x0
20:36:06     INFO -  Thread 0 (crashed)
20:36:06     INFO -   0  XUL!nsView::DoResetWidgetBounds(bool, bool) [nsViewManager.h:bf901d895201 : 63 + 0x0]
20:36:06     INFO -      rbx = 0x000000018883bac0   r12 = 0x0000000112632760
20:36:06     INFO -      r13 = 0x0000000000000000   r14 = 0x000000018883bac0
20:36:06     INFO -      r15 = 0x0000000000000000   rip = 0x00000001027038e1
20:36:06     INFO -      rsp = 0x00007fff5fbe1e30   rbp = 0x00007fff5fbe1ed0
20:36:06     INFO -      Found by: given as instruction pointer in context
20:36:06     INFO -   1  XUL!nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) [nsViewManager.cpp:bf901d895201 : 383 + 0xe]
20:36:06     INFO -      rbx = 0x000000018883bac0   r12 = 0x0000000112632760
20:36:06     INFO -      r13 = 0x0000000000000000   r14 = 0x000000018883bac0
20:36:06     INFO -      r15 = 0x0000000000000000   rip = 0x000000010270717c
20:36:06     INFO -      rsp = 0x00007fff5fbe1ee0   rbp = 0x00007fff5fbe1f30
20:36:06     INFO -      Found by: call frame info
20:36:06     INFO -   2  XUL!nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) [nsViewManager.cpp:bf901d895201 : 389 + 0x18]
20:36:06     INFO -      rbx = 0x000000018883bac0   r12 = 0x0000000112632760
20:36:06     INFO -      r13 = 0x0000000000000000   r14 = 0x000000015dc3af00
20:36:06     INFO -      r15 = 0x0000000000000000   rip = 0x000000010270719e
20:36:06     INFO -      rsp = 0x00007fff5fbe1f40   rbp = 0x00007fff5fbe1f90
20:36:06     INFO -      Found by: call frame info
20:36:06     INFO -   3  XUL!nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) [nsViewManager.cpp:bf901d895201 : 389 + 0x18]
20:36:06     INFO -      rbx = 0x000000015dc3af00   r12 = 0x0000000112632760
20:36:06     INFO -      r13 = 0x0000000000000000   r14 = 0x000000016297fae0
20:36:06     INFO -      r15 = 0x0000000000000000   rip = 0x000000010270719e
20:36:06     INFO -      rsp = 0x00007fff5fbe1fa0   rbp = 0x00007fff5fbe1ff0
20:36:06     INFO -      Found by: call frame info
20:36:06     INFO -   4  XUL!nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) [nsViewManager.cpp:bf901d895201 : 389 + 0x18]
20:36:06     INFO -      rbx = 0x000000016297fae0   r12 = 0x0000000112632760
20:36:06     INFO -      r13 = 0x0000000000000000   r14 = 0x0000000166119790
20:36:06     INFO -      r15 = 0x0000000000000000   rip = 0x000000010270719e
20:36:06     INFO -      rsp = 0x00007fff5fbe2000   rbp = 0x00007fff5fbe2050
20:36:06     INFO -      Found by: call frame info
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Component: General → Graphics: Layers
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 18

4 years ago
Looks like a Layout bug to me.  I don't like seeing 0x5a5a5a5a5a5a5a5a in the top
stack frame ... smells use-after-free.
Group: layout-core-security
Severity: normal → critical
Component: Graphics: Layers → Layout: View Rendering
Keywords: csectype-uaf, sec-critical
(Assignee)

Comment 19

4 years ago
Created attachment 8382911 [details]
Relevant part of the log, with crash stack
(Assignee)

Comment 20

4 years ago
stack frame #0:
https://hg.mozilla.org/integration/mozilla-inbound/file/bb5c1b2dc7e4/view/public/nsView.h#l286
I'm guessing this is the call site:
https://hg.mozilla.org/integration/mozilla-inbound/file/bb5c1b2dc7e4/view/src/nsViewManager.cpp#l382
and that 'aView' has been deleted.

Looking at the loop at line 387, and seeing what this method does,
in particular the script blocker on line 417, it seems plausible
that arbitrary nsViews might get deleted inside the loop, including
'childView' which would lead to the crash at hand.

Am I missing something?
Sounds plausible to me.
Actually it might be a bit more complicated then that. We get to ProcessPendingUpdatesForView via PresShell::FlushPendingNotifications, so it must be a call to UpdateWidgetGeometry, which means aFlushDirtyRegion is false and we don't hit the script blocker code. But we know (from eg https://bug968838.bugzilla.mozilla.org/attachment.cgi?id=8378668) that ResetWidgetBounds can re-enter into WillPaintWindow which will call ProcessPendingUpdatesForView with aFlushDirtyRegion true.

Maybe we can walk the view tree, saving the widgets we find that need painting or bounds resetting in an array (that holds refs to the widgets), then process the array by getting the view from each widget and doing the painting or bounds resetting.

(A small unrelated bug I just noticed is that ProcessPendingUpdatesForView always has this being the root vm (as the assetion at the top can attest) even with views from child vms, but it also looks members of this in ways which only make sense if this was the vm for the view. Probably doesn't cause many (any?) problems since the only child widgets we have now are popups and plugins.)

Comment 23

4 years ago
(In reply to Mats Palmgren (:mats) from comment #18)
> Looks like a Layout bug to me.  I don't like seeing 0x5a5a5a5a5a5a5a5a in
> the top stack frame ... smells use-after-free.

Yes, that's definitely our poison-on-free() value.
status-firefox27: --- → ?
status-firefox28: --- → affected
status-firefox29: --- → affected
status-firefox30: --- → affected
tracking-firefox30: --- → +
(In reply to Timothy Nikkel (:tn) from comment #22)
> (A small unrelated bug I just noticed is that ProcessPendingUpdatesForView
> always has this being the root vm (as the assetion at the top can attest)
> even with views from child vms, but it also looks members of this in ways
> which only make sense if this was the vm for the view. Probably doesn't
> cause many (any?) problems since the only child widgets we have now are
> popups and plugins.)

Filed this as bug 978001 with patch.
(Assignee)

Comment 25

4 years ago
Created attachment 8384289 [details] [diff] [review]
wip

This implements what you suggested in comment 22.
Also, it splits up the method into three.
The entry point ProcessPendingUpdatesForView now adds a script blocker
around the entire operation, not separately per widget.  Ditto for
SetPainting(true/false).  The 'this' vm is the root vm, as before.

It then calls ProcessPendingUpdatesRecurse on aView's vm, which recurse
over the view tree and collect the widgets into an array, or calls
FlushDirtyRegionToWidget directly when there isn't one.

Finally it iterates the widget array and calls ProcessPendingUpdatesPaint
for the associated view with its vm as 'this'.

It should be equivalent to the current code, except for wrapping the
entire view tree operation in a script blocker.  I think it would be
good if we can get away with that.

The methods names are rather random, I chose a common prefix
ProcessPendingUpdates to associate them with each other.  Bike
shedding is welcome.

It did trigger one assertion in the TextFrame code though:
https://tbpl.mozilla.org/php/getParsedLog.php?id=35491913&tree=Try&full=1
which is this assertion:
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsTextFrame.cpp#2642
so appearantly it lead to a situation where we are painting a text frame
before it has been reflowed.  This doesn't really seem like a bug to me,
since with interruptable reflow you can easily have unreflowed text frames
in the tree.  I see that we have a few bugs on file on this assertion
already.  I've included a fix.

https://tbpl.mozilla.org/?tree=Try&rev=da343bba5e14
https://tbpl.mozilla.org/?tree=Try&rev=ec9a45f789f6
(I've triggered a bunch of "bc" runs on 10.8 to see if the crash still occurs)
(Assignee)

Comment 26

4 years ago
Fwiw, there were no crashes in 139 "bc" runs on 10.8.  And it looks green otherwise,
with the usual crop of known oranges.
(Assignee)

Comment 27

4 years ago
I will investigate the nsTextFrame assertion a bit further...
(Assignee)

Comment 28

4 years ago
Created attachment 8385132 [details]
stack+frame dump

It's the "testuser2" nsTextFrame inside a nsTextControlFrame that
hasn't been reflowed yet.  Its ancestors up to and including the
nsTextControlFrame are marked dirty.  Given the test is 
test_basic_form_autocomplete.html I suspect the <input> value
was changed by form fill.  The old code is more eager to run
script runners - it's wrapping each "presShell->Paint" with a
separate script blocker, thus running new ones after each paint.
In my first "wip" patch, they are delayed and only run after all
painting is done.  So it might be that some script runner in
nsTextControlFrame / form-fill code depend on the old behavior.
(Assignee)

Comment 29

4 years ago
Created attachment 8385389 [details] [diff] [review]
fix

Running the ResetWidgetBounds() part outside the script blocker
seems to work without triggering the nsTextFrame assertion.
This should be robust enough to not crash I think.

https://tbpl.mozilla.org/?tree=Try&rev=320cbb7d11a0

(I still think the nsTextFrame assertion is essentially bogus,
but I don't have time to pursue it.)
Attachment #8384289 - Attachment is obsolete: true
Attachment #8385389 - Flags: review?(tnikkel)
(Assignee)

Comment 30

4 years ago
Forgot to include a link to the Android/B2G test run:
https://tbpl.mozilla.org/?tree=Try&rev=51111a806cfe
Comment on attachment 8385389 [details] [diff] [review]
fix

>+  for (size_t i = 0; i < widgets.Length(); ++i) {

I think Length returns a uint32_t.

>+    nsView* view = nsView::GetViewFor(widgets[i]);
>+    if (view) {
>+      view->ResetWidgetBounds(false, true);
>+    }
>+  }
>+  if (aFlushDirtyRegion) {
>+    nsAutoScriptBlocker scriptBlocker;
>+    SetPainting(true);

Since ResetWidgetBounds can re-enter I think at this point we need to guard against |this| being destroyed. Perhaps just do this per-widget and call it lower down when we have a view and a vm from the widget.

>+    for (size_t i = 0; i < widgets.Length(); ++i) {
>+      nsIWidget* widget = widgets[i];
>+      nsView* view = nsView::GetViewFor(widget);
>+      if (view && widget->NeedsPaint()) {
>+        view->GetViewManager()->ProcessPendingUpdatesPaint(widget);
>+      }
>+    }
>+    SetPainting(false);
>+  }
>+}

It looks like if we encounter a widget that has NeedsPaint() being false then we won't call FlushDirtyRegionToWidget, whereas we did before.

Otherwise looks good.
Attachment #8385389 - Flags: review?(tnikkel)
(Assignee)

Comment 32

4 years ago
Created attachment 8386749 [details] [diff] [review]
fix

(In reply to Timothy Nikkel (:tn) from comment #31)
> >+  for (size_t i = 0; i < widgets.Length(); ++i) {
> 
> I think Length returns a uint32_t.

Fixed, but I think it's a bug that it doesn't return a size_t.

> >+  if (aFlushDirtyRegion) {
> >+    nsAutoScriptBlocker scriptBlocker;
> >+    SetPainting(true);
> 
> Since ResetWidgetBounds can re-enter I think at this point we need to guard
> against |this| being destroyed. Perhaps just do this per-widget and call it
> lower down when we have a view and a vm from the widget.

Fair enough.  I'd like to keep it wrapped around the whole loop
though so I added a strong ref on the shell and check that 'this'
is still its vm.

> It looks like if we encounter a widget that has NeedsPaint() being false
> then we won't call FlushDirtyRegionToWidget, whereas we did before.

ProcessPendingUpdatesRecurse does that for non-widget views.

https://tbpl.mozilla.org/?tree=Try&rev=3f8f8b4e505f
https://tbpl.mozilla.org/?tree=Try&rev=faa0e0080fa0
Assignee: nobody → matspal
Attachment #8385389 - Attachment is obsolete: true
Attachment #8386749 - Flags: review?(tnikkel)
(Assignee)

Comment 33

4 years ago
Comment on attachment 8386749 [details] [diff] [review]
fix

> > It looks like if we encounter a widget that has NeedsPaint() being false
> > then we won't call FlushDirtyRegionToWidget, whereas we did before.
> 
> ProcessPendingUpdatesRecurse does that for non-widget views.

Oh, I see that I misunderstood your comment there.
Attachment #8386749 - Flags: review?(tnikkel)
(Assignee)

Updated

4 years ago
Duplicate of this bug: 980287

Updated

4 years ago
Summary: Intermittent TEST-UNEXPECTED-FAIL | test/browser_dbg_variables-view-popup-07.js | application terminated with exit code 256 | [@ nsView::DoResetWidgetBounds(bool, bool)] | | leakcheck | 2968 bytes leaked (CompositorChild, CondVar, Mutex, PCompositorChild → Intermittent browser_dbg_variables-view-popup-07.js | [@ nsView::DoResetWidgetBounds(bool, bool)] or [@ nsViewManager::ProcessPendingUpdatesForView(nsView*, bool)]
(Assignee)

Comment 35

4 years ago
Created attachment 8387388 [details] [diff] [review]
fix

(In reply to Timothy Nikkel (:tn) from comment #31)
> It looks like if we encounter a widget that has NeedsPaint() being false
> then we won't call FlushDirtyRegionToWidget, whereas we did before.

Good catch.  The name "NeedsPaint" seems slightly misleading though -
"HasNonEmptyVisibleArea" would be a better name for what it actually does
so skipping FlushDirtyRegionToWidget when it's false would probably be OK.
Anyway, let's play it safe and not change that bit.

https://tbpl.mozilla.org/?tree=Try&rev=00286c634d36
Attachment #8386749 - Attachment is obsolete: true
Attachment #8387388 - Flags: review?(tnikkel)
Comment on attachment 8387388 [details] [diff] [review]
fix

Looks good, thanks!
Attachment #8387388 - Flags: review?(tnikkel) → review+
Maybe this will fix bug 974542?
(Assignee)

Comment 38

4 years ago
https://tbpl.mozilla.org/?usebuildbot=1&tree=Mozilla-Inbound
status-firefox30: affected → fixed
Flags: in-testsuite-

Comment 39

4 years ago
landing
I think you meant
https://hg.mozilla.org/integration/mozilla-inbound/rev/4fe82d0824d1
(Assignee)

Comment 40

4 years ago
Yes, thank you.

I've backed it out temporarily to investigate a Tp5 regression:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d1c54bc2bb71
(Assignee)

Comment 41

4 years ago
The perf-o-matic chart btw:
http://graphs.mozilla.org/graph.html#tests=[[257,131,35]]&sel=1394102638000,1394275438000&displayrange=7&datatype=running
There's since been many new results and the graph has not gone back done. Something else must have caused it.
Looking at the graph values this changeset also seems to be in the regression range
https://hg.mozilla.org/integration/mozilla-inbound/rev/45ac7b7d7466
(Assignee)

Comment 44

4 years ago
I don't see a difference after the backout either, so I've re-landed it:
https://hg.mozilla.org/integration/mozilla-inbound/rev/399e4c0b9a6f
(Assignee)

Comment 45

4 years ago
(In reply to Timothy Nikkel (:tn) from comment #43)
> Looking at the graph values this changeset also seems to be in the
> regression range

Thanks, the other people with patches in the regression window are already notified.

Comment 46

4 years ago
landing
https://hg.mozilla.org/mozilla-central/rev/399e4c0b9a6f
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Group: layout-core-security → core-security
Is this safe to take on 28 this late in the game?
status-b2g-v1.3: --- → affected
status-b2g-v1.4: --- → fixed
status-firefox-esr24: --- → unaffected
Flags: needinfo?(matspal)
(Assignee)

Comment 48

4 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #47)
> Is this safe to take on 28 this late in the game?

No, that's too risky IMO.  I'd like at least a couple of weeks in Beta for something
like this to find any regressions.  It should be OK for Aurora though.
Flags: needinfo?(matspal)
(Assignee)

Comment 49

4 years ago
Comment on attachment 8387388 [details] [diff] [review]
fix

[Approval Request Comment]
Bug caused by (feature/regressing bug #): -
User impact if declined: use-after-free crash
Testing completed (on m-c, etc.): on m-c since 2014-03-08
Risk to taking this patch (and alternatives if risky): low-to-medium risk
String or IDL/UUID changes made by this patch: none
Attachment #8387388 - Flags: approval-mozilla-aurora?
Attachment #8387388 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(Assignee)

Comment 50

4 years ago
landing
https://hg.mozilla.org/releases/mozilla-aurora/rev/28d2834f0617
status-firefox29: affected → fixed
We may want to get this on b2g28 (v1.3) at some point still.
status-firefox27: ? → wontfix
status-firefox28: affected → wontfix
(Assignee)

Comment 52

4 years ago
Note that you need the patches in bug 978001 too.
Took a look at crash-stats. This seems to have fixed our top two crashes that have nsView in the signature.
We've shipped a couple betas now that include this change. Mats, think this has baked enough that we're OK getting it on b2g28?
Flags: needinfo?(matspal)
(Assignee)

Comment 55

4 years ago
Yes, it should be OK now I think.
Flags: needinfo?(matspal)
Thanks!
https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/31177245de3f
status-b2g-v1.2: --- → unaffected
status-b2g-v1.3: affected → fixed
status-b2g-v1.3T: --- → fixed
Whiteboard: [adv-main29+]
Group: core-security
You need to log in before you can comment on or make changes to this bug.