Closed Bug 946658 Opened 11 years ago Closed 10 years ago

Intermittent browser_dbg_variables-view-popup-07.js | [@ nsView::DoResetWidgetBounds(bool, bool)] or [@ nsViewManager::ProcessPendingUpdatesForView(nsView*, bool)]

Categories

(Core :: Web Painting, defect)

x86
macOS
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla30
Tracking Status
firefox27 --- wontfix
firefox28 --- wontfix
firefox29 --- fixed
firefox30 + fixed
firefox-esr24 --- unaffected
b2g-v1.2 --- unaffected
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: cbook, Assigned: MatsPalmgren_bugz)

References

()

Details

(5 keywords, Whiteboard: [adv-main29+])

Crash Data

Attachments

(3 files, 3 obsolete files)

Rev5 MacOSX Mountain Lion 10.8 fx-team debug test mochitest-browser-chrome on 2013-12-04 19:55:34 PST for push bf901d895201

slave: talos-mtnlion-r5-061

https://tbpl.mozilla.org/php/getParsedLog.php?id=31472407&tree=Fx-Team

Summary

TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-07.js | application terminated with exit code 256

PROCESS-CRASH | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-07.js | application crashed [@ nsView::DoResetWidgetBounds(bool, bool)]

also leaked 

TEST-UNEXPECTED-FAIL | leakcheck | 2968 bytes leaked (CompositorChild, CondVar, Mutex, PCompositorChild, PLayerTransactionChild, ...) but not sure if this belongs to the tracking bugs


20:36:06  WARNING -  PROCESS-CRASH | chrome://mochitests/content/browser/browser/devtools/debugger/test/browser_dbg_variables-view-popup-07.js | application crashed [@ nsView::DoResetWidgetBounds(bool, bool)]
20:36:06     INFO -  Crash dump filename: /var/folders/22/cg9zv2f53yd1fn8byjsck83r00000w/T/tmp0zxQG7/minidumps/DA3C4403-9249-4042-930A-83DE146F429D.dmp
20:36:06     INFO -  Operating system: Mac OS X
20:36:06     INFO -                    10.8.0 12A269
20:36:06     INFO -  CPU: amd64
20:36:06     INFO -       family 6 model 42 stepping 7
20:36:06     INFO -       8 CPUs
20:36:06     INFO -  Crash reason:  EXC_BAD_ACCESS / 0x0000000d
20:36:06     INFO -  Crash address: 0x0
20:36:06     INFO -  Thread 0 (crashed)
20:36:06     INFO -   0  XUL!nsView::DoResetWidgetBounds(bool, bool) [nsViewManager.h:bf901d895201 : 63 + 0x0]
20:36:06     INFO -      rbx = 0x000000018883bac0   r12 = 0x0000000112632760
20:36:06     INFO -      r13 = 0x0000000000000000   r14 = 0x000000018883bac0
20:36:06     INFO -      r15 = 0x0000000000000000   rip = 0x00000001027038e1
20:36:06     INFO -      rsp = 0x00007fff5fbe1e30   rbp = 0x00007fff5fbe1ed0
20:36:06     INFO -      Found by: given as instruction pointer in context
20:36:06     INFO -   1  XUL!nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) [nsViewManager.cpp:bf901d895201 : 383 + 0xe]
20:36:06     INFO -      rbx = 0x000000018883bac0   r12 = 0x0000000112632760
20:36:06     INFO -      r13 = 0x0000000000000000   r14 = 0x000000018883bac0
20:36:06     INFO -      r15 = 0x0000000000000000   rip = 0x000000010270717c
20:36:06     INFO -      rsp = 0x00007fff5fbe1ee0   rbp = 0x00007fff5fbe1f30
20:36:06     INFO -      Found by: call frame info
20:36:06     INFO -   2  XUL!nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) [nsViewManager.cpp:bf901d895201 : 389 + 0x18]
20:36:06     INFO -      rbx = 0x000000018883bac0   r12 = 0x0000000112632760
20:36:06     INFO -      r13 = 0x0000000000000000   r14 = 0x000000015dc3af00
20:36:06     INFO -      r15 = 0x0000000000000000   rip = 0x000000010270719e
20:36:06     INFO -      rsp = 0x00007fff5fbe1f40   rbp = 0x00007fff5fbe1f90
20:36:06     INFO -      Found by: call frame info
20:36:06     INFO -   3  XUL!nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) [nsViewManager.cpp:bf901d895201 : 389 + 0x18]
20:36:06     INFO -      rbx = 0x000000015dc3af00   r12 = 0x0000000112632760
20:36:06     INFO -      r13 = 0x0000000000000000   r14 = 0x000000016297fae0
20:36:06     INFO -      r15 = 0x0000000000000000   rip = 0x000000010270719e
20:36:06     INFO -      rsp = 0x00007fff5fbe1fa0   rbp = 0x00007fff5fbe1ff0
20:36:06     INFO -      Found by: call frame info
20:36:06     INFO -   4  XUL!nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) [nsViewManager.cpp:bf901d895201 : 389 + 0x18]
20:36:06     INFO -      rbx = 0x000000016297fae0   r12 = 0x0000000112632760
20:36:06     INFO -      r13 = 0x0000000000000000   r14 = 0x0000000166119790
20:36:06     INFO -      r15 = 0x0000000000000000   rip = 0x000000010270719e
20:36:06     INFO -      rsp = 0x00007fff5fbe2000   rbp = 0x00007fff5fbe2050
20:36:06     INFO -      Found by: call frame info
Component: General → Graphics: Layers
Looks like a Layout bug to me.  I don't like seeing 0x5a5a5a5a5a5a5a5a in the top
stack frame ... smells use-after-free.
Group: layout-core-security
Severity: normal → critical
Component: Graphics: Layers → Layout: View Rendering
stack frame #0:
https://hg.mozilla.org/integration/mozilla-inbound/file/bb5c1b2dc7e4/view/public/nsView.h#l286
I'm guessing this is the call site:
https://hg.mozilla.org/integration/mozilla-inbound/file/bb5c1b2dc7e4/view/src/nsViewManager.cpp#l382
and that 'aView' has been deleted.

Looking at the loop at line 387, and seeing what this method does,
in particular the script blocker on line 417, it seems plausible
that arbitrary nsViews might get deleted inside the loop, including
'childView' which would lead to the crash at hand.

Am I missing something?
Sounds plausible to me.
Actually it might be a bit more complicated then that. We get to ProcessPendingUpdatesForView via PresShell::FlushPendingNotifications, so it must be a call to UpdateWidgetGeometry, which means aFlushDirtyRegion is false and we don't hit the script blocker code. But we know (from eg https://bug968838.bugzilla.mozilla.org/attachment.cgi?id=8378668) that ResetWidgetBounds can re-enter into WillPaintWindow which will call ProcessPendingUpdatesForView with aFlushDirtyRegion true.

Maybe we can walk the view tree, saving the widgets we find that need painting or bounds resetting in an array (that holds refs to the widgets), then process the array by getting the view from each widget and doing the painting or bounds resetting.

(A small unrelated bug I just noticed is that ProcessPendingUpdatesForView always has this being the root vm (as the assetion at the top can attest) even with views from child vms, but it also looks members of this in ways which only make sense if this was the vm for the view. Probably doesn't cause many (any?) problems since the only child widgets we have now are popups and plugins.)
(In reply to Mats Palmgren (:mats) from comment #18)
> Looks like a Layout bug to me.  I don't like seeing 0x5a5a5a5a5a5a5a5a in
> the top stack frame ... smells use-after-free.

Yes, that's definitely our poison-on-free() value.
(In reply to Timothy Nikkel (:tn) from comment #22)
> (A small unrelated bug I just noticed is that ProcessPendingUpdatesForView
> always has this being the root vm (as the assetion at the top can attest)
> even with views from child vms, but it also looks members of this in ways
> which only make sense if this was the vm for the view. Probably doesn't
> cause many (any?) problems since the only child widgets we have now are
> popups and plugins.)

Filed this as bug 978001 with patch.
Attached patch wip (obsolete) — Splinter Review
This implements what you suggested in comment 22.
Also, it splits up the method into three.
The entry point ProcessPendingUpdatesForView now adds a script blocker
around the entire operation, not separately per widget.  Ditto for
SetPainting(true/false).  The 'this' vm is the root vm, as before.

It then calls ProcessPendingUpdatesRecurse on aView's vm, which recurse
over the view tree and collect the widgets into an array, or calls
FlushDirtyRegionToWidget directly when there isn't one.

Finally it iterates the widget array and calls ProcessPendingUpdatesPaint
for the associated view with its vm as 'this'.

It should be equivalent to the current code, except for wrapping the
entire view tree operation in a script blocker.  I think it would be
good if we can get away with that.

The methods names are rather random, I chose a common prefix
ProcessPendingUpdates to associate them with each other.  Bike
shedding is welcome.

It did trigger one assertion in the TextFrame code though:
https://tbpl.mozilla.org/php/getParsedLog.php?id=35491913&tree=Try&full=1
which is this assertion:
http://mxr.mozilla.org/mozilla-central/source/layout/generic/nsTextFrame.cpp#2642
so appearantly it lead to a situation where we are painting a text frame
before it has been reflowed.  This doesn't really seem like a bug to me,
since with interruptable reflow you can easily have unreflowed text frames
in the tree.  I see that we have a few bugs on file on this assertion
already.  I've included a fix.

https://tbpl.mozilla.org/?tree=Try&rev=da343bba5e14
https://tbpl.mozilla.org/?tree=Try&rev=ec9a45f789f6
(I've triggered a bunch of "bc" runs on 10.8 to see if the crash still occurs)
Fwiw, there were no crashes in 139 "bc" runs on 10.8.  And it looks green otherwise,
with the usual crop of known oranges.
I will investigate the nsTextFrame assertion a bit further...
Attached file stack+frame dump
It's the "testuser2" nsTextFrame inside a nsTextControlFrame that
hasn't been reflowed yet.  Its ancestors up to and including the
nsTextControlFrame are marked dirty.  Given the test is 
test_basic_form_autocomplete.html I suspect the <input> value
was changed by form fill.  The old code is more eager to run
script runners - it's wrapping each "presShell->Paint" with a
separate script blocker, thus running new ones after each paint.
In my first "wip" patch, they are delayed and only run after all
painting is done.  So it might be that some script runner in
nsTextControlFrame / form-fill code depend on the old behavior.
Attached patch fix (obsolete) — Splinter Review
Running the ResetWidgetBounds() part outside the script blocker
seems to work without triggering the nsTextFrame assertion.
This should be robust enough to not crash I think.

https://tbpl.mozilla.org/?tree=Try&rev=320cbb7d11a0

(I still think the nsTextFrame assertion is essentially bogus,
but I don't have time to pursue it.)
Attachment #8384289 - Attachment is obsolete: true
Attachment #8385389 - Flags: review?(tnikkel)
Forgot to include a link to the Android/B2G test run:
https://tbpl.mozilla.org/?tree=Try&rev=51111a806cfe
Comment on attachment 8385389 [details] [diff] [review]
fix

>+  for (size_t i = 0; i < widgets.Length(); ++i) {

I think Length returns a uint32_t.

>+    nsView* view = nsView::GetViewFor(widgets[i]);
>+    if (view) {
>+      view->ResetWidgetBounds(false, true);
>+    }
>+  }
>+  if (aFlushDirtyRegion) {
>+    nsAutoScriptBlocker scriptBlocker;
>+    SetPainting(true);

Since ResetWidgetBounds can re-enter I think at this point we need to guard against |this| being destroyed. Perhaps just do this per-widget and call it lower down when we have a view and a vm from the widget.

>+    for (size_t i = 0; i < widgets.Length(); ++i) {
>+      nsIWidget* widget = widgets[i];
>+      nsView* view = nsView::GetViewFor(widget);
>+      if (view && widget->NeedsPaint()) {
>+        view->GetViewManager()->ProcessPendingUpdatesPaint(widget);
>+      }
>+    }
>+    SetPainting(false);
>+  }
>+}

It looks like if we encounter a widget that has NeedsPaint() being false then we won't call FlushDirtyRegionToWidget, whereas we did before.

Otherwise looks good.
Attachment #8385389 - Flags: review?(tnikkel)
Attached patch fix (obsolete) — Splinter Review
(In reply to Timothy Nikkel (:tn) from comment #31)
> >+  for (size_t i = 0; i < widgets.Length(); ++i) {
> 
> I think Length returns a uint32_t.

Fixed, but I think it's a bug that it doesn't return a size_t.

> >+  if (aFlushDirtyRegion) {
> >+    nsAutoScriptBlocker scriptBlocker;
> >+    SetPainting(true);
> 
> Since ResetWidgetBounds can re-enter I think at this point we need to guard
> against |this| being destroyed. Perhaps just do this per-widget and call it
> lower down when we have a view and a vm from the widget.

Fair enough.  I'd like to keep it wrapped around the whole loop
though so I added a strong ref on the shell and check that 'this'
is still its vm.

> It looks like if we encounter a widget that has NeedsPaint() being false
> then we won't call FlushDirtyRegionToWidget, whereas we did before.

ProcessPendingUpdatesRecurse does that for non-widget views.

https://tbpl.mozilla.org/?tree=Try&rev=3f8f8b4e505f
https://tbpl.mozilla.org/?tree=Try&rev=faa0e0080fa0
Assignee: nobody → matspal
Attachment #8385389 - Attachment is obsolete: true
Attachment #8386749 - Flags: review?(tnikkel)
Comment on attachment 8386749 [details] [diff] [review]
fix

> > It looks like if we encounter a widget that has NeedsPaint() being false
> > then we won't call FlushDirtyRegionToWidget, whereas we did before.
> 
> ProcessPendingUpdatesRecurse does that for non-widget views.

Oh, I see that I misunderstood your comment there.
Attachment #8386749 - Flags: review?(tnikkel)
Summary: Intermittent TEST-UNEXPECTED-FAIL | test/browser_dbg_variables-view-popup-07.js | application terminated with exit code 256 | [@ nsView::DoResetWidgetBounds(bool, bool)] | | leakcheck | 2968 bytes leaked (CompositorChild, CondVar, Mutex, PCompositorChild → Intermittent browser_dbg_variables-view-popup-07.js | [@ nsView::DoResetWidgetBounds(bool, bool)] or [@ nsViewManager::ProcessPendingUpdatesForView(nsView*, bool)]
Attached patch fixSplinter Review
(In reply to Timothy Nikkel (:tn) from comment #31)
> It looks like if we encounter a widget that has NeedsPaint() being false
> then we won't call FlushDirtyRegionToWidget, whereas we did before.

Good catch.  The name "NeedsPaint" seems slightly misleading though -
"HasNonEmptyVisibleArea" would be a better name for what it actually does
so skipping FlushDirtyRegionToWidget when it's false would probably be OK.
Anyway, let's play it safe and not change that bit.

https://tbpl.mozilla.org/?tree=Try&rev=00286c634d36
Attachment #8386749 - Attachment is obsolete: true
Attachment #8387388 - Flags: review?(tnikkel)
Comment on attachment 8387388 [details] [diff] [review]
fix

Looks good, thanks!
Attachment #8387388 - Flags: review?(tnikkel) → review+
Maybe this will fix bug 974542?
Yes, thank you.

I've backed it out temporarily to investigate a Tp5 regression:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d1c54bc2bb71
There's since been many new results and the graph has not gone back done. Something else must have caused it.
Looking at the graph values this changeset also seems to be in the regression range
https://hg.mozilla.org/integration/mozilla-inbound/rev/45ac7b7d7466
I don't see a difference after the backout either, so I've re-landed it:
https://hg.mozilla.org/integration/mozilla-inbound/rev/399e4c0b9a6f
(In reply to Timothy Nikkel (:tn) from comment #43)
> Looking at the graph values this changeset also seems to be in the
> regression range

Thanks, the other people with patches in the regression window are already notified.
https://hg.mozilla.org/mozilla-central/rev/399e4c0b9a6f
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Group: layout-core-security → core-security
Is this safe to take on 28 this late in the game?
Flags: needinfo?(matspal)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #47)
> Is this safe to take on 28 this late in the game?

No, that's too risky IMO.  I'd like at least a couple of weeks in Beta for something
like this to find any regressions.  It should be OK for Aurora though.
Flags: needinfo?(matspal)
Comment on attachment 8387388 [details] [diff] [review]
fix

[Approval Request Comment]
Bug caused by (feature/regressing bug #): -
User impact if declined: use-after-free crash
Testing completed (on m-c, etc.): on m-c since 2014-03-08
Risk to taking this patch (and alternatives if risky): low-to-medium risk
String or IDL/UUID changes made by this patch: none
Attachment #8387388 - Flags: approval-mozilla-aurora?
Attachment #8387388 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
We may want to get this on b2g28 (v1.3) at some point still.
Note that you need the patches in bug 978001 too.
Took a look at crash-stats. This seems to have fixed our top two crashes that have nsView in the signature.
We've shipped a couple betas now that include this change. Mats, think this has baked enough that we're OK getting it on b2g28?
Flags: needinfo?(matspal)
Yes, it should be OK now I think.
Flags: needinfo?(matspal)
Whiteboard: [adv-main29+]
Group: core-security
Component: Layout: View Rendering → Layout: Web Painting
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: