If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Intermittent components/satchel/test/test_form_autocomplete_with_list.html | Exited with code 1 during test run | application crashed [@ nsAutoLayoutPhase::Enter()]

RESOLVED FIXED in mozilla23

Status

()

Core
Layout
--
critical
RESOLVED FIXED
5 years ago
a year ago

People

(Reporter: RyanVM, Assigned: tnikkel)

Tracking

(Depends on: 1 bug, {crash, intermittent-failure})

Trunk
mozilla23
x86_64
Mac OS X
crash, intermittent-failure
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [snappy:p1], crash signature)

Attachments

(7 attachments, 7 obsolete attachments)

8.78 KB, text/plain
Details
116.70 KB, text/plain
Details
925 bytes, patch
mats
: review+
Details | Diff | Splinter Review
4.49 KB, patch
mossop
: review+
Details | Diff | Splinter Review
1.92 KB, patch
Details | Diff | Splinter Review
6.26 KB, patch
mats
: review+
Details | Diff | Splinter Review
2.08 KB, patch
tnikkel
: review+
Details | Diff | Splinter Review
(Reporter)

Description

5 years ago
https://tbpl.mozilla.org/php/getParsedLog.php?id=20687946&tree=Mozilla-Inbound

Rev5 MacOSX Mountain Lion 10.8 mozilla-inbound debug test mochitest-5 on 2013-03-15 06:23:23 PDT for push e90fa0a09ac5
slave: talos-mtnlion-r5-081

06:32:55     INFO -  168589 INFO TEST-PASS | /tests/toolkit/components/satchel/test/test_form_autocomplete_with_list.html | Checking form3 input
06:32:55     INFO -  168590 INFO TEST-INFO | /tests/toolkit/components/satchel/test/test_form_autocomplete_with_list.html | Starting test #201
06:32:55     INFO -  168591 INFO TEST-PASS | /tests/toolkit/components/satchel/test/test_form_autocomplete_with_list.html | Checking form3 input
06:32:55     INFO -  168592 INFO TEST-INFO | /tests/toolkit/components/satchel/test/test_form_autocomplete_with_list.html | Starting test #202
06:32:55     INFO -  168593 INFO TEST-PASS | /tests/toolkit/components/satchel/test/test_form_autocomplete_with_list.html | Checking form3 input
06:32:55     INFO -  168594 INFO TEST-INFO | /tests/toolkit/components/satchel/test/test_form_autocomplete_with_list.html | Starting test #203
06:32:55     INFO -  168595 INFO TEST-PASS | /tests/toolkit/components/satchel/test/test_form_autocomplete_with_list.html | Checking form3 input
06:32:55     INFO -  168596 INFO TEST-INFO | /tests/toolkit/components/satchel/test/test_form_autocomplete_with_list.html | Starting test #300
06:32:55     INFO -  Assertion failure: mPresContext->mLayoutPhaseCount[eLayoutPhase_Reflow] == 0 (painting in the middle of reflow), at ../../../layout/base/nsAutoLayoutPhase.cpp:37
06:32:58  WARNING -  TEST-UNEXPECTED-FAIL | /tests/toolkit/components/satchel/test/test_form_autocomplete_with_list.html | Exited with code 1 during test run
06:32:58     INFO -  INFO | automation.py | Application ran for: 0:08:40.454693
06:32:58     INFO -  INFO | automation.py | Reading PID log: /var/folders/6p/28y5jm7s6lz843fsd3j2d_lh00000w/T/tmpG1Adbtpidlog
06:33:09     INFO -  PROCESS-CRASH | /tests/toolkit/components/satchel/test/test_form_autocomplete_with_list.html | application crashed [@ nsAutoLayoutPhase::Enter()]
06:33:09     INFO -  Crash dump filename: /var/folders/6p/28y5jm7s6lz843fsd3j2d_lh00000w/T/tmpFQEu0z/minidumps/31F42F0B-23AC-47E3-801B-9B8D93E16461.dmp
06:33:09     INFO -  Operating system: Mac OS X
06:33:09     INFO -                    10.8.0 12A269
06:33:09     INFO -  CPU: amd64
06:33:09     INFO -       family 6 model 42 stepping 7
06:33:09     INFO -       8 CPUs
06:33:09     INFO -  Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
06:33:09     INFO -  Crash address: 0x0
06:33:09     INFO -  Thread 0 (crashed)
06:33:09     INFO -   0  XUL!nsAutoLayoutPhase::Enter() [nsAutoLayoutPhase.cpp:e90fa0a09ac5 : 34 + 0x0]
06:33:09     INFO -      rbx = 0x00007fff73398c68   r12 = 0x00007fff5fbfb2c8
06:33:09     INFO -      r13 = 0x0000000107680c10   r14 = 0x00007fff5fbfb2f8
06:33:09     INFO -      r15 = 0x000000010767ff80   rip = 0x00000001013f935f
06:33:09     INFO -      rsp = 0x00007fff5fbfb0e0   rbp = 0x00007fff5fbfb0f0
06:33:09     INFO -      Found by: given as instruction pointer in context
06:33:09     INFO -   1  XUL!PresShell::Paint(nsView*, nsRegion const&, unsigned int) [nsPresShell.cpp:e90fa0a09ac5 : 5512 + 0x4]
06:33:09     INFO -      rip = 0x00000001013dcfd0   rsp = 0x00007fff5fbfb100
06:33:09     INFO -      rbp = 0x00007fff5fbfb2a0
06:33:09     INFO -      Found by: stack scanning
06:33:09     INFO -   2  AppKit + 0xf65d8
06:33:09     INFO -      rip = 0x00007fff86b995d9   rsp = 0x00007fff5fbfb110
06:33:09     INFO -      rbp = 0x00007fff5fbfb2a0
06:33:09     INFO -      Found by: stack scanning
06:33:09     INFO -   3  AppKit + 0x124855
06:33:09     INFO -      rip = 0x00007fff86bc7856   rsp = 0x00007fff5fbfb120
06:33:09     INFO -      rbp = 0x00007fff5fbfb2a0
06:33:09     INFO -      Found by: stack scanning
06:33:09     INFO -   4  AppKit + 0x12477b
06:33:09     INFO -      rip = 0x00007fff86bc777c   rsp = 0x00007fff5fbfb140
06:33:09     INFO -      rbp = 0x00007fff5fbfb2a0
06:33:09     INFO -      Found by: stack scanning
06:33:09     INFO -   5  XUL!nsViewManager::ProcessPendingUpdatesForView(nsView*, bool) [nsViewManager.cpp:e90fa0a09ac5 : 407 + 0x7]
06:33:09     INFO -      rip = 0x0000000101b38dfd   rsp = 0x00007fff5fbfb170
06:33:09     INFO -      rbp = 0x00007fff5fbfb2a0
06:33:09     INFO -      Found by: stack scanning
06:33:09     INFO -   6  XUL!nsView::DoResetWidgetBounds(bool, bool) [nsView.cpp:e90fa0a09ac5 : 236 + 0x10]
06:33:09     INFO -      rip = 0x0000000101b35561   rsp = 0x00007fff5fbfb180
06:33:09     INFO -      rbp = 0x00007fff5fbfb2a0
Created attachment 725620 [details]
full stack trace for posterity

Updated

5 years ago
Severity: normal → critical
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
(Assignee)

Comment 5

5 years ago
Only happened in debug builds so far, so is mPresContext null or is the moz assert failing? I can't necessarily tell.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)

Comment 11

5 years ago
This is triggering fairly consistently on my machine at the moment.  I added
some debug and can verify its the assert triggering.  mPresContext is non-null
and mPresContext->mLayoutPhaseCount[eLayoutPhase_Reflow] is 1 (so probably not
garbage).

The condition definitely seems timing related.  I had to play with the debug a
bit since simple printf's masked the problem completely.  Without any debug I
seem to trigger nearly 100% of the time.  With conditional debug it becomes
intermittent.

Interestingly, the test with debug also seems to run longer before triggering the
assert.  With no debug the assert triggers nearly immediately.  With a conditional
printf added I can see it stepping through the form automation for a few
seconds before triggering.

Anyway, just some observations in case it helps track this down.  This is on a
2009 macbook pro running lion.
(Assignee)

Comment 12

5 years ago
Is it possible you could get a stack? The ones in the tinderbox logs seem a little broken and don't explain why we are seeing this problem.

Comment 13

5 years ago
Created attachment 734471 [details]
Another stack dump

Sure.  Here is another stack dump.  I used NS_ABORT() to trigger it, so you can
see the immediate abort trace earlier in the log and then the minidump_stackwalk
at the end.

Is there a better way to do this?  I'm new to the moz-central dev environment.
(Assignee)

Comment 14

5 years ago
That's perfect, thank you. It clearly shows what the problem is now.

Comment 15

5 years ago
Awesome.  Let me know if you want me to test any patches, etc.
(Assignee)

Comment 16

5 years ago
So the problem is that during reflow of a popup we make it visible by setting the view to be visible, this calls Show and Resize on the widget, the OS calls back from the resize call and asks us to paint.

Comment 17

5 years ago
That makes sense looking at the stack trace, but I'm still trying to understand
how this was timing sensitive.  If the callback from the OS was done synchronously,
shouldn't this have always triggered?

Or are paints throttled such that if one occurred recently, then this new repaint
may get queued for as an async event?
(Assignee)

Comment 18

5 years ago
I guess the OS decides if it wants to ask us for a paint or not and it bases it's decision on something, and I don't know if we can always control that. I'm pretty sure I've seen the same issue on Windows too, so we should probably just fix it properly. Which is to make these widget calls to happen async in the view subsystem. I think mstange landed a patch for that a few years ago but it had to be backed out for a regression. But I wouldn't be surprised it we could do it now, or just fix the regression.
(Assignee)

Comment 19

5 years ago
I think bug 562138 is what I'm thinking about.
Comment hidden (Treeherder Robot)
(Assignee)

Comment 21

5 years ago
Created attachment 735381 [details] [diff] [review]
patch

Here Markus' patch from bug 562138 unbitrotted.

It causes some failures. I think Linux would fail too except test_panel.xul is disabled on linux.
https://tbpl.mozilla.org/?tree=Try&rev=bd6721e0627a

Comment 22

5 years ago
Created attachment 735555 [details]
Rendering error with patch at end of test_form_autocomplete_with_list.html

So I also tried the patch in attachment 735381 [details] [diff] [review] on my machine.  It does seem to
prevent the assertion in test_form_autocomplete_with_list.html, but as you said,
introduces some other issues.

The problem I saw was that while the test passed cleaning, the autocomplete
popup was not properly removed from the page at the end of the test.  See the
attached screenshot.

Comment 23

5 years ago
Created attachment 735594 [details] [diff] [review]
Updated patch

Updated the patch to correct the rendering error in the test.  I moved the
conditional mWindow->Show(false) to occur earlier in DoResetWidgetBounds().
In this particular case the test was triggering the following check and bailing
out before hiding the popup window:

  if (type == eWindowType_popup &&
      ((curBounds.IsEmpty() && mDimBounds.IsEmpty()) ||
       mVis == nsViewVisibility_kHide)) {
    // Don't manipulate empty or hidden popup widgets. For example there's no
    // point moving hidden comboboxes around, or doing X server roundtrips
    // to compute their true screen position. This could mean that WidgetToScreen
    // operations on these widgets don't return up-to-date values, but popup
    // positions aren't reliable anyway because of correction to be on or off-screen.
    return;
  }

I hope you don't mind me updating/obsoleting your patch.  Is this bad form?
Thanks.
Attachment #735381 - Attachment is obsolete: true

Updated

5 years ago
Attachment #735555 - Attachment is obsolete: true
(Assignee)

Comment 24

5 years ago
Great! Thanks for figuring that out.

I pushed the new patch to try
https://tbpl.mozilla.org/?tree=Try&rev=5f2075c4dbd1

It seems to have fixed the problems on mac, but we still have the Windows failures left.

Comment 25

5 years ago
I'm not near my dev environment at the moment, but it looks like that test has
an explicit delay programmed for Linux:

var waitSteps = 0;
function popupShown(event)
{
  var panel = event.target;

  if (waitSteps > 0 && navigator.platform.indexOf("Linux") >= 0 &&
      panel.boxObject.screenY == 210) {
    waitSteps--;
    setTimeout(popupShown, 10, event);
    return;
  }

  currentTest.result(currentTest.testname + " ", panel);
  panel.hidePopup();
}

Perhaps we need to adapt that for windows as well now that the popup visiblity
is deferred.

Do the different platforms flush pending updates at different rates?

Comment 26

5 years ago
Created attachment 736119 [details] [diff] [review]
Patch v3

This is a bit of a shot in the dark since I don't have a windows dev
environment to test this.

It looks like there is explicit code in this test to wait for popup visibility
changes, but only on Linux.  I went looking for some history on this and the
best I could find was bug 552982 comment 45.  It implies that the Linux OS
callback will always be deferred and "requires waiting for the window
manager to reply".

Now that we are deferring all visibility related forced paints, it seems like
it makes sense to relax the test to allow deferred painting for all platforms.
Attachment #735594 - Attachment is obsolete: true
(Assignee)

Comment 27

5 years ago
(In reply to Ben Kelly from comment #26)
> Now that we are deferring all visibility related forced paints, it seems like
> it makes sense to relax the test to allow deferred painting for all
> platforms.

Linux is particularly gross in this case because of how X11 works. It's the best we've been able to do so far. But I don't know if we should give up that easily for Windows. The popupshown event is already dispatched async here http://mxr.mozilla.org/mozilla-central/source/layout/xul/base/src/nsMenuPopupFrame.cpp#472 so if our coords are wrong when it's fired we've already waited a bit.

Comment 28

5 years ago
Hmm, I think I'll need to look when its not 1am to understand.

Do you think it would be worth running a try to see if this has any effect?  Even
if its not the final solution it might give us a data point as to what is going
on in the test.
(Assignee)

Comment 29

5 years ago
Sure, I pushed it to try

https://tbpl.mozilla.org/?tree=Try&rev=790e1ac8fd27

we can see if it ever gets the right position or not with this.
Comment hidden (Treeherder Robot)
Comment hidden (Treeherder Robot)

Updated

5 years ago
Attachment #736119 - Attachment is obsolete: true

Updated

5 years ago
Attachment #735594 - Attachment is obsolete: false

Comment 32

5 years ago
Well, attachment 736119 [details] [diff] [review] change to the test did not help.  Restoring the previous
patch as the best version.  Thanks for humoring me!
Comment hidden (Treeherder Robot)
(Assignee)

Comment 34

5 years ago
That's good news, that means it's not timing dependent which means it should be easier for someone with a Windows setup to debug.

Updated

5 years ago
Blocks: 566746
Comment hidden (Treeherder Robot)

Comment 36

5 years ago
I have a winxp build working in a VM and can now reproduce the failure in
test_panel.xul.  I will take a look at it over the weekend.

Comment 37

5 years ago
FYI, I'm still looking at this.  Slow going since the compile/debug cycle is about 30+ minutes on this VM.  My current idea is that the state mismatch between the view manager's GetVisibility() and the window's IsVisible() is causing an issue.
(Assignee)

Comment 38

5 years ago
Hey Ben you should be able to speed that up by using pymake if you aren't already. And then going into your objdir and then only building in the following folders under there
1) the folder under which you've made changes
2) layout
3) toolkit/library
Let me know if I can be of any help, here, or on email or irc.
(Assignee)

Comment 39

5 years ago
Some more things that might help: panels with titlebars we usually rely on a two part process to get them positioned and sized correctly where we first position it without knowing how big the titlebar will be, then get a size/move notification and then we can get the info we need to place it properly. Perhaps something it going wrong there? Logging the nsXULPopupManager::PopupMoved ad nsXULPopupManager::PopupResized both in the working case and not working case might help there.

Comment 40

5 years ago
Wow, thank you!  That cut my build time down to something much more reasonable.

And the explanation of how popups are displayed is very helpful.  I'll investigate this when I next get time.  Unfortunately this may not be until next week.

Thanks again!
This is blocking the landing of bug 566746, a large project that I'd like to get landed ASAP.

Timothy, can you take this if Ben's not available in the near term?

Updated

5 years ago
Whiteboard: [snappy:p1]

Comment 42

5 years ago
Yea, I won't have time until next week and my windows development environment is not the best.  Please feel free to drive this forward.  I'd be happy to help test.
(Assignee)

Comment 43

5 years ago
I'll debug the Windows failure.

Although that may not be the only thing blocking this from landing: the original patch from bug 562138 had a part 2 that was needed then (haven't looked into if it's still needed), and this patch did get backed out in bug 562138 for causing a regression (haven't looked into if it will still cause that regression).
QA Contact: tnikkel
(Assignee)

Updated

5 years ago
Assignee: nobody → tnikkel
QA Contact: tnikkel
(Assignee)

Comment 44

5 years ago
Created attachment 740431 [details] [diff] [review]
ignore popup move/resize on hidden widgets

This makes the main patch here green on try server.

But after looking into this more I'm thinking that taking the safer approach of just doing the view show from a runnable we dispatch is a better idea. Currently debugging one try server failure with that approach...
(Assignee)

Comment 45

5 years ago
Created attachment 741089 [details] [diff] [review]
run view show off a runnable instead of during reflow

Just setting the visibility on the view turned out to be a little more complicated than it sounds.

I re-used the nsXULPopupShownEvent event that we already dispatch so that the popup-shown event is still dispatched immediately after we call Show on the widget.
Attachment #741089 - Flags: review?(matspal)
(Assignee)

Comment 46

5 years ago
Created attachment 741090 [details] [diff] [review]
fix a test

After we send a mouse event to click on the cancel button the popup is hidden and shown before it turns into the "Would you like to restart the addon download?" popup. This test seems to be assuming that it stays shown and is just modified. So we need to add another layer of wait_for_notification to wait for the popup-shown event to be dispatched.

I'll attach a whitespace diff that will make this patch very clear.
Attachment #741090 - Flags: review?(dtownsend+bugmail)
(Assignee)

Comment 47

5 years ago
Created attachment 741091 [details] [diff] [review]
fix a test (diff -w)
Attachment #741090 - Flags: review?(dtownsend+bugmail) → review+

Comment 48

5 years ago
For what its worth, I can confirm attachment 741089 [details] [diff] [review] fixes the issue for me.  Thanks!
Comment on attachment 741089 [details] [diff] [review]
run view show off a runnable instead of during reflow

I'm a little worried about the latency in using an event.
Could we delay the mDoViewShow part using PostReflowCallback instead?
Comment hidden (Treeherder Robot)
(Assignee)

Comment 51

5 years ago
(In reply to Mats Palmgren [:mats] from comment #49)
> I'm a little worried about the latency in using an event.
> Could we delay the mDoViewShow part using PostReflowCallback instead?

That's a great idea, I don't know why I didn't think of it. It makes things simpler too because we know it will execute before the popupshown event dispatcher runs.
(Assignee)

Comment 52

5 years ago
Created attachment 741478 [details] [diff] [review]
use a reflow callback
Attachment #741089 - Attachment is obsolete: true
Attachment #741089 - Flags: review?(matspal)
Attachment #741478 - Flags: review?(matspal)

Comment 53

5 years ago
Test passes cleanly for me with attachment 741478 [details] [diff] [review].
Comment on attachment 741478 [details] [diff] [review]
use a reflow callback

>+nsMenuPopupFrame::ReflowFinished()
>+{
>+  mReflowCallbackPosted = false;
>+
>+  nsView* view = GetView();
>+  nsViewManager* viewManager = view->GetViewManager();
>+  if (view && viewManager) {

No need to null-check those.


>+    nsWeakFrame weak(this);
>+    viewManager->SetViewVisibility(view, nsViewVisibility_kShow);
>+    if (!weak.IsAlive()) {
>+      return false;
>+    }

I don't think SetViewVisibility() can destroy the frame.  If it can
then this code isn't safe.  AFAICT, this bug is only about a fatal
assertion in debug builds, am I missing something?

Maybe we should wrap the loop in PresShell::HandlePostedReflowCallbacks()
in a nsAutoScriptBlocker scope to prevent that from happening?
It would be a good safety precaution in general I think...


>+    mPopupState = ePopupOpenAndVisible;
>+    view = GetView();
>+    if (view) {

No need to null-check.


>   nsView* view = GetView();
>   nsViewManager* viewManager = view->GetViewManager();
>+  if (mReflowCallbackPosted) {
>+    PresContext()->PresShell()->CancelReflowCallback(this);
>+    mReflowCallbackPosted = false;
>+  }
>   viewManager->SetViewVisibility(view, nsViewVisibility_kHide);

Nit: move the added lines before the declaration of 'view' perhaps?

Looks good otherwise.
Examining the stack in comment 13 a bit more, there's:
PresShell::ProcessReflowCommands
  PresShell::DoReflow
   ...
    nsMenuPopupFrame::LayoutPopup
      nsViewManager::SetViewVisibility
       ...
        [ChildView viewWillDraw]
          nsViewManager::WillPaintWindow
            nsViewManager::ProcessPendingUpdatesForView

and ProcessPendingUpdatesForView has a nsAutoScriptBlocker...

The outermost ProcessReflowCommands runs DoReflow inside a
nsAutoScriptBlocker scope though, so all of the above is protected
by that.  Moving the SetViewVisibility call to a reflow callback
which isn't run with such protection seems like it might mess up
other callbacks that follow.
(Assignee)

Comment 56

5 years ago
(In reply to Mats Palmgren [:mats] from comment #54)
> I don't think SetViewVisibility() can destroy the frame.  If it can
> then this code isn't safe.  AFAICT, this bug is only about a fatal
> assertion in debug builds, am I missing something?

The stack in an attachment in this bug shows how SetViewVisibility can enter painting, and painting can flush via PresShell::WillPaint.
I think we need a couple of changes for this to be safe as a reflow
callback.  First, create a helper class to use as the callback that
has a nsWeakFrame member for the MenuPopupFrame.  Second, wrap the
loop in HandlePostedReflowCallbacks in a nsAutoScriptBlocker scope.

If that doesn't work for some reason then I think we should take
the nsPopupShowEvent solution.
(Assignee)

Comment 58

5 years ago
(In reply to Mats Palmgren [:mats] from comment #57)
> If that doesn't work for some reason then I think we should take
> the nsPopupShowEvent solution.

I'm actually thinking that the first patch, to do view visibility changes async through pending updates on the view manager (called "Updated patch"), is now the most desirable approach.
OK, I'll review that instead.
(sorry for wasting your time with the callback suggestion)
Yeah, I like that approach better too.  Feel free to ask me for review when you
and Ben have fixed the regressions.  One thing -- in the added code at the end of
DoResetWidgetBounds, can 'this' be dead there? if so, using mWindow isn't safe.
You should probably use a strong ref on the stack for the widget.

Updated

5 years ago
Attachment #741478 - Flags: review?(matspal)
(Assignee)

Comment 61

5 years ago
Comment on attachment 740431 [details] [diff] [review]
ignore popup move/resize on hidden widgets

We'll need this anyway to land the above patch. I'd like to land this patch separately since it can land now.

In nsMenuPopupFrame::LayoutPopup we resize the view and then set the view visibility to show and then set the menu popup state to ePopupOpenAndVisible.

Before the above patch the resize would be queued and not flushed to the widget. But when we set the view visibility to show it would flush the resize, and then call Show(true) on the widget. The OS sends us a resize notification that gets handled by nsXULPopupManager::PopupResized/Moved, it is ignored because the menu popup state is not ePopupOpenAndVisible. And then set the menu popup state to ePopupOpenAndVisible.

After the above patch the resize is queued and not flushed, and the view show is queued and not flushed. And then set the menu popup state to ePopupOpenAndVisible. Then when the resize and show is flushed to the widget the OS sends us back a resize notification (we don't care about it as long as it matches what we asked for). Then when we get to nsXULPopupManager::PopupResized/Moved we don't bail out because the menu popup state is ePopupOpenAndVisible now.

This whole thing is a little bit fragile. We do need a better way to tell resize notifications that come from our own actions and are what we expect to any other type, but we don't have that currently.
Attachment #740431 - Flags: review?(matspal)

Updated

5 years ago
Attachment #740431 - Flags: review?(matspal) → review+
(Assignee)

Comment 62

5 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/e116c3a34764
Whiteboard: [snappy:p1] → [snappy:p1][leave open]
(Assignee)

Comment 63

5 years ago
(In reply to Mats Palmgren [:mats] from comment #60)
> One thing -- in the added code at the end of DoResetWidgetBounds,
> can 'this' be dead there? if so, using mWindow isn't safe.
> You should probably use a strong ref on the stack for the widget.

So if we make the assumption that
1) calling widget->Show(true) when the view was visibility hidden
2) calling Resize/Move on a widget that is current visible
are the only things that do anything unsafe (ie painting/flushing) and that Show, Reize, Move don't need someone to be holding a ref to their widget then I think the code is ok.

The mWindow member on the view holds a ref, so that should keep it alive as long as it's not null.

If the visibility is not changing in DoResetWidgetBounds (ie we make no Show calls) then only the Move/Resize calls can be unsafe. But we don't use anything but local variables after those calls, so we should be fine.

If we are hiding the window we assumed Show(false) is safe, and the Move/Resize calls are happening on a hidden widget, so should be safe.

If we are showing the window the Move/Resize calls should be safe because they happen while the widget is hidden.

But I agree that this is complicated and fragile, and we have to make those assumptions, so we can add some precautions
Yeah, sounds very fragile to me.
I think we should use a local strong ref variable instead of mWindow.
https://hg.mozilla.org/mozilla-central/rev/e116c3a34764
(Assignee)

Comment 66

5 years ago
Created attachment 743479 [details] [diff] [review]
use view subsystem pending updates to handle show/hides

So I held the widget in a local and used that, but that doesn't make us safe from this getting destroyed still: mViewManager, mVis, mDimBounds, CalcWidgetBounds are all used after we hide the widget but before we resize/move or show it. I can make it completely safe if you'd prefer.

The widget show/hide will still happen before we hit the event loop after the reflow: the reason being that FlushPendingNotifications calls ProcessPendingUpdates after reflowing. ResizeReflowIgnoreOverride can still reflow without calling ProcessPendingUpdates after (perhaps it should?) but resizing the window and opening a popup should be very rare cases.
Attachment #735594 - Attachment is obsolete: true
Attachment #743479 - Flags: review?(matspal)
(Assignee)

Comment 67

5 years ago
I went through the other issues that might need to be resolved before landing this.

The part 2 patch in bug 562138 shouldn't needed to fix this, I think it should only be needed to fix bug 562138 (a white flash when showing popup windows). We can of course try landing it later.

The regressions caused by the original landing of bug 562138. There were two types: a crash, and popup window sizing/positioning issues.

The crash had no surviving stack (only a link to crash-stats that is now dead) and no one debugged it, so it's hard to tell either way. We have fixed a lot of painting related crashes in view code recently, so I hopeful this won't show up again.

The popup widget sizing/positioning issues. The other patch that landed in this bug fixed one such issue. In addition I've landed patches that fixed similar sounding issues caused by other changes since then. So I'm also hopeful these won't be a problem again this time.
> I can make it completely safe if you'd prefer.

Yes please.
(Assignee)

Comment 69

5 years ago
Created attachment 743934 [details] [diff] [review]
use view subsystem pending updates to handle show/hides
Attachment #743479 - Attachment is obsolete: true
Attachment #743479 - Flags: review?(matspal)
Attachment #743934 - Flags: review?(matspal)
Created attachment 744080 [details] [diff] [review]
addendum?

Might be simpler to just calculate the bool up front (and re-use it)
instead of stashing mDimBounds / mVis.
Comment on attachment 743934 [details] [diff] [review]
use view subsystem pending updates to handle show/hides

Looks good, r=mats (with/without the suggested addendum)
Attachment #743934 - Flags: review?(matspal) → review+
(Assignee)

Comment 72

5 years ago
Comment on attachment 744080 [details] [diff] [review]
addendum?

Good idea, thanks! I'll include this.
Attachment #744080 - Flags: review+
(Assignee)

Comment 73

5 years ago
I separated out all the new code I was writing from Markus' original patch and added Mats addendum.

https://hg.mozilla.org/integration/mozilla-inbound/rev/3ff6533160bd
https://hg.mozilla.org/integration/mozilla-inbound/rev/6151db377f70
https://hg.mozilla.org/integration/mozilla-inbound/rev/c0f3313e7759
(Assignee)

Updated

5 years ago
Whiteboard: [snappy:p1][leave open] → [snappy:p1]
(Assignee)

Updated

5 years ago
Attachment #741478 - Attachment is obsolete: true
https://hg.mozilla.org/mozilla-central/rev/3ff6533160bd
https://hg.mozilla.org/mozilla-central/rev/6151db377f70
https://hg.mozilla.org/mozilla-central/rev/c0f3313e7759
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Thanks for fixing this one, Timothy (and for the assistance, Mats).
Comment hidden (Treeherder Robot)
(Assignee)

Updated

4 years ago
Depends on: 869151
(Assignee)

Updated

4 years ago
Blocks: 264231
(Assignee)

Updated

a year ago
Depends on: 1172302
You need to log in before you can comment on or make changes to this bug.