Closed Bug 891424 Opened 7 years ago Closed 7 years ago

Modifying XUL panel while open causes the browser to hide

Categories

(Core :: Widget: Cocoa, defect, critical)

x86
macOS
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla25
Tracking Status
firefox23 + verified
firefox24 + verified
firefox25 + verified

People

(Reporter: mkaply, Assigned: tnikkel)

References

Details

(Keywords: regression)

Attachments

(2 files, 1 obsolete file)

After the landing of bug 869151, our add-on popup panel doesn't work properly.

As the panel gets changed by typing in the autocomplete, the browser suddenly goes away although the panel still is displayed.

https://addons.mozilla.org/en-US/firefox/addon/webde-mailcheck/
Blocks: 869151
To avoid duplicating work, this has been regressed down to a single day's changesets or specifically to bug 869151?
Here's a video of the problem.

www.youtube.com/watch?v=ZhXNz81-Q1g

Single day's changesets. It's between:

2013-05-09-03-10-47-mozilla-central
and
2013-05-10-04-16-06-mozilla-central

I'll go through the chnagesets and see if there is anything else.
I'd bet this is related to bug 886317 on the UX branch.
I have verified that when I back out the patch from 869151, I go back to the "can't close xul popups" behavior, and I don't get any of the disappearing browser problem on typing.
Attached patch patch (obsolete) — Splinter Review
Awesome, thank you Michael. This patch will probably fix it. It backs out bug 869151 and fixes it in a different, safer, way. It would be great if you could confirm that.
Assignee: nobody → tnikkel
Attachment #772806 - Flags: review?(smichaud)
I can confirm that the patch fixes the original problem (unable to close the XUL panel) and doesn't cause the new problem I was seeing.
Great, thank you.
No, thank you for the patch.

I'm really curious how this was able to go undetected for so long though...
Me too. I specifically monitored new bug reports after the original patch landed to try to catch any possible regressions as soon as possible.
Comment on attachment 772806 [details] [diff] [review]
patch

Looks fine to me.

+  // nsBaseWidget::Destroy get removes us from GetParent(), but we don't
+  // implement GetParent so we remove from our parent here.

This comment is a bit garbled, though.  How about something like this:

"nsBaseWidget::Destroy() calls GetParent()->RemoveChild(this).  But we don't implement GetParent(), so we need to do the equivalent here."
Attachment #772806 - Flags: review?(smichaud) → review+
Thanks, that a much better comment. I wrote it quickly and forgot to improve it later.
It's utterly bizarre that implementing nsCocoaWindow::GetParent() caused/triggered this bug.

I expect the real bug is elsewhere, in code that assumes nsCocoaWindow objects can't have parents.  I'm not particularly eager to spend time looking for it, though :-(
Bug 869151, comment 22 provides a good starting point for why this would make a difference like this. Either way we need a patch suitable for beta and further deeper fixes are likely not a good idea for that. But this patch certainly doesn't stop us from doing that more involved work if we decide we want to.
Comment on attachment 772806 [details] [diff] [review]
patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 869151
User impact if declined: panels (things that popup over the main firefox window) will be very problematic in some cases on os x
Testing completed (on m-c, etc.): just landed
Risk to taking this patch (and alternatives if risky): this patch backs out bug 869151 and fixes bug 869151 in a safer way. I think that the testing it will get before release will be sufficient to expose any unexpected issue it might cause.
String or IDL/UUID changes made by this patch: none
Attachment #772806 - Flags: approval-mozilla-beta?
Attachment #772806 - Flags: approval-mozilla-aurora?
nsBaseWidget::RemoveChild checks that aChild->GetParent() == this, which is of course false. The function works perfectly fine in this case though.
aChild->GetParent() returns NULL, since nsCocoaWindow doesn't implement/override GetParent().  Maybe we should just change the assertion as follows:

nsIWidget* parent = aChild->GetParent();
NS_ASSERTION(!parent || parent == this, "Not one of our kids!");
Attached patch patch v2Splinter Review
Yeah, that seems like the most reasonable thing to do.
Attachment #772806 - Attachment is obsolete: true
Attachment #772806 - Flags: approval-mozilla-beta?
Attachment #772806 - Flags: approval-mozilla-aurora?
Attachment #772948 - Flags: review?(smichaud)
Comment on attachment 772948 [details] [diff] [review]
patch v2

You still need to fix that comment in nsCocoaWindow::Destroy :-)
Attachment #772948 - Flags: review?(smichaud) → review+
I actually did fix it before landing, that patch was just outdated. I'll make sure the correct comment gets landed.
I think I'm going to make the assertion hack in nsBaseWidget::RemoveChild ifdef'd to mac only so we limit the hackiness to the one platform that needs it.
Comment on attachment 772948 [details] [diff] [review]
patch v2

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 869151
User impact if declined: panels (things that popup over the main firefox window) will be very problematic in some cases on os x
Testing completed (on m-c, etc.): just landed
Risk to taking this patch (and alternatives if risky): this patch backs out bug 869151 and fixes bug 869151 in a safer way. I think that the testing it will get before release will be sufficient to expose any unexpected issue it might cause.
String or IDL/UUID changes made by this patch: none
Attachment #772948 - Flags: approval-mozilla-beta?
Attachment #772948 - Flags: approval-mozilla-aurora?
(In reply to comment #22 and comment #23)

> I actually did fix it before landing, that patch was just
> outdated. I'll make sure the correct comment gets landed.

OK.

> I think I'm going to make the assertion hack in
> nsBaseWidget::RemoveChild ifdef'd to mac only so we limit the
> hackiness to the one platform that needs it.

Good idea.
https://hg.mozilla.org/mozilla-central/rev/53104f8608a2
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Attachment #772948 - Flags: approval-mozilla-beta?
Attachment #772948 - Flags: approval-mozilla-beta+
Attachment #772948 - Flags: approval-mozilla-aurora?
Attachment #772948 - Flags: approval-mozilla-aurora+
Keywords: verifyme
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0

This issue is still reproducing on FF 23 beta 5 (build ID: 20130711122148).
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
QA Contact: cornel.ionce
I just built a nightly from beta-src and it's working for me...
Cornel, do you think you could double check your results given comment 30?
Flags: needinfo?(cornel.ionce)
Keywords: qawanted
I've checked again on beta and also on latest Nightly (build ID: 20130714030202) and issue is still reproducing.
Flags: needinfo?(cornel.ionce)
Keywords: qawanted
Are you using the MailCheck add-on to test?
Yes, using the link provided in description
Cornel, perhaps you could try the 2013-05-09 and 2013-05-10 nightlies to see if that was when the issue was introduced for you?
Flags: needinfo?(cornel.ionce)
The issue started to reproduce in 2013-05-10 build, it WFM in 2013-05-09 nightly.
Flags: needinfo?(cornel.ionce)
Tim, I can reproduce the problem on the official beta and official aurora builds.  I cannot reproduce it on my local beta build, with or without your change.
By the way, in my case, the STR is:
1. Install the above add-on
2. Start the browser
3. Type some text into the "Enter search term" in the add-on
4. Backspace to remove all the text.

As the last of the text is gone, the whole window (all tabs) disappears.
(In reply to Milan Sreckovic [:milan] from comment #38)
> Tim, I can reproduce the problem on the official beta and official aurora
> builds.  I cannot reproduce it on my local beta build, with or without your
> change.

Ah, very interesting. Something in the official builds vs your (and Mike's) local builds might be causing the difference we are seeing then. Could be debug vs opt maybe, or the version of the SDK being used?

Also, thanks for the STR.
For my local build "empty" .mozconfig file, so I imagine it isn't debug, but it would have other differences.  Let me know what info you need about my setup.
Hmm, that seems to match the mozconfigs we use from the tree here http://mxr.mozilla.org/mozilla-central/source/browser/config/mozconfigs/

It looks like we are using the 10.6 sdk for official builds from here http://mxr.mozilla.org/mozilla-central/source/build/macosx/universal/mozconfig.common#15 I bet most people building locally are using newer sdks than that, so that could be the reason. I don't know what other differences there are.
Seems like we have the 10.7 sdk installed on our build slaves as a 10.7sdk-using build seems to have completed fine (not so for a 10.8 try) and it's at http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tnikkel@gmail.com-8ac5f8518cc7/try-macosx64/
Hmm, I though I was using 10.6 (I have it installed, together with 10.7 and 10.8), but is there a way to tell from the build?  Which of the two builds form comment 37 has the fix?
(In reply to Timothy Nikkel (:tn) from comment #37)
> Could you try these builds please to help me track down what is going on?
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tnikkel@gmail.com-
> f52d3d2d27fc/try-macosx64/

Works for me with this build.
(In reply to Timothy Nikkel (:tn) from comment #43)
> Seems like we have the 10.7 sdk installed on our build slaves as a
> 10.7sdk-using build seems to have completed fine (not so for a 10.8 try) and
> it's at
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tnikkel@gmail.com-
> 8ac5f8518cc7/try-macosx64/

Works for me with this build as well.
(In reply to Milan Sreckovic [:milan] from comment #46)
> (In reply to Timothy Nikkel (:tn) from comment #43)
> > Seems like we have the 10.7 sdk installed on our build slaves as a
> > 10.7sdk-using build seems to have completed fine (not so for a 10.8 try) and
> > it's at
> > http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tnikkel@gmail.com-
> > 8ac5f8518cc7/try-macosx64/
> 
> Works for me with this build as well.

Since that only thing different in that build from trunk is using the 10.7 sdk that means the sdk version is definitely at play here. Steven, do you know of any differences between the 10.6 and 10.7 sdk that might be "fixing" this issue?
Flags: needinfo?(smichaud)
(In reply to Milan Sreckovic [:milan] from comment #45)
> (In reply to Timothy Nikkel (:tn) from comment #37)
> > Could you try these builds please to help me track down what is going on?
> > http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tnikkel@gmail.com-
> > f52d3d2d27fc/try-macosx64/
> 
> Works for me with this build.

The build removes the RemoveChild call:
https://hg.mozilla.org/try/rev/5e5fc7daf5dd
Which SDK you use also makes a difference with bug 886317 -- that bug happens when linking to the 10.6 SDK, but not when linking to the 10.7 or 10.8 SDK.  But I have no idea *why* it makes a difference, and it would probably take a lot of time (and reverse engineering) to find out.
Flags: needinfo?(smichaud)
Keywords: verifyme
We don't have a lot of time to go digging here, so this is a patch that I'm going to propose. I still need to verify that it doesn't regress bug 869151, but I think it shouldn't.
Attachment #778284 - Flags: review?(smichaud)
(In reply to Timothy Nikkel (:tn) from comment #37)
> Could you try these builds please to help me track down what is going on?
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tnikkel@gmail.com-
> f52d3d2d27fc/try-macosx64/
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tnikkel@gmail.com-
> a3c4f8df9af9/try-macosx64/
> Thanks.

I could not reproduce this issue on these builds but for the first build, the list with suggestions won't disappear even if I delete all the text from the searchbox.
The second build works fine for me.
Flags: needinfo?(cornel.ionce)
(In reply to Timothy Nikkel (:tn) from comment #43)
> Seems like we have the 10.7 sdk installed on our build slaves as a
> 10.7sdk-using build seems to have completed fine (not so for a 10.8 try) and
> it's at
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/tnikkel@gmail.com-
> 8ac5f8518cc7/try-macosx64/

Also, this one works perfectly.
Verified on FF 23 beta 7 (build ID: 20130718163513) and reproducing.

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:23.0) Gecko/20100101 Firefox/23.0
Keywords: verifyme
Comment on attachment 778284 [details] [diff] [review]
Call Show(false) in Destroy instead of removing from the sibling list

I don't like this.  If we don't call mParent->RemoveChild(this), the parent will be left with a dangling pointer to the child.  And yes, this has caused crashes in the past.

Why not just add the call to Show(false) and leave the call to mParent->RemoveChild(this)?
> And yes, this has caused crashes in the past.

Actually, I'm not sure that's true in this case.  But I know I've seen crashes in other cases of a parent object having dangling pointers to child objects.
(In reply to Steven Michaud from comment #54)
> I don't like this.  If we don't call mParent->RemoveChild(this), the parent
> will be left with a dangling pointer to the child.  And yes, this has caused
> crashes in the past.

Yeah, I don't like it either. :(

I don't think it will cause dangling pointers though, the child and sibling pointers are strong (http://mxr.mozilla.org/mozilla-central/source/widget/nsIWidget.h#1756) so we will just leak these windows, which is what we were doing all along before bug 869151 landed.

> Why not just add the call to Show(false) and leave the call to
> mParent->RemoveChild(this)?

It seemed like the RemoveChild call was causing the problem, and if we call RemoveChild then the window will get destroyed and won't remain on screen, so the Show(false) call seems unnecessary. But maybe there is some weird quirk in the 10.6 sdk so I've pushed a patch to try to get a build that does this to see if it has the problem or not.
Comment on attachment 778284 [details] [diff] [review]
Call Show(false) in Destroy instead of removing from the sibling list

> I don't think it will cause dangling pointers though, the child and
> sibling pointers are strong
> (http://mxr.mozilla.org/mozilla-central/source/widget/nsIWidget.h#1756)
> so we will just leak these windows, which is what we were doing all
> along before bug 869151 landed.

OK, your points taken.

Maybe I'll eventually have time to reverse engineer the difference
between the 10.6 SDK and later SDKs, and get at the true cause.  But
in the meantime it makes sense to try to hack around it.

If we keep having problems, that may force my hand.  But reverse
engineering projects can take a long time -- a week or more.  So I'm
reluctant to start on another one without a compelling reason.
Attachment #778284 - Flags: review?(smichaud) → review+
(In reply to Timothy Nikkel (:tn) from comment #56)
> > Why not just add the call to Show(false) and leave the call to
> > mParent->RemoveChild(this)?
> 
> It seemed like the RemoveChild call was causing the problem, and if we call
> RemoveChild then the window will get destroyed and won't remain on screen,
> so the Show(false) call seems unnecessary. But maybe there is some weird
> quirk in the 10.6 sdk so I've pushed a patch to try to get a build that does
> this to see if it has the problem or not.

Just adding the Show(false) fixes this bug and doesn't seem to regress anything else in some quick basic testing. That actually makes sense because the original bug (bug 869151) happened because we destroyed the window before we could call Show(false) on it, and that seems like something that maybe an SDK update might fix. Removing the RemoveChild call still seems safer since that has never been in a release build and it means we actually deallocate window objects where we used to just leak them before. So I think I'll remove the RemoveChild call on beta (and maybe aurora?) but leave it in nightly because it is a desirable change to stop leaking windows so that it can get more testing.
I pushed the patch with the removal of RemoveChild to inbound so that that version of the patch gets some testing before uplift. I will file a followup at add it back in, this will also help for regression hunting later if needed (knock on wood).
Blocks: 896053
(In reply to Timothy Nikkel (:tn) from comment #60)
> I pushed the patch with the removal of RemoveChild to inbound so that that
> version of the patch gets some testing before uplift. I will file a followup
> at add it back in, this will also help for regression hunting later if
> needed (knock on wood).

Filed bug 896053 for this.
Comment on attachment 778284 [details] [diff] [review]
Call Show(false) in Destroy instead of removing from the sibling list

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 869151
User impact if declined: panels (things that popup over the main firefox window) will be very problematic in some cases on os x
Testing completed (on m-c, etc.): just landed
Risk to taking this patch (and alternatives if risky): I don't think we want to ship the panel's issue. That leaves taking this patch or backing out the bug that started all this, bug 851641. Bug 851641 was needed for bug 566746, I can't comment on the feasibility of disabling or backing out bug 566746, but it doesn't look simple.
String or IDL/UUID changes made by this patch: none
Attachment #778284 - Flags: approval-mozilla-beta?
Attachment #778284 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/a8051c99d7c3
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Attachment #778284 - Flags: approval-mozilla-beta?
Attachment #778284 - Flags: approval-mozilla-beta+
Attachment #778284 - Flags: approval-mozilla-aurora?
Attachment #778284 - Flags: approval-mozilla-aurora+
This works for me in officially built nightly 2013-07-23.
(In reply to Milan Sreckovic [:milan] from comment #65)
> This works for me in officially built nightly 2013-07-23.

Thanks Milan, would you mind testing the latest Aurora and Beta builds to see if this is fixed for you?
Status: RESOLVED → VERIFIED
I still see the problem in Aurora (2013-07-23, 24.0a2)
(In reply to Milan Sreckovic [:milan] from comment #67)
> I still see the problem in Aurora (2013-07-23, 24.0a2)

Timothy, did this land in time for today's Aurora builds?
Flags: needinfo?(tnikkel)
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #68)
> (In reply to Milan Sreckovic [:milan] from comment #67)
> > I still see the problem in Aurora (2013-07-23, 24.0a2)
> 
> Timothy, did this land in time for today's Aurora builds?

No, it did not. So it will be in tomorrow's builds.
Flags: needinfo?(tnikkel)
I'll check aurora and beta tomorrow.
It works for me with 2013-07-24 Aurora and Beta.
You need to log in before you can comment on or make changes to this bug.