Closed Bug 139294 Opened 22 years ago Closed 22 years ago

Switching from a tabbed mozembed to another causes the first to be hidden and never shown again

Categories

(Core Graveyard :: Embedding: GTK Widget, defect)

x86
Linux
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mozillabugs.philipl, Assigned: blizzard)

References

()

Details

(Keywords: topembed)

Attachments

(1 file, 4 obsolete files)

This has occured for me in both galeon and TestGtkEmbedNotebook.

Note: It is not related to 137685 in so far as the patch attached to that bug 
does not fix this problem. However, I first observed this bug at the same time
as 137685.

Steps:

1) Start TestGtkEmbedNotebook with a url argument, eg: http://www.mozilla.org.
2) Switch from the message box tab to the browser tab.
3) Observe the web page load up.
4) Switch back to the message box tab.
5) Switch back to the mozembed tab.
6) Observe the lack of web page. Either black if 137685 patch unapplied or theme
background colour if applied (usually grey).

Expected behaviour: Still see the web page that was there before.

Note: If a tab is created 'behind' the current one and a url loaded into it, 
it will load fine so that if you switch to the tab you will see the web page.
However, as soon as you switch away from the tab, it will die as described.

In testing in galeon, the mozembed appears to be totally dead. You cannot reload
it's contents or attempt to load a new page. It does not respond at all.

This renders tabbed browsing impossible.
Blocks: 98995
After doing some timewarping, I've identifed the commit which caused this bug.
It's a big commit, so I'm still trying to pick through it and see what needs
to be done.

Here is the bonsai query for the commit:

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2002-04-16+21%3A16&maxdate=2002-04-16+21%3A30&cvsroot=%2Fcvsroot

which was related to bug 52334 which is a rather involved one.

But it is related to redraw and visibility so I can see how it would cause this
bug.
Ok, after due poking around, I can avoid this bug by diabling
|EmbedPrivate::Hide| but that simply calls |nsIBaseWindow::SetVisibility| with
PR_FALSE. nsIBaseWindow is implemented by
embedding/browser/webBrowser/nsWebBrowser which I haven't fully worked out yet.

Right now it seems that the visibility setting mechanism is asymetric and once
set to hidden, can't be set back to visible.

So, this is not a bug with the gtk embedding widget, but I'm not sure who's in
change of embedding/browswer/webBrowser/nsWebBrowser
Summary: Switching from a tabbed mozembed to another causes the first to die. → Switching from a tabbed mozembed to another causes the first to be hidden and never shown again
Well, after more poking around, I've isolated the bug as being caused by the
changes to |DocumentViewerImpl::Show| and |DocumentViewerImpl::Hide| in
content/base/src/nsDocumentViewer.cpp in the commit.

The old versions of |Show| and |Hide| are pretty simple; the new ones do all
sorts of initialisation and cleanup. 

From the cursory testing using TestGtkEmbedNotebook modified to dump a few
messages, it appears that neither implementation of |Show| and |Hide| is working
correctly. The old code seems to ignore attempts to get visibility to FALSE and
the new code ignores attempts to set to TRUE, although the initial visibility is
TRUE, the first attempt to change visibility to any value sets it to FALSE,
and there is remains.

Mozilla-the-browser is unaffected by this bug because it never calls |Hide|
when a tab is sent to the back, rather it calls |Show| when a tab is brought to
the front. Perhaps this bug is caused by a miscommunication as to the true
meaning of |nsIBaseWindow::Show| and |nsIBaseWindow::Hide| The new
nsDocumentViewerImpl code for |Show| and |Hide| looks more like initialisation
and shutdown code than visibility control to me, and the fact that
mozilla-the-browser doesn't call |Hide| when visibility is lost is consistent
with this. That would suggest that the correct fix to stop calling
|nsIBaseWindow::Hide| in EmbedPrivate. 

As such, we can summarise thus:

|nsDocumentViewerImpl::Show| fails if called after |nsDocumentViewerImpl::Hide|.

The bug is either a) |Show| should not fail. or b) |Hide| should not be called
unless the window is being destroyed.

If this bug is b), I can whipe a patch for EmbedPrivate up very quickly. a) is a
different matter. :-)

I hope you guys are reading this; This bug is topembed as far as galeon is
concerned!
Sorry,

mixed up some interfaces, this has been a long day. :-)

References to |nsIBaseWindow::Show| and |nsIBaseWindow::Hide| are of course
wrong. These methods don't exist. Rather I mean |nsIBaseWindow::SetVisibility|
which in turn, eventually, lead to calls to |nsIContentViewer::Show| and
|nsIContentViewer::Hide|.
Seems like there probably is a bug in DocumentViewerImpl::Hide() and ::Show(),
it's unclear what those methods really mean now, and that caused bugs that showd
up in mozilla as well (similar to this one). The problem in the other case was
IIRC that the parent widget on the nsIBaseWindow (docshell) was set to null when
the document viewer was hidden, and in order to show a document viewer, it needs
the base window to have a parent widget (note that parent widget here doesn't
really mean *parent* widget, it really means just widget). The reason for all
the teardown/initialization code in ::Hide() and ::Show() is that that code is
called when iframes on webpages are hidden/shown, in the case where an iframe is
hidden we want to throw away the whole presentation and the widget for that
iframe is then also destroyed since the layout frame and view (which holds the
widget) is torn down as well. When an iframe is shown again, we create a new
widget and a completely new presentation for the content of that iframe.

There is a fix in place for this already though, document viewers can now be
marked as "sticky", that way you can safely hide them w/o them completely
tearing down the presentation, and it sounds like that's exactly what you want
here. To do this, call nsIContentViewer::SetSticky(PR_TRUE) on the content
viewer (document viewer) when it's created in a case where it might be used in a
scenario like the one described here.
Johnny,

Thanks! Depending on what Chris is inclined towards, we can either stop
|Hide|ing windows, or use |Sticky|. I'll put together a patch using |Sticky| today.
Heh,

Well, I don't think I can currently make a call to |nsIContentViewer::SetSticky|
from gtkmozembed. nsWebBrowser owns the docshell but there is no way for
gtkmozembed to get a reference to the docshell or an interface that can be
|do_QueryInterface|ed to docshell. Either nsWebBrowser needs to provide a
|GetDocShell| or a |SetSticky| that propagates down, both of which would require
changes to interfaces. Alternately, nsWebBrowser can always set sticky to be
true. I don't know whether this would be appropriate or not.
Blocks: 143200
No longer blocks: 143200
Ping for updates:

At the very least, someone could confirm this bug according to the original test
case.

Also, we need to decide what needs fixing: Do we stop gtkmozembed calling
|nsIBaseWindow::SetVisibility(PR_FALSE)| when a tab is de-focussed, which is what
mozilla-the-browser seems to do, or do we change interfaces so that
|nsIConentViewer::SetSticky| is successively exposed up to nsIBaseWindow so that
it can call it, hopefully allowing Hide to work.

I'm more in favour of the first fix. It's a minimally invasive one line job
and there's an existing precedent for doing so.

This bug will not go away if ignored long enough. 1.1 will branch in due course
and then galeon is screwed.
*sigh* ping.

1.1alpha. Total b0rkage, etc etc. I really don't want to deal with all the
complaints that will appear if 1.1 goes out in this state. 1.1alpha is going
to be bad enough as it is.

Attached patch The one line fix. (obsolete) — Splinter Review
More, it would seem, for galeon users who come here in search of help than
anything else...
Attached patch The one line fix. (obsolete) — Splinter Review
More, it would seem, for galeon users who come here in search of help than
anything else...
Hmm. damn bugzilla.
Umm, this, in fact, didn't fix the problem at all. :-(
Have we misdiagnosed the cause of the bug?
bz, here's the galeon related bug, please have a look today if you have time, if
not we'll look at it tomorrow...
*ping* Time is an abstract concept.
Keywords: patch, topembed
Doh.  This fell through the cracks...

I just spent a while trying to set up an env in which I can usefully debug
this....  I have a RedHat 7.3 system, and I'd rather not try installing all of
RawHide on top of it.  So given that, what's the simplest setup in which I could
reproduce this with a debug Galeon build and debug Mozilla libs?  Would building
against a debug 1.0 build work (and using current release Galeon)?  Or do I need
to build against trunk?

Also, does this require Galeon or is there some other way I can test this using
only things normally present in a Mozilla tree?
You need a debug trunk build.  I don't think this is present on the branch.  If
you pull the galeon-1-2 branch out of gnome's cvs I think it will work with the
trunk.  Well.  It might, depending on how much of printing is different.
I pulled the "galeon" module from
":pserver:anonymous@anoncvs.gnome.org:/cvs/gnome".  I then tried to build it. 
It needs a few neat things like a newer autoconf (installed that), then a new
gnome-common (installed that), then a libgnomeui-2.0 (at which point I made that
comment, since like I said, I'd rather avoid installing all of RawHide).

Should I be pulling a branch of that module instead of the tip?  If so, what's
the tag?
You can test without using galeon at all using the repro case I included in the
my original description.

if you want to build galeon anyway, you need to check the galeon-1-2 branch out
of cvs to work with mozilla trunk which is where this bug is present.

mozilla 1.0.x branch does not have this bug and the 1.1a+ snapshot release is
aflicted by that helper bug you helped sort out; though you can use it for
testing this bug if really want.

You will need a complete gnome1 library installation and up-to-date auto* tools
to do this.
Doh.  I should read more carefully. :(  TestGtkEmbedNotebook reproduces the
problem just fine.  Thank you very much; I'll start poking at this...
Attached patch Patch that _does_ fix this (obsolete) — Splinter Review
OK... so this is just like jst's patch except it also changes the _other_
constructor of DocumentViewerImpl -- the one used by NS_NewDocumentViewer
(whose bright idea was it to not have the constructors next to each other in
the C++?).

My first instinct is that the _real_ fix here is to eliminate this constructor
and make NS_NewDocumentViewer pass a null prescontext to the sole remaining
constructor... as it is, the "no argument" constructor does not do any of the
initialization the other one does...

I've verified that this fixes the problem in TestGtkEmbedNotebook
Attachment #87361 - Attachment is obsolete: true
Attachment #87362 - Attachment is obsolete: true
Attachment #87377 - Attachment is obsolete: true
Comment on attachment 92075 [details] [diff] [review]
Patch that _does_ fix this

r=philipl

Verified to work in galeon.
Thanks Boris!
Attachment #92075 - Flags: review+
And I have to agree on the silliness of having two constructors; This is what
defaults parameters are for, and as you say, even that's not actually needed.
Comment on attachment 92075 [details] [diff] [review]
Patch that _does_ fix this

 DocumentViewerImpl::DocumentViewerImpl()
+  : mIsSticky(PR_TRUE)

<mefeelsstupid>Duh!</mefeelsstupid>

- In nsFrameFrame.cpp

...
>+    // Mark the content viewer as non-sticky so that the presentation
>+    // can safely go away when this frame is destroyed.
>+
>+    //    content_viewer->SetSticky(PR_FALSE);

You didn't mean to leave this commented out did you?

sr=jst
Attachment #92075 - Flags: superreview+
Let's get an aye-firmative on that "yes, I didn't mean to leave the then-part of
that if statement commented out" and get this in for 1.1beta.

/be
Comment on attachment 92075 [details] [diff] [review]
Patch that _does_ fix this

a=asa (on behalf of drivers) for checkin to 1.1
Attachment #92075 - Flags: approval+
"yes, I didn't mean to leave the then-part of that if statement commented out".  ;)

Patch checked in on the trunk for 1.1.
Did this make the cut for 1.1 beta? Can I mark fixed?
It didn't make beta...:(  It did make 1.1 final, however (when that comes out).

As for marking it fixed, that's your call.. I left it open in case we want to
address the "too many constructors" issue, but that may be best spun off into a
separate bug.
too many constructors spun off to bug 158804.

marking FIXED.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Blocks: 152429
This fix may have caused bug 159155 / bug 152429. Recent nightlies are crashing
with:

Gdk-ERROR **: BadDrawable (invalid Pixmap or Window parameter)
  serial 5504 error_code 9 request_code 144 minor_code 3
Re-opening.

I don't know if it's a corner case or related to the fix for bug 152429 but I'm
seeing this bug again on occasion. I'm finding it very difficult to get a re-pro
but I think it's related to application level visibility. Sometimes I'll switch
back to galeon from another app and the current tab or a hidden one will die.
The latest occurence saw the visible tab and another one both killed at once.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Ok, 

the corner case is plugins. If a page uses a plugin such as java or flash and it
loses visibility then it will be dead when it regains visibility. Interestingly
if the page with the plugin is framed, this does not happen, it's only if the
plugin is part of the main page.

for a quick re-pro, go to http://www.macromedia.com (with flash obviously :-)
Do you mean the page is disabled?  Or just the plugin?
The page is disabled exactly as according to the original bug report. The fix
for bug 152429 is the most likely cause as it disables sticky for plugins; It
would seem that it disables it for the frame that contains the plugin,
unfortunately this frame can be the whole page, effectively leaving us back at
square-one with respect to this bug.

This is going to miss 1.1 isn't it....
Blocks: 1.1
That patch was by serge, reviewed by jst and bzbarsky.  Guys, this regression is
on the 1.1 list.  Please take the time to fix this regression.  Thanks.
Well... what exactly is the deal with plugins?  Do they need to have the viewer
always non-sticky?  Or what?

Here's the situation as I understand it:

1)  Now that bug 159268 is fixed, the patch from bug 152429 can be safely backed
    out without reintroducing the crash described in that bug (I just tested
    that on the testcase there and all the dups).
2)  Backing that patch out fixes TestGtkEmbedNotebook on pages with flash in
    them for me.  So far so good.
3)  One possible problem is that showing and hiding a tab with a plugin in it in
    Galeon will crash in exactly the same way Mozilla was crashing.  I tested
    with TestGtkEmbedNotebook and it seems to work dandy -- no crashes.
4)  I tested plugins in top-level windows in moz after backing out the patch in
    question. Reloading, loading new documents, closing window.  No crashes.

So I think backing out bug 152429 is safe, but

A)  Confirmation from Philip that backing out the fix for bug 152429 fixes the
    problem he's seeing and does not make Galeon crash on hiding and reshowing a
    tab with a plugin in it would be nice.

B)  Confirmation from serge that this is a non-issue for plugins in top-level
    documents would be nice.

would be nice....
I can confirm that backing out bug 152429 has fixed the regression for pages
with plugins while not reintroducing the crash for plugin pages.
OK. Then I say we back it out if Serge is OK with it.
well, I did not dig down the cause of crash in bug 152429, but I can confirm
backing out check in for bug 152429, while patch for bug 159268 is in, does not
regress to crash, so I think backing out is safe.
Attached patch Here we go againSplinter Review
Could we have the usual sprinkling of letters on this, please?
Attachment #92075 - Attachment is obsolete: true
Comment on attachment 94057 [details] [diff] [review]
Here we go again

r = philipl

What fun eh?
Attachment #94057 - Flags: review+
Comment on attachment 94057 [details] [diff] [review]
Here we go again

sr=blizzard
Attachment #94057 - Flags: superreview+
Comment on attachment 94057 [details] [diff] [review]
Here we go again

a=asa (on behalf of drivers) for checkin to the 1.1 branch
Attachment #94057 - Flags: approval+
Checked in, 1.1 branch and trunk.
Status: REOPENED → RESOLVED
Closed: 22 years ago22 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: