Closed Bug 89488 Opened 23 years ago Closed 22 years ago

Profile mgr and java cause hang (was: mozilla catatonic with Java installed)

Categories

(Core Graveyard :: Java: OJI, defect, P2)

x86
Linux
defect

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.0

People

(Reporter: siemsen, Assigned: dbaron)

References

Details

(Keywords: regression, Whiteboard: fixed on branch)

Attachments

(5 files)

I just installed Mozilla 0.9.2 on my Dell Laptop, Pentium III,
SuSE Linux 7.0, 2.4.5 kernel, XFree86 4.0.3, Gnome, sawfish.

Worked fine.  Then I installed the JRE 1.3 plug-in, and got "Installation
successful".  I exited Mozilla.  When I restarted, I got a message that
some part of Mozilla was still running, so I rebooted.  Now when I start
Mozilla, I get the "Select user profile" window as usual, and the

 I am inside the initialize
 Hey : You are in QFA Startup 
(QFA)Talkback loaded Ok.

...and then nothing.  I found the release note about putting a link
in my mozilla0.9.2/plugins directory for libjavaplugin_oji.so.
That seems to apply to a previous version of mozilla, because the
link was already there when I looked.  Note that the Release Notes
say "libjava.oji.so directory", which I think should say
"libjavaplugin_oji.so file".  Anyway I tried
renaming the link to something meaningless, to make mozilla ignore
the Java plugin and start working again, but no luck: it just hangs.

This is quite reproducible: I can't run mozilla at all now.
Is there a command-line argument I can use to produce verbose
debugging to give you more information than this?
This is probably the profile manager issue where you cannot start with an old 
profile or something. I do not know the bug number. CCing QA.
I believe you are beyond profiles here.  I think the console lines before the ones 
quoted here show which profile was passed in, etc.
ccing dbragg for insights on the first message - remove xpicleanup.dat?
I'd need to see the actual text of the "mozilla is running" message.  Was it
more like, "Mozilla needs to shutdown to allow a previous installation to finish"?  

Are you getting absolutely nothing when you start mozilla or are you getting a
message dialog every time?
Seeing this on the commercial builds as well.  If you install the commercial
build with the custom or full install method and include the Java Plugin, this
hangs the browser on initial startup.  Changing severity to critical and adding
nsbranch keyword.  Confirming bug.  The last known set of builds that worked was
the July 02 builds.  Trying to track down what changed between July 02 and July 03.
Severity: normal → critical
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: nsBranch
Shannon, I believe you mentioned that you are seeing this on HP-UX.  Adding you
to Cc.
Is this a plugin issue or an installer issue? What happens if one manually
installs the Java plugin?
Peter,
I installed the plugin from http://home.netscape.com/plugins/jvm.html and the
browser started with no problems (using one single profile).  I see the problem
if I do a clean installation of the commercial and trunk bits on my linux system
(clean meaning I delete my .mozilla directory). If I install today's bits
(2001070510) with the "Recommended" option, the browser starts up fine.  If I choose a
full or custom install and specifically choose to install the JVM plugin, with no
.mozilla directory in my home directory, I can reproduce this hang.
From what the reporter stated, he had multiple profiles in his .mozilla
directory.  After I installed the Java Plugin with multiple profiles in my
.mozilla directory, and attempted to restart the browser I can reproduce the
hang. So the issue happens in both cases...after plugin install (assuming
multiple profiles) and during initial browser install (with no profiles existing
on the system).
see bug 89188, sounds similar
*** Bug 89188 has been marked as a duplicate of this bug. ***
Cancel the comment about the July 02 builds working.  I just checked my build, and
I had installed it without the Java plugin (thus multi-profiles work).  I've been
seeing variations of this hang for a while (see bug 897843, and bugscape bug
5992); hopefully this is narrowed down to the Java plugin installation.
Whiteboard: critical for 0.9.2
reassign : dbragg? 
Assignee: av → dbragg
QA Contact: shrir → gbush
Uh, as far as I can see, no one has verified that this is a Java "installation"
issue and not just a Java problem.  Does anyone know what the Java installer is
putting on the system?  Sounds like the Java installer (a Sun product as far as
I know) is putting things on the system that break Mozilla.
I don't know exactly who wrote the java installer or what it's SUPPOSED to be
doing but install issues are supposed to go to Syd.  reassigning.
Assignee: dbragg → syd
Blocks: 88893
adding bug this blocks
Syd,

See bug 88893 - N6 does not launch when using profile manager - with a FULL setup
type build.  I confirmed this by doing all my regular Profile and Activation tests
with a Recommended setup type.  Then I added Java to the Recommended
installation and was unable to launch, even with profiles created in the first set of
tests.
I can also reproduce the problem with a FULL setup type install, etc.
See bug 5016 - Java installs OK on commercial trunk.
Yeah but that was closed on 5/23.  It seems like this has broken since then. 
Can someone verify it's working on latest commercial branch/trunk?
Found a simple work-around!

The comments about different results depending on the existence of
multiple profiles made me try deleting the "default" profile, leaving
only my own profile.  This fixed the problem.

Thanks again!  I'm a bugzilla newbie, so I don't know the "proper"
disposition of this bug.  I suppose it should have a lower priority,
and be marked "work-around exists".

BTW, when the bug was alive, another "fix" was to run mozilla as root.

Thanks to all for the help!  Now if I can just figure out how to make
mozilla stop using massively ugly fonts...

Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → LATER
This is NOT working on the 2001070604 branch build for either K'Trina or me.
Once I install Java, whether by a FULL setup type install or by RECOMMENDED and
adding Java using the plugin, I cannot launch N6.

Marking blocker - if you want to use Java, you cannot launch with a new or
migrated profile. Not sure about an old profile, but will test.
Severity: critical → blocker
Status: RESOLVED → REOPENED
Keywords: nsbeta1, regression
Resolution: LATER → ---
Lisa,

Wanted you to be aware of this going into PDT today.  
PDT+.  We need to fully understand what's going on here.  Adding a Sun person to
the CC list...
Whiteboard: critical for 0.9.2 → PDT+
Reporter: Can you use plugins other than the Java Plugin, such as the Flash media
player?
Is there any eta that can be added to the status whiteboard?
Whiteboard: PDT+ → PDT+; no eta
Manually installing the flash player files ShockwaveFlash.class and
libflashplayer.so files in the plugins directory works fine using today's branch
builds.  Installed on single profile, and multiprofile environment.  The Java
plugin looks like the only plugin that results in a hang for multi-profiles and
initial install (install w/out .mozilla directory and migrate a 4.x profile).
I did a custom install - deselected Java, selected Flash Player, and profiles acted
as expected, app launched as expected
Looking
Status: REOPENED → ASSIGNED
Adding eng. mgr. of Java Plug-In group; don't know if this is relevant to him,
though.
*** Bug 64351 has been marked as a duplicate of this bug. ***
Attached file files we install
verified failure/success scenarios. Here are the files we install (did a diff of
the directory contents). Can someone verify there is nothing missing?
The second patch indicates that some files are missing; can someone familiar
with the JRE plugin comment? This was a "typical" install followed by a visit to
http://home.netscape.com/plugins/jvm.html, followed by a directory compare
against the installed version that fails.
The bug happens for me if I install with the stub (native) installer and a
profile is already established on the machine. I fail to see the connection between
profiles and the JRE at this point.
are we sure that we have the same version of the JRE on the smartupdate (it is
smartupdate, right) download page as we are currently shipping?  cc'ing jcall
and maier for help figuring out what we have on the download page.
Do we have two different Java versions here? I think we use 1.3.1 now but used
to use 1.3.0_x
Here are stack traces of the various threads when we are "hung"

#0  0x405592c7 in __poll (fds=0x830a1a8, nfds=3, timeout=9)
    at ../sysdeps/unix/sysv/linux/poll.c:63
#1  0x40369485 in g_main_poll () from /usr/lib/libglib-1.2.so.0
#2  0x40368dca in g_main_iterate () from /usr/lib/libglib-1.2.so.0
#3  0x403691cc in g_main_run () from /usr/lib/libglib-1.2.so.0
#4  0x4027fe57 in gtk_main () from /usr/lib/libgtk-1.2.so.0
#5  0x40876f70 in NSGetModule ()
   from /tmp/nsinstallertest/components/libwidget_gtk.so
#6  0x40710a5a in NSGetModule ()
   from /tmp/nsinstallertest/components/libnsappshell.so
#7  0x0804f8bf in main1 ()
#8  0x08050165 in main ()
#9  0x4049eb65 in __libc_start_main (main=0x8050038 <main>, argc=1, 
    ubp_av=0xbffffa44, init=0x804b2d0 <_init>, fini=0x8051e78 <_fini>, 
    rtld_fini=0x4000df24 <_dl_fini>, stack_end=0xbffffa3c)
    at ../sysdeps/generic/libc-start.c:111

#0  0x405592c7 in __poll (fds=0x8103f14, nfds=1, timeout=2000)
    at ../sysdeps/unix/sysv/linux/poll.c:63
#1  0x401d97b0 in __pthread_manager (arg=0xb) at manager.c:148
#2  0x401da29b in __pthread_manager_event (arg=0xb) at manager.c:230

#0  0x405592c7 in __poll (fds=0xbf7ffa9c, nfds=1, timeout=28948)
    at ../sysdeps/unix/sysv/linux/poll.c:63
#1  0x401be684 in PR_Poll () from /tmp/nsinstallertest/libnspr4.so
#2  0x4077af22 in NSGetModule ()
   from /tmp/nsinstallertest/components/libnecko.so
#3  0x401375fa in nsThread::Main () from /tmp/nsinstallertest/libxpcom.so
#4  0x401bf6ee in PR_Select () from /tmp/nsinstallertest/libnspr4.so
#5  0x401d9a4f in pthread_start_thread_event (arg=0xbf7ffc00) at manager.c:274

#0  0x404af585 in __sigsuspend (set=0xbf5ff9b8)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x401dc4b9 in __pthread_wait_for_restart_signal (self=0xbf5ffc00)
    at pthread.c:896
#2  0x401d8a59 in pthread_cond_wait (cond=0x810089c, mutex=0x81217c8)
    at restart.h:34
#3  0x401bb3fe in PR_WaitCondVar () from /tmp/nsinstallertest/libnspr4.so
#4  0x407903b3 in NSGetModule ()
   from /tmp/nsinstallertest/components/libnecko.so
#5  0x4079000b in NSGetModule ()
   from /tmp/nsinstallertest/components/libnecko.so
#6  0x401375fa in nsThread::Main () from /tmp/nsinstallertest/libxpcom.so
#7  0x401bf6ee in PR_Select () from /tmp/nsinstallertest/libnspr4.so
#8  0x401d9a4f in pthread_start_thread_event (arg=0xbf5ffc00) at manager.c:274

#0  0x404af585 in __sigsuspend (set=0xbf3ff990)
    at ../sysdeps/unix/sysv/linux/sigsuspend.c:45
#1  0x401dc4b9 in __pthread_wait_for_restart_signal (self=0xbf3ffc00)
    at pthread.c:896
#2  0x401d8a59 in pthread_cond_wait (cond=0x81040fc, mutex=0x81214a8)
    at restart.h:34
#3  0x401bb3fe in PR_WaitCondVar () from /tmp/nsinstallertest/libnspr4.so
#4  0x401381a5 in nsThreadPool::GetRequest ()
   from /tmp/nsinstallertest/libxpcom.so
#5  0x4013881e in nsThreadPoolRunnable::Run ()
   from /tmp/nsinstallertest/libxpcom.so
#6  0x401375fa in nsThread::Main () from /tmp/nsinstallertest/libxpcom.so
#7  0x401bf6ee in PR_Select () from /tmp/nsinstallertest/libnspr4.so
#8  0x401d9a4f in pthread_start_thread_event (arg=0xbf3ffc00) at manager.c:274

ok, now I can hang with the bits downloaded from netscape.com/plugins

(from the Reporter, in response to earlier question)

Now that I've deleted the "default" profile, leaving only a
single profile for myself, Mozilla starts, and I can use Java
and flash plugins.
Blocks: 89792
It seems that I have the same problem with Win98. Update OS to "all"?
win98 works fine for me with multiple profiles (>1).  I tried a full 
install and also a recommended, then triggering jvm.xpi.  Both worked fine 
without errors.

What's the error you're seeing under win98?
Looks like this was somehow fixed for linux on the trunk.  Linux trunk build
2001071008 shows this bug but on build 2001071021 its fixed.  Unfortunately,
this is still broken on the branch.  
Just a quick guess, but Brian checked in a Mac plugin fix on the tenth that
might have fixed this problem.  the bug ID for that is bug 85231.  Thanks
shannon for pointing this out.  I'm applying his patches to my debug build (last
updated on the ninth) to see if this works.  Brian, any idea if your patch fixed
this problem as well?
Cancel the last comment; Brian's patches didn't do the trick for me.
Sean Su:

Hello!

On 2001-07-09 the bug 64351, which I have opened, was marked as a duplicate of 
this one.

The problem: If I try to enter a site with java-applets (JRE 1.3.0_01 is 
installed through mozilla), Mozilla crashes and talkback appears.

I really don't know if the bugs have the same origin.

Bye,
Daniel
*** Bug 90958 has been marked as a duplicate of this bug. ***
Keywords: smoketest
Do we need to bring any more help on this one?  Who do we want?
Assignee: syd → racham
Status: ASSIGNED → NEW
Component: Plug-ins → Profile Migration
Summary: mozilla catatonic after JRE plugin installation → mozilla catatonic inside Migrate Profile Routine with Java installed
I'm now able to reproduce this easily by simply blowing away my ~/.mozilla
directory. Starting back up after killing the frozen task seemed to make Mozilla
work okay.

The installer may be off the hook because everything works fine as long as we
don't go into profile migration after having installed Java. In fact, I can even
reproduce by doing:
1) Typical install
2) Install Java (either from web or manually create symbolic links)
3) Exit
4) Remove ~/.mozilla
5) Start Netscape and notice the same hang, except I got the Activation window
(still catatonic) to appear for a few moments.

Having said that, I think this bug may belong to whoever owns the "Migrate
Profile Routine", because that's where it seems to freeze and it's the last thing
on the console.
Peter,

I am not sure it is just profile migration.
I can go into profile manager and create a profile and hang also. I can have my 
profile migrated from an earlier test and it will not be usable after I install 
Java.  
Shannond discovered that a checkin on July 10 between 08:00 a.m. and 09:00 p.m.
fixed this problem on the trunk. I verified that yes this is fixed in the trunk
with a single profile (no .mozilla directory) and multiple profiles. If anybody
knows what checkin fixed this problem, please inform the masses.  I've tried
looking on bonsai.mozilla.org to see what checkin it could possibly be, still
haven't found it since there were NUMEROUS checkins during that period.  I'm
currently patching up my debug branch build one checkin at a time which really
isn't producing any significant results...for me anyway.  If anybody knows an
easier way to find out what fixed this bug, please advise.

Here is the URL of all the checkins to the trunk on July 10 between 8 a.m. and 9
p.m.

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=SeaMonkeyAll&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Change+Size&hours=2&date=explicit&mindate=07/10/2001+08:00:00&maxdate=07/10/2001+21:00:00&cvsroot=/cvsroot

Hope this helps us narrow down what fixed this problem.
Summary: mozilla catatonic inside Migrate Profile Routine with Java installed → mozilla catatonic after JRE plugin installation
sweetlou no longer has the 2001-07-10-08-trunk bits (but you can still get the
working 07-10-21-trunk).  If you want a copy of the last bad trunk bits, e-mail
me and I'll let you know where to get 'em on my machine.
Syd and I are seeing the same thing, trunk works, branch doesn't. 

This could possibly be caused on the branch only because of bug 87913. That
patch still needs to be backed out of the branch but it's already out of the
trunk. It changed binary compatibility of the component manager; however, that
checkin was made on the 11th, not the 10th. I'll try backing out that change
tonight and we'll see if this bug gets fixed tomorrow.
Summary: mozilla catatonic after JRE plugin installation → mozilla catatonic inside Migrate Profile Routine with Java installed
Yes, only branch. I see the profile manager dialog come up and go away just 
before hanging. But that doesn't prove anything to me regarding where the blame 
lies.

Grace mentioned that we see this even with new profiles. Peter, did you notice
that? Getting Linux bits from sweetlou.
I am not sure if this is a profile migration issue unless it is something new
that needs to be taken care of (due to other changes). Getting all my builds
ready. Adding Putterman, Conrad and Seth to the list.
I've got a debug build and have the plugins folder filled in. Some interesting
things: if I remove .mozilla and run, the activation window comes up but the
content does not display (a bunch of JS errors are displayed). After a
while the timeout occurs for the registration window and then it hangs. As
before, the stack trace tells me nothing.

If I ctrl-C and restart, with a .mozilla folder now in place everything runs 
great. 
So, the next step was to leave everything the same, remove the Java plugin, and 
try the same tests. Now I can't break it. This is *long* after the installer; I'm
just playing with stuff in a directory on my machine. So you *know* that the 
java plugin being there is hosing us, because I can't crash it no matter what 
the profile situation is (i.e., regardless of my deleting .mozilla or not). It 
is probably *not* component registration, as that happens only once, not each 
time I run. So, I really need to know the following from someone -- what is the 
interaction between mozilla and this java plugin each time the browser is 
started? Do we call into the java plugin each time we startup? If so, what is 
done? I think we should focus on that code and see if anything is getting 
trashed.
If this is Bhuvan's bug, then I'm Britney Spears. Reassigning to Peter to help
me understand the plugin issues.
Assignee: racham → peterl
Summary: mozilla catatonic inside Migrate Profile Routine with Java installed → mozilla catatonic with Java installed
I commented out the call to nsPluginHostImpl::LoadPlugins. This fixes things 
too. 
OK, this is interesting, and the best clue so far. 

Seems that LoadPlugins happens when a docshell is created. Guess when the first 
docshell is created? Yes, when the window that shows the barber pole is 
displayed, showing you profile migration. And we hang after that docshell is 
destroyed. Obviously, then, by not removing the .mozilla folder, we are
deferring creation of a docshell, and thus loading of plugins until later in the 
startup sequence. Perhaps the problem isn't one of memory corruption (after all, 
we have seen both profile migration, and java plugin work fine, just not fine 
together). Perhaps it is one of timing.
Just to clarify, when I refer to timing, I think this is about stuff happening 
too early, or a docshell going away too soon.

If I comment out the call to show the little progress bar dialog, I get past 
this problem. I'll attach a patch. This patch is not the fix, necessarily. 
Index: pref-migrator/src/nsPrefMigration.cpp
===================================================================
RCS file: /cvsroot/mozilla/profile/pref-migrator/src/nsPrefMigration.cpp,v
retrieving revision 1.144.24.1
diff -u -r1.144.24.1 nsPrefMigration.cpp
--- nsPrefMigration.cpp 2001/06/25 13:59:53     1.144.24.1
+++ nsPrefMigration.cpp 2001/07/17 04:59:30
@@ -309,7 +309,8 @@
 nsPrefMigration::ProcessPrefs(PRBool showProgressAsModalWindow)
 {
   nsresult rv;
-  
+ 
+#if 0 
   nsCOMPtr<nsIWindowWatcher> windowWatcher(do_GetService("@mozilla.org/embedcomp/window-watcher;1", &rv));
   if (NS_FAILED(rv)) return rv;
 
@@ -321,6 +322,7 @@
                                  nsnull,
                                  getter_AddRefs(mPMProgressWindow));
   if (NS_FAILED(rv)) return rv;
+#endif
 
   return NS_OK;
 }
okay, I can also hang if I give it a --profilemanager argument. So, this is all 
about window creation, not profile migration or java. Or maybe java plugin 
initialization can't handle its docshell being closed, according to Dan Veditz.
In the bonsai query pasted by Rodney Velasco 2001-07-16, it looks like Timeless
made some check-ins to nsWindow.cpp and nsWidget.cpp. Could this explain Syd's
window timing theory?
No. Dan and I applied those changes. There was no effect. I think there is a
real problem here of some kind. Let's look for the real bug and get this solved.
nsDocShell::NewContentViewerObj() appears to be trying to load the plugins only
when we can't otherwise find a doc loader factory with which to view content.
What is |aContentType| when you hit this code?
Peter and I discovered if we call the profile migration code in nsAppRunner.cpp
*after* we allow the hidden window to be created, then the plugin loading
triggered off the docshell works great. Dan Veditz has suggested maybe there is
a dependency on hidden window, is that possible?

Chris, I'll look into your question.
Well, the plugin loading code shouldn't be triggered off the docshell unless
we're trying to load a content type that mozilla doesn't know about. So, what
content type are we trying to load that mozilla doesn't know about?
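The fallback described above can be sketched in C++ as follows. This is a minimal model that assumes a single map from MIME type to a built-in document loader factory. `DocLoaderRegistry`, `ViewerSource`, `PickViewer`, and `LoadPluginsAndCheck` are hypothetical names, not real docshell APIs. Plugins are scanned only when no built-in factory matches the type exactly.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical model of content-viewer selection; not the real nsDocShell code.
enum class ViewerSource { BuiltIn, Plugin, None };

struct DocLoaderRegistry {
    // MIME types with a built-in document loader factory (illustrative subset).
    std::map<std::string, std::string> factories{
        {"text/html", "layout"},
        {"application/vnd.mozilla.xul+xml", "layout"},
    };
    int pluginScans = 0;  // how many times the plugins directory was scanned

    bool HasBuiltIn(const std::string& type) const {
        return factories.count(type) != 0;  // exact string match only
    }

    // Stand-in for LoadPlugins(): "scans" plugins and reports whether any
    // plugin claims the type. Pretend only the Java plugin is installed.
    bool LoadPluginsAndCheck(const std::string& type) {
        ++pluginScans;
        return type == "application/x-java-vm";
    }

    ViewerSource PickViewer(const std::string& type) {
        if (HasBuiltIn(type)) return ViewerSource::BuiltIn;          // common case: no plugin work
        if (LoadPluginsAndCheck(type)) return ViewerSource::Plugin;  // fallback path
        return ViewerSource::None;
    }
};
```

With this ordering, a XUL window should never reach the plugin path. If it does, the lookup for its content type missed, so the exact type string at that point is the thing to inspect.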
So, what if you make the hidden window earlier? There is this comment here:
http://lxr.mozilla.org/mozilla/source/xpfe/bootstrap/nsAppRunner.cpp#1145
but I'm pretty sure that's not true anymore.
Thanks to Syd's detail in debuggin', he found the problem. It seems that 
the profile migration code happens BEFORE the hidden window is 
created. 
Status: NEW → ASSIGNED
Keywords: patch
Priority: -- → P1
Whiteboard: PDT+; no eta → PDT+ [SEEKING REVIEWS]
But still, what's the actual problem? Why should migration have to happen after
the hidden window is created? The whole point of windowwatcher is that it can
make windows with or without a parent and without affecting or being affected by
the hidden window. If that's not the case, there's a problem. Making the hidden
window before migration might do the trick, but...
Will somebody please determine what MIME type is failing before we wallpaper
over another gaping hole? Thanks.
Right, and when we changed the order of the hidden window creation vs. 
the profile migration in the AppShell, it worked great!

However, we then noticed that the profile manager has this same 
problem, that window is created BEFORE the hidden window is shown.

The relationship between LoadPlugins() and the hidden window is this:
when the hidden window is created, it is created with a script context.
This in turn causes LiveConnect to initialize, which causes the JVMManager
to start, which in turn calls LoadPlugins() to see if Java is there.
Ed Burns says in e-mail that this is really touchy.

Our idea of having LoadPlugins() create the hidden window if it hasn't 
been created didn't work.

What worked very well was adding an EnsureScriptEnvironment() call in 
the Docshell before we call LoadPlugins(). We don't know the docshell 
code well enough to understand the side effects of this, but it would solve
this problem in one line.

Checking content-type next.
Waterson, the content type is application/vnd.mozilla.xul.+xml. Should we 
just special case this?
Is it _exactly_ "application/vnd.mozilla.xul.+xml"? If so, it _should_ be

  application/vnd.mozilla.xul+xml

and somebody is screwing it up somewhere. LXR couldn't find any instances of the 
string you typed, so did you miskey it? If not, maybe someone mangled a strcat() 
along the way to the docshell?
Sorry, it is exactly:

  "application/vnd.mozilla.xul+xml"

Here is a link to the full text search:
http://lxr.mozilla.org/seamonkey/search?string=application%2Fvnd.mozilla.xul
Okay, so -- we should be able to create a doc loader factory for that MIME type; 
specifically, content/build/nsContentDLF.cpp is able to handle it. I dug around 
a bit but couldn't find where on earth we actually _register_ the component ID 
properly, though. What does the stack trace look like at the time we try to 
|do_CreateInstance()| the DLF?

Presumably we'll unwind to the caller with a failure code, who'll have some 
hard-coded knowledge about using the built-in DLF? Maybe we just need to reverse 
that, so we try our DLF first before throwing it at the docshell. Alternatively 
(and probably better), we could make sure that we properly register all the 
DLFs' contract IDs.

I've finally got the JRE installed, so I'll dig at it a bit, too.
What if the call to LoadPlugins() was completely removed from the docshell? I
don't think we need it here because plugins will be loaded by the JVMManager anyway.
It's conceivable that someone would want plugins without a JVM (e.g., embedded 
on a device), but that may be a reasonable way to fix this on the branch.
I think nsContentDLF is unused code and you should be modifying nsLayoutDLF. 
See bug 87476.
Peter, not sure how you came to this conclusion, maybe we had a testing error or
something, but adding a call to EnsureScriptEnvironment() does not fix the
problem. The only fix I have seen is to get a hidden window created ahead of
time, for whatever reason.

cc'ing patrick beard at the request of PDT.
If I don't install Java, would Peter's proposal result in plugins not working? 
That wouldn't be OK since not all users take Java from our installs.

Is it reasonable to create a patch based on Chris's comment about the MIME type?
I just chatted with Syd, and he says there is an even later call to LoadPlugins()
that causes the same problem if the one in docshell is commented out. We're
trying to get a stack.

I think I've got reproducing down. Starting mozilla with the -ProfileWizard
switch can trigger this every time for me without deleting any files.
Keywords: patch
Whiteboard: PDT+ [SEEKING REVIEWS] → PDT+
I've tried narrowing down which check-ins fixed the trunk. On 7/10 at 16:00 we 
hang, and at 16:50 we don't (incorporating one later typo build-bustage fix). 
I'm trying to narrow it down further, but having to clobber my build.

Don't see how any of the 3 checkins in that time block would affect startup 
ordering.
so, a couple of answers to a couple of questions. 1) no, the plugins are not
being unloaded when the windows (profile) go down. 2) I still get the bug if I
remove the call to LoadPlugins from inside the docshell stuff. It just gets
called elsewhere, as in the following stack. Makes me think that we need to make
it work instead of trying to not make it happen.

#0  goobooroo () at prmem.c:37
#1  0x40b2df89 in nsPluginHostImpl::LoadPlugins (this=0x80fdd68)
    at nsPluginHostImpl.cpp:3940
#2  0x40b2d2f1 in nsPluginHostImpl::GetPluginFactory (this=0x80fdd68, 
    aMimeType=0x40a342b1 "application/x-java-vm", aPlugin=0xbfffe43c)
    at nsPluginHostImpl.cpp:3685
#3  0x40a26630 in nsJVMManager::StartupJVM (this=0x840cf38)
    at nsJVMManager.cpp:602
#4  0x40a26bee in nsJVMManager::MaybeStartupLiveConnect (this=0x840cf38)
    at nsJVMManager.cpp:783
#5  0x40a2cc2f in nsJVMManager::StartupLiveConnect (this=0x840cf38, 
    runtime=0x810a440, outStarted=@0xbfffe538) at nsJVMManager.h:128
#6  0x41bed87b in nsJSEnvironment::nsJSEnvironment (this=0x840ce98)
    at nsJSEnvironment.cpp:1527
#7  0x41bed265 in nsJSEnvironment::GetScriptingEnvironment ()
    at nsJSEnvironment.cpp:1446
#8  0x41bedcf4 in NS_CreateScriptContext (aGlobal=0x83e86d0, 
    aContext=0x83e7f20) at nsJSEnvironment.cpp:1574
#9  0x41be5c74 in nsDOMSOFactory::NewScriptContext (this=0x83e8668, 
    aGlobal=0x83e86d0, aContext=0x83e7f20) at nsDOMFactory.cpp:123
#10 0x42b15a15 in nsDocShell::EnsureScriptEnvironment (this=0x83e7e70)
    at nsDocShell.cpp:5830
#11 0x42b17b24 in nsWebShell::GetInterface (this=0x83e7e70, aIID=@0x40d28e54, 
    aInstancePtr=0xbfffe850) at nsWebShell.cpp:322
#12 0x401e9f1e in nsGetInterface::operator() (this=0xbfffe8d0, 
    aIID=@0x40d28e54, aInstancePtr=0xbfffe850) at nsIInterfaceRequestor.cpp:37
#13 0x40d1c014 in ?? ()
   from /opt/raptor/branch/ns/dist/bin/components/libembedcomponents.so
#14 0x40d1eaa0 in ?? ()
   from /opt/raptor/branch/ns/dist/bin/components/libembedcomponents.so
#15 0x40d12925 in ?? ()
   from /opt/raptor/branch/ns/dist/bin/components/libembedcomponents.so
#16 0x40d0f549 in ?? ()
   from /opt/raptor/branch/ns/dist/bin/components/libembedcomponents.so
#17 0x40d0e6d9 in ?? ()
   from /opt/raptor/branch/ns/dist/bin/components/libembedcomponents.so
#18 0x42b5491c in nsPrefMigration::ProcessPrefs (this=0x83e7ae0, 
    showProgressAsModalWindow=0) at nsPrefMigration.cpp:322
#19 0x420e33e3 in nsProfile::MigrateProfile (this=0x83d5320, 
    profileName=0x83e7b48, showProgressAsModalWindow=0) at nsProfile.cpp:1957
#20 0x420e3d68 in nsProfile::MigrateAllProfiles (this=0x83d5320)
    at nsProfile.cpp:2086
#21 0x420dc44c in nsProfile::AutoMigrate (this=0x83d5320) at nsProfile.cpp:610
#22 0x420dd6e3 in nsProfile::ProcessArgs (this=0x83d5320, 
    cmdLineArgs=0x8249888, profileDirSet=0xbffff5e8, profileURLStr=@0xbffff5c0)
    at nsProfile.cpp:841
#23 0x420dad70 in nsProfile::StartupWithArgs (this=0x83d5320, 
    cmdLineArgs=0x8249888, canInteract=1) at nsProfile.cpp:376
#24 0x080592f6 in InitializeProfileService (cmdLineArgs=0x8249888)
    at nsAppRunner.cpp:904
#25 0x0805a81d in main1 (argc=1, argv=0xbffff9d4, nativeApp=0x0)
    at nsAppRunner.cpp:1191
#26 0x0805b83f in main (argc=1, argv=0xbffff9d4) at nsAppRunner.cpp:1532
#27 0x405b1b65 in __libc_start_main (main=0x805b640 <main>, argc=1, 
    ubp_av=0xbffff9d4, init=0x8053fe4 <_init>, fini=0x8066f20 <_fini>, 
    rtld_fini=0x4000df24 <_dl_fini>, stack_end=0xbffff9cc)
    at ../sysdeps/generic/libc-start.c:111

(goobooroo is my little hack to get me a breakpoint in gdb without loading the
.so, which would be too late)
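The breakpoint trick mentioned above can be sketched like this, assuming GCC or Clang for `[[gnu::noinline]]`. The idea is to put a non-inlined, externally visible function in a library that is already loaded early (NSPR's prmem.c in this bug) and call it from the code under investigation, so gdb can stop there before the plugin module is loaded. The body of `goobooroo()`, the `gBreakpointHits` counter, and the `LoadPluginsForDebugging()` caller are hypothetical; the bug does not show the real ones.

```cpp
#include <cassert>

// Hypothetical breakpoint anchor in the spirit of goobooroo(). In the real hack
// it lived in a library loaded before any component .so, so "break goobooroo"
// could be set early.
static volatile int gBreakpointHits = 0;

extern "C" [[gnu::noinline]] void goobooroo(void) {
    // The volatile write gives the function a side effect, so the call is not
    // optimized away.
    gBreakpointHits = gBreakpointHits + 1;
}

// Stand-in for the code being debugged (for example, the start of LoadPlugins()).
void LoadPluginsForDebugging() {
    goobooroo();  // the debugger stops here
    // ... real plugin scanning would follow ...
}
```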

I think we have to understand what the dependency (if there is one) on a global
window is all about.
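The chain in the trace above can be reduced to a few calls. This is a hypothetical model, not the real classes. The method names follow frames in the trace, with some frames folded together (for example `MaybeStartupLiveConnect`). Each body only records its own name, so the test can check the order in which a new script context reaches `LoadPlugins()`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of the call chain in the stack trace. Each step only
// records its name so the triggering order can be inspected.
struct StartupTrace {
    std::vector<std::string> calls;

    void LoadPlugins() { calls.push_back("nsPluginHostImpl::LoadPlugins"); }

    void GetPluginFactory(const std::string& mimeType) {
        calls.push_back("nsPluginHostImpl::GetPluginFactory(" + mimeType + ")");
        LoadPlugins();  // the plugin list is needed to find a factory for the type
    }

    void StartupJVM() {
        calls.push_back("nsJVMManager::StartupJVM");
        GetPluginFactory("application/x-java-vm");
    }

    void StartupLiveConnect() {
        calls.push_back("nsJVMManager::StartupLiveConnect");
        StartupJVM();  // the real code goes through MaybeStartupLiveConnect first
    }

    void CreateScriptContext() {
        calls.push_back("NS_CreateScriptContext");
        StartupLiveConnect();  // reached from the nsJSEnvironment constructor
    }

    // Any docshell that needs a script environment, including the profile
    // migration progress window, starts the whole chain.
    void EnsureScriptEnvironment() {
        calls.push_back("nsDocShell::EnsureScriptEnvironment");
        CreateScriptContext();
    }
};
```

Any docshell that needs a script environment starts this chain. Removing one `LoadPlugins()` call site therefore only moves the first plugin load elsewhere, which is what the trace shows.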
This sounds very much like bug 87843 which happened to be fixed right around the
time this one was opened. David Baron fixed the problem with the
nsDeviceContextGTK but perhaps there is something additional that needs to be
done for Java to work. cc:ing Marc and Brendan, reviewers from bug 87843.
Although it may *sound* like bug 87843, the underlying cause is most likely very
different.  The problem in bug 87843 is now fixed.
Bug 87843 was about the screen resolution not being calculated, or being
calculated incorrectly, when the activation window was drawn. The telltale sign
of that bug was:
###!!! ASSERTION: Negative Width Input - very bad: 'mComputedWidth>=0', file
nsHTMLReflowState.cpp, line 2472

I don't think it's similar.

*** Bug 87571 has been marked as a duplicate of this bug. ***
Change subject.
Summary: mozilla catatonic with Java installed → Profile mgr and java cause hang (was: mozilla catatonic with Java installed)
*** Bug 91385 has been marked as a duplicate of this bug. ***
After working with Ed Burns, here's my first attempt at a patch of hackery to
actually fully fix this. Please review and comment, maybe someone has a better idea.

I tested this out only on Linux branch so far in the following situations:
1) No .mozilla and profile migration
2) One profile and no profile manager
3) Several profiles and the profile manager comes up
4) Using the --ProfileWizard switch

All of those work for me with this patch plus on each one, I verified that Java
was working by visiting http://www.javasoft.com

Marc suggested that Liveconnect be tested as well. Could someone more familiar
with Liveconnect either try this or point me to a testcase? Thanks!
Keywords: patch
Lamentably, the hack breaks liveconnect.

Peter, you need to make it so the hack is in place if and only if we're in the
(ProfileManager|ProfileWizard|Profile Migration) case.  

If I remove the hack and start mozilla with only one profile, liveconnect works.
Peter, would it be possible to look for the existence of some service that is
known to exist only during the *real* browser running situation?

How could we determine whether or not we're in the *real* browser case, or in
the "before the browser really starts" case?
Does it still solve the problem to create the hidden window before calling
InitializeProfileService()? That seems like a bit less of a hack than changing
nsJVMManager.cpp. All you'd have to do is move nsAppRunner.cpp#1147 up to #1134.
The comment at #1145 just isn't so.  You'd have to make a new scary comment though.
dbaron and I updated modules/libpref/src/init/all.js from r3.252 to r3.253, and
it fixes the problem. This was the magic checkin that ``fixed'' things on the
trunk. cc'ing jesse, who came up with the patch.
Heh, here's the fix:

Index: all.js
===================================================================
RCS file: /cvsroot/mozilla/modules/libpref/src/init/all.js,v
retrieving revision 3.245.2.7
diff -u -r3.245.2.7 all.js
--- all.js	2001/07/17 03:40:18	3.245.2.7
+++ all.js	2001/07/19 03:04:17
@@ -230,6 +230,10 @@
 pref("capability.policy.mailnews.sites", "mailbox: imap: news:");
 pref("capability.policy.mailnews.Window.name.set", "noAccess");
 pref("capability.policy.mailnews.Window.location", "noAccess");
+////////////////////////////////////////////////////////////
+pref("capability.principal.codebase.foo.id", "http://www.netscape.com");
+pref("capability.principal.codebase.foo.granted", "UniversalFoo");
+//////////////////////////////////////////////////////////
 
 pref("javascript.enabled",                  true);
 pref("javascript.allow.mailnews",           false);
From a conversation with mstoltz, this fix was checked in accidentally (along
with jesse's changes that were meant to be checked in).  It shouldn't actually
do anything, but it probably affects initialization order in some way that fixes
this bug.
Glad to see that the stench of the hack that Peter and I were working on was
enough to wake people up from their torpor.

Waterson's all.js patch works like a charm.

linux:

./mozilla --ProfileWizard -> browser, with java and liveconnect works fine
./mozilla (two profiles) -> browser with java and liveconnect works fine
./mozilla (one profile) -> browser with java and liveconnect works fine.
The presence of those lines in all.js (which were checked in by accident) causes
the for loop in nsScriptSecurityManager::InitPrincipals to iterate once. Without
those prefs, aPrefCount will probably be zero, and the code in the for loop will
never run. My guess is that the code in that for loop is affecting
initialization order somehow.
Great work Waterson! Get that into the trunk!
..er..I mean branch!
Wow, finally a fix.  I don't mean to be a downer since you all found the
fix, but does anybody know why xpicleanup is still running after you install the
JVM and restart the browser?  I get this dialog box:

"The program must close to allow a previous installation attempt to complete.
Please restart"

I get it right after I install the JVM from http://home.netscape.com/plugins/jvm.html
and quit the browser with File -> Quit.  I need to start the netscape binary about
twice before I can actually get the Profile Manager window (which actually works
when I choose a profile..yeah).  With that said, I'll file another bug on
this...I'm sure most of you hate to see the word 'catatonic' in your mailbox.
bug 91427 filed for above issue I described.
The replacement utility has a far too long sleep cycle, so most people are able
to relaunch Mozilla before xpicleanup realizes Mozilla has shut down and gotten
out of its way.
Has anyone already checked in the fix from  Chris Waterson 2001-07-18 20:07?  We
need this in to make the next build a viable candidate build.  I've asked
dveditz to do the checkin if it hasn't been done and nobody is around.
Checked in to MOZILLA_0_9_2_BRANCH, 2001-07-19 01:02 PDT, all.js rev. 3.245.2.8.
Whiteboard: PDT+ → PDT+ fixed on branch
Thank You
Thank You
Thank You!!!
Yes!!!  fixed on branch build 2001-07-19-04-0.9.2
not to rain on dbaron and waterson's parade, but... we're going to keep this 
open though and find and resolve the real underlying problem, right?  Otherwise 
what are we going to do when someone removes those prefs for some reason and it 
comes back...
Uh, yeah. I hope so. In fact, I think mstoltz is going to back the spurious
``fix'' out on the trunk, so we'll soon be able to debug the problem there.
Reassigning to mstoltz...
Assignee: peterl → mstoltz
Status: ASSIGNED → NEW
The comments suggest that this is checked in.  But I just installed the 20010719
4am branch build (net installer, checked ALL to get Java) and when I go to
http://www.shallowsky.com/jupiter.html (which has an applet on it) the app hangs
before it even loads any of the page, while displaying "Connecting to ..." in
the status bar.  This is repeatable.  The page works in other java-enabled
browsers (e.g. galeon).  Am I seeing a different bug, or did the fix somehow not
make it into 20010719?
akkana, you might be seeing the 'lazy java loading' thing fixed with bug 26516.
 
If so, it needs to be done differently.  The fact is that going to a page
containing java completely locks up the browser (presumably forever, but I only
timed it for 3 minutes -- should I wait longer than that?  The whole app never
used to take that long to start up) so that you have to kill the app externally
and start over.

BTW, this build is:
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2) Gecko/20010719 Netscape6/6.1 
Akkana:  You're probably seeing bug 84093 if you're running RedHat 7.1.
Yes, I'm seeing bug 84093.
Whiteboard: PDT+ fixed on branch → fixed on branch
*** Bug 89792 has been marked as a duplicate of this bug. ***
I don't understand, why was this assigned to me? Shouldn't we come up with a
real fix before I back out the accidental fix and reopen the blocker?

Back to peterl.
Assignee: mstoltz → peterlubczynski
Mitch,
I assigned it to you because I thought nobody knows more about this mysterious
pref than you, and maybe you could shed some light (in greater detail) on what
this pref actually does? Do you have ANY idea WHY Java MUST have this pref set?
Using this pref trips _something_ pretty important in security that keeps the
browser from hanging with Java. Also, what's wrong with considering keeping this
pref as the REAL fix? I agree the name is awful, but what's wrong with replacing
"foo" with "oji" or something like that?

I think the problem is with the interaction of browser security and OJI. Even if
I did understand the real problem, I don't know enough about OJI or security to
fix it. Reassigning to OJI.
Assignee: peterlubczynski → edburns
Component: Profile Migration → OJI
QA Contact: gbush → shrir
I mentioned above what those prefs do - they cause some additional
initialization code to run as nsScriptSecurityManager is initialized. I don't
know how or why that affects java, and I'd really like to see this fixed at the
root of the problem, not haphazardly. Does anyone know when this started/what
checkin caused the bug to begin with?
Over the weekend I debugged this a bit and found that the hang was happening
where the OnStartRequest for the load of navigator.xul was sent into a proxy and
never came out on the other side to be handled by the main thread.

On a suggestion from a conversation with darin and gordon (that in the past
there was a bug with the cache where holding on to an event queue would stop
events from being processed), I debugged this a bit with danm.  We found that:
 * when Java was not installed, two event queues are destroyed, one right
after the profile manager closes (right before the load of navigator.xul starts)
and the other at app shutdown
 * when Java is installed and we don't grant UniversalFoo, the first destruction
doesn't happen and the navigator.xul load events that are dropped go through the
codepath in PostEvent that punts to an elder queue.
 * when Java is installed and we do grant UniversalFoo, the event queue
destruction at app shutdown doesn't happen, but the one after the profile
manager closes does happen (which presumably causes us to run correctly)

So it seems like this bug might be related to holding on to an event queue. 
It's not fully understood why this causes a hang, but the solution may just be
not to hold on to the event queue.  danm says the event queue stuff is very
fragile and changing that might not be a good idea.  (Or something like that...)

Is the Java plugin holding on to the event queue somehow?  (It's an XPCOM
plugin, right?)  If so, could it be changed not to do so?
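A minimal sketch of the leak theory above, assuming a simple reference-counted queue. The class and function names are hypothetical and are not Mozilla's real event-queue API. The nested queue pushed for the profile manager is destroyed only when its last reference is released, so a component that keeps an extra reference would explain why the first destruction never happens. It deliberately does not model PostEvent's punting to an elder queue.

```javascript
// Hypothetical, simplified model: NOT Mozilla's nsEventQueue API.
// A queue is destroyed only when its last reference is released.
class EventQueue {
  constructor(name) {
    this.name = name;
    this.refCount = 1; // the reference held by the thread's queue stack
    this.destroyed = false;
  }
  addRef() {
    this.refCount += 1;
    return this;
  }
  release() {
    this.refCount -= 1;
    if (this.refCount === 0) {
      this.destroyed = true;
    }
  }
}

// Open a modal window (pushing a nested queue), optionally let a plugin
// grab a reference to that queue, then close the window (popping it).
function runProfileManager(pluginHoldsQueue) {
  const stack = [new EventQueue("main")];
  const nested = new EventQueue("profile-manager");
  stack.push(nested);
  if (pluginHoldsQueue) {
    nested.addRef(); // hypothetical: plugin caches the queue it saw during init
  }
  stack.pop().release(); // modal window closes, stack drops its reference
  return nested.destroyed;
}

console.log(runProfileManager(false)); // true: queue destroyed when the window closes
console.log(runProfileManager(true)); // false: the extra reference keeps it alive (leak)
```

The point of the model is only that "destruction doesn't happen" is exactly what an outstanding reference looks like; it says nothing about which component holds it.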
It would be nice if someone who has access to the source of the Java plugin (Ed
Burns?) could investigate my last comment.  The "UniversalFoo" hack really
shouldn't be the permanent fix for this bug.
With 0.9.1, I installed the java plugin (1.3.0_01). When starting 0.9.2 or
0.9.3, I got the "Mozilla needs to shutdown to allow a previous installation to
finish" error message. I did rm -rf on my .mozilla directory and ran mozilla as
root, to no avail. Following the thread here, I removed the symlink
/usr/lib/mozilla/plugins/libjavaplugin_oji.so, ran mozilla as root (it worked),
ran rm -rf .mozilla, and ran mozilla as a user. It worked. After that, I put back
the symlink for /usr/lib/mozilla/plugins/libjavaplugin_oji.so, and mozilla
started, although java is not working (complete hang). Galeon (0.11, 0.11.1,
0.11.2, 0.11.3) does not have those problems.
*** Bug 93169 has been marked as a duplicate of this bug. ***
Priorities in the new economy.
Severity: blocker → critical
Priority: P1 → P2
Target Milestone: --- → mozilla0.9.5
SPAM: reassigning all OJI bugs to new OJI QA, pmac ( 227 bugs)
QA Contact: shrir → pmac
this checkin has a significant negative effect on startup time, because it
forces http to initialize during startup.  http in turn causes several other
components to be initialized.  could we get by with just changing the pref to
reference about:blank instead of www.netscape.com?

any help testing this would be most appreciated.  thanks!

see bug 97462 and bug 96681
We could probably come up with a better hack anyway -- like doing something that
causes the Java library to be initialized before we start the nested event queue
for the profile manager.  Although, really, it would be nice if the plugin
itself were fixed, or if we fixed event queues not to behave this way.
Would a "good" place for a less-invasive hack be
nsProfile::LoadDefaultProfileDir, a little before we call OpenWindow?
I don't think so. There is another place in profile startup (Confirming
auto-migration) where a modal window is opened with windowwatcher. There may be
others. In general, we should be able to open modal windows without hacking
around this problem. Let's find a real answer.
OK, I've been assuming that the problem was in the java plugin itself, but I
think it's in the JVM manager, so I think we can find a real fix.
Actually, never mind that.  The JVM Manager doesn't hold on to the event queue
that it gets, so I think the problem is in the Java plugin, which we can't
change.  Only Sun can, and they don't seem interested in doing that anytime soon.
it sounds to me like someone (maybe the java plugin) has cached an eventQ...  

Then when a new eventQ is pushed (because of the modal window) some events end
up being pushed to the wrong eventQ.  This ultimately leads to the hang...

-- rick
It's actually a leak of an event queue -- see my comments dated 2001-07-24 15:53.
I assume that the eventQ leaks because someone is holding a reference to it...  

This is really starting to sound as if the java plugin (or someone) is grabbing
an eventQ during early initialization, later posting events to the cached Q
(rather than the currently active Q)...  Since the eventQ is cached, and
everything goes to hell in a hand-basket, we end up leaking the eventQ...

Can we add an ASSERT in the event posting code which checks that the eventQ
being targeted is the 'active' event Q for that thread?  This could catch similar
situations in the future :-)

-- rick
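The proposed ASSERT could look something like the following hypothetical model, written in JavaScript for illustration only. `ThreadEventQueues` and `postEvent` are invented names, not Mozilla's real event-queue interfaces. A component caches the active queue while the profile manager's nested queue is on top of the stack; once that queue is popped, posting through the stale reference trips the check instead of quietly landing in the wrong queue.

```javascript
// Hypothetical model of the proposed check; NOT Mozilla's real API.
class EventQueue {
  constructor(name) {
    this.name = name;
    this.events = [];
  }
}

class ThreadEventQueues {
  constructor() {
    this.stack = [new EventQueue("main")];
  }
  // The queue the thread's event loop is currently processing.
  active() {
    return this.stack[this.stack.length - 1];
  }
  // A modal window (e.g. the profile manager) pushes a nested queue.
  push(name) {
    const q = new EventQueue(name);
    this.stack.push(q);
    return q;
  }
  pop() {
    return this.stack.pop();
  }
  // The proposed ASSERT: refuse to post to anything but the active queue.
  postEvent(target, event) {
    if (target !== this.active()) {
      throw new Error(
        `posted "${event}" to "${target.name}" but active queue is "${this.active().name}"`
      );
    }
    target.events.push(event);
  }
}

// A component caches the active queue during early startup, while the
// profile manager's nested queue is on top of the stack.
const thread = new ThreadEventQueues();
thread.push("profile-manager");
const cachedQueue = thread.active();
thread.pop(); // profile manager closes

let caught = null;
try {
  thread.postEvent(cachedQueue, "OnStartRequest(navigator.xul)");
} catch (e) {
  caught = e.message;
}
console.log(caught);
```

In a debug build the equivalent check would presumably be an assertion rather than a thrown error, so release builds keep their existing behavior.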
Whiteboard says 'fixed on branch'.  Is that the 0.9.4 branch?  If so, could
someone please remove the nsbranch keyword to get it off the radar?
It's been fixed temporarily, with a hack of sorts, before 0.9.4 was cut, so
removing nsbranch.

I would still like this to be fixed soon so we can remove the 'UniversalFoo'
hack from all.js.
Severity: critical → normal
Keywords: nsbranch
reassign
Assignee: edburns → joe.chou
Reassign to Joe as I'm leaving the role of OJI module owner.
Assignee: joe.chou → edburns
Reassign to Joe Chou, as I am no longer working officially on OJI.
Assignee: edburns → joe.chou
Target Milestone: mozilla0.9.5 → mozilla0.9.6
I just installed mozilla 0.9.4-2 on a system running RH7.1 with the 2.4.3-12
kernel.   I still have the same problems I originally had way back with 0.9.1.

As root, I went to the Debug menu and picked Applets.  It asked me to download
java, which I did, and it claimed it was successfully installed.  So I exited
mozilla and tried to run it again.  I got that same old message about mozilla
needing to shut down.  This time I was able to kill the running process (although
it wasn't up on the screen) and then I could start mozilla.  But when I went to
test the applets through Debug, it hung again just as it always has.  It also
hung when I tried to test javascript.

Is any progress being made on this at all?  Or are we just going around
in circles?  Any suggestions about what I might try?
len: you seem anxious to try something, and I have something you could try. I 
don't have the JRE on my machine and I'm pressed for time right now. If anyone 
out there can build with the patch I'm about to post and let me know if it 
helps...

The patch isn't what I'd propose for an actual checkin. I'm just curious whether 
it fixes the problem. If it does, we at least know for certain that we understand 
the problem. The patch wants to be tried using some version of the source that 
doesn't have the 0.9.4 UniversalFoo "fix." That fix just disguises the problem 
through some mysterious process. It seems to be checked in everywhere; it can be 
backed out by removing the two lines in all.js that contain the word "foo".

I've tried the patch on my machine. It doesn't hurt anything. I'm curious to know
whether it helps. Any volunteers? (Lacking a response, I'll try this myself in a
week or so.)
Blocks: 104166
I just installed version 0.9.5 from the RedHat_7x rpm package

As usual, I started at the Debug menu and picked Applets.  That prompted a
download of the java plugin and installed it.  This time I was able (as root) to
close java without any hanging.  But when I went back to the Debug > Applets
test, it hung as usual.

Why does the Mozilla web page keep saying that Mozilla has been tested with
java, when it still doesn't work?  It seems to me this is not a minor issue.
Okay.  I finally gave up on getting the Java plugin from mozilla,
which is supposed to automatically find the plugin you need.
Instead I went to Sun and got the rpm package for jre 1.3.1.  I
installed that and moved the link in /usr/lib/mozilla/plugins so
it pointed at the plugin in the /usr/java/... tree.
Now it works.
I don't know how long it has been working, but perhaps the web page
should be rewritten.  It seems to imply that 1.3.0 will work and that
you get a version of that if you let mozilla automatically get the
plugin.  But in fact that version doesn't work.
What you're describing sounds like bug 84093 rather than this bug.  (I think the
web site should have been changed before resolving that bug, but it's probably
going to happen sometime, anyway...)
Sun's JRE 1.3.0 does not like RH 7.1 unless you set LD_ASSUME_KERNEL=2.2.5. See
bug 84093. The remaining issues with JRE 1.3.1 and Mozilla have been fixed and
the next release of Netscape will come with JRE 1.3.1. The web page for download
should also be updated soon; see bug 103926.
Blocks: 97462
Re-assign to nidheesh.
Assignee: joe.chou → nidheesh
Target Milestone: mozilla0.9.6 → mozilla0.9.8
Status: NEW → ASSIGNED
Keywords: mozilla1.0
Keywords: mozilla1.0+
Keywords: mozilla1.0
Nidheesh, any update on this?
*** Bug 97462 has been marked as a duplicate of this bug. ***
Could this be the same issue as bug #99026?  If so, perhaps that patch fixes
this too :-)
This is fixed in my tree even with the hack removed.  (I tested in my Linux
debug build by linking the java plugin into the plugins directory, and starting
with ./mozilla -ProfileManager.  The build started, and java worked.)

So here's a patch that removes the hack that is no longer necessary (and causes
HTTP to start up earlier, etc., etc.).
hey david,

can you tell if it was the patch for bug #99026 that fixed the problem?  this
patch 'kinda snuck' into the tree as part of darin's nsIChannel changes ;-)
it would be nice if we knew *why* we no longer have this problem.

-- rick
Yes, it was the patch for bug 99026.  (It looks like only the first part of that
patch was checked in.  When I back out that one line change in my tree, I get
the hang again.)
Comment on attachment 77992 [details] [diff] [review]
patch to remove hack

sr=darin (with pleasure!)
Attachment #77992 - Flags: superreview+
Yeah! finally :-)
Comment on attachment 77992 [details] [diff] [review]
patch to remove hack

I second that! r=mstoltz.
Attachment #77992 - Flags: review+
thank you david!!!!!

it's GREAT to get rid of this wacky bug ;-)

-- rick
I'm taking this bug with the intent of closing it once the above patch
is checked in (and the hack removed), since the real bug in Mozilla is
already fixed.

There remains the issue of the leak of the event queue within the Java
plugin itself, but I'm not sure how to report a bug to Sun on the Java
plugin.  Having it in bugzilla didn't seem to lead to a fix.
Assignee: nidheesh → dbaron
Status: ASSIGNED → NEW
Target Milestone: mozilla0.9.8 → mozilla1.0
Comment on attachment 77992 [details] [diff] [review]
patch to remove hack

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #77992 - Flags: approval+
Hack removed 2002-04-08 18:22 PDT.  Marking fixed (although it was really bug
99026).
Status: NEW → RESOLVED
Closed: 23 years ago22 years ago
Resolution: --- → FIXED
verified on linux redhat 7.1, jre 1.3.1 (branch build: 2002-04-17-10-1.0.0).
Visiting http://www.java.sun.com, no hang!
Keywords: verified1.0.0
verified
Status: RESOLVED → VERIFIED
Product: Core → Core Graveyard