Closed
Bug 13202
Opened 26 years ago
Closed 25 years ago
Solaris: Build packaging problem, crash on startup
Categories
(SeaMonkey :: Build Config, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: MatsPalmgren_bugz, Assigned: rich.burridge)
References
Details
(Keywords: crash, helpwanted, Whiteboard: [dogfood-] Build packaging problem. Need plan.)
Attachments
(2 files)
Apprunner nightly builds for Solaris have been crashing on startup for about
2 weeks now. Viewer works fine though. I will attach a stack trace. uname -a:
SunOS claudia 5.6 Generic_105181-11 sun4u sparc SUNW,Ultra-2
| Reporter | ||
Comment 1•26 years ago
| Reporter | ||
Comment 3•26 years ago
Just as a status update: it's still crashing. I have checked now and then for
the past month (latest build: 1999-10-07-16-M11) and it always crashes with
the same stack trace as the one I have attached.
Updated•26 years ago
Priority: P3 → P1
Comment 4•26 years ago
Should a P1 critical have no milestone?
Here is the analysis I sent several of the engineers on 8 September, plus some
more recent discoveries. I'm a little sad that it hasn't resulted in any
action, so I'll post it here:
> 1) The problem appears to be that gdk_rgb_init calls gdk_rgb_choose_visual,
> which does:
>
> visuals = gdk_list_visuals ();
>
> When this returns, visuals is NULL; subsequently, the code does:
>
> tmp_list = visuals;
> best_visual = tmp_list->data;
>
> Which is clearly the cause of the crash.
>
> 2) So why does gdk_list_visuals() return NULL? It's pretty simple:
>
> gdk_list_visuals (void)
> {
> GList *list;
> guint i;
>
> list = NULL;
> for (i = 0; i < nvisuals; ++i)
> list = g_list_append (list, (gpointer) &visuals[i]);
>
> return list;
> }
>
> nvisuals is 0, so the list returned is NULL.
>
> Sadly, what I can't answer is why nvisuals is 0. I'll keep poking
> to see if I can figure out why.
>
> This leads me to...
>
> 3) Why is it that there are two complete copies of gdk in different
> shared libraries? Here's what I mean:
>
> $ dis -F gdk_rgb_init libgfx_gtk.so | head
> disassembly for libgfx_gtk.so
>
> section .text
> gdk_rgb_init()
> 165798: 9d e3 bf 90 save %sp, -112, %sp
> 16579c: 11 00 00 00 sethi %hi(XSynchronize), %o0
> 1657a0: d0 4a 20 00 ldsb [%o0 + XSynchronize], %o0
>
> $ dis -F gdk_rgb_init libwidget_gtk.so | head
> disassembly for libwidget_gtk.so
>
> section .text
> gdk_rgb_init()
> 1a5a0c: 9d e3 bf 90 save %sp, -112, %sp
> 1a5a10: 11 00 00 00 sethi %hi(XSynchronize), %o0
> 1a5a14: d0 4a 20 00 ldsb [%o0 + XSynchronize], %o0
>
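A minimal sketch of the failure described in the quoted analysis, and of a guard that would turn the crash into a reportable error. It does not use GLib: `Node`, `list_append`, `list_free`, `list_visuals` and `choose_best_visual` are hypothetical stand-ins for `GList`, `g_list_append`, `gdk_list_visuals` and the first lines of `gdk_rgb_choose_visual`, and the visual table is just an array of ints chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for GList: a singly linked list of data pointers. */
typedef struct Node {
    void *data;
    struct Node *next;
} Node;

/* Append to the list; as with g_list_append, a NULL list means "empty". */
Node *list_append(Node *list, void *data)
{
    Node *node = malloc(sizeof *node);
    if (node == NULL)
        return list;            /* out of memory: leave the list unchanged */
    node->data = data;
    node->next = NULL;
    if (list == NULL)
        return node;
    Node *tail = list;
    while (tail->next != NULL)
        tail = tail->next;
    tail->next = node;
    return list;
}

/* Release every node (the data pointers are not owned by the list). */
void list_free(Node *list)
{
    while (list != NULL) {
        Node *next = list->next;
        free(list);
        list = next;
    }
}

/* Mirrors the loop in the quoted gdk_list_visuals: when count is 0 the
   loop body never runs and the result is NULL. */
Node *list_visuals(int *visuals, size_t count)
{
    assert(visuals != NULL || count == 0);
    Node *list = NULL;
    for (size_t i = 0; i < count; ++i)
        list = list_append(list, &visuals[i]);
    return list;
}

/* Guarded form of "best_visual = tmp_list->data": returns NULL instead of
   dereferencing an empty list. */
int *choose_best_visual(Node *visuals)
{
    if (visuals == NULL)
        return NULL;
    return visuals->data;
}
```

With a count of 0, `list_visuals` returns NULL and `choose_best_visual` reports that to the caller rather than faulting. This only makes the symptom visible; it does not explain why the count is 0 in the first place.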
That was my original mail-- I got some vague response that this "looked like
an X server bug" which I rejected. Since then I thought some more about this
problem:
I think the issue is that one library is getting loaded after the other, and
effectively interposing on the other's functions. This is a standard linker
trick, but I think here it is happening by mistake. Recall that this would
also explain why nvisuals is 0-- it is state private to the gdk library. In
this case, when the second copy of that library is loaded, it has its own,
*uninitialized* version of the nvisuals variable.
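The effect of two copies with separate private state can be modelled in a single C file. This is only a model, assuming each statically linked copy behaves like an independent instance of the library's globals; it does not perform any real dynamic linking, and `GdkCopy`, `gdk_copy_init`, `gdk_copy_list_visuals` and `visuals_seen_after_init` are hypothetical names, not gdk API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one linked-in copy of gdk: each copy carries its
   own private state, which starts out zeroed like an untouched static. */
typedef struct GdkCopy {
    const char *owner;   /* library the copy was linked into */
    int nvisuals;        /* private counter, 0 until this copy is initialized */
} GdkCopy;

/* Stands in for the visual setup done during gdk initialization. */
void gdk_copy_init(GdkCopy *copy, int visuals_found)
{
    assert(copy != NULL);
    copy->nvisuals = visuals_found;
}

/* Stands in for gdk_list_visuals: reports how many entries it would return. */
int gdk_copy_list_visuals(const GdkCopy *copy)
{
    assert(copy != NULL);
    return copy->nvisuals;
}

/* Initialize one copy, then make the later call through another copy, as
   would happen if that call resolved into the second library. */
int visuals_seen_after_init(GdkCopy *initialized, const GdkCopy *called)
{
    gdk_copy_init(initialized, 8);   /* 8 is an arbitrary example count */
    return gdk_copy_list_visuals(called);
}
```

Calling `visuals_seen_after_init(&gfx, &widget)` with two zeroed copies yields 0, while passing the same copy for both arguments yields 8. That matches the observed symptom: the initialization happened, but in a different copy from the one later asked for the visual list.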
To check this, I performed the following experiment using LD_PRELOAD, which
should keep a particular library at the head of the link chain:
First, I installed libgtk and libgdk, etc. in a well-known place (in this case
/usr/local/lib)
$ /bin/ksh
$ export LD_PRELOAD=libgtk.so:libgdk.so
$ export LD_LIBRARY_PATH=/usr/local/lib
Then I ran mozilla, and everything came up. In this case, I forced the
application to *always* use my copy of gtk/gdk, instead of getting confused
between two other copies.
So: I think removing one of the two statically linked copies of libgtk/gdk
should solve this problem.
Comment 6•26 years ago
Updating QA contact to a release person. Internal Core QA does not test
Solaris.
| Reporter | ||
Updated•26 years ago
Status: VERIFIED → REOPENED
| Reporter | ||
Comment 9•26 years ago
Reopen because bug 13160 was fixed but this bug still occurs in build 1999111909
Comment 10•26 years ago
Clearing DUPLICATE resolution due to reopen.
Updated•26 years ago
Assignee: cyeh → chofmann
Status: REOPENED → NEW
Comment 11•26 years ago
did mcafee's recent award winning fixes help this? ;-)
jdunn, do you see this on other unix ports?
Comment 12•26 years ago
On AIX, we ran into this problem EONS ago and the 'fix' is to create
a shared gtk library and link against this instead of the static libs.
On HPUX we are still chasing the problem that was introduced with
superwin, but it is either a similar case to this or a threading issue.
Personally I would love it if we only linked against gtk/gdk... once
instead of the 3-5 times we do it now.
Updated•26 years ago
Assignee: chofmann → mcafee
Target Milestone: M12
Comment 13•26 years ago
stealing from chofmann, m12.
Updated•26 years ago
Summary: Apprunner crash on startup → Solaris: Duplicate gtk libs -> crash @ startup
Comment 14•26 years ago
better summary
Updated•26 years ago
Target Milestone: M12 → M13
Comment 15•26 years ago
gtk/gdk should be dynamically linked, I only show an
undefined ref. to gdk_rgb_init in libwidget_gtk.so.
I'm guessing you've got some static versions of gdk/gtk
around, and/or are missing some dynamic versions?
This looks like gtk installation confusion to me.
pushing off m12.
Comment 16•26 years ago
I can't remember when this worked on Solaris - maybe M5?...
I just downloaded the latest build - ftp server said 12/02/99.
I haven't installed any custom GTK, GDK, etc.
I am viewing it through a PC X server if that helps anybody.
e5sey{gcfalck}85: ./mozilla
MOZILLA_FIVE_HOME=/home/iis/mozilla/package
LD_LIBRARY_PATH=/home/iis/mozilla/package:/usr/ucblib:/interleaf/rdm2.7/sun4os5/fulcrum/lib:/interleaf/rdm2.6/sun4os5/fulcrum/lib:/usr/openwin/lib
SHLIB_PATH=/home/iis/mozilla/package
LIBPATH=/home/iis/mozilla/package
MOZ_PROGRAM=./mozilla-bin
MOZ_TOOLKIT=
moz_debug=0
moz_debugger=
nNCL: registering deferred (0)
Segmentation Fault
e5sey{gcfalck}113: gdb mozilla-bin core
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.6"...
(no debugging symbols found)...
Core was generated by `./mozilla-bin'.
Program terminated with signal 9, Killed.
Reading symbols from /home/iis/mozilla/package/libraptorgfx.so...
(no debugging symbols found)...done.
Reading symbols from /home/iis/mozilla/package/libmozjs.so...
(no debugging symbols found)...done.
...[snip]...
(gdb) where
#0 0xeda74cc4 in gdk_rgb_set_min_colors ()
#1 0xeda74cc0 in gdk_rgb_set_min_colors ()
#2 0xeda74e90 in gdk_rgb_init ()
#3 0xeda77f78 in gdk_rgb_get_visual ()
#4 0xed9cc674 in nsDeviceContextGTK::Init ()
#5 0xee28eb58 in nsBaseWidget::BaseCreate ()
#6 0xee287a3c in nsWidget::CreateWidget ()
#7 0xee287d5c in nsWidget::Create ()
#8 0xee7bf290 in nsWebShellWindow::Initialize ()
#9 0xee7bd248 in nsAppShellService::JustCreateTopWindow ()
#10 0xee7bd0c0 in nsAppShellService::CreateTopLevelWindow ()
#11 0xedd9683c in nsProfile::LoadDefaultProfileDir ()
#12 0xedd96398 in nsProfile::StartupWithArgs ()
#13 0x1455c in NS_CanRun ()
#14 0x14aa0 in main ()
(gdb) q
e5sey{gcfalck}114: uname -a
SunOS e5sey 5.6 Generic_105181-09 sun4u sparc
Updated•26 years ago
Summary: Solaris: Duplicate gtk libs -> crash @ startup → [DOGFOOD] Solaris: Duplicate gtk libs -> crash @ startup
Target Milestone: M13 → M12
Comment 17•26 years ago
ok, back to m12. latest download crashes for me per
greg's last comment.
Updated•26 years ago
Whiteboard: PDT-
Comment 18•26 years ago
will we figure this out before Tuesday?
Comment 19•26 years ago
re: tuesday: I hope so.
Comment 20•26 years ago
Moving to M13 since this has been determined to be PDT-.
Comment 21•26 years ago
*** Bug 20748 has been marked as a duplicate of this bug. ***
Updated•26 years ago
Assignee: mcafee → briano
Comment 22•26 years ago
Brian, it looks like executor (2.5.1) is linking with
gtk libs improperly? We should try running the build in
that tree to debug this.
| Reporter | ||
Comment 23•26 years ago
*** Bug 22814 has been marked as a duplicate of this bug. ***
Updated•26 years ago
Whiteboard: [PDT-] → [PDT-] Build packaging problem
Comment 24•26 years ago
Brian is trying a 2.6 build, this might fix the problem.
Updated•26 years ago
Summary: [DOGFOOD] Solaris: Duplicate gtk libs -> crash @ startup → [DOGFOOD] Solaris: Build packaging problem, crash on startup
Whiteboard: [PDT-] Build packaging problem → Build packaging problem
Comment 25•26 years ago
Better summary, I think this is just a packaging mis-cue.
Putting back on PDT radar, we look stupid shipping
Solaris bits that DO NOT WORK ANYWHERE, NOT EVEN ON THE
BUILD MACHINE.
Updated•26 years ago
Whiteboard: Build packaging problem → PDT- Build packaging problem
Comment 26•26 years ago
marking PDT- per PDT, dogfood eaters can do manual install
Comment 27•26 years ago
*** Bug 20678 has been marked as a duplicate of this bug. ***
Whiteboard: PDT- Build packaging problem → [PDT-] Build packaging problem
Updated•26 years ago
Whiteboard: [PDT-] Build packaging problem → [PDT-] Build packaging problem. Need status from briano.
Comment 28•26 years ago
The problem as I see it (from a Release point of view) is that we are
incorrectly linking with the GTK libs multiple times when we should be
doing so once. Linking dynamically is _not_ an option, because we can't
release a product that requires the user to find, download, build, and
install _anything_ (especially something like GTK that is changing rapidly,
and which any random version can't be guaranteed to be 100% compatible
with our releases). The user must be able to simply download our product,
install it, and run. And that means static GTK. Period.
Let's fix the _real_ problem here, instead of coming up with hacks that
result in a product we ultimately can't ship.
Updated•26 years ago
Status: NEW → ASSIGNED
Comment 29•26 years ago
Brian - We are not going to ship a statically linked product on Linux. This
would simply suck. There are too many issues that can cause far more crashes
doing this.
Comment 30•26 years ago
What did we do with Motif in 4.x? Why is this solution not good enough here?
(As I understand it, we shipped both dynamic and static versions for Linux/*BSD,
and dynamic-Motif-only for platforms that came with Motif by default.)
Comment 31•26 years ago
Because statically linking GTK with our app won't work correctly. The whole
idea of components becomes broken at that point. We can't simply link GTK in to
the main app because that then breaks the idea of being able to change toolkits
on the fly. Also statically linking GTK with mozilla can cause theme version
conflicts which can kill the browser on startup.
Comment 32•26 years ago
We could statically link GTK into the GTK widget/gfx lib though, right?
That wouldn't solve the theme problem, though.
Comment 33•26 years ago
No, this won't work. Widget and gfx would then both have the same symbols
inside them, so on broken platforms that have to resolve all the symbols at
runtime you would get duplicated symbols, causing the app not to start up.
Comment 34•26 years ago
Which leaves us right back at the beginning.... Do we end up having to
ship the GTK shared libs (prebuilt for each platform) as part of our
product releases (except for Linux...)?
FYI: For 4.x, the only special case platform was Linux, where we provided
both a statically-linked Motif version and a dynamic Motif version. All
the other platforms were statically linked.
Comment 35•26 years ago
We could make gtk a component! :-)
We need to:
1) Get the bits we build at night working
2) Figure out a shipping strategy.
Switching back to dynamic linking fixes (1), let's
do that now and work on (2). I'd rather have something
working than nothing at all, which is currently the case.
Comment 36•26 years ago
And, in fact, though Mozilla can statically link with the LGPL'd GTK, I'm not
sure other commercial interests can. I think we should package the GTK libs
right alongside the xpcom and js ones, since we're going to need the
LD_LIBRARY_PATH stuff set up anyway.
Comment 37•26 years ago
*** Bug 13682 has been marked as a duplicate of this bug. ***
Updated•26 years ago
Target Milestone: M13 → M14
Comment 38•26 years ago
briano checked in last night. mcafee's step one above might be able to be
crossed off and a possible large set of folks might be able to run.
Let's get in a room and battle this out mano y mano.
Comment 39•26 years ago
bits still crash for me, Solaris 2.6
Comment 40•26 years ago
That's because no new builds have been delivered since 1/14 (due to build
errors of various different types). Theoretically, tomorrow's builds might
make it all the way.
Comment 41•26 years ago
The Solaris build made it to the FTP server. Right now, it seems that I
must have installed my own copy of gtk for mozilla to start up. I'm willing
to do this, although I don't know what the blessed version of gtk is. Mozilla
seems extremely volatile (crashes & freezes) when I point it at my copy of
gtk, however. I guess that could be today's nightly build. I dunno.
I've gotten "Virtual memory exceeded in `new'" errors several times, even
though my machine has plenty of memory (and plenty of swap).
Comment 42•26 years ago
briano is no longer at netscape.
over to granrose, cc-ing leaf.
Assignee: briano → granrose
Status: ASSIGNED → NEW
Comment 44•26 years ago
accepted. changed QA contact to mcafee since I can't verify my own bug. moved
to M16, since Solaris delivery is not a beta blocker.
looking at the mozilla ftp site we had Solaris 2.6 and 2.51 builds delivered
yesterday which is more than we've had most of January it seems. anyone have
any insights on how we're going to resolve this?
Status: NEW → ASSIGNED
QA Contact: granrose → mcafee
Whiteboard: [PDT-] Build packaging problem. Need status from briano. → [PDT-] Build packaging problem. Need plan.
Target Milestone: M14 → M16
Comment 46•26 years ago
The 1/27 and 2/3 builds don't crash like before (and use my gtk).
Now I get two messages "Gdk-WARNING **:shmat failed!" The errno set by the call
corresponds to "Too many open files". Adding --no-xshm to the arguments of
mozilla-bin prevents this.
In either case, it gets down to "WEBSHELL+ = 1" and nothing else happens - no
crash but no windows open.
(SunOS 5.5.1 Gtk+-1.2.6)
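For reference, "Too many open files" is the message text for EMFILE, which POSIX lists for shmat when the number of segments attached to the process would exceed the system limit. Below is a minimal sketch of capturing that errno so a caller could report it and fall back to a non-shared-memory path, much as --no-xshm does by hand. `attach_segment` and `describe_error` are hypothetical helpers, not part of gdk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Try to attach a System V shared memory segment. On failure return NULL
   and store the errno value in *err so the caller can report it and fall
   back to a path that does not use shared memory. */
void *attach_segment(int shmid, int *err)
{
    assert(err != NULL);
    void *addr = shmat(shmid, NULL, 0);
    if (addr == (void *) -1) {
        *err = errno;
        return NULL;
    }
    *err = 0;
    return addr;
}

/* Human-readable text for a saved errno value. */
const char *describe_error(int err)
{
    return strerror(err);
}
```

A caller would check for a NULL return, log `describe_error(err)`, and continue without the shared memory extension rather than aborting.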
Updated•26 years ago
Summary: [DOGFOOD] Solaris: Build packaging problem, crash on startup → Solaris: Build packaging problem, crash on startup
Comment 47•25 years ago
What machine are the ftp.mozilla.org bits built on? I currently have a theory
(which I am still testing) that there is an optimizer bug in gcc 2.7.2.3 that's
causing the hang-at-startup problem.
Comment 48•25 years ago
my comment in this bug log says executor, still valid?
Comment 49•25 years ago
Executor doesn't have a dynamic copy of libg++ and libstdc++; as mentioned
above, this is likely part of the problem. Even if we could link statically
with those libs (and it sounds like Pavlov thinks we can't), the current linking
setup actually attempts to pull non-PIC code out of those libraries into at
least one or two of the shared libs (libwidget_gtk, if I remember correctly).
Comment 50•25 years ago
not sure what to do with this one. punting to nobody until someone wants to
step forward and claim it.
Comment 51•25 years ago
*** Bug 16210 has been marked as a duplicate of this bug. ***
Comment 52•25 years ago
Adding Rich Burridge to the CC on this bug; perhaps he will want it. It's
possible that the fix he checked in to bug 15604 will have fixed this.
Additionally, if the builds in question were gcc builds, there is a problem with
non-PIC code being pulled into shared libraries (see bug 23759) which could
conceivably be causing problems here as well. I'm gonna try and look at 23759
again soon.
Updated•25 years ago
| Assignee | ||
Comment 53•25 years ago
[richb - 10th April 2000]
I'll take it. This is how we intended to "fix" it for our Solaris Netscape
PR1 version, (whose bits we are currently building in order to give it back
to the mozilla.org site; hi Leaf!):
We have simply built glib, gtk+ and libIDL dynamically, created a single
.tar.gz distribution which contains mozilla distribution + those three
libraries + a simple "netscape" script (that'll probably be a copy of the
"mozilla" script). The "netscape" script will be set up to just look for
its dependencies within the distribution directory.
Does this approach sound like the right one to you all?
PS: We are also using the Gnu compilers so expect a 7.8Mb binary distribution
and not a 16Mb one that you get with SW 5.0 compilers. I'll worry about
*that* problem somewhere else.
Comment 54•25 years ago
huh? PR1 on mozilla.org? you must have us confused for a portal company that's
released a mozilla-based browser and called it ``Netscape 6 PR1'' =)
Seriously, rich, if you are planning on putting up a pr1 build a la netscape,
send me private mail.
Comment 55•25 years ago
richb: that approach sounds reasonable to me. in some ideal world, there might
be some mechanism for the user to configure the browser to use an already
existing set of shared libraries if they happen to already be installed.
Conceivably split this out into two tarballs so that people who already have
this stuff don't need to download it again?
Assignee: nobody → rich.burridge
Comment 56•25 years ago
What about using Solaris packages?
Maybe you can ship packages like this:
- GTK
- GLIB
- libIDL
- Mozilla core
- Mozilla mail/news
- Mozilla editor
- Mozilla misc
- Mozilla add-ons
If a user already installed a library he/she may skip the matching package...
| Assignee | ||
Comment 57•25 years ago
[richb - 11th April 2000]
We're way ahead of you. For our equivalent of PR1, we are just going to do
a .tar.gz and make it available off somewhere like www.sunfreeware.com, but
for beta2 and beyond, we'll use the SVR4 pkgadd delivery style. The extra
libraries (glib, gtk+ ...) will be in a separate package.
To the bug submitter: are you happy with our proposed fix for this problem?
Can I close the bug, or do you want to wait until Netscape 6 beta2? Thanks.
| Assignee | ||
Updated•25 years ago
Status: NEW → ASSIGNED
Comment 58•25 years ago
I'm still against hosting gpl binaries that we need to host the source for as
well. dmose, you think we should start hosting gpl licensed binaries, and comply
with the publishing of sources as well? Do you know how long we have to keep
hosting the sources?
Comment 59•25 years ago
Section 6 of the LGPL is what you want to look at:
http://www.gnu.org/copyleft/lgpl.html
The shortened answer is that you must make the source available upon request for
at least 3 yrs.
Comment 60•25 years ago
If we get the binary builds from somewhere else, though, then we can use 3c in
the GPL to just ``pass along'' the source distribution requirements. That would
save us some work.
Comment 61•25 years ago
*** Bug 35888 has been marked as a duplicate of this bug. ***
Comment 62•25 years ago
*** Bug 27995 has been marked as a duplicate of this bug. ***
Comment 63•25 years ago
Putting on [dogfood-] radar.
Whiteboard: [PDT-] Build packaging problem. Need plan. → [dogfood-] Build packaging problem. Need plan.
Comment 64•25 years ago
*** Bug 27112 has been marked as a duplicate of this bug. ***
Comment 65•25 years ago
Nightly builds are back and work fine with gtk 1.2.3 fetched from
www.sunfreeware.com as well as gtk 1.2.7 built with WS5.0. RichB has a
packaging plan. Marking fixed.
Comment 66•25 years ago
Actually marking fixed this time.
Status: ASSIGNED → RESOLVED
Closed: 26 years ago → 25 years ago
Resolution: --- → FIXED
Updated•21 years ago
Product: Browser → Seamonkey