Closed
Bug 13202
Opened 25 years ago
Closed 24 years ago
Solaris: Build packaging problem, crash on startup
Categories
(SeaMonkey :: Build Config, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: MatsPalmgren_bugz, Assigned: rich.burridge)
References
Details
(Keywords: crash, helpwanted, Whiteboard: [dogfood-] Build packaging problem. Need plan.)
Attachments
(2 files)
Apprunner nightly builds for Solaris have been crashing on startup for about 2 weeks now. Viewer works fine though. I will attach a stack trace. uname -a: SunOS claudia 5.6 Generic_105181-11 sun4u sparc SUNW,Ultra-2
Reporter | ||
Comment 1•25 years ago
|
||
Reporter | ||
Comment 3•25 years ago
|
||
Just as a status update: it's still crashing. I have checked now and then for the past month (latest build: 1999-10-07-16-M11) and it always crashes with the same stack trace as the one I have attached.
Updated•25 years ago
|
Priority: P3 → P1
Comment 4•25 years ago
|
||
Should a P1 critical have no milestone?
Here is the analysis I sent several of the engineers on 8 September, plus some more recent discoveries. I'm a little sad that it hasn't resulted in any action, so I'll post it here:

> 1) The problem appears to be that gdk_rgb_init calls gdk_rgb_choose_visual,
> which does:
>
>   visuals = gdk_list_visuals ();
>
> When this returns, visuals is NULL; subsequently, the code does:
>
>   tmp_list = visuals;
>   best_visual = tmp_list->data;
>
> Which is clearly the cause of the crash.
>
> 2) So why does gdk_list_visuals() return NULL? It's pretty simple:
>
> gdk_list_visuals (void)
> {
>   GList *list;
>   guint i;
>
>   list = NULL;
>   for (i = 0; i < nvisuals; ++i)
>     list = g_list_append (list, (gpointer) &visuals[i]);
>
>   return list;
> }
>
> nvisuals is 0, so the list returned is NULL.
>
> Sadly, what I can't answer is why nvisuals is empty. I'll keep poking
> to see if I can figure out why.
>
> This leads me to...
>
> 3) Why is it that there are two complete copies of gdk in different
> shared libraries? Here's what I mean:
>
> $ dis -F gdk_rgb_init libgfx_gtk.so | head
> disassembly for libgfx_gtk.so
>
> section .text
> gdk_rgb_init()
> 165798: 9d e3 bf 90 save %sp, -112, %sp
> 16579c: 11 00 00 00 sethi %hi(XSynchronize), %o0
> 1657a0: d0 4a 20 00 ldsb [%o0 + XSynchronize], %o0
>
> $ dis -F gdk_rgb_init libwidget_gtk.so | head
> disassembly for libwidget_gtk.so
>
> section .text
> gdk_rgb_init()
> 1a5a0c: 9d e3 bf 90 save %sp, -112, %sp
> 1a5a10: 11 00 00 00 sethi %hi(XSynchronize), %o0
> 1a5a14: d0 4a 20 00 ldsb [%o0 + XSynchronize], %o0
>

That was my original mail-- I got some vague response that this "looked like an X server bug" which I rejected. Since then I thought some more about this problem: I think the issue is that one library is getting loaded after the other, and effectively interposing on the other's functions. This is a standard linker trick, but I think here it is happening by mistake.
Recall that this would also explain why nvisuals is 0-- it is state private to the gdk library. In this case, when the second copy of that library is loaded, it has its own, *uninitialized* version of the nvisuals variable.

To check this, I performed the following experiment using LD_PRELOAD, which should keep a particular library at the head of the link chain. First, I installed libgtk, libgdk, etc. in a well-known place (in this case /usr/local/lib):

  $ /bin/ksh
  $ export LD_PRELOAD=libgtk.so:libgdk.so
  $ export LD_LIBRARY_PATH=/usr/local/lib

Then I ran mozilla, and everything came up. In this case, I forced the application to *always* use my copy of gtk/gdk, instead of getting confused between two other copies. So: I think removing one of the two statically linked copies of libgtk/gdk should solve this problem.
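A minimal sketch of the failure mode described in this comment, assuming nothing about GDK beyond the snippet quoted above. LibCopy, copy_init, copy_list_visuals and choose_visual are hypothetical names invented for illustration; each LibCopy value stands in for one statically linked copy of gdk with its own private nvisuals.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration; these are not the real GDK types. */
typedef struct Node {
    const int *data;
    struct Node *next;
} Node;

/* Each statically linked copy of a library carries its own private state.
 * This struct models one such copy. */
typedef struct {
    int visuals[4];
    size_t nvisuals;   /* stays 0 until this copy's init routine runs */
} LibCopy;

/* Models the init routine that fills in the private state of one copy. */
void copy_init(LibCopy *lib)
{
    lib->visuals[0] = 8;
    lib->visuals[1] = 24;
    lib->nvisuals = 2;
}

/* Same shape as the gdk_list_visuals() loop quoted above: with nvisuals == 0
 * the loop body never runs and the function returns NULL. */
Node *copy_list_visuals(const LibCopy *lib)
{
    Node *head = NULL, **tail = &head;
    for (size_t i = 0; i < lib->nvisuals; ++i) {
        Node *n = malloc(sizeof *n);
        if (n == NULL)
            break;
        n->data = &lib->visuals[i];
        n->next = NULL;
        *tail = n;
        tail = &n->next;
    }
    return head;
}

/* Guarded version of "best_visual = tmp_list->data": returns NULL instead of
 * dereferencing an empty list. */
const int *choose_visual(const Node *list)
{
    return list != NULL ? list->data : NULL;
}

/* Releases a list built by copy_list_visuals(). */
void free_list(Node *list)
{
    while (list != NULL) {
        Node *next = list->next;
        free(list);
        list = next;
    }
}
```

If copy_init() runs on one LibCopy but the list is then requested from a second, untouched LibCopy, copy_list_visuals() returns NULL, and an unguarded tmp_list->data on that result would dereference a null pointer. The guard in choose_visual() only avoids the crash; it does not address the underlying problem of two copies being linked in.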
Comment 6•25 years ago
|
||
Updating QA contact to a release person. Internal Core QA does not test Solaris.
Reporter | ||
Updated•25 years ago
|
Status: VERIFIED → REOPENED
Reporter | ||
Comment 9•25 years ago
|
||
Reopen because bug 13160 was fixed but this bug still occurs in build 1999111909
Comment 10•25 years ago
|
||
Clearing DUPLICATE resolution due to reopen.
Updated•25 years ago
|
Assignee: cyeh → chofmann
Status: REOPENED → NEW
Comment 11•25 years ago
|
||
did mcafee's recent award winning fixes help this? ;-) jdunn, do you see this on other unix ports?
Comment 12•25 years ago
|
||
On AIX, we ran into this problem EONS ago, and the 'fix' is to create a shared gtk library and link against that instead of the static libs. On HPUX we are still chasing the problem that was introduced with superwin, but it is either a similar case to this OR a threading issue. Personally I would love it if we only linked against gtk/gdk once instead of the 3-5 times we do it now.
Updated•25 years ago
|
Assignee: chofmann → mcafee
Target Milestone: M12
Comment 13•25 years ago
|
||
stealing from chofmann, m12.
Updated•25 years ago
|
Summary: Apprunner crash on startup → Solaris: Duplicate gtk libs -> crash @ startup
Comment 14•25 years ago
|
||
better summary
Updated•25 years ago
|
Target Milestone: M12 → M13
Comment 15•25 years ago
|
||
gtk/gdk should be dynamically linked; I only show an undefined ref. to gdk_rgb_init in libwidget_gtk.so. I'm guessing you've got some static versions of gdk/gtk around, and/or are missing some dynamic versions? This looks like gtk installation confusion to me. pushing off m12.
Comment 16•25 years ago
|
||
I can't remember when this worked on Solaris - maybe M5?... I just downloaded the latest build - ftp server said 12/02/99. I haven't installed any custom GTK, GDK, etc. I am viewing it through a PC X server, if that helps anybody.

e5sey{gcfalck}85: ./mozilla
MOZILLA_FIVE_HOME=/home/iis/mozilla/package
LD_LIBRARY_PATH=/home/iis/mozilla/package:/usr/ucblib:/interleaf/rdm2.7/sun4os5/fulcrum/lib:/interleaf/rdm2.6/sun4os5/fulcrum/lib:/usr/openwin/lib
SHLIB_PATH=/home/iis/mozilla/package
LIBPATH=/home/iis/mozilla/package
MOZ_PROGRAM=./mozilla-bin
MOZ_TOOLKIT=
moz_debug=0
moz_debugger=
nNCL: registering deferred (0)
Segmentation Fault
e5sey{gcfalck}113: gdb mozilla-bin core
GNU gdb 4.17
Copyright 1998 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "sparc-sun-solaris2.6"...
(no debugging symbols found)...
Core was generated by `./mozilla-bin'.
Program terminated with signal 9, Killed.
Reading symbols from /home/iis/mozilla/package/libraptorgfx.so...
(no debugging symbols found)...done.
Reading symbols from /home/iis/mozilla/package/libmozjs.so...
(no debugging symbols found)...done.
...[snip]...
(gdb) where
#0 0xeda74cc4 in gdk_rgb_set_min_colors ()
#1 0xeda74cc0 in gdk_rgb_set_min_colors ()
#2 0xeda74e90 in gdk_rgb_init ()
#3 0xeda77f78 in gdk_rgb_get_visual ()
#4 0xed9cc674 in nsDeviceContextGTK::Init ()
#5 0xee28eb58 in nsBaseWidget::BaseCreate ()
#6 0xee287a3c in nsWidget::CreateWidget ()
#7 0xee287d5c in nsWidget::Create ()
#8 0xee7bf290 in nsWebShellWindow::Initialize ()
#9 0xee7bd248 in nsAppShellService::JustCreateTopWindow ()
#10 0xee7bd0c0 in nsAppShellService::CreateTopLevelWindow ()
#11 0xedd9683c in nsProfile::LoadDefaultProfileDir ()
#12 0xedd96398 in nsProfile::StartupWithArgs ()
#13 0x1455c in NS_CanRun ()
#14 0x14aa0 in main ()
(gdb) q
e5sey{gcfalck}114: uname -a
SunOS e5sey 5.6 Generic_105181-09 sun4u sparc
Updated•25 years ago
|
Summary: Solaris: Duplicate gtk libs -> crash @ startup → [DOGFOOD] Solaris: Duplicate gtk libs -> crash @ startup
Target Milestone: M13 → M12
Comment 17•25 years ago
|
||
ok, back to m12. latest download crashes for me per greg's last comment.
Updated•25 years ago
|
Whiteboard: PDT-
Comment 18•25 years ago
|
||
will we figure this out before Tuesday?
Comment 19•25 years ago
|
||
re: tuesday: I hope so.
Comment 20•25 years ago
|
||
Moving to M13 since this has been determined to be PDT-.
Comment 21•25 years ago
|
||
*** Bug 20748 has been marked as a duplicate of this bug. ***
Updated•25 years ago
|
Assignee: mcafee → briano
Comment 22•25 years ago
|
||
Brian, it looks like executor (2.5.1) is linking with gtk libs improperly? We should try running the build in that tree to debug this.
Reporter | ||
Comment 23•25 years ago
|
||
*** Bug 22814 has been marked as a duplicate of this bug. ***
Updated•25 years ago
|
Whiteboard: [PDT-] → [PDT-] Build packaging problem
Comment 24•25 years ago
|
||
Brian is trying a 2.6 build, this might fix the problem.
Updated•25 years ago
|
Summary: [DOGFOOD] Solaris: Duplicate gtk libs -> crash @ startup → [DOGFOOD] Solaris: Build packaging problem, crash on startup
Whiteboard: [PDT-] Build packaging problem → Build packaging problem
Comment 25•25 years ago
|
||
Better summary, I think this is just a packaging mis-cue. Putting back on PDT radar, we look stupid shipping Solaris bits that DO NOT WORK ANYWHERE, NOT EVEN ON THE BUILD MACHINE.
Updated•25 years ago
|
Whiteboard: Build packaging problem → PDT- Build packaging problem
Comment 26•25 years ago
|
||
marking PDT- per PDT, dogfood eaters can do manual install
Comment 27•25 years ago
|
||
*** Bug 20678 has been marked as a duplicate of this bug. ***
Whiteboard: PDT- Build packaging problem → [PDT-] Build packaging problem
Updated•25 years ago
|
Whiteboard: [PDT-] Build packaging problem → [PDT-] Build packaging problem. Need status from briano.
Comment 28•25 years ago
|
||
The problem as I see it (from a Release point of view) is that we are incorrectly linking with the GTK libs multiple times when we should be doing so once. Linking dynamically is _not_ an option, because we can't release a product that requires the user to find, download, build, and install _anything_ (especially something like GTK that is changing rapidly, and which any random version can't be guaranteed to be 100% compatible with our releases). The user must be able to simply download our product, install it, and run. And that means static GTK. Period. Let's fix the _real_ problem here, instead of coming up with hacks that result in a product we ultimately can't ship.
Updated•25 years ago
|
Status: NEW → ASSIGNED
Comment 29•25 years ago
|
||
Brian - We are not going to ship a statically linked product on Linux. This would simply suck. There are too many issues that can cause far more crashes doing this.
What did we do with Motif in 4.x? Why is this solution not good enough here? (As I understand it, we shipped both dynamic and static versions for Linux/*BSD, and dynamic-Motif-only for platforms that came with Motif by default.)
Comment 31•25 years ago
|
||
Because statically linking GTK with our app won't work correctly. The whole idea of components becomes broken at that point. We can't simply link GTK in to the main app because that then breaks the idea of being able to change toolkits on the fly. Also statically linking GTK with mozilla can cause theme version conflicts which can kill the browser on startup.
We could statically link GTK into the GTK widget/gfx lib though, right? That wouldn't solve the theme problem, though.
Comment 33•25 years ago
|
||
No, this won't work. Widget and gfx would then both have the same symbols inside them, so on broken platforms that have to resolve all the symbols at runtime you would get duplicated symbols, causing the app not to start up.
Comment 34•25 years ago
|
||
Which leaves us right back at the beginning.... Do we end up having to ship the GTK shared libs (prebuilt for each platform) as part of our product releases (except for Linux...)? FYI: For 4.x, the only special case platform was Linux, where we provided both a statically-linked Motif version and a dynamic Motif version. All the other platforms were statically linked.
Comment 35•25 years ago
|
||
We could make gtk a component! :-) We need to:
1) Get the bits we build at night working
2) Figure out a shipping strategy.
Switching back to dynamic linking fixes (1), let's do that now and work on (2). I'd rather have something working than nothing at all, which is currently the case.
And, in fact, though Mozilla can statically link with the LGPL'd GTK, I'm not sure other commercial interests can. I think we should package the GTK libs right alongside the xpcom and js ones, since we're going to need the LD_LIBRARY_PATH stuff set up anyway.
Comment 37•25 years ago
|
||
*** Bug 13682 has been marked as a duplicate of this bug. ***
Updated•25 years ago
|
Target Milestone: M13 → M14
Comment 38•25 years ago
|
||
briano checked in last night. mcafee's step one above might be able to be crossed off, and a possibly large set of folks might be able to run. let's get in a room and battle this out mano a mano.
Comment 39•25 years ago
|
||
bits still crash for me, Solaris 2.6
Comment 40•25 years ago
|
||
That's because no new builds have been delivered since 1/14 (due to build errors of various different types). Theoretically, tomorrow's builds might make it all the way.
Comment 41•25 years ago
|
||
The Solaris build made it to the FTP server. Right now, it seems that I must have installed my own copy of gtk for mozilla to start up. I'm willing to do this, although I don't know what the blessed version of gtk is. Mozilla seems extremely volatile (crashes & freezes) when I point it at my copy of gtk, however. I guess that could be today's nightly build. I dunno. I've gotten "Virtual memory exceeded in `new'" errors several times, even though my machine has plenty of memory (and plenty of swap).
Comment 42•25 years ago
|
||
briano is no longer at netscape. over to granrose, cc-ing leaf.
Assignee: briano → granrose
Status: ASSIGNED → NEW
Comment 44•25 years ago
|
||
accepted. changed QA contact to mcafee since I can't verify my own bug. moved to M16, since Solaris delivery is not a beta blocker. looking at the mozilla ftp site we had Solaris 2.6 and 2.51 builds deliver yesterday which is more than we've had most of January it seems. anyone have any insights on how we're going to resolve this?
Status: NEW → ASSIGNED
QA Contact: granrose → mcafee
Whiteboard: [PDT-] Build packaging problem. Need status from briano. → [PDT-] Build packaging problem. Need plan.
Target Milestone: M14 → M16
Comment 46•25 years ago
|
||
The 1/27 and 2/3 builds don't crash like before (and use my gtk). Now I get two messages "Gdk-WARNING **:shmat failed!" The errno set by the call corresponds to "Too many open files". Adding --no-xshm to the arguments of mozilla-bin prevents this. In either case, it gets down to "WEBSHELL+ = 1" and nothing else happens: no crash, but no windows open. (SunOS 5.5.1, Gtk+-1.2.6)
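For context, the "open files" text is the standard message for EMFILE, and POSIX documents EMFILE from shmat() as meaning the process would exceed the system-imposed limit on attached shared memory segments, not on file descriptors. Below is a minimal sketch of reporting that errno and letting the caller fall back to a non-shared-memory path, which is roughly what passing --no-xshm does by hand. attach_segment_or_null is a hypothetical helper written for this note, not a GDK function.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Hypothetical helper, not part of GDK: attach a System V shared memory
 * segment, or report why the attach failed and return NULL so the caller
 * can fall back to a path that does not use shared memory. */
void *attach_segment_or_null(int shmid)
{
    void *addr = shmat(shmid, NULL, 0);
    if (addr == (void *) -1) {
        /* shmat signals failure with (void *) -1 and sets errno. */
        fprintf(stderr, "shmat failed: %s (errno %d)\n",
                strerror(errno), errno);
        return NULL;
    }
    return addr;
}
```

A caller would test the result for NULL and, on failure, use ordinary non-shared image transfer instead of aborting.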
Updated•25 years ago
|
Summary: [DOGFOOD] Solaris: Build packaging problem, crash on startup → Solaris: Build packaging problem, crash on startup
Comment 47•25 years ago
|
||
What machine are the ftp.mozilla.org bits built on? I currently have a theory (which I am still testing) that there is an optimizer bug in gcc 2.7.2.3 that's causing the hang-at-startup problem.
Comment 48•25 years ago
|
||
my comment in this bug log says executor, still valid?
Comment 49•25 years ago
|
||
Executor doesn't have a dynamic copy of libg++ and libstdc++; as mentioned above, this is likely part of the problem. Even if we could link statically with those libs (and it sounds like Pavlov thinks we can't), the current linking setup actually attempts to pull non-PIC code out of those libraries into at least one or two of the shared libs (libwidget_gtk, if I remember correctly).
Comment 50•24 years ago
|
||
not sure what to do with this one. punting to nobody until someone wants to step forward and claim it.
Comment 51•24 years ago
|
||
*** Bug 16210 has been marked as a duplicate of this bug. ***
Comment 52•24 years ago
|
||
Adding Rich Burridge to the CC on this bug; perhaps he will want it. It's possible that the fix he checked in to bug 15604 will have fixed this. Additionally, if the builds in question were gcc builds, there is a problem with non-PIC code being pulled into shared libraries (see bug 23759) which could conceivably be causing problems here as well. I'm gonna try and look at 23759 again soon.
Updated•24 years ago
|
Assignee | ||
Comment 53•24 years ago
|
||
[richb - 10th April 2000] I'll take it. This is how we intended to "fix" it for our Solaris Netscape PR1 version (whose bits we are currently building in order to give it back to the mozilla.org site; hi Leaf!): We have simply built glib, gtk+ and libIDL dynamically, and created a single .tar.gz distribution which contains the mozilla distribution + those three libraries + a simple "netscape" script (that'll probably be a copy of the "mozilla" script). The "netscape" script will be set up to just look for its dependencies within the distribution directory. Does this approach sound like the right one to you all? PS: We are also using the GNU compilers, so expect a 7.8Mb binary distribution and not the 16Mb one that you get with the SW 5.0 compilers. I'll worry about *that* problem somewhere else.
Comment 54•24 years ago
|
||
huh? PR1 on mozilla.org? you must have us confused for a portal company that's released a mozilla-based browser and called it ``Netscape 6 PR1'' =) Seriously, rich, if you are planning on putting up a pr1 build a la netscape, send me private mail.
Comment 55•24 years ago
|
||
richb: that approach sounds reasonable to me. in some ideal world, there might be some mechanism for the user to configure the browser to use an already existing set of shared libraries if they happen to already be installed. Conceivably split this out into two tarballs so that people who already have this stuff don't need to download it again?
Assignee: nobody → rich.burridge
Comment 56•24 years ago
|
||
What about using Solaris packages? Maybe you can ship packages like this:
- GTK
- GLIB
- libIDL
- Mozilla core
- Mozilla mail/news
- Mozilla editor
- Mozilla misc
- Mozilla add-ons
If a user has already installed a library, he/she may skip the matching package...
Assignee | ||
Comment 57•24 years ago
|
||
[richb - 11th April 2000] We're way ahead of you. For our equivalent of PR1, we are just going to do a .tar.gz and make it available off somewhere like www.sunfreeware.com, but for beta2 and beyond, we'll use the SVR4 pkgadd delivery style. The extra libraries (glib, gtk+ ...) will be in a separate package. To the bug submitter: are you happy with our proposed fix for this problem? Can I close the bug, or do you want to wait until Netscape 6 beta2? Thanks.
Assignee | ||
Updated•24 years ago
|
Status: NEW → ASSIGNED
Comment 58•24 years ago
|
||
I'm still against hosting gpl binaries that we need to host the source for as well. dmose, you think we should start hosting gpl licensed binaries, and comply with the publishing of sources as well? Do you know how long we have to keep hosting the sources?
Comment 59•24 years ago
|
||
Section 6 of the LGPL is what you want to look at: http://www.gnu.org/copyleft/lgpl.html The shortened answer is that you must make the source available upon request for at least 3 yrs.
If we get the binary builds from somewhere else, though, then we can use 3c in the GPL to just ``pass along'' the source distribution requirements. That would save us some work.
Comment 61•24 years ago
|
||
*** Bug 35888 has been marked as a duplicate of this bug. ***
Comment 62•24 years ago
|
||
*** Bug 27995 has been marked as a duplicate of this bug. ***
Comment 63•24 years ago
|
||
Putting on [dogfood-] radar.
Whiteboard: [PDT-] Build packaging problem. Need plan. → [dogfood-] Build packaging problem. Need plan.
Comment 64•24 years ago
|
||
*** Bug 27112 has been marked as a duplicate of this bug. ***
Comment 65•24 years ago
|
||
Nightly builds are back and work fine with gtk 1.2.3 fetched from www.sunfreeware.com as well as gtk 1.2.7 built with WS5.0. RichB has a packaging plan. Marking fixed.
Comment 66•24 years ago
|
||
Actually marking fixed this time.
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 24 years ago
Resolution: --- → FIXED
Updated•20 years ago
|
Product: Browser → Seamonkey