Closed
Bug 411171
Opened 17 years ago
Closed 16 years ago
Thunderbird Mac tinderbox crashing in dump_syms
Categories
(Toolkit :: Crash Reporting, defect)
Tracking
()
RESOLVED
INCOMPLETE
People
(Reporter: philor, Assigned: gozer)
References
()
Details
(Whiteboard: [needs info re 3.0a build process])
Apparently for quite a while now, the Thunderbird Mac tinderbox has been crashing while trying to dump symbols - Ted says "2007100103 is the last one to have symbols for thunderbird-bin" and logs back into December show the same

dump_syms(25544) malloc: *** vm_allocate(size=1406185472) failed (error code=3)
dump_syms(25544) malloc: *** error: can't allocate region
dump_syms(25544) malloc: *** set a breakpoint in szone_error to debug
2008-01-06 03:50:04.817 dump_syms[25544] *** Uncaught exception: <NSInvalidArgumentException> *** NSCopyMemoryPages(0x2008000, 0x0, 1124945920) failed
dump_syms(25546) malloc: *** vm_allocate(size=1406185472) failed (error code=3)
dump_syms(25546) malloc: *** error: can't allocate region
dump_syms(25546) malloc: *** set a breakpoint in szone_error to debug
2008-01-06 03:50:05.336 dump_syms[25546] *** Uncaught exception: <NSInvalidArgumentException> *** NSCopyMemoryPages(0x2008000, 0x0, 1124945920) failed
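For scale, the sizes in the log above are byte counts. The following is a minimal Python sketch, not part of the build tooling, that pulls the failed `vm_allocate` sizes out of such a log and converts them to GiB. The helper name `allocation_sizes` and the regular expression are assumptions made for illustration.

```python
import re

# Matches malloc failure lines of the form shown in the log above, e.g.
# "dump_syms(25544) malloc: *** vm_allocate(size=1406185472) failed (error code=3)"
VM_ALLOCATE_RE = re.compile(
    r"([\w.-]+)\((\d+)\) malloc: \*\*\* vm_allocate\(size=(\d+)\) failed")


def allocation_sizes(log_text):
    """Return (program, pid, size_in_bytes) for each failed vm_allocate line."""
    return [(m.group(1), int(m.group(2)), int(m.group(3)))
            for m in VM_ALLOCATE_RE.finditer(log_text)]


if __name__ == "__main__":
    sample = (
        "dump_syms(25544) malloc: *** vm_allocate(size=1406185472) failed (error code=3)\n"
        "dump_syms(25544) malloc: *** error: can't allocate region\n"
    )
    for program, pid, size in allocation_sizes(sample):
        # 2**30 bytes per GiB
        print(f"{program} pid {pid}: {size} bytes = {size / 2**30:.2f} GiB")
```

For the size in this log, the request works out to roughly 1.3 GiB in a single allocation.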
Comment 1•17 years ago
cf is going to grab thunderbird-bin from this box for examination.
Reporter
Comment 3•16 years ago
Oops, I might have accidentally fixed this. I was looking at the log for the (clobbered) first build after I checked in bug 414515, http://tinderbox.mozilla.org/showlog.cgi?log=Thunderbird/1202072340.1202075003.20596.gz&fulltext=1 and noticed it survived dump_syms, while the previous nightly didn't.
Reporter
Comment 4•16 years ago
And the next real nightly worked, too, so apparently whatever broke it (probably me, 20071001 being when I turned on SVG in Thunderbird), switching from --enable-optimize="-O2 -g" to export C(XX)FLAGS="-g -gfull" did fix it.
Comment 5•16 years ago
Going to resolve this for now, if it pops back up we'll reopen. Without being able to reproduce it locally it's hard to fix.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WORKSFORME
Comment 6•16 years ago
This seems to have returned:

buildid: 2008043003
Uploading nightly release build
make -C /builds/tinderbox/Tb-Trunk/Darwin_8.8.4_Depend/mozilla/../build/universal/ppc buildsymbols
echo building symbol store
building symbol store
mkdir -p ./dist/crashreporter-symbols/2008043003
/usr/bin/python /builds/tinderbox/Tb-Trunk/Darwin_8.8.4_Depend/mozilla/toolkit/crashreporter/tools/symbolstore.py \
  -a "ppc i386" --vcs-info -s /builds/tinderbox/Tb-Trunk/Darwin_8.8.4_Depend/mozilla ./dist/host/bin/dump_syms \
  ./dist/crashreporter-symbols/2008043003 \
  ./dist/universal > \
  ./dist/crashreporter-symbols/2008043003/thunderbird-3.0a1pre-Darwin-2008043003-symbols.txt
dump_syms(15531) malloc: *** vm_allocate(size=1331736576) failed (error code=3)
dump_syms(15531) malloc: *** error: can't allocate region
dump_syms(15531) malloc: *** set a breakpoint in szone_error to debug
2008-04-30 03:48:36.662 dump_syms[15531] *** Uncaught exception: <NSInvalidArgumentException> *** NSCopyMemoryPages(0x2008000, 0x0, 1065385984) failed
dump_syms(15533) malloc: *** vm_allocate(size=1331736576) failed (error code=3)
dump_syms(15533) malloc: *** error: can't allocate region
dump_syms(15533) malloc: *** set a breakpoint in szone_error to debug
2008-04-30 03:48:37.245 dump_syms[15533] *** Uncaught exception: <NSInvalidArgumentException> *** NSCopyMemoryPages(0x2008000, 0x0, 1065385984) failed

Marking as blocking-3.0a1 because we want to be able to get useful crash-data out of 3.0a1.
Flags: blocking-thunderbird3.0a1+
Updated•16 years ago
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Comment 7•16 years ago
If someone can reproduce this locally, we can probably get a fix (or at least a workaround).
Comment 8•16 years ago
nthomas says that the symbols for thunderbird-bin disappeared between 2008041303 and 2008041403.

A snippet from IRC about reproducing locally:

[09:53am] dmose: ted: on that dump_syms crashiness; is reproducing locally like just a matter of building with an identical mozconfig as the tbox? or is there more to it?
[09:53am] ted: i only have leopard here, so you probably need to be on tiger, maybe with the same version of xcode
[09:53am] ted: i couldn't repro on leopard
[09:54am] dmose: ted: interesting. but just building with the same mozconfig should be enough? or need to invoke via tinderbox clients scripts?
[09:54am] ted: same mozconfig should be enough, hopefully
[09:55am] dmose: i guess we'll find out!
[09:55am] ted: yeah
[09:55am] ted: tinderbox doesn't call it in any particularly fancy way
[09:55am] ted: it just builds, then runs |make buildsymbols| in the objdir
[09:56am] dmose: ok, good to know
[09:56am] dmose: the tinderbox is PPC or intel?
[09:56am] ted: intel
[09:56am] ted: (pretty sure)
Comment 9•16 years ago
<http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2008-04-13+01%3A00%3A00&maxdate=2008-04-14+04%3A00%3A00&cvsroot=%2Fcvsroot> shows the checkins around that time (with a little slop on the edges for good measure). At first glance, nothing jumps out at me as an obvious candidate for having caused this... I wonder if there is any chance that something in that machine's configuration changed during that time frame.
Comment 10•16 years ago
Given the fact that it showed up earlier and disappeared, it's probably a bug in dump_syms that's triggered by a particular set of compiler output. While there may be some patches that we could back out to get it to go away, there's no guarantee it wouldn't resurface.
Comment 11•16 years ago
We keep having a similar problem with the SeaMonkey Mac tinderbox; see bug 395664.
Comment 12•16 years ago
Can't see anything wrong with this machine - it has at least half of its 4GB of RAM free as I look at it now - and I can't find any record of any config changes. If dump_syms is asking for more than a GB of memory then that's always going to be a big ask. :-)
Comment 13•16 years ago
smichaud was gracious enough to try this on his Tiger machine, and the dump_syms worked fine there, both in the i386 and ppc directories. So we still need to figure out how to reproduce...
Comment 14•16 years ago
Re-assigning to rick in the hopes that he has or can get access to the actual machine where this is happening and can reproduce it / catch it in the debugger there...
Assignee: nobody → rick.tessner
Status: REOPENED → NEW
Comment 15•16 years ago
Did a build on bm-xserve07 in ~cltbld/rick-411171 and could not reproduce. I then realized that I'd only built i386 and the dump_syms works fine for that. However, the nightly builds are a universal build. I am currently retrying the build as a universal (i386 and ppc) and we'll see whether I can reproduce the problem.
Comment 16•16 years ago
I would have thought that using the mozconfig from that Tinderbox would have forced a universal build (it did for me on my local Leopard machine), where I also couldn't reproduce.
Comment 17•16 years ago
It does if you have the mozilla/build/macosx/universal/mozconfig checked out as well. That's dotted by the mozconfig. Since I started with a clean check-out, I did not have the universal/mozconfig checked out. Once I had that checked out as well, the universal build proceeded.

That still did not reproduce the problem, though; i.e. the build and buildsymbols steps were completely successful.

Below are the steps I did to do the build directly on bm-xserve07 (as the user cltbld):

mkdir ~/rick-411171
cp /builds/tinderbox/Tb-Trunk/mozconfig ./
export MOZCONFIG=$PWD/mozconfig
cvs -d :ext:tbirdbld@cvs.mozilla.org:/cvsroot co mozilla/client.mk mozilla/build/macosx/universal/mozconfig
cd mozilla
/usr/bin/make -f client.mk MOZ_OBJDIR=../build/universal checkout
mkdir -p -m 0777 ../build/universal
/usr/bin/make -f client.mk MOZ_OBJDIR=../build/universal CONFIGURE_ENV_ARGS='CC=cc CXX=c++' build_all_depend

And once that completed, I ran the buildsymbols step with:

make -C /Users/cltbld/rick-411171/build/universal/ppc buildsymbols

And that did not come up with the dump_syms error at all.

The log of this can be seen on bm-xserve07:~cltbld/rick-411171/screenlog.0
Comment 18•16 years ago
If we can't get this crash to repro (and hence fix) soon, I'm tempted to not consider it a blocker for 3.0a1. Not having crash data for mac is problematic, but not as problematic as not having any feedback.
Comment 19•16 years ago
The only other thing I could suggest would be to stop the nightly tinderbox after it's built a nightly, and run dump_syms manually to reproduce the problem.
Reporter
Comment 20•16 years ago
One other random thing to try: after your build finishes, but before you buildsymbols, run and kill dist/Thunderbird.app, to imitate having run MozillaAliveTest. (Okay, more than one, since you could also run the make package equivalent, and the regxpcom test, and codesighs, but I'm more suspicious of having run the build.)
Comment 21•16 years ago
(In reply to comment #0)
> Apparently for quite a while now, the Thunderbird Mac tinderbox has been
> crashing while trying to dump symbols - Ted says "2007100103 is the last one to
> have symbols for thunderbird-bin" and logs back into December show the same
>
> dump_syms(25544) malloc: *** vm_allocate(size=1406185472) failed (error code=3)
> dump_syms(25544) malloc: *** error: can't allocate region
> dump_syms(25544) malloc: *** set a breakpoint in szone_error to debug
> 2008-01-06 03:50:04.817 dump_syms[25544] *** Uncaught exception:

Is any change needed in monitoring symbol uploads?

ref: bug 401808 nagios monitoring for breakpad symbol upload
Comment 22•16 years ago
Wayne: it's sort of tricky, we could probably monitor specific important files like thunderbird-bin/libxul etc, but that doesn't really guarantee that everything was good. I guess ideally we should make the buildsymbols step just fail if dump_syms crashes anywhere.
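The "just fail if dump_syms crashes anywhere" idea can be sketched in Python as follows. This is not the actual symbolstore.py logic; `run_all_or_fail` is a hypothetical helper, and the commands it runs here are stand-ins rather than real dump_syms invocations.

```python
import subprocess
import sys


def run_all_or_fail(commands):
    """Run each command (an argv list) in order; raise on the first failure.

    subprocess reports a process killed by a signal as a negative return
    code, so crashes are caught as well as ordinary non-zero exits.
    """
    for argv in commands:
        result = subprocess.run(
            argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        if result.returncode != 0:
            raise RuntimeError(
                "%r failed with return code %d" % (argv, result.returncode))
    return len(commands)


if __name__ == "__main__":
    # Stand-in commands: in a real symbol-store script these would be the
    # per-file symbol dumping invocations, which are not reproduced here.
    ok = [sys.executable, "-c", "pass"]
    bad = [sys.executable, "-c", "import sys; sys.exit(3)"]
    print(run_all_or_fail([ok, ok]))
    try:
        run_all_or_fail([ok, bad, ok])
    except RuntimeError as exc:
        print("build step would fail:", exc)
```

Propagating the error up to the make step would turn a silently missing symbol file into a visible build failure, at the cost of failing the whole step for one bad file.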
Comment 23•16 years ago
Want a new bug on that, or should I reopen bug 401808?
Comment 24•16 years ago
Alrighty then, the nightly build that I ran explicitly on bm-xserve07 did die with the dump_syms error:

./dist/crashreporter-symbols/2008050210/thunderbird-3.0a1pre-Darwin-2008050210-symbols.txt
dump_syms(5496) malloc: *** vm_allocate(size=1331904512) failed (error code=3)
dump_syms(5496) malloc: *** error: can't allocate region

I grabbed the directory /builds/tinderbox/Tb-Trunk and tar'd it into ~cltbld/rick-411171/Tb-Trunk

I then ran, while cd'd to this *copy* of the Tb-Trunk directory,

make -C /Users/cltbld/rick-411171/Tb-Trunk/Darwin_8.8.4_Depend/build/universal/ppc buildsymbols

and it ran just fine.

I'm at a loss as to what this could be at this point. That malloc is trying to allocate 1.3GB. Could we be so borderline on available memory that we fail while running inside the perl build-seamonkey.pl script, but pass when just running the make?

Any ideas out there?
Comment 25•16 years ago
Is it possible that make saw that the file already existed since it had been run once, decided that that made the dependency up-to-date, and didn't try to run dump_syms again? I assume the original crash was while generating PPC symbols?
Comment 26•16 years ago
buildsymbols doesn't have any dependencies, so that's not it. Also, buildsymbols runs dump_syms once per-arch per-file on a universal build.
Comment 27•16 years ago
While we'd very much like to see this fixed for 3.0a1, we're not going to block on it. Rick, it might be worth playing around with the suggestions Phil had in comment 20. Explicitly bumping the OS virtual memory / swap size might also be worth looking into.
Flags: blocking-thunderbird3.0a2+
Flags: blocking-thunderbird3.0a1-
Flags: blocking-thunderbird3.0a1+
Comment 28•16 years ago
I've been searching about trying to figure out how to create more swap space on Mac OS X. Articles that I've read seem to indicate that the OS itself takes care of creating swap files in /var/vm.

To test this, I put together a little C program that just keeps grabbing 1/2 GB of memory and sleeps for 2 seconds. Once it gets up to 2.5 GB total allocation, I get the nice error:

a.out(15125) malloc: *** vm_allocate(size=536870912) failed (error code=3)
a.out(15125) malloc: *** error: can't allocate region
a.out(15125) malloc: *** set a breakpoint in szone_error to debug

which seems to indicate that swap is not created automatically by the OS. On bm-xserve07, a |df -h /var/vm| shows that it's on the root partition and that there's plenty (21G) of space available.

So, does anyone have any idea on how to create swap on OS X? Or is there some limit on OS X about how much swap can be created?

I'm almost tempted just to reboot the box and force a nightly build. (I'm wondering if there might be memory leaks somewhere that have led to a memory shortage ... it has been up for about 125 days at this point)
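The test program itself was not posted. As a rough illustration only, here is a Python (not C) sketch of the same idea: keep allocating fixed-size chunks, hold on to them, and report how much was held when allocation failed. The function name, the `max_chunks` safety cap, and the parameter defaults are all assumptions added for this sketch.

```python
import time


def grab_until_failure(chunk_bytes, max_chunks, pause_seconds=0.0):
    """Allocate chunk_bytes at a time, keeping every chunk alive.

    Returns the number of bytes held when an allocation failed or when
    max_chunks was reached. max_chunks is a safety cap for this sketch.
    """
    held = []
    try:
        for _ in range(max_chunks):
            held.append(bytearray(chunk_bytes))
            if pause_seconds:
                time.sleep(pause_seconds)
    except MemoryError:
        # Python raises MemoryError where the C program saw malloc fail.
        pass
    return len(held) * chunk_bytes


if __name__ == "__main__":
    # Small values so the sketch is safe to run. The run described above
    # would correspond to 512 MiB chunks with a 2 second pause, which
    # really does consume that much memory.
    print(grab_until_failure(1024 * 1024, 8))
```

One caveat when reading results from a test like this, offered as a general observation rather than something established in this bug: a 32-bit process can address at most 4 GiB, so a single process can run out of address space before the machine runs out of swap.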
Comment 29•16 years ago
Rebooted bm-xserve07 ahead of today's nightly to test the memory shortage hypothesis. It'll do some hourly builds before hitting the nightly.
Comment 30•16 years ago
top is logging into ~/nthomas/top.log on a 5 second interval, for the two hours from 2:48 PDT. Hopefully there will be some clues there; alternatively we could sample with a much smaller interval given some trigger for the symbol collection.
Comment 31•16 years ago
rick, nick -- at some point it'd be good to get an update on this bug, as it's marked blocking-tb3a2, and if anyone has made progress on figuring out what happened, that'd be good to know.
Updated•16 years ago
Whiteboard: [status unknown]
Comment 32•16 years ago
I'm not aware of any progress on this.
Updated•16 years ago
Assignee: rick.tessner → gozer
Assignee
Comment 33•16 years ago
I've been able to get a new OS X buildbot running (still in testing/debugging mode) and added the buildsymbols step to it. So far, it has successfully completed that step every single time.

Regarding the swapping, on OS X, you pretty much have as much swap space as free disk space in /, unless otherwise modified in /etc/rc (read dynamic_pager(8) for all the details). You might also have per-user limits (see ulimit -a).

Not sure if this qualifies as progress, but I can report being unable to reproduce this failure.
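For comparing the per-process limits seen by an interactive shell with those seen by a build script, something like the following Python sketch could be dropped into either environment. It uses the standard `resource` module; the helper names and the choice of which limits to report are assumptions, and this is not a substitute for reading the `ulimit -a` output directly.

```python
import resource

# Limit names to report; not every platform defines every one, so missing
# names are skipped rather than treated as errors.
LIMIT_NAMES = ["RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_RSS", "RLIMIT_STACK"]


def format_limit(value):
    """Render a raw limit value, using 'unlimited' for RLIM_INFINITY."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def memory_limits():
    """Return {name: (soft, hard)} for the memory-related limits available."""
    limits = {}
    for name in LIMIT_NAMES:
        which = getattr(resource, name, None)
        if which is None:
            continue
        soft, hard = resource.getrlimit(which)
        limits[name] = (format_limit(soft), format_limit(hard))
    return limits


if __name__ == "__main__":
    for name, (soft, hard) in sorted(memory_limits().items()):
        print(f"{name}: soft={soft} hard={hard}")
```

Running it once from a login shell and once from inside the build harness would show whether the two environments differ in their limits, which is one of the possibilities raised above.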
Comment 34•16 years ago
Presumably, if we build 3.0a2 on gozer's buildbot and then use that machine instead of our current tinderbox going forward, we could declare victory here. Any chance of either or both of those things happening?
Whiteboard: [status unknown] → [needs info re 3.0a build process]
Comment 35•16 years ago
The impression I got from joduinn is that bhearsum is planning to do 3.0a2 builds on brand new VMs. This may just go away w/ the new VMs.
Comment 36•16 years ago
I can clearly reproduce such a crash with the steps mentioned in bug 444211 comment 3. It's an official nightly build, not a debug build. If I can help, please let me know.
Comment 37•16 years ago
Note that this is marked as blocking bug 439142, which users are likely to see fairly frequently - i.e. it's a repeatable crashing bug based on messages sent to you.
Comment 38•16 years ago
For the record, this was also a problem for the 3.0a2 release builds that we did with tinderbox, but isn't for 3.1b1pre nightlies under buildbot. I'll manually correct that for the release, but this is probably WONTFIX now that Thunderbird 3.x development moved away from tinderbox.
Comment 39•16 years ago
Would be nice to figure out the underlying cause here, but given how hard this is to reproduce, I don't think it's worth the effort right now.
Status: NEW → RESOLVED
Closed: 16 years ago → 16 years ago
Resolution: --- → INCOMPLETE
Comment 40•16 years ago
So if this is not going to be fixed, is there another way to fix bug 439142? It's a bad, repeatable, crashing bug that is still marked as dependent on this one.
Comment 41•16 years ago
See bug 439142 comment 10 - we should have symbols now (ie by moving away from the system that was failing to generate them).
No longer blocks: 439142