Closed Bug 72027 Opened 24 years ago Closed 24 years ago

Linux orange on tinderbox

Categories

(SeaMonkey :: General, defect, P1)

x86
Linux

Tracking

(Not tracked)

RESOLVED WORKSFORME
mozilla0.8.1

People

(Reporter: mcafee, Assigned: mcafee)

Details

(Keywords: helpwanted, smoketest)

Linux is orange on tinderbox, attinasi & co. talking about this on irc. This is a blocker, we need to fix this.
blocker
Severity: normal → blocker
Keywords: smoketest
On it like, like, er, stuff on other stuff.
Status: NEW → ASSIGNED
testing backout of rods' change to config/config.mk, clobbering content & layout.
Locally, backing out rods' changes to config.mk and clobbering/rebuilding makes the assertions at the end of the bloat tests go away. We are testing this on the coffee tinderbox now.
Just for the record, there are two problems:
1. The alive test, which needs a clobber of at least layout and content to get it to work. This is because of bug 72018: the config.mk change didn't cause a rebuild. Regardless of what happens with the bloat test, most of the ports will need a clobber.
2. The bloat test, which looks like it's only failing on the debug tinderboxes, i.e. the ones that rods' checkin affected. I don't see this problem, but I'm building with --enable-optimize=-O2 (but NOT --disable-debug). Everyone on IRC who sees this is running debug, and the optimised builds aren't seeing it.
other builds are randomly failing, and coffee test of config.mk just failed. Some builds are working, I'm now pretty confused.
WTF man?!? Well, removing Rod's change doesn't seem to solve the problem (other than the required clobber on Linux that Bradley described). Some curiosities:
* tinderbox-test-1 has NEVER gone orange, and it is Linux (-nondebug)
* after fixing the AliveTest problems by clobbering, we get intermittent timeout failures and an occasional assert 'Assertion failure: 0 == rv, at ptsynch.c:168'
* we get several cycles of green when nothing has changed, then several orange, usually with a timeout
Keywords: helpwanted
Priority: -- → P1
Target Milestone: --- → mozilla0.8.1
I'm not sure what, if any, tests are running on tinderbox-test-1; cls should know... if it's running no tests it won't turn orange.
ok, I guess I should have looked at latest cycles before making my comment.. mcafee turned on tests early this morning... [mcafee@mocha.com - 03/14 01:59] just turned on tests, bloat numbers might be off.?
Has anyone outside Netscape seen this? Is it a local networking issue of some sort? That would match the intermittent behaviour, and it appearing and disappearing without any code change. The bloat test is exactly the same as ./mozilla -f bloaturls.txt, isn't it? That always works for me. I've occasionally got a crash on shutdown similar to coffee's current problems. It's very intermittent (about once every couple of weeks). It seems to be clear now though, on the main page and ports (cement and muerte need a clobber).
Bryner suggested I CC an NSPR rep, so cc'ing larryh@netscape.com. We are getting periodic assertions in NSPR threads ('Assertion failure: 0 == rv, at ptsynch.c:168') on Red Hat 6.x machines.
tinderbox-test-1 is rh7.0. only builds with the sighup problem are rh6.x + depend, I think.
adding darin, who noted the opposite, that rh7 was failing but rh6.2 worked-for-him. tinderboxes break down as follows:
* coffee = rh6.2
* shrike = rh6.0
* harpoon = rh6.0
* tinderbox-test-1 = rh7.0
* speedracer = Solaris 2.6
I'm on RH7 and am not seeing this. The last time I saw those assertions was on my old machine running 6.2.
mcafee: turns out i was seeing a different problem (on my rh7 box) which was simply solved by clobbering layout. I have, however, seen the ptsynch.c assertion (at least) once on a rh6.2 box (it's dual processor, if this makes any difference).
Could we perhaps modify ptsynch.c on one of the machines to actually give the error code that pthread_mutex_lock is returning? That might be helpful, rather than just knowing that it failed.
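The kind of change suggested here could look roughly like the following. This is only a sketch, not the actual NSPR source: `lock_and_report` and its `where` argument are made-up names for illustration, and the real code in ptsynch.c is structured differently. The idea is simply to print the value returned by pthread_mutex_lock() before the assertion fires, so the log shows which error occurred rather than only that one occurred.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of NSPR): lock a mutex and, if the call
 * fails, print the numeric return code and its text to stderr so the
 * caller's assertion is preceded by a useful diagnostic.
 * pthread_mutex_lock() returns the error number directly; it does not
 * set errno.
 */
static int lock_and_report(pthread_mutex_t *mutex, const char *where)
{
    int rv = pthread_mutex_lock(mutex);
    if (rv != 0) {
        fprintf(stderr, "%s: pthread_mutex_lock returned %d (%s)\n",
                where, rv, strerror(rv));
    }
    return rv; /* caller can still assert that this is 0 */
}
```

A caller would keep its existing check on the return value; the only difference is that a failing build log now contains the error number.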
this is also happening on cement, IRIX 6.5, on the ports page.
mkaply: good idea, I just did that on lespaul build, on the main tbox page. Currently hidden: http://tinderbox.mozilla.org/showbuilds.cgi?tree=SeaMonkey&noignore=1#status
pthread_mutex_lock() is returning 22, recent lespaul log.
22 is EINVAL, which means the mutex has not been properly initialized (according to the man page).
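For anyone reading these logs later, a small lookup like the one below turns the raw number into its symbolic name. It is an illustrative sketch only: `mutex_error_name` is a hypothetical helper, it covers just a few of the codes pthread_mutex_lock() can return, and the numeric values behind the names are platform-specific (on Linux, EINVAL is 22).

```c
#include <errno.h>

/*
 * Hypothetical helper: map a pthread_mutex_lock() return code to its
 * symbolic name. Only a handful of codes are listed; anything else is
 * reported as "UNKNOWN".
 */
static const char *mutex_error_name(int rv)
{
    switch (rv) {
    case 0:       return "OK";
    case EINVAL:  return "EINVAL";  /* e.g. mutex not properly initialized */
    case EDEADLK: return "EDEADLK"; /* caller already owns the mutex */
    case EAGAIN:  return "EAGAIN";  /* recursive lock count exceeded */
    default:      return "UNKNOWN";
    }
}
```

Comparing against the errno.h constants rather than hard-coded numbers keeps the lookup correct on the non-Linux tinderboxes as well.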
Has anyone looked at the core file on the IRIX machine (cement)? This might give us a clue about why it is crashing on shutdown (although not the assert problem).
The last line of the full log on cement is "killing plugin host". Has this something to do with plugins? Is there a plugin in the page or recent plugin checkins? I didn't check anything in.
This may be a build problem. That it is intermittent is troubling; if it is what we have seen before, it would be solid. -lpthread should always go ahead of -lc (or -lg++) when linking applications.
I don't really know that much about this stuff, but maybe my lack of knowledge can be helpful :) Are only machines that do bloat statistics having this problem? I find it interesting that shutdown happens, then bloat statistics, then the assert. Can we turn off bloat statistics on one of the machines and see if that affects things?
I have seen this happen on the alivetest, which aborted the bloattest.
larryh-- I'm not sure if this is relevant, but it appears that when we landed NSPR autoconf, we no longer link libnspr4 with -lc. Could that be causing a problem?
I don't see this anymore, has it been fixed, or have these tinderboxen been really taken offline?
coffee is still orange with the assertion sometimes. (That machine still has rods' config.mk changes backed out locally - that should probably be fixed and then clobbered at some point)
downgrading from blocker to major after discussion about Mozilla 0.8.1 bugs in Performance meeting. I am not at all sure what we can do about this for 0.8.1
Severity: blocker → major
invalid now? no more chronic orange, me observes...
Assignee: attinasi → mcafee
Status: ASSIGNED → NEW
coffee still had orange, I've taken it offline for other testing. Please leave this open for a while, I'll assume ownership.
the orange has gone away, sigh. wfm. no QA verification needed.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → WORKSFORME
Product: Browser → Seamonkey