Closed Bug 584920 Opened 14 years ago Closed 14 years ago

Tinderbox is down again

Categories

(mozilla.org Graveyard :: Server Operations, task)

Type: task
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: khuey, Assigned: justdave)

tinderbox.mozilla.org is down again.
Is it still down? Looks like someone was spidering showbuilds.cgi
Severity: blocker → major
It appears to have come back up.
It comes and goes. I got a log, but now everything is taking forever again.
Might be an issue with the iscsi storage? Munin monitoring shows a lot of time spent in iowait over the last two or three hours.
I wonder if we need a related bug to renice/cache in tbpl, or use json, or ?
Looks like the file that it is trying to read is 120M. I'm not sure how fast iscsi usually is:


time cat  /var/www/iscsi/webtools/tinderbox/Firefox/build.dat > /dev/null

real	0m8.505s
user	0m0.020s
The url in comment 4 eventually worked for me. Just took a while.
from cvs:webtools/tinderbox/README:

build.dat is a database file each row is a build and has pipe
separated columns: 

1) the time stamp of the tinderbox server 
2) time stamp of the build machine 
3) the official build name (should include build machine name)
   ( note: that 2 & 3 together uniquely identify the build 
        and all relevant build data)
4) the architecture dependent error parser to use on the log files
5) status of the build (success|busted|building|testfailed|exception)
6) The log file for this build (if completed)
7) the name of the binary (if any) that came from the build
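
The column layout above can be sketched as a small parser. This is only an illustration, not the code tinderbox itself uses: the `BuildRow` type, the function names, and the sample row are hypothetical, and it assumes that trailing columns (log file, binary name) may be empty or absent for builds that have not finished.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class BuildRow:
    """One pipe-separated row of build.dat, in README column order."""
    server_time: str   # 1) time stamp of the tinderbox server
    build_time: str    # 2) time stamp of the build machine
    build_name: str    # 3) official build name
    error_parser: str  # 4) error parser to use on the log files
    status: str        # 5) success|busted|building|testfailed|exception
    log_file: str      # 6) log file (may be empty if not completed)
    binary_name: str   # 7) binary name (may be empty)


def parse_build_row(line: str) -> Optional[BuildRow]:
    """Split one row on '|'; return None if the required columns are missing."""
    fields = line.rstrip("\n").split("|")
    if len(fields) < 5:
        return None  # too short to contain columns 1-5
    # Assumption: columns 6 and 7 are optional, so pad them with empty strings.
    fields += [""] * (7 - len(fields))
    return BuildRow(*fields[:7])


def build_key(row: BuildRow) -> Tuple[str, str]:
    """Columns 2 and 3 together uniquely identify a build, per the README."""
    return (row.build_time, row.build_name)


# Hypothetical sample row, made up for illustration only.
sample = "1281100000|1281099900|Linux mozilla-central build|unix|success|1281099900.1234.gz|"
row = parse_build_row(sample)
print(build_key(row), row.status)
```

Keeping every column as a string avoids guessing at the time stamp format; a caller that knows the format can convert it afterwards.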

Can I get a copy of build.dat for the Firefox tree? I'd like to check it's expiring builds properly.
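
One rough way to check expiry on a copy of the file is to count rows whose first column is older than some cutoff. The sketch below assumes the server time stamp is an integer count of Unix epoch seconds, which the README excerpt does not state; the function name, sample rows, and cutoff are likewise hypothetical.

```python
import io
from typing import Iterable, Tuple


def count_older_than(lines: Iterable[str], cutoff: int) -> Tuple[int, int]:
    """Return (rows older than cutoff, rows examined).

    Assumes column 1 is an integer Unix time stamp; rows where it is
    not are skipped rather than guessed at.
    """
    old = 0
    seen = 0
    for line in lines:
        first = line.split("|", 1)[0].strip()
        if not first.isdigit():
            continue  # blank or malformed row
        seen += 1
        if int(first) < cutoff:
            old += 1
    return old, seen


# Hypothetical rows standing in for a real build.dat.
fake_file = io.StringIO(
    "1000|990|build A|unix|success|a.gz|\n"
    "2000|1990|build B|unix|busted|b.gz|\n"
    "\n"
    "3000|2990|build C|unix|building||\n"
)
old, seen = count_older_than(fake_file, cutoff=2500)
print(f"{old} of {seen} rows are older than the cutoff")
```

If a large share of a 120M file turned out to be older than the intended retention window, that would suggest builds are not being expired.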
(In reply to comment #8)
> The url in comment 4 eventually worked for me. Just took a while.

At the time I posted that comment I was getting 500's from that url. Now I don't anymore.
I reopened try, m-c is closed for other reasons.
The static pages, http://tinderbox.mozilla.org/Firefox/ where you can easily see the (out of) date, and http://tinderbox.mozilla.org/Firefox/json.js that tinderboxpushlog uses, are not updating, so we can't really open m-c unless someone's going to sit around manually hitting the URL that regenerates the static pages every few minutes.
CPU usage is spiking up again, about 140% of iowait and about 160% of "user"
Severity: major → critical
Killed off more spidering of showlogs/builds.
Assignee: server-ops → jeremy.orem+bugs
We hit a spike of 180% iowait and 205% user, now back down to 120% iowait and just under 200% user
Severity: critical → blocker
Whoops, didn't mean to raise severity
Severity: blocker → major
iowait is up around 160% again
(In reply to comment #7)
> Looks like the file that is trying to read is 120M. I'm not sure how fast iscsi
> usually is:
> 
> 
> time cat  /var/www/iscsi/webtools/tinderbox/Firefox/build.dat > /dev/null
> 
> real    0m8.505s
> user    0m0.020s

This doesn't seem all that fast, based on a couple of tests on other machines:
-rw-r--r--  1 bhearsum  bhearsum   117M  6 Aug 09:08 blahblah
foo-ix-blah:tmp bhearsum$ time cat blahblah > /dev/null

real	0m0.056s
user	0m0.001s
sys	0m0.054s



-rw-r--r-- 1 bhearsum users 118M Aug  6 06:12 blahblah
[bhearsum@cm-vpn01 ~]$ time cat blahblah > /dev/null

real	0m0.144s
user	0m0.018s
sys	0m0.122s




Of course, dm-webtools02 was probably loaded at the time your test was run. Is there any way we can do a health check on this disk?
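
For a rough sense of scale, the timings quoted in this comment convert to throughput as follows. This is plain arithmetic on the reported sizes and wall-clock times, treating "M" as megabytes; the labels are only descriptive. The sub-second reads may well have been served from the page cache rather than from disk, so the comparison is indicative at best.

```python
def throughput_mb_per_s(size_mb: float, seconds: float) -> float:
    """Average read throughput for a file of size_mb read in `seconds`."""
    return size_mb / seconds


# (label, size in MB, wall-clock seconds) as reported in the comments above.
measurements = [
    ("build.dat on iscsi", 120, 8.505),
    ("foo-ix-blah", 117, 0.056),
    ("cm-vpn01", 118, 0.144),
]

for label, size_mb, seconds in measurements:
    rate = throughput_mb_per_s(size_mb, seconds)
    print(f"{label}: {rate:.0f} MB/s")
```

On these numbers the iscsi read is roughly two orders of magnitude slower than the other two, which is consistent with the high iowait reported earlier, though a loaded machine would also slow the read.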
We're in a death spiral again. 0% idle CPU
Severity: major → blocker
Load seems to have gone down quite a bit in the past 20 minutes. iowait is at 40%, user is at 150% or so, which is normal for this time of day for the past month.

Jeremy, you blocked 195.166.157.111 at some point this morning. Turns out that this is a developer's IP address, can we get it unblocked?
Severity: blocker → major
Assignee: jeremy.orem+bugs → server-ops
Looks like someone already took care of that.
OOC, was the developer spidering via tbpl or another tool? Or just digging through logs?
Nope, he wasn't using any special tools, just browsing tinderbox.mozilla.org directly.
Assignee: server-ops → justdave
Did one of the rounds of blocking block kuix.de? http://kuix.de/mozilla/tinderboxstat/ isn't evil pointless spidering, it's really useful (and potentially load reducing, since I often close tbpl and count on his notifier to tell me when to reopen it) spidering.
And did we block firebot (a Road Runner account somewhere in the Carolinas, last I knew)? I think he only fetches the static quickparse.txt files, so if blocking him did us any good, we've got really awful problems.
I think this has outlived its usefulness.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard