Bug 551532: Graph server crashed again
Opened 15 years ago, closed 15 years ago
Status: RESOLVED FIXED
Categories: mozilla.org Graveyard :: Server Operations, task
Tracking: Not tracked
People: Reporter: joduinn; Assignee: mrz
The graph server crashed and needed a reboot today. This also happened yesterday.
Marking as a blocker, because this closes the tree each time.
The RAM was upgraded on Friday in bug 548371. The latest theory from IRC is slow disks, but that is uncertain. It is also unclear what has changed to make this a problem now.
Updated 15 years ago
Assignee: server-ops → aravind
Comment 1 (Reporter), 15 years ago
zandr reports on IRC that the clock for this VM is unable to keep up:
host-4-40:~ amilewski$ hostname ; date; ssh root@dm-graphs01 "hostname ; date" ; hostname ; date ; sleep 100 ; hostname ; date; ssh root@dm-graphs01 "hostname ; date" ; hostname ; date
host-4-40.mv.mozilla.com
Thu Mar 11 10:53:26 PST 2010
dm-graphs01.mozilla.org
Thu Mar 11 10:36:21 PST 2010
host-4-40.mv.mozilla.com
Thu Mar 11 11:02:34 PST 2010
host-4-40.mv.mozilla.com
Thu Mar 11 11:04:14 PST 2010
dm-graphs01.mozilla.org
Thu Mar 11 10:36:30 PST 2010
host-4-40.mv.mozilla.com
Thu Mar 11 11:19:15 PST 2010
host-4-40:~ amilewski$
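The transcript brackets each remote `date` call between two local ones, so it bounds how much real time passed between the two dm-graphs01 readings. A minimal Python sketch of that arithmetic, assuming host-4-40's clock is accurate (the variable names are illustrative, not from the bug):

```python
from datetime import datetime

# Timestamps copied from the transcript above (all PST, same day).
def t(s: str) -> datetime:
    return datetime.strptime(s, "%H:%M:%S")

# Local (host-4-40) readings taken before and after each ssh call.
local_before_1, local_after_1 = t("10:53:26"), t("11:02:34")
local_before_2, local_after_2 = t("11:04:14"), t("11:19:15")

# Remote (dm-graphs01) readings taken inside each ssh call.
remote_1, remote_2 = t("10:36:21"), t("10:36:30")

# How far the VM's clock advanced between its two readings.
remote_elapsed = (remote_2 - remote_1).total_seconds()

# Real time between the remote readings: at least the gap between the end of
# the first ssh call and the start of the second (the sleep 100), at most the
# span from before the first call to after the second.
min_real = (local_before_2 - local_after_1).total_seconds()
max_real = (local_after_2 - local_before_1).total_seconds()

print(f"remote clock advanced {remote_elapsed:.0f} s")
print(f"real time elapsed: {min_real:.0f} s to {max_real:.0f} s")
print(f"remote clock ran at {remote_elapsed / max_real:.1%} "
      f"to {remote_elapsed / min_real:.1%} of real speed")
```

Under that assumption the VM's clock advanced only 9 seconds while at least 100 seconds of real time passed, so it was running at no more than about 9% of real speed.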
Comment 2 (Assignee), 15 years ago
/data2 is a 150GB drive that was on the SATA array. I've since moved it to an FCAL shelf. It appears to hold historical data.
[root@dm-graphs01 data2]# ls -la
total 36
drwxr-xr-x 5 root root 4096 Oct 13 18:41 .
drwxr-xr-x 25 root root 4096 Mar 11 16:34 ..
drwx------ 2 root root 16384 Oct 13 18:36 lost+found
drwxr-xr-x 5 mysql mysql 4096 Oct 13 21:13 mysql
drwxr-xr-x 2 mysql mysql 4096 Oct 13 18:56 mysql-innodb
The box is back up with 2 vCPUs and 4GB of RAM.
[root@dm-graphs01 data2]# date
Thu Mar 11 16:49:28 PST 2010
Comment 3 (Reporter), 15 years ago
The machine is running fine, but note that the tree closure means there is no build load.
We're going to trigger a batch of Talos runs to generate load on the graph server and see if it holds up. If it does, we'll reopen the tree in approximately 40-60 minutes.
Comment 4, 15 years ago
So far so good; the tree has been handed back to developers.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Comment 5, 15 years ago
It looks like the graph server is having those problems again. I'm seeing another "failed graph server post" on the SeaMonkey trees, and the Firefox tree has some as well. #bmo shows "<nagios> [96] dm-graphs01:http - graphs.mozilla.org is CRITICAL: CRITICAL - Socket timeout after 10 seconds" with no follow-up message about it recovering. When http://graphs.mozilla.org/ loads at all, it does so with a very long lag and never gets to the point of showing any data or graphs.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 6 (Reporter), 15 years ago
catlee and justdave are setting up physical hardware to replace the current VM. This *should* be just a drop-in replacement, with no need to reconfigure talos masters/slaves, but stay tuned.
Comment 7, 15 years ago
Reassigning to mrz, since he was the last to work on this (or maybe justdave was).
Assignee: aravind → mrz
Comment 8, 15 years ago
The new box has been up and running on physical hardware since this afternoon. The old box is still there and still hosts graphs-old and graphs-historical, but because its external IP address was remapped, it is now only accessible from behind the VPN. If this data still needs to be publicly available (does anyone still use it?), we can make further arrangements.
The entire configuration, from top to bottom, is now in Puppet and is set up so that it would be very simple to add more webheads if we run into load issues again.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Updated 10 years ago
Product: mozilla.org → mozilla.org Graveyard