About once per day, often around 4am but not always, landfill's internet connection has some unknown problem. The symptoms are that bugbot and logbot (which both live on that server) disappear from IRC and then reconnect. I also have a script on that server that calls out to another server about every five minutes, and sometimes it sends me a text message saying that it wasn't able to get out, at around the same time that the bots disappear. Basically, the connection seems to "hiccup" about once a day. It could easily be a problem with the server OS itself, but this never happened before landfill was moved onto the VMware server.
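For anyone who wants to reproduce the outbound check, here is a minimal sketch of that kind of probe. It is not the actual script on landfill. The function names, the log format, and the host used in the usage note are assumptions made for illustration. It attempts one TCP connection with a timeout and produces a timestamped line, so failures can be lined up against the times the bots drop off IRC.

```python
import socket
import time


def probe(host, port, timeout=5.0, connect=socket.create_connection):
    """Try one outbound TCP connection.

    Returns (ok, elapsed_seconds, error_text). `connect` is injectable
    so the function can be exercised without touching the network.
    """
    start = time.monotonic()
    try:
        conn = connect((host, port), timeout)
    except OSError as exc:  # covers timeouts, DNS failures, refused connections
        return False, time.monotonic() - start, str(exc)
    conn.close()
    return True, time.monotonic() - start, ""


def format_result(host, port, result, now=None):
    """Render one probe result as a timestamped log line (hypothetical format)."""
    ok, elapsed, error = result
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(now))
    status = "ok" if ok else "FAIL " + error
    return "%s %s:%d %.3fs %s" % (stamp, host, port, elapsed, status)
```

Run from cron every five minutes, something like `print(format_result(host, 80, probe(host, 80)))` appended to a file would give a record of exactly when outbound connections fail.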
Are you seeing any similar issues with any of the VMs sitting behind landfill?
It's harder to say, because the VMs sitting behind landfill aren't as active with outbound connections. I have seen the tinderbox VM have trouble once in a while, but it doesn't have any constant outgoing connections, so it could be related to something else.
Passing to reed for debugging. cg-centos01 is the other public facing VM on there.
(In reply to comment #3)
> Passing to reed for debugging. cg-centos01 is the other public facing VM on
> there.
I've set up an irssi session on cg-centos01. Let's see how it does.
I haven't seen any problems. Are you still seeing issues on landfill?
Logbot hasn't died lately, as far as I can see. I think it might be good to give it a bit longer, just to be sure.
[11:47:16AM] * logbot has quit (Quit: connection timed out)
[11:47:17AM] * logbot (glob@moz-90A89D35.bugzilla.org) has joined #mozwebtools
[12:05:06PM] <mkanat> reed^^^^^^^
[12:05:11PM] <mkanat> reed: There went logbot.
Note that the quit message has the "Quit:" prefix. That means that the bot is actually doing "/quit connection timed out" instead of that message coming from the server. Also, my irssi session on cg-centos01 hasn't had problems since I connected it. So, it's either just a problem with logbot or an issue locally on landfill.
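The "Quit:" distinction can be checked mechanically across a whole channel log. The following is a rough sketch that assumes log lines in the format pasted above. The function name and the "client"/"server" labels are made up for illustration, and the heuristic relies on the IRC server adding the "Quit:" prefix only to client-supplied quit messages.

```python
import re

# Matches lines such as: [11:47:16AM] * logbot has quit (Quit: connection timed out)
QUIT_LINE = re.compile(r"\* (?P<nick>\S+) has quit \((?P<reason>.*)\)\s*$")


def classify_quit(line):
    """Return (nick, origin, reason) for a quit line, or None for any other line.

    origin is "client" when the reason starts with "Quit:" (the client sent
    the quit message itself) and "server" otherwise.
    """
    match = QUIT_LINE.search(line)
    if match is None:
        return None
    reason = match.group("reason")
    origin = "client" if reason.startswith("Quit:") else "server"
    return match.group("nick"), origin, reason
```

If every logbot quit in the log comes back as "client", that points at the bot's own timeout handling rather than at the server dropping the connection.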
mkanat: Any update on this?
My current guess is that it's a load problem--that something (most likely mxr, PLEASE somebody move that off of landfill) is causing load levels to peak and that prevents the bots from responding in time.
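One way to test the load theory would be to record the load average regularly and compare it with the times the bots quit. This is only a sketch under that assumption. The function names and the five-minute window are arbitrary choices, and `os.getloadavg()` is only available on Unix-like systems.

```python
import os
import time


def sample_load(now=None):
    """Return (unix_time, 1-minute load average) for the local machine."""
    stamp = time.time() if now is None else now
    return (stamp, os.getloadavg()[0])


def peak_load_before(samples, event_time, window=300):
    """Highest recorded load in the `window` seconds up to event_time.

    `samples` is a list of (unix_time, load) pairs as produced by
    sample_load(). Returns None if no sample falls inside the window.
    """
    loads = [load for t, load in samples
             if event_time - window <= t <= event_time]
    return max(loads) if loads else None
```

If the peak load before each bot quit is consistently much higher than the typical load, that supports the theory. If it is unremarkable, the cause is probably elsewhere.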
(In reply to comment #9)
> My current guess is that it's a load problem--that something (most likely mxr,
> PLEASE somebody move that off of landfill) is causing load levels to peak and
> that prevents the bots from responding in time.
Uh, mxr.mozilla.org is hosted on an IT-supported vm, not landfill. You're probably thinking of mxr-test, which is run by timeless on landfill.