Closed
Bug 548371
Opened 15 years ago
Closed 15 years ago
production graph server swamped
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: anodelman, Assigned: phong)
After some investigation by fox2mike, the production graph server was determined to be 'sluggish' and using up half its swap. This is a central piece of our build and performance testing infrastructure.
IT should figure out how to beef up this vm.
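For anyone wanting to reproduce the "half its swap" observation, here is a minimal sketch, assuming the box is Linux and exposes the usual `SwapTotal` and `SwapFree` fields in `/proc/meminfo`. The function name `swap_used_fraction` and the `SAMPLE` values are hypothetical and only illustrate the arithmetic; they are not measurements from the graph server.

```python
def swap_used_fraction(meminfo_text):
    """Return the fraction of swap in use (0.0 to 1.0) from /proc/meminfo-style text."""
    values = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])  # sizes are reported in kB
    total = values.get("SwapTotal", 0)
    free = values.get("SwapFree", 0)
    if total == 0:
        return 0.0  # no swap configured
    return (total - free) / total


# Hypothetical sample, NOT real data from the graph server.
SAMPLE = """MemTotal: 2058060 kB
SwapTotal: 1048568 kB
SwapFree: 524284 kB
"""

if __name__ == "__main__":
    print(swap_used_fraction(SAMPLE))
```

On a live Linux host the same function could be fed the contents of `/proc/meminfo` instead of `SAMPLE`.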
Comment 1•15 years ago
Two-pronged approach:
1) Currently the VM has 2 processors and about 2 GB of RAM; I'd like to boost that to 4 processors and 4 GB and see how it performs.
2) If the above fails too, we might need to think of moving to hardware, but seeing how it's performed so far I don't think this is needed quite yet.
Alice, this would need downtime; the VM has to be shut down before these values can be bumped up. Can you please let us know when we can do this?
Updated•15 years ago
Assignee: server-ops → shyam
Reporter
Comment 2•15 years ago
As long as notification goes to dev.planning and dev.tree-management and waterfalls are closed, you can do this whenever it fits into your schedule.
Comment 3•15 years ago
When is the next regular downtime, and could this be included?
(I can't set the "needs-downtime" flag, but this does need advance downtime notice, as any builds attempting to post results while graphserver is offline will burn red.)
Comment 4•15 years ago
Tonight, but if that's too short notice, we could aim for the coming Tue or Thu.
Flags: needs-downtime+
Reporter
Comment 5•15 years ago
Noticed that this didn't end up on tonight's downtime notification - what are we blocking on here?
Any time IT can get this on their downtime schedule, releng is willing to do the necessary tree closures to make it happen.
Comment 6•15 years ago
Just talked with mrz; this is too late to happen tonight. IT will schedule this for Thursday downtime instead. RelEng still needs to send out notice to developers about tree closure.
Whiteboard: 03/04/2010 @ 7pm
Comment 7•15 years ago
(In reply to comment #6)
> Just talked with mrz; this is too late to happen tonight. IT will schedule this
> for Thursday downtime instead. RelEng still needs to send out notice to
> developers about tree closure.
Rescheduled to Friday 8-11am PST.
Depends on: 550066
Whiteboard: 03/04/2010 @ 7pm → 03/05/2010 @ 8am
Comment 8•15 years ago
I see a downtime notice for this and an email that says do it tonight instead of Friday, so which is it? Tonight or Friday?
Comment 9•15 years ago
Yes, joduinn replied to the email (on moz.dev.tree-management at least) saying it's been postponed to tomorrow (Friday).
Comment 10•15 years ago
Thanks Aki, the email said don't do it tonight ;) My bad, I misread.
Comment 11•15 years ago
All done; dm-graphs02 now has twice the processors and RAM it had before.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Comment 12•15 years ago
Picked up a kernel upgrade and a bunch of other fixes as well.
Comment 13•15 years ago
The VM has been swamped this morning, so aravind just rebooted it. It is missing VMware tools (maybe after the kernel upgrade?).
Reopening to track.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 14•15 years ago
I installed VMware tools on it, so if this helps and it holds up, this bug can be closed.
Comment 15•15 years ago
Calling fixed.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Flags: needs-downtime+ → needs-downtime-
Resolution: --- → FIXED
Whiteboard: 03/05/2010 @ 8am
Comment 16•15 years ago
How was the load this morning? We had a bunch of failed posts early this morning (2:30-3am or so).
Comment 17•15 years ago
We're still having problems here. More failures around 7:30.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 18•15 years ago
I can't even log in to the box :(
Phong, any ideas here? We picked up a kernel upgrade, bumped up the RAM and CPU, and Aravind updated VMware tools, but it seems like the box just locks up after a while and is completely unresponsive.
Assignee: shyam → phong
Comment 19•15 years ago
(In reply to comment #18)
> I can't even login to the box :(
>
> Phong any ideas here? We picked up a kernel upgrade, bumped up the RAM and CPU
> and Aravind updated VMWare tools, but it seems like the box just locks up after
> a while and is completely unresponsive.
By any chance, for the CPU upgrade, did you switch to multi-core CPUs? If so, can we try going back to single-core? I ask because we've hit problems with multi-core CPUs on win32 build VMs in the past, and even though graphserver is not a win32 VM, it would be good to eliminate that variable from the problem.
Comment 20•15 years ago
...and next morning, it seems to be gone again :(
Assignee
Comment 21•15 years ago
I thought this was scheduled for 9 AM.
Comment 22•15 years ago
Just talked with mrz: let's leave this closed and track the latest fallout in bug 551532.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard