548371 - production graph server swamped

Reporter

Description

•

15 years ago

After some investigation by fox2mike, the production graph server was determined to be 'sluggish' and using up half its swap. This is a central piece of our build and performance testing infrastructure. IT should figure out how to beef up this VM.

alice nodelman [:alice] [:anode]

Reporter

Updated

•

15 years ago

Blocks: 548320

Shyam Mani [:fox2mike]

Comment 1

•

15 years ago

Two-pronged approach:

1) Currently the VM has 2 processors and about 2 GB of RAM. I'd like to boost that to 4 + 4 and see how it performs.
2) If the above fails too, we might need to think of moving to hardware, but seeing how it's performed so far I don't think this is needed quite yet.

Alice, this would need downtime: the VM has to be shut down before these values can be bumped up. Can you please let us know when we can do this?

Shyam Mani [:fox2mike]

Updated

•

15 years ago

Assignee: server-ops → shyam

alice nodelman [:alice] [:anode]

Reporter

Comment 2

•

15 years ago

As long as notification goes to dev.planning and dev.tree-management and the waterfalls are closed, you can do this whenever it fits into your schedule.

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 3

•

15 years ago

When is the next regular downtime, and could this be included? (I can't set the "needs-downtime" flag, but this does need advance downtime notice, as any builds attempting to post results while graphserver is offline will burn red.)

Shyam Mani [:fox2mike]

Comment 4

•

15 years ago

Tonight, but if that's too short notice, we could aim for the coming Tue or Thu.

Flags: needs-downtime+

alice nodelman [:alice] [:anode]

Reporter

Comment 5

•

15 years ago

Noticed that this didn't end up on tonight's downtime notification. What are we blocking on here? Anytime IT can get this on their downtime schedule, releng is willing to do the necessary tree closures to make it happen.

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 6

•

15 years ago

Just talked with mrz; this is too late to happen tonight. IT will schedule this for Thursday downtime instead. RelEng still needs to send out notice to developers about tree closure.

Whiteboard: 03/04/2010 @ 7pm

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 7

•

15 years ago

(In reply to comment #6)
> Just talked with mrz; this is too late to happen tonight. IT will schedule this
> for Thursday downtime instead. RelEng still needs to send out notice to
> developers about tree closure.

Rescheduled to Friday 8-11am PST.

Depends on: 550066

Whiteboard: 03/04/2010 @ 7pm → 03/05/2010 @ 8am

Shyam Mani [:fox2mike]

Comment 8

•

15 years ago

I see a downtime notice for this and an email that says to do it tonight instead of Friday, so which is it? Tonight or Friday?

Aki Sasaki (not active)

Comment 9

•

15 years ago

Yes, joduinn replied to the email (on moz.dev.tree-management at least) saying it's been postponed to tomorrow (Friday).

Shyam Mani [:fox2mike]

Comment 10

•

15 years ago

Thanks Aki, the email said don't do it tonight ;) My bad, I misread.

Shyam Mani [:fox2mike]

Comment 11

•

15 years ago

All done, dm-graphs02 now has twice the processors and RAM it had before.

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

Shyam Mani [:fox2mike]

Comment 12

•

15 years ago

Picked up a kernel upgrade and a bunch of other fixes as well.

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 13

•

15 years ago

The VM has been swamped this morning, so aravind just rebooted it. VMware tools are missing (maybe after the kernel upgrade?). Reopening to track.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Aravind Gottipati [:aravind]

Comment 14

•

15 years ago

I installed VMware tools on it, so if this helps and it holds up, this bug can be closed.

matthew zeier [:mrz]

Comment 15

•

15 years ago

Calling fixed.

Status: REOPENED → RESOLVED

Closed: 15 years ago → 15 years ago

Flags: needs-downtime+ → needs-downtime-

Resolution: --- → FIXED

Whiteboard: 03/05/2010 @ 8am

Chris AtLee [:catlee]

Comment 16

•

15 years ago

How was the load this morning? We had a bunch of failed posts early this morning (2:30-3am or so).

Chris AtLee [:catlee]

Comment 17

•

15 years ago

We're still having problems here. More failures around 7:30.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Shyam Mani [:fox2mike]

Comment 18

•

15 years ago

I can't even log in to the box :(

Phong, any ideas here? We picked up a kernel upgrade, bumped up the RAM and CPU, and Aravind updated VMware tools, but it seems like the box just locks up after a while and is completely unresponsive.

Assignee: shyam → phong

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 19

•

15 years ago

(In reply to comment #18)
> I can't even login to the box :(
>
> Phong any ideas here? We picked up a kernel upgrade, bumped up the RAM and CPU
> and Aravind updated VMWare tools, but it seems like the box just locks up after
> a while and is completely unresponsive.

By any chance, for the CPU upgrade, did you switch to multi-core CPUs? If so, can we try going back to single-core? I ask because we've hit problems with multi-core CPUs on win32 build VMs in the past, and even if graphserver is not a win32 VM, it would be good to eliminate that variable from the problem.

Robert Kaiser

Comment 20

•

15 years ago

...and next morning, it seems to be gone again :(

Phong Tran [:phong]

Assignee

Comment 21

•

15 years ago

I thought this was scheduled for 9 AM.

John O'Duinn [:joduinn] (please use "needinfo?" flag)

Comment 22

•

15 years ago

Just talked with mrz: let's leave this closed, and track the latest fallout in bug 551532.

Status: REOPENED → RESOLVED

Closed: 15 years ago → 15 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

10 years ago

Product: mozilla.org → mozilla.org Graveyard

Categories

(mozilla.org Graveyard :: Server Operations, task)

Tracking

(Not tracked)

People

(Reporter: anodelman, Assigned: phong)

Security

(public)
