Closed Bug 772593 (Opened 12 years ago, Closed 12 years ago)

Setup 7 Hardware machines as Linux Foopies in mtv1

Categories

(Infrastructure & Operations :: RelOps: General, task)

Platform: x86_64 Windows 7
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: Callek, Assigned: arich)

References

Details

So, in the coming weeks, we're gearing up to need Linux Foopies brought up.

I am working on the puppetizing process on a VM right now, but we want the Linux foopies to run on physical hardware; I suggest the HP DL120 G7s we have.

These will be set up with the PuppetAgain process (and a bit of hands-on work to get the tegras attached once fully up). I am aiming for us to try setting up 5 at the outset:

* 1 for a linux-foopy staging
* 3-4 for our new batch of tegra hosting
* 1 for migrating existing tegras off mac foopies **or** panda/beagle board foopy setup testing, whichever comes first/makes the most sense.

I'm not yet asking for anything on the buildbot master side, while we await data on how much additional buildbot master load we'll have with these new systems. My theory is we'll need at least 1 more master.

Hopefully this work can happen in parallel, and we can morph this into a tracker if need be.
What hardware specifications are required for:

CPU speed (and multi-core vs single core, e.g. is this multi-threaded?)
amount of memory
amount of disk space (and what sort of io processing load these machines will incur).

Based on the answers to these questions, a vm may fit the bill, or we'll choose the appropriate hardware.  We don't have spare hardware, so we'll need to spec and purchase something new if that's the case.

Are we to assume centos 6.2 for the OS based on your puppetagain statement?
(In reply to Amy Rich [:arich] [:arr] from comment #1)
> Are we to assume centos 6.2 for the OS based on your puppetagain statement?

Yes.

> What hardware specifications are required for:
> 
> CPU speed (and multi-core vs single core, e.g. is this multi-threaded?)

The faster the better, and multi-core is preferred, since we'll be running 2 clientproxy processes per attached tegra, as well as any related processes for the tests (Talos/Robocop, etc.).

> amount of memory
> amount of disk space (and what sort of io processing load these machines
> will incur).

Disk space is not our largest factor here, but we do a lot of downloading/unpacking of APKs/test packages.

> Based on the answers to these questions, a vm may fit the bill, or we'll
> choose the appropriate hardware.  We don't have spare hardware, so we'll
> need to spec and purchase something new if that's the case.
> 

The more CPU/memory/network throughput we have, the more tegras we can run *reliably* on a single machine. I would say the CPU/memory specs of our newest Mac foopies are a good baseline for our minimum requirement here.

I'll let bear/armen/someone-else chime in here on their thoughts though.
(In reply to Justin Wood (:Callek) from comment #2)
> (In reply to Amy Rich [:arich] [:arr] from comment #1)
> > Are we to assume centos 6.2 for the OS based on your puppetagain statement?
> 
> Yes.
> 
> > What hardware specifications are required for:
> > 
> > CPU speed (and multi-core vs single core, e.g. is this multi-threaded?)
> 
> The faster the better, and multi-core is preferred, since we'll be running
> 2 clientproxy processes per attached tegra, as well as any related
> processes for the tests (Talos/Robocop, etc.).

Each tegra requires 2 clientproxy processes, as Callek mentioned, plus a buildbot process and anywhere from 0 to 3 additional processes depending on the test being run.
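For a rough sense of scale, a back-of-the-envelope count (a sketch only; the 16-tegra target comes from later in this bug, and the per-tegra worst case of 2 clientproxy + 1 buildbot + 3 test processes is taken from the comment above):

# rough worst-case process count for a foopy hosting 16 tegras
TEGRAS=16
PER_TEGRA=$((2 + 1 + 3))   # 2 clientproxy + 1 buildbot + up to 3 test processes
echo "worst case: $((TEGRAS * PER_TEGRA)) processes"   # prints: worst case: 96 processes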

> 
> > amount of memory

Memory is not the limiting factor, so no less than what the minis currently have is a good starting benchmark.

> > amount of disk space (and what sort of io processing load these machines
> > will incur).
> 
> Disk space is not our largest factor here, but we do a lot of
> downloading/unpacking of APKs/test packages.

Disk space, while not the largest factor, is a big one because of the supporting files and the increased number of projects/platforms that will be running; remember that each tegra has its own build environment.

The largest issue is disk I/O: each tegra does a *ton* of disk I/O, and the biggest reason we have had to reduce the tegra-per-foopy ratio is disk I/O.

If we are using VMs for these, I would worry about the accumulated I/O on the host server.
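For what it's worth, aggregate disk I/O on the host is easy to watch with the standard sysstat tools once tegras are attached (a sketch; assumes the sysstat package is installed on the CentOS machine):

# extended per-device stats (utilization, await, kB/s), refreshed every 5 seconds
iostat -dxk 5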


> 
> > Based on the answers to these questions, a vm may fit the bill, or we'll
> > choose the appropriate hardware.  We don't have spare hardware, so we'll
> > need to spec and purchase something new if that's the case.
> > 
> 
> The more CPU/memory/network throughput we have, the more tegras we can run
> *reliably* on a single machine. I would say the CPU/memory specs of our
> newest Mac foopies are a good baseline for our minimum requirement here.
> 
> I'll let bear/armen/someone-else chime in here on their thoughts though.

Nothing that I am aware of prevents using VMs for these. We should spin one up now so that the final puppet checks can be done on this VM to confirm that it won't be an issue. We can also burn in / stage the new tegras on this VM to get some ganglia metrics on what the I/O rate is in reality.
We have a test vm up for this right now.  Can you install/configure ganglia on it and move some tegras over so we can get an idea of actual load?  We can tune the vm's CPU and RAM allocation based on that.
Assignee: server-ops-releng → arich
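A minimal sketch of what the ganglia agent setup on the test VM might look like (assumes the ganglia-gmond package is available, e.g. from EPEL, and that /etc/ganglia/gmond.conf is pointed at the existing RelEngMTV1 cluster; the real deployment would presumably go through PuppetAgain instead):

yum install -y ganglia-gmond
# edit /etc/ganglia/gmond.conf to match the existing cluster/aggregator settings,
# then enable and start the agent (CentOS 6 init-style services):
chkconfig gmond on
service gmond start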
(In reply to Amy Rich [:arich] [:arr] from comment #4)
> We have a test vm up for this right now.  Can you install/configure ganglia
> on it and move some tegras over so we can get an idea of actual load?  We
> can tune the vm's CPU and RAM allocation based on that.

OK, I stuck 13 tegras on it (which is the number we have allocated to the beefier Mac foopies). I'd like to be able to allocate a minimum of 16 tegras to each of these.

I ran start_cp.sh for all these tegras, and many have buildbot running already. They are not yet doing jobs properly / with all the required load, though.

I am certainly seeing some slowdown here with all 13 of these running so far, and we'll see more load once we get the buildbot changes done, so that these actually start passing jobs.

http://ganglia3.build.mtv1.mozilla.com/ganglia/?r=hour&cs=&ce=&m=load_one&s=by+name&c=RelEngMTV1&h=linux-foopy-test.build.mtv1.mozilla.com&host_regex=&max_graphs=0&tab=m&vn=&sh=1&z=small&hc=4
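For reference, bringing clientproxy up for a batch of tegras could look roughly like this (hypothetical invocation; the tegra ID range, working directory, and start_cp.sh arguments are placeholders and may not match the real sut_tools usage):

# hypothetical: start clientproxy for a block of tegra IDs from the builds directory
cd /builds
for t in $(seq 233 245); do
    sh sut_tools/start_cp.sh tegra-$t
done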
FYI as well:

[cltbld@linux-foopy-test builds]$ python sut_tools/tegra_powercycle.py tegra-233
snmpset: Timeout
[cltbld@linux-foopy-test builds]$ python sut_tools/tegra_powercycle.py tegra-233
snmpset: Timeout
[cltbld@linux-foopy-test builds]$ python sut_tools/tegra_powercycle.py tegra-233
snmpset: Timeout

Doing it from foopy06 worked fine, though. Is this a case of the PDU needing to know this host, or is the Linux foopy too loaded to powercycle?
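One way to tell those two cases apart would be to run the same SNMP query against the PDU from both hosts (a net-snmp sketch; the PDU hostname and community string below are placeholders):

# query sysDescr (1.3.6.1.2.1.1.1.0) with a short timeout from the linux foopy
# and from foopy06; a timeout only from the linux foopy points at a missing
# network flow rather than host load
snmpget -v1 -c public -t 2 -r 1 pdu.example.mozilla.com 1.3.6.1.2.1.1.1.0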
I installed iotop and htop to get some realtime stats when we see it under load (it's not right now).
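For example, a handy invocation once it is under load (standard iotop options: only show processes actually doing I/O, accumulate totals, one row per process; needs root):

# accumulated per-process I/O, hiding idle processes
iotop -o -a -P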

I'll leave bear to decipher what comment 6 means.
(In reply to Amy Rich [:arich] [:arr] from comment #7)
> I installed iotop and htop to get some realtime stats when we see it under
> load (it's not right now).
> 
> I'll leave bear to decipher what comment 6 means.

It just means that the VM the Linux foopy is currently on does not have a network flow allowing SNMP traffic to the PDUs.
Depends on: 774318
I'm not sure that the DL120s are a long-term solution, but if we need to, we might (for space reasons) be able to swap the iX machines that are moving to scl1 to become w64 builders with some of the HPs that are currently in scl1 acting as b2g machines. Then we could use the DL120s as foopies.
Depends on: 776977
Depends on: 777768
Summary: Setup 5 Hardware machines as Linux Foopies in mtv1 → Setup 7 Hardware machines as Linux Foopies in mtv1
The following machines were kickstarted in mtv1 this morning:

foopy26
foopy27
foopy28
foopy29
foopy30
foopy31
foopy32
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations