Closed Bug 683718 (Opened 13 years ago; Closed 13 years ago)

Prepare 10 rev4 minis for a medium scale test

Categories

(Infrastructure & Operations :: RelOps: General, task)

Hardware: x86
OS: macOS
Type: task
Priority: Not set
Severity: critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jhford, Assigned: jhford)

References

Details

With the exception of a puppet-specific issue, our puppet manifests are looking good in testing.  I would like to start a medium scale test with 10 Rev4 minis.

Please prepare 10 Rev4 minis with the hostnames:

talos-r4-snow-001
talos-r4-snow-002
talos-r4-snow-003
talos-r4-snow-004
talos-r4-snow-005
talos-r4-snow-006
talos-r4-snow-007
talos-r4-snow-008
talos-r4-snow-009
talos-r4-snow-010

These machines must have (a scripted sketch of steps 2 and 4-6 follows this list):
1) Mac OS X 10.6 installed
2) 10.6.8 v1.1 update applied, from:
   curl -LO http://support.apple.com/downloads/DL1399/en_US/MacOSXUpdCombo10.6.8.dmg
3) User created with the following details:
   Full Name: Client Builder
   User Name: cltbld
   Password to be communicated to Release Engineering
4) VNC and SSH sharing enabled.  This can be done by
   launching System Preferences, going to 'Sharing' and ticking
   the 'Screen Sharing' and 'Remote Login' settings
5) 'cltbld' set to automatically log into a console session.
   This can be done by launching System Preferences, going to
   'Accounts', clicking the padlock to unlock the preference pane,
   pressing 'Login Options', then selecting "Client Builder" from
   the "Automatic Login" list
6) Puppet v 0.24.8 installed. This can be done by running
   curl -LO http://downloads.puppetlabs.com/gems/facter-1.5.6.gem
   curl -LO http://projects.puppetlabs.com/attachments/download/584/puppet-0.24.8.gem
   sudo gem install facter-1.5.6.gem puppet-0.24.8.gem
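
For anyone doing this over SSH rather than clicking through System
Preferences, here is a rough sketch of steps 2 and 4-6 as shell commands.
This is untested on these exact machines; the volume and package names
inside the combo updater DMG and the screensharing launchd path are
assumptions, so verify them before running.
---
# Step 2: apply the 10.6.8 v1.1 combo update (package path is an assumption)
curl -LO http://support.apple.com/downloads/DL1399/en_US/MacOSXUpdCombo10.6.8.dmg
hdiutil attach MacOSXUpdCombo10.6.8.dmg
sudo installer -pkg "/Volumes/Mac OS X Update Combined/MacOSXUpdCombo10.6.8.pkg" -target /
hdiutil detach "/Volumes/Mac OS X Update Combined"

# Step 4: enable Remote Login (SSH) and Screen Sharing (VNC)
sudo systemsetup -setremotelogin on
sudo launchctl load -w /System/Library/LaunchDaemons/com.apple.screensharing.plist

# Step 5: auto-login for cltbld.  A complete setup also needs the
# obfuscated password written to /etc/kcpassword; that step is omitted here.
sudo defaults write /Library/Preferences/com.apple.loginwindow autoLoginUser cltbld

# Step 6: install the gems, then sanity-check what landed
curl -LO http://downloads.puppetlabs.com/gems/facter-1.5.6.gem
curl -LO http://projects.puppetlabs.com/attachments/download/584/puppet-0.24.8.gem
sudo gem install facter-1.5.6.gem puppet-0.24.8.gem
facter --version && gem list | grep -E 'puppet|facter'
---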
Summary: Prepare 10 minis for a medium scale test → Prepare 10 rev4 minis for a medium scale test
Blocks: 683720
Per meeting with IT yesterday:

* Erica said these machines were waiting on cabling and should be online today. They'll need to be imaged following that, obviously.
(In reply to Chris Cooper [:coop] from comment #1)
> Per meeting with IT yesterday:
> 
> * Erica said these machines were waiting on cabling and should be online
> today. They'll need to be imaged following that, obviously.

Are these machines online?
As mentioned in the overarching mini bug, all of the hardware work on the minis had to be redone this week.  

At this point, we have 10 minis with the base os install that comes with the system and a cltbld user (same passwd as all other machines) with screen sharing and ssh enabled.  We still need to reboot these minis so that they obtain their correct IP addresses, since we didn't have time to do that before we had to leave the datacenter tonight.  These minis do not yet have dongles on them, though zandr says we do have 20 of them on hand (just not at scl1).

jhford, do you still want the updates applied, considering your comments about rev-testing2 and not wanting the updates applied there?
Assignee: server-ops-releng → arich
DNS also isn't fixed for these minis, although the IPs are in DHCP.  I can get to that tomorrow (Friday).
These minis are all now responding to ssh via IP.
(In reply to Amy Rich [:arich] from comment #3)
> As mentioned in the overarching mini bug, all of the hardware work on the
> minis had to be redone this week.  

:S

> At this point, we have 10 minis with the base os install that comes with the
> system and a cltbld user (same passwd as all other machines) with screen
> sharing and ssh enabled.  We still need to reboot these minis so that they
> obtain their correct IP addresses, since we didn't have time to do that
> before we had to leave the datacenter tonight.  These minis do not yet have
> dongles on them, though zandr says we do have 20 of them on hand (just not
> at scl1).

comment 5 in this bug suggests that these were rebooted; is that correct?

> jhford, do you still want the updates applied, considering your comments
> about rev-testing2 and not wanting the updates applied there?

Comment 0 is still correct for these minis; please install the 10.6.8 v1.1 update.  I asked for slightly different requirements in the rev4-testing2 bug because I wanted to save time there.
They were rebooted, and are now in DNS.  I'll take care of the updates.
Assignee: arich → dustin
For future reference, I made a copy of the updater in fs2:/IT/Apple.

I downloaded the updater and ran it by hand; softwareupdate -i didn't seem to want to do it.
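
For reference, the softwareupdate attempt would have looked something like
the following.  The label is a guess; it comes from softwareupdate -l output,
and a locally downloaded combo updater may not be offered in the catalog at
all, which would explain why it balked:
---
softwareupdate -l                        # list available update labels
sudo softwareupdate -i 'MacOSXUpd10.6.8' # install by label (label assumed)
---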

I ran the requested gem install.

talos-r4-snow-010 isn't responding to ssh or vnc, although it responds to ping.
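
A quick way to separate a network problem from dead services on that box
(nc ships with the OS; these are the standard ssh and vnc ports):
---
nc -z talos-r4-snow-010 22     # ssh
nc -z talos-r4-snow-010 5900   # vnc / screen sharing
---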

talos-r4-snow-006 failed while installing puppet:
---
Installing RDoc documentation for facter-1.5.6...
/System/Library/Frameworks/Ruby.framework/Versions/1.8/usr/lib/ruby/1.8/rdoc/template.rb:137: [BUG] Segmentation fault
ruby 1.8.7 (2009-06-12 patchlevel 174) [universal-darwin10.0]

Abort trap
---
Re-running the install worked.  Hard to say whether that's hardware or software.

So everything but talos-r4-snow-010 is up and running.  John, I assume that "9" is close enough to "10" for a medium scale test?  I'll file a separate bug for -010, but deal with it at a more leisurely pace, if that's OK.
Assignee: dustin → jhford
Yes, missing that machine while it gets fixed up is fine
(In reply to John Ford [:jhford] from comment #10)
> Yes, missing that machine while it gets fixed up is fine

Resolving based on comment #10.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Actually, these machines are missing dongles. Please close this bug when the dongles are installed on the 9 functional machines.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: jhford → server-ops-releng
Assignee: server-ops-releng → mlarrain
Severity: normal → critical
I just spoke to zandr, and he said that emux will be handling this tonight.  Reassigning.
Assignee: mlarrain → emuxlow
Hey, emux, did these get installed last night?  jhford needs them by 9:00 today.  Thanks!
The dongles are attached and the machines are ready to go.
Assignee: emuxlow → arich
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Reopening bug.  These machines, as installed, have iLife on them.  Because we don't know what impact having iLife installed might have, we need to have these machines set up again without it.

Because addenda are scattered across comments in this and other bugs, here are the consolidated, updated requirements.

These machines must have (a scripted spot-check sketch follows this list):
1) The hard drive with pre-installed Mac OS X erased
2) Mac OS X 10.6 installed from the OS recovery DVD included with the hardware.  Note that the
   software on the accompanying Applications DVD should not be installed.
3) 10.6.8 v1.1 update applied, from:
   curl -LO http://support.apple.com/downloads/DL1399/en_US/MacOSXUpdCombo10.6.8.dmg
4) User created with the following details:
   Full Name: Client Builder
   User Name: cltbld
   Password to be communicated by Release Engineering
5) VNC and SSH sharing enabled, with the VNC password set to the 'cltbld' password.
   This can be done by launching System Preferences, going to 'Sharing' and ticking
   the 'Screen Sharing' and 'Remote Login' settings.  To set the VNC password, select
   the 'Screen Sharing' item from the checkbox list and press 'Computer Settings'.  On
   the sheet that drops down, tick the 'VNC viewers may control screen with password:'
   checkbox, then enter the communicated 'cltbld' user password.
6) 'cltbld' set to automatically log into a console session.  
   This can be done by launching System Preferences, going to 
   'Accounts', clicking the padlock to unlock the preference pane,
   pressing 'Login Options' then selecting "Client Builder" from
   the "Automatic Login" list
7) Puppet v 0.24.8 installed. This can be done by running
   curl -LO http://downloads.puppetlabs.com/gems/facter-1.5.6.gem
   curl -LO http://projects.puppetlabs.com/attachments/download/584/puppet-0.24.8.gem
   sudo gem install facter-1.5.6.gem puppet-0.24.8.gem
8) Hardware dongle installed that allows the display mode to be set to 1600x1200x32
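
Once the machines are rebuilt, a rough way to spot-check items 3, 7, and 8
across all ten hosts from a shell prompt.  This assumes the hostnames resolve
and the cltbld credentials or keys are set up; adjust as needed:
---
for i in 001 002 003 004 005 006 007 008 009 010; do
  h=talos-r4-snow-$i
  echo "== $h =="
  ssh cltbld@$h 'sw_vers -productVersion;
    system_profiler SPDisplaysDataType | grep Resolution;
    gem list | grep -E "facter|puppet"'
done
---
sw_vers should report 10.6.8, the Resolution line should show 1600 x 1200
once the dongle is attached, and the gem list confirms item 7.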
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
We're resource-constrained this week and are relying on infra for hands-on work, but we can get to this next week when we have a CA person back in the office.  I spoke with mrz about getting resources from infra, but they are also strapped this week since it's the end of Q3, and we've already borrowed someone for the thunderbird migration emergency. Since we don't want to stand in the way if this is an urgent blocker, he offered to get jhford access to scl1 (if he doesn't currently have it) so he can come perform the work himself.
(In reply to Amy Rich [:arich] from comment #17)
> We're resource-constrained this week and are relying on infra for hands-on
> work, but we can get to this next week when we have a CA person back in the
> office.  I spoke with mrz about getting resources from infra, but they are
> also strapped this week since it's the end of Q3, and we've already borrowed
> someone for the thunderbird migration emergency. Since we don't want to
> stand in the way if this is an urgent blocker, he offered to get jhford
> access to scl1 (if he doesn't currently have it) so he can come perform the
> work himself.

Per discussions with mrz, jhford and myself:
* mrz is setting up colo access for aki, jhford, lsblakk and myself
* jhford will be driving to scl1 first thing in the morning to image these 10 minis.

More info as we have it.
> Per discussions with mrz, jhford and myself:
> * mrz is setting up colo access for aki, jhford, lsblakk and myself

I had no trouble getting into the DC.  I am not sure if that means I have colo access, or if that means that Erica spoke to the security guard.

> * jhford will be driving to scl1 first thing in the morning to image these
> 10 minis.

This is done; the machines are in staging for the medium scale test.

Marking this bug as fixed.  If there are further issues, I will file a new bug.
Assignee: arich → jhford
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Did you reimage talos-r4-snow-010 as well?

Also, I can only ping 2, 4, and 6.  The rest of them appear to be unreachable.
(In reply to Amy Rich [:arich] [:arr] from comment #20)
> Did you reimage talos-r4-snow-010 as well?

Yes

> Also, I can only ping 2, 4, and 6.  The rest of them appear to be
> unreachable.

I just noticed that myself.  They were reachable yesterday, and all of them were talking to their master.  Each of the 10 slaves has done at least one test job.

I am going to file a bug for nagios monitoring of these 10 slaves because, at this point, it is a problem if they go down.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Silly Bugzilla.  I cleared the cache when I refreshed the page, but the old form values stuck around.
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
The odd-numbered hosts are not available because, I think, the switch they're in doesn't have its uplink configured correctly.  Erica should be onsite to fix that quickly later today.
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations