Closed Bug 666757 Opened 14 years ago Closed 14 years ago

fix naming problems on seamicro.phx1 nodes

Categories

(mozilla.org Graveyard :: Server Operations, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nmaul, Assigned: nmaul)

Details

Nodes 10-19 and 21-29 are known to be wrong. That is, DNS and the server disagree... the hostname is one number higher than DNS. Apart from being confusing, this is probably also causing an off-by-one error with puppet.

1. Remove the <node30 nodes from the engagement cluster (drain them on http/https).
2. Remove node1-node29 from RHN (or at least the affected ones).
3. Fix the hostnames for node0-node29 (currently showing as node1-node30).
4. Re-register them with RHN.
5. Re-puppetize them, just to make sure we have the names/certs right.
6. Un-drain them.

Pro tip: puppetizing these takes forever. If you do them in a serial loop, be sure to run it inside a screen session (see the sketch below).
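A rough sketch of what I mean by a serial loop, for the record. The hostname pattern/domain and the plain "puppet agent --test" invocation are placeholders here, not necessarily the exact puppetization procedure for these hosts:

#!/usr/bin/env python
# Serially re-puppetize a range of nodes over SSH. Run this inside a
# screen session, since the loop takes a long time end-to-end.
# Assumptions: passwordless root SSH, and that "puppet agent --test"
# is the right invocation for these hosts.
import subprocess

NODES = ["node%d.seamicro.phx1.mozilla.com" % n for n in range(0, 30)]

for host in NODES:
    print("puppetizing %s ..." % host)
    rc = subprocess.call(["ssh", "root@" + host, "puppet agent --test"])
    print("%s finished with exit code %d" % (host, rc))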
Assignee: server-ops → nmaul
1. Done.
2. Done.
3. Done. Rebooted the nodes to put the change into effect (some daemons don't like changing hostnames, and this seemed like the easiest way to get everything fixed). This caused chaos with the Seamicro, though. I think there is a race condition where nodes rebooting simultaneously can screw it up... spacing the reboots out by 15+ seconds seemed to work fine (see the sketch below).
4. In progress.
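For reference, spacing out the reboots was just a loop with a sleep in it, roughly like this (hostnames and the exact delay are placeholders):

#!/usr/bin/env python
# Reboot the renamed nodes one at a time with a delay between them,
# since simultaneous reboots seem to confuse the Seamicro chassis.
import subprocess
import time

NODES = ["node%d.seamicro.phx1.mozilla.com" % n for n in range(0, 30)]

for host in NODES:
    print("rebooting %s" % host)
    subprocess.call(["ssh", "root@" + host, "shutdown -r now"])
    time.sleep(20)  # space the reboots out by 15+ seconds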
Status: NEW → ASSIGNED
4. Done.
5. In progress (screen with 2 windows on dp-nagios01... doing 2 at a time).
5. Done.
6. Not started yet... I want to double-check at least the edge cases first, to make sure they look like they're going to work properly and got the right puppet configs (quick consistency check sketched below).
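A check along these lines should catch any remaining off-by-one (the hostname pattern and domain are placeholders):

#!/usr/bin/env python
# Verify that each node's running hostname matches its DNS name, i.e.
# that the off-by-one problem is really gone everywhere.
import socket
import subprocess

NODES = ["node%d.seamicro.phx1.mozilla.com" % n for n in range(0, 30)]

for dns_name in NODES:
    try:
        socket.gethostbyname(dns_name)  # confirm the DNS record exists
    except socket.gaierror:
        print("NO DNS RECORD: %s" % dns_name)
        continue
    # Ask the node what it thinks its own FQDN is.
    reported = subprocess.check_output(
        ["ssh", "root@" + dns_name, "hostname -f"]).decode().strip()
    if reported == dns_name:
        print("ok: %s" % dns_name)
    else:
        print("MISMATCH: DNS says %s, node says %s" % (dns_name, reported))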
(In reply to comment #0)
> pro tip: puppetizing these takes forever. If you do them in a serial loop
> make sure and screen your session.

I just puppetized node81 and 82 yesterday... and had no problems (it took the "usual" amount of time). Could it be that these nodes have storage-related issues, or something else that's making them slow?
It actually wasn't too bad... I think it just seems bad if you do a bunch in serial. Of course, they're just little Atom CPUs, so they should be a bit slower than a good blade would be.

Anyway, I found another problem with the engagement cluster. For some reason:

- engagement1-20 line up with node10-29.seamicro
- engagement21 is a blade
- engagement22-30 line up with node210-218.seamicro
- engagement31-41 line up with node229-239.seamicro

There was a gap... node219-228 are assigned to the Engagement cluster in puppet, but did not have an 'engagementXX' CNAME. I wedged them in there, which shifted 31-41 down a bunch. I updated commanderconfig.py on ip-admin02.phx, and fixed root's known_hosts file to have all the proper keys for each host (a sketch for checking the CNAME-to-node mapping is below).

6. Done as well. Marking this as resolved... I don't see any more strangeness with engagement naming.
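Something like this is enough to confirm the engagementNN CNAMEs land on the seamicro nodes they're supposed to (the mapping is reconstructed from the pre-fix ranges above, and the domain suffix is a placeholder):

#!/usr/bin/env python
# Check that each engagementNN CNAME resolves to the same address as the
# seamicro node it is expected to map to.
import socket

DOMAIN = ".phx1.mozilla.com"  # assumed suffix, adjust to the real zone

def expected_pairs():
    # engagement1-20 -> node10-29.seamicro (engagement21 is a blade, skipped)
    for i in range(1, 21):
        yield "engagement%d" % i, "node%d.seamicro" % (i + 9)
    # engagement22-30 -> node210-218.seamicro
    for i in range(22, 31):
        yield "engagement%d" % i, "node%d.seamicro" % (i + 188)
    # engagement31-41 -> node229-239.seamicro (mapping before the gap fix)
    for i in range(31, 42):
        yield "engagement%d" % i, "node%d.seamicro" % (i + 198)

for cname, node in expected_pairs():
    try:
        a = socket.gethostbyname(cname + DOMAIN)
        b = socket.gethostbyname(node + DOMAIN)
    except socket.gaierror as e:
        print("lookup failed for %s / %s: %s" % (cname, node, e))
        continue
    print("%-14s -> %-22s %s" % (cname, node, "ok" if a == b else "MISMATCH"))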
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard