Closed Bug 662400 Opened 14 years ago Closed 14 years ago

Set up foopy05-11, migrate tegras from bm-foopy/foopy{01,02,03,04}

Categories: Release Engineering :: General (defect)
Platform: x86 macOS
Severity: normal
Tracking: not tracked
Status: RESOLVED FIXED
People: Reporter: mozilla; Assignee: bear
Whiteboard: [mobile][tegra]
This needs to happen over time to prevent burning, but production may see some fluctuation in performance when they move. I say we should move all staging tegras first, and then put those in production and migrate the old production ones later.
Whiteboard: [mobile][tegra]
When we migrate, we need to update the files on the tegras that specify a) the IP that Watcher pings to check that the tegra is networked, and b) the IP/port that SUT contacts when the tegra boots.
I can probably handle these in batches of 3-5 by creating helper scripts that work on a list of tegra IDs. Doing the staging ones first and making those production may only work for a small set - currently most of the staging tegras are *in* staging because they are temperamental. Making them production would be worse, IMO, than just pulling 5 or so at a time from production. Lately production has not been 100% busy except during peak times, so most of the work could be done off-peak to minimize impact.
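A batch helper of the sort described might look like this minimal sketch. Everything here is an assumption for illustration: the `[Registration Server]` ini layout, the port number, and the function names are not the actual scripts or files used.

```python
# Hypothetical batch helper: render a SUTAgent.ini for each tegra in a
# list so that each one points at its new foopy. The ini section names,
# keys, and default port below are assumptions, not the real config.

def sutagent_ini(foopy_host, port=20700):
    """Render a minimal SUTAgent.ini pointing a tegra at a new foopy."""
    return (
        "[Registration Server]\n"
        "IPAddr = %s\n"
        "PORT = %d\n" % (foopy_host, port)
    )

def batch(tegra_ids, foopy_host):
    """Map each tegra id in the batch to the ini it should receive."""
    return {tid: sutagent_ini(foopy_host) for tid in tegra_ids}

if __name__ == "__main__":
    for tid, ini in sorted(batch(["tegra-010", "tegra-011"], "foopy05").items()):
        print(tid)
        print(ini)
```

The actual push step (getting the file onto each device) would still go through SUT, so a real helper would loop over the batch and push each rendered file.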
Assigning Bear for now. Ideally this results in elegantly automated solutions to moving a tegra from one foopy to another (updating the files-on-tegra, disabling the old cp, enabling a new cp with buildslave, and updating anything else that needs updating (dashboard?)). We know we're already planning a second (third?) migration of all tegras to new-new-foopies in bug 635907, so some forward-thinking automation will hopefully see some time- and effort- savings. Something more slap-dash or by hand works as well, depending on time+effort+complexity. Definitely getting 1-10 and 91-93 back into the pool will be a win.
Assignee: nobody → bear
The foopy servers have the software environment for the tegras installed, and I'm working on a script to move each tegra's environment to the new foopy. This is needed so we can migrate a functional environment at any time without losing the logs, data, and other state information when moving a tegra from one foopy to another.
From IRC: bear is attempting to do this using new scripts, which scales better for the next set of foopies. However, if the scripts don't work by end of day, bear will fall back to manual setup.
Depends on: 647051
I have some prototypes that I've been trying to run during the day (when the tegras are busiest), but I'm finding it very hard to tell from outside buildbot whether a buildslave is idle because it has no work, or only appears idle because the tegra is busy with a job. Because this non-priority item (the script) is holding up a priority item (moving tegras), I am going to do the moves manually when the tegras are idle tonight and tomorrow night.
(In reply to comment #6)
> Because this is a non-priority item (the script) holding up a priority item
> (moving tegras) I am going to do the moves manually when the tegras are idle
> tonight and tomorrow night.

bear: per the releng meeting yesterday, this was to be completed yesterday. Any news?
The tegras were very busy during the day, and because this is a multi-step process that has to be done all at once (shut down the buildslave and cp, rename the old directory, copy a SUTAgent.ini pointing at the new foopy to the tegra, enable the new directory, start cp, monitor), I ended up getting up at 0-dark-thirty and doing 3/4 of them then. After a good staging run I am now moving them over to production.
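The all-at-once sequence above can be sketched as an ordered checklist. Every step description, hostname, and the `plan` helper below are hypothetical illustrations of the process described, not the actual tooling:

```python
# Hypothetical per-tegra move checklist, mirroring the steps described in
# the comment above. Names and wording are illustrative assumptions.

STEPS = [
    "stop the buildslave and cp for {tegra} on {old}",
    "rename the old {tegra} directory on {old}",
    "push a SUTAgent.ini pointing at {new} to {tegra}",
    "enable the {tegra} directory on {new}",
    "start cp for {tegra} on {new}",
    "monitor {tegra} until it reports to the dashboard",
]

def plan(tegra, old, new):
    """Return the ordered checklist for moving one tegra between foopies."""
    return [s.format(tegra=tegra, old=old, new=new) for s in STEPS]

if __name__ == "__main__":
    for step in plan("tegra-040", "foopy02", "foopy08"):
        print("- " + step)
```

Because the steps must not be interleaved across tegras, a batch runner would complete the whole list for one tegra before starting the next.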
All of the tegras are now running on the new foopies. I was unable to push a new SUTAgent.ini to tegra-017 and tegra-024; filed bugs 666029 and 617129 to track when they get reimaged. There is a pending question about watcher.ini that needs answering: do we need to update it and, if so, where does it live? Going to wait till morning before closing the bug.
I believe we have to update watcher.ini. Bob, Clint, Joel: do you remember where the watcher.ini is? If not I'll dig.
The file should be placed in /data/data/com.mozilla.watcher/files.
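Given that path, pushing an updated watcher.ini to each tegra could be sketched as below. The `pushFile(local, dest)` call is assumed to behave as devicemanager.py's did in the talos tree of the era; treat the whole snippet as an illustration, not the actual procedure used.

```python
# Hypothetical sketch: push an updated watcher.ini to a tegra via a
# devicemanager-style object. The pushFile API shape is an assumption.

WATCHER_DIR = "/data/data/com.mozilla.watcher/files"

def watcher_dest(filename="watcher.ini"):
    """On-device destination path for the watcher config."""
    return "%s/%s" % (WATCHER_DIR, filename)

def push_watcher_ini(dm, local_path):
    """dm: an object with pushFile(local, dest), e.g. a connected
    devicemanager instance. Pushes local_path to the watcher dir."""
    return dm.pushFile(local_path, watcher_dest())
```

A batch run would construct one `dm` per tegra in the list and call `push_watcher_ini` for each, skipping any device (like tegra-017/024 above) that refuses the push.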
Because of bug 665967 I have migrated all tegras back to the original foopies.
Depends on: 665967
I spent most of this morning double-checking that foopy08 was set up the same way as foopy02, making sure that the test environment for tegra-040 (on foopy08) was exactly the same as tegra-041 (on foopy02), and then made local changes to devicemanager.py to isolate the whole code flow up to the point where it calls SUTAgent to launch fennec. Not seeing anything obvious, I pinged the ateam and ctalbert started helping. We worked through devicemanager and the talos ffprocess_remote.py code to try to work out any differences, but found nothing. jmaher then jumped in with me for an evening debug session, and we even tested against a python-agent running on one of the talos fed 32 slaves to make sure the foopy side was behaving. After a couple of hours we could not find any reason why they failed. Still working on it with jmaher at the moment, so another update later.
It appears that we were missing this pref (http://hg.mozilla.org/build/talos/rev/1128691728d8): user_pref("browser.firstrun.show.localepicker", false);
The new foopies have now been running the majority of the tegra jobs for the last 3 hours with no "new foopy" issues. I'll be moving the rest of the tegras over during the evening and weekend.
The remainder of the tegras are now moved. foopy01-foopy04 are now idle; foopy05-foopy11 are running tegras and reporting to the dashboard.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
I've added the new foopy hosts to nagios, but have not removed the old foopy hosts.
Product: mozilla.org → Release Engineering