Closed Bug 662400 Opened 14 years ago Closed 14 years ago

Set up foopy05-11, migrate tegras from bm-foopy/foopy{01,02,03,04}

Categories: Release Engineering :: General (defect)
Platform: x86 macOS
Severity: normal
Tracking: not tracked
Status: RESOLVED FIXED
People: Reporter: mozilla; Assignee: bear
Whiteboard: [mobile][tegra]
This needs to happen over time to prevent burning, but production may see some fluctuation in performance when they move. I say we should move all staging tegras first, and then put those in production and migrate the old production ones later.
Whiteboard: [mobile][tegra]
When we migrate, we need to update the files on the tegras that specify a) the IP that Watcher pings to check that the tegra is networked, and b) the IP/port that SUT contacts when the tegra boots.
I can probably handle these in batches of 3-5 by creating helper scripts that work on a list of tegra IDs. Doing the staging ones first and making those production may only work for a small set - currently most of the staging tegras are *in* staging because they are temperamental. Making them production would be worse, IMO, than just pulling 5 or so at a time from production. Lately production has not been 100% busy except during peak times, so most of the work could be done off-peak to minimize impact.
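A batch helper of the sort described might look like this minimal sketch. Everything here is an assumption for illustration: the `[Registration Server]` ini layout, the port number, and the function names are not the actual scripts or files used.

```python
# Hypothetical batch helper: render a SUTAgent.ini for each tegra in a
# list so that each one points at its new foopy. The ini section names,
# keys, and default port below are assumptions, not the real config.

def sutagent_ini(foopy_host, port=20700):
    """Render a minimal SUTAgent.ini pointing a tegra at a new foopy."""
    return (
        "[Registration Server]\n"
        "IPAddr = %s\n"
        "PORT = %d\n" % (foopy_host, port)
    )

def batch(tegra_ids, foopy_host):
    """Map each tegra id in the batch to the ini it should receive."""
    return {tid: sutagent_ini(foopy_host) for tid in tegra_ids}

if __name__ == "__main__":
    for tid, ini in sorted(batch(["tegra-010", "tegra-011"], "foopy05").items()):
        print(tid)
        print(ini)
```

The actual push step (getting the file onto each device) would still go through SUT, so a real helper would loop over the batch and push each rendered file.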
Assigning Bear for now. Ideally this results in elegantly automated solutions to moving a tegra from one foopy to another (updating the files-on-tegra, disabling the old cp, enabling a new cp with buildslave, and updating anything else that needs updating (dashboard?)). We know we're already planning a second (third?) migration of all tegras to new-new-foopies in bug 635907, so some forward-thinking automation will hopefully see some time- and effort- savings. Something more slap-dash or by hand works as well, depending on time+effort+complexity. Definitely getting 1-10 and 91-93 back into the pool will be a win.
Assignee: nobody → bear
The foopy servers have the software environment for the tegras installed, and I'm working on a script to move each tegra's environment to the new foopy. This is needed so we can migrate a functional environment at any time without losing the logs, data, and other state information when moving a tegra from one foopy to another.
From IRC: bear is attempting to do this using new scripts, which scales better for the next set of foopies. However, if the scripts don't work by end of day, bear will fall back to manual setup.
Depends on: 647051
I have some prototypes that I've been trying to run during the day (when the tegras are busiest), but I'm finding it very hard to tell from outside buildbot whether a buildslave is idle because it has no work, or only appears idle because the tegra is busy with a job. Because this non-priority item (the script) is holding up a priority item (moving tegras), I am going to do the moves manually when the tegras are idle tonight and tomorrow night.
(In reply to comment #6)
> Because this is a non-priority item (the script) holding up a priority item
> (moving tegras) I am going to do the moves manually when the tegras are idle
> tonight and tomorrow night.

bear: per the releng meeting yesterday, this was to be completed yesterday. Any news?
The tegras were very busy during the day, and because this is a multi-step process that has to be done all at once (shut down the buildslave and cp, rename the old directory, copy a SUTAgent.ini pointing at the new foopy to the tegra, enable the new directory, start cp, monitor), I ended up getting up at 0-dark-thirty and doing 3/4 of them then. After a good staging run I am now moving them over to production.
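The all-at-once sequence above can be sketched as an ordered checklist. Every step description, hostname, and the `plan` helper below are hypothetical illustrations of the process described, not the actual tooling:

```python
# Hypothetical per-tegra move checklist, mirroring the steps described in
# the comment above. Names and wording are illustrative assumptions.

STEPS = [
    "stop the buildslave and cp for {tegra} on {old}",
    "rename the old {tegra} directory on {old}",
    "push a SUTAgent.ini pointing at {new} to {tegra}",
    "enable the {tegra} directory on {new}",
    "start cp for {tegra} on {new}",
    "monitor {tegra} until it reports to the dashboard",
]

def plan(tegra, old, new):
    """Return the ordered checklist for moving one tegra between foopies."""
    return [s.format(tegra=tegra, old=old, new=new) for s in STEPS]

if __name__ == "__main__":
    for step in plan("tegra-040", "foopy02", "foopy08"):
        print("- " + step)
```

Because the steps must not be interleaved across tegras, a batch runner would complete the whole list for one tegra before starting the next.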
All of the tegras are now running on the new foopies. I was unable to push a new SUTAgent.ini to tegra-017 and tegra-024; filed bugs 666029 and 617129 to track when they get reimaged. There is a pending question about watcher.ini that needs answering: do we need to update it and, if so, where does it live? Going to wait till morning before closing the bug.
I believe we have to update watcher.ini. Bob, Clint, Joel: do you remember where the watcher.ini is? If not I'll dig.
The file should be placed in /data/data/com.mozilla.watcher/files.
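Given that path, pushing an updated watcher.ini to each tegra could be sketched as below. The `pushFile(local, dest)` call is assumed to behave as devicemanager.py's did in the talos tree of the era; treat the whole snippet as an illustration, not the actual procedure used.

```python
# Hypothetical sketch: push an updated watcher.ini to a tegra via a
# devicemanager-style object. The pushFile API shape is an assumption.

WATCHER_DIR = "/data/data/com.mozilla.watcher/files"

def watcher_dest(filename="watcher.ini"):
    """On-device destination path for the watcher config."""
    return "%s/%s" % (WATCHER_DIR, filename)

def push_watcher_ini(dm, local_path):
    """dm: an object with pushFile(local, dest), e.g. a connected
    devicemanager instance. Pushes local_path to the watcher dir."""
    return dm.pushFile(local_path, watcher_dest())
```

A batch run would construct one `dm` per tegra in the list and call `push_watcher_ini` for each, skipping any device (like tegra-017/024 above) that refuses the push.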
Because of bug 665967 I have migrated all tegras back to the original foopies.
Depends on: 665967
I spent most of this morning double-checking that foopy08 was set up the same way as foopy02, making sure that the test environment for tegra-040 (on foopy08) was exactly the same as tegra-041 (on foopy02), and then made local changes to devicemanager.py to isolate the whole code flow up to the point where it calls SUTAgent to launch fennec. Not seeing anything obvious, I pinged the ateam and ctalbert started helping. We worked through devicemanager and the talos ffprocess_remote.py code to try to work out any differences, but found nothing. jmaher then jumped in with me for an evening debug session, and we even tested against a python-agent running on one of the talos fed 32 slaves to make sure the foopy side was behaving. After a couple of hours we could not find any reason why they failed. Still working on it with jmaher at the moment, so another update later.
It appears that we were missing this pref (http://hg.mozilla.org/build/talos/rev/1128691728d8): user_pref("browser.firstrun.show.localepicker", false);
The new foopies have now been running the majority of the tegra jobs for the last 3 hours with no "new foopy" issues. I'll be moving the rest of the tegras over during the evening and weekend.
The remainder of the tegras are now moved. foopy01-foopy04 are now idle; foopy05-foopy11 are running tegras and reporting to the dashboard.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
I've added the new foopy hosts to nagios, but have not removed the old foopy hosts.
Product: mozilla.org → Release Engineering