Closed
Bug 1164214
Opened 9 years ago
Closed 9 years ago
Set up a 5 machine puppet configured 2008 datacenter test pool in try
Categories
(Infrastructure & Operations :: RelOps: Puppet, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: markco, Assigned: markco)
References
Details
(Whiteboard: [windows])
Setting up a 5-machine test pool of Puppet-configured 2008 machines, as per the IRC conversation with coop. Going to be using b-2008-ix-017(5-9).
Assignee
Updated•9 years ago
Assignee: relops → mcornmesser
Assignee
Comment 1•9 years ago
0175 has been re-imaged and enabled. The other 4 machines need to finish their current builds and then be reimaged.
Assignee
Comment 2•9 years ago
All machines except 0177 have been reimaged and enabled. 0177 is currently being reimaged.
Assignee
Updated•9 years ago
Summary: Set up a 5 machine puppet configured 2008 test pool in try → Set up a 5 machine datacenter puppet configured 2008 test pool in try
Assignee
Comment 3•9 years ago
0177 is now enabled.
Summary: Set up a 5 machine datacenter puppet configured 2008 test pool in try → Set up a 5 machine puppet configured 2008 datacenterdatacenter test pool in try
Assignee
Comment 4•9 years ago
The majority of builds are green, and the failing builds appear to be caused by the builds themselves rather than by the machines. A few builds had failures or warnings, but treeherder showed the same parts failing and warning on other Windows builders.
Assignee
Updated•9 years ago
Summary: Set up a 5 machine puppet configured 2008 datacenterdatacenter test pool in try → Set up a 5 machine puppet configured 2008 datacenter test pool in try
Comment 5•9 years ago
I disabled your b-2008-ix-0176 because it has failed every job but one for the last several days, timing out after 4800 seconds without output when it starts the actual build. Please decide whether you want to fix it, or to give it back to the pool to have buildduty treat it as broken hardware and get diagnostics run on it, because we're getting pretty desperate for working Windows try build slaves.
Flags: needinfo?(mcornmesser)
Comment 6•9 years ago
These are currently failing their puppet runs. Mark was out today but will look at them when he gets back tomorrow.
Comment 7•9 years ago
Huh, all the rest of them are actually working - they can fail puppet and still happily take jobs? They also seem to be running Mercurial 2.9.1, substantially behind the 3.2.1 of non-puppet.
Assignee
Comment 8•9 years ago
I am looking into this now. If the machines are still taking jobs, then Puppet eventually ran without failures. They are set up to run Puppet forever or until a successful run.
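For illustration only, the "run Puppet forever or until a successful run" behaviour described above can be sketched as a simple retry loop. The function name, the exit-code convention (0 means success), and the delay parameter are assumptions made for this sketch; it is not the actual startup script used on these machines.

```python
import time
from typing import Callable


def run_until_success(
    run_once: Callable[[], int],
    delay_seconds: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call run_once repeatedly until it returns 0.

    run_once is a hypothetical stand-in for one Puppet run and is
    assumed to return a process-style exit code (0 = success).
    Returns the number of attempts that were needed.
    """
    attempts = 0
    while True:
        attempts += 1
        if run_once() == 0:
            return attempts
        # The run failed: wait, then try again. The machine would not
        # start taking jobs until this loop exits.
        sleep(delay_seconds)
```

Under this model a machine that is taking jobs must have left the loop, which matches the reasoning in the comment: earlier failed runs do not prevent a later successful one.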
Assignee
Comment 9•9 years ago
As for 0176: the new network configurations have landed in Puppet. I am going to reimage it and test whether I see any network issues. If I do, then it may be hardware.
Assignee
Comment 10•9 years ago
Network-wise, the machine seems fine now. A 1.5 GB file transfer within the datacenter ran at 50+ MB/s, and the same transfer out to S3 ran at 40+ MB/s. I am going to re-enable the machine in the morning and keep an eye on it.
Updated•9 years ago
Whiteboard: [windows]
Assignee
Comment 11•9 years ago
It seems that all but 0078 have been returned to the domain and re-enabled. I am going to re-enable 0078 and keep an eye on it.
Assignee
Comment 12•9 years ago
Disregard comment 11. Sendchange appears to have been addressed and hg 3.2.1 has been installed through Puppet. I am reimaging and reenabling this test pool and will keep an eye out for any failures.
Comment 13•9 years ago
Hi Mark, we have found a host that is failing to do sendchanges (even though the job happily shows as green). Please disable all hosts on try until we can find a fix. Otherwise, developers will get frustrated not knowing why their green Windows builds do not trigger test jobs.
Blocks: 1186586
Assignee
Comment 14•9 years ago
Done. It was a single machine, currently 0175. Could you give me a link to a log that shows the failure, please?
Comment 15•9 years ago
It's in bug 1186586. Thanks Mark! Good luck!
Assignee
Comment 16•9 years ago
The sendchange issue has been addressed. I am going to enable 0175, and if all goes well with it I will spin up the rest of the test pool.
Comment 17•9 years ago
Looks like 0175-0179 were enabled on try, but they have staging keys and fail to upload to stage.mozilla.org (the build is still green though, for some reason). I've disabled them again. When these have been swapped to production keys, please double check that try slaves only have trybld and b2gtry keys, and not anything for ffxbld, tbirdbird, or b2gbld.
Comment 18•9 years ago
That should read '... and not anything for ffxbld, tbirdbld, b2gbld.'
Assignee
Comment 19•9 years ago
The slavealloc environment was set to dev/pp, which overrode the slavetrust level set in Puppet. It is now set to prod. I will verify which keys are there before enabling the slaves.
Assignee
Comment 20•9 years ago
On 0176, after the slavealloc update, there are b2gtry_dsa and trybld_dsa and none of the keys mentioned in comment 18. I enabled the machine for one build:
http://buildbot-master83.bb.releng.scl3.mozilla.com:8101/builders/WINNT%206.1%20x86-64%20try%20build/builds/6242
I am not seeing any traceback or errors regarding the upload or keys here:
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/dwillcoxon@mozilla.com-c4e957b01933/try-win64/try-win64-bm83-try1-build6242.txt.gz
The above is a raw log from here:
https://treeherder.mozilla.org/logviewer.html#?job_id=10090623&repo=try
nthomas: Does it seem safe to enable these machines after the next Puppet run?
Flags: needinfo?(nthomas)
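As an illustration of the key check requested in comment 17 and carried out in comment 20, here is a minimal sketch of an allowlist audit over key file names. It assumes that key files are named `<account>_<type>` (as with `trybld_dsa` and `b2gtry_dsa` above); the function name and that naming rule are assumptions for this sketch, not a description of how the check was actually performed.

```python
from typing import Iterable, List

# Accounts that comment 17 says a try slave is allowed to hold keys for.
ALLOWED_ACCOUNTS = {"trybld", "b2gtry"}


def unexpected_keys(filenames: Iterable[str]) -> List[str]:
    """Return, sorted, the key file names whose account is not allowed.

    The account is assumed to be the part of the file name before the
    first underscore, e.g. "trybld" in "trybld_dsa".
    """
    return sorted(
        name
        for name in filenames
        if name.split("_", 1)[0] not in ALLOWED_ACCOUNTS
    )
```

An empty result means that only try keys are present. Any `ffxbld`, `tbirdbld`, or `b2gbld` key would be listed, as would any other key that is not on the allowlist.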
Comment 21•9 years ago
0176 looks good to me, and that job uploaded fine (see the lines leading up to "11:39:30 INFO - Running post-upload command: post_upload.py ..."). In the case of errors, the overall build status doesn't change due to bug 1118778. Thanks for figuring it out.
Flags: needinfo?(nthomas)
Assignee
Comment 22•9 years ago
No longer applicable.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED