Closed Bug 991236 Opened 11 years ago Closed 10 years ago

Fix StartTalos.bat and StartBuildbot.bat Scripts and update repos

Categories

(Infrastructure & Operations :: RelOps: General, task)

x86_64
All
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: q, Assigned: q)

References

Details

Attachments

(1 file, 1 obsolete file)

1.25 KB, text/plain
armenzg
: review+
Details
No description provided.
Assignee: relops → q
Blocks: 977341
Depends on: 987152
What is the actual fix? Remove /t 0 ?
Getting these batch scripts fixed will take care of a few outstanding problems:
1) runslave exiting out and no one knowing why
2) cleaning of temp dirs and possibly profiles (full directories cause tests to time out, and polluted profiles may skew results)
3) XP tester slaves failing to exit the batches correctly after a test run.
I will attach a new cross-platform (Win7/8/XP) startTalos.bat script to this bug for review and then move on to startbuildbot.bat.
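A minimal sketch of what the temp/profile cleanup in such a cross-platform script could look like; the directories below are assumed examples, not the contents of the script attached later:

rem --- cleanup sketch (assumed paths) ---
rem Empty the slave's temp directory so full directories don't time tests out.
if exist "%TEMP%" (
    del /q /f "%TEMP%\*.*" 2>nul
    for /d %%D in ("%TEMP%\*") do rmdir /s /q "%%D"
)
rem Remove a stale talos profile so polluted profiles don't skew results.
rem The profile location is an assumed example only.
if exist "C:\slave\talos-data\talos\profile" rmdir /s /q "C:\slave\talos-data\talos\profile"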
Blocks: 918507
Attached file start_talos_new.bat (obsolete) —
New Talos .bat for cleanup and logging, written with input from releng, to replace the platform-specific and outdated .bat script.
Attachment #8405571 - Flags: review?(armenzg)
Comment on attachment 8405571 [details] start_talos_new.bat
My only off-the-cuff comment would be that I fear runslave.log getting too long. Can we do *something* like:
mv runslave.log runslave.log.old
tail -n500 runslave.log.old > runslave.log
at the start, to essentially trim the existing log?
Comment on attachment 8405571 [details] start_talos_new.bat
Q: this makes sense:
mv runslave.log runslave.log.old
tail -n500 runslave.log.old > runslave.log
On another note, could we deploy this change a few machines at a time? I fear we don't know how long the rmdir will take. We should also coordinate with buildduty so they know which machines to ignore if they are not taking jobs for a while. Thanks Q!
Attachment #8405571 - Flags: review?(armenzg) → review+
How about:
mv runslave.log.old runslave.log.old.1
mv runslave.log runslave.log.old
Then we keep two runs and we aren't dependent on the GNU port of tail? We can assign to a few machines at a time. I can also run a background find to clean up files in those directories with an atime older than X days and delete them. That should be fairly low overhead and safe. Q
We have tail on the machines under C:\mozilla-build\msys\bin, IIUC. Whatever you prefer; that second approach would only keep track of the last two runs. I'm worried about background removals, as I don't know how they could affect running jobs or perf jobs (I assume nice -19 would be OK if we had it). If we do batches of machines (5-10 at a time), I don't think we would need to worry much about using background removals. It would be like taking machines down for a bit for maintenance. Does this work for you?
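A sketch of what that background find could look like, using the msys GNU find rather than Windows' own find.exe; the target directory, the 7-day atime window, and the low-priority background start are assumed example values:

rem Background cleanup of files not accessed in the last 7 days (assumed values).
set MSYSBIN=C:\mozilla-build\msys\bin
rem Using the full msys path avoids picking up Windows' unrelated find.exe from the PATH.
start "" /b /low "%MSYSBIN%\find.exe" C:/slave/talos-data -type f -atime +7 -delete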
Attached file startTalos.bat
Added in the tail roll (after much debate, I don't mind being bound to msys tools). Also added a variable block and comments.
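For illustration, a sketch of that layout (variable block up front, then the mv/tail log roll); the variable names, paths, and retained line count are assumed examples rather than the attachment's actual contents:

rem ----- variable block (assumed names and paths) -----
set MSYSBIN=C:\mozilla-build\msys\bin
set SLAVEDIR=C:\slave
set LOGFILE=%SLAVEDIR%\runslave.log
set KEEPLINES=500

rem ----- roll and trim the slave log before starting the run -----
if exist "%LOGFILE%" (
    "%MSYSBIN%\mv.exe" -f "%LOGFILE%" "%LOGFILE%.old"
    "%MSYSBIN%\tail.exe" -n %KEEPLINES% "%LOGFILE%.old" > "%LOGFILE%"
)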
Attachment #8405571 - Attachment is obsolete: true
Attachment #8408409 - Flags: review?(armenzg)
Comment on attachment 8408409 [details] startTalos.bat
It looks good. Could we deploy this change to 30 machines a day? I know it sucks, but it will ensure that we don't cause delays, depending on how long this runs. We might even get a rough idea from the first batch of whether it is that much of an impact.
Attachment #8408409 - Flags: review?(armenzg) → review+
I'm presuming rolling this out will mean doing manual-foo on each box? If so, would it make sense to combine it with the rollout of bug 961075?
No manual intervention. We can select a subset of machines through Windows' GPO.
I will roll this out to the first 10 machines (001 - 010) in each pool (XP, 7, and 8), starting with the next reboot. Does that work for everyone?
WFM. BTW, I meant 30 from each test pool. Let's see how these 10 do and gear up for larger batches on the following sets? Could you please comment in here with the time when this gets deployed to the machines? I would like to look into how long it takes them to come back from their last job. Roughly. Thanks Q!
t-w864-ix-003 is loaned (though probably a no-longer-used loan), t-xp32-ix-008 is disabled, and you have t-w732-ix-003 and t-w732-ix-004, so that'll be fewer than 10 and perhaps a bit of a surprise for the loaner.
How about we start with 10 - 20? Q
Much better looking span, only missing the busted and disabled t-w864-ix-020.
Great, those machines should pick up the changes on the next reboot.
OK I will review them in the next couple of hours.
To clarify, it was easier to pattern match *-IX*-01*, so machines 010 - 019 in each OS pool will get the update.
I won't be able to evaluate how many we can do in every batch, as the machines have not picked up a job since the change got deployed. I hope they will pick up a job sometime later in the day once the cleanup finishes. If anyone wants to figure it out in my absence, this is what I was going to do:
* Load these 3 pages:
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=test&type=t-xp32-ix
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=test&type=t-w732-ix
https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slavetype.html?class=test&type=t-w864-ix
* Sort the slaves by name
* Open each slave in the range indicated in comment 19
** e.g. https://secure.pub.build.mozilla.org/builddata/reports/slave_health/slave.html?class=test&type=t-xp32-ix&name=t-xp32-ix-010
* Look at the end time of the last job that finished around 3pm PT
* Figure out how much of a gap there is to the next set of jobs
I hope the gap is not hours. Thanks Q.
Based on IRC conversations I think we are ready to roll this out on all testers. Any objections?
[09:59] <RyanVM> jlund|buildduty: we didn't see any test bustage this time around either
[09:59] <RyanVM> so I'd be OK with a wider rollout
[10:00] <RyanVM> Q: once the Windows slaves are good to go with this cleanup work, I'd love the OSX slaves to get it next
[10:00] <jlund|buildduty> RyanVM: thanks, verifying with you guys should have been my 1st step. :)
Rolling out pool-wide. Testers will get the changes on the next reboot.
Severity: critical → normal
Has this been fixed now? I'm asking because I want to re-enable a test in bug 918507.
This has indeed been fixed.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED