1219341 - Disable Windows triggerbot retries and discuss other matters

Reporter

Description

9 years ago

philor: in general, would you mind filing bugs or pinging the people behind services? This is not the first time I have had to overhear some of the issues you're facing. We're happy to help if you let us know.

chmanchester: could you please disable Windows auto-retry on try until the backlog reduces?

The question here is: should we disable triggerbot in certain scenarios, and what should those scenarios be? Should we disable triggerbot for a platform when the oldest pending try job for that platform is N hours old? When there are more than X jobs for that platform? I'm going to add to mozci the ability to query the backlog of a platform for a given repository.

The other questions are:
* does triggerbot run against hidden jobs?
** I don't think so from inspecting philor's push
* does triggerbot limit itself?
** afaik, triggerbot has a threshold at which point it stops retrying

From Callek:
> try's auto-retry runs against hidden jobs and against windows (windows
> being severely overloaded, that all flavors of windows had a 24hour+
> backlog as of my start of the day today)

The claim about hidden jobs is not true.

... missing some logs here...
<Callek_cloud> P5 3593 try (win7) --- P5 3769 try (win8) --- P5 2895 try (winXP)
<Callek_cloud> "fun"
<Callek_cloud> didn't recover overnight: https://secure.pub.build.mozilla.org/builddata/reports/pending/pending_test_all_day.png
<Callek_cloud> philor: I expect you do, but in the slim chance you don't need results from the pending on https://treeherder.mozilla.org/#/jobs?repo=try&revision=5f45b9cad594 can you cancel for us :-)
<Callek_cloud> p.s. thanks for that trial-run
* philor loads up selfserve to cancel everyone's gtest
<catlee> philor: wait, that's still not fixed?
<philor> catlee: the patch that would have greened them bounced
<philor> and nobody had fixed the auto-retry of hidden jobs the last time I looked
<philor> and nobody cares about load or backlog enough to turn off auto-retry on platforms where we absolutely unquestionably can't afford it
<catlee> hm
<catlee> buildbot doesn't know they're hidden...
<catlee> why are they retrying?
<catlee> armenzg: can we have it do what philor wants?
<catlee> i.e. don't retry hidden jobs, or jobs on very constrained platforms?
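
The two backlog conditions asked about in the description (oldest pending job older than N hours, or more than X pending jobs for a platform) could be expressed as a small policy check. The sketch below is only an illustration of that idea: `PendingJob`, `should_pause_retries`, and the threshold parameters are hypothetical names, not part of mozci, triggerbot, or buildapi, and how the pending jobs would actually be fetched is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable


@dataclass(frozen=True)
class PendingJob:
    """Hypothetical record for one pending job (not a mozci or buildapi type)."""
    platform: str
    submitted_at: datetime


def should_pause_retries(
    jobs: Iterable[PendingJob],
    platform: str,
    now: datetime,
    max_age: timedelta,
    max_pending: int,
) -> bool:
    """Return True if automatic retries should be paused for `platform`.

    Retries are paused when either condition from the description holds:
    the oldest pending job is at least `max_age` old ("N hours"), or the
    number of pending jobs exceeds `max_pending` ("X jobs").
    """
    ages = [now - job.submitted_at for job in jobs if job.platform == platform]
    if not ages:
        # Nothing pending for this platform, so there is no backlog to protect.
        return False
    return max(ages) >= max_age or len(ages) > max_pending
```

A caller would pass the pending jobs for one repository (for example try) and choose N and X per platform; the values themselves are the open question in this bug.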

Chris Manchester (limited bugmail, email directly)

Comment 1

9 years ago

Triggerbot was disabled for Windows in bug 1208104. Triggerbot queries Treeherder for job visibility and does not trigger hidden jobs. This is not working for GTest, though, I think because that job has been known by multiple buildernames and Treeherder is only aware of the first buildername it was known by. This will not be a problem once GTests are turned on, which has been an uphill battle. Bug 992983 has up-to-date status for this effort -- there is a patch awaiting review in a dependent bug (I'll ping the reviewer today).

Armen [:armenzg]

Reporter

Comment 2

9 years ago

Thanks for the clarification! In that case, triggerbot seems to have been blamed for no good reason and the load was natural :/

philor: is our understanding correct? Sounds like an invalid bug at this point. From looking at the try push [1], it seems that gtest for Linux 64 debug is the only job that got re-triggered. Is there a bug filed for Treeherder?

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=5f45b9cad594&exclusion_profile=false&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-searchStr=gtest

Chris Manchester (limited bugmail, email directly)

Comment 3

9 years ago

Bug 1199825 is the bug I filed about the treeherder issue.

Chris Cooper [:coop] (he/him)

Comment 4

9 years ago

Armen: how can we examine how much load is being generated by the triggerbot in general? Are the jobs flagged as such in the db, or are the logs in papertrail (or similar)?

Armen [:armenzg]

Reporter

Comment 5

9 years ago

You can get to it by visiting https://dashboard.heroku.com/apps/trigger-bot/resources and then clicking on Papertrail. I can't check at the moment (you can go past sign-in) whether it shows up on the corporate Papertrail account. FYI, chmanchester is the maintainer of it. We use different LDAP accounts (trigger-bot@ vs mozci-bot@). We could write a report if it helps.

Chris Manchester (limited bugmail, email directly)

Comment 6

9 years ago

(In reply to Chris Cooper (away until Oct 19) [:coop] from comment #4)
> Armen: how can we examine how much load is being generated by the triggerbot
> in general? Are the jobs flagged as such in the db, or are the logs in
> papertrail (or similar)?

I have a basic script at https://github.com/chmanchester/trigger-bot/blob/master/buildapistats/dump_buildstats.py that will pull jobs from buildapi and report how many were initiated by trigger bot. The automatic triggering is capped at 3% of all jobs for a push. The script will include things that are requested by "--rebuild N", so running it for October I see trigger-bot responsible for between 2 and 4% of all jobs on try daily.
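
To make the two numbers in this comment concrete (a 3% per-push cap on automatic triggers, and a daily share of jobs attributed to trigger-bot), here is a minimal sketch of the arithmetic. It is not the code from dump_buildstats.py or from trigger-bot itself: the function names, the `requested_by` field, and the `"trigger-bot"` marker are assumptions made purely for illustration, and the real cap may round differently.

```python
from typing import Iterable, Mapping


def retry_budget(total_jobs_in_push: int, cap_percent: int = 3) -> int:
    """Upper bound on automatic triggers for one push under a percentage cap.

    Integer arithmetic is used so the result is rounded down; the rounding
    rule is an assumption, the 3% default comes from the comment above.
    """
    return total_jobs_in_push * cap_percent // 100


def triggerbot_share(
    jobs: Iterable[Mapping[str, str]],
    bot_marker: str = "trigger-bot",  # hypothetical requester value
) -> float:
    """Fraction of jobs whose (hypothetical) `requested_by` field is the bot."""
    total = 0
    by_bot = 0
    for job in jobs:
        total += 1
        if job.get("requested_by") == bot_marker:
            by_bot += 1
    # An empty day has no jobs, so report a share of zero rather than dividing by zero.
    return by_bot / total if total else 0.0
```

Because the script also counts jobs requested with "--rebuild N", a measured daily share slightly above the 3% automatic cap is consistent with the cap being respected.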

Blocks: 1163698

Armen [:armenzg]

Reporter

Comment 7

9 years ago

Nothing left to be done in here.

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → INVALID

Joel Maher ( :jmaher ) (UTC -8)

Comment 8

9 years ago

We have Windows disabled for trigger bot, so shouldn't this be RESOLVED FIXED? If we had decided to keep it on, then INVALID?

Categories

(Testing :: General, defect)

Tracking

(Not tracked)

People

(Reporter: armenzg, Unassigned)
