Closed
Bug 1219341
Opened 9 years ago
Closed 9 years ago
Disable Windows triggerbot retries and discuss other matters
Categories
(Testing :: General, defect)
Tracking
(Not tracked)
RESOLVED
INVALID
People
(Reporter: armenzg, Unassigned)
philor: in general, would you mind filing bugs or pinging the people behind the services? This is not the first time I have had to overhear some of the issues you're facing. We're happy to help if you let us know.
chmanchester: could you please disable Windows auto-retry on try until the backlog reduces?
The question here is: should we disable triggerbot in certain scenarios, and if so, which ones?
Should we disable triggerbot for a platform when the oldest pending try job for that platform is N hours old?
Or when there are more than X pending jobs for that platform?
I'm going to add to mozci the ability to query the backlog of a platform for a given repository.
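To make the two proposed conditions concrete, here is a minimal Python sketch of such a check. It is not mozci or triggerbot code: `PendingJob`, `should_retrigger`, and the default thresholds (standing in for N and X, which are still undecided) are all hypothetical names and placeholder values.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable


@dataclass(frozen=True)
class PendingJob:
    """Hypothetical record for one pending try job (not a mozci type)."""
    platform: str
    submitted_at: datetime


def should_retrigger(
    pending: Iterable[PendingJob],
    platform: str,
    now: datetime,
    max_age: timedelta = timedelta(hours=4),  # placeholder for "N hours"
    max_pending: int = 1000,                  # placeholder for "X jobs"
) -> bool:
    """Return False when the platform's backlog exceeds either threshold."""
    ours = [job for job in pending if job.platform == platform]
    # Condition 1: too many pending jobs for this platform.
    if len(ours) > max_pending:
        return False
    # Condition 2: the oldest pending job has waited longer than max_age.
    if ours and now - min(job.submitted_at for job in ours) > max_age:
        return False
    return True


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    backlog = [PendingJob("win7", now - timedelta(hours=25))]
    print(should_retrigger(backlog, "win7", now))
```

The backlog query itself (the part mozci would provide) is left out on purpose; the sketch only shows how the answer could gate retriggering per platform.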
The other questions are:
* does triggerbot run against hidden jobs?
** I don't think so from inspecting philor's push
* does triggerbot limit itself?
** afaik, triggerbot has a threshold at which point it stops retrying
From Callek:
> try's auto-retry runs against hidden jobs and against windows (windows
> being severely overloaded, that all flavors of windows had a 24hour+
> backlog as of my start of the day today)
The claim about hidden jobs is not true.
... missing some logs here...
<Callek_cloud> P5 3593 try (win7) --- P5 3769 try (win8) --- P5 2895 try (winXP)
<Callek_cloud> "fun"
<Callek_cloud> didn't recover overnight: https://secure.pub.build.mozilla.org/builddata/reports/pending/pending_test_all_day.png
<Callek_cloud> philor: I expect you do, but in the slim chance you don't need results from the pending on https://treeherder.mozilla.org/#/jobs?repo=try&revision=5f45b9cad594 can you cancel for us :-)
<Callek_cloud> p.s. thanks for that trial-run
* philor loads up selfserve to cancel everyone's gtest
<catlee> philor: wait, that's still not fixed?
<philor> catlee: the patch that would have greened them bounced
<philor> and nobody had fixed the auto-retry of hidden jobs the last time I looked
<philor> and nobody cares about load or backlog enough to turn off auto-retry on platforms where we absolutely unquestionably can't afford it
<catlee> hm
<catlee> buildbot doesn't know they're hidden...
<catlee> why are they retrying?
<catlee> armenzg: can we have it do what philor wants?
<catlee> i.e. don't retry hidden jobs, or jobs on very constrained platforms?
Comment 1•9 years ago
Triggerbot was disabled for windows in bug 1208104. Triggerbot queries Treeherder for job visibility and does not trigger hidden jobs, although this is not working for GTest, I think because that job has been known by multiple buildernames, and Treeherder is aware of the first buildername it was known by. This will not be a problem once GTests are turned on, which has been an uphill battle. Bug 992983 has up-to-date status for this effort -- there is a patch awaiting review in a dependent bug (I'll ping the reviewer today).
Reporter
Comment 2•9 years ago
Thanks for the clarification!
In that case, triggerbot seems to have been blamed for no good reason and the load was natural :/
philor: is our understanding correct? Sounds like an invalid bug at this point.
From looking at the try push [1], it seems that only gtest for Linux 64 debug got retriggered.
Is there a bug filed for Treeherder?
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=5f45b9cad594&exclusion_profile=false&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=running&filter-resultStatus=pending&filter-searchStr=gtest
Comment 3•9 years ago
Bug 1199825 is the bug I filed about the treeherder issue.
Comment 4•9 years ago
Armen: how can we examine how much load is being generated by the triggerbot in general? Are the jobs flagged as such in the db, or are the logs in papertrail (or similar)?
Reporter
Comment 5•9 years ago
You can get to it by visiting:
https://dashboard.heroku.com/apps/trigger-bot/resources
and then clicking on Papertrail.
I can't check atm (you can go past sign-in) whether it shows up on the corporate Papertrail account.
FYI chmanchester is the maintainer of it.
We use different LDAP accounts (trigger-bot@ vs mozci-bot@).
We could write a report if it helps.
Comment 6•9 years ago
(In reply to Chris Cooper (away until Oct 19) [:coop] from comment #4)
> Armen: how can we examine how much load is being generated by the triggerbot
> in general? Are the jobs flagged as such in the db, or are the logs in
> papertrail (or similar)?
I have a basic script at https://github.com/chmanchester/trigger-bot/blob/master/buildapistats/dump_buildstats.py that will pull jobs from buildapi and report how many were initiated by trigger bot.
The automatic triggering is capped at 3% of all jobs for a push. The script will include things that are requested by "--rebuild N", so running it for October I see trigger-bot responsible for between 2 and 4% of all jobs on try daily.
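As a rough illustration of the two numbers above (the per-push cap and the daily share), here is a small Python sketch. It is not taken from dump_buildstats.py or from trigger-bot: the function names, the `requested_by` field, and the way the requester is matched are assumptions made for the example.

```python
from typing import Iterable, Mapping


def retrigger_budget(total_jobs_in_push: int, already_triggered: int,
                     cap_percent: int = 3) -> int:
    """How many more automatic retriggers fit under a cap of
    cap_percent of all jobs for a push (3% per the comment above)."""
    allowed = total_jobs_in_push * cap_percent // 100  # integer math, rounds down
    return max(0, allowed - already_triggered)


def triggerbot_share(jobs: Iterable[Mapping[str, str]],
                     bot_name: str = "trigger-bot") -> float:
    """Fraction of jobs whose (hypothetical) 'requested_by' field
    mentions the bot account; 0.0 for an empty job list."""
    jobs = list(jobs)
    if not jobs:
        return 0.0
    by_bot = sum(1 for job in jobs if bot_name in job.get("requested_by", ""))
    return by_bot / len(jobs)


if __name__ == "__main__":
    print(retrigger_budget(total_jobs_in_push=200, already_triggered=2))
    sample = [{"requested_by": "trigger-bot@example.invalid"},
              {"requested_by": "someone@example.invalid"}]
    print(triggerbot_share(sample))
```

Note that a share computed this way would also count jobs requested through "--rebuild N", which is why the daily figure can exceed the 3% automatic cap.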
Blocks: 1163698
Reporter
Comment 7•9 years ago
Nothing left to be done in here.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID
Comment 8•9 years ago
We have Windows disabled for trigger bot, so should this be RESOLVED FIXED? If we had decided to keep it on, then INVALID?