Use queue.claimWork in Scriptworker

RESOLVED FIXED
(NeedInfo from)

Status

Release Engineering
General Automation
RESOLVED FIXED
a year ago
8 months ago

People

(Reporter: dustin, Assigned: aki, NeedInfo)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(5 attachments)

(Reporter)

Description

a year ago
This will allow the bridge to handle task priorities properly, which we need to support running tasks on hardware.

This will replace the existing queue.claimTask process.  It's already supported in the queue.

Don't deploy this until bug 1331094 is done, as Jonas will use that bug to monitor load on the queue as this new approach is deployed.
Blocks: 1331094
No longer depends on: 1331094
No longer blocks: 1331094
Depends on: 1331094

Comment 2

10 months ago
Removing dependency on 1331094 as docker-worker has been entirely moved over which accounts for a majority of our load.  Tasks that have been created using a provisioner ID of scriptworker-prov-v1 accounts for about 1.7% of our tasks for the last 30 days so should be fine with moving over to claimWork.
No longer depends on: 1331094
(Assignee)

Comment 3

10 months ago
Created attachment 8855154 [details] [review]
claimWork
Assignee: nobody → aki
Attachment #8855154 - Flags: review?(jlorenzo)

Updated

10 months ago
Attachment #8855154 - Flags: review?(jlorenzo) → review+
(Assignee)

Comment 4

9 months ago
This bug has a reviewed PR at https://github.com/mozilla-releng/scriptworker/pull/106 .
I'm currently holding off on merging until we get https://github.com/taskcluster/taskcluster-client.py/pull/67 merged first.

Comment 5

8 months ago
Aki:

Randi is asking the status of this bug. It looks like https://github.com/taskcluster/taskcluster-client.py/pull/67 has been merged. Does this mean https://github.com/mozilla-releng/scriptworker/pull/106  can be merged as well?
Flags: needinfo?(aki)
(Assignee)

Comment 6

8 months ago
We need a taskcluster-client.py release with https://github.com/taskcluster/taskcluster-client.py/pull/67 in it, and then we'll be fully unblocked.
Flags: needinfo?(aki)

Comment 7

8 months ago
John, can you create a new release of the client that picks up that change?
Flags: needinfo?(jhford)
Sure.  i'll do that today
Flags: needinfo?(jhford)
I'm having trouble accessing my desktop in Berlin to be able to generate a release.

Josh, are you aware of any trouble with the VPN setup that would be causing consistent connection drops?  I'm accessing my desktop (jhford.corp.ber1) from Teksavvy DSL in Toronto through our VPN.
Flags: needinfo?(jhoward)
I've been able to get it to work long enough to generate the release.

Submitting dist/taskcluster-1.3.0.tar.gz to https://upload.pypi.org/legacy/
Server response (200): OK
(Assignee)

Comment 11

8 months ago
Thanks John!

I'm currently hitting integration test failures (locally; travis doesn't run integration tests) and flake8 failures [1]; I suspect these are due to new aiohttp and flake8 releases, but I'll have to narrow it down.  I'm hitting both types of test failures on both master and the claimWork branch.

[1] https://travis-ci.org/mozilla-releng/scriptworker/jobs/231238998
(Assignee)

Comment 12

8 months ago
Created attachment 8866875 [details]
integration log
(Assignee)

Comment 13

8 months ago
Downgrading to taskcluster==1.2.0 fixes the integration tests, but we lose retries.  I'm going to see if there's something in the integration tests I can fix.
(Assignee)

Comment 14

8 months ago
I think the integration bustage has to do with this line:
https://github.com/taskcluster/taskcluster-client.py/commit/83c53dcea46619858f5bd195feafee7a9a081689#diff-e3b70cb7ee9cc15ea2c36a2de136df52R62

I added that line so we'd stop getting unclosed session errors.  Ideally this won't affect production, and I can change the tests to recreate the session(s) every time we need... probably by not passing in a session in the tests.  Otherwise we may need to remove it.
(Assignee)

Comment 15

8 months ago
Created attachment 8866973 [details] [review]
tc-client.py PR 71

taskcluster-client.py PR to fix the scriptworker integration tests.
Attachment #8866973 - Flags: review?(jhford)
(Assignee)

Updated

8 months ago
Attachment #8866973 - Flags: review?(jhford) → review+
(Assignee)

Comment 16

8 months ago
scriptworker 4.0.0 released.  I need to bump signing- beetmover- balrog- and pushapk- scriptworkers to 4.0.0, then all scriptworkers will be using claimWork.
(Assignee)

Comment 17

8 months ago
Created attachment 8867795 [details] [diff] [review]
bump_scriptworker.4.0.0.diff
Attachment #8867795 - Flags: review?(mtabara)
Comment on attachment 8867795 [details] [diff] [review]
bump_scriptworker.4.0.0.diff

Review of attachment 8867795 [details] [diff] [review]:
-----------------------------------------------------------------

All packages seem to be at their place in puppet.
Lgtm, dephash++!
Attachment #8867795 - Flags: review?(mtabara) → review+
(Assignee)

Comment 20

8 months ago
https://hg.mozilla.org/build/puppet/rev/56d7fcebda3738e1a51651c61a3accce9216835c
bug 1331098 - add missing dep (pyparsing) to pushapk. r=bustage
(Assignee)

Comment 22

8 months ago
Created attachment 8867862 [details] [review]
PR 73

How does this PR look jhford?

Essentially, we were getting

May 15 11:42:31 signing-linux-1.srv.releng.use1.mozilla.com python: signing_scriptworker 2017-05-15T18:42:31  WARNING - <class 'taskcluster.exceptions.TaskclusterRestFailure'> Unknown Server Error

which isn't very informative.

This PR a) fixes the await json(), so we actually get the dict instead of a coroutine (this fixes the current issue), and b) adds some more info to the unknown server error message if we do hit it in the future.

After I patched a scriptworker with this patch locally, I got the missing scope error message logged, and I was able to fix the underlying problem.

Sorry for the churn; we're slowly but surely getting this function where it needs to be.
Attachment #8867862 - Flags: review?(jhford)
(Assignee)

Comment 23

8 months ago
I think we're good here... claimWork is rolled out.  I had to update the clientId scopes, and downgrade everything from chardet==3.0.2 back to 2.3.0 until a new chardet release is rolled out.

I'll resolve once we fix the above tc.py PR.
Comment on attachment 8867862 [details] [review]
PR 73

Looks good!  I've landed this and released 1.3.2 with this patch.  Please open a quick PR to address the review comments from #73, but I didn't want to block this on the comment there.
Attachment #8867862 - Flags: review?(jhford) → review+
(Assignee)

Comment 25

8 months ago
Thanks John!
https://github.com/taskcluster/taskcluster-client.py/pull/74 for max 1024 chars.
(Assignee)

Comment 26

8 months ago
Bumped to taskcluster 1.3.2 in bug 1365290 here: https://hg.mozilla.org/build/puppet/rev/7cfa62eda6d584c5fbda457300310e11b4e14f32

I'm planning to upgrade chardet to 3.0.3 and taskcluster to 1.3.3 tomorrow; just cleanup.
(Assignee)

Comment 27

8 months ago
Randi considers this a blocker, so I'm going to resolve and open a non-blocker for cleanup.
Status: NEW → RESOLVED
Last Resolved: 8 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.