Closed Bug 1331098 Opened 7 years ago Closed 7 years ago

Use queue.claimWork in Scriptworker

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: mozilla)

References

()

Details

Attachments

(5 files)

This will allow the bridge to handle task priorities properly, which we need to support running tasks on hardware.

This will replace the existing queue.claimTask process.  It's already supported in the queue.

Don't deploy this until bug 1331094 is done, as Jonas will use that bug to monitor load on the queue as this new approach is deployed.
Blocks: 1331094
No longer depends on: 1331094
No longer blocks: 1331094
Depends on: 1331094
Removing dependency on 1331094 as docker-worker has been entirely moved over which accounts for a majority of our load.  Tasks that have been created using a provisioner ID of scriptworker-prov-v1 accounts for about 1.7% of our tasks for the last 30 days so should be fine with moving over to claimWork.
No longer depends on: 1331094
Attached file claimWork
Assignee: nobody → aki
Attachment #8855154 - Flags: review?(jlorenzo)
Attachment #8855154 - Flags: review?(jlorenzo) → review+
This bug has a reviewed PR at https://github.com/mozilla-releng/scriptworker/pull/106 .
I'm currently holding off on merging until we get https://github.com/taskcluster/taskcluster-client.py/pull/67 merged first.
Aki:

Randi is asking the status of this bug. It looks like https://github.com/taskcluster/taskcluster-client.py/pull/67 has been merged. Does this mean https://github.com/mozilla-releng/scriptworker/pull/106  can be merged as well?
Flags: needinfo?(aki)
We need a taskcluster-client.py release with https://github.com/taskcluster/taskcluster-client.py/pull/67 in it, and then we'll be fully unblocked.
Flags: needinfo?(aki)
John, can you create a new release of the client that picks up that change?
Flags: needinfo?(jhford)
Sure.  i'll do that today
Flags: needinfo?(jhford)
I'm having trouble accessing my desktop in Berlin to be able to generate a release.

Josh, are you aware of any trouble with the VPN setup that would be causing consistent connection drops?  I'm accessing my desktop (jhford.corp.ber1) from Teksavvy DSL in Toronto through our VPN.
Flags: needinfo?(jhoward)
I've been able to get it to work long enough to generate the release.

Submitting dist/taskcluster-1.3.0.tar.gz to https://upload.pypi.org/legacy/
Server response (200): OK
Thanks John!

I'm currently hitting integration test failures (locally; travis doesn't run integration tests) and flake8 failures [1]; I suspect these are due to new aiohttp and flake8 releases, but I'll have to narrow it down.  I'm hitting both types of test failures on both master and the claimWork branch.

[1] https://travis-ci.org/mozilla-releng/scriptworker/jobs/231238998
Attached file integration log
Downgrading to taskcluster==1.2.0 fixes the integration tests, but we lose retries.  I'm going to see if there's something in the integration tests I can fix.
I think the integration bustage has to do with this line:
https://github.com/taskcluster/taskcluster-client.py/commit/83c53dcea46619858f5bd195feafee7a9a081689#diff-e3b70cb7ee9cc15ea2c36a2de136df52R62

I added that line so we'd stop getting unclosed session errors.  Ideally this won't affect production, and I can change the tests to recreate the session(s) every time we need... probably by not passing in a session in the tests.  Otherwise we may need to remove it.
Attached file tc-client.py PR 71
taskcluster-client.py PR to fix the scriptworker integration tests.
Attachment #8866973 - Flags: review?(jhford)
Attachment #8866973 - Flags: review?(jhford) → review+
scriptworker 4.0.0 released.  I need to bump signing- beetmover- balrog- and pushapk- scriptworkers to 4.0.0, then all scriptworkers will be using claimWork.
Attachment #8867795 - Flags: review?(mtabara)
Comment on attachment 8867795 [details] [diff] [review]
bump_scriptworker.4.0.0.diff

Review of attachment 8867795 [details] [diff] [review]:
-----------------------------------------------------------------

All packages seem to be at their place in puppet.
Lgtm, dephash++!
Attachment #8867795 - Flags: review?(mtabara) → review+
Attached file PR 73
How does this PR look jhford?

Essentially, we were getting

May 15 11:42:31 signing-linux-1.srv.releng.use1.mozilla.com python: signing_scriptworker 2017-05-15T18:42:31  WARNING - <class 'taskcluster.exceptions.TaskclusterRestFailure'> Unknown Server Error

which isn't very informative.

This PR a) fixes the await json(), so we actually get the dict instead of a coroutine (this fixes the current issue), and b) adds some more info to the unknown server error message if we do hit it in the future.

After I patched a scriptworker with this patch locally, I got the missing scope error message logged, and I was able to fix the underlying problem.

Sorry for the churn; we're slowly but surely getting this function where it needs to be.
Attachment #8867862 - Flags: review?(jhford)
I think we're good here... claimWork is rolled out.  I had to update the clientId scopes, and downgrade everything from chardet==3.0.2 back to 2.3.0 until a new chardet release is rolled out.

I'll resolve once we fix the above tc.py PR.
Comment on attachment 8867862 [details] [review]
PR 73

Looks good!  I've landed this and released 1.3.2 with this patch.  Please open a quick PR to address the review comments from #73, but I didn't want to block this on the comment there.
Attachment #8867862 - Flags: review?(jhford) → review+
Bumped to taskcluster 1.3.2 in bug 1365290 here: https://hg.mozilla.org/build/puppet/rev/7cfa62eda6d584c5fbda457300310e11b4e14f32

I'm planning to upgrade chardet to 3.0.3 and taskcluster to 1.3.3 tomorrow; just cleanup.
Randi considers this a blocker, so I'm going to resolve and open a non-blocker for cleanup.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Component: General Automation → General
Flags: needinfo?(jhoward)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: