Support Firefox Translations in Taskcluster
Categories: Firefox Build System :: Task Configuration, task
Tracking: Not tracked
People: Reporter: ahal, Assigned: bhearsum
Attachments: 15 files, 2 obsolete files
This will involve setting up Taskcluster for https://github.com/mozilla/firefox-translations-training, as well as creating a new GPU-enabled worker pool on which to train the machine learning model.
Comment 1 (Reporter) • 2 years ago
Depends on D172451
Comment 5 (Assignee) • 2 years ago
Comment 7 (Assignee) • 2 years ago
Comment 9 (Assignee) • 2 years ago
Comment 10 • 2 years ago
Comment 11 (Assignee) • 2 years ago
Some of the toolchains we're building are already large enough to warrant this, and we'll probably end up using these for some of the more CPU-intensive parts of the training pipeline.
Comment 12 • 2 years ago
Comment 13 (Assignee) • 2 years ago
Comment 14 • 2 years ago
Comment 15 (Assignee) • 2 years ago
Comment 16 • 2 years ago
Updated (Assignee) • 2 years ago
Comment 17 (Assignee) • 2 years ago
We'll probably push my initial work to the main repo soon, and I hope we'll be able to iterate further in PRs. Given this, it's important that PR tasks can't stomp on caches used by on-push or action tasks.
Updated • 2 years ago
Comment 18 • 2 years ago
Comment 19 (Assignee) • 2 years ago
I'll add GPU workers after we stabilize them on level 1.
Comment 20 • 2 years ago
Comment 21 (Assignee) • 2 years ago
Comment 22 • 2 years ago
Comment 23 (Assignee) • 2 years ago
(In reply to Pulsebot from comment #22)
> Pushed by bhearsum@mozilla.com:
> https://hg.mozilla.org/ci/ci-configuration/rev/7d5ed3aea4e2
> revert translations GPUworker patch because of issues creating the new provider. r=releng-reviewers,gabriel
I backed this out due to this error when deploying:
Error: Identity and Access Management (IAM) API has not been used in project 559515877712 before or it is disabled. Enable it by visiting https://console.developers.google.com/apis/api/iam.googleapis.com/overview?project=559515877712 then retry. If you enabled this API recently, wait a few minutes for the action to propagate to our systems and retry.
    at Gaxios._request (/app/node_modules/gaxios/build/src/gaxios.js:129:23)
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
    at async JWT.requestAsync (/app/node_modules/google-auth-library/build/src/auth/oauth2client.js:368:18)
    at async GoogleProvider.setup (/app/services/worker-manager/src/providers/google.js:84:35)
    at async Providers.setupProvider (/app/services/worker-manager/src/providers/index.js:75:7)
Comment 24 (Assignee) • 2 years ago
Comment 25 • 2 years ago
Comment 26 (Assignee) • 2 years ago
Updated • 2 years ago
Comment 27 (Assignee) • 2 years ago
We'll need this to start doing "production" training once https://github.com/mozilla/firefox-translations-training/pull/115 lands.
Comment 28 • 2 years ago
Comment 29 (Assignee) • 2 years ago
I see we usually grant only specific hooks, but in this case I think granting all of them is appropriate, since it would also let him cancel and rerun tasks.
Updated • 2 years ago
Comment 30 • 2 years ago
Comment 31 (Assignee) • 2 years ago
Comment 32 (Assignee) • 2 years ago
Now that two people are working on this and there are multiple training tasks, we often bump up against the maximum of 2.
Depends on D180375
Comment 33 (Assignee) • 2 years ago
Two reasons for doing this:
- To make sure multiple GPUs work with our Taskcluster pipeline
- To speed up development :)
Depends on D180376
Updated • 2 years ago
Comment 34 • 2 years ago
Comment 35 (Assignee) • 2 years ago
Our initial port of the pipeline is now complete and working in Taskcluster. Some follow-up fixes and improvements will definitely still be needed; I've filed bug 1844556 to track those in a central place.