Create dedicated task routes for Pernosco
Categories: Firefox Build System :: Task Configuration, task
Tracking: firefox-esr115 fixed, firefox-esr128 fixed, firefox128 fixed, firefox129 fixed, firefox130 fixed
People: Reporter: ahal, Assigned: ahal
References: Blocks 2 open bugs
Attachments (5 files)
- 48 bytes, text/x-phabricator-request
- 48 bytes, text/x-phabricator-request (phab-bot: approval-mozilla-beta+)
- 48 bytes, text/x-phabricator-request (phab-bot: approval-mozilla-release+)
- 48 bytes, text/x-phabricator-request (phab-bot: approval-mozilla-esr128+)
- 48 bytes, text/x-phabricator-request (phab-bot: approval-mozilla-esr115+)
The Pernosco service interfaces with Firefox-CI via pulse exchanges. Currently, they listen to all tasks on the taskcluster-queue/v1/task-completed exchange, fetch each task definition, filter down to only the supported tasks, and then run their analysis on those.
However, this means their consumers need to process ~10k tasks per hour. If their consumers ever fall over, or network problems prevent messages from being delivered, it starts to cause problems in our RabbitMQ instances, which can bring the whole Firefox-CI Taskcluster instance to a halt. We're planning some longer-term improvements to make sure this won't happen.
But in the shorter term, a simpler fix is to add a dedicated pernosco route to only the tasks that are supported, and only when the PERNOSCO env is enabled. This should reduce the volume from many tens of thousands of tasks per day to merely dozens. In effect, it moves the logic for deciding which tasks to analyze from the Pernosco side over to the Mozilla side.
The Pernosco pulse consumers would need to be updated to use this new exchange rather than the queue service's exchange. We can leave the PERNOSCO env in place for now to preserve backwards compatibility, so that the Pernosco consumers can be updated at a later time.
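To make the proposal concrete, here is a minimal sketch of the Mozilla-side decision: add a route only when a task is supported and has the PERNOSCO env set. Everything specific in it is an assumption for illustration. The route string, the "supported" check (a platform prefix), the env value "1", and the task layout are hypothetical stand-ins, not what the actual patch does.

```python
from copy import deepcopy

# Hypothetical route name; the real routing key is whatever the patch defines.
PERNOSCO_ROUTE = "notify.pulse.example-pernosco.on-completed"

# Hypothetical stand-in for "tasks that are supported".
SUPPORTED_PLATFORM_PREFIXES = ("linux64",)


def add_pernosco_route(task):
    """Return a copy of `task` with the Pernosco route added when it qualifies."""
    task = deepcopy(task)
    env = task.get("payload", {}).get("env", {})
    platform = task.get("extra", {}).get("platform", "")

    supported = platform.startswith(SUPPORTED_PLATFORM_PREFIXES)
    enabled = env.get("PERNOSCO") == "1"  # assumed encoding of "enabled"

    if supported and enabled:
        routes = task.setdefault("routes", [])
        if PERNOSCO_ROUTE not in routes:
            routes.append(PERNOSCO_ROUTE)
    return task


example = {
    "payload": {"env": {"PERNOSCO": "1"}},
    "extra": {"platform": "linux64/debug"},
}
routed = add_pernosco_route(example)
```

Because the filtering happens when the task is defined, a consumer listening on the dedicated route never sees the unsupported tasks at all.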
Comment 1 • 1 year ago (Assignee)
This is blocked on https://github.com/taskcluster/taskcluster/pull/7069/files being deployed to fxci (currently scheduled for June 27th, 2024).
Comment 2 • 1 year ago (Assignee)
This sets up a route on Pernosco tasks such that they will emit pulse messages
over the notify service's exchange with a dedicated routing key. This will allow
the Pernosco pulse consumer to receive only tasks that should be recorded.
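As a rough illustration of why a dedicated routing key cuts the consumer's load, the sketch below implements AMQP-style topic matching in plain Python and compares a catch-all binding with a narrow one. The routing keys are made up for the example and are not the keys Taskcluster or the patch actually uses. The matcher is a simplified model of topic-exchange semantics, not RabbitMQ's implementation.

```python
def topic_matches(pattern, routing_key):
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""

    def match(p, k):
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == "#":
            # '#' may absorb zero or more words of the key.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return (head == "*" or head == k[0]) and match(rest, k[1:])

    return match(pattern.split("."), routing_key.split("."))


# Hypothetical routing keys for three published messages.
published = [
    "primary.task-a.0.worker.linux-test",
    "primary.task-b.0.worker.win-build",
    "route.example-pernosco",
]

# A catch-all binding receives every message and must filter client-side.
received_all = [k for k in published if topic_matches("#", k)]

# A binding on the dedicated key receives only the relevant message.
received_dedicated = [
    k for k in published if topic_matches("route.example-pernosco", k)
]
```

With the narrow binding, the broker only queues messages for tasks that should be recorded, so a stalled consumer backs up far fewer messages.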
Comment 3 • 1 year ago (Assignee)
Once this lands, I'm going to need to uplift this to all branches.
Comment 5 • 1 year ago (bugherder)
Comment 6 • 1 year ago (Assignee)
This sets up a route on Pernosco tasks such that they will emit pulse messages
over the notify service's exchange with a dedicated routing key. This will allow
the Pernosco pulse consumer to receive only tasks that should be recorded.
Original Revision: https://phabricator.services.mozilla.com/D215967
Comment 7 • 1 year ago
beta Uplift Approval Request
- User impact if declined: Higher risk of Taskcluster outages
- Code covered by automated testing: yes
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: N/A
- Risk associated with taking this patch: None
- Explanation of risk level: automation only
- String changes made/needed: N/A
- Is Android affected?: no
Comment 8 • 1 year ago (Assignee)
This sets up a route on Pernosco tasks such that they will emit pulse messages
over the notify service's exchange with a dedicated routing key. This will allow
the Pernosco pulse consumer to receive only tasks that should be recorded.
Original Revision: https://phabricator.services.mozilla.com/D215967
Comment 9 • 1 year ago
release Uplift Approval Request
- User impact if declined: Higher risk of Taskcluster outages
- Code covered by automated testing: yes
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: N/A
- Risk associated with taking this patch: None
- Explanation of risk level: automation only
- String changes made/needed: N/A
- Is Android affected?: no
Comment 10 • 1 year ago (Assignee)
This sets up a route on Pernosco tasks such that they will emit pulse messages
over the notify service's exchange with a dedicated routing key. This will allow
the Pernosco pulse consumer to receive only tasks that should be recorded.
Original Revision: https://phabricator.services.mozilla.com/D215967
Comment 11 • 1 year ago
esr128 Uplift Approval Request
- User impact if declined: Higher risk of Taskcluster outage
- Code covered by automated testing: yes
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: N/A
- Risk associated with taking this patch: None
- Explanation of risk level: automation only
- String changes made/needed: N/A
- Is Android affected?: no
Comment 12 • 1 year ago (Assignee)
This sets up a route on Pernosco tasks such that they will emit pulse messages
over the notify service's exchange with a dedicated routing key. This will allow
the Pernosco pulse consumer to receive only tasks that should be recorded.
Original Revision: https://phabricator.services.mozilla.com/D215967
Comment 13 • 1 year ago
esr115 Uplift Approval Request
- User impact if declined: Higher risk of Taskcluster outage
- Code covered by automated testing: yes
- Fix verified in Nightly: yes
- Needs manual QE test: no
- Steps to reproduce for manual QE testing: N/A
- Risk associated with taking this patch: None
- Explanation of risk level: automation only
- String changes made/needed: N/A
- Is Android affected?: no
Comment 14 • 1 year ago (Assignee)
This bug is changing the interface between mozilla-central and Pernosco. Because the Pernosco consumer lives out of tree, we need this change to land on all branches.
If we don't land on all branches, then the Pernosco consumer will need to continue using the old interface (which has been responsible for some TC outages by causing OOMs in our RabbitMQ instance for pulse). Alternatively, not taking these patches would mean developers would no longer be able to use Pernosco on older branches.
Comment 15 • 1 year ago
Are you planning on attaching more work here? I'm wondering why it's set to leave-open.
Comment 16 • 1 year ago (uplift)
Comment 17 • 1 year ago (uplift)
Comment 18 • 1 year ago (uplift)
Comment 19 • 1 year ago (uplift)
Comment 20 • 1 year ago (uplift)
Comment 21 • 1 year ago (Assignee)
Nope, I guess I set it to wait for the uplifts to land. There's work that needs to block on this landing everywhere.
Thanks for landing!