1903235 - Investigate isolation of taskcluster exchanges/queues from the rest by vhost

Because anyone can create Pulse accounts and queues that listen to all messages, core publishers are at risk of being affected by a large number of uncontrolled queues.

When Taskcluster publishes a message to one of its exchanges, RabbitMQ (RMQ) has to route that message to every queue with a matching binding. Only after the message has been delivered to those queues does RMQ send a confirmation to the publisher.
If one of the nodes in the cluster is under load, or some queue cannot accept a new incoming message, the whole process is delayed, and the producer might fail with a deadline-exceeded timeout (12s).
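
A rough way to picture the problem described above is the toy model below. It is only a sketch and assumes the confirm is held back until the slowest routed queue has accepted the message; the queue names and timings are made up for illustration and are not measurements from Pulse.

```python
# Toy model (assumption): a publisher confirm is sent only after every
# queue the message was routed to has accepted it, so the slowest queue
# determines how long the publisher waits.
DEADLINE_S = 12.0  # publisher timeout quoted in the description


def confirm_latency(queue_accept_times):
    """Seconds until the confirm, modelled as the slowest queue's accept time."""
    return max(queue_accept_times, default=0.0)


def publish_ok(queue_accept_times, deadline=DEADLINE_S):
    """True if the confirm would arrive before the publisher deadline."""
    return confirm_latency(queue_accept_times) <= deadline


# Hypothetical per-queue accept times in seconds.
core = {"core-a": 0.02, "core-b": 0.05}
external = {"ext-slow": 15.0}

shared_vhost = list(core.values()) + list(external.values())
print(publish_ok(shared_vhost))         # False: one slow external queue stalls the confirm
print(publish_ok(list(core.values())))  # True: only core queues are on the confirm path
```

In this model a single slow external queue is enough to push the publisher past its deadline, which is the risk the vhost split is meant to remove.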

To minimize the risk of waiting for messages to propagate to external (non-core) queues, we can experiment with separating core and non-core queues by vhost.

Idea to test:

FxCI gets a dedicated fxci vhost in which only it is allowed to publish and create queues.
The federation plugin is set up to forward all messages from the fxci vhost to the existing / vhost (or a new one). Federation should be asynchronous, so it wouldn't block the publisher.
All external integrations and their queues listen on the mirrored (federated) side.
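
The setup above could be driven through the RabbitMQ management HTTP API roughly as sketched below. This only builds the requests and sends nothing. The user name `taskcluster`, the upstream URI `amqp://localhost/fxci`, the upstream and policy names, and the exchange pattern are all placeholders assumed for the example, not values taken from the Pulse deployment.

```python
import json
from urllib.parse import quote


def api_path(*parts):
    """Build a management API path, percent-encoding each segment ('/' -> %2F)."""
    return "/api/" + "/".join(quote(p, safe="") for p in parts)


def federation_plan(
    core_vhost="fxci",
    public_vhost="/",
    core_user="taskcluster",               # hypothetical publisher account
    upstream_uri="amqp://localhost/fxci",  # hypothetical URI of the core vhost
    exchange_pattern="^exchange/taskcluster-",  # hypothetical exchange name pattern
):
    """Return (method, path, body) tuples; nothing is sent to a broker."""
    upstream_name = f"{core_vhost}-upstream"
    full_access = {"configure": ".*", "write": ".*", "read": ".*"}
    return [
        # 1. Dedicated vhost for the core publisher.
        ("PUT", api_path("vhosts", core_vhost), {}),
        # 2. Grant permissions in that vhost to the core user only.
        ("PUT", api_path("permissions", core_vhost, core_user), full_access),
        # 3. On the public vhost, declare the core vhost as a federation upstream.
        ("PUT",
         api_path("parameters", "federation-upstream", public_vhost, upstream_name),
         {"value": {"uri": upstream_uri, "ack-mode": "on-confirm"}}),
        # 4. Policy on the public vhost that federates the matching exchanges.
        ("PUT",
         api_path("policies", public_vhost, f"federate-{core_vhost}"),
         {"pattern": exchange_pattern,
          "definition": {"federation-upstream": upstream_name},
          "apply-to": "exchanges"}),
    ]


for method, path, body in federation_plan():
    print(method, path, json.dumps(body))
```

Note that the upstream and the policy are defined on the downstream (public) vhost, since in exchange federation it is the downstream side that links to the upstream. The federated exchanges would also need to exist on the public vhost for the policy to match them.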

Potential benefits here (needs validation and testing):

publishing to a dedicated vhost should not depend on external queues, as federation would be async
clean separation of core vs non-core

However, external integrations might still cause CPU spikes, so it's important to test whether having a much lower number of direct consumer queues actually makes publishing as fast as expected.
In the case of the exclusive pernosco queue, which had a high number of unacked messages, that backlog could have delayed new messages being published.
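
For the testing mentioned above, one option is to watch for queues with a large unacked backlog. The sketch below assumes the queue list has already been fetched from the management API (`GET /api/queues/<vhost>`), where each entry carries `name` and `messages_unacknowledged`. The threshold and the sample data are made up for illustration.

```python
def queues_over_unacked_limit(queues, limit=10_000):
    """Return (name, unacked) pairs above `limit`, largest backlog first.

    `queues` is a list of dicts shaped like entries of the management API
    queue listing; `limit` is an arbitrary threshold chosen for this sketch.
    """
    flagged = [
        (q["name"], q.get("messages_unacknowledged", 0))
        for q in queues
        if q.get("messages_unacknowledged", 0) > limit
    ]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


# Hypothetical sample shaped like the API response (not real Pulse data).
sample = [
    {"name": "queue/example-user/backlogged", "messages_unacknowledged": 250_000},
    {"name": "queue/example-user/healthy", "messages_unacknowledged": 12},
    {"name": "queue/other-user/idle"},
]

print(queues_over_unacked_limit(sample))
# [('queue/example-user/backlogged', 250000)]
```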

Categories

(Webtools :: Pulse, task)

Tracking

(Not tracked)

People

(Reporter: yarik, Unassigned)

References

(Blocks 1 open bug)

Security

(public)
