my concerns with current design:

- doesn't function as a true queue
  - patches are not always landed in the order they were submitted, which is unexpected
  - if there's a transient failure when trying to land a patch at the head of the queue, it'll be deferred and the next patch will be attempted
  - if the trees are closed for an extended period of time, this is likely to result in unexpected ordering of patches
- a single "queue" is used for all repositories
  - we shouldn't be restricted to one worker for all repos
- data stored unnecessarily on server
  - there's a pg database which holds the "queue"
  - once jobs are processed they are not removed from the database
- service outages (eg. review board upgrades) may result in lost notifications

proposed solution:

- use pulse for queuing commits
  - removes data store from autoland server, simplifying deployment and code
- report success and failures back to review board, also via pulse
  - review board should be the data store, not autoland
- report to treeherder (maybe?), to provide a high level view of autoland's activities, and as a means to view the detailed autoland logs for success and failures
- always process the commit at the head of the queue (ie. FIFO)
  - need to detect transient vs fatal failures
  - a transient failure should result in the job being retried
    - with a back-off
    - if retry attempts hit a max value, autoland should stop and alerts triggered for admins to deal with
    - admins need a mechanism to easily examine the job at the head of the queue and the failures encountered during processing
- use a separate queue/topic/key for each repository
  - required when switching to a true queuing system
  - allows us to land to different repos at the same time
  - spin up a process/container/instance for each queue
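
To make the head-of-queue behaviour concrete, here is a rough sketch of what a per-repository worker loop might look like. It is not autoland code: the exception classes, `process_queue`, and the `land` callable are all hypothetical names, an in-memory `deque` stands in for whatever queue (pulse or otherwise) is eventually used, and it assumes a fatal failure is reported and dropped so the queue can move on.

```python
import time
from collections import deque


class TransientError(Exception):
    """Hypothetical: a failure worth retrying (e.g. tree closed, network blip)."""


class FatalError(Exception):
    """Hypothetical: a failure that retrying cannot fix (e.g. merge conflict)."""


class QueueHalted(Exception):
    """Raised when the head job exhausts its retries and admins must step in."""


def process_queue(queue, land, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Land jobs strictly in FIFO order; never skip past a failing head job.

    Returns a list of (job, status) tuples. Raises QueueHalted, leaving the
    failing job at the head of the queue, if transient retries are exhausted.
    """
    results = []
    while queue:
        job = queue[0]  # peek only: the job stays at the head until resolved
        failures = []   # kept so admins can see what went wrong on each attempt
        for attempt in range(max_attempts):
            try:
                land(job)
            except TransientError as exc:
                failures.append(str(exc))
                if attempt + 1 < max_attempts:
                    sleep(base_delay * 2 ** attempt)  # exponential back-off
                continue
            except FatalError as exc:
                # Assumption: fatal failures are reported and the queue moves on.
                results.append((job, "failed: %s" % exc))
            else:
                results.append((job, "landed"))
            queue.popleft()
            break
        else:
            # Every attempt hit a transient failure: stop and alert.
            raise QueueHalted(
                "job %r stuck after %d attempts: %s" % (job, max_attempts, failures)
            )
    return results
```

With one queue per repository (for example a dict such as `{"mozilla-central": deque(), "servo": deque()}`, names purely illustrative), each worker process would call `process_queue` on its own queue, so a halted queue for one repo would not block landings to the others.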
There will be some upcoming changes to autoland to support Servo that may require persistent state (read: a database). I'm all for using a proper queue, however. But don't get your heart set on killing the database :(
(In reply to Gregory Szorc [:gps] from comment #3)
> There will be some upcoming changes to autoland to support Servo that may
> require persistent state (read: a database).
>
> I'm all for using a proper queue, however. But don't get your heart set on
> killing the database :(

no worries -- if required we should use RDS (or S3?) for persistence, not a database running on the server.
Glob and I discussed this the other day and I will be working with him to use Pulse rather than the database as a message queue for the Servo autoland changes.
Still something we very much want to do but needs prioritization versus other important Lando features.
Most likely will use SQS not Pulse. Turning this into the overarching story.
Depends on: 1312140
Summary: autoland should use a queue (amqp/pulse) instead of a database → autoland should use a queue instead of a database
Attachment #8770025 - Attachment is obsolete: true
proposed design is in https://docs.google.com/document/d/1q6LWsrj2l-ClTbkWiHTfaC3712zewrhX9oyeKAdi1cI