Closed
Bug 1286160
Opened 9 years ago
Closed 6 years ago
autoland should use a queue instead of a database
Categories
(Conduit Graveyard :: Transplant, defect, P2)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: glob, Unassigned)
Details
(Keywords: conduit-triaged)
Attachments
(1 file, 1 obsolete file)
82.81 KB, image/png
my concerns with current design:
- doesn't function as a true queue
  - patches are not always landed in the order they were submitted, which is unexpected
  - if there's a transient failure when trying to land a patch at the head of the queue, it'll be deferred and the next patch will be attempted
  - if the trees are closed for an extended period of time, this is likely to result in unexpected ordering of patches
- a single "queue" is used for all repositories
  - we shouldn't be restricted to one worker for all repos
- data stored unnecessarily on server
  - there's a pg database which holds the "queue"
  - once jobs are processed they are not removed from the database
- service outages (e.g. review board upgrades) may result in lost notifications
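to illustrate the ordering concern, here is a minimal simulation, not autoland's actual code. it assumes a deferred patch is simply retried after the patches behind it; `land_with_deferral` and `fails_once` are hypothetical names made up for this sketch.

```python
from collections import deque


def land_with_deferral(submitted, fails_once):
    """Simulate deferring a patch that hits a transient failure.

    Assumption: a deferred patch is re-attempted only after the
    patches that were submitted behind it.
    """
    pending = deque(submitted)
    already_failed = set()
    landed = []
    while pending:
        patch = pending.popleft()
        if patch in fails_once and patch not in already_failed:
            already_failed.add(patch)
            pending.append(patch)  # deferred: now behind later patches
            continue
        landed.append(patch)
    return landed


# "A" was submitted first but hits one transient failure.
order = land_with_deferral(["A", "B", "C"], fails_once={"A"})
print(order)  # ['B', 'C', 'A']
```

under that assumption a single transient failure on "A" is enough for "B" and "C" to land ahead of it.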
proposed solution:
- use pulse for queuing commits
  - removes data store from autoland server, simplifying deployment and code
- report success and failures back to review board, also via pulse
  - review board should be the data store, not autoland
- report to treeherder (maybe?), to provide a high level view of autoland's activities, and as a means to view the detailed autoland logs for success and failures
- always process the commit at the head of the queue (i.e. FIFO)
  - need to detect transient vs fatal failures
  - a transient failure should result in the job being retried
    - with a back-off
  - if retry attempts hit a max value, autoland should stop and alerts should be triggered for admins to deal with
  - admins need a mechanism to easily examine the job at the head of the queue and the failures encountered during processing
- use a separate queue/topic/key for each repository
  - required when switching to a true queuing system
  - allows us to land to different repos at the same time
  - spin up a process/container/instance for each queue
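the processing rules above could be sketched roughly as follows. this uses in-memory deques in place of pulse, and every name here (`TransientError`, `FatalError`, `QueueStalled`, `process_queue`, the repo names, the retry defaults) is hypothetical; it only shows the intended control flow, not a real implementation.

```python
import time
from collections import defaultdict, deque


class TransientError(Exception):
    """Hypothetical: a failure worth retrying (e.g. closed tree)."""


class FatalError(Exception):
    """Hypothetical: a failure that retrying will not fix."""


class QueueStalled(Exception):
    """Head job exhausted its retries; admins need to intervene."""


def process_queue(queue, land, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Process jobs strictly in FIFO order and return (job, status) pairs.

    The head job is never skipped: it is retried with exponential
    back-off, and the queue stops if the retries run out.
    """
    results = []
    while queue:
        job = queue[0]  # peek; only remove once the job is resolved
        attempt = 0
        while True:
            try:
                land(job)
            except TransientError as exc:
                attempt += 1
                if attempt > max_retries:
                    # Leave the job at the head so it can be examined.
                    raise QueueStalled(f"{job!r}: {exc}") from exc
                sleep(base_delay * 2 ** (attempt - 1))
                continue
            except FatalError as exc:
                results.append((job, f"failed: {exc}"))
            else:
                results.append((job, "landed"))
            break
        queue.popleft()
    return results


# One queue per repository (repo names are made up).
queues = defaultdict(deque)
queues["repo-a"].extend(["a1", "a2"])
queues["repo-b"].append("b1")

remaining_failures = {"a1": 2}  # a1 fails transiently twice, then lands


def fake_land(job):
    if remaining_failures.get(job, 0) > 0:
        remaining_failures[job] -= 1
        raise TransientError("tree closed")


delays = []  # record back-off delays instead of actually sleeping
results = {
    repo: process_queue(q, fake_land, sleep=delays.append)
    for repo, q in queues.items()
}
```

the queues are drained one after another here only to keep the sketch short; with a process/container/instance per queue they would run independently. a stalled queue keeps the failing job at its head, which is what an admin tool would need to inspect.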
Comment 3•9 years ago
There will be some upcoming changes to autoland to support Servo that may require persistent state (read: a database).
I'm all for using a proper queue, however. But don't get your heart set on killing the database :(
(In reply to Gregory Szorc [:gps] from comment #3)
> There will be some upcoming changes to autoland to support Servo that may
> require persistent state (read: a database).
>
> I'm all for using a proper queue, however. But don't get your heart set on
> killing the database :(
no worries -- if required we should use RDS (or S3?) for persistence, not a database running on the server.
Glob and I discussed this the other day and I will be working with him to use Pulse rather than the database as a message queue for the Servo autoland changes.
Updated•7 years ago
Product: MozReview → Conduit
Comment 6•7 years ago
Still something we very much want to do but needs prioritization versus other important Lando features.
Keywords: conduit-triaged
Whiteboard: [lando-backlog]
Comment 7•7 years ago
Most likely will use SQS not Pulse. Turning this into the overarching story.
Depends on: 1312140
Keywords: conduit-story
Summary: autoland should use a queue (amqp/pulse) instead of a database → autoland should use a queue instead of a database
Whiteboard: [lando-backlog]
Attachment #8770025 - Attachment is obsolete: true
proposed design is in https://docs.google.com/document/d/1q6LWsrj2l-ClTbkWiHTfaC3712zewrhX9oyeKAdi1cI
Blocks: 1532784
Keywords: conduit-story
Priority: -- → P2
Comment 10•6 years ago (Reporter)
we've decided not to move to a queue-based system; instead we'll integrate transplant directly into lando.
see bug 1544346.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX
Updated•2 years ago
Product: Conduit → Conduit Graveyard