Closed Bug 1123994 Opened 11 years ago Closed 10 years ago

queue - worker: Interaction with Azure Queues

Categories

(Taskcluster :: Services, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jonasfj, Assigned: jonasfj)

References

Details

Attachments

(2 files, 1 obsolete file)

When moving the queue to Azure Table/Queue Storage, we have to change how the worker and taskcluster-queue interacts. Specially, the flow for claiming tasks.
What you think about the outlined flow: https://github.com/taskcluster/taskcluster-docs/pull/7 From worker perspective there is two simple flows: A) Simple (but slow) Option Call queue.claimWork(provisionerId, workerType) -> task claimed! or status == 204 B) The Efficient Option Call queue.pollTaskUrl(provisionerId, workerType) -> signedPollTaskUrl do xml = GET(signedPollTaskUrl) while(xml.indexOf('<MessageText>') === -1) queue.claimWork(provisionerId, workerType, {xmlMessage: xml}) -> task claimed! On the queue side, claimWork looks as follows: var xml = req.body.xmlMessage if(!xml) xml = poll azure queue xml = parse(xml) azureQueue.updateMessage(xml.msgId, xml.receipt, { taskId: xml.MessageText.taskId, runId: xml.MessageText.runId + 1 // If this message becomes visible, start next run }, { invisibleTimeout: 20 min // takenUntil interval }).then(function(newReceipt) { // update table storage and save newReceipt for next time, the worker calls reclaim // if there is a run with runId < xml.MessageText.runId, then that run **must** have // timed out, and we declare it "failed". }); That sort of the short pseudo code. Remark, it might reasonable to cut the maximum deadline down to 5 days into the future, as azure queue message can stay at most 7 days in the queue. And we need a little time to survive if taskcluster queue goes down (or misbehaves).
Attachment #8552116 - Flags: review?(jlal)
Attachment #8552116 - Flags: review?(jhford)
Attachment #8552116 - Flags: review?(garndt)
Assignee: nobody → jopsen
Comment on attachment 8552116 [details] [review] github PR for docs, describing new claim task flow lgtm- though we should add some documentation as we go about how to poll the urls (priority algorithm stuff).
Attachment #8552116 - Flags: review?(jlal) → review+
Comment on attachment 8552116 [details] [review] github PR for docs, describing new claim task flow Looks good. Definitely like the idea of polling a url for N messages. If there are 10 workers polling the url around the same time, will all 10 get the same (or close to the same) # of messages out of the queue to process or will the messages pulled have an invisibility set for a short amount of time (like 30 seconds or something) ?
Attachment #8552116 - Flags: review?(garndt) → review+
@garndt, there is an invisibility timeout... I'll probably set it to 5min or so... So that there is plenty of time for the worker to call queue.claimTask for each of the N tasks is got from the azure queue.
Attached file Github PR for phase one implementation (obsolete) —
This is the implementation of proposed claim-task flow. This doesn't break backwards compatibility, but allows us to claim tasks with the new flow.. so we can make a gradual migration. ClaimWork has been marked deprecated, and claimTask warns that messageId, receipt and signature will before required in the future...
Attachment #8554022 - Flags: review?(jhford)
Comment on attachment 8554022 [details] [review] Github PR for phase one implementation Looks good
Attachment #8554022 - Flags: review?(jhford) → review+
Attachment #8552116 - Flags: review?(jhford) → review+
Deployed... Caused a few minutes of downtime, as I had an import of a devDependency... Anyways, it works now, I hope. This just phase one... the old way of poll tasks should still work, but now we can start work on porting docker-worker to this flow, after which I'll port the queue off postgres entirely. (hmm, I forgot to push the button yesterday)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Oh, just for reference the bug for using this in docker-worker is bug 1126648.
I modified the interface a bit in the process of migrating the queue.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached file Github PR
* Marked everything deprecated accordingly * Updated JSON schemas to match what we will have in the future * Sorry about the npm-shrinkwrap.json changes, just ignore those Shouldn't break compatibility, but allow migration to the new poll-flow, as documented in [1]. @garndt, I'm attaching you as reviewer because I hope you're going to help with the docker-worker side of this. Reading [1] you'll see exactly how the azure queue polling thing works. I've written about that in detail, because it hard to fit it into API docs. Also note that inline API docs haven't received so much love, but you're welcome to take a look at the rewrite branch [2] if curious. Should you have questions there is also a PR [3] open for that. I'll hopefully add more docs to the rewrite branch. This PR is pretty much just a backport for smooth transition. References: [1] http://docs.taskcluster.net/queue/worker-interaction/ [2] https://github.com/taskcluster/taskcluster-queue/tree/rewrite-on-azure-storage-services [3] https://github.com/taskcluster/taskcluster-queue/pull/32
Attachment #8554022 - Attachment is obsolete: true
Attachment #8566908 - Flags: review?(jhford)
Attachment #8566908 - Flags: review?(garndt)
Blocks: 1138388
Attachment #8566908 - Flags: review?(garndt) → review+
Merged and deployed based on garndt r+ :)
Status: REOPENED → RESOLVED
Closed: 11 years ago10 years ago
Resolution: --- → FIXED
Comment on attachment 8566908 [details] [review] Github PR Clearing r? for jhford.
Attachment #8566908 - Flags: review?(jhford)
Component: TaskCluster → Queue
Product: Testing → Taskcluster
Component: Queue → Services
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: