1405072 - Iteration support in python client library

Reporter

Description

•

8 years ago

The call I use most often in the python taskcluster client is queue.listTaskGroup. Once it supports the continuation token, it would be very useful if there was a higher level wrapper around it. It's likely we'll have several different pieces of code around the place to deal with the continuation token, when most of the use cases will want either the full task graph, or up to the specified limit. It would be useful if the python client could handle the token on our behalf, so we don't need to see it.

Dustin J. Mitchell [:dustin] (he/him)

Updated

•

8 years ago

Mentor: dustin, jhford

tuhina chatterjee

Comment 1

•

8 years ago

Hi Dustin, I am newbie to this project and would like to work on this bug. Can I proceed working on this one?

Flags: needinfo?(dustin)

Dustin J. Mitchell [:dustin] (he/him)

Comment 2

•

8 years ago

Absolutely! What simon is looking for here is a function which will call another API function like https://docs.taskcluster.net/reference/platform/taskcluster-auth/references/api#azureTables https://docs.taskcluster.net/reference/platform/taskcluster-queue/references/api#listArtifacts https://docs.taskcluster.net/reference/platform/taskcluster-queue/references/api#listTaskGroup https://docs.taskcluster.net/reference/core/taskcluster-secrets/references/api#list https://docs.taskcluster.net/reference/core/taskcluster-index/references/api#list-namespaces repeatedly, using the continuationToken option to build up the whole list in memory and then returning it. So this might look like queue = taskcluster.Queue() taskcluster.getAll(queue.listArtifacts, taskId, runId) so that would call queue.listArtifacts repeatedly, with taskId and runId, until it had fetched all of the artifacts, and then return the whole bunch. There are some complications here! For example, listArtifacts returns an object with an `artifacts` key, while listTaskGroup returns an object with a `tasks` key. How can this generic `getAll` function know what lists to put together to return "the whole bunch"? We do not want to make special-cases for each API method. How do you think we could solve this? Another complication is that the index.listNamespaces API method takes its continuationToken argument in the payload, instead of as a query parameter. I believe that is the only method to do so -- perhaps we could just modify it to behave like the others.

Flags: needinfo?(dustin)

tuhina chatterjee

Comment 3

•

8 years ago

(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #2) Thanks Dustin. Before getting into the details of the issue. I would be grateful if you can help me out with a one query. a - how should I go about setting up development environment for this particular TaskCluster project ? > Absolutely! > > What simon is looking for here is a function which will call another API > function > .... > repeatedly, using the continuationToken option to build up the whole list in > memory and then returning it. > > So this might look like > > queue = taskcluster.Queue() > taskcluster.getAll(queue.listArtifacts, taskId, runId) > > so that would call queue.listArtifacts repeatedly, with taskId and runId, > until it had fetched all of the artifacts, and then return the whole bunch. > > There are some complications here! For example, listArtifacts returns an > object with an `artifacts` key, while listTaskGroup returns an object with a > `tasks` key. How can this generic `getAll` function know what lists to put > together to return "the whole bunch"? We do not want to make special-cases > for each API method. How do you think we could solve this? > Possible solutions to this : a - hold these possible keys in a constant list, i.e ["tasks", "artifacts", "secrets", "namespaces", ...] and iterate over this list to lookup the object(`listArtifacts or listTaskGroup`) passed to `getAll` method. b - pass the key name as the method parameter, i.e taskcluster.getAll(queue.listArtifacts, key, taskId, runId) where `key` is "tasks" or "artifacts" whichever is the key to get data from the first argument. c - pass a unique string to the client and lookup its values in an object with the hash as the key and the keyword to lookup with as its value, i.e taskcluster.getAll(queue.listArtifacts, hash, taskId, runId) lookup the key from this constant object { "hash1": "tasks", "hash2": "artifacts", "hash3": "secrets", "hash4": "namespaces" } b & c are almost on similar lines. Do let me know what you think and if a better solution can be found. > Another complication is that the index.listNamespaces API method takes its > continuationToken argument in the payload, instead of as a query parameter. > I believe that is the only method to do so -- perhaps we could just modify > it to behave like the others. Do let me know if modifying index.listNamespaces API method signature is within the scope of this contribution.

Flags: needinfo?(dustin)

Simon Fraser [:sfraser] ⌚️GMT

Reporter

Comment 4

•

8 years ago

If it'll be a generic wrapper to handle continuation tokens, I don't know if this would be possible, but: It would be nice if the result could be pythonic, such as 'iterate over all tasks returned by listTaskGroup' I'd started fiddling with https://github.com/srfraser/taskhuddler to answer some of my own questions about the release graphs, but it's not the generic client that taskcluster-client.py should be, as it involves knowing about how we structure things in order to answer useful questions.

Simon Fraser [:sfraser] ⌚️GMT

Reporter

Comment 5

•

8 years ago

(In reply to tuhina chatterjee from comment #3) > b - pass the key name as the method parameter, i.e > > c - pass a unique string to the client and lookup its values in an object > with the hash as > > b & c are almost on similar lines. Do let me know what you think and if a > better solution can be found. As a preference, I'd like to not have to tell the library how to do its job when I make calls to it. We could recursively merge dictionaries and extend lists - it seems unlikely there'd be duplicate entries in any lists returned.

Dustin J. Mitchell [:dustin] (he/him)

Comment 6

•

8 years ago

I agree that an iterator would be better -- one that fetches a batch of results, yields them one-by-one, then fetches the next batch, and so on. And, you're right -- we should do this the right way. That means including enough metadata in the API definition that the client can auto-generate this sort of function. That's a bigger project than something we can expect of an Outreachy applicant (but you're welcome to work on it!) However, updating the index.listNamespaces API method is a good task. Please file a new bug to track that, and put the bug number in a comment on this one so that we can find it. Please also look through the other services on https://docs.taskcluster.net/reference to see if there are other API methods with continuation tokens that use POST instead of GET.

Flags: needinfo?(dustin)

Dustin J. Mitchell [:dustin] (he/him)

Comment 7

•

8 years ago

Oops, I phrased that wrong: it's bigger than a "getting started" contribution. It would actually make a good Outreachy or GSoC project :)

Dustin J. Mitchell [:dustin] (he/him)

Comment 8

•

8 years ago

..also, that kind of makes this not a very good mentored bug :)

Mentor: dustin, jhford

tuhina chatterjee

Comment 9

•

8 years ago

(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #6) That's a bigger project than something we can expect of > an Outreachy applicant (but you're welcome to work on it!) Is it possible that I take this up as an Outreachy project ? And I can probably select another good first bug to complete the application process ? > However, updating the index.listNamespaces API method is a good task. > Please file a new bug to track that, and put the bug number in a comment on > this one so that we can find it. Please also look through the other > services on https://docs.taskcluster.net/reference to see if there are other > API methods with continuation tokens that use POST instead of GET. This is the link to the bug https://bugzilla.mozilla.org/show_bug.cgi?id=1439058

Dustin J. Mitchell [:dustin] (he/him)

Comment 10

•

8 years ago

Bug 1439058 will make a good first bug. We'll stick with the existing project for this round of Outreachy, though. Setting up for and mentoring an Outreachy project is a lot of work, and we're already set up for that project. Perhaps in a future round? Or, if you're selected, perhaps that can be a good follow-up project if you finish the main project quickly :)

Dustin J. Mitchell [:dustin] (he/him)

Updated

•

8 years ago

Summary: [python client] Higher level calls in taskcluster-client.py → Iteration support in client libraries

Dustin J. Mitchell [:dustin] (he/him)

Comment 11

•

8 years ago

See also https://github.com/taskcluster/taskcluster-lib-api/issues/63

John Ford [:jhford] CET/CEST Berlin Time

Assignee

Comment 12

•

8 years ago

The python client provides pagination support. https://github.com/taskcluster/taskcluster-client.py#pagination It looks like this is in reference to the Python client, if so I think this can be resolved as fixed.

Flags: needinfo?(sfraser)

Simon Fraser [:sfraser] ⌚️GMT

Reporter

Comment 13

•

8 years ago

(In reply to John Ford [:jhford] CET/CEST Berlin Time from comment #12) > The python client provides pagination support. > > https://github.com/taskcluster/taskcluster-client.py#pagination > > It looks like this is in reference to the Python client, if so I think this > can be resolved as fixed. This is a request for the python client to not present the continuation token, but instead to handle the api in a more pythonic way: for example as a generator. At the moment we're hand-rolling our own continuationToken handler for every script, and for some use cases writing our own higher level wrapper around taskcluster-client.py, since most of our use cases require fetching the whole graph. It would be useful if we didn't have to do this - folding it into taskcluster-client.py seems reasonable, rather than have everyone reinvent that particular wheel.

Flags: needinfo?(sfraser)

Chris Cooper [:coop] (he/him)

Comment 14

•

7 years ago

John: would you be willing to mentor this bug?

Flags: needinfo?(jhford)

John Ford [:jhford] CET/CEST Berlin Time

Assignee

Comment 15

•

7 years ago

(In reply to Simon Fraser [:sfraser] ⌚️GMT from comment #13) > This is a request for the python client to not present the continuation > token, but instead to handle the api in a more pythonic way: for example as > a generator. > > At the moment we're hand-rolling our own continuationToken handler for every > script, and for some use cases writing our own higher level wrapper around > taskcluster-client.py, since most of our use cases require fetching the > whole graph. > > It would be useful if we didn't have to do this - folding it into > taskcluster-client.py seems reasonable, rather than have everyone reinvent > that particular wheel. The pagination support that I mentioned is not for presenting the continuation token, but rather taking a callable which is called on each response body. Here's the example of this callable support: import taskcluster queue = taskcluster.Queue() responses = [] def handle_page(y): print("%d tasks fetched" % len(y.get('tasks', []))) responses.append(y) queue.listTaskGroup('JzTGxwxhQ76_Tt1dxkaG5g', paginationHandler=handle_page) tasks = 0 for response in responses: tasks += len(response.get('tasks', [])) print("%d requests fetch %d tasks" % (len(responses), tasks)) I'm happy for us to have a generator based interface as well, or in place of this support if it's still desirable. (In reply to Chris Cooper [:coop] pronoun: he from comment #14) > John: would you be willing to mentor this bug? Sure!

Flags: needinfo?(jhford)

John Ford [:jhford] CET/CEST Berlin Time

Assignee

Comment 16

•

7 years ago

This bug seems to be about the python client, so changing the title to reflect that. The python client already has the requested support, as mentioned in comment 15. Let's open new bugs for possibly changing the pagination interface, or if there's problems doing pagination. I'm going to mark this bug as resolved because we have support for the requested interface.

Assignee: nobody → jhford

Status: NEW → RESOLVED

Closed: 7 years ago

Priority: -- → P5

Resolution: --- → FIXED

Summary: Iteration support in client libraries → Iteration support in python client library

Simon Fraser [:sfraser] ⌚️GMT

Reporter

Comment 17

•

7 years ago

This was the new bug to talk about changing the pagination interface :) I didn't request callbacks, so I don't know where that requested interface came from.

Chris Cooper [:coop] (he/him)

Comment 18

•

7 years ago

With a new Outreachy season approaching, I'm going to re-open this based on comment #7. Simon: if you need this more quickly, we would certainly accept/review patches, although it seems like you and John are still talking past each other about what's required. I'd encourage the two of you to hop on vidyo to sort it out.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Chris Cooper [:coop] (he/him)

Updated

•

7 years ago

Assignee: jhford → nobody

Mentor: jhford

John Ford [:jhford] CET/CEST Berlin Time

Assignee

Comment 19

•

7 years ago

I wrote a comment but it seems that it didn't make its way into bugzilla, so here's version 2 :) My understanding is that this bug is about abstracting the low level continuation token away from users of the taskcluster-client.py library, which have support for. This support was added in an unrelated bug which was wrapping up right around when this bug was filed, which is why it used a different interface. Here's a usage example: import taskcluster queue = taskcluster.Queue() pages = [] queue.listTaskGroup('JzTGxwxhQ76_Tt1dxkaG5g', paginationHandler=pages.append) I'm finding this bug difficult to follow, but I think the new request is for us to add a generator interface to the client. That's a great idea! I've filed bug 1488482 to track this work. Simon, if the generator interface is the only outstanding request here, I think that the new bug will address that. Please let me know if there's anything other than a generator interface that I've missed, otherwise I don't think there's anything left in this bug.

Assignee: nobody → jhford

Flags: needinfo?(sfraser)

Comment 20

•

7 years ago

I think a higher level interface for common queries would be useful, but we can discuss that in the new bug if it's easier.

Flags: needinfo?(sfraser)

John Ford [:jhford] CET/CEST Berlin Time

Assignee

Updated

•

7 years ago

Status: REOPENED → RESOLVED

Closed: 7 years ago → 7 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

7 years ago

Component: Client Libraries → Services

Bugzilla

Iteration support in python client library

Categories

(Taskcluster :: Services, enhancement, P5)

Tracking

(Not tracked)

People

(Reporter: sfraser, Assigned: jhford, Mentored)

References

Details

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Comment 2

Comment 3

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Updated

Comment 11

Comment 12

Comment 13

Comment 14

Comment 15

Comment 16

Comment 17

Comment 18

Updated

Comment 19

Comment 20

Updated

Updated