Closed Bug 1455130 Opened 6 years ago Closed 5 years ago

Add pagination to auth.listRoles

Categories

(Taskcluster :: Services, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: dustin, Assigned: ojaswin25111998, Mentored)

Details

(Keywords: good-first-bug)

Hi Dustin. Answering your question https://bugzilla.mozilla.org/show_bug.cgi?id=1436212#c32.
Yes I want to work on this.
Looks like one of the changes for bug 1436216 has landed (login) and the other is in review (tools).

Do you want to start on this, Tuhina?
Assignee: dustin → nobody
Mentor: dustin
I was going through the code base. I have some questions about how to go about implementing the continuationToken part.

listRoles[1] loads a single Blob of data from 'azure-blob-storage'[2]. continuationToken could have been used if roles were a list of blobs using this method[3].


[1] https://github.com/taskcluster/taskcluster-auth/blob/master/src/v1.js#L582
[2] https://github.com/taskcluster/taskcluster-auth/blob/master/src/containers.js#L68
[3] https://github.com/taskcluster/azure-blob-storage/blob/5027ec7039cc926e0bf786ced91717834df49e42/src/datacontainer.js#L302
Flags: needinfo?(dustin)
Good thinking!  You're right that the way we store roles right now would prevent us from ever returning a continuationToken.

However, that's an implementation detail, and we do not want to expose that implementation detail in the API.  Someday we may store roles in a different way, and want to allow pagination.  Also, users might pass `limit=5` and want to get five roles at a time.

So, I think we need to "simulate" pagination.  Probably the simplest way to do that is to put the starting index in the continuation token.  Then

GET ..?limit=5
 --> {continuationToken: '5', roles: [..]}
GET ..?continuationToken=5&limit=5
 --> {continuationToken: '10', roles: [..]}
..

That should be relatively straightforward to implement.  My only concern is that users might notice that the continuationToken looks an awful lot like a number, and start to depend on that (for example, someone might decide to go backward by setting `continuationToken = (parseInt(continuationToken, 10) - 5).toString()`).  Then when we change the implementation, their software would break.  In other words, we're exposing an implementation detail again.

There is a package on npm to help with this exact thing -- https://www.npmjs.com/package/hashids

So the example above might become

GET ..?limit=5
 --> {continuationToken: 'o2fXhV', roles: [..]}
GET ..?continuationToken=5&limit=5
 --> {continuationToken: '38fBae', roles: [..]}
..

Hopefully that's enough to get a start.  It'd be fine to implement the first thing (with the integers), and then in another commit implement the hashid solution.

Jonas, did I suggest anything stupid here?
Flags: needinfo?(dustin) → needinfo?(jopsen)
I think this is a good idea.

Note: please maintain current behavior that if no limit is given then it just returns all roles.
Flags: needinfo?(jopsen)
> Note: please maintain current behavior that if no limit is given then it just returns all roles.

I think we can do this by setting the default limit to something much larger than the current number of roles (1000 is a typical default value, and we only have a few hundred roles, so that should be fine).
Keywords: good-first-bug
The current role listing is super-slow, because it's transferring several MB of data, including roles, scopes, and expanded scopes.  This is only going to get bigger!

So, let's add a new method `listRoleIds` that lists the (..wait for it) role ID's.  This is enough to render a list of scopes in a UI.  Then the UI, or any other consumer, can call `role(roleId)`.  Then we can modify the tools site to use this new method.  Then we can deprecate (but keep supporting) the existing listRoles endpoint.
QA Contact: dustin
Hey, this is my first time working on the backend. Can this be assigned to me?
Done :) Please don't hesitate to message here if you have any questions.
Assignee: nobody → ojaswin25111998
So I went through the source code and figured out that I need to use continuationToken and limit to paginate the Roles. I went through bug 1436212 but can't figure out how exactly it's being done. Can you please help me out a bit?
I think the task is to create a new listRoleIds method that supports pagination.  I described a way to do the pagination in comment 4.  You can base the new method on the existing listRoles method in src/v1.js, but create a new one named listRoleIds.  The existing listRoles method shows how to load the entire list of roles.  It all comes out as one big list of data, so the key is to return only the list elements specified by the caller using continuationToken and limit.
Alright so I should create a new method in src/v1.js using builder.declare() that loads a list of roles based on limit and continuationToken. Additionally, since container.js only has methods to return the whole list of roles all the roles must be loaded everytime and pagination should be simulated in the listRoleIds method itself.

Please correct me if I'm wrong
Sounds about right!  We'll see how that goes once you start writing some code :)
Hey I had already started working on this :)

I have created a PR, do you mind reviewing it please.


https://github.com/taskcluster/taskcluster-auth/pull/189/files
Also, can you please help me out with writing tests for the functionality as I don't have much experience in it?
Dustin, would it be okay if we add expandedScopes and scopes to the new listRoleIds endpoint? We will need that to successfully paginate the auth/scopes view (roles table).
Flags: needinfo?(dustin)
How would that be different from listRoles?
Flags: needinfo?(dustin)
Looking at the tools site, the scopes view extracts the expandedScopes for example when constructing the list of scopes[1]. On the other hand, the roles view looks at the role id when rendering its list[2]. In order to do the same thing in tc-web, I think we'll need to add `expandedScopes` to the endpoint so we can keep the same behavior as tc-tools. Thoughts?
 
[1] https://github.com/taskcluster/taskcluster-tools/blob/939d1eb9743019a507a484a54353217427605d51/src/views/ScopeInspector/ScopeInspector.jsx#L308-L316
[2] https://github.com/taskcluster/taskcluster-tools/blob/101c14990690856321a152a1ec2ef5b9644cd2b7/src/views/RoleManager/RoleManager.jsx#L71
Flags: needinfo?(dustin)
Ah, for the scope inspector you should just call listRoles, not listRoleIds.  If you add all of the information that listRoles returns to listRoleIds, then it won't be "listRoleIds" anymore, and will be identical to litsRoles.  So just call that :)
Flags: needinfo?(dustin)
It won't be possible to paginate the data if we call `listRoles`. Hmm, in that case, would it be possible to change the name of `listRoleIds` to something else? Perhaps `listRolesPaginated` or `listRolesTokenized`? As far as I can tell, tc-web is the only consumer of `listRoleIds`.
Flags: needinfo?(dustin)
The goal of this bug was to support the role manager, not the scope inspector, and from what I can tell it's still useful for that purpose.  The listRoles method can't be paginated in-place since it returns an array at the top-level, rather than an object to which a continuationToken property could be added.  

We could add an additional paginated listRoles2 method or something like that, too.  It's just hard to think of a name that's not ugly :)

From what I understand, all of this should be invisible to tc-web, right?  It's just different backend APIs that tc-web-server is calling to produce the same GraphQL output?
Flags: needinfo?(dustin)
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #21)
> We could add an additional paginated listRoles2 method or something like
> that, too.  It's just hard to think of a name that's not ugly :)

Sure. Btw, `listRoles2` would remove the need for `listRoleIds` anymore since GraphQL has the power to ask for exactly what it needs and nothing more. For example, we would be able to:

* ask only for the `roleId` when calling `listRoles2` in the roles view
* ask only for the `expandedScopes` when calling `listRoles2` in the scope's view

I guess we can keep `listRoleIds` since some other client might find this useful but as far as tc-web, it won't be needed anymore.


> From what I understand, all of this should be invisible to tc-web, right? 
> It's just different backend APIs that tc-web-server is calling to produce
> the same GraphQL output?

Correct. All of this will be invisible to tc-web. Once `listRoles2` is in place, tc-web-server won't need `listRoleIds` anymore since the roles view will be able to get its data via:

query Roles($rolesConnection: PageConnection, $filter: JSON) {
  listRoles2(connection: $rolesConnection, filter: $filter) {
    pageInfo {
      hasNextPage
      hasPreviousPage
      cursor
      previousCursor
      nextCursor
    }
    edges {
      node {
        roleId
      }
    }
  }
}
Yes, but the REST API does *not* have that capability.  So taskcluster-web-server would still ask for a few MB of data from the auth service, then filter out the id's -- so you'd still have slow performance.  Instead, tc-web-server should be special-casing requests for just role ID's and using listRoleIds to get that data, while using listRoles2 to get any more detailed results.
Component: Authentication → Services
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.