Closed Bug 1332457 Opened 7 years ago Closed 7 years ago

Experiment with providing a GraphQL API to Treeherder

Categories

(Tree Management :: Treeherder: API, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wlach, Assigned: seban)

Attachments

(3 files)

So I'm not 100% sure if this is a good idea, but I've chatted with a few people about it already and figured it might make sense to write down some of my thoughts, even if it's not something I'm going to be working on immediately.

Treeherder's REST API has proven to be a bit unwieldy for the purposes we want to put it towards. In particular, the jobs API endpoints provide *a lot* of data, most of which is usually not relevant to its consumers. For example, we hit this endpoint to get the jobs for display:

https://treeherder.mozilla.org/api/project/mozilla-inbound/jobs/?count=2000&result_set_id=161730&return_type=list

Much of this is actually unused by the frontend (at least in the initial view)! This slows load and query times, since that much more data needs to be both fetched from the database and processed into a JSON response.

At the same time, getting different types of information pertaining to a job (performance data, job details, error summary lines) requires multiple requests, which slows response time (every time you load the details panel for a job, 4+ HTTP requests need to be processed).

I suspect this sort of problem will become larger in the future, as some of the new views we might want to have into Treeherder data (e.g. a manifest-based view) will almost certainly need data that isn't in the main jobs table, which means yet more HTTP requests and a slower UI (or the tedious hand-coding of custom endpoints which return the data we need).

GraphQL (http://graphql.org/) is a burgeoning standard which seems to fit our exact requirements. You specify what data you want as a JSON-like "graph" query, traversing across object types if you like, and the API returns exactly what was asked for in a single response.
This seems like an ideal fit for solving the above problems.
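
To make the idea concrete, here is a toy sketch in plain Python. It is not GraphQL itself and not Treeherder code: a nested selection names the fields the client wants, and a small resolver returns only those, following related objects in the same pass. The record layout and every field name (`job_type_name`, `error_lines`, and so on) are assumptions made up for illustration.

```python
def select(record, selection):
    """Return only the fields of `record` named in `selection`.

    `selection` maps a field name to True (plain value) or to a nested
    selection dict (a related object, or a list of related objects).
    """
    result = {}
    for field, sub in selection.items():
        value = record[field]
        if sub is True:
            # Leaf field: copy the value through unchanged.
            result[field] = value
        elif isinstance(value, list):
            # List of related objects: apply the nested selection to each.
            result[field] = [select(item, sub) for item in value]
        else:
            # Single related object.
            result[field] = select(value, sub)
    return result


# Hypothetical job record; a real jobs response carries many more fields.
job = {
    "id": 1,
    "result": "testfailed",
    "job_type_name": "mochitest-1",
    "machine_name": "example-machine",
    "submit_timestamp": 1484870000,
    "error_lines": [
        {"line_number": 10, "text": "TEST-UNEXPECTED-FAIL", "bug_suggestions": []},
    ],
}

# Ask for two job fields plus one field of each related error line.
wanted = {"id": True, "result": True, "error_lines": {"text": True}}
trimmed = select(job, wanted)
```

The point of the sketch is that the client, not the endpoint, decides the shape of the response, and related data (here the error lines) comes back in the same round trip instead of a separate request.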

It also might be a good fit for solving some other things we don't yet have an answer to, like how to populate a development instance with a set of production data (since our endpoints only return a subset of the data in our database, it isn't possible to do this with them).

It seems like there's a pretty decent Python library for working with GraphQL called Graphene, which also has a Django integration extension, Graphene Django (https://github.com/graphql-python/graphene-django). An interesting proof of concept might be to use that to build up a simple mechanism for querying the jobs endpoint (example above) and then update our UI code to use it. Depending on the results of that experiment, we could consider proposing a project to expose our entire data model using this interface.
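
As a rough sketch of what a client of such a proof of concept might send, the snippet below builds a request body in the common GraphQL-over-HTTP shape (a JSON object with `query` and `variables` keys). The query itself is hypothetical: the `jobs` field, its `resultSetId` argument, and the selected field names are assumptions, not an existing Treeherder schema. Only the result set id is taken from the example URL above.

```python
import json

# Hypothetical query: type, argument, and field names are illustrative only.
QUERY = """
query Jobs($resultSetId: Int!) {
  jobs(resultSetId: $resultSetId) {
    id
    result
    jobTypeName
  }
}
"""


def build_graphql_body(query, variables):
    """Encode a GraphQL request as a UTF-8 JSON body with `query` and `variables`."""
    return json.dumps({"query": query, "variables": variables}).encode("utf-8")


# Body for one POST request; no network call is made here.
body = build_graphql_body(QUERY, {"resultSetId": 161730})
```

Whatever schema the prototype ends up with, the client side would stay about this small: one POST with the fields it wants, rather than one GET per kind of data.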
Seban is going to drive creating a prototype of this feature. I'll mentor where need be.
Assignee: nobody → sebastinssanty
Attachment #8843394 - Flags: review?(wlachance)
Comment on attachment 8843394 [details] [review]
[treeherder] SebastinSanty:gql > mozilla:master

Thanks for this, looking forward to building on top of it. :)
Attachment #8843394 - Flags: review?(wlachance) → review+
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/ff18d1c84dbdf736d22de4f903201da4ba9aeceb
Bug 1332457 - Move py to common requirements (#2237)

It's needed for graphql
Depends on: 1349237