[meta] Update Treeherder metadata contract for Github projects
Categories
(Tree Management :: Treeherder: Data Ingestion, task, P3)
Tracking
(Not tracked)
People
(Reporter: armenzg, Unassigned)
References
Details
There are various Github projects reporting to Treeherder.
Some of them report tasks for non-master branches and Github PRs.
On the "push/pull" exchanges we only ingest revisions for the repositories and branches configured in our database (e.g. 'master'). As a result, there are tasks belonging to non-master pushes that have no matching revision in the pushes table.
In order to discard tasks that are not meant to be ingested, we need to make changes in projects external to Treeherder. There are two options:
1. Tasks for non-master branches and PRs should not include the Treeherder route
2. All tasks should include some Github metadata to differentiate them [1]
If we went with #1 for existing projects, we would end up in a state similar to bug 1587542 again in the future: every new Github project added to Taskcluster with a Treeherder route would cause our celery queues to back up because of our "if the push is missing, try again in a little bit" logic.
If we went with #2 and made it part of the expected contract, tasks without that information would be discarded automatically rather than retried.
[1]
The metadata that differentiates the tasks is:
base repository != head repository --> PR
base repository == head repository && HEAD_REF == 'refs/heads/master' --> master
base repository == head repository && HEAD_REF != 'refs/heads/master' --> non-master
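As a rough illustration, the three rules in [1] and the discard-rather-than-retry behavior proposed in option #2 could be expressed as below. This is a minimal Python sketch, not Treeherder's actual ingestion code: the metadata dict keys (`base_repository`, `head_repository`, `head_ref`) and the functions `classify` and `ingestion_action` are hypothetical names, and continuing to retry master tasks whose push has not been ingested yet is an assumption based on the existing retry logic.

```python
from enum import Enum
from typing import Optional

# Hypothetical constant for the ref that identifies the master branch.
MASTER_REF = "refs/heads/master"


class PushKind(Enum):
    PR = "pr"
    MASTER = "master"
    NON_MASTER = "non-master"


def classify(base_repository: str, head_repository: str, head_ref: str) -> PushKind:
    """Apply the three rules from footnote [1]."""
    if base_repository != head_repository:
        return PushKind.PR
    if head_ref == MASTER_REF:
        return PushKind.MASTER
    return PushKind.NON_MASTER


def ingestion_action(metadata: Optional[dict], push_exists: bool) -> str:
    """Hypothetical decision under option #2: 'ingest', 'retry' or 'discard'."""
    if metadata is None:
        # The task breaks the contract (no Github metadata): drop it instead of retrying.
        return "discard"
    kind = classify(
        metadata["base_repository"], metadata["head_repository"], metadata["head_ref"]
    )
    if kind is not PushKind.MASTER:
        # PR and non-master pushes never get a row in the pushes table.
        return "discard"
    # Assumption: a master task may arrive before its push, so keep retrying.
    return "ingest" if push_exists else "retry"
```

With this shape, a task lacking the metadata, or one coming from a PR or a non-master branch, is dropped immediately, so only master tasks ever reach the "try again in a little bit" path.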
Reporter • Updated 5 years ago
Reporter • Comment 1 • 5 years ago
This is not something we're going to work on anytime soon.
Reporter • Updated 5 years ago