Open Bug 1593252 Opened 5 years ago Updated 3 years ago

[meta] Update Treeherder metadata contract for GitHub projects

Categories

(Tree Management :: Treeherder: Data Ingestion, task, P3)

Tracking

(Not tracked)

People

(Reporter: armenzg, Unassigned)


Various GitHub projects report to Treeherder.
Some of them report tasks for non-master branches and GitHub PRs.
On the "push/pull" exchanges we only ingest revisions for what is configured in our database for each repository (e.g. 'master'). As a result, tasks belonging to non-master pushes have no matching revision in the pushes table.

To discard tasks that are not meant to be ingested, we need changes in projects external to Treeherder. There are two options:

  • Tasks for non-master branches and PRs should not include the Treeherder route
  • All tasks should include some GitHub metadata to differentiate them [1]

If we only applied the first option to existing projects, we would end up in a state similar to bug 1587542 again in the future. Every new GitHub project added to Taskcluster and using a Treeherder route would automatically cause our Celery queues to back up because of our "if push missing try again in a little bit" logic.

If we implemented the second option and made it part of the expected contract, tasks without that information would automatically be discarded rather than retried.
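
As a rough illustration of the "discard rather than retry" behaviour, here is a minimal Python sketch. It is not Treeherder's actual ingestion code: the `decide` function, the `Action` enum, and the metadata key names are all hypothetical, and it assumes the contract requires all three fields to be present and non-empty.

```python
from enum import Enum
from typing import Mapping, Optional

# Hypothetical key names for the metadata the contract would require.
REQUIRED_KEYS = ("base_repository", "head_repository", "head_ref")


class Action(Enum):
    INGEST = "ingest"
    RETRY = "retry"
    DISCARD = "discard"


def decide(metadata: Mapping[str, Optional[str]], push_exists: bool) -> Action:
    """Decide what to do with a task under the assumed contract."""
    # Tasks that do not carry the required metadata are dropped outright,
    # so they never sit in the queue waiting for a push that will not come.
    if any(not metadata.get(key) for key in REQUIRED_KEYS):
        return Action.DISCARD
    # The push is already in the pushes table: the task can be ingested.
    if push_exists:
        return Action.INGEST
    # Metadata is present but the push has not arrived yet: try again later.
    return Action.RETRY
```

With this shape, a task from a newly added project that does not follow the contract would be discarded on first sight instead of adding to the queue backlog.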

[1]
The metadata that differentiates the tasks is:

base repository != head repository --> PR
base repository == head repository && HEAD_REF == 'refs/heads/master' --> master
base repository == head repository && HEAD_REF != 'refs/heads/master' --> non-master
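
The three rules above can be sketched as a small Python function. This is only an illustration: the function name, parameter names, and the `PushKind` enum are hypothetical and not part of Treeherder or Taskcluster.

```python
from enum import Enum


class PushKind(Enum):
    PULL_REQUEST = "PR"
    MASTER = "master"
    NON_MASTER = "non-master"


def classify_task(base_repo: str, head_repo: str, head_ref: str) -> PushKind:
    """Classify a task from its GitHub metadata using the rules in [1]."""
    # Different base and head repositories mean the task comes from a PR.
    if base_repo != head_repo:
        return PushKind.PULL_REQUEST
    # Same repository: the head ref tells master apart from other branches.
    if head_ref == "refs/heads/master":
        return PushKind.MASTER
    return PushKind.NON_MASTER
```

For example, `classify_task("org/repo", "fork/repo", "refs/heads/fix")` returns `PushKind.PULL_REQUEST`, because the base and head repositories differ.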
Assignee: nobody → armenzg

This is not something we're going to work on anytime soon.

Priority: P2 → P3
Assignee: armenzg → nobody
Depends on: 1656244