At this time, we should start storing all 40 characters of the SHA for both the ``revision`` table, and the ``result_set`` table. I'm tempted to leave the field name as ``revision_hash`` for less disruption. And that's still sort of accurate since a SHA is a hash. Though ``revision_tip`` or ``tip_revision`` (or even just ``revision``) may be clearer.
Sorry to spam you guys, just looking for buy-in. :)
Perhaps this approach is best: 1. Add a field to ``result_set`` called ``revision``. 2. Store both ``revision`` and ``revision_hash`` for a while. 3. Add function to client to add ``revision`` to a job 4. Deprecate ``revision_hash`` and ask all clients to convert over to using ``revision`` instead of ``revision_hash``. 5. Once everyone has switched over, switch to using the ``revision`` field instead of ``revision_hash`` 6. Remove the ``revision_hash`` field from ``result_set``
I should possibly(probably) add a field called ``vcs_id`` or ``push_id``. this would either store the HG pushID, or for GitHub, maybe the PR number? Not sure about GH yet.
Good idea to get rid of the revision_hash. I would call the new column either `revision` or `top_revision` or 'head_revision'. afaik the term `tip` indicates the last commit added to a repository, so it doesn't always refers to the top revision of a particular head.
I like 'revision'. 'head_revision' makes me think of git HEAD, which is not the same as the most recent SHA in a push (we may be ingesting several pushes at once, so even at the time of ingestion the push 'revision' may not always be the HEAD of the repo).
If we want to provide a `pull_request` filter on treeherder, storing the GH pull request id is a good idea. Maybe we can do that in another bug together with the required backend and frontend changes? On the other hand I don't see a case for storing the mercurial push id, specially considering that sometimes releng reset a repository, kicking back that counter to zero.
:emorley I was thinking more of the output of `hg heads`, but I can see it may be confusing.
Note also for Git, it's not always going to be the PR we're tracking - there will be normal commits to the repo too. We'll also presumably want to check what the GitHub API returns (ie can we even get at an equivalent of the Hg pushlog?): https://developer.github.com/v3/repos/commits/ Whilst the scope of this bug should not include us starting to ingest from Git repos directly, I think it's something we should at least think about in terms of schema (we're inconsistent at the moment and ingest Hg ourselves, but rely on others to provide the revisions for Git repos).
is there any progress on this bug? Now that we use full 40 character revisions more often, I need to modify tools to continue to work with treeherder by hacking the revisions to 12 characters.
This fell to lower priority due to EOQ work on my part. However, it's next on my list after a few small cleanup items.
I'm actively working on this now. Hopefully will have a PR in the next day or so.
(In reply to Joel Maher (:jmaher) from comment #10) > is there any progress on this bug? Now that we use full 40 character > revisions more often, I need to modify tools to continue to work with > treeherder by hacking the revisions to 12 characters. Lots of pieces to this issue of revisions that have to be done in order. But the one with fetching with long or short revisions is an easy piece that's now handled in bug 1214039. PR pending review.
Created attachment 8719904 [details] [review] [treeherder] mozilla:add-revision-fields > mozilla:master
Created attachment 8724782 [details] [review] [treeherder] mozilla:revision-replace-revision-hash > mozilla:master
Comment on attachment 8724782 [details] [review] [treeherder] mozilla:revision-replace-revision-hash > mozilla:master Hey Ed-- Please feel free to ping me if you want to talk through this. I know this PR is huge. But I couldn't find many areas where I could split it up. I added the part where we add the hawk ID as a param, because that was useful in a few places.
Comment on attachment 8724782 [details] [review] [treeherder] mozilla:revision-replace-revision-hash > mozilla:master clearing this review request for now. I found a couple spots regarding revision_hash that I missed. Namely, the docs... :)
Comment on attachment 8724782 [details] [review] [treeherder] mozilla:revision-replace-revision-hash > mozilla:master Ok, I think have this right now. I was able to remove some useless code for publishing new resultset data via pulse that TC is no longer using.
Comment on attachment 8724782 [details] [review] [treeherder] mozilla:revision-replace-revision-hash > mozilla:master Looks great! Thank you for working on this, some really worthwhile cleanup/simplification :-)
Commit pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/2a9dbefa49d26d8d0ddc7c402c571691b1679770 Bug 1199364 - Use revsion instead of revision_hash for resultsets New resultsets will still store a value in their ``revision_hash`` field, but it will just be the same value as their ``long_revision`` field. This will log an exception in New Relic when a new resultset or job is posted to the API with only a ``revision_hash``and not a ``revision`` value. This also switches to using the longer 40 char revisions along side the 12 char revisions. But we leverage the longer ones for most actions. The short revisions are stored and used so that people and the UI can support locating a resultset (or setting ranges) with short revisions.
Commit pushed to master at https://github.com/mozilla/treeherder https://github.com/mozilla/treeherder/commit/6c70883cb479789c0cf2ecc6a22ab857006bb8ba Bug 1199364 - Limit displayed revision length in perf controllers