Use revision instead of revision_hash for resultsets

RESOLVED FIXED

Status

Tree Management
Treeherder: Data Ingestion
P2
normal
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: camd, Assigned: camd)

Tracking

(Blocks: 2 bugs)

Details

Attachments

(1 attachment, 1 obsolete attachment)

(Assignee)

Description

2 years ago
At this time, we should start storing all 40 characters of the SHA for both the ``revision`` table, and the ``result_set`` table.

I'm tempted to leave the field name as ``revision_hash`` for less disruption.  And that's still sort of accurate since a SHA is a hash.  

Though ``revision_tip`` or ``tip_revision`` (or even just ``revision``) may be clearer.
(Assignee)

Updated

2 years ago
Flags: needinfo?(mdoglio)
Flags: needinfo?(emorley)
(Assignee)

Updated

2 years ago
Assignee: nobody → cdawson
(Assignee)

Comment 1

2 years ago
Sorry to spam you guys, just looking for buy-in.  :)
(Assignee)

Updated

2 years ago
Blocks: 1169320
(Assignee)

Comment 2

2 years ago
Perhaps this approach is best:

1. Add a field to ``result_set`` called ``revision``.  
2. Store both ``revision`` and ``revision_hash`` for a while.
3. Add function to client to add ``revision`` to a job
4. Deprecate ``revision_hash`` and ask all clients to convert over to using ``revision`` instead of ``revision_hash``.  
5. Once everyone has switched over, switch to using the ``revision`` field instead of ``revision_hash``
6. Remove the ``revision_hash`` field from ``result_set``
(Assignee)

Updated

2 years ago
Summary: Change revision_hash to just be the SHA of the tip revision for the push → Use revision instead of revision_hash for resultsets
(Assignee)

Updated

2 years ago
Component: Treeherder → Treeherder: Data Ingestion
(Assignee)

Comment 3

2 years ago
I should possibly(probably) add a field called ``vcs_id`` or ``push_id``.  this would either store the HG pushID, or for GitHub, maybe the PR number?  Not sure about GH yet.

Updated

2 years ago
Duplicate of this bug: 1079796

Updated

2 years ago
Blocks: 1079796
Good idea to get rid of the revision_hash. I would call the new column either `revision` or `top_revision` or 'head_revision'. afaik the term `tip` indicates the last commit added to a repository, so it doesn't always refers to the top revision of a particular head.
No longer blocks: 1079796
Flags: needinfo?(mdoglio)

Comment 6

2 years ago
I like 'revision'. 'head_revision' makes me think of git HEAD, which is not the same as the most recent SHA in a push (we may be ingesting several pushes at once, so even at the time of ingestion the push 'revision' may not always be the HEAD of the repo).
Blocks: 1079796
Flags: needinfo?(emorley)
If we want to provide a `pull_request` filter on treeherder, storing the GH pull request id is a good idea. Maybe we can do that in another bug together with the required backend and frontend changes?
On the other hand I don't see a case for storing the mercurial push id, specially considering that sometimes releng reset a repository, kicking back that counter to zero.
:emorley I was thinking more of the output of `hg heads`, but I can see it may be confusing.

Comment 9

2 years ago
Note also for Git, it's not always going to be the PR we're tracking - there will be normal commits to the repo too.

We'll also presumably want to check what the GitHub API returns (ie can we even get at an equivalent of the Hg pushlog?):
https://developer.github.com/v3/repos/commits/

Whilst the scope of this bug should not include us starting to ingest from Git repos directly, I think it's something we should at least think about in terms of schema (we're inconsistent at the moment and ingest Hg ourselves, but rely on others to provide the revisions for Git repos).
is there any progress on this bug?  Now that we use full 40 character revisions more often, I need to modify tools to continue to work with treeherder by hacking the revisions to 12 characters.
(Assignee)

Comment 11

2 years ago
This fell to lower priority due to EOQ work on my part.  However, it's next on my list after a few small cleanup items.

Updated

2 years ago
Blocks: 1178227
(Assignee)

Comment 12

2 years ago
I'm actively working on this now.  Hopefully will have a PR in the next day or so.
(Assignee)

Comment 13

2 years ago
(In reply to Joel Maher (:jmaher) from comment #10)
> is there any progress on this bug?  Now that we use full 40 character
> revisions more often, I need to modify tools to continue to work with
> treeherder by hacking the revisions to 12 characters.

Lots of pieces to this issue of revisions that have to be done in order.  But the one with fetching with long or short revisions is an easy piece that's now handled in bug 1214039.  PR pending review.
(Assignee)

Updated

2 years ago
No longer blocks: 1079796
Depends on: 1079796
Created attachment 8719904 [details] [review]
[treeherder] mozilla:add-revision-fields > mozilla:master

Updated

2 years ago
Priority: -- → P2
(Assignee)

Updated

2 years ago
Status: NEW → ASSIGNED
Created attachment 8724782 [details] [review]
[treeherder] mozilla:revision-replace-revision-hash > mozilla:master
(Assignee)

Updated

2 years ago
Attachment #8719904 - Attachment is obsolete: true
(Assignee)

Comment 16

2 years ago
Comment on attachment 8724782 [details] [review]
[treeherder] mozilla:revision-replace-revision-hash > mozilla:master

Hey Ed--  Please feel free to ping me if you want to talk through this.  I know this PR is huge.  But I couldn't find many areas where I could split it up.

I added the part where we add the hawk ID as a param, because that was useful in a few places.
Attachment #8724782 - Flags: review?(emorley)
(Assignee)

Comment 17

2 years ago
Comment on attachment 8724782 [details] [review]
[treeherder] mozilla:revision-replace-revision-hash > mozilla:master

clearing this review request for now.  I found a couple spots regarding revision_hash that I missed.  Namely, the docs...  :)
Attachment #8724782 - Flags: review?(emorley)
(Assignee)

Comment 18

2 years ago
Comment on attachment 8724782 [details] [review]
[treeherder] mozilla:revision-replace-revision-hash > mozilla:master

Ok, I think have this right now.  I was able to remove some useless code for publishing new resultset data via pulse that TC is no longer using.
Attachment #8724782 - Flags: review?(emorley)
Comment on attachment 8724782 [details] [review]
[treeherder] mozilla:revision-replace-revision-hash > mozilla:master

Looks great! Thank you for working on this, some really worthwhile cleanup/simplification :-)
Attachment #8724782 - Flags: review?(emorley) → review+

Updated

2 years ago
Blocks: 1257602

Comment 20

2 years ago
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/2a9dbefa49d26d8d0ddc7c402c571691b1679770
Bug 1199364 - Use revsion instead of revision_hash for resultsets

New resultsets will still store a value in their ``revision_hash`` field, but it will
just be the same value as their ``long_revision`` field.

This will log an exception in New Relic when a new resultset or job is posted
to the API with only a ``revision_hash``and not a ``revision`` value.

This also switches to using the longer 40 char revisions along side the
12 char revisions.  But we leverage the longer ones for most actions.  The
short revisions are stored and used so that people and the UI can support
locating a resultset (or setting ranges) with short revisions.

Comment 21

2 years ago
Commit pushed to master at https://github.com/mozilla/treeherder

https://github.com/mozilla/treeherder/commit/6c70883cb479789c0cf2ecc6a22ab857006bb8ba
Bug 1199364 - Limit displayed revision length in perf controllers
(Assignee)

Updated

2 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED

Updated

2 years ago
See Also: → bug 1270643
You need to log in before you can comment on or make changes to this bug.