Closed Bug 1182386 Opened 10 years ago Closed 8 years ago

TaskCluster submits wrong push_timestamp to Treeherder (uses author date not merge/push date)

Categories

(Taskcluster :: Services, defect)

defect
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: glob, Unassigned)

Details

Attachments

(3 files)

https://treeherder.mozilla.org/#/jobs?repo=bmo-master is showing commits in the wrong order. the order of commits in git and github is: Bug 1178031: Add an "applies to all platforms" link... Bug 1173442: Implement admin UI changes to allow select... Bug 1181637: Update Req Opening Process (Cost Center... Bug 1180570: store attachment size in the database however treeherder is showing them in the following order: Bug 1181637: Update Req Opening Process (Cost Center... Bug 1180570: store attachment size in the database Bug 1178031: Add an "applies to all platforms" link... Bug 1173442: Implement admin UI changes to allow select... this has given the illusion that the tree is green when in fact it isn't -- bug 1173442 broke the tree but the top/latest job in treeherder was all green.
bug 1180570 was committed on around [8 Jul 2015 23:24:47 -0700] according to both bugzilla and git. http://git.mozilla.org/?p=webtools/bmo/bugzilla.git;a=commit;h=28ac958d3cf5951a2e75d53581d6ddd4ded30119 [github] bug 1173442 was committed on around [8 Jul 2015 18:44:19 -0700] according to git, but that patch wasn't reviewed until [8 Jul 2015 23:55:00 -0700] and dkl commented with his commit message at [9 Jul 2015 09:50:00 -0700]. http://git.mozilla.org/?p=webtools/bmo/bugzilla.git;a=commit;h=5cc7059954467225ef8055df8a1623bea926f216 [github] so it's likely that dkl committed the code to his local repo prior to the review, and pushed the already committed code once he got an r+. there's nothing wrong with that :) however with the jobs being sorted by commit date we end up with incorrect ordering and a misleading display.
Component: Testing Scripts and Infrastructure → Treeherder
Product: bugzilla.mozilla.org → Tree Management
Summary: incorrect dates/sorting in treeherder for tests → incorrect dates/sorting in treeherder
Version: Production → ---
this screenshot shows the problem -- even though the top two commits in the view are all green the tree is actually burning because the bottom commit was pushed after the top two. after thinking about this over the weekend i suspect the problem is on the taskcluster side - i bet it's setting the timestamp to the date of the commit instead of the datetime the job started.
Actually GitHub shows the same order as Treeherder--by commit date, not push date. Still not sure what the correct solution is, though.
Git doesn't actually store the push timestamp anywhere, it seems. The way git.mozilla.org and (by default) 'git log' order them is by ancestry (child/parent), so that's why it appears different than in GitHub and Treeherder, which apparently order by commit or author date. I'm not sure how Treeherder orders them; I would have thought it would be by the time that the job was first reported, but that doesn't appear to be the case. Ordered by parent->child would make even more sense, since then you'll always get the most recent on top, even if the parent job was, for some reason, reported later.
interesting... github shows the comments to me in a different order.
So the Git repo commits come from TaskCluster (Treeherder only fetches commits for Hg repos - bit inconsistent, and I'd like to figure out a longer term plan for this); after the weekend I'll check more closely and see what they are submitting, to see if it's the way we're displaying it, or whether they need to fix it.
Flags: needinfo?(emorley)
Summary: incorrect dates/sorting in treeherder → Git repo commits are displayed by commit date not push date
(I've been on PTO much of this week; will take a look when I'm back next week :-))
is it viable to sort by the earliest start date of a task instead of the commit date?
Sorry for the delay in getting back to this (when I saw this in the daily reminder of open requests it always got mentally prioritised after newer reviews etc since the requestee was myself, rather than a third party). For the examples in comment 0 - a reduced range ~permalink is: https://treeherder.mozilla.org/#/jobs?repo=bmo-master&startdate=2015-07-09&enddate=2015-07-09 Which results in this API call: https://treeherder.mozilla.org/api/project/bmo-master/resultset/?count=100&full=true&push_timestamp__gte=1436400000&push_timestamp__lt=1436484600 Summarising that slightly: "id": 54, "push_timestamp": 1436427886, "comments": "Bug 1181637..." "id": 53, "push_timestamp": 1436423087, "comments": "Bug 1180570..." "id": 56, "push_timestamp": 1436406745, "comments": "Bug 1178031..." "id": 55, "push_timestamp": 1436406259, "comments": "Bug 1173442..." So it would seem both the Treeherder UI and the API are correctly returning the resultsets in descending push_timestamp, which is actually correct behaviour (we can't sort on the resultset "id", since they may have raced during insertion, particularly with hg pushlog backfilling). The resultset's push_timestamp comes from here: https://github.com/mozilla/treeherder/blob/7440089508eb136a2f620d0cd145773c6b05ee99/treeherder/model/sql/template_schema/project.sql.tmpl#L414 Note this is different from the revision's commit_timestamp: https://github.com/mozilla/treeherder/blob/7440089508eb136a2f620d0cd145773c6b05ee99/treeherder/model/sql/template_schema/project.sql.tmpl#L455 For Git repos, we currently have the push and revision information submitted to us by TaskCluster (whereas for Hg we poll it ourselves; yes quite inconsistent :-/). TaskCluster use their own nodejs client to submit to Treeherder (though bug 1188398 is checking that they do use it everywhere and don't roll another solution in places). This node client sets push_timestamp here: https://github.com/mozilla/treeherder-node/blob/943c31caee560a48aa8f6f0bc29c58547aa86441/factory/github.js#L87 (though I've not that familiar with their client, or the rest of TC, so there may be other places too) At the place linked, they use: (githubPr.updated_at || githubPr.created_at) Looking at: https://developer.github.com/v3/pulls/#get-a-single-pull-request I see there is also a "closed_at" and "merged_at" - maybe they should use whatever unix timestamp is the oldest of all of those properties? Greg, would you be able to take a look at this? :-)
Component: Treeherder → Integration
Flags: needinfo?(emorley) → needinfo?(garndt)
Product: Tree Management → Taskcluster
Summary: Git repo commits are displayed by commit date not push date → TaskCluster submits wrong push_timestamp to Treeherder (uses author date not merge/push date)
Version: --- → unspecified
s/requestee/requester/
(In reply to Ed Morley [:emorley] from comment #9) > I see there is also a "closed_at" and "merged_at" - maybe they should use > whatever unix timestamp is the oldest of all of those properties? And by oldest I mean youngest. On a side note, I also wonder if there is any point in us storing the commit_timestamp at all. iirc we don't surface this in the UI. Filed bug 1202626 for that.
Wow I'm so sorry that this got sucked into some kind of black hold and I hadn't seen it. I'm actively looking at it now. By youngest, you mean the most recent date? if we had: create_at 12:00 updated_at 1:15 merged_at 12:15 closed_at 12:20 It should pick updated_at as the timestamp since it's the most recent? So it looks like changes are needed for the client and gaia-taskcluster (I think that's the component submitting these for gaia) needs to be updated to use it. Before that happened, bug 1221647 needs to be fixed for gaia-taskcluster.
Flags: needinfo?(garndt)
Component: Integration → Platform and Services
Closing this as this should be working now. gaia-taskcluster has been deprecated and tc-github is probably doing the right thing.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: Platform and Services → Services
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: