Closed
Bug 1499576
Opened 7 years ago
Closed 7 years ago
Missed GitHub.com pull request events
Categories
(Taskcluster :: Services, enhancement)
Taskcluster
Services
Tracking
(Not tracked)
RESOLVED
INACTIVE
People
(Reporter: jugglinmike, Assigned: owlish)
Details
My GitHub-hosted project recently integrated with TaskCluster for pull request
validation [1]. In the month since, TaskCluster has responded to over 500
commits--thank you!
While we learn the ins and outs of Taskcluster, we've maintained our
integration with our previous CI provider (TravisCI). This has given us some
visibility into inconsistencies between the two services.
- For 19 commits (~4%), the GitHub "statuses" API reports events from
TravisCI but nothing from Taskcluster [2]
- For 97 commits (~18%), the GitHub "statuses" API reports events from
Taskcluster, but none of them are "opened" or "synchronize" events [3]
My assumption is that every commit should at the very least produce either a
"opened" or "synchronize" event. GitHub's documentation doesn't have much to
say on this topic, but it seems to be supported by TaskCluster's documentation
[4]:
> Although the Github documentation does not make it clear, each ref that is
> updated in a `git push` operation triggers a distinct event.
It's possible that certain circumstances may lead to commits which are not
reported by GitHub (e.g. a contributor adding more than one commit in a single
"push" operation). Taskcluster can't be expected to respond to an event that
never occurs, after all! By limiting the data to only those commits to which
TravisCI has responded, I'm hoping to hone in what appears to be a
service-level failure.
Some details of our setup that may be relevant:
- Until 2018-10-01 [5], we had restricted pull request validation to
"collaborators." For the data reported here, I've removed occurrences of
non-collaborator pull requests opened in September
- Until today [6], we have configured TaskCluster to respond to *all* events.
Moving forward, we will only trigger tasks for "opened", "reopened" and
"synchronized" events
The GitHub documentation and my lack of insight into the communication between
GitHub and Taskcluster make it difficult to say if there is actually a problem
here. I'm not sure how TravisCI is operating because its status payloads don't
describe the events to which it is responding. But if this behavior is
expected, then it seems as though some other configuration will be necessary in
order for us to use Taskcluster effectively.
[2] and [3] are listings of the relevant commits, and I'm happy to provide
any more information that could be helpful in debugging.
[1] https://github.com/web-platform-tests/wpt/pull/12657
[2] https://gist.github.com/jugglinmike/0454e52a32172abe53d4a1137db1620a
[3] https://gist.github.com/jugglinmike/cf4a4d8b193703283992948e963e4d43
[4] https://docs.taskcluster.net/docs/reference/integrations/taskcluster-github/docs/taskcluster-yml-v1
[5] https://github.com/web-platform-tests/wpt/pull/13282
[6] https://github.com/web-platform-tests/wpt/pull/13552
| Assignee | ||
Updated•7 years ago
|
Assignee: nobody → bugzeeeeee
| Assignee | ||
Comment 1•7 years ago
|
||
Thank you for all the info! I am going to take a look at it, and examine your .taskcluster.yml more closely. I'll keep you posted on the results of my investigation!
| Reporter | ||
Comment 2•7 years ago
|
||
Great--thank you!
| Assignee | ||
Comment 3•7 years ago
|
||
So I investigated this a bit. I looked at the 19 commits which Taskcluster didn't react to. They could be categorized into 4 types:
Type 1.
In these PRs:
http://github.com/web-platform-tests/wpt/pull/13508
http://github.com/web-platform-tests/wpt/pull/13508
http://github.com/web-platform-tests/wpt/pull/13179
http://github.com/web-platform-tests/wpt/pull/13174
http://github.com/web-platform-tests/wpt/pull/12993
http://github.com/web-platform-tests/wpt/pull/12992
I looked at the .taskcluster.yml at that point in history on that branch, and it seemed to be configured for pushes to master only. That would explain why there were no tasks run for this PR.
Similarly, http://github.com/web-platform-tests/wpt/pull/13153 - no .taskcluster.yml in that branch at all.
We fetch only the `who can trigger tasks from PR` setting from the default branch. The rest of the settings is fetched from the branch which is being merged. This is not a bug, it's a feature(TM)
Type 2.
In these PRs:
http://github.com/web-platform-tests/wpt/pull/13358
http://github.com/web-platform-tests/wpt/pull/13102
I noticed Taskcluster didn't react on commits which followed a message that somebody "added a commit that referenced this pull request." The commit count is different on Conversation tab and Commits tab. I'm kinda curious how those commits are made.
Type 3.
These PRs:
http://github.com/web-platform-tests/wpt/pull/13271
http://github.com/web-platform-tests/wpt/pull/13103
were made by the same person, and .taskcluster.yml in the default branch was set to allow trigger jobs for collaborators only. If that person was not a collaborator at the time, that would explain Taskcluster's behavior.
Type 4.
http://github.com/web-platform-tests/wpt/pull/13228
http://github.com/web-platform-tests/wpt/pull/13164
http://github.com/web-platform-tests/wpt/pull/13125
These are mysterious. I don't catch anything wrong with those or out of the ordinary. I checked our logs for two of those - the github integration actually got push and pr-synchronize webhooks for them, and started handling them, but then there's nothing in the logs.
Finally:
http://github.com/web-platform-tests/wpt/pull/13165
I do see a Taskcluster status for the commit
My plan and suggestions:
* Make sure the branches your team makes PRs from are up-to-date. If you see Taskcluster not running tests, the first thing to troubleshoot is to merge master.
* Re: type 4 - I'll make sure we log sha everywhere so that logs were easier to find.
* Please continue to keep an eye on Taskcluster not running tasks when it should - add those commits right to this bug! I will be investigating them.
| Assignee | ||
Comment 4•7 years ago
|
||
Re: the second part of this bug, where Taskcluster didn't react to synchronize event - I checked a random commit, and indeed there is no such event in the logs, but there was no such event for that PR either (PR was created out of a branch with two commits of diff, no subsequent commits were made to the branch). That is one explanation for at least some subset of those commits.
Another thing to note is that you can see the exact event in the logs only - we don't post it to the UI. I hope that answers the question - let me know if you have any more question about the pr-synchronize event!
| Assignee | ||
Comment 5•7 years ago
|
||
PR to improve logging: https://github.com/taskcluster/taskcluster-github/pull/298
| Reporter | ||
Comment 6•7 years ago
|
||
> I looked at the .taskcluster.yml at that point in history on that branch, and
> it seemed to be configured for pushes to master only.
I see. I assumed that Taskcluster was referencing the configuration based on
the state of the target branch (roughly `git rev-parse master`), not the state
of the base commit (more like `git merge-base HEAD master`). Thanks for
clearing that up!
> I noticed Taskcluster didn't react on commits which followed a message that
> somebody "added a commit that referenced this pull request." The commit count
> is different on Conversation tab and Commits tab. I'm kinda curious how those
> commits are made.
It looks like those were the result of rebasing and force-pushing. Presumably
TaskCluster ran as expected on the commits that were originally submitted, but
I don't know if there's any way to verify that through the GitHub API. It's
fortunate that the practice is demonstrably uncommon.
> were made by the same person, and .taskcluster.yml in the default branch was
> set to allow trigger jobs for collaborators only. If that person was not a
> collaborator at the time, that would explain Taskcluster's behavior.
Roger that. While this is definitely distinct from "Type I", it's a result of
the same misunderstanding.
> I do see a Taskcluster status for the commit
Same here. Sorry for the confusion; I'm not sure what went wrong
> * Make sure the branches your team makes PRs from are up-to-date. If you see
> Taskcluster not running tests, the first thing to troubleshoot is to merge
> master.
Definitely. This will naturally resolve itself over time as contributors update
their local clones.
> * Re: type 4 - I'll make sure we log sha everywhere so that logs were easier to find.
Excellent!
> * Please continue to keep an eye on Taskcluster not running tasks when it
> should - add those commits right to this bug! I will be investigating them.
Will do
| Reporter | ||
Comment 7•7 years ago
|
||
I almost forgot: thank you very much for the thorough analysis! That's been extremely helpful for interpreting these results :)
| Reporter | ||
Comment 8•7 years ago
|
||
I've done another round of analysis with the latest data and with the insight you provided in comment 3:
https://github.com/web-platform-tests/wpt/issues/13194#issuecomment-436467415
Most notably, we found 2 commits where Taskcluster completed, but GitHub continues to report the status as "pending":
- http://github.com/web-platform-tests/wpt/pull/13904
- http://github.com/web-platform-tests/wpt/pull/13605
I don't think we saw this in the original data. Not surprising given that it only occurred in 2 out of 485 commits we considered this time around.
| Assignee | ||
Comment 9•7 years ago
|
||
I will look at those two PRs - that looks like yet another type of issue, a status handler failure. I'll examine the logs for those commits, and will see if our status handler logging could be improved as well.
> I almost forgot: thank you very much for the thorough analysis! That's been extremely helpful for interpreting these results :)
You're very welcome!
| Assignee | ||
Comment 10•7 years ago
|
||
> > * Please continue to keep an eye on Taskcluster not running tasks when it
> > should - add those commits right to this bug! I will be investigating them.
>
> Will do
Actually, a later thought... I don't like tickets that drag on forever - not that I don't ever plan fixing these issues, but it's just that this initial investigative ticket could be broken down into smaller confirmed issues - I'll summarize after I investigate the status handler issue. Then I'll close this one, and with the improved logging, as the bugs resurface I'd ask you to create new tickets for each individual bug so I could smash them methodically :) --> 🐛🔪 ☠️
| Reporter | ||
Comment 11•7 years ago
|
||
You got it! See bug 1507254
| Assignee | ||
Comment 12•7 years ago
|
||
I investigated the issues from comment 8 a bit - nothing appears in our logs. Apparently, status handler failed to run, which most likely means there was no message about the tasks completion from the queue. Possibly a queue issue.
So, tc-github issues identified:
1. Taskcluster doesn't react on commits which followed a message that somebody "added a commit that referenced this pull request." The commit count is different on Conversation tab and Commits tab. ("Type 2" section in Comment 3) bug 1507254 opened
2. Unknown bug (or a set of bugs) - "Type 4" section in Comment 3. bug 1507254 opened (Thank you Mike!)
3. An issue identified in comment 8 - likely a queue hickup. No bug opened yet.
This bug is being closed, and the work continues in the two above bugs.
| Assignee | ||
Updated•7 years ago
|
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INACTIVE
| Reporter | ||
Comment 13•7 years ago
|
||
#1 and #2 in comment 12 both reference the same bug, I believe that's a typo
Updated•6 years ago
|
Component: Github → Services
You need to log in
before you can comment on or make changes to this bug.
Description
•