taskcluster-github stops posting comments to PRs / creating tasks
Categories
(Taskcluster :: Services, enhancement)
Tracking
(Not tracked)
People
(Reporter: pmoore, Assigned: dustin)
Details
Three times today, people reporting that taskcluster-github had stopped working. After restarting taskcluster-github dynos, it kicked into action and created tasks, and posted comments on PRs, annotated commits with ticks/crosses, etc.
The papertrail logs didn't display any crashes as far as I could see.
Reporter | ||
Comment 1•6 years ago
|
||
Affected people were bastien, grenade, marco.
See #ci and #taskcluster from today.
Reporter | ||
Comment 2•6 years ago
|
||
(although grenade contacted me privately)
Assignee | ||
Comment 3•6 years ago
|
||
Feb 14 23:57:12 taskcluster-github app/worker.1: 2019-02-14T23:57:11.779Z taskcluster-github:handlers:3e2269c0-30b4-11e9-8754-d6b5c4a0f1ea Received message. Starting processing...
Feb 14 23:57:12 taskcluster-github app/worker.1: 2019-02-14T23:57:11.871Z taskcluster-github:github-auth Authenticated as installation: 42268
Feb 14 23:57:12 taskcluster-github app/worker.1: 2019-02-14T23:57:11.871Z taskcluster-github:handlers:3e2269c0-30b4-11e9-8754-d6b5c4a0f1ea handling pull_request.synchronize webhook for: mozilla-mobile/android-components@60bb3eb6ae9299fd6837ac918be2845064af27aa
Feb 14 23:57:12 taskcluster-github app/worker.1: 2019-02-14T23:57:11.871Z taskcluster-github:handlers:3e2269c0-30b4-11e9-8754-d6b5c4a0f1ea Checking pull request permission for mozilla-mobile/android-components@60bb3eb6ae9299fd6837ac918be2845064af27aa...
Feb 14 23:57:12 taskcluster-github app/worker.1: 2019-02-14T23:57:12.044Z taskcluster-github:handlers:3e2269c0-30b4-11e9-8754-d6b5c4a0f1ea Trying to fetch the YML for mozilla-mobile/android-components@60bb3eb6ae9299fd6837ac918be2845064af27aa
Feb 14 23:57:12 taskcluster-github app/worker.1: 2019-02-14T23:57:12.142Z taskcluster-github:intree intree config for mozilla-mobile/android-components matches valid schema.
Feb 14 23:57:12 taskcluster-github app/worker.1: 2019-02-14T23:57:12.146Z taskcluster-github:handlers:3e2269c0-30b4-11e9-8754-d6b5c4a0f1ea Trying to create a record for mozilla-mobile/android-components@60bb3eb6ae9299fd6837ac918be2845064af27aa (pending) in Builds table
Feb 14 23:57:12 taskcluster-github app/worker.1: 2019-02-14T23:57:12.146Z azure:authorization Refreshing shared-access-signature
Feb 14 23:57:12 taskcluster-github app/worker.1: 2019-02-14T23:57:12.146Z taskcluster-client Calling: azureTableSAS, retry: 0
Feb 14 23:57:12 taskcluster-github app/worker.1: 2019-02-14T23:57:12.200Z taskcluster-client Success calling: azureTableSAS, (0 retries)
Feb 14 23:57:12 taskcluster-github app/worker.1: 2019-02-14T23:57:12.200Z azure:authorization Refreshed shared-access-signature, will refresh at 2019-02-15T00:07:12.000Z
Feb 14 23:57:12 taskcluster-github app/worker.1: 2019-02-14T23:57:12.200Z azure:table Request: POST /TaskclusterGithubBuilds, retry: 0
Feb 14 23:57:13 taskcluster-github app/worker.1: 2019-02-14T23:57:12.620Z taskcluster-github:handlers:3e2269c0-30b4-11e9-8754-d6b5c4a0f1ea Creating tasks for mozilla-mobile/android-components@60bb3eb6ae9299fd6837ac918be2845064af27aa (taskGroupId: MHuk9RMCQqCSk7Bpws1aMg)
Feb 14 23:57:13 taskcluster-github app/worker.1: 2019-02-14T23:57:12.620Z taskcluster-client Calling: createTask, retry: 0
Feb 14 23:57:13 taskcluster-github app/worker.1: 2019-02-14T23:57:13.584Z taskcluster-client Success calling: createTask, (0 retries)
Feb 14 23:57:13 taskcluster-github app/worker.1: 2019-02-14T23:57:13.584Z taskcluster-github:handlers:3e2269c0-30b4-11e9-8754-d6b5c4a0f1ea Publishing status exchange for mozilla-mobile/android-components@60bb3eb6ae9299fd6837ac918be2845064af27aa (pending)
Feb 14 23:57:13 taskcluster-github app/worker.1: 2019-02-14T23:57:13.585Z taskcluster-lib-pulse.publisher Publishing message on exchange exchange/taskcluster-github/v1/task-group-creation-requested, routing key primary.mozilla-mobile.android-components
Feb 14 23:57:13 taskcluster-github app/worker.1: 2019-02-14T23:57:13.647Z taskcluster-github:handlers:taskGroup-handler Task group MHuk9RMCQqCSk7Bpws1aMg was defined. Creating group status...
Feb 14 23:57:13 taskcluster-github app/worker.1: 2019-02-14T23:57:13.648Z azure:table Request: GET /TaskclusterGithubBuilds(PartitionKey='MHuk9RMCQqCSk7Bpws1aMg',RowKey='taskGroupId'), retry: 0
Feb 14 23:57:13 taskcluster-github app/worker.1: 2019-02-14T23:57:13.650Z taskcluster-github:handlers:3e2269c0-30b4-11e9-8754-d6b5c4a0f1ea Job handling for mozilla-mobile/android-components@60bb3eb6ae9299fd6837ac918be2845064af27aa completed.
Feb 14 23:57:13 taskcluster-github app/worker.1: 2019-02-14T23:57:13.815Z taskcluster-github:github-auth Authenticated as installation: 42268
Feb 14 23:57:49 taskcluster-github app/worker.1: 2019-02-14T23:57:49.363Z taskcluster-lib-pulse.client-1 recycling
Feb 14 23:57:49 taskcluster-github app/worker.1: 2019-02-14T23:57:49.364Z taskcluster-lib-pulse.conn-1 retiring
Feb 14 23:57:49 taskcluster-github app/worker.1: 2019-02-14T23:57:49.365Z taskcluster-lib-pulse.conn-2 waiting
Feb 14 23:57:49 taskcluster-github app/worker.1: 2019-02-14T23:57:49.367Z taskcluster-lib-pulse.conn-2 connecting
Feb 14 23:57:50 taskcluster-github app/worker.1: 2019-02-14T23:57:49.793Z taskcluster-lib-pulse.conn-2 connected
Feb 14 23:57:50 taskcluster-github app/worker.1: 2019-02-14T23:57:49.916Z taskcluster-lib-pulse.publisher using new channel
Feb 14 23:58:19 taskcluster-github app/worker.1: 2019-02-14T23:58:19.373Z taskcluster-lib-pulse.conn-1 finished; closing AMQP connection
Feb 15 00:57:49 taskcluster-github app/worker.1: 2019-02-15T00:57:49.365Z taskcluster-lib-pulse.client-1 recycling
Feb 15 00:57:49 taskcluster-github app/worker.1: 2019-02-15T00:57:49.366Z taskcluster-lib-pulse.conn-2 retiring
Feb 15 00:57:49 taskcluster-github app/worker.1: 2019-02-15T00:57:49.366Z taskcluster-lib-pulse.conn-3 waiting
Feb 15 00:57:49 taskcluster-github app/worker.1: 2019-02-15T00:57:49.367Z taskcluster-lib-pulse.conn-3 connecting
Feb 15 00:57:50 taskcluster-github app/worker.1: 2019-02-15T00:57:49.797Z taskcluster-lib-pulse.conn-3 connected
Feb 15 00:57:50 taskcluster-github app/worker.1: 2019-02-15T00:57:49.924Z taskcluster-lib-pulse.publisher using new channel
and that pattern of just logging about lib-pulse continued until the restart. It's normal for lib-pulse to reconnect once an hour, so it's more the absence of anything else here that's unusual. After the restart, the handler started handling a bunch of queued messages:
Feb 15 08:36:43 taskcluster-github app/worker.1: 2019-02-15T08:36:43.199Z taskcluster-github:handlers Constructing handlers...
Feb 15 08:36:43 taskcluster-github app/worker.1: 2019-02-15T08:36:43.200Z taskcluster-github:handlers Setting up handlers...
Feb 15 08:36:44 taskcluster-github app/worker.1: 2019-02-15T08:36:44.020Z taskcluster-github:handlers:86673166-30b4-11e9-9dfd-fdc9f852c3d9 Received message. Starting processing...
Feb 15 08:36:44 taskcluster-github app/worker.1: 2019-02-15T08:36:44.086Z taskcluster-github:github-auth Authenticated as installation: 9218
Feb 15 08:36:44 taskcluster-github app/worker.1: 2019-02-15T08:36:44.086Z taskcluster-github:handlers:86673166-30b4-11e9-9dfd-fdc9f852c3d9 handling push webhook for: mozilla/treeherder@112d79c55178c088b93834983c9f4b7b8857459d
Feb 15 08:36:44 taskcluster-github app/worker.1: 2019-02-15T08:36:44.086Z taskcluster-github:handlers:86673166-30b4-11e9-9dfd-fdc9f852c3d9 handling push webhook for: mozilla/treeherder@112d79c55178c088b93834983c9f4b7b8857459d
Feb 15 08:36:44 taskcluster-github app/worker.1: 2019-02-15T08:36:44.086Z taskcluster-github:handlers:86673166-30b4-11e9-9dfd-fdc9f852c3d9 Trying to fetch the YML for mozilla/treeherder@112d79c55178c088b93834983c9f4b7b8857459d
Feb 15 08:36:44 taskcluster-github app/worker.1: 2019-02-15T08:36:44.091Z taskcluster-github:handlers:87261fe0-30b4-11e9-99b1-43832321c115 Received message. Starting processing...
Feb 15 08:36:44 taskcluster-github app/worker.1: 2019-02-15T08:36:44.140Z taskcluster-github:handlers:87706f50-30b4-11e9-80b9-9ec88b62e128 Received message. Starting processing...
Feb 15 08:36:44 taskcluster-github app/worker.1: 2019-02-15T08:36:44.147Z taskcluster-github:handlers:877689d0-30b4-11e9-87b9-d1044833a0bd Received message. Starting processing...
Assignee | ||
Comment 4•6 years ago
|
||
It looks like webhooks were still being received and published to pulse after 23:57:13. For example:
Feb 15 00:01:35 taskcluster-github app/web.1: 2019-02-15T00:01:35.543Z taskcluster-lib-pulse.publisher Publishing message on exchange exchange/taskcluster-github/v1/push, routing key primary.web-platform-tests.wpt
Assignee | ||
Comment 5•6 years ago
|
||
At the moment, queue/taskcluster-github/stat-result and queue/taskcluster-github/jobs have pending messages, starting about 10 minutes ago.
This seems to be associated with lib-pulse reconnects: after the first reconnection ("conn-1 retiring"), nothing happens. That seems to point to https://github.com/taskcluster/taskcluster/pull/236 / bug 1527427.
Assignee | ||
Comment 6•6 years ago
|
||
Yup, that was dumb: https://github.com/taskcluster/taskcluster/pull/236/files#diff-ec0662c7dee98f0a1f01cae19043c47fR86 doesn't shut down the whole consumer, it just shuts down use of the current connection. But the patch happily stops listening for new connection messages in _shutdown
. Ergo, on reconnection, it never starts using the new connection.
Assignee | ||
Comment 7•6 years ago
|
||
Assignee | ||
Comment 8•6 years ago
|
||
I rolled back to the Feb 11 deployment of tc-github temporarily (/ni owlish to avoid surprise). But this will be affecting anything deployed in the last 17 hours, so we should get the fix landed expeditiously and roll forward.
Updated•6 years ago
|
Assignee | ||
Comment 9•6 years ago
|
||
Deployed -- hopefully that will fix things up!
Description
•