pulse connection closed by server causes exception
Categories
(Taskcluster :: Services, enhancement)
Tracking
(Not tracked)
People
(Reporter: dustin, Assigned: dustin)
Details
The pulse servers were restarted this morning, and everything kept running just fine, but we got some exceptions:
https://console.cloud.google.com/errors/CKnkiKDzhL7fCQ?time=PT6H&project=heroku-logging
Error: Unhandled Rejection at: Promise [object Promise] reason: IllegalOperationError: Channel closed
at Monitor.reportError (/app/libraries/monitor/src/monitor.js:271)
at MonitorManager._unhandledRejectionHandler (/app/libraries/monitor/src/index.js:170)
at process.emit (events.js:189)
at process.EventEmitter.emit (domain.js:441)
at emitPromiseRejectionWarnings (internal/process/promises.js:119)
at process._tickCallback (next_tick.js:69)
Assignee | ||
Comment 1•6 years ago
|
||
Over yonder in https://github.com/squaremo/amqp.node/issues/501 I've learned a thing or two about how amqplib handles errors. So, some cleanup of how tc-lib-pulse's built-in consumers and producers handle errors is probably in order. And maybe some docs fix-ups for amqplib.
Assignee | ||
Comment 2•6 years ago
|
||
Error: Unhandled Rejection at: Promise [object Promise] reason: Error: Channel ended, no reply will be forthcoming
Assignee | ||
Comment 3•6 years ago
|
||
Ugh, the stackdriver error thing is awful. I have no idea what it's trying to tell me.
Assignee | ||
Comment 4•6 years ago
|
||
Assignee | ||
Comment 5•6 years ago
|
||
I will deploy this to my dev environment and then try restarting the pulse cluster and see what happens.
Assignee | ||
Comment 6•6 years ago
|
||
https://console.cloud.google.com/errors/CKO57o7S6qXPaA?time=P30D&project=heroku-logging
IllegalOperationError: Channel closed
at Channel.<anonymous> (/app/node_modules/amqplib/lib/channel.js:154:11)
at Channel.C._rpc (/app/node_modules/amqplib/lib/channel.js:136:8)
at /app/node_modules/amqplib/lib/channel_model.js:182:10
at tryCatcher (/app/node_modules/bluebird/js/release/util.js:16:23)
at Function.Promise.fromNode.Promise.fromCallback (/app/node_modules/bluebird/js/release/promise.js:180:30)
at Channel.C.cancel (/app/node_modules/amqplib/lib/channel_model.js:181:18)
at PulseConsumer._drainChannel (/app/libraries/pulse/src/consumer.js:96:21)
at Connection.conn.on (/app/libraries/pulse/src/consumer.js:205:38)
at Connection.emit (events.js:189:13)
at Connection.EventEmitter.emit (domain.js:441:20)
at Connection.retire (/app/libraries/pulse/src/client.js:390:10)
at Client.recycle (/app/libraries/pulse/src/client.js:104:21)
at Connection.newConn.once (/app/libraries/pulse/src/client.js:117:16)
at Object.onceWrapper (events.js:277:13)
at Connection.emit (events.js:189:13)
Assignee | ||
Comment 7•6 years ago
|
||
Assignee | ||
Comment 8•6 years ago
|
||
Tested by force-closing some connections for tc-notify:
2019-04-12T17:12:21.649324+00:00 app[handler.1]: 2019-04-12T17:12:21.649Z taskcluster-lib-pulse.conn-1 connection closed unexpectedly
2019-04-12T17:12:21.649541+00:00 app[handler.1]: 2019-04-12T17:12:21.649Z taskcluster-lib-pulse.conn-1 failed
2019-04-12T17:12:21.649692+00:00 app[handler.1]: 2019-04-12T17:12:21.649Z taskcluster-lib-pulse.client-1 recycling
2019-04-12T17:12:21.649935+00:00 app[handler.1]: 2019-04-12T17:12:21.649Z taskcluster-lib-pulse.conn-1 retiring
2019-04-12T17:12:21.652031+00:00 app[handler.1]: 2019-04-12T17:12:21.651Z taskcluster-lib-pulse.conn-2 waiting
2019-04-12T17:12:21.653712+00:00 app[handler.1]: 2019-04-12T17:12:21.653Z taskcluster-lib-pulse.conn-2 connecting
2019-04-12T17:12:22.097770+00:00 app[handler.1]: 2019-04-12T17:12:22.097Z taskcluster-lib-pulse.conn-2 connected
2019-04-12T17:12:22.222845+00:00 app[handler.1]: 2019-04-12T17:12:22.222Z taskcluster-lib-pulse.publisher using new channel
2019-04-12T17:12:33.603068+00:00 app[web.1]: 2019-04-12T17:12:33.602Z taskcluster-lib-pulse.conn-1 connection closed unexpectedly
2019-04-12T17:12:33.603435+00:00 app[web.1]: 2019-04-12T17:12:33.603Z taskcluster-lib-pulse.conn-1 failed
2019-04-12T17:12:33.604520+00:00 app[web.1]: 2019-04-12T17:12:33.604Z taskcluster-lib-pulse.client-1 recycling
2019-04-12T17:12:33.606143+00:00 app[web.1]: 2019-04-12T17:12:33.606Z taskcluster-lib-pulse.conn-1 retiring
2019-04-12T17:12:33.606829+00:00 app[web.1]: 2019-04-12T17:12:33.606Z taskcluster-lib-pulse.conn-2 waiting
2019-04-12T17:12:33.611468+00:00 app[web.1]: 2019-04-12T17:12:33.611Z taskcluster-lib-pulse.conn-2 connecting
2019-04-12T17:12:34.057422+00:00 app[web.1]: 2019-04-12T17:12:34.056Z taskcluster-lib-pulse.conn-2 connected
2019-04-12T17:12:34.183950+00:00 app[web.1]: 2019-04-12T17:12:34.183Z taskcluster-lib-pulse.publisher using new channel
2019-04-12T17:12:37.834500+00:00 app[irc.1]: 2019-04-12T17:12:37.834Z notify Did not receive any messages from sqs in timeout.
2019-04-12T17:12:37.834560+00:00 app[irc.1]: 2019-04-12T17:12:37.834Z notify Waiting for message from sqs.
2019-04-12T17:12:51.652804+00:00 app[handler.1]: 2019-04-12T17:12:51.652Z taskcluster-lib-pulse.conn-1 finished; closing AMQP connection
2019-04-12T17:12:57.866956+00:00 app[irc.1]: 2019-04-12T17:12:57.866Z notify Did not receive any messages from sqs in timeout.
2019-04-12T17:12:57.866998+00:00 app[irc.1]: 2019-04-12T17:12:57.866Z notify Waiting for message from sqs.
2019-04-12T17:13:03.606947+00:00 app[web.1]: 2019-04-12T17:13:03.606Z taskcluster-lib-pulse.conn-1 finished; closing AMQP connection
I'm going to also deploy queue, but otherwise not planning to deploy-the-world for this patch.
Description
•