Closed Bug 1534713 Opened 6 years ago Closed 6 years ago

pulse connection closed by server causes exception

Categories

(Taskcluster :: Services, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

Details

The pulse servers were restarted this morning, and everything kept running just fine, but we got some exceptions:

https://console.cloud.google.com/errors/CKnkiKDzhL7fCQ?time=PT6H&project=heroku-logging


Error: Unhandled Rejection at: Promise [object Promise] reason: IllegalOperationError: Channel closed

        at Monitor.reportError (/app/libraries/monitor/src/monitor.js:271)
        at MonitorManager._unhandledRejectionHandler (/app/libraries/monitor/src/index.js:170)
        at process.emit (events.js:189)
        at process.EventEmitter.emit (domain.js:441)
        at emitPromiseRejectionWarnings (internal/process/promises.js:119)
        at process._tickCallback (next_tick.js:69)

Over yonder in https://github.com/squaremo/amqp.node/issues/501 I've learned a thing or two about how amqplib handles errors. So, some cleanup of how tc-lib-pulse's built-in consumers and producers handle errors is probably in order. And maybe some docs fix-ups for amqplib.

 Error: Unhandled Rejection at: Promise [object Promise] reason: Error: Channel ended, no reply will be forthcoming 

Ugh, the stackdriver error thing is awful. I have no idea what it's trying to tell me.

I will deploy this to my dev environment and then try restarting the pulse cluster and see what happens.

https://console.cloud.google.com/errors/CKO57o7S6qXPaA?time=P30D&project=heroku-logging

IllegalOperationError: Channel closed
    at Channel.<anonymous> (/app/node_modules/amqplib/lib/channel.js:154:11)
    at Channel.C._rpc (/app/node_modules/amqplib/lib/channel.js:136:8)
    at /app/node_modules/amqplib/lib/channel_model.js:182:10
    at tryCatcher (/app/node_modules/bluebird/js/release/util.js:16:23)
    at Function.Promise.fromNode.Promise.fromCallback (/app/node_modules/bluebird/js/release/promise.js:180:30)
    at Channel.C.cancel (/app/node_modules/amqplib/lib/channel_model.js:181:18)
    at PulseConsumer._drainChannel (/app/libraries/pulse/src/consumer.js:96:21)
    at Connection.conn.on (/app/libraries/pulse/src/consumer.js:205:38)
    at Connection.emit (events.js:189:13)
    at Connection.EventEmitter.emit (domain.js:441:20)
    at Connection.retire (/app/libraries/pulse/src/client.js:390:10)
    at Client.recycle (/app/libraries/pulse/src/client.js:104:21)
    at Connection.newConn.once (/app/libraries/pulse/src/client.js:117:16)
    at Object.onceWrapper (events.js:277:13)
    at Connection.emit (events.js:189:13)

Tested by force-closing some connections for tc-notify:

2019-04-12T17:12:21.649324+00:00 app[handler.1]: 2019-04-12T17:12:21.649Z taskcluster-lib-pulse.conn-1 connection closed unexpectedly
2019-04-12T17:12:21.649541+00:00 app[handler.1]: 2019-04-12T17:12:21.649Z taskcluster-lib-pulse.conn-1 failed
2019-04-12T17:12:21.649692+00:00 app[handler.1]: 2019-04-12T17:12:21.649Z taskcluster-lib-pulse.client-1 recycling
2019-04-12T17:12:21.649935+00:00 app[handler.1]: 2019-04-12T17:12:21.649Z taskcluster-lib-pulse.conn-1 retiring
2019-04-12T17:12:21.652031+00:00 app[handler.1]: 2019-04-12T17:12:21.651Z taskcluster-lib-pulse.conn-2 waiting
2019-04-12T17:12:21.653712+00:00 app[handler.1]: 2019-04-12T17:12:21.653Z taskcluster-lib-pulse.conn-2 connecting
2019-04-12T17:12:22.097770+00:00 app[handler.1]: 2019-04-12T17:12:22.097Z taskcluster-lib-pulse.conn-2 connected
2019-04-12T17:12:22.222845+00:00 app[handler.1]: 2019-04-12T17:12:22.222Z taskcluster-lib-pulse.publisher using new channel
2019-04-12T17:12:33.603068+00:00 app[web.1]: 2019-04-12T17:12:33.602Z taskcluster-lib-pulse.conn-1 connection closed unexpectedly
2019-04-12T17:12:33.603435+00:00 app[web.1]: 2019-04-12T17:12:33.603Z taskcluster-lib-pulse.conn-1 failed
2019-04-12T17:12:33.604520+00:00 app[web.1]: 2019-04-12T17:12:33.604Z taskcluster-lib-pulse.client-1 recycling
2019-04-12T17:12:33.606143+00:00 app[web.1]: 2019-04-12T17:12:33.606Z taskcluster-lib-pulse.conn-1 retiring
2019-04-12T17:12:33.606829+00:00 app[web.1]: 2019-04-12T17:12:33.606Z taskcluster-lib-pulse.conn-2 waiting
2019-04-12T17:12:33.611468+00:00 app[web.1]: 2019-04-12T17:12:33.611Z taskcluster-lib-pulse.conn-2 connecting
2019-04-12T17:12:34.057422+00:00 app[web.1]: 2019-04-12T17:12:34.056Z taskcluster-lib-pulse.conn-2 connected
2019-04-12T17:12:34.183950+00:00 app[web.1]: 2019-04-12T17:12:34.183Z taskcluster-lib-pulse.publisher using new channel
2019-04-12T17:12:37.834500+00:00 app[irc.1]: 2019-04-12T17:12:37.834Z notify Did not receive any messages from sqs in timeout.
2019-04-12T17:12:37.834560+00:00 app[irc.1]: 2019-04-12T17:12:37.834Z notify Waiting for message from sqs.
2019-04-12T17:12:51.652804+00:00 app[handler.1]: 2019-04-12T17:12:51.652Z taskcluster-lib-pulse.conn-1 finished; closing AMQP connection
2019-04-12T17:12:57.866956+00:00 app[irc.1]: 2019-04-12T17:12:57.866Z notify Did not receive any messages from sqs in timeout.
2019-04-12T17:12:57.866998+00:00 app[irc.1]: 2019-04-12T17:12:57.866Z notify Waiting for message from sqs.
2019-04-12T17:13:03.606947+00:00 app[web.1]: 2019-04-12T17:13:03.606Z taskcluster-lib-pulse.conn-1 finished; closing AMQP connection

I'm going to also deploy queue, but otherwise not planning to deploy-the-world for this patch.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.