Bug 1273249
Opened 9 years ago
Closed 9 years ago
Missing SNS alerts from papertrail
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: aselagea)
Many of our buildbot masters currently have stale lockfiles for reconfigs, and yet we don't have any alerts for this in #buildduty. e.g. on buildbot-master114:
1838896 0 -rw-rw-r-- 1 cltbld cltbld 0 May 14 11:02 /builds/buildbot/tests1-linux64/reconfig.lock
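For reference, a stale-lockfile check of this kind can be sketched as follows. This is not the actual maybe_reconfig.sh logic, only a minimal Python illustration; the function names are hypothetical, and the 120-minute threshold is taken from the alert text quoted in this bug.

```python
import os
import time

# Threshold taken from the alert text "older than 120 minutes".
STALE_AFTER_MINUTES = 120


def lockfile_age_minutes(path, now=None):
    """Return the age of `path` in minutes, based on its modification time."""
    now = time.time() if now is None else now
    return (now - os.path.getmtime(path)) / 60.0


def is_stale(path, threshold_minutes=STALE_AFTER_MINUTES, now=None):
    """Return True if the lockfile exists and is older than the threshold."""
    if not os.path.exists(path):
        # No lockfile means no reconfig is in progress, so nothing is stale.
        return False
    return lockfile_age_minutes(path, now) > threshold_minutes
```

Calling `is_stale("/builds/buildbot/tests1-linux64/reconfig.lock")` on a master would report whether that lockfile has outlived the threshold.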
Comment 1•9 years ago
Papertrail alerts don't seem to be making it to IRC.
Summary: Missing alerts for stale reconfig locks → Missing SNS alerts from papertrail
Comment 2•9 years ago (Reporter)
Papertrail seems to be getting the events properly:
May 16 12:00:06 buildbot-master68.bb.releng.usw2.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 12:00:06 buildbot-master54.bb.releng.usw2.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 12:00:06 buildbot-master114.bb.releng.use1.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 12:00:06 buildbot-master125.bb.releng.usw2.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 12:00:06 buildbot-master67.bb.releng.use1.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 12:00:06 buildbot-master51.bb.releng.use1.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 12:00:06 buildbot-master113.bb.releng.use1.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 12:00:07 buildbot-master53.bb.releng.usw2.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 12:00:07 buildbot-master52.bb.releng.use1.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 12:00:07 buildbot-master02.bb.releng.use1.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 13:00:06 buildbot-master02.bb.releng.use1.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 13:00:06 buildbot-master114.bb.releng.use1.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 13:00:06 buildbot-master54.bb.releng.usw2.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 13:00:06 buildbot-master51.bb.releng.use1.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 13:00:06 buildbot-master53.bb.releng.usw2.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 13:00:06 buildbot-master125.bb.releng.usw2.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 13:00:06 buildbot-master113.bb.releng.use1.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 13:00:06 buildbot-master68.bb.releng.usw2.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 13:00:07 buildbot-master67.bb.releng.use1.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
May 16 13:00:07 buildbot-master52.bb.releng.use1.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes.
Comment 3•9 years ago
I think this might have been due to a change Papertrail made on their end:
"Based on the responses to my email below, on Tuesday, April 19, 2016,
Papertrail's SNS alert will subtly change. Most SNS subscribers will not
notice the change. Only SNS subscribers using the plaintext email and
SMS/text message protocols will see a change.
The change is as follows:
- TODAY: Papertrail sends each log message as a JSON object. However,
SNS does not support JSON for the plaintext email or SMS/text message
protocols. Today, Papertrail does not include a plaintext default, so
SNS sends a JSON object (for plaintext email) or "null" (for SMS) to
those subscribers.
- AFTER APRIL 14: Papertrail will include a plaintext message as well,
in this format:
Apr 07 10:08:14 systemname appname: log message
Only plaintext email and SMS/text message subscribers will receive this
string. Other SNS protocols will continue to receive the JSON object.
Here's more: http://docs.aws.amazon.com/sns/latest/dg/PublishTopic.html."
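To make the new plaintext format concrete, here is a minimal sketch of how a subscriber might split such a line into its fields. The regular expression and the field names (`timestamp`, `system`, `program`, `message`) are assumptions based only on the example line in the quoted email, not on any Papertrail specification.

```python
import re

# Assumed layout, from the quoted email: "Apr 07 10:08:14 systemname appname: log message"
PLAINTEXT_RE = re.compile(
    r"^(?P<timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "
    r"(?P<system>\S+) "
    r"(?P<program>[^:]+): "
    r"(?P<message>.*)$"
)


def parse_plaintext_alert(line):
    """Return a dict of fields for a plaintext alert line, or None if it does not match."""
    match = PLAINTEXT_RE.match(line.strip())
    return match.groupdict() if match else None
```

A JSON payload starts with `{` rather than a month name, so it does not match and the function returns `None`.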
Comment 4•9 years ago
selena: I suspect that the bot is receiving messages it can't parse correctly. Can you please help debug?
Flags: needinfo?(sdeckelmann)
Updated•9 years ago
Assignee: nobody → sdeckelmann
Flags: needinfo?(sdeckelmann)
Comment 5•9 years ago
Yeah parsing error:
2016-05-24T15:00:52.766324+00:00 app[web.1]: [Tue May 24 2016 15:00:52 GMT+0000 (UTC)] ERROR SyntaxError: Unexpected token M
2016-05-24T15:00:52.766339+00:00 app[web.1]: at EventEmitter.<anonymous> (/app/scripts/sns_response.coffee:68:17, <js>:61:20)
2016-05-24T15:00:52.766340+00:00 app[web.1]: at EventEmitter.emit (events.js:95:17)
2016-05-24T15:00:52.766341+00:00 app[web.1]: at Robot.emit (/app/node_modules/hubot/src/robot.coffee:583:18, <js>:459:41)
2016-05-24T15:00:52.766342+00:00 app[web.1]: at SNS.notify (/app/node_modules/hubot-sns/src/sns.coffee:87:5, <js>:85:18)
2016-05-24T15:00:52.766342+00:00 app[web.1]: at SNS.process (/app/node_modules/hubot-sns/src/sns.coffee:63:7, <js>:55:21)
2016-05-24T15:00:52.766343+00:00 app[web.1]: at /app/node_modules/hubot-sns/src/sns.coffee:47:11, <js>:35:26
Poking at it now.
Comment 6•9 years ago
Bad network is killing me at my current location. I'll have a look at this tomorrow when I can use the Heroku dashboard without long stalls.
Sorry this has been busted for so long!
Comment 7•9 years ago (Assignee)
@selena: I was wondering if you have any updates on this one? :-)
During the last weekend there were some stuck reconfigs on several masters, so it would be nice if we could receive alerts for that in #buildduty to deal with the issue faster (see bug 1276433).
Thanks!
Flags: needinfo?(sdeckelmann)
Comment 8•9 years ago (Assignee)
Started debugging this, hopefully I'll find a fix :-).
Flags: needinfo?(sdeckelmann)
Updated•9 years ago (Assignee)
Assignee: sdeckelmann → aselagea
Comment 9•9 years ago
Alin: you have access to the Heroku dashboard, yes? Have you tried dumping the contents of msg.message (or even all of msg) to the log to see what the format is now?
https://github.com/mozilla/relengbot/blob/master/scripts/sns_response.coffee#L68
Comment 11•9 years ago (Assignee)
Coop merged https://github.com/mozilla/relengbot/pull/2 to log to the console the message that fails to parse. It turned out that we no longer receive a JSON message, but a plaintext one. The following PRs were merged to address this: https://github.com/mozilla/relengbot/pull/3, https://github.com/mozilla/relengbot/pull/4.
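The general shape of such a fix can be sketched as below. This is not the code from the relengbot PRs (the bot is written in CoffeeScript), only a Python illustration of a handler that accepts either format. The function names and the `"message"` JSON key are hypothetical; the `[sns alert]` prefix mirrors the bot output quoted in this comment. The "Unexpected token M" error seen earlier is consistent with a JSON parser being handed a line that begins with a month name such as "May".

```python
import json


def extract_alert_text(raw):
    """Return printable alert text from an SNS message body.

    Tries JSON first and falls back to treating the body as plaintext.
    """
    try:
        payload = json.loads(raw)
    except ValueError:
        # Not valid JSON (e.g. "May 16 12:00:06 host app: ..."): use it as-is.
        return raw.strip()
    if isinstance(payload, dict):
        # "message" is a hypothetical key name, used here for illustration only.
        return str(payload.get("message", raw))
    # Valid JSON but not an object (e.g. a bare number): keep the raw text.
    return raw.strip()


def format_for_irc(raw):
    """Prefix the alert text the way the bot output in this bug is prefixed."""
    return "[sns alert] " + extract_alert_text(raw)
```

With this approach a plaintext body passes through unchanged, while a JSON body is still handled if the sender switches back.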
The bot now displays the alerts in #buildduty once again:
"relengbot> [sns alert] Aug 04 14:00:06 buildbot-master128.bb.releng.use1.mozilla.com maybe_reconfig.sh: ERROR - Reconfig lockfile is older than 120 minutes."
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Updated•7 years ago
Product: Release Engineering → Infrastructure & Operations
Updated•5 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard