Closed
Bug 914904
Opened 12 years ago
Closed 12 years ago
Investigate webmaker.org PROD crash in Events
Categories
(Webmaker Graveyard :: DevOps, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: humph, Assigned: cade)
References
Details
Attachments
(5 files)
48 bytes,
text/x-github-pull-request
|
humph
:
review+
|
Details | Review |
48 bytes,
text/x-github-pull-request
|
mjschranz
:
review+
|
Details | Review |
48 bytes,
text/x-github-pull-request
|
cade
:
review+
|
Details | Review |
48 bytes,
text/x-github-pull-request
|
cade
:
review+
|
Details | Review |
48 bytes,
text/x-github-pull-request
|
humph
:
review+
|
Details | Review |
My inbox is filling up with webmaker.org crashes:
ALERT TIME: Tue Sep 10 23:29:13 UTC 2013
HOSTGROUP: MoFo-Public
HOSTNAME: Webmaker-Production Webmaker
SERVICE: healthcheck
STATUS: CRITICAL - Socket timeout after 30 seconds
HOST URL: http://opsview.mofoprod.net:3000/status/service?host=Webmaker-Production
I wish we had butler on production so we had stacks, but until then, we should figure this out and stop it.
Reporter | ||
Comment 1•12 years ago
|
||
jp, cade, and I spent a bunch of time on this and figured out the problem. It's related to CSRF handling and the new Likes feature.
The crash goes like this:
* https://github.com/senchalabs/connect/blob/2.7.6/lib/utils.js#L60 throws a Forbidden error in the node app, which is really a 403 due to this:
* https://github.com/senchalabs/connect/blob/2.7.6/lib/middleware/csrf.js#L51
It is trying to get the _csrf token off the request's body, query, or headers here: https://github.com/senchalabs/connect/blob/2.7.6/lib/middleware/csrf.js#L69-72
We stash the CSRF token via nunjucks in the details page in a meta element, and then when we POST back to the server, we attempt to extract it and put it in the body of the request (in https://github.com/mozilla/webmaker.org/blob/master/public/js/pages/details.js):
$.post(method, {
makeID: makeID,
_csrf: $("meta[name='X-CSRF-Token']").attr("content")
The problem is that not all browsers support querySelector (see http://caniuse.com/#search=queryselector), and in particular, any IE before 8.0. We're using the 2.x version of jQuery, which removed hacks for old IE browsers, so this query for the meta element with the X-CSRF-Token will fail.
As such, we have two things to consider:
1) What to do with browsers that don't support the syntax above. One option is to do: _csrf: $("meta[name='X-CSRF-Token']").attr("content") || "unsupported-browser"
2) Guard against our app crashing when we try to access body._csrf and it isn't there. Some people have monkey-patched their middleware pipeline by adding a dummy token, see: https://github.com/pilwon/ultimate-seed/commit/77f79316cd566d052266475fd5fd76aec1fd87fb
Assigning this to cade, as he's deeper into the code/solution than I am. I just wanted to document our findings.
Assignee: david.humphrey → cade
Reporter | ||
Comment 2•12 years ago
|
||
Further info regarding 2) above. Jon points out that this should never cause our node app to crash, and the fact that it is points to something further going wrong after the error occurs. The most likely explanation is that the error reporter is dying, and causing a top-level error to make it up to the domain:
https://github.com/mozilla/webmaker.org/blob/5716b24567b01d395826d45da8871a3d9c022cdb/app.js#L64-78
We should wrap that in a try/catch so that failures in reportError don't cascade up to the domain, but get eaten.
Assignee | ||
Comment 3•12 years ago
|
||
Attachment #803087 -
Flags: review?(david.humphrey)
Reporter | ||
Comment 4•12 years ago
|
||
Comment on attachment 803087 [details] [review]
https://github.com/mozilla/webmaker.org/pull/362
Two fixes. First one is a nit, see PR.
Second one is a correction for an error Jon introduced in another patch. Please change these three lines:
https://github.com/cadecairos/webmaker.org/blob/b840914f99355faa6e6032a3d252342473ae4a2b/app.js#L60-62
to this:
GLOBAL.graylogHost = env.get( "GRAYLOG_HOST" );
GLOBAL.graylogFacility = env.get( "GRAYLOG_FACILITY" );
require( 'graylog' );
r=me with those changes.
Attachment #803087 -
Flags: review?(david.humphrey) → review-
Assignee | ||
Comment 5•12 years ago
|
||
Comment on attachment 803087 [details] [review]
https://github.com/mozilla/webmaker.org/pull/362
updated with the fixes you pointed out
Attachment #803087 -
Flags: review- → review?(david.humphrey)
Reporter | ||
Comment 6•12 years ago
|
||
Comment on attachment 803087 [details] [review]
https://github.com/mozilla/webmaker.org/pull/362
Great, let's get this on staging/prod ASAP.
Attachment #803087 -
Flags: review?(david.humphrey) → review+
Comment 7•12 years ago
|
||
Commit pushed to master at https://github.com/mozilla/webmaker.org
https://github.com/mozilla/webmaker.org/commit/b3d334c452cf69f4e068d22886868ac43fda1b9a
Bug 914904 - Guard against errors thrown in reportError function r=humph
Assignee | ||
Comment 8•12 years ago
|
||
The root cause of this appears to be an insert into the events database. I've noticed Logs for the insert between 0 and 1 seconds before the crash.
https://loggins.mofoprod.net/messages/430e4940-1b20-11e3-8f5a-22000aa45f57
https://loggins.mofoprod.net/messages/703e9d00-1b1d-11e3-8f5a-22000aa45f57
https://loggins.mofoprod.net/messages/30d4a5c0-1b21-11e3-8f5a-22000aa45f57
Reporter | ||
Comment 9•12 years ago
|
||
More often than not, I'm seeing this error:
https://loggins.mofoprod.net/messages/a0632bb0-1b34-11e3-8f5a-22000aa45f57
Which is:
Uncaught, unspecified "error" event.
Assuming this is related to the other sequelize bits in the stack, and cade's idea about SQL inserts always being present, I think it might be related to us not having an 'error' event handler for the sql work, and it blowing up when it doesn't get handled. Lots of people talk about it on the web, e.g. https://gist.github.com/deepwell/3501648 and most in the context of having bad auth info for the db. That doesn't make sense in our case, I don't think, but not having an on('error', ...) handler does.
Thoughts?
Assignee | ||
Comment 10•12 years ago
|
||
So it looks like all the create, update and delete routes do have `.error()` calls.
Now, I'm thinking outside of the box here, but, In my testing, what immediately follows an insert into our events DB is a request to get all events: https://gist.github.com/cadecairos/0edbadc2fa23b7b53794#file-logs-L5
This findAll call has no error handling: https://github.com/mozilla/webmaker.org/blob/178afb744edbe7bb771819fbe8600bfe1d89b15b/lib/events/controllers/events.js#L383-L408
Reading though the docs for Sequelize ( http://sequelizejs.com/documentation#misc-asynchronicity ) I found this snippet below the block of code I linked to:
"Please notice: Since v1.5.0 the 'error' event is used to notify about errors. If such events aren't caught however, Node.JS will throw an error. So you would probably like to catch them :D"
This error event caused by the select might be our problem, since it's nowhere to be found in the graylog logs, and should happen right after the insert (but in our case, we see the error!)
Assignee | ||
Comment 11•12 years ago
|
||
This adds in some error handling for findAll and all calls used by the event tool.
Attachment #803737 -
Flags: review?(schranz.m)
Comment 12•12 years ago
|
||
Comment on attachment 803737 [details] [review]
https://github.com/mozilla/webmaker.org/pull/364
Let's make sure we catch all places that are calling .success and don't have a .error handler.
https://github.com/cadecairos/webmaker.org/blob/210068a874002669b162b6982bf90d3ed0a45df6/lib/events/controllers/events.js#L112-L117
https://github.com/cadecairos/webmaker.org/blob/210068a874002669b162b6982bf90d3ed0a45df6/lib/events/controllers/events.js#L177-L179
Attachment #803737 -
Flags: review?(schranz.m) → review-
Assignee | ||
Updated•12 years ago
|
Attachment #803737 -
Flags: review- → review?(schranz.m)
Updated•12 years ago
|
Attachment #803737 -
Flags: review?(schranz.m) → review+
Comment 13•12 years ago
|
||
Commit pushed to master at https://github.com/mozilla/webmaker.org
https://github.com/mozilla/webmaker.org/commit/07ec50862bbfa28d8674c4981dbda1ffe5053e30
Bug 914904 - Protect against SQL errors when executing findAll on the event data model
Assignee | ||
Comment 14•12 years ago
|
||
Unfortunately, Adding an event on production continues to cause that error to be thrown.
Assignee | ||
Comment 15•12 years ago
|
||
So what I've narrowed this down to so far is that we try here to send a response, but somewhere, it has already been sent (not sure where, but my one suspect is events, triggered by some sequelize error)
https://github.com/mozilla/webmaker.org/blob/29ee19f969677d06a2d5d48cc9ebd2b4153f3c01/app.js#L105-L106
So in loggins we see the original error( in the case I was looking at, something originating from this node cluster function:
https://github.com/joyent/node/blob/master/lib/cluster.js#L453-L479
and we also see the log from our try catch on this line:
https://github.com/mozilla/webmaker.org/blob/29ee19f969677d06a2d5d48cc9ebd2b4153f3c01/app.js#L111
ultimately, this causes the process to restart, but because we never made it to ( https://github.com/mozilla/webmaker.org/blob/29ee19f969677d06a2d5d48cc9ebd2b4153f3c01/app.js#L108-L109 ) it takes 30 seconds.
At the least, we can probably solve the duplicate response issue.
Reporter | ||
Comment 16•12 years ago
|
||
Trying a few things here:
* Change domain kill timeout to 15s
* Add secondary try/catch around the pretty-500 when the domain fails (so we don't block on that)
* Change the order of the 404 and 500 handlers, as per the Express example.
Attachment #804109 -
Flags: review?(cade)
Reporter | ||
Comment 17•12 years ago
|
||
Are we dealing with NewRelic crashing here?
2013-09-13 00:08:41.0 i-e2d8e799 Info local2 i-e2d8e799 app=webmakerorg-production: at Object.monitored (/var/www/webmakerorg/node_modules/newrelic/lib/transaction/tracer.js:159:17)
2013-09-13 00:08:41.0 i-e2d8e799 Info local2 i-e2d8e799 app=webmakerorg-production: at /var/www/webmakerorg/node_modules/newrelic/lib/transaction/tracer.js:180:22
2013-09-13 00:08:41.0 i-e2d8e799 Info local2 i-e2d8e799 app=webmakerorg-production: at ErrorTracer.monitor (/var/www/webmakerorg/node_modules/newrelic/lib/error.js:192:12)
2013-09-13 00:08:41.0 i-e2d8e799 Info local2 i-e2d8e799 app=webmakerorg-production: at Object.cookieSession (/var/www/webmakerorg/node_modules/express/node_modules/connect/lib/middleware/cooki ...
2013-09-13 00:08:41.0 i-e2d8e799 Info local2 i-e2d8e799 app=webmakerorg-production: at /var/www/webmakerorg/node_modules/newrelic/lib/transaction/tracer.js:180:22
2013-09-13 00:08:41.0 i-e2d8e799 Info local2 i-e2d8e799 app=webmakerorg-production: at Tracer.monitor (/var/www/webmakerorg/node_modules/newrelic/lib/transaction/tracer.js:178:28)
2013-09-13 00:08:41.0 i-e2d8e799 Info local2 i-e2d8e799 app=webmakerorg-production: 109.70.6.77 - - [Fri, 13 Sep 2013 00:08:41 GMT] "POST /events HTTP/1.1" 403 8912 "https://webmaker.org/events/" ...
2013-09-13 00:08:41.0 i-e2d8e799 Info local2 i-e2d8e799 app=webmakerorg-production: Server worker started.
2013-09-13 00:08:41.0 i-e2d8e799 Info local2 i-e2d8e799 app=webmakerorg-production: Error: Forbidden
2013-09-13 00:08:41.0 i-e2d8e799 Info local2 i-e2d8e799 app=webmakerorg-production: at Object.<anonymous> (/var/www/webmakerorg/node_modules/express/node_modules/connect/lib/middleware/csrf.js ...
2013-09-13 00:08:41.0 i-e2d8e799 Info local2 i-e2d8e799 app=webmakerorg-production: at next (/var/www/webmakerorg/node_modules/express/node_modules/connect/lib/proto.js:190:15)
2013-09-13 00:08:41.0 i-e2d8e799 Info local2 i-e2d8e799 app=webmakerorg-production: at Object.exports.error (/var/www/webmakerorg/node_modules/express/node_modules/connect/lib/utils.js:60:13)
2013-09-13 00:08:41.0 i-e2d8e799 Info local2 i-e2d8e799 app=webmakerorg-production: at Object.wrappedConnectHandle [as handle] (/var/www/webmakerorg/node_modules/newrelic/lib/instrumentation/c ...
Maybe we should move the NewRelic module out of the forked child-process, and just run it once on the cluster master process.
Reporter | ||
Comment 18•12 years ago
|
||
Reporter | ||
Comment 19•12 years ago
|
||
Another thing I'm wondering is whether the domain stuff is actually working for the lib/events stuff at all, the way it's setup. We should consider normalizing this mess into a single Express instance.
Comment 20•12 years ago
|
||
I had hoped the newrelic upgrade to 0.10.3 that went into webmakerorg 0.0.148 would fix the newrelic part of this. Unforunately, I see a ton of these errors in loggins from today.
https://loggins.mofoprod.net/messages/universalsearch?timespan=172800&query=%22%2Fvar%2Fwww%2Fwebmakerorg%2Fnode_modules%2Fnewrelic%2Flib%2Ferror.js%22
I wonder why this is happening only on webmakerorg.
Comment 21•12 years ago
|
||
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: Executing: SET FOREIGN_KEY_CHECKS = 1;
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: Executing: UPDATE `Events` SET `title`='Intro to Web',`description`='Students AIUB will participate here. Some of Student Ambassador may join with us.',`address`='AIUB, Dhaka, Dhaka Division, Bangladesh',`latitude`=23.7943,`longitude`=90.4055,`beginDate`='2013-09-26 00:00:00',`beginTime`='1899-12-31 14:00:00',`endDate`='2013-09-26 00:00:00',`endTime`='1899-12-31 17:00:00',`attendees`=35,`registerLink`='https://www.facebook.com/events/218948418264084',`updatedAt`='2013-09-13 05:52:08' WHERE `id`=1533
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: Executing: SET FOREIGN_KEY_CHECKS = 1;
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: Executing: UPDATE `Events` SET `title`='Intro to Web',`description`='Students AIUB will participate here. Some of Student Ambassador may join with us.',`address`='AIUB, Dhaka, Dhaka Division, Bangladesh',`latitude`=23.7943,`longitude`=90.4055,`city`='Dhaka',`country`='BD',`attendees`=35,`beginDate`='2013-09-26 00:00:00',`endDate`='2013-09-26 00:00:00',`beginTime`='1899-12-31 14:00:00',`endTime`='1899-12-31 17:00:00',`registerLink`='https://www.facebook.com/events/218948418264084',`picture`=NULL,`organizer`='ashickur.noor@gmail.com',`organizerId`='ashickur-noor',`featured`=false,`id`=1533,`createdAt`='2013-09-06 08:30:21',`updatedAt`='2013-09-13 05:52:08' WHERE `id`=1533
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: Error: TypeError: Uncaught, unspecified "error" event.
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at TypeError (<anonymous>)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at EventEmitter.emit (events.js:74:15)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at null.<anonymous> (/var/www/webmakerorg/node_modules/sequelize/lib/emitters/custom-event-emitter.js:67:17)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at EventEmitter.emit (events.js:95:17)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at null.<anonymous> (/var/www/webmakerorg/node_modules/sequelize/lib/query-interface.js:448:17)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at b (domain.js:183:18)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at EventEmitter.emit (events.js:98:17)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at module.exports.finish (/var/www/webmakerorg/node_modules/sequelize/lib/query-chainer.js:138:30)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at exec (/var/www/webmakerorg/node_modules/sequelize/lib/query-chainer.js:92:16)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at onError (/var/www/webmakerorg/node_modules/sequelize/lib/query-chainer.js:72:11)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: TypeError: Uncaught, unspecified "error" event.
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at TypeError (<anonymous>)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at EventEmitter.emit (events.js:74:15)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at null.<anonymous> (/var/www/webmakerorg/node_modules/sequelize/lib/emitters/custom-event-emitter.js:67:17)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at EventEmitter.emit (events.js:95:17)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at null.<anonymous> (/var/www/webmakerorg/node_modules/sequelize/lib/query-interface.js:448:17)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at b (domain.js:183:18)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at EventEmitter.emit (events.js:98:17)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at module.exports.finish (/var/www/webmakerorg/node_modules/sequelize/lib/query-chainer.js:138:30)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at exec (/var/www/webmakerorg/node_modules/sequelize/lib/query-chainer.js:92:16)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at onError (/var/www/webmakerorg/node_modules/sequelize/lib/query-chainer.js:72:11)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: Error: unable to send 500 Error: Can't set headers after they are sent.
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at ServerResponse.OutgoingMessage.setHeader (http.js:707:11)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at ServerResponse.res.setHeader (/var/www/webmakerorg/node_modules/express/node_modules/connect/lib/patch.js:59:22)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at ServerResponse.res.set.res.header (/var/www/webmakerorg/node_modules/express/lib/response.js:518:10)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at ServerResponse.res.send (/var/www/webmakerorg/node_modules/express/lib/response.js:124:10)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at fn (/var/www/webmakerorg/node_modules/express/lib/response.js:751:10)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at Function.app.render (/var/www/webmakerorg/node_modules/nunjucks/src/environment.js:139:17)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at ServerResponse.res.render (/var/www/webmakerorg/node_modules/express/lib/response.js:755:7)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at ServerResponse.render (/var/www/webmakerorg/node_modules/newrelic/lib/instrumentation/express.js:40:23)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at Domain.<anonymous> (/var/www/webmakerorg/app.js:106:11)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at Domain.EventEmitter.emit (events.js:95:17)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production:
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: Error: channel closed
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at process.target.send (child_process.js:402:26)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at Worker.send (cluster.js:401:21)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at sendInternalMessage (cluster.js:394:10)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at Cluster.cluster._getServer (cluster.js:586:3)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at dgram.js:193:15
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at /var/www/webmakerorg/node_modules/newrelic/node_modules/continuation-local-storage-glue/bootstrap.js:32:23
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at dns.js:72:18
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at /var/www/webmakerorg/node_modules/newrelic/node_modules/continuation-local-storage-glue/bootstrap.js:32:23
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: at process._tickDomainCallback [as _tickCallback] (node.js:459:13)
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: Server worker 14 exited.
Sep 13 05:52:08 i-e2d8e799 app=webmakerorg-production: Starting server worker...
Sep 13 05:52:09 i-e2d8e799 app=webmakerorg-production: {"name":"newrelic","hostname":"i-e2d8e799","pid":22353,"level":30,"msg":"Starting New Relic Node.js instrumentation.","time":"2013-09-13T05:52:09.560Z","v":0}
That is the full stack error
This is a slightly different one, adding it to this bug for reference.
Assignee | ||
Updated•12 years ago
|
Attachment #804109 -
Flags: review?(cade) → review+
Reporter | ||
Comment 22•12 years ago
|
||
Our crash looks like this:
Sep 13 17:17:58 i-18f9927b app=webmakerorg-production: Executing: INSERT INTO `Events` (`title`,`description`,`address`,`latitude`,`longitude`,`beginDate`,`beginTime`,`endDate`,`endTime`,`attendees`,`registerLink`,`organizerId`,`organizer`,`updatedAt`,`createdAt`) VALUES ('FakeEvent','Not a real Event','Yolo, CA','38.7646021','-121.90179539999997','2013-09-18 00:00:00','1899-12-31 10:00:00','2013-09-18 00:00:00','1899-12-31 10:00:00',6,'https://www.google.ca/signupforthisfakevent/wat/wat.html?aslkdfjasldkfjasldkfjasldkfjasdfasd=sdflksjdflskdfjslkdfjsldkfjslkdjfsldf','cade','chris@chrisdecairos.ca','2013-09-13 17:17:58','2013-09-13 17:17:58');
Sep 13 17:17:59 i-18f9927b app=webmakerorg-production: Executing: SET FOREIGN_KEY_CHECKS = 1;
Sep 13 17:17:59 i-18f9927b app=webmakerorg-production: Executing: UPDATE `Events` SET `title`='FakeEvent',`description`='Not a real Event',`address`='Yolo, CA',`latitude`='38.7646021',`longitude`='-121.90179539999997',`city`='Zamora',`country`='US',`attendees`=6,`beginDate`='2013-09-18 00:00:00',`endDate`='2013-09-18 00:00:00',`beginTime`='1899-12-31 10:00:00',`endTime`='1899-12-31 10:00:00',`registerLink`='https://www.google.ca/signupforthisfakevent/wat/wat.html?aslkdfjasldkfjasldkfjasldkfjasdfasd=sdflksjdflskdfjslkdfjsldkfjslkdjfsldf',`picture`=NULL,`organizer`='chris@chrisdecairos.ca',`organizerId`='cade',`featured`=false,`id`=2256,`createdAt`='2013-09-13 17:17:58',`updatedAt`='2013-09-13 17:17:59' WHERE `id`=2256
Sep 13 17:17:59 i-18f9927b app=webmakerorg-production: CRASH: TypeError: Uncaught, unspecified "error" event.
Sep 13 17:17:59 i-18f9927b app=webmakerorg-production: at TypeError (<anonymous>)
Sep 13 17:17:59 i-18f9927b app=webmakerorg-production: at EventEmitter.emit (events.js:74:15)
Sep 13 17:17:59 i-18f9927b app=webmakerorg-production: at null.<anonymous> (/var/www/webmakerorg/node_modules/sequelize/lib/emitters/custom-event-emitter.js:67:17)
Sep 13 17:17:59 i-18f9927b app=webmakerorg-production: at EventEmitter.emit (events.js:95:17)
Sep 13 17:17:59 i-18f9927b app=webmakerorg-production: at null.<anonymous> (/var/www/webmakerorg/node_modules/sequelize/lib/query-interface.js:448:17)
Sep 13 17:17:59 i-18f9927b app=webmakerorg-production: at b (domain.js:183:18)
Sep 13 17:17:59 i-18f9927b app=webmakerorg-production: at EventEmitter.emit (events.js:98:17)
Sep 13 17:17:59 i-18f9927b app=webmakerorg-production: at module.exports.finish (/var/www/webmakerorg/node_modules/sequelize/lib/query-chainer.js:138:30)
Sep 13 17:17:59 i-18f9927b app=webmakerorg-production: at exec (/var/www/webmakerorg/node_modules/sequelize/lib/query-chainer.js:92:16)
Sep 13 17:17:59 i-18f9927b app=webmakerorg-production: at onError (/var/www/webmakerorg/node_modules/sequelize/lib/query-chainer.js:72:11)
Sep 13 17:17:59 i-18f9927b app=webmakerorg-production: Internal error isolating shutdown sequence: Error: Can't set headers after they are sent.
Sep 13 17:17:59 i-18f9927b app=webmakerorg-production: Server worker 2 exited.
We've now got the cluster/domain stuff right. We just need to figure this out so we don't crash at all.
Reporter | ||
Comment 23•12 years ago
|
||
Let's see if this is failing.
Attachment #804571 -
Flags: review?(cade)
Reporter | ||
Comment 24•12 years ago
|
||
So we've improved things a bit, and landed a bunch of fixes. Right now we've done the following:
* Turned off newrelic on webmaker.org prod (we were seeing it in stacks)
* Turned off graylog UDP logging from webmaker.org -> loggins (to see if this was affecting loggins, which was crashing non-stop all day)
* Pushed a bunch of small fixes to the domain and error handling code
* Pushed a fix to events to add error handling for reverse geocode lookups (that doesn't seem to be the problem, although it's a good fix nevertheless). We were wondering if the fact that we don't have a MapQuest key for their API was causing us to hit the daily API limit on prod. Doesn't look like that's happening (although we probably do need a key here).
We're no longer seeing nagios reports from webmaker.org on prod every few mins, so things are more stable. However, we're still crashing all the time, and the node cluster is just restarting forks.
The STR to trigger the crash according to cade are:
1) Create an event (doesn't matter what data you enter)
This crash is *not* reproducible on staging, only on prod.
The three of us are burnt out on this problem, so we're going to circle back next week, and get Jon to help us look. Other people might want to dig in to.
Somehow the events stuff is hosed, and missing an error handler that should be handling some bad case async, and it isn't.
Summary: Investigate webmaker.org crashes → Investigate webmaker.org PROD crash in Events
Comment 25•12 years ago
|
||
naive question: can we remove events from the home page for now?
Reporter | ||
Comment 26•12 years ago
|
||
(In reply to David Ascher (:davida) from comment #25)
> naive question: can we remove events from the home page for now?
No
Comment 27•12 years ago
|
||
Unfortunately, we are still seeing webmaker.org alerts.
Time (Any) Host (Any) Service (Any) State (Any) Output
22:01:422013-09-13 MenuWebmaker-Production Menuhealthcheck OK HTTP OK: HTTP/1.1 200 OK - 52373 bytes in 0.636 second response time
22:00:102013-09-13 MenuWebmaker-Production Menuhealthcheck CRITICAL CRITICAL - Socket timeout after 30 seconds
21:55:412013-09-13 MenuWebmaker-Production Menuhealthcheck OK HTTP OK: HTTP/1.1 200 OK - 52371 bytes in 0.645 second response time
21:55:102013-09-13 MenuWebmaker-Production Menuhealthcheck CRITICAL CRITICAL - Socket timeout after 30 seconds
21:27:392013-09-13 MenuWebmaker-Production Menuhealthcheck OK HTTP OK: HTTP/1.1 200 OK - 52371 bytes in 0.777 second response time
Assignee | ||
Comment 28•12 years ago
|
||
Comment on attachment 804571 [details] [review]
https://github.com/mozilla/webmaker.org/pull/370
This was an IRC r+
Attachment #804571 -
Flags: review?(cade) → review+
Comment 29•12 years ago
|
||
Commit pushed to master at https://github.com/mozilla/webmaker.org
https://github.com/mozilla/webmaker.org/commit/ecec59471e57cc401d7345766c80d08a06876a55
REDO Bug 914904 Protect against SQL errors... https://github.com/mozilla/webmaker.org/commit/07ec50862bbfa28d8674c4981dbda1ffe5053e30
Comment 30•12 years ago
|
||
Root cause of this crash was a missing `city` and `country` field in the database. Since launch.
*whoops*
I also fixed up the code to catch that top-level error event, so that's good too.
Reporter | ||
Comment 32•12 years ago
|
||
Comment on attachment 806688 [details] [review]
https://github.com/mozilla/webmaker.org/pull/376
Great work digging into this, nice fix.
Attachment #806688 -
Flags: review?(david.humphrey) → review+
Comment 33•12 years ago
|
||
Commits pushed to master at https://github.com/mozilla/webmaker.org
https://github.com/mozilla/webmaker.org/commit/bbea097e1991928ad39f61ba3778d0780ede2405
Bug 914904 - Catch errors thrown by event city/country update
https://github.com/mozilla/webmaker.org/commit/cb99d273fac5825db6f0508606d7eee615b5e19e
Bug 914904 - Do not send response if it's already been sent
Comment 34•12 years ago
|
||
Hooray!
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 35•12 years ago
|
||
Thanks for figuring this one out jbuck! It had been haunting my dreams for days.
Updated•12 years ago
|
Attachment mime type: text/plain → text/x-github-pull-request
Updated•12 years ago
|
Attachment mime type: text/plain text/plain text/plain text/plain → text/x-github-pull-request text/x-github-pull-request text/x-github-pull-request text/x-github-pull-request
You need to log in
before you can comment on or make changes to this bug.
Description
•