Closed Bug 984325 Opened 10 years ago Closed 10 years ago

Loop Server — Log server errors

Categories

(Hello (Loop) :: Server, defect)

defect
Not set
normal

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: alexis+bugs, Assigned: mostlygeek)

Details

(Whiteboard: [qa+])

Attachments

(1 file)

55 bytes, text/x-github-pull-request
rhubscher: review+
We should log server errors and database errors in order to debug whatever problems will arise in the future.

Services seem to use Sentry to log this kind of thing. Can someone from ops provide more information on where to hook in?

https://github.com/mattrobenolt/raven-node looks like a good starting point.
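
To make the idea concrete, here is a minimal sketch of the kind of hook being asked for, assuming an Express-style server (the thread does not state the framework). `makeErrorLogger` and `report` are hypothetical names; with raven-node, `report` would presumably wrap the client's capture call, but that wiring is left out because the exact client API is not confirmed here.

```javascript
// Sketch: an Express-style error-handling middleware that forwards
// server and database errors to an injected reporter.
// `makeErrorLogger` and `report` are hypothetical names, not loop-server code.
function makeErrorLogger(report) {
  return function errorLogger(err, req, res, next) {
    try {
      // Attach a little request context so the error is easier to trace.
      report(err, { method: req.method, url: req.url });
    } catch (reportingError) {
      // A failing reporter must never mask the original error.
      console.error("error reporting failed:", reportingError);
    }
    // Hand the original error on to the next error handler.
    next(err);
  };
}

// Usage with a stand-in reporter that just collects what it is given.
const captured = [];
const errorLogger = makeErrorLogger(function (err, context) {
  captured.push({ message: err.message, context: context });
});
```

Injecting the reporter keeps the middleware testable without a Sentry server, and lets ops decide later where errors actually go.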
Blocks: 972029
No longer blocks: loop_mlp
Assignee: nobody → alexis+bugs
Attached file link to github PR
Attachment #8394196 - Flags: review?(rhubscher)
Comment on attachment 8394196 [details] [review]
link to github PR

Looks good to me.
Attachment #8394196 - Flags: review?(rhubscher) → review+
https://github.com/mozilla-services/loop-server/commit/a7ddce30d7dcb82eecc28288d7684dc57a38579d
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Verified in code (via the commit).
But, where is this being logged and what is picking it up?
Has OPs started proper logging for Prod and Stage, based on this change?
We haven't set up heka or sentry for loop yet. Definitely a conversation we'd want to have. I believe a few of our services use sentry.
So maybe marking this bug as Resolved is premature.
Or, we can just add OPs issues as blockers to Verifying this bug...
In Loop, we plugged in Sentry for errors, and Statsd for telemetry because Heka is compatible with its protocol.
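
For reference, the Statsd protocol mentioned here is plain text over UDP, one `<name>:<value>|<type>` line per metric. The following is a rough sketch of that format, not the code loop-server uses; `statsdLine`, `sendMetric`, and the metric name in the usage note are hypothetical.

```javascript
const dgram = require("dgram");

// Build one metric line in the plain-text StatsD format: <name>:<value>|<type>
// Types used here: "c" (counter), "ms" (timer), "g" (gauge).
function statsdLine(name, value, type) {
  return name + ":" + value + "|" + type;
}

// Fire-and-forget send over UDP. Host and port are placeholders supplied
// by the caller; nothing is sent unless this function is called.
function sendMetric(host, port, line) {
  const socket = dgram.createSocket("udp4");
  const payload = Buffer.from(line);
  socket.send(payload, 0, payload.length, port, host, function () {
    // Close the socket whether or not the datagram was delivered.
    socket.close();
  });
}
```

A call such as `sendMetric(host, 8125, statsdLine("loop.calls.created", 1, "c"))` would increment a counter, assuming the listener is on the conventional Statsd port 8125. Because the transport is UDP, a missing or misconfigured listener fails silently, which is worth keeping in mind when checking whether anything is arriving in Stage.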
OK. I will check all this out in Stage...
Whiteboard: [qa+]
James, I'm not sure I fully get it: I don't believe we are currently logging anything to heka or the telemetry servers. I'm reopening this bug and assigning it to Benson: the code can provide that, but I'm not sure it's deployed at the moment and logging anything anywhere.

Feel free to close if that is really the case :)
Assignee: alexis+bugs → bwong
Status: RESOLVED → REOPENED
Flags: needinfo?(jbonacci)
Resolution: FIXED → ---
The current bug status is fine with me. Thanks.
Flags: needinfo?(jbonacci)
I'm going to redeploy Stage w/ sentry and statsd implemented. It will log to: 

sentry.shared.us-east-1.stage.mozaws.net
graphite.shared.us-east-1.stage.mozaws.net 


in production it'll be: 

sentry.shared.us-west-2.prod.mozaws.net
graphite.shared.us-west-2.prod.mozaws.net
:mostlygeek ok, thanks.
Ping me when this is redeployed and we can verify logging on Stage...
Benson, can you provide us some update about error logging?
Flags: needinfo?(bwong)
(In reply to Alexis Metaireau (:alexis) from comment #13)
> Benson, can you provide us some update about error logging?

I don't see output in Stage Graphite.  Should there be anything in Sentry presently?
Flags: needinfo?(bwong)
(In reply to Bob Micheletto [:bobm] from comment #14)

> I don't see output in Stage Graphite.  Should there be anything in Sentry
> presently?

There is a section for loop-server on the Stage Sentry server.  However, there aren't any tracebacks in it.  Can we force one to see if it is working?
Flags: needinfo?(alexis+bugs)
I guess disabling redis and trying to run a small load test (with make test in the "loadtest" folder) should make errors show up in sentry.

I don't have the power to disable redis, can you have a look? (you can sync with James to run the small load test during the day, also, if I'm not around).
Flags: needinfo?(alexis+bugs) → needinfo?(bobm)
Severity: normal → major
Priority: -- → P1
What I've seen with whd when setting up production is that the server hangs if redis doesn't answer.
We should set the connect_timeout value to something like 2 or 5 seconds.

The doc states that if connect_timeout is not set, the redis client will keep trying forever.
https://github.com/mranney/node_redis
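
A small sketch of what that setting could look like, assuming `connect_timeout` is expressed in milliseconds as the node_redis README describes. `buildRedisOptions` is a hypothetical helper, and the client call is shown only as a comment so the snippet runs without a redis server.

```javascript
// Convert a timeout given in seconds into the options object for the client.
// `buildRedisOptions` is a hypothetical helper, not part of node_redis.
function buildRedisOptions(timeoutSeconds) {
  if (!(timeoutSeconds > 0)) {
    throw new Error("timeoutSeconds must be a positive number");
  }
  // Assumption: connect_timeout is in milliseconds.
  return { connect_timeout: timeoutSeconds * 1000 };
}

const options = buildRedisOptions(5);
// With node_redis this would be passed when creating the client, e.g.
// redis.createClient(port, host, options)  (not executed here).
```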
Okay, then I think we have two different issues here. The first one (the reason for this bug) is that we don't log the errors. I think this is urgent to solve because we don't currently have any way to know if something is going wrong on the server.

Then (and I'll file a new bug about that) we have the redis connection issue that you're talking about, Rémy: to paraphrase you, it seems that the connection to redis doesn't time out, and that we're not notified when we're doing the heartbeat.
The second bug is here: 1021726
Okay, it's logging errors to http://sentry.shared.us-east-1.stage.mozaws.net/loop/loop-stage/, so we're good.
Severity: major → normal
Status: REOPENED → RESOLVED
Closed: 10 years ago
Flags: needinfo?(bobm)
Priority: P1 → --
Resolution: --- → FIXED
OK.
Status: RESOLVED → VERIFIED