Closed Bug 984325 Opened 10 years ago Closed 10 years ago

Loop Server — Log server errors

Tracking

(Not tracked)

Status:

VERIFIED FIXED

People

(Reporter: alexis+bugs, Assigned: mostlygeek)

References

Details

(Whiteboard: [qa+])

Attachments

(1 file)

link to github PR, 10 years ago, Alexis Metaireau (:alexis), 55 bytes, text/x-github-pull-request, rhubscher: review+

Alexis Metaireau (:alexis)

Reporter

Description

10 years ago

We should log server errors and database errors in order to debug whatever problems will arise in the future.

Services seem to use Sentry to log this kind of thing. Can someone from Ops provide more information on where to hook in?

https://github.com/mattrobenolt/raven-node looks like a good starting point.
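
As a rough illustration of where such a hook could sit, here is a minimal sketch of an Express-style error-handling middleware that reports an error before answering with a 500. It assumes an Express app. The reporter object and its captureError(err, extra) method are a hypothetical stand-in for a Sentry client, not the raven-node API.

```javascript
// Hypothetical sketch, assuming an Express app: an error-handling
// middleware that reports unexpected errors before answering 500.
// "reporter" stands in for a Sentry client; captureError(err, extra)
// is an assumed interface, not the raven-node API.
function makeErrorLogger(reporter) {
  // Express treats a four-argument middleware as an error handler.
  return function logServerErrors(err, req, res, next) {
    try {
      reporter.captureError(err, { method: req.method, url: req.url });
    } catch (reportingFailure) {
      // A broken reporter must not hide the original error.
      console.error("error reporter failed:", reportingFailure);
    }
    if (res.headersSent) {
      // Too late to send our own response; let Express close it.
      return next(err);
    }
    res.status(500).json({ error: "Internal Server Error" });
  };
}

// Usage (assumed wiring): register after all routes.
// app.use(makeErrorLogger(sentryReporter));
```

Keeping the reporter pluggable means the same middleware works with whichever client Ops settles on, and database errors surface here as long as route handlers pass them to next(err).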

Alexis Metaireau (:alexis)

Reporter

Updated

10 years ago

Blocks: 972029
No longer blocks: loop_mlp

Alexis Metaireau (:alexis)

Reporter

Updated

10 years ago

Assignee: nobody → alexis+bugs

Alexis Metaireau (:alexis)

Reporter

Comment 1

10 years ago

Attached file: link to github PR

Attachment #8394196 - Flags: review?(rhubscher)

Rémy Hubscher (:natim)

Comment 2

10 years ago

Comment on attachment 8394196
link to github PR

Looks good to me.

Attachment #8394196 - Flags: review?(rhubscher) → review+

Rémy Hubscher (:natim)

Comment 3

10 years ago

https://github.com/mozilla-services/loop-server/commit/a7ddce30d7dcb82eecc28288d7684dc57a38579d

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

James Bonacci [:jbonacci]

Comment 4

10 years ago

Verified in code (via the commit).
But where is this being logged, and what is picking it up?
Has Ops set up proper logging for Prod and Stage based on this change?

Benson Wong [:mostlygeek]

Assignee

Comment 5

10 years ago

We haven't set up Heka or Sentry for Loop yet. It's definitely a conversation we'd want to have. I believe a few of our services use Sentry.

James Bonacci [:jbonacci]

Comment 6

10 years ago

So maybe marking this bug as Resolved is premature.
Or we can just add the Ops issues as blockers to verifying this bug...

Rémy Hubscher (:natim)

Comment 7

10 years ago

In Loop, we plugged in Sentry for errors and StatsD for telemetry, because Heka is compatible with the StatsD protocol.
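
For context on what "compatible with the StatsD protocol" involves, here is a minimal, dependency-free sketch of the wire format. Each metric is one plain-text line, "name:value|type", optionally followed by "|@rate" when sampled, and is sent over UDP (8125 is the conventional port). The metric names and the host passed to sendStatsd are placeholders, not Loop's actual configuration.

```javascript
const dgram = require("dgram");

// StatsD metric types: counter, timer (ms), gauge, set.
const STATSD_TYPES = new Set(["c", "ms", "g", "s"]);

// Build one StatsD line, "<name>:<value>|<type>", adding "|@<rate>"
// when the metric is sampled (rate below 1).
function formatStatsd(name, value, type, sampleRate) {
  if (!STATSD_TYPES.has(type)) {
    throw new Error("unsupported StatsD metric type: " + type);
  }
  let line = name + ":" + value + "|" + type;
  if (sampleRate !== undefined && sampleRate < 1) {
    line += "|@" + sampleRate;
  }
  return line;
}

// Fire-and-forget UDP send. The host is a placeholder; 8125 is the
// conventional StatsD port.
function sendStatsd(line, host, port = 8125) {
  const socket = dgram.createSocket("udp4");
  socket.send(Buffer.from(line), port, host, () => socket.close());
}
```

For example, formatStatsd("loop.errors", 1, "c") yields "loop.errors:1|c". Because the format is this simple, any listener that parses it (a StatsD daemon, or Heka's StatsD input) can receive the same stream.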

James Bonacci [:jbonacci]

Comment 8

10 years ago

OK. I will check all this out in Stage...

James Bonacci [:jbonacci]

Updated

10 years ago

Whiteboard: [qa+]

Alexis Metaireau (:alexis)

Reporter

Comment 9

10 years ago

James, I'm not sure I fully follow: I don't believe we are currently logging anything to Heka or the telemetry servers. The code can provide that, but I'm not sure it's deployed at the moment or logging anything anywhere, so I'm reopening this bug and assigning it to Benson.

Feel free to close if that is really the case :)

Assignee: alexis+bugs → bwong

Status: RESOLVED → REOPENED

Flags: needinfo?(jbonacci)

Resolution: FIXED → ---

James Bonacci [:jbonacci]

Comment 10

10 years ago

The current bug status is fine with me. Thanks.

Flags: needinfo?(jbonacci)

Benson Wong [:mostlygeek]

Assignee

Comment 11

10 years ago

I'm going to redeploy Stage with Sentry and StatsD enabled. It will log to:

sentry.shared.us-east-1.stage.mozaws.net
graphite.shared.us-east-1.stage.mozaws.net

In production it'll be:

sentry.shared.us-west-2.prod.mozaws.net
graphite.shared.us-west-2.prod.mozaws.net

James Bonacci [:jbonacci]

Comment 12

10 years ago

:mostlygeek ok, thanks.
Ping me when this is redeployed and we can verify logging on Stage...

Alexis Metaireau (:alexis)

Reporter

Comment 13

10 years ago

Benson, can you provide us some update about error logging?

Flags: needinfo?(bwong)

Bob Micheletto [:bobm]

Comment 14

10 years ago

(In reply to Alexis Metaireau (:alexis) from comment #13)
> Benson, can you provide us some update about error logging?

I don't see output in Stage Graphite.  Should there be anything in Sentry presently?

Flags: needinfo?(bwong)

Bob Micheletto [:bobm]

Comment 15

10 years ago

(In reply to Bob Micheletto [:bobm] from comment #14)

> I don't see output in Stage Graphite.  Should there be anything in Sentry
> presently?

There is a section for loop-server on the Stage Sentry server.  However, there aren't any tracebacks in it.  Can we force one to see if it is working?

Flags: needinfo?(alexis+bugs)

Alexis Metaireau (:alexis)

Reporter

Comment 16

10 years ago

I guess disabling Redis and running a small load test (with "make test" in the "loadtest" folder) should make errors show up in Sentry.

I don't have the rights to disable Redis; can you have a look? (If I'm not around, you can also sync with James to run the small load test during the day.)

Flags: needinfo?(alexis+bugs) → needinfo?(bobm)

Tarek Ziadé (:tarek)

Updated

10 years ago

Severity: normal → major

Priority: -- → P1

Rémy Hubscher (:natim)

Comment 17

10 years ago

What I saw with whd while setting up production is that the server hangs if Redis doesn't answer.
We should set the connect_timeout value to something like 2 or 5 seconds.

The docs state that if connect_timeout is not set, the Redis client will keep trying forever.
https://github.com/mranney/node_redis
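
Setting the node_redis option is the direct fix. As a library-independent illustration of the same idea, the sketch below bounds any promise-returning call (a connect or a PING, say) with a deadline, so a silent Redis fails fast instead of hanging a request. withTimeout is a hypothetical helper, not part of node_redis, and the 2-second figure is only the low end suggested above.

```javascript
// Hypothetical helper (not part of node_redis): reject if `promise`
// has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label) {
  let timer;
  const deadline = new Promise((_, reject) => {
    timer = setTimeout(() => {
      reject(new Error(label + " timed out after " + ms + " ms"));
    }, ms);
  });
  // Whichever settles first wins; always clear the timer afterwards so a
  // fast reply does not keep the process alive.
  return Promise.race([promise, deadline]).finally(() => clearTimeout(timer));
}

// Usage (assumed): withTimeout(pingRedis(), 2000, "redis ping"), where
// pingRedis is a hypothetical promise-returning wrapper around PING.
```

A heartbeat endpoint wrapped this way can report Redis as unavailable instead of waiting indefinitely, which also gives the error logger something concrete to capture.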

Alexis Metaireau (:alexis)

Reporter

Comment 18

10 years ago

Okay, then I think we have two different issues here. The first one (the reason for this bug) is that we don't log the errors. I think this is urgent to solve because we don't currently have any way to know if something is going wrong on the server.

The second one (I'll file a new bug about it) is the Redis connection issue you're talking about, Rémy: to paraphrase you, the connection to Redis doesn't time out, and we're not notified when we do the heartbeat.

Alexis Metaireau (:alexis)

Reporter

Comment 19

10 years ago

The second bug is bug 1021726.

Alexis Metaireau (:alexis)

Reporter

Comment 20

10 years ago

Okay, it's logging errors to http://sentry.shared.us-east-1.stage.mozaws.net/loop/loop-stage/, so we're good.

Severity: major → normal

Status: REOPENED → RESOLVED

Closed: 10 years ago → 10 years ago

Flags: needinfo?(bobm)

Priority: P1 → --

Resolution: --- → FIXED

James Bonacci [:jbonacci]

Comment 21

10 years ago

OK.

Status: RESOLVED → VERIFIED
