Bugzilla

Assignee

Comment 1

•

10 years ago

I though it was what Winston was doing (i.e always write to loop_metrics.log and rotate when it's full.)

Would it be better to use https://github.com/flatiron/winston/blob/master/lib/winston/transports/daily-rotate-file.js or to configure loop to always write to loop_metrics.log and then use the system logrotate?

Assignee

Updated

•

10 years ago

Flags: needinfo?(whd)

Reporter

Comment 2

•

10 years ago

It appears at a cursory glance that switching to the Winston daily rotate file transport suffers from the same problem with the _created internal state variable. I would prefer we switch back to writing to the standard streams and letting circus do the rotation, since this better matches the behavior of other applications and our logging standard: https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Logging+Standard. AFAIK loop doesn't presently emit anything to the standard streams. However, having Loop always write to loop_metrics.log and then using the system logrotate should indeed be sufficient to mitigate the issue.

In the long term, we'd like to use the new mozlog https://github.com/mozilla/mozlog library for all log output across Cloud Services' nodejs applications.

Flags: needinfo?(whd)

Assignee

Comment 3

•

10 years ago

whd — Do you recommend us to start using mozilla/mozlog instead of winston for loop?

Flags: needinfo?(whd)

Reporter

Comment 4

•

10 years ago

(In reply to Rémy Hubscher (:natim) from comment #3)
> whd — Do you recommend us to start using mozilla/mozlog instead of winston
> for loop?

Yes.

Flags: needinfo?(whd)

Reporter

Comment 5

•

10 years ago

It looks like we lost some data due to this for 2014-12-06, and the dashboards for this week are messed up: https://metrics.services.mozilla.com/loop-server-dashboard/
By the time I noticed and checked the loop servers, one looked like:

loop_metrics15.log (old)
loop_metrics16.log (old)
loop_metrics17.log (old)
loop_metrics18.log (old)
loop_metrics19.log (old)
loop_metrics5.log (new)
loop_metrics6.log (new)
loop_metrics7.log (new)
loop_metrics8.log (new, being written to)

And the other:

loop_metrics15.log (old)
loop_metrics16.log (old)
loop_metrics17.log (old)
loop_metrics18.log (old)
loop_metrics19.log (old)
loop_metrics5.log (new)
loop_metrics6.log (new)
loop_metrics7.log (new)
loop_metrics8.log (new)
loop_metrics9.log (new)
loop_metrics.log (new, being written to)

This log rotation scheme is basically incomprehensible to me (and heka). https://github.com/mozilla-services/puppet-config/blob/master/loop/yaml/app/loop_server.prod.yaml#L36-40 indicates there should be only be five files at a time but it seems to have rotated the name at 20 and since deleted loop_metrics{1..4}.log as they are missing. I suspect the first server above will begin writing to loop_metrics.log after finishing with loop_metrics9.log, which may again break the logstreamer. I'll fix it tomorrow if it happens.

This most recent stack (2014-11-26) has been running longer than the average loop stack which is why the data loss occurred.

A way to mitigate this would be to either not rotate at all or fix the app logging settings (and/or the logger itself). With the amount of traffic loop gets and the rate at which we deploy new stacks I would be fine with turning off rotation completely until this has a proper fix.

I have attempted to recover and process as much of the existing logs as possible and will be adding anomaly detection to heka to catch this sooner in the future, but it must be addressed by the server or the logging library if we want to prevent this kind of data loss.

Comment 6

•

10 years ago

Thanks for looking into this Wesley!

Assignee

Comment 7

•

9 years ago

Starting the switch to mozlog today.

Assignee: nobody → rhubscher

Assignee

Comment 8

•

9 years ago

Actually it seems mozlog doesn't let us configure the filename where to store files. First reason why we choose to use winston. I will deactivate logrotate for winston.

Reporter

Comment 9

•

9 years ago

Mozlog is a wrapper around Intel, which should support writing out to a file: https://github.com/seanmonstar/intel#filehandler. I would also prefer you use the standard streams for metrics anyway as outlined in https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Logging+Standard section "Format", but my primary concern is to stop losing data, which can be accomplished by disabling logrotate for winston.

Assignee

Comment 10

•

9 years ago

Attached file Link to Github PR — #277. (obsolete) — Details

Attachment #8543245 - Flags: review?(alexis+bugs)

Assignee

Comment 11

•

9 years ago

Following your advice I did this:

 - Remove everything that might write to stdout
 - Use mozlog to write to stdout with the heka logger

The format is now the one from mozlog:

For websocket logging:

 {"Timestamp":1420193467785000000,"Logger":"mozilla-loop-server",
  "Type":"loop.logger.request.summary","Severity":6,"Pid":11494,"EnvVersion":"2.0",
  "Fields":{"code":204,"path":"/rooms/LcYL3OdXgg4","method":"post","query":"{}",
            "agent":"python-requests/2.4.1 CPython/2.7.6 Linux/3.13.0-43-generic",
            "time":"2015-01-02T10:11:07Z",
            "uid":"fabaf4f9f60c6f8d97158c75f0b9b2661738130eb654eed13d5ecdc8739d0f1a",
            "token":"LcYL3OdXgg4","v":"0.14.2","name":"mozilla-loop-server",
            "hostname":"rhubscher-ThinkPad-W540","ip":"127.0.0.1","errno":0,"action":"leave"}}

For requests logging:

 {"Timestamp":1420193467781000000,"Logger":"mozilla-loop-server","Type":"loop.logger.request.summary",
  "Severity":6,"Pid":11494,"EnvVersion":"2.0",
  "Fields":{"code":200,"path":"/rooms/LcYL3OdXgg4","method":"post","query":"{}",
            "agent":"python-requests/2.4.1 CPython/2.7.6 Linux/3.13.0-43-generic",
            "time":"2015-01-02T10:11:07Z",
            "uid":"fabaf4f9f60c6f8d97158c75f0b9b2661738130eb654eed13d5ecdc8739d0f1a",
            "token":"LcYL3OdXgg4","v":"0.14.2","name":"mozilla-loop-server",
            "hostname":"rhubscher-ThinkPad-W540","ip":"127.0.0.1","errno":0,"action":"refresh"}}

The hekaMetrics configuration changed to fit the mozlog properties:

  hekaMetrics: {
    activated: {
      doc: "Defines if metrics should be directed to hekad",
      default: false,
      format: Boolean
    },
    debug: {
      doc: "Heka logger display development logs",
      format: Boolean,
      default: false
    },
    level: {
      doc: "Log level.",
      format: String,
      default: "INFO"
    },
    fmt: {
      doc: "Log level.",
      format: String,
      default: "heka"
    }
  },

Assignee

Comment 12

•

9 years ago

I put to examples of request.summary, the one for websocket.summary is:

 {"Timestamp":1420193489623000000,"Logger":"mozilla-loop-server","Type":"loop.logger.websocket.summary",
  "Severity":6,"Pid":11494,"EnvVersion":"2.0",
  "Fields":{"callId":"bfb6e7e5ade4b4ebdcb906382f820242","auth":"d27e26844c00c6b68005a2e6c1cb5960",
            "messageType":"hello","time":"2015-01-02T10:11:29Z"}}

Reporter

Comment 13

•

9 years ago

This looks good, thanks. If possible remove the redundant "loop.logger" prefix from the Type field, since we have the Logger field for this purpose.

Assignee

Comment 14

•

9 years ago

I will let loop.websocket.summary and loop.request.summary. Because mozlog doesn't handle an empty namespace:

  var log = require("mozlog")();

It writes `.websocket.summary` and `.request.summary`

Do you want me to create a patch for mozlog about this?

Reporter

Comment 15

•

9 years ago

It appears the idiomatic way of doing this is to use different logger objects for different code paths (at least this is my guess from https://github.com/mozilla/fxa-profile-server/search?utf8=%E2%9C%93&q=logger+%3D). So in the requests path you would do e.g.

  var log = require("mozlog")("request");

and log a "summary" event.

Assignee

Comment 16

•

9 years ago

Yes this would be one way to do it. Also it would means to write:

    hekaLogger("request").info("summary", data);

And I rather prefer to write:

    hekaLogger.info("request.summary", data);

Which does exactly the same thing.

I have created https://github.com/mozilla/mozlog/pull/3 in order to handle the empty namespace.

Comment 17

•

9 years ago

Adding a NI on Sean so we have feedback to know if this can be merged.

Flags: needinfo?(smcarthur)

Comment 18

•

9 years ago

Comment on attachment 8543245 [details] [review]
Link to Github PR — #277.

I've merged the code change, it's at https://github.com/mozilla-services/loop-server/commit/82c8883d80d4e74711cf6daba8e33bd7939005ce

I'm letting the NI on sean to have feedback about the PR Rémy opened on mozlog.

Attachment #8543245 - Flags: review?(alexis+bugs) → review+

Sean McArthur [:seanmonstar]

Updated

•

9 years ago

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Comment 19

•

9 years ago

sorry, missed this. I have left comments on the PR.

Flags: needinfo?(smcarthur)

Reporter

Comment 20

•

9 years ago

I took a look at the logs in stage and they still have "Type" with the "loop." prefix, probably because :natim's mozlog PR has not been merged. I need this fixed before 0.15.* goes to production.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Richard Pappalardo [:rpapa][:rpappalardo]

Reporter

Updated

•

9 years ago

Blocks: 1120911

Comment 21

•

9 years ago

Wes, you mentioned today we could proceed with 0.15.0 and that this logging change could go out in a future release.  Does it instead look like we need to halt 0.15.0 and do a re-spin with this change?

Flags: needinfo?(whd)

Reporter

Comment 22

•

9 years ago

(In reply to Richard Pappalardo [:rpapa][:rpappalardo] from comment #21)
> Wes, you mentioned today we could proceed with 0.15.0 and that this logging
> change could go out in a future release.  Does it instead look like we need
> to halt 0.15.0 and do a re-spin with this change?

I need this fixed before 0.15.* goes to production, but I don't know if that requires halting current work on 0.15.0 in stage. It's a minor code change.

Flags: needinfo?(whd)

Comment 23

•

9 years ago

Remy, can you have a look at this?

Flags: needinfo?(rhubscher)

Assignee

Comment 24

•

9 years ago

Sure what would you like me to do?

Flags: needinfo?(rhubscher)

Sean McArthur [:seanmonstar]

Reporter

Comment 25

•

9 years ago

:natim I need the "loop." prefix dropped from the Logger field. This might involve updating mozlog and the loop server or both. There's a ticket for a new stage deploy of 0.15.1 (bug 1123276), so perhaps we can roll this change into that (and call it 0.15.2 if need be).

Comment 26

•

9 years ago

There's a PR to mozlog that I reviewed. Needs an update and I'll merge and publish a new version to npm.

Sean McArthur [:seanmonstar]

Assignee

Comment 27

•

9 years ago

Ok Sean, Thanks I will do that then.

Comment 28

•

9 years ago

Thanks! I've published v1.0.2 to npm, you should be good to go.

Richard Pappalardo [:rpapa][:rpappalardo]

Updated

•

9 years ago

Blocks: 1123276

Assignee

Comment 29

•

9 years ago

Attached file Link to Github PR — #289. — Details

Attachment #8553974 - Flags: review?(alexis+bugs)

Attachment #8553974 - Flags: feedback?(whd)

Assignee

Updated

•

9 years ago

Attachment #8543245 - Attachment is obsolete: true

Reporter

Updated

•

9 years ago

Attachment #8553974 - Flags: feedback?(whd) → feedback+

Comment 30

•

9 years ago

https://github.com/mozilla-services/loop-server/commit/4b3079990e8642103de88f7a5e3a908fe81d0a70

Status: REOPENED → RESOLVED

Closed: 9 years ago → 9 years ago

Resolution: --- → FIXED