Closed
Bug 1255184
Opened 8 years ago
Closed 8 years ago
Start consuming Test Pilot logs
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: clouserw, Assigned: relud)
References
Details
(Whiteboard: [SvcOps])
Test Pilot lives! We have a production site at testpilot.firefox.com. I'd like to start collecting the logs from Test Pilot and loading them into Redshift (the same Redshift bucket as bug 1255182). I haven't done this before, so please let me know anything I'm missing below.

I'm planning on using the standard JSON logging format for this, with some custom fields. A complete example would be (I think):

> {
>   "Timestamp": 1457462527,
>   "Type": "testpilot.test-install",
>   "Logger": "TestPilotWeb",
>   "Hostname": "testpilot.servers.s3",
>   "EnvVersion": "2.0",  // We're conforming to the Fields spec
>   "Severity": 0,
>   "Pid": 1000,
>   "Fields": {
>     "agent": "User Agent String",
>     "path": "/tests/",
>     "method": "GET",
>     "code": 200,
>     "errno": 0,
>     "lang": "en-US",
>     "uid": 1000,  // This is the Test Pilot UID
>     "service": "universal_search@mozilla",  // If the ping comes from a test within the program, the identifier goes here
>     "context": "",
>     "msg": "New Test Installed: Universal Search",
>     "remoteAddressChain": ["1.2.3.4", "4.3.2.1"],
>     "rid": "0b65c6fb-26d5-4367-be29-b3094f0c69d5",
>     "t": 256,
>
>     "feature_switches": "",
>     "campaign": ""
>   }
> }

Is that the definition you need? Do you need more? I'm hoping to analyze and pull the data out of Redshift. Bug 1255182 is a complement to this bug.

Full metrics plan: https://github.com/mozilla/testpilot/blob/master/docs/README-METRICS.md
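For reference, a record like the example above could be produced in Python roughly like this. This is a minimal sketch; `mozlog_record` and its default arguments are made up for illustration and are not an official client library.

```python
import json
import time

def mozlog_record(msg_type, fields, logger="TestPilotWeb", hostname="localhost"):
    # Hypothetical helper: builds one mozlog-style JSON line using the
    # field names from the example above.
    return json.dumps({
        "Timestamp": int(time.time()),
        "Type": msg_type,
        "Logger": logger,
        "Hostname": hostname,
        "EnvVersion": "2.0",
        "Severity": 0,
        "Pid": 1000,
        "Fields": fields,
    })

# One JSON line per event, written to stdout for the log collector to pick up.
print(mozlog_record("testpilot.test-install", {
    "path": "/tests/",
    "method": "GET",
    "code": 200,
    "msg": "New Test Installed: Universal Search",
}))
```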
Comment 1•8 years ago
Tagged the release: https://github.com/mozilla/testpilot/releases/tag/2016-03-14
Comment 2•8 years ago
(Oops, disregard my last comment please. It was intended for bug 1255135)
Comment 3•8 years ago
We'll plan the sprint tomorrow and try to make this a P1.
Assignee: nobody → whd
Updated•8 years ago
Component: Operations: Metrics/Monitoring → Metrics: Pipeline
Updated•8 years ago
Points: --- → 2
Priority: -- → P1
Reporter
Comment 4•8 years ago
Is there any information missing from this bug? Anything I need to do? Thanks.
Comment 5•8 years ago
(In reply to Wil Clouser [:clouserw] from comment #4)

> Is there any information missing from this bug? Anything I need to do?
> Thanks.

I don't think so. It's marked as P1 (i.e. highest priority, to be completed during the current sprint) and assigned to Wes. I know he's got this on his radar, so it should be getting attention Real Soon Now®. Unfortunately ops is our busiest team right now, so sometimes it takes a bit. Thanks for your patience. :)
Comment 6•8 years ago
I talked to :mostlygeek yesterday, and apparently Test Pilot is not actually logging anything yet. I am still setting up the generic log routing to S3/ES for this week, but specifically ending up in Redshift requires more schema detail than is currently available.
Comment 7•8 years ago
I was thinking we can consider this bug to cover consuming the logs so they end up in S3; we can open another one for getting the log data into Redshift. I can work with them to help make that happen.
Comment 8•8 years ago
FWIW, Test Pilot should be ready [1] to start logging request.summary JSON to the console after our next deployment. (Not yet scheduled, but we could probably tag a release & do it whenever.) The schema should be what's specified in the Dockerflow repo [2]. I've never set up a project to log this way before, so it's possible it won't do the right thing on the first try. It could help to get some eyes on the logging middleware [1], but I think a few folks have already given the thumbs up to the log formatter [3].

[1] https://github.com/mozilla/testpilot/commit/aa7f9c9cc331f9798017591bd52ba48f8a02d9ae
[2] https://github.com/mozilla-services/Dockerflow/blob/master/docs/mozlog.md
[3] https://github.com/mozilla/testpilot/blob/master/testpilot/base/logging.py
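A mozlog-style formatter of the kind described above can be hooked into Python's stdlib logging roughly as follows. This is a minimal sketch, not the actual Test Pilot formatter (which lives in testpilot/base/logging.py); the severity mapping and default logger name are assumptions.

```python
import json
import logging
import socket
import sys

# Syslog-style severities (lower = more severe); this mapping is an
# assumption, check the Dockerflow mozlog doc for the canonical values.
SYSLOG_LEVELS = {
    logging.CRITICAL: 2,
    logging.ERROR: 3,
    logging.WARNING: 4,
    logging.INFO: 6,
    logging.DEBUG: 7,
}

class MozLogFormatter(logging.Formatter):
    # Minimal sketch of a mozlog JSON formatter; the real Test Pilot
    # formatter may differ in detail.
    def __init__(self, logger_name="TestPilotWeb"):
        super().__init__()
        self.logger_name = logger_name
        self.hostname = socket.gethostname()

    def format(self, record):
        return json.dumps({
            "Timestamp": int(record.created * 1e9),  # nanoseconds
            "Type": getattr(record, "type", "request.summary"),
            "Logger": self.logger_name,
            "Hostname": self.hostname,
            "EnvVersion": "2.0",
            "Severity": SYSLOG_LEVELS.get(record.levelno, 6),
            "Pid": record.process,
            "Fields": {"msg": record.getMessage()},
        })

# Emit one JSON object per line to stdout, as Dockerflow-style apps do.
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(MozLogFormatter())
logger = logging.getLogger("testpilot")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("hello from testpilot")
```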
Reporter
Comment 9•8 years ago
Deployment bug is https://bugzilla.mozilla.org/show_bug.cgi?id=1260783 . I don't think that blocks this. @wesley -- can you be more specific about what you need when you say "more of a schema/detail"? Even pointing to an example would help as I haven't done this before. Thanks.
Comment 10•8 years ago
Deployment bug doesn't necessarily block work being done on this, but it *is* hard to verify that the logs are being processed correctly before there are any logs being generated to process. Here's an example of what a schema to load a log message structure into Redshift looks like: https://github.com/rafrombrc/push_derived/blob/master/hindsight/hs_run/output/push_endp_redshift.lua
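In the same spirit as the linked Lua output, the transformation is essentially "pick a fixed column list out of each JSON record and emit a flat row". A rough Python sketch of that idea; the column list and pipe delimiter here are illustrative assumptions, not the production config:

```python
import json

# Illustrative column list; the real one would come from the Redshift schema.
COLUMNS = ["Timestamp", "Type", "Fields[path]", "Fields[code]"]

def flatten(line):
    # Turn one mozlog JSON line into a delimited row that a Redshift
    # COPY could load (the "|" delimiter is an assumption).
    rec = json.loads(line)
    row = []
    for col in COLUMNS:
        if col.startswith("Fields[") and col.endswith("]"):
            value = rec.get("Fields", {}).get(col[7:-1], "")
        else:
            value = rec.get(col, "")
        row.append(str(value))
    return "|".join(row)

print(flatten('{"Timestamp": 1457462527, "Type": "testpilot.test-install", '
              '"Fields": {"path": "/tests/", "code": 200}}'))
# → 1457462527|testpilot.test-install|/tests/|200
```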
Comment 11•8 years ago
Bug 1260783 is closed. What else is blocking this?
Depends on: 1260783
Flags: needinfo?(whd)
Comment 12•8 years ago
whd is currently blocked by ops-related tasks; we will re-prioritize once he comes back up for air.
Priority: P1 → P2
Updated•8 years ago
Whiteboard: [SvcOps]
Updated•8 years ago
Assignee: whd → dthorn
Priority: P2 → P1
Assignee
Comment 13•8 years ago
Logs are currently being shipped to S3. I'm unclear on what the plan is for converting them into a Redshift COPY-able format.
Flags: needinfo?(whd)
Assignee
Comment 14•8 years ago
:clouserw I'm going to write something to read the logs from S3 and put them in Redshift. I need to know: what information do you want in Redshift, and what part of the logs do I find it in?
Flags: needinfo?(wclouser)
Reporter
Comment 15•8 years ago
Here's a schema (which I'll also make a PR for and put in the official docs):

local schema = {
    -- column name           field type   length  attributes  field name
    {"timestamp",            "TIMESTAMP", nil,    "SORTKEY",  "Timestamp"},
    {"uuid",                 "VARCHAR",   36,     nil,        get_uuid},
    {"type",                 "VARCHAR",   255,    nil,        "type"},
    {"logger",               "VARCHAR",   255,    nil,        "logger"},
    {"Hostname",             "VARCHAR",   255,    nil,        "Hostname"},
    {"Severity",             "INTEGER",   nil,    nil,        "Severity"},
    {"agent",                "VARCHAR",   45,     nil,        "Fields[agent]"},
    {"path",                 "VARCHAR",   56,     nil,        "Fields[path]"},
    {"method",               "VARCHAR",   200,    nil,        "Fields[method]"},
    {"code",                 "VARCHAR",   255,    nil,        "Fields[code]"},
    {"errno",                "VARCHAR",   255,    nil,        "Fields[errno]"},
    {"lang",                 "VARCHAR",   36,     nil,        "Fields[lang]"},
    {"uid",                  "INTEGER",   nil,    nil,        "Fields[uid]"},
    {"service",              "VARCHAR",   255,    nil,        "Fields[service]"},
    {"context",              "VARCHAR",   255,    nil,        "Fields[context]"},
    {"msg",                  "VARCHAR",   1000,   nil,        "Fields[msg]"},
    {"remoteAddressChain",   "VARCHAR",   255,    nil,        "Fields[remoteAddressChain]"},
    {"rid",                  "VARCHAR",   255,    nil,        "Fields[rid]"},
    {"t",                    "VARCHAR",   36,     nil,        "Fields[t]"},
    {"feature_switches",     "VARCHAR",   255,    nil,        "Fields[feature_switches]"},
    {"campaign",             "VARCHAR",   255,    nil,        "Fields[campaign]"}
}
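For context, the (column name, field type, length, attributes) tuples in a schema like the one above map onto Redshift DDL mechanically. A quick sketch of that mapping, using a small subset of the columns (the table name and `create_table_sql` helper are hypothetical):

```python
# Subset of the schema above as (name, type, length, attributes) tuples.
SCHEMA = [
    ("timestamp", "TIMESTAMP", None, "SORTKEY"),
    ("uuid", "VARCHAR", 36, None),
    ("uid", "INTEGER", None, None),
    ("msg", "VARCHAR", 1000, None),
]

def create_table_sql(table, columns):
    # Render the tuples as a Redshift CREATE TABLE statement: length
    # becomes "TYPE(length)", attributes (e.g. SORTKEY) are appended.
    cols = []
    for name, ftype, length, attrs in columns:
        col = '"{}" {}'.format(name, ftype)
        if length is not None:
            col += "({})".format(length)
        if attrs:
            col += " " + attrs
        cols.append(col)
    return "CREATE TABLE {} (\n  {}\n);".format(table, ",\n  ".join(cols))

print(create_table_sql("testpilot_example", SCHEMA))
```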
Flags: needinfo?(wclouser)
Assignee
Comment 16•8 years ago
I've duplicated rmiller's autopush/webpush setup for Test Pilot. It can be viewed in sql.telemetry.mozilla.org with the data source Testpilot and tables named testpilot_server_YYYYMMDD.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Cloud Services → Cloud Services Graveyard