Closed Bug 1334491 Opened 8 years ago Closed 8 years ago

Create a documentation page for every production monitor

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: trink, Assigned: trink)

Details

We should have documentation for every production alert we generate.
- Title
- Description of what is being monitored
- Corrective action to be taken
- List of possible root causes
- List of additional diagnostic plugins available to drill down and isolate the root cause, with instructions on how to use/run/interpret the results
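The sections requested above could be captured as a small record so that a missing section is caught mechanically. The following is only a minimal sketch in Python; the names `Runbook` and `missing_sections` are hypothetical and do not come from the monitor code.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class Runbook:
    """Hypothetical record mirroring the sections requested in this bug."""
    title: str = ""
    description: str = ""            # what is being monitored
    corrective_action: str = ""      # what to do when the alert fires
    root_causes: List[str] = field(default_factory=list)
    diagnostics: List[str] = field(default_factory=list)  # drill-down plugins and how to run them


def missing_sections(runbook: Runbook) -> List[str]:
    """Return the names of sections that are still empty, in declaration order."""
    return [f.name for f in fields(runbook) if not getattr(runbook, f.name)]
```

A check like this could run wherever the documentation is published, so an alert never ships without its corrective action or root-cause list.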
Summary: Create a documentation page for every production moniter → Create a documentation page for every production monitor
Assignee: nobody → whd
Points: --- → 3
Priority: -- → P2
Assignee: whd → mtrinkala
Per discussion, docs should go in Mana
As I am creating the first runbook wiki page:
1) I am duplicating a bunch of the documentation from the monitor code
2) The parts that are not currently duplicated in the code would be useful in the code documentation anyway
3) Manually maintaining this in multiple places is going to be error-prone
4) Embedding this documentation in the alert would be the most straightforward solution (at least for email alerting)
5) If we need an external runbook reference we should extract it from the code and automatically post it.
New proposal: I will add the runbook-type documentation to the plugin and simply include a link in the alert back to the auto-generated page at https://mozilla-services.github.io/lua_sandbox_extensions/moz_telemetry/sandboxes/heka/analysis/<plugin>.html
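As an illustration of the link scheme in this proposal, the sketch below builds the documentation URL from a plugin name and appends it to an alert body. It is written in Python for brevity and is not the plugin's actual code; `runbook_url`, `alert_body`, and the plugin name `example_monitor` are assumptions made for the example. Only the base URL comes from the comment above.

```python
from urllib.parse import quote

# Base of the auto-generated documentation, taken from the proposal above.
DOCS_BASE = (
    "https://mozilla-services.github.io/lua_sandbox_extensions/"
    "moz_telemetry/sandboxes/heka/analysis/"
)


def runbook_url(plugin: str) -> str:
    """Return the documentation page for a plugin (hypothetical helper)."""
    # Percent-encode the name so unexpected characters cannot break the link.
    return f"{DOCS_BASE}{quote(plugin, safe='')}.html"


def alert_body(plugin: str, summary: str) -> str:
    """Append the runbook link to an alert message (hypothetical helper)."""
    return f"{summary}\n\nRunbook: {runbook_url(plugin)}"


if __name__ == "__main__":
    # "example_monitor" is a made-up plugin name used only for illustration.
    print(alert_body("example_monitor", "ingestion error rate above threshold"))
```

Deriving the link from the plugin name keeps a single source of truth: the documentation lives with the code, and the alert only points at it.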
The general documentation is embedded in the plugin (auto-published). Also, diagnostic information specific to the alert instance has been added to the alert message content.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard