Closed Bug 921218 Opened 12 years ago Closed 11 years ago

[Tracking] Create Datazilla Alerting Mechanism for performance regression and data ingestion disruption

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: cmtalbert, Assigned: jmaher)

References

Details

(Keywords: perf, Whiteboard: [c=automation p= s=2014.05.09.t u=])

Attachments

(1 file)

Summary data structures (12 years ago, cmtalbert, 6.89 KB, text/plain)

cmtalbert

Reporter

Description

12 years ago

Attached file: Summary data structures

This might be a dupe, but I didn't see it anywhere. We need a tracking bug for the Datazilla alerting work. Here is the current plan of record:

= Sept 12 =
* short term (to be delivered within September)
** [jeads] get emails going out
** [kyle] object count table
** [jeads] test_run_id table
** [jmaher] look at the existing gmo alerting system and see if it could use Datazilla data
*** http://hg.mozilla.org/graphs/file/tip/server/analysis/analyze_talos.py
*** runs on graphs.m.o and queries the database directly; not a blocker, but not a good sign
*** I will experiment with writing a Datazilla extractor and reusing most of the code
** summary page skeleton [stretch]
* [jmaher] one solution is to hack the regression alert system to pull data from Datazilla instead of graphs.m.o; not ideal at all, but it could move us closer to Datazilla (how much work would it take to have it mark up tdad (test_data_all_dimensions)?)
** we wouldn't have all dimensions (very limited), but we could hit parity with what we have now; again, not ideal
* alerting system architecture: the current system does 40-day blocks
** we get just under 1 object/second
** do we search for alerts on ingestion, or out of band?
** table/queue of test_ids to process
* revision-level alerts, delayed so they have more info
** initial alert if we find a regression
** wait X minutes/hours until we get most of the data, then send a *final* summary
* summary interface (linked to from the emails)
** one graph that shows the push chain
** top-level view of tests and notifications (jolly green giant type of grid)
** view previous alerts to determine whether this is a new alert or possibly an existing one
** how could we notify users of new data in the UI?
** how to determine the number of tests we run (actually, objects)
*** median # of objects/push/branch
*** time range could be 7 days; make this configurable
** how to mark known bad bugs in the user interface

Further information can be found on the Signal From Noise etherpad: https://etherpad.mozilla.org/SignalFromNoise

A trial idea of the summary data for the alarm notification is attached (also copied from the etherpad).
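The plan above proposes an object count table and asks how to determine how many objects to expect, using the median number of objects per push per branch over a configurable time range (7 days). A minimal Python sketch of how such a median could drive a data-ingestion-disruption check, assuming a hypothetical `Push` record (branch, push time, object count) and a hypothetical `min_fraction` threshold; neither comes from Datazilla's actual schema or code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Push:
    # Hypothetical row of an object count table (not Datazilla's real schema).
    branch: str
    pushed_at: datetime
    object_count: int  # number of result objects ingested for this push


def median_objects_per_push(pushes, branch, now, window_days=7):
    """Median objects/push for `branch` over the `window_days` days before `now`."""
    start = now - timedelta(days=window_days)
    counts = [p.object_count for p in pushes
              if p.branch == branch and start <= p.pushed_at < now]
    return median(counts) if counts else None


def ingestion_disrupted(pushes, push, window_days=7, min_fraction=0.5):
    """Flag `push` if it received far fewer objects than the branch usually does.

    `min_fraction` is a hypothetical tuning knob, not a value from this bug.
    """
    baseline = median_objects_per_push(pushes, push.branch, push.pushed_at, window_days)
    if baseline is None:
        return False  # no history in the window, nothing to compare against
    return push.object_count < min_fraction * baseline
```

Using the median rather than the mean keeps one push with missing or duplicated results from skewing the baseline, and `window_days` keeps the time range configurable as the plan asks.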

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Updated

12 years ago

Component: General → Talos

Mark Côté [:mcote]

Updated

12 years ago

Depends on: 949190

Mike Lee [:mlee]

Updated

12 years ago

Keywords: perf

Whiteboard: [c=automation p= s= u=]

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Updated

11 years ago

Priority: -- → P2

Geo Mealer [:geo] -- This account is inactive after 2015-07-07

Updated

11 years ago

Priority: P2 → P1

Mike Lee [:mlee]

Comment 1

11 years ago

Hi Joel,

Assigning this to you since you're managing this effort. The fxos-perf-alerts@mozilla.com mailing list has been created, so you can update this tool's configuration to use it. Ben Kelly has said that most of what this bug describes has been implemented; if that's true, please resolve this bug.

Thanks,
Mike

Flags: needinfo?(jmaher)

Hubert Figuiere [:hub]

Comment 2

11 years ago

Assigning to :jmaher so we can track this in our sprint. Joel, if you think someone else is better suited to be the assignee, feel free to reassign.

Assignee: nobody → jmaher

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 3

11 years ago

we have this working for ingestion alerts and for regressions, so the general flow is good. I assume we can mark this as done?
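The description plans revision-level regression alerts that are delayed: an initial alert as soon as a regression shows up, then a *final* summary once most of the data has arrived. A minimal sketch of that two-phase flow, assuming Welch's t statistic as the regression test, that higher values are worse (e.g. load times), and that "most of the data" means 90% of an expected result count. `RevisionAlerter`, the thresholds, and the choice of test are illustrative assumptions; this is not necessarily how analyze_talos.py or the deployed Datazilla alerts detect regressions.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(before, after):
    """Welch's t statistic for mean(after) - mean(before); each list needs >= 2 values."""
    se2 = variance(before) / len(before) + variance(after) / len(after)
    diff = mean(after) - mean(before)
    if se2 == 0:
        # No spread at all: any nonzero difference counts as maximally significant.
        return 0.0 if diff == 0 else (float("inf") if diff > 0 else float("-inf"))
    return diff / sqrt(se2)


class RevisionAlerter:
    """Hypothetical two-phase alerting for one revision's results on one test.

    Assumes higher values are worse, so a regression is a large positive t.
    """

    def __init__(self, baseline, expected_results, threshold=3.0, final_fraction=0.9):
        self.baseline = baseline                # results from earlier pushes (>= 2)
        self.expected_results = expected_results
        self.threshold = threshold
        self.final_fraction = final_fraction
        self.results = []
        self.sent = []                          # alert kinds "sent" so far, in order

    def add_result(self, value):
        """Record one result for this revision; return the alert kinds sent so far."""
        self.results.append(value)
        if len(self.results) < 2:
            return list(self.sent)              # too little data to test yet
        if welch_t(self.baseline, self.results) <= self.threshold:
            return list(self.sent)              # no regression detected
        have_most_data = len(self.results) >= self.final_fraction * self.expected_results
        if have_most_data and "final" not in self.sent:
            self.sent.append("final")           # final summary once most data is in
        elif not have_most_data and not self.sent:
            self.sent.append("initial")         # early heads-up on first detection
        return list(self.sent)
```

Keying the final summary on the fraction of expected results, rather than a fixed delay, is one way to express "wait until we get most of the data"; a time-based delay of X minutes/hours would work the same way with a timestamp check instead.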

Flags: needinfo?(jmaher)

Mike Lee [:mlee]

Comment 4

11 years ago

Joel,

Has everything been satisfied for bug 949190, which this bug depends on? It looks like that's still out for sec-review.

Thanks,
Mike

Status: NEW → ASSIGNED

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Assignee

Comment 5

11 years ago

it depends on how far we want to take this. Right now we have alerts generated via automation for all the tests going to datazilla. I agree this should be on a more static server, but I would like to close this bug when we get it deployed there. Tweaks to the detection algorithm, adjustments to tests, or changes to the alert text don't fall under the scope of creating these alerts.
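The description left open whether to search for alerts on ingestion or out of band from a table/queue of test_ids. A minimal sketch of the out-of-band option, using an in-memory queue as a stand-in for the database table the plan mentions: ingestion only enqueues test_ids, and a separate worker drains them and runs the analysis. `AlertWorkQueue` and the `analyze` callback are hypothetical names, not part of Datazilla.

```python
from collections import deque


class AlertWorkQueue:
    """Hypothetical out-of-band queue of test_ids awaiting alert analysis.

    Ingestion only enqueues, so slow analysis never blocks data ingestion.
    A test_id that is already pending is not queued twice.
    """

    def __init__(self):
        self._queue = deque()
        self._pending = set()

    def enqueue(self, test_id):
        if test_id not in self._pending:  # coalesce repeated enqueues
            self._pending.add(test_id)
            self._queue.append(test_id)

    def drain(self, analyze, limit=None):
        """Call `analyze(test_id)` for queued ids in FIFO order; return the ids processed."""
        processed = []
        while self._queue and (limit is None or len(processed) < limit):
            test_id = self._queue.popleft()
            self._pending.discard(test_id)
            analyze(test_id)
            processed.append(test_id)
        return processed
```

Coalescing pending test_ids means a burst of results for one test (the plan notes just under 1 object/second overall) triggers one analysis pass rather than one per object.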

Flags: needinfo?(jmaher)

Mike Lee [:mlee]

Updated

11 years ago

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Whiteboard: [c=automation p= s= u=] → [c=automation p= s=2014.05.09.t u=]


Categories

(Testing :: Talos, defect, P1)

Security

(public)
