Find a cluster for Datazilla Alerts ES data

Status: RESOLVED WONTFIX (opened 5 years ago, closed 4 years ago)

People: (Reporter: ekyle, Unassigned)

Whiteboard: [kanban:https://webops.kanbanize.com/ctrl_board/2/213]

Description (Reporter, 5 years ago)
Datazilla Alerts is a cron job responsible for detecting regressions in the Talos and B2G performance data (https://github.com/klahnakoski/datazilla-alerts). It is currently in a pre-alpha state. It uses an ES index to query the various combinations of test results quickly.

We (the Automation and Tools Engineering team) require an ES instance to hold an index for each of the Talos and B2G performance datasets: http://elasticsearch-zlb.webapp.scl3.mozilla.com/ seems most appropriate.

The Talos index is:
size: 51.7G (154G)
docs: 16,483,564 (20,892,329)

The B2G index is much smaller:
size: 602M (1.76G)
docs: 232,286 (275,220)
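For reference, these figures can be pulled from the cluster itself; a minimal sketch, assuming the cluster is reachable and runs an ES version with the cat API (1.0+). _cat/indices reports both the replica-included store size and the primary-only size, which may be what the parenthesized numbers above distinguish:

    # Minimal sketch: list doc counts and store sizes for every index on
    # the proposed cluster (URL taken from this bug; reachability assumed).
    import urllib.request

    ES = "http://elasticsearch-zlb.webapp.scl3.mozilla.com"

    # "?v" adds a header row; columns include docs.count, store.size
    # (including replicas), and pri.store.size (primaries only).
    with urllib.request.urlopen(ES + "/_cat/indices?v") as resp:
        print(resp.read().decode("utf-8"))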
Updated (Reporter, 5 years ago)
Blocks: 1023372
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/234]

Comment 1 (4 years ago)
I think this is the best long-term solution to avoid problems like bug 1033145. I think the cluster should be on bare metal and should have a very sizable disk. Given that every developer has a laptop with at least 250 GB of space, I hope it is possible to get a terabyte disk or more. We will only be adding more alert data over time, and this system will become more and more important for detecting regressions in various products.

Comment 2 (4 years ago)
Updating the summary to reflect the fact that we just need a place for this data; the webapp cluster is a possibility, but we're not married to it.

Kyle, have the sizes of these indices increased significantly since this bug was filed? Also, what is the estimated growth for them, and what indices are we planning to add in the future (and how big/what growth)?
Summary: Permission to add index to http://elasticsearch-zlb.webapp.scl3.mozilla.com/ → Find a cluster for Datazilla Alerts ES data

Updated (4 years ago)
Flags: needinfo?(klahnakoski)
Comment 3 hidden (obsolete)

Comment 4 (4 years ago)

Do we need to keep this data forever? In just two years we'll be way past 1 TB...
Comment 5 (Reporter, 4 years ago)
A big mistake above: I forgot about the ES replication (3x the drive space). To answer Mark: I do not think we need two years. One year is enough, and 6 months is tolerable. Furthermore, the data is divided among all machines.

I will repeat my previous comment:

Talos currently grows at about 30GB/month; let's say 40GB/month for safety. Assume we need no more than 12 months of data, plus another 3/7 of headroom (so we do not trigger the 70% disk warning**), and multiply by 3 for ES replication. This gives us 3 * (1 + 3/7) * 12 * 40GB ≈ 2057GB. B2G is 3 * 2.4GB (it already holds a year), and Eideticker is 3 * 0.7GB for 9 months, so no more than 3 * 20GB for those two indices. Note this space requirement is spread over all N nodes (assuming 7 nodes, we have 2117GB / 7 ≈ 302GB per machine).

There are no plans for more indices, but as long as we have the option to add nodes we can handle unexpected growth.

1GB = 2^30 bytes

** ES will require more drive space when it compacts shards.
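Restating that arithmetic as a small sketch (every constant here is an estimate from this comment, not a measured value; the 7-node cluster in particular is assumed):

    # Capacity estimate: 12 months of Talos growth at an assumed
    # 40GB/month, 3/7 extra headroom to stay under the 70% disk
    # warning, and 3x ES replication, spread over an assumed 7 nodes.
    GROWTH_GB_PER_MONTH = 40   # ~30GB/month observed, rounded up for safety
    MONTHS = 12                # retention target; 6 months is tolerable
    REPLICATION = 3            # primary shard plus two replicas
    HEADROOM = 3.0 / 7.0       # keeps usage below the 70% disk watermark
    NODES = 7                  # assumed cluster size

    talos_gb = REPLICATION * (1 + HEADROOM) * MONTHS * GROWTH_GB_PER_MONTH
    other_gb = REPLICATION * 20    # generous bound for B2G + Eideticker
    total_gb = talos_gb + other_gb

    print("Talos: %dGB  total: %dGB  per node: %dGB"
          % (talos_gb, total_gb, total_gb / NODES))
    # -> Talos: 2057GB  total: 2117GB  per node: 302GB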
Comment 6 (Reporter, 4 years ago)
If performance data is to be made public, then it cannot reside on the private cluster in the long term. Let's keep this bug open.
Updated (Reporter, 4 years ago)
See Also: → bug 1063324

Updated (4 years ago)
Whiteboard: [kanban:https://kanbanize.com/ctrl_board/4/234] → [kanban:https://webops.kanbanize.com/ctrl_board/2/213]

Comment 7 (4 years ago)
I spoke with :ekyle (reporter). The architecture is still being designed upstream and may or may not involve The Cloud, so until this is more actionable we decided to WONTFIX this for now.
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → WONTFIX