Closed Bug 745318 Opened 8 years ago Closed 8 years ago

Setup Datazilla in Production

Categories

(Developer Services :: General, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jeads, Assigned: fox2mike)

References

Details

(Whiteboard: [2012q3])

We are replacing the existing production graph server and database with a new system called datazilla.  The new system collects all of the raw talos data and a variety of metadata associated with running talos performance tests.  We anticipate data growth rates 10-20X the existing system due to the incorporation of raw data and metadata.  The new system will also house several different types of test data: talos data, the new Stone Ridge network project, and several other ateam projects that generate product test data.

The data will be stored in a set of MySQL databases that can initially live on the same physical database instance.  There is no requirement for physical co-location, so if the databases need to be moved around at a later date it will not be a problem for the web service.  These databases will scale at different rates depending on the demands of the different projects.

We would like to start this system from the ground up with a master and read only replica.  There will be multiple systems writing to the master simultaneously and multiple client applications reading the data across several projects.

One of our goals with datazilla is to consolidate all of the data-centric test webservices/databases that the ateam is responsible for, to make better use of our resources and deliver applications faster.  We have a deadline of releasing this system by the end of Q2.  The existing production graph server and database will need to co-exist with it for a period of time; we will need at least ~2 months of data collection before we can fully transition to the new system.

We will need the new production VM database by the end of May.
 
VM Requirements
---------------
1.) 1 VM 12GB RAM, 1 TB disk, master database
    1 VM 12GB RAM, 1 TB disk, read only replica database
    1 VM 8GB RAM, 50GB disk, web services/applications and UI data cache

2.) The webservice needs to be accessible to talos workers to deposit data.  It will also serve the client applications which will be publicly accessible.
Looks like a devservices thing, but roping in a few people to co-ordinate. 

Amy,

Do we want this close to the releng BU? Since that's going to be the bulk of the work.

Sheeri/Matt,

Comment #0 has specs for MySQL databases, please sign off on those. We need to look at scalability as well, feel free to ask questions so we have an idea of what we need to get going here. Also, s/VMs/hardware (if needed).

Johnathan,

You'll probably need to file bugs with Security Assurance to help co-ordinate things, secreviews and the like, especially since this is a new service.
Assignee: server-ops → server-ops-devservices
Component: Server Operations → Server Operations: Developer Services
QA Contact: phong → shyam
Assignee: server-ops-devservices → shyam
Having the service primarily be in scl3 vs phx seems logical since the talos infrastructure will eventually be there (assuming that talos is the bulk of the data), but for right now it's all in scl1.  

My biggest concern is that this be HA so that we minimize tree closures due to graph failures, if it's a production requirement to make sure the data gets to these servers.  That would mean, I believe, multiple clustered web front ends, and multiple database servers that can handle being promoted to master.  Perhaps the same approach that was taken for the buildbot databases?
 
> My biggest concern is that this HA so that we minimize tree closures due to
> graphs failures if it's a production requirement to make sure the data gets
> to these servers.  So that would mean, I believe, multiple clustered web
> front ends, and multiple database servers that can handle being promoted to
> master.  Perhaps the same approach that was taken for the buildbot databases?

It will be a production requirement. Having some redundancy in the web front end and multiple database servers that can act as a master would be a great idea to reduce tree closures due to graph web service failures.
How much data will need to be in the "working set" of data for datazilla? It's hard to account for the specs of a server if we don't know how much data is being processed at one time (approx of course). For example, how much is about 2 months' worth of data, in the expanded 10-20x range? That seems to be the minimum required for production....

I want to make sure you have the appropriate database for your system, too, and with large amounts of data, you may want to use MyISAM or a third-party storage engine like TokuDB.
Whiteboard: [2012 q2 goal]
(In reply to Amy Rich [:arich] [:arr] from comment #2)

> My biggest concern is that this HA so that we minimize tree closures due to
> graphs failures if it's a production requirement to make sure the data gets
> to these servers.  So that would mean, I believe, multiple clustered web
> front ends, and multiple database servers that can handle being promoted to
> master.  Perhaps the same approach that was taken for the buildbot databases?

Sure. The application also needs to fail gracefully vs closing trees in case of an issue. With a combination of both, we should be able to prevent too much trouble.
(In reply to Sheeri Cabral [:sheeri] from comment #4)
> How much data will need to be in the "working set" of data for datazilla?
> It's hard to account for the specs of a server if we don't know how much
> data is being processed at one time (approx of course). For example, how
> much is about 2 months' worth of data, in the expanded 10-20x range? That
> seems to be the minimum required for production....
> 
> I want to make sure you have the appropriate database for your system, too,
> and with large amounts of data, you may want to use MyISAM or a third-party
> storage engine like TokuDB.

I was thinking of using InnoDB as the storage engine (mostly because that's what I have experience with); I don't think MyISAM will be appropriate for this task.  I don't have experience with TokuDB, but reading a bit about it, it looks like a great choice for this project.  Do you have experience implementing/administering it?

We just enabled all trees to deposit the full data set we will be collecting into our development database.  This was set up in a try/catch block so it will fail silently and not cause a tree closure if something goes wrong.  It was enabled last Friday.  The total size of the development database is now ~950MB using InnoDB as the storage engine.  Based on that, we're looking at an approximate growth rate of 1GB per 5 days.  I expect the real rate will be somewhat higher, since the rate of changesets pushed is probably greater on weekdays than on weekends.  I will have better numbers for the growth rate by the end of this week; at that point we will have a full representative data set covering a week of talos data for all branches.  This growth rate will fluctuate over time based on project activity on different branches.

We need to identify regressions using the talos data, so the ability to track changes over time is critical.  I think keeping at least 6 months of data would be a good idea.  For most analysis tasks 3 months will be enough but it would be useful to be able to look further back in time. 

Using 1GB/5 days, we would get ~40GB/6 months.  I think it would be a good idea to plan on having 2X this so maybe ~80GB/6 months. I will update these numbers by the end of the week.  
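As a quick back-of-the-envelope sketch of the projection above (assuming the observed ~1GB per 5 days holds; the ~40GB/~80GB figures in the text round these values up):

```python
# Back-of-the-envelope projection from the observed development growth rate.
GB_PER_DAY = 1 / 5          # ~1 GB every 5 days observed so far
SIX_MONTHS_DAYS = 182.5     # half of 365 days

six_month_size = GB_PER_DAY * SIX_MONTHS_DAYS   # raw 6-month projection
planned = 2 * six_month_size                    # 2X safety margin

print(round(six_month_size, 1))  # → 36.5
print(round(planned, 1))         # → 73.0
```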

I don't have any data for the other projects we would like to manage in this system, most of them seem to be much smaller rates of growth.  As I get quantitative info for these projects I will let you know.
Those numbers sound good. TokuDB is good for *tables* with over about 110 million rows, which is about when InnoDB falls apart. MyISAM is very good for analytics queries, because a lot of metadata is stored, but its full-table write-locking might make it inappropriate here.

TokuDB is also commercial, not free, but the pricing is very reasonable (something like $2000 per year per uncompressed 100 GB stored in TokuDB tables, and you can still use InnoDB/MyISAM/whatever tables alongside it). It also has great compression, so if we do use TokuDB the disk space usage wouldn't be as high.

I have in fact implemented and administered it in the past, and I think it's a good fit for this project too, based on what I know. This assumes the growth is concentrated in 1 or 2 tables: if there are 800 tables with 100MB each in them, TokuDB isn't appropriate, but 1 table with 80GB in it, or 4 tables with 20GB each, would make it a very good approach.
The datazilla project is now up in a repo that can be reached here https://github.com/jeads/datazilla .  The README has a pretty good description of what we're trying to do.  There is a section that describes the data model in detail https://github.com/jeads/datazilla#data-model and also an additional section that describes how we are planning to manage project data across multiple projects and databases https://github.com/jeads/datazilla#data-model-todo.  Any feedback or advice would be greatly appreciated.

The schema has 21 tables in total (https://github.com/jeads/datazilla/blob/master/model/sql/template_schema/).  After 15 days of data collection the database is 4.8 GB using InnoDB as the storage engine for all tables.

The size breakdown for the 4 largest tables is:
-------------------------------------------------------------
table            row count / 15 days    InnoDB ibd size / 15 days
-------------------------------------------------------------
test_run             183,366                 40 MB
test_data            183,583                736 MB
test_value        17,965,870                1.5 GB
test_aux_data     38,198,460                2.4 GB

At this rate of growth for one year of data we will have:

~120 GB/year

-------------------------------------------------------------
table            row count / 1 year     InnoDB ibd size / 1 year
-------------------------------------------------------------
test_run           4,461,906                 1 GB
test_data          4,467,186                18 GB
test_value       437,169,503               38 GB
test_aux_data    929,495,860               58 GB
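The one-year figures follow from scaling the 15-day row counts by 365/15 (a simplified linear extrapolation; actual push rates will fluctuate, as noted above):

```python
# Linear extrapolation of observed 15-day row counts to one year (365/15 ≈ 24.3x).
fifteen_day_counts = {
    "test_run": 183_366,
    "test_data": 183_583,
    "test_value": 17_965_870,
    "test_aux_data": 38_198_460,
}

one_year = {table: count * 365 // 15 for table, count in fifteen_day_counts.items()}

print(one_year["test_run"])       # → 4461906
print(one_year["test_aux_data"])  # → 929495860
```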

I think it would be wise to plan on at least 2X these quantities for the talos data.  Again I cannot speak for the other projects we will be managing with this system.

The table test_data holds the JSON blob that the talos bots send.  This blob is parsed and stored in relational form in the RDBMS schema.  We're storing the blob so we can reference it when the JSON is malformed, but the blob itself does not need to be in the RDBMS.  Also, we would like to separate the process of the talos workers loading data through the web service from deserialization and translation into the relational model.  That way we can better manage the load on the RDBMS and provide 100% uptime for the workers sending the JSON blobs.

Rhelmer mentioned that you also maintain an HBase cluster for socorro; I think that would be a good fit for this.  Would it be possible to store the JSON blobs in a key/value store somewhere?
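A minimal sketch of the decoupling described above. All names and structures here are illustrative, not the actual datazilla code, and in-memory lists stand in for the object store and the RDBMS: the web service only appends raw blobs, while a separate scheduled job deserializes them into relational rows, keeping malformed blobs around for later reference:

```python
import json

object_store = []     # stand-in for the key/value store of raw JSON blobs
relational_rows = []  # stand-in for the RDBMS tables

def ingest(raw_blob):
    """Web-service side: always succeeds, never touches the RDBMS."""
    object_store.append({"blob": raw_blob, "processed": False, "error": None})

def transfer():
    """Scheduled job: deserialize pending blobs into the relational model."""
    for entry in object_store:
        if entry["processed"]:
            continue
        try:
            data = json.loads(entry["blob"])
            relational_rows.append(data)  # real code would map fields to tables
        except ValueError as exc:
            entry["error"] = str(exc)     # keep the malformed blob for debugging
        entry["processed"] = True

ingest('{"test": "tp5", "value": 123}')
ingest('{not valid json')   # a malformed blob must not break ingestion
transfer()
print(len(relational_rows))  # → 1
```

Because ingest() never raises, a database outage or bad payload cannot close the tree; the transfer job can simply retry or skip.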
I think storing the JSON blobs separately is a good idea. A key/value store or a document store would work, since you might want to search on text.
I personally don't maintain HBase, maybe mpressman does? I'd have to talk to whoever does that to see if it's a good fit.
(In reply to Sheeri Cabral [:sheeri] from comment #10)
> I personally don't maintain HBase, maybe mpressman does? I'd have to talk to
> whoever does that to see if it's a good fit.

IT doesn't manage HBase IIRC. That's all metrics.
It does not need to be HBase.  Is there a key/value store that IT prefers to use?  This system would require 40-100GB/year.  It would be a critical system that would need as close to 100% uptime as possible.  Do you have any suggestions?  It would be great to have something to develop against at this point, even if we just install it ourselves on a VM in our dev environment...
We're trying to tie up the loose ends for what we're delivering in Q2.

Is there more information needed to proceed? 

Is there an estimated delivery date?

Does IT have a recommendation for a key/value object or document store we could target development to?
OK, so let's review the requirements:

1) Must be able to be HA with very little downtime (and the app will be designed to fail gracefully)
2) From comment 8, the total projected table size for 1 year is 1+18+38+58 = 115 GB, and you want to be safe and double it, so 230 GB.
3) Also from comment 8, after a year the tables will each have anywhere from 4.5 million rows to 930 million rows.

I'd recommend using Tokutek on MariaDB.

The pricing list is here: http://www.tokutek.com/wp-content/uploads/2012/04/tokudb-price-list-rev9.pdf 

Note that Tokutek is free for testing, and free for production use under 50 GB. We should plan to buy it, but we don't need to buy it in order to let you start using it.
(In reply to Sheeri Cabral [:sheeri] from comment #14)
> I'd recommend using Tokutek on MariaDB.
> 
> The pricing list is here:
> http://www.tokutek.com/wp-content/uploads/2012/04/tokudb-price-list-rev9.pdf 
> 
> Note that Tokutek is free for testing, and free for production use under 50
> Gb. We should plan to buy it, but we don't need to buy it in order to let
> you start using it.

OK, great, thanks!  We will set up Tokutek on MariaDB in our dev environment.

Do you have an estimated delivery date for the VMs?

Does IT have a recommendation for a key/value object or document store we could target development to?

This is for the JSON blobs we get from talos; they are currently stored in the RDBMS table test_data, which will be 20-40GB/year.  We would like to write the JSON structures to an object/document store and then pull them into the RDBMS, so we can control the load, avoid potential concurrency issues, and make database modifications without requiring downtime.

We would not need to search across the JSON data; something with key/value access and very little downtime would suffice.

We would prefer to choose something that IT is comfortable supporting.
Are you doing any indexing of the JSON metadata? If so, TokuDB would be a fit. Otherwise, I'd go with something tried and true, like Sphinx or Elasticsearch. TokuDB is not optimized for long blobs... but if you were splitting up the JSON and putting it into separate fields, TokuDB is perfect.
(In reply to Sheeri Cabral [:sheeri] from comment #16)
> Are you doing any indexing of the JSON metadata? If so, TokuDB would be a
> fit. Otherwise, I'd go with something tried and true, like Sphinx or Elastic
> Search. TokuDB is not optimized for long blobs....but if you were splitting
> up the JSON and putting it into separate fields, TokuDB is perfect.

The JSON metadata is going to be deserialized into TokuDB.  We just want to store the serialized JSON blobs separately so we can control the RDBMS load, avoid potential concurrency issues, and make database modifications without requiring downtime.

There is no need to index the JSON blobs in the document store.  Elasticsearch indexes the JSON structure for search, which we don't really need... I suppose we could use Elasticsearch anyway, or just use an additional MySQL database and store the JSON data in un-indexed blobs.  Is there any other "no-frills" document store that IT is comfortable supporting?  If not, do you have a preference/opinion between using Elasticsearch and another MySQL database?
If it's just un-indexed blobs, I'd go for a separate table using MyISAM (or, actually, Aria, because I'd recommend we use TokuDB on MariaDB, and Aria is crash-safe).
This is a Q2 goal, and it's almost the middle of June. Is there anything you need from the Systems or Database teams on this?
Depends on: 763807
We went ahead and set up MariaDB with the TokuDB storage engine for the data tables, and a separate object store using the Aria storage engine.  We have tested TokuDB with our development data and are in the process of testing Aria.

We need all of the items listed in this bug:

VM Requirements
---------------
1.) 1 VM 12GB RAM, 1 TB disk, master database
    1 VM 12GB RAM, 1 TB disk, read only replica database

* These VMs will run MariaDB using the TokuDB storage engine.  The database instances running on these VMs will have the schemas documented here: https://github.com/mozilla/datazilla/tree/master/datazilla/model/sql/template_schema.
  
2.) 1 VM 8GB RAM, 500GB disk, to be used as the object store

* This VM would host the object store, which will be MariaDB using the Aria storage engine.  It will also host the web service that talos and other projects post JSON data to.  Talos test runners on build machines will need to send test data to this web service.  It needs to provide as close to 100% uptime as possible.

* Either the VM hosting the master database or the VM hosting the object store will need to be running a scheduled job that transfers serialized JSON objects from the object store to the master database.
    
3.) 1 VM 8GB RAM, 50GB disk, web services/applications and UI data cache

* This VM would host the web application and would provide general web services to different clients using the data.  It will also serve the client applications which will be publicly accessible.

It's unclear to me from reading the comments who is responsible for what in IT.  If there is any more information anyone needs please feel free to contact me directly.
The large RAM and disk requirements make #1 and #2 poor candidates for virtualization. I think blades would work much better here.
(In reply to Dan Parsons [:lerxst] from comment #21)
> The large RAM and disk requirements make #1 and #2 poor candidates for
> virtualization. I think blades would work much better here.

The disk requirements are based on the growth rate observed in the production test environment and are reasonable.  The RAM requirements of the databases could probably be reduced to 8GB, but I would prefer not to go much lower; some of the investigative queries can be expensive.

The way the system architecture works, we could use multiple VMs with individual databases for different projects instead of one large machine, if that's easier.  The largest single project we need to host data for is currently talos; we are expecting 120-200GB/year, and that would need to reside in a single database.

Blades would be great; if that's not doable we could split projects up across more VMs to compensate.
I agree with Dan here. With such massive hardware requirements, we're not going to put this up as VMs.

Because of the large disk requirements, this can't be put on Seamicro Xeons either and I don't think we want to run DBs off NFS. Sheeri?

That means we have to probably order HDDs and slot them into blades for this.
As this is a time-sensitive goal, I'd really love to get something running for you soon, and ordering hardware will take a while. I think you should still definitely do that, but in the meantime, would a single 8GB RAM, 250GB disk VM satisfy #1 and #2 on a temporary basis?

(we don't even have 1TB total free in phx1 NetApp right now)
(In reply to Dan Parsons [:lerxst] from comment #24)

> (we don't even have 1TB total free in phx1 NetApp right now)

I was thinking scl3 for this.
(In reply to Dan Parsons [:lerxst] from comment #24)
> As this is a time-sensitive goal, I'd really love to get something running
> for you soon, and ordering hardware will take a while. I think you should
> still definitely do that, but in the meantime, would a single 8GB RAM, 250GB
> disk VM satisfy #1 and #2 on a temporary basis?
> 
> (we don't even have 1TB total free in phx1 NetApp right now)

That would satisfy #1 and #2 for now, and we would really prefer getting something into production as soon as possible, especially if it's going to take months to get hardware set up.

Could we split #1 and #2 into two VMs for the master database and object store?

1 VM 6GB RAM, 150GB disk, master database
1 VM 2GB RAM, 100GB disk, object store and web service for data ingestion

and one VM for the UI?
1 VM 4GB RAM, 25GB disk, web service, memcached, and UI

The VM used for the object store would need to communicate with Talos test runners on build machines that will be sending test data.  We are already doing this in our development environment on s4n1.qa.phx1.mozilla.com on MPT.  The VM hosting the UI/web service will need to be publicly accessible.

If it helps we could re-purpose two or three of the development VMs (s4n*.qa.phx1) we're currently using.
It's not months if we can get the hard drives. We already have a *lot* of blades. I've asked folks about the hard disks, so please hold on till I can confirm if we can or cannot stand this up on hardware within a week.
So here's the lead time on hard drives (the market is still affected by the floods in Thailand last year)  :

1) 1TB drives - 3 weeks
2) 900GB drives - 3 weeks
3) 600GB drives - right away.

If we can live with 600GB it can happen in a short time span. If it has to be a TB, it's going to take 3+ weeks for the drives to arrive.

I'm not in favour of going with VMware if that's not the final home for the project. The effort it takes to migrate off it may not be trivial.

Where are you guys with secreviews and the like? Has this service been reviewed and ready for production?
Had a little chat with :jeads on IRC, we're going to go with the 1TB drives. I'll file a bug to get them ordered.
Replying to comment 23 - MySQL does NOT play nicely with NFS.
Changing title, this is now a meta bug.
Summary: New Graphserver/Datazilla VMs → Setup Datazilla in Production
Depends on: 764278
Depends on: 764281
Whiteboard: [2012 q2 goal] → [2012 q2 goal] Waiting on HDDs
:jeads, the webservice for datazilla, can you give me some more information about that? 

I'm wondering if we can just run it on our generic cluster (for more reliability) vs. having a one-off box for it. Off the top, it looks like a standard Django app.
Depends on: 764352
Hi Shyam,

(In reply to Shyam Mani [:fox2mike] from comment #32)
> :jeads, the webservice for datazilla, can you give me some more information
> about that? 
> 
> I'm wondering if we can just run it on our generic cluster (for more
> reliability) vs having a one off box for it. Off the top it looks like a
> standard django app.

Jeads is on PTO until mid-next-week and asked me to keep up with any questions on this bug and related bugs.

The webservice is indeed a fairly standard Django app; the unusual requirements are on the database side. So I don't see any problem with running the webservice on the generic cluster, as long as it can talk to all the databases, of course, and has memcached available.

We will have several scripts / Django management commands that will need to be run as cron jobs; I assume this won't be a problem. (If it's better for IT we could look at turning those into Celery tasks instead, but at this point for our needs cron jobs are simpler.)
(In reply to Carl Meyer (:carljm) from comment #33)
> Hi Shyam,

Hey Carl,

> Jeads is on PTO until mid-next-week and asked me to keep up with any
> questions on this bug and related bugs.

Cool.

> The webservice is indeed a fairly standard Django app, the unusual
> requirements are on the database side. So I don't see any problem with
> running the webservice on the generic cluster, as long as it can talk to all
> the databases of course, and has memcached available.

Sure. All this will be available.

> We will have several scripts / Django management commands that will need to
> be run as cron jobs; I assume this won't be a problem. (If it's better for
> IT we could look at turning those into Celery tasks instead, but at this
> point for our needs cron jobs are simpler.)

Cron is fine, no problem at all. I'd rather have this on the generic cluster, backed by 5 machines, than on its own standalone webhead.

Thanks!
Depends on: 767862
Depends on: 767863
Status update :

We have the 1TB drives, well ahead of schedule (yay). They've been installed in the spare blades that have been pulled from storage. We're going to kickstart the servers (bugs have been filed) and then stand up the DB servers and the app server.
Yay! That's great news!
(In reply to Jonathan Eads ( :jeads ) from comment #36)
> Yay! That's great news!

Yup. Steps moving forward :

1) We setup the DBs. This will be done by Sheeri. She's at a conference this week, so I'm not sure about her availability. I'll let her comment on the timelines.

2) We setup the webapp. Can you point me to some docs for this? Or is this just a standard django/playdoh app? 

3) Once we're past the secreview, we open up the app setup in 2 above, to the world

and

4) Profit/Party etc :)
(In reply to Shyam Mani [:fox2mike] from comment #37)
> 2) We setup the webapp. Can you point me to some docs for this? Or is this
> just a standard django/playdoh app? 

It's not playdoh, but it's pretty typical Django: it has the pure-Python dependencies vendored, uses local_settings, etc., so there shouldn't be anything unusual for you in it. Instructions are at http://datazilla.readthedocs.org/en/latest/installation/
Whiteboard: [2012 q2 goal] Waiting on HDDs → [2012 q2 goal]
As for setting up TokuDB, I registered today, but the sales rep said they will send me a link tomorrow.
Whiteboard: [2012 q2 goal] → [2012q3]
Downloading TokuDB 6.1, which is GA and supports MariaDB 5.5.24.
Copied the MariaDB Tokutek binary to natasha, but I'm not sure which VM to install the db to... I don't see a machine name in this ticket (but then again it's got 40 comments on it, and I'm not always awesome at finding those things).
(In reply to Sheeri Cabral [:sheeri] from comment #41)
> Copied the mariadb tokutek binary to natasha, but I'm not sure where the VM
> is to install the db to.....I don't see a machine name in this ticket (but
> then again it's got 40 comments on it and I'm not always awesome finding
> those things.

bug 764281 comment #2 will make it easier for you :)
Got it! Thanx.
I will leave this to Shyam to close but this work is done.

M
Just verified that we have DB backups and Monitoring in place, going to call this fixed.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: Server Operations: Developer Services → General
Product: mozilla.org → Developer Services