Closed Bug 862898 Opened 11 years ago Closed 11 years ago

start using Anemometer

Categories

(Data & BI Services Team :: DB: MySQL, task, P3)

x86
macOS

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: scabral, Assigned: scabral)

References

Details

(Whiteboard: [2013q4] November)

for reals. https://github.com/box/Anemometer Will help a bunch
Whiteboard: [2013q3] July → [2013q3] August
Priority: -- → P3
Fascinating, in reading the README, it looks like Anemometer reads query_review and query_review_history tables created by pt-query-digest. So getting this into place is going to be interesting, to say the least.

Not only would we need a webserver with PHP, but we'd need to copy all the logs there.....

at least...for MySQL 5.5 and below. For MySQL 5.6, it looks like there is likely support for Anemometer using performance_schema:

http://www.verious.com/tutorial/feature-preview-anemometer-and-my-sql-5-6/

I unpacked the "master" zip file I got from https://github.com/box/Anemometer before realizing that
Whiteboard: [2013q3] August → [2013q3] September
...before realizing that this will probably need webops' help (just a little).

I'm happy to repurpose the scalebase boxes to have this functionality, one per DC.
Whiteboard: [2013q3] September → [2013q3] September webops
webops is OK, having secops look at it. :ulfr
Whiteboard: [2013q3] September webops → [2013q3] September
I took a quick look at the code, and this will need a code review. Flagging for triage.

From a platform standpoint, does the Anemometer server need network access to all the target databases?
Flags: sec-review?
accessing directly is the holy grail as that will give the most information. If it's deemed that that absolutely can't happen, we could set up copying slow query log files over, but that does not give a real-time picture.
A couple questions:
* Is the direct access read-only?
* Does it give access to actually data, or just stats?
* How many installations do we need ? One to rule all the databases, one per datacenter?

I'm OK to target for direct access, and look at what that implies from a security standpoint. The code review will help too.
Answers:


* Is the direct access read-only?
 - yes, and it does not even need to read user databases, just a system database called "performance_schema". We can make users that can only read the system database and have no read access to any of the user/application data.

* Does it give access to actually data, or just stats?
 - It gives access to the equivalent of the general query / slow query log.


* How many installations do we need ? One to rule all the databases, one per datacenter?
 - One per datacenter would be easiest, we wouldn't need to open a lot of additional netflows. If we had one to rule all the databases, we would have to ask for a lot of cross-DC netflows to the dbs that aren't in the same data center. We have the machines to do one per datacenter already, so that's fine by us.
Sounds good. Do you have a test installation I can poke at?
Not yet. I can set it up with access to a non-critical db this week.
Flags: sec-review? → sec-review?(amuntner)
Assignee: server-ops-database → scabral
Added the webapp::base package to the puppet manifest for scalebase1 in scl3. This should give us the basics, including apache.
Whiteboard: [2013q3] September → [2013q4] October
Whiteboard: [2013q4] October → [2013q4] November
The webserver is up and running, but I can't access http://scalebase1.db.scl3.mozilla.com/index.html - I'm getting a 403 forbidden error.
11:38 < solarce> sheeri: the way our apache configs from webapp:: work is that 
                 you have to put a virtualhost config in 
                 /etc/httpd/mozilla/domains/foo.conf with an explicit 
                 ServerName or ServerAlias
11:38 < solarce> sheeri: otherwise it gives you a 403
cturra got things working; /data/www/anemometer.
It was working enough that I could see the group by feature wasn't working (which means any of the count/sum stuff didn't work properly. I also found a bug in the mysql-save_history script which I submitted (calling EXECUTE @stmt instead of EXECUTE stmt).
unfortunately puppet is killing configs in some way, so anemometer isn't accessible...
This acts as a graphical user interface to the events_statements_summary_by_digest table. It's working, but there seem to be several bugs:

GROUP BY is never honored, so if you use any of the aggregates you only get one row.

clicking on any of the checksum links gives a page with the anemometer header with menu items, and then the rest of it is blank. cf http://scalebase1.db.scl3.mozilla.com/index.php?action=show_query&datasource=dev1_scl3&checksum=6985dd8cd9a8da2ed04824cf0f3f8162
Also made 
https://github.com/box/Anemometer/issues/96

Fixed some of the issues - GROUP BY is honored when you use history tables, and I have set up a MySQL event to run every 5 minutes on dev1 (should propagate properly to dev2).

This is a great tool so far, though I think there could be tons of improvements made. I might slap it on sumotools and/or sumo and have r1cky take a look at it as a first pass.

So far, it's installed, and working. That's pretty good!

I still have to puppetize the config - I can throw this against all our servers, but the config is very redundant, so we can templatize it, like we did with the newrelic stuff.
Had to yum install bcmath for the digest page to work. Had to add date.timezone='UTC' in php.ini to get rid of warnings when using the date() function.

Checksum conversion from hex to dec in "Anemometer.php" line 641 is wrong - firstly, it's using capital letters, whereas MySQL does not, so I was getting that it didn't know what "a" was. Once I did that, calculations were still wrong, e.g. 

checksum hex - ebd4f01d29940465bd2fad27544946b5 
actual decimal - 105866025350050960000
decimal calculated by anemomoeter - 313474216032254030094408162781080143541

I changed all instances of 
        $checksum = $this->translate_checksum(get_var('checksum'));
to
        $checksum = get_var('checksum');
in anemometer.php

And the digest page now works!
Well, Anemometer is working, but we can't do more until the group by issues are fixed. :(

I'm going to call this resolved, as it's up and running. It's not particularly useful as of yet, though.

See bug 950721 for the placeholder for when the issues are fixed.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Data & BI Services Team
Flags: sec-review?(amuntner) → sec-review?
You need to log in before you can comment on or make changes to this bug.