Closed Bug 1279580 Opened 4 years ago Closed 4 years ago

Explore using ReDash + Treeherder to analyze job + performance data in automation

Categories

(Tree Management :: Treeherder, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: wlach, Unassigned)

References

Details

(Whiteboard: [metrics])

It seems as if people have had good luck using Re:Dash for analyzing telemetry data at https://sql.telemetry.mozilla.org/

I would encourage anyone not familiar with re:dash to check out the above link: it seems like a very powerful tool. In particular, you can automatically generate charts and dashboards by expressing queries in a very intuitive SQL syntax.

In any case, I'd like to explore using it to analyze some things about our automated testing that we care about, that are currently stored in Treeherder:

* Aggregated end-to-end times of pushes (developer productivity initiative)
* Higher-level dashboards of key perfherder metrics
* Perfherder alert resolution rates
* ... (probably lots of other stuff I haven't thought of)

There is also the intriguing possibility of comparing our automation data against telemetry data if we're using the same system. Not sure if that makes sense, but just throwing the idea out there.

So I think the first step, from what I understand from :rvitillo, is just to import the treeherder database. Treeherder data is currently stored in a set of mysql databases in Amazon RDS (well, actually the production version is in SCL3, but that's soon to change, so just assume RDS). It's about 200 Gigs big (which stays fairly constant, as we only store a 4-month window worth of data)

Treeherder's database architecture is currently a bit misdesigned, in that much of the data it stores is distributed across a bunch of sub-databases. I'm in the process of rectifying this in bug 1178641, which will move all the data to a single database (I'm hoping to wrap up by early next quarter). I think we could start doing the migration with the database as it exists (all the performance data, for example, is already there) -- I assume re:dash is capable of coping with schema changes?

robotblake, rvitillo: how much work would it be to suck a mysql database into the existing redash instance?
(In reply to William Lachance (:wlach) from comment #0)
> robotblake, rvitillo: how much work would it be to suck a mysql database
> into the existing redash instance?

Should be pretty easy. Blake can help you with that.
Flags: needinfo?(bimsland)
We'll need to add the external IP of the re:dash server to the ingress rules on the RDS instance(s), and probably would be worth it to add a read-only user to the databases for re:dash to use, but other than that all I'll need is the connection info.  In regards to schema changes, re:dash just connects and runs the queries directly so you might break existing saved queries (and their associated graphs) but that's it.

tl;dr: Not much work at all.
Flags: needinfo?(bimsland) → needinfo?(wlachance)
Adding hcrince who is looking into this class of tools.  Mozilla is already using Tableau which is very similar.
There is currently about 18million jobs (320million individual steps) in ActiveData, which could be poured into the Telemetry system.  ActiveData's ETL pipeline parses the Mozharness steps into records and could be redirected to Telemetry's system.  We can add the TreeHerder data while we do this.  Re:dash can then pull from there.

The intent would be to let Telemetry's system deal with the query load, and be responsible for long term archiving of TreeHerder data.
:ekyle moving the ActiveData dataset to telemetry sounds a bit ortogonal to this bug, could you please open a new one to explore/shape that idea? Feel free to add me to the CC list if you do so.
Let's wait until the Heroku migration is complete to look at this.
Flags: needinfo?(wlachance)
Is there a bug to track the Heroku migration?
Whiteboard: [metrics] Wait until the Heroku migration is complete
(In reply to Armen Zambrano [:armenzg] - Engineering productivity from comment #7)
> Is there a bug to track the Heroku migration?

There is, yeah (bug 1176484). I'll set it as a dep.
Depends on: treeherder-heroku
wlach: should we remove this bug from the list of dependencies of bug 1278680 (try-end-2-end project)? I believe we won't be tackling it this quarter.
(In reply to Armen Zambrano [:armenzg] - Engineering productivity from comment #9)
> wlach: should we remove this bug from the list of dependencies of bug
> 1278680 (try-end-2-end project)? I believe we won't be tackling it this
> quarter.

So I learned last week that using redash won't work well for the use case we had in mind for that project (analyzing job times) until bug 1178641 is fixed and we have all the data in a single database. I think it probably does make sense to remove this as a dependency for that project, at least for now: I've resorted to doing manual mysql queries for my analysis needs.
Thanks Will for adding all the context.
Dropping dependency relationship.
No longer blocks: thunder-try
Depends on: 1311468
I think we are pretty close to where we can start experimenting with this. I filed bug 1311468 to create a readonly mirror of the treeherder database we should be able to use as a data source here.
I hope this extract is just a diff-since-last-extract[1], and in some popular format[2] so we do not need to spin up a database to access the records.

[1] I do not mean a fine grained diff; just a list of all records that have had some value change since last time; hopefully it will include all the related records (details, notes and stars) for completness.
[2] like JSON!!  :)
No longer depends on: treeherder-heroku
Ok, time to give this another try! We now have a readonly instance of the treeherder database, and it contains a lot of useful data (and will have all the data in a week or two). Blake, are you still the right person to talk to about this?
Flags: needinfo?(bimsland)
(In reply to William Lachance (:wlach) from comment #14)
> Ok, time to give this another try! We now have a readonly instance of the
> treeherder database, and it contains a lot of useful data (and will have all
> the data in a week or two). Blake, are you still the right person to talk to
> about this?

Blake: ping again. Could you set this up for us? Basically all we want is the new treeherder database set up as a new datasource for sql.telemetry.mozilla.org. Earlier you said:

"We'll need to add the external IP of the re:dash server to the ingress rules on the RDS instance(s), and probably would be worth it to add a read-only user to the databases for re:dash to use, but other than that all I'll need is the connection info.  In regards to schema changes, re:dash just connects and runs the queries directly so you might break existing saved queries (and their associated graphs) but that's it."

Could you find some time to do this soon? It sounds like it should only take a minute or so and I was hoping to have this set up this week (before Hawaii). Ping me on irc (I'm :wlach) for credentials to the read-only mirror.
This should be all working now!
Flags: needinfo?(bimsland)
This probably happened off-bug, but just to confirm (it was part of our RRA for allowing people access):
* what client library and version are you using to access the MySQL instance?
* if it uses libmysqlclient, what version is installed?

It's just I found this commit:
https://github.com/robotblake/redash-tweaks/commit/421c93d15d98e212db10d253f1a06956c7f9777f
(unrelated, but this isn't treeherder's CA cert, but Amazon RDS's combined CA cert)

Which makes me thing redash is using a slightly forked version of the redash Docker file, which is using Ubuntu trusty's insecure libmysqlclient:
https://github.com/getredash/redash/blob/196177021c6d0b0ccd50ecbb19ca6fef1ca7160a/Dockerfile#L9

Along with the not-maintained MySQL-python (though even the newer fork https://github.com/PyMySQL/mysqlclient-python is also vulnerable): 
https://github.com/getredash/redash/blob/196177021c6d0b0ccd50ecbb19ca6fef1ca7160a/requirements_all_ds.txt#L5

I'm hoping this was discussed before the credentials were handed over (given bug 1311468 comment 36), and that Mozilla's redash instance is actually using a different setup from above?
Flags: needinfo?(wlachance)
Flags: needinfo?(bimsland)
(We may end up deciding this is an acceptable risk, however this should be a concious, joint decision and not something we incur without realising)
Let's discuss this in a confidential bug: filed bug 1320996.

(In reply to Ed Morley [:emorley] from comment #17)
> This probably happened off-bug, but just to confirm (it was part of our RRA
> for allowing people access):
> * what client library and version are you using to access the MySQL instance?
> * if it uses libmysqlclient, what version is installed?
> ...
Flags: needinfo?(wlachance)
Flags: needinfo?(bimsland)
Depends on: 1321807
Whiteboard: [metrics] Wait until the Heroku migration is complete → [metrics]
Let's mark this as fixed.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Blocks: 1335172
You need to log in before you can comment on or make changes to this bug.