Last Comment Bug 1284481 - script to extract "active" data from balrog
: script to extract "active" data from balrog
Status: RESOLVED FIXED
[lang=python][good first bug]
:
Product: Release Engineering
Classification: Other
Component: Balrog: Backend (show other bugs)
: unspecified
: Unspecified Unspecified
-- normal (vote)
: ---
Assigned To: Ninad Bhat[:ninad101]
: Ben Hearsum (:bhearsum)
:
Mentors: Ben Hearsum (:bhearsum)
Depends on: 1311036
Blocks:
  Show dependency treegraph
 
Reported: 2016-07-05 06:10 PDT by Ben Hearsum (:bhearsum)
Modified: 2016-10-18 07:55 PDT (History)
5 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---


Attachments

Description User image Ben Hearsum (:bhearsum) 2016-07-05 06:10:09 PDT
Ben, Nick, and I recently talked about ways to enhance testing in Balrog's stage environment. One of the things that came up is having up-to-date data. Because Balrog's history tables are so huge, it's not practical to import most of that data, but we came up with the idea of using "active" data instead. The smallest set of active data would be:
* The entire rules & permissions table
* Any releases that rules.mapping or rules.whitelist reference

I think it's also important to include some history data to make sure that UI and features are easily testable, so perhaps also include:
* All rules and permissions history (because it's trivial in size)
* History of the release pointed to in the rules.mapping column of the main Firefox nightly rule, capped at a certain number of entries (maybe 500-ish)?
* History of the release pointed to by the main Firefox release rule.
Comment 1 User image Pranaydeep Singh [:theeviltwin] 2016-08-24 23:52:59 PDT
Hi, I would like to work on this bug. Could I get some background information, cause I don't really understand what's going on here?
Comment 2 User image Ben Hearsum (:bhearsum) 2016-08-25 05:24:57 PDT
(In reply to Pranaydeep from comment #1)
> Hi, I would like to work on this bug. Could I get some background
> information, cause I don't really understand what's going on here?

Hi there, thanks for your interest! This bug will mostly revolve around writing some Python and SQL to selectively extract some data from Balrog's database (based on the requirements in the description). Do you think you're comfortable enough with Python and SQL to take a look?
Comment 3 User image Pranaydeep Singh [:theeviltwin] 2016-08-25 06:06:40 PDT
Yes, I would love to work on this bug. But I am not really familiar with Balrog and what is being done here. Just needed a little bit of background info to get started.
Sorry to waste your time, but this is just my 4th or 5th bug and I am still not completely familiar with how things work here.
Comment 4 User image Ben Hearsum (:bhearsum) 2016-08-25 08:31:37 PDT
(In reply to Pranaydeep from comment #3)
> Yes, I would love to work on this bug. But I am not really familiar with
> Balrog and what is being done here. Just needed a little bit of background
> info to get started.
> Sorry to waste your time, but this is just my 4th or 5th bug and I am still
> not completely familiar with how things work here.

Not at all, I'm happy to help you get up to speed.

So, Balrog is our update server. It knows how to map client requests such as https://aus5.mozilla.org/update/3/Firefox/33.0/20141202185629/Darwin_x86_64-gcc3-u-i386-x86_64/en-US/release/default/default/default/update.xml, to responses such as:
<updates>
  <update type="minor" displayVersion="48.0.2" appVersion="48.0.2" platformVersion="48.0.2" buildID="20160823121617" detailsURL="https://www.mozilla.org/en-US/firefox/48.0.2/releasenotes/">
    <patch type="complete" URL="http://download.mozilla.org/?product=firefox-48.0.2-complete&os=osx&lang=en-US" hashFunction="sha512" hashValue="4d3ec75ef848ab194313859119f085d03eb59c29ea8ad5b7aecfdd070fbf22bdf1419840d0aeb8b91bfc8eea802038d642b1d293d6d81ce4435dd1586a87b488" size="83461450"/>
  </update>
</updates>

You can read more about Balrog here: https://wiki.mozilla.org/Balrog, and I'm happy to answer any questions you have.

This specific bug is about making it easier to create a test or staging database for Balrog that is based on production data. You will need to write some SQL or use the SQLAlchemy wrappers to select the following from database:
* Every row from the "rules", "rules_history", "rules_scheduled_changes", "rules_scheduled_changes_history", "permissions", "permissions_history", and "migrate_version" tables.
* Any row from the "releases" table that a rules.mapping, rules.whitelist, rules_scheduled_changes.mapping, or rules_scheduled_changes.whitelist references.
* Any row from the "releases_history" table whose "name" column is "Firefox-mozilla-central-nightly-latest", to a maximum of 500 rows.
* Any row from the "releases_history" table whose "name" column is equal to the rules.mapping value of the rule whose "alias" column is set to "firefox-release". For example, there should be one rules row with an alias set to "firefox-release". If that row's mapping is "Firefox-48.0-build", every row from "releases_history" with that "name" should be selected.

This can be implemented in a new script in https://github.com/mozilla/balrog/tree/master/scripts. If you need an example of how to run some queries in the database, have a look at https://github.com/mozilla/balrog/blob/master/scripts/manage-db.py.

I realize this might be a little overwhelming, so don't hesitate to ask more questions! Feel free to come chat at irc://irc.mozilla.org/#balrog if you like, too.
Comment 5 User image Pranaydeep Singh [:theeviltwin] 2016-08-25 09:32:26 PDT
Wow, that's a lot to read. Give me a while to digest this and I'll get back to you. Thanks for the guidance
Comment 6 User image Ben Hearsum (:bhearsum) 2016-09-21 07:51:33 PDT
Ninad started looking into this recently. Something that we realized while he was working on it was that there's no way to get INSERT statements out of the plain mysql client. This means that the most likely path forward here is using mysqldump. This means that manage-db.py will need to shell out (ideally with subprocess.check_call), and run mysqldump.

Annoyingly, mysqldump doesn't understand mysql:///... style URLs, so manage-db will probably need to parse the dburi it receives to extract the username, password, host, and database.

Another tricky part of this is that AFAICT, mysqldump doesn't allow you to do JOINs, you may only specify tables and WHERE clauses within them. This could make it more difficult to get some of the history rows.
Comment 7 User image [github robot] 2016-09-30 06:02:34 PDT
Commit pushed to master at https://github.com/mozilla/balrog

https://github.com/mozilla/balrog/commit/10634f5993a37fe9c719fd70377d8a4f92f18ba0
Bug 1284481: script to extract "active" data from balrog (#132). r=bhearsum
Comment 8 User image Ben Hearsum (:bhearsum) 2016-09-30 06:52:59 PDT
I updated the checked-in sample data shortly after landing this, which uncovered some additional issues, mostly around the test we have that validates the sample data by importing it into an sqlite database. I was able to work around some of the issues by setting --skip-add-locks --skip-add-drop-table --no-create-info  in all the mysqldump commands, but it looks like there's an issue with the escaping of single quotes in some of the rules, and TestSampleData returns errors like:
TransactionError: ('(OperationalError) near "on": syntax error',)

Presumably because "on" is immediately preceded by things like: OS\'

I've left this script checked in, but reverted the sample data.
Comment 9 User image [github robot] 2016-10-05 06:36:29 PDT
Commit pushed to master at https://github.com/mozilla/balrog

https://github.com/mozilla/balrog/commit/94d1cdea726e1397bf5e17495b2a1bc19895cc0d
bug 1284481: script to extract "active" data from balrog (#139). r=bhearsum
Comment 10 User image Ben Hearsum (:bhearsum) 2016-10-05 07:19:00 PDT
(In reply to Ben Hearsum (:bhearsum) from comment #8)
> I updated the checked-in sample data shortly after landing this, which
> uncovered some additional issues, mostly around the test we have that
> validates the sample data by importing it into an sqlite database. I was
> able to work around some of the issues by setting --skip-add-locks
> --skip-add-drop-table --no-create-info  in all the mysqldump commands, but
> it looks like there's an issue with the escaping of single quotes in some of
> the rules, and TestSampleData returns errors like:
> TransactionError: ('(OperationalError) near "on": syntax error',)
> 
> Presumably because "on" is immediately preceded by things like: OS\'
> 
> I've left this script checked in, but reverted the sample data.

Ninad tried quite a few things to fix this test, but in the end we ended up commenting it out. Now that the dumps are generated from a script instead of by a human, it's much less likely that errors will be made. Additionally, I think that having up-to-date dumps outweighs the benefit of having this test.

Thanks for all your hard work here Ninad!

Note You need to log in before you can comment on or make changes to this bug.