Note: There are a few cases of duplicates in user autocompletion which are being worked on.

script to extract "active" data from balrog

RESOLVED FIXED

Status

Release Engineering
Balrog: Backend
RESOLVED FIXED
a year ago
9 months ago

People

(Reporter: bhearsum, Assigned: ninad101, Mentored)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [lang=python][good first bug])

(Reporter)

Description

a year ago
Ben, Nick, and I recently talked about ways to enhance testing in Balrog's stage environment. One of the things that came up is having up-to-date data. Because Balrog's history tables are so huge, it's not practical to import most of that data, but we came up with the idea of using "active" data instead. The smallest set of active data would be:
* The entire rules & permissions table
* Any releases that rules.mapping or rules.whitelist reference

I think it's also important to include some history data to make sure that UI and features are easily testable, so perhaps also include:
* All rules and permissions history (because it's trivial in size)
* History of the release pointed to in the rules.mapping column of the main Firefox nightly rule, capped at a certain number of entries (maybe 500-ish)?
* History of the release pointed to by the main Firefox release rule.
(Reporter)

Updated

11 months ago
Whiteboard: [lang=python][good first bug]
Hi, I would like to work on this bug. Could I get some background information, cause I don't really understand what's going on here?
Flags: needinfo?(bhearsum)
(Reporter)

Comment 2

11 months ago
(In reply to Pranaydeep from comment #1)
> Hi, I would like to work on this bug. Could I get some background
> information, cause I don't really understand what's going on here?

Hi there, thanks for your interest! This bug will mostly revolve around writing some Python and SQL to selectively extract some data from Balrog's database (based on the requirements in the description). Do you think you're comfortable enough with Python and SQL to take a look?
Flags: needinfo?(bhearsum) → needinfo?(theeviltwin)
Yes, I would love to work on this bug. But I am not really familiar with Balrog and what is being done here. Just needed a little bit of background info to get started.
Sorry to waste your time, but this is just my 4th or 5th bug and I am still not completely familiar with how things work here.
Flags: needinfo?(theeviltwin) → needinfo?(bhearsum)
(Reporter)

Comment 4

11 months ago
(In reply to Pranaydeep from comment #3)
> Yes, I would love to work on this bug. But I am not really familiar with
> Balrog and what is being done here. Just needed a little bit of background
> info to get started.
> Sorry to waste your time, but this is just my 4th or 5th bug and I am still
> not completely familiar with how things work here.

Not at all, I'm happy to help you get up to speed.

So, Balrog is our update server. It knows how to map client requests such as https://aus5.mozilla.org/update/3/Firefox/33.0/20141202185629/Darwin_x86_64-gcc3-u-i386-x86_64/en-US/release/default/default/default/update.xml, to responses such as:
<updates>
  <update type="minor" displayVersion="48.0.2" appVersion="48.0.2" platformVersion="48.0.2" buildID="20160823121617" detailsURL="https://www.mozilla.org/en-US/firefox/48.0.2/releasenotes/">
    <patch type="complete" URL="http://download.mozilla.org/?product=firefox-48.0.2-complete&os=osx&lang=en-US" hashFunction="sha512" hashValue="4d3ec75ef848ab194313859119f085d03eb59c29ea8ad5b7aecfdd070fbf22bdf1419840d0aeb8b91bfc8eea802038d642b1d293d6d81ce4435dd1586a87b488" size="83461450"/>
  </update>
</updates>

You can read more about Balrog here: https://wiki.mozilla.org/Balrog, and I'm happy to answer any questions you have.

This specific bug is about making it easier to create a test or staging database for Balrog that is based on production data. You will need to write some SQL or use the SQLAlchemy wrappers to select the following from database:
* Every row from the "rules", "rules_history", "rules_scheduled_changes", "rules_scheduled_changes_history", "permissions", "permissions_history", and "migrate_version" tables.
* Any row from the "releases" table that a rules.mapping, rules.whitelist, rules_scheduled_changes.mapping, or rules_scheduled_changes.whitelist references.
* Any row from the "releases_history" table whose "name" column is "Firefox-mozilla-central-nightly-latest", to a maximum of 500 rows.
* Any row from the "releases_history" table whose "name" column is equal to the rules.mapping value of the rule whose "alias" column is set to "firefox-release". For example, there should be one rules row with an alias set to "firefox-release". If that row's mapping is "Firefox-48.0-build", every row from "releases_history" with that "name" should be selected.

This can be implemented in a new script in https://github.com/mozilla/balrog/tree/master/scripts. If you need an example of how to run some queries in the database, have a look at https://github.com/mozilla/balrog/blob/master/scripts/manage-db.py.

I realize this might be a little overwhelming, so don't hesitate to ask more questions! Feel free to come chat at irc://irc.mozilla.org/#balrog if you like, too.
Flags: needinfo?(bhearsum)
Wow, that's a lot to read. Give me a while to digest this and I'll get back to you. Thanks for the guidance
(Reporter)

Comment 6

10 months ago
Ninad started looking into this recently. Something that we realized while he was working on it was that there's no way to get INSERT statements out of the plain mysql client. This means that the most likely path forward here is using mysqldump. This means that manage-db.py will need to shell out (ideally with subprocess.check_call), and run mysqldump.

Annoyingly, mysqldump doesn't understand mysql:///... style URLs, so manage-db will probably need to parse the dburi it receives to extract the username, password, host, and database.

Another tricky part of this is that AFAICT, mysqldump doesn't allow you to do JOINs, you may only specify tables and WHERE clauses within them. This could make it more difficult to get some of the history rows.
(Reporter)

Updated

10 months ago
Assignee: nobody → bhat.ninadmb

Comment 7

10 months ago
Commit pushed to master at https://github.com/mozilla/balrog

https://github.com/mozilla/balrog/commit/10634f5993a37fe9c719fd70377d8a4f92f18ba0
Bug 1284481: script to extract "active" data from balrog (#132). r=bhearsum
(Reporter)

Comment 8

10 months ago
I updated the checked-in sample data shortly after landing this, which uncovered some additional issues, mostly around the test we have that validates the sample data by importing it into an sqlite database. I was able to work around some of the issues by setting --skip-add-locks --skip-add-drop-table --no-create-info  in all the mysqldump commands, but it looks like there's an issue with the escaping of single quotes in some of the rules, and TestSampleData returns errors like:
TransactionError: ('(OperationalError) near "on": syntax error',)

Presumably because "on" is immediately preceded by things like: OS\'

I've left this script checked in, but reverted the sample data.

Comment 9

10 months ago
Commit pushed to master at https://github.com/mozilla/balrog

https://github.com/mozilla/balrog/commit/94d1cdea726e1397bf5e17495b2a1bc19895cc0d
bug 1284481: script to extract "active" data from balrog (#139). r=bhearsum
(Reporter)

Comment 10

10 months ago
(In reply to Ben Hearsum (:bhearsum) from comment #8)
> I updated the checked-in sample data shortly after landing this, which
> uncovered some additional issues, mostly around the test we have that
> validates the sample data by importing it into an sqlite database. I was
> able to work around some of the issues by setting --skip-add-locks
> --skip-add-drop-table --no-create-info  in all the mysqldump commands, but
> it looks like there's an issue with the escaping of single quotes in some of
> the rules, and TestSampleData returns errors like:
> TransactionError: ('(OperationalError) near "on": syntax error',)
> 
> Presumably because "on" is immediately preceded by things like: OS\'
> 
> I've left this script checked in, but reverted the sample data.

Ninad tried quite a few things to fix this test, but in the end we ended up commenting it out. Now that the dumps are generated from a script instead of by a human, it's much less likely that errors will be made. Additionally, I think that having up-to-date dumps outweighs the benefit of having this test.

Thanks for all your hard work here Ninad!
Status: NEW → RESOLVED
Last Resolved: 10 months ago
Resolution: --- → FIXED
(Reporter)

Updated

9 months ago
Depends on: 1311036
You need to log in before you can comment on or make changes to this bug.