lazy incoming request matching in balrog

RESOLVED FIXED

Status

Release Engineering
General Automation
RESOLVED FIXED
5 years ago
4 years ago

People

(Reporter: bhearsum, Assigned: bhearsum)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [balrog])

Attachments

(1 attachment, 2 obsolete attachments)

(Assignee)

Description

5 years ago
Nick, I'll let you fill in the details here as you've got a better picture of it than me!
Blocks: 832466
(Assignee)

Updated

5 years ago
Assignee: nthomas → bhearsum
Rough sketch:

* drop identifyRequest from web/views/client.py's getQueryFromURL()  [this is the expensive bit we're trying to avoid]
* in AUS.py's expandRelease() we need to change the logic slightly. If a patch is for a specific mapping (rather than *) then call a function which checks if the query matches the named release blob [in normal operation this means we only try to do this for our N partials]
* the check on |rule['mapping'] == updateQuery['name']| check in db.py's getRulesMatchingQuery needs to be redone somehow
(Assignee)

Comment 2

5 years ago
Created attachment 761574 [details] [diff] [review]
lazy request matching, unpolished

This patch ended up a lot bigger than expected, but because of the dependencies on the release/rule in places I had to refactor a bit. I purposely avoiding fully refactoring the business logic to avoid ratholing.

The new dataflow is simpler and more efficient, basically:
* evaluateRules takes an update query and returns the parts of the matching rule that we care about (the release that mapping points to, and the update type). It does the same rules table queries as before, and a single, specific select to the releases table.
* createXML doesn't do any additional queries if the release is null
* if the release isn't null, createXML calls expandRelease. expandRelease will do additional, specific queries to the releases table for each patch type whose 'from' isn't '*'. Until we support multiple partials, this is 1 or 0.

There's a  couple of other random changes here too:
* Unused import removals and other things that pyflakes caught
* Change the 'test' target to always run all tests, rather than bailing when one fails.

Probably could do with a round of clean up, especially around variable names + log messages.
Attachment #761574 - Flags: feedback?(nthomas)
Comment on attachment 761574 [details] [diff] [review]
lazy request matching, unpolished

Looks good from a quick pass, will get out the fine tooth comb for a review.

> app.add_url_rule(
>-    '/update/2/<product>/<version>/<buildID>/<buildTarget>/<locale>/<channel>/<osVersion>/update.xml',
>+    '/update/2/<product>/<version>/<int:buildID>/<buildTarget>/<locale>/<channel>/<osVersion>/update.xml',

If someone sense bogus data for buildID which can't be cast to int, do we return an empty update ?
Attachment #761574 - Flags: feedback?(nthomas) → feedback+
(Assignee)

Comment 4

5 years ago
(In reply to Nick Thomas [:nthomas] from comment #3)
> Comment on attachment 761574 [details] [diff] [review]
> lazy request matching, unpolished
> 
> Looks good from a quick pass, will get out the fine tooth comb for a review.
> 
> > app.add_url_rule(
> >-    '/update/2/<product>/<version>/<buildID>/<buildTarget>/<locale>/<channel>/<osVersion>/update.xml',
> >+    '/update/2/<product>/<version>/<int:buildID>/<buildTarget>/<locale>/<channel>/<osVersion>/update.xml',
> 
> If someone sense bogus data for buildID which can't be cast to int, do we
> return an empty update ?

Yup. Flask handles this internally.
(Assignee)

Comment 5

5 years ago
Created attachment 764253 [details] [diff] [review]
tidyed up version

Not much to tidy, as it turns out. I changed the logic in evaluateRules to return early if there are no rules to get rid of a long intended block, and renamed the local partialMatches function to hasAPartial.
Attachment #761574 - Attachment is obsolete: true
Attachment #764253 - Flags: review?(nthomas)
Comment on attachment 764253 [details] [diff] [review]
tidyed up version

Review of attachment 764253 [details] [diff] [review]:
-----------------------------------------------------------------

Great cleanup! Just a few nits and the test issue we discussed to resolve.

::: auslib/AUS.py
@@ +99,5 @@
> +        if self.queryMatchesRelease(updateQuery, release):
> +            self.log.debug("Incoming query is the same as matching rule's mapping.")
> +            return None, None
> +
> +        self.log.debug("Returing release %s", release['name'])

Nit, Returing ---> Returning

::: auslib/test/test_AUS.py
@@ +18,5 @@
>              m2.side_effect = se
>              served = 0
>              tested = 0
>              while len(results) > 0:
> +                r, _ = AUS.evaluateRules(dict(channel='foo', force=force, mapping=mapping, buildTarget='a', buildID='a', locale='a'))

I'm not sure why mapping is included in the query here, is it necessary ?

@@ +19,5 @@
>              served = 0
>              tested = 0
>              while len(results) > 0:
> +                r, _ = AUS.evaluateRules(dict(channel='foo', force=force, mapping=mapping, buildTarget='a', buildID='a', locale='a'))
> +                print r

Leftover debugging ?

::: auslib/test/web/test_client.py
@@ +85,5 @@
>          # An empty update contains an <updates> tag with a newline, which is what we're expecting here
>          self.assertEqual(minidom.parseString(ret.data).getElementsByTagName('updates')[0].firstChild.nodeValue, '\n')
>  
>      def testVersion3GetWithUpdate(self):
> +        ret = self.client.get('/update/3/b/1/1/p/l/a/a/a/a/update.xml')

per IRC, we need discovered that the old code was not identifying the request correctly because of a data type mismatch between the test data here and what we have in production (for release blobs). The new code had an issue too.

::: auslib/web/views/client.py
@@ +22,5 @@
>      def getQueryFromURL(self, url):
>          query = url.copy()
>          # Query versions 2, 3 and 4 are all roughly the same in contents,
>          # and all treated the same by Balrog.
>          if url['queryVersion'] in (2, 3, 4):

This guard on queryVersion may not be useful any more, given the way we only dispatch specific urls to handlers. Everything in the block after it looks version independent.
Attachment #764253 - Flags: review?(nthomas)
(Assignee)

Comment 7

5 years ago
Created attachment 765120 [details] [diff] [review]
fix bugs; address comments

(In reply to Nick Thomas [:nthomas] from comment #6)
> Comment on attachment 764253 [details] [diff] [review]
> ::: auslib/test/test_AUS.py
> @@ +18,5 @@
> >              m2.side_effect = se
> >              served = 0
> >              tested = 0
> >              while len(results) > 0:
> > +                r, _ = AUS.evaluateRules(dict(channel='foo', force=force, mapping=mapping, buildTarget='a', buildID='a', locale='a'))
> 
> I'm not sure why mapping is included in the query here, is it necessary ?

I think I needed this in an earlier iteration...not needed anymore though.

> ::: auslib/test/web/test_client.py
> @@ +85,5 @@
> >          # An empty update contains an <updates> tag with a newline, which is what we're expecting here
> >          self.assertEqual(minidom.parseString(ret.data).getElementsByTagName('updates')[0].firstChild.nodeValue, '\n')
> >  
> >      def testVersion3GetWithUpdate(self):
> > +        ret = self.client.get('/update/3/b/1/1/p/l/a/a/a/a/update.xml')
> 
> per IRC, we need discovered that the old code was not identifying the
> request correctly because of a data type mismatch between the test data here
> and what we have in production (for release blobs). The new code had an
> issue too.

And for posterity, digging into this ended up with us discovering that we're storing buildids and filesizes as strings in the database (because the balrog submitter doesn't convert them). We decided to keep it that way, because we don't do any computation on the values anyways. I fixed the tests to all use strings instead of ints which revealed that the casting to int that we were doing in the web routing rules actually would've broken request identification (which would've broken partials everywhere).

I went through and adjusted all of the test code to use quoted versions of buildid/filesize. I also found a couple that used an integer hashValue =\.

> ::: auslib/web/views/client.py
> @@ +22,5 @@
> >      def getQueryFromURL(self, url):
> >          query = url.copy()
> >          # Query versions 2, 3 and 4 are all roughly the same in contents,
> >          # and all treated the same by Balrog.
> >          if url['queryVersion'] in (2, 3, 4):
> 
> This guard on queryVersion may not be useful any more, given the way we only
> dispatch specific urls to handlers. Everything in the block after it looks
> version independent.

Good point. I removed this
Attachment #764253 - Attachment is obsolete: true
Attachment #765120 - Flags: review?(nthomas)
Comment on attachment 765120 [details] [diff] [review]
fix bugs; address comments

Review of attachment 765120 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm, r+ with ...

::: auslib/web/views/client.py
@@ +34,3 @@
>          else:
> +            release, update_type = {}, None
> +        # passing {},None returns empty xml

Looks like this if/else can go too.
Attachment #765120 - Flags: review?(nthomas) → review+

Comment 9

5 years ago
Commit pushed to master at https://github.com/mozilla/balrog

https://github.com/mozilla/balrog/commit/82a399731f4fb2965575bb193cb22cb8b02457d4
bug 852738: lazy incoming request matching in balrog. r=nthomas
(Assignee)

Comment 10

5 years ago
Passed all tests (http://10.134.48.37:8080/job/Balrog/26/) and even took us from 270 -> 315 matching snippets in the snippet comparison tests. I assume that increase is because request matching is working better now...but I'm going to dig down and see which specific tests are now passing. Leaving this open until that's figured out.

Before job: http://10.134.48.37:8080/job/Balrog%20Snippet%20Comparison/31/
After job: http://10.134.48.37:8080/job/Balrog%20Snippet%20Comparison/32/
(Assignee)

Comment 11

5 years ago
Did some more digging into this. Turns out that there are 45 more passes after this patch, and every single one of them is an Android snippet (that's every single Android test in fact). The reason they failed when bug 879813 landed was because there weren't rules setup for Android right away - I set them up shortly after landing that patch. I double checked that by rerunning all of the comparison tests against the latest configuration of the Balrog dev server but without this patch -- I got 315 passes.

So, given that, this is fixed.

There was one small follow-up to make us not error out if a release can't be found: https://github.com/mozilla/balrog/commit/fef2349150705199e1999a23481966b60368031c
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.