Closed Bug 832464 Opened 11 years ago Closed 4 years ago

develop a performance suite for balrog

Categories

(Release Engineering Graveyard :: Applications: Balrog (backend), defect, P3)


Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: bhearsum, Unassigned)

References

Details

(Whiteboard: [lang=python])

We talked about this a bit and decided that it would be great to have some sort of test suite that runs X queries against Balrog (hopefully representative of real load) and reports back some measurement of performance. Bonus points for a measurement that doesn't reference wall-clock time or any other machine-dependent metrics. We want to be able to run this suite before and after committing code changes to make sure we're aware of any performance hits we take.
Catlee suggested that we figure out what our most expensive operations are and start counting them. DB lookups and JSON encoding/decoding are probably both on that list.
Here's a proof of concept of sorts that does some counting during some of the unit tests: https://github.com/bhearsum/balrog/compare/master...perf-tests

The output is awful right now, e.g.:
testAddLocaleToReleaseSecondPlatform (auslib.test.test_db.TestReleasesSchema1) ... defaultdict(<type 'list'>, {'dumps': [('/home/bhearsum/Mozilla/checkouts/github/balrog/auslib/blob.py', 54, 'getJSON')], 'loads': [('/home/bhearsum/Mozilla/checkouts/github/balrog/auslib/blob.py', 50, 'loadJSON'), ('/home/bhearsum/Mozilla/checkouts/github/balrog/auslib/test/test_db.py', 966, 'testAddLocaleToReleaseSecondPlatform'), ('/home/bhearsum/Mozilla/checkouts/github/balrog/auslib/test/test_db.py', 994, 'testAddLocaleToReleaseSecondPlatform')], 'update': [('/home/bhearsum/Mozilla/checkouts/github/balrog/auslib/db.py', 821, 'addLocaleToRelease')], 'select': [('/home/bhearsum/Mozilla/checkouts/github/balrog/auslib/db.py', 767, 'getReleaseBlob'), ('/home/bhearsum/Mozilla/checkouts/github/balrog/auslib/db.py', 165, '_returnRowOrRaise')]})

We probably want to create a separate set of tests that are more representative of real-world scenarios, rather than tagging along with the unit tests. Maybe even integrate with the test-rules.py tests?
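
For reference, here's a minimal sketch of the kind of counting instrumentation described above, assuming we monkeypatch json.loads/json.dumps and record each call site; the module layout and helper names are illustrative, not the actual proof-of-concept code:

```python
import json
import traceback
from collections import defaultdict

# Tally of expensive operations: operation name -> list of call sites
# (filename, line number, function name) that triggered them.
op_counts = defaultdict(list)

_orig_loads = json.loads
_orig_dumps = json.dumps


def _record(op_name):
    # The frame two levels up is the code that called the wrapped function.
    caller = traceback.extract_stack()[-3]
    op_counts[op_name].append((caller.filename, caller.lineno, caller.name))


def _counting_loads(*args, **kwargs):
    _record("loads")
    return _orig_loads(*args, **kwargs)


def _counting_dumps(*args, **kwargs):
    _record("dumps")
    return _orig_dumps(*args, **kwargs)


def install():
    """Monkeypatch json so every encode/decode is counted."""
    json.loads = _counting_loads
    json.dumps = _counting_dumps


def report():
    """Machine-independent summary: operation name -> number of calls."""
    return {op: len(sites) for op, sites in op_counts.items()}
```

The same wrapping pattern could be applied to the DB layer's query methods to produce the 'select'/'update' counts shown in the output above.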
Assignee: nobody → bhearsum
(In reply to Ben Hearsum [:bhearsum] from comment #2)
> We probably want to create a separate set of tests that are more
> representative of real-world scenarios, rather than tagging along with the
> unit tests. Maybe even integrate with the test-rules.py tests?

Integrating with the rules tests would also have the benefit of testing many different code paths that could be taken. However, right now these tests don't go through the web API at all (they work with the AUS3 class directly), so they're not good enough to be used for this in their current form. I'm thinking that I could rework test-rules.py to go through the web app (whether that means integrating it with the existing unit tests or not is TBD), and then add the instrumentation to it.

Nick, it looks like this code predates the web app, which is probably why it works with AUS3 directly. Is there a reason it has to stay that way, or are the tests just as valid/useful if they go through the web app? I'd love to get your thoughts in general about this bug, too.
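
For illustration, a rough sketch of what an instrumented test that goes through the web app could look like, assuming the Flask test client used by the existing web tests; the imports, URL, and count budgets below are all hypothetical:

```python
import unittest

# Both imports are assumptions for illustration: the public web app object used
# by the existing web tests, and a module holding the install()/report()
# helpers sketched earlier in this bug.
from auslib.web.base import app
import perf_instrumentation as perf


class UpdateQueryPerfTest(unittest.TestCase):
    def setUp(self):
        perf.install()
        self.client = app.test_client()

    def testNightlyUpdateQueryStaysCheap(self):
        # A realistically shaped (but made-up) update URL; a real test would
        # first load fixture rules/releases into the test database.
        resp = self.client.get(
            "/update/3/Firefox/20.0a1/20130101030201/Linux_x86_64-gcc3/en-US/"
            "nightly/default/default/default/update.xml"
        )
        self.assertEqual(resp.status_code, 200)

        counts = perf.report()
        # Arbitrary illustrative budgets, expressed as operation counts rather
        # than wall-clock time. DB-level counts ('select', 'update') would come
        # from wrapping the DB layer the same way json was wrapped.
        self.assertLessEqual(counts.get("loads", 0), 5)
        self.assertLessEqual(counts.get("dumps", 0), 2)
```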
Flags: needinfo?(nthomas)
test-rules.py gets AUS3 to generate snippets rather than XML so that it can compare them to the input for AUS2, so it's not a great test for this. We could set up an instance of AUS2 (it has a Vagrant config too), but I'd rather write unittest-style tests that simulate requests that exercise all the 'business logic'. I will need to do that anyway for the refactoring I'm planning to do. Would the instrumentation fit in OK with that?
Flags: needinfo?(nthomas)
(In reply to Nick Thomas [:nthomas] from comment #4)
> test-rules.py gets AUS3 to generate snippets rather than XML so that it can
> compare them to the input for AUS2, so it's not a great test for this.

Aaaah, right.

> We could set up an
> instance of AUS2 (it has a Vagrant config too), but I'd rather write
> unittest-style tests that simulate requests that exercise all the 'business
> logic'. I will need to do that anyway for the refactoring I'm planning to
> do. Would the instrumentation fit in OK with that?

It should work fine with those tests, and it should give us a pretty detailed picture of specific operations. I still think there's a case to be made for running some "end to end" (for lack of a better term) tests with instrumentation, too. Without those I think we could miss edge cases where interactions between different methods get expensive. We might be able to accomplish this simply by using more realistic data for the https://github.com/mozilla/balrog/blob/master/auslib/test/web/test_client.py tests and adding some extra tests.
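
As a sketch of the "more realistic data" idea, something like the following could replay a handful of representative update URLs (made up here; ideally sampled from production logs) through the test client and aggregate the operation counts:

```python
# Hypothetical, representative update URLs; in practice these could be sampled
# from production server logs to reflect real traffic.
REALISTIC_QUERIES = [
    "/update/3/Firefox/20.0a1/20130101030201/WINNT_x86-msvc/en-US/"
    "nightly/default/default/default/update.xml",
    "/update/3/Firefox/20.0a1/20130101030201/Darwin_x86_64-gcc3-u-i386-x86_64/"
    "de/nightly/default/default/default/update.xml",
]


def replay_and_report(client, perf):
    """Replay the representative queries through a test client and return the
    aggregate operation counts, so expensive interactions between code paths
    show up rather than just per-method costs."""
    for url in REALISTIC_QUERIES:
        client.get(url)
    return perf.report()
```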
Blocks: 671488
Depends on: 852758
Product: mozilla.org → Release Engineering
No longer blocks: balrog-nightly
mass component change
Component: General Automation → Balrog: Backend
Nick and I talked about this a bit yesterday. We're punting on this for now because we don't foresee performance ending up in the critical path. Balrog has held up great for Nightly so far, and when we do get Beta users onto it, that's only about one order of magnitude more load. Even without any changes I think we'd hold up fine, but we're also talking about adding a short-lived cache, either in Zeus or at the application level, to speed things up even more.
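
For what it's worth, the application-level option could be as simple as a short TTL cache in front of blob lookups. A minimal sketch (not the actual plan; the getReleaseBlob call in the usage comment is just illustrative):

```python
import time


class ShortTTLCache:
    """A tiny time-bounded cache for expensive lookups (e.g. release blobs)."""

    def __init__(self, ttl_seconds=30):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expiry timestamp, value)

    def get(self, key, compute):
        """Return the cached value for key, recomputing it once the TTL expires."""
        now = time.time()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._store[key] = (now + self.ttl, value)
        return value


# Usage sketch (getReleaseBlob is the lookup seen in the instrumentation output
# above; the exact call here is illustrative):
# blob = cache.get(("blob", name), lambda: dbo.releases.getReleaseBlob(name))
```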
Assignee: bhearsum → nobody
mass priority change after component move
Priority: -- → P3
This doesn't block blob caching anymore.
No longer blocks: 671488
Whiteboard: [balrog] → [lang=python]

I doubt this will ever happen. We can re-open it as a GitHub issue if it comes up again.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX
Product: Release Engineering → Release Engineering Graveyard