Closed Bug 1073639 Opened 10 years ago Closed 10 years ago

services/fxaccounts/tests/xpcshell/test_client.js takes insanely long to run on B2G

Categories

(Firefox :: Firefox Accounts, defect)

defect
Not set
normal
Points:
3

Tracking

()

RESOLVED FIXED
Firefox 36
Iteration:
36.1

People

(Reporter: RyanVM, Assigned: markh)

References

Details

Attachments

(1 file, 1 obsolete file)

Late in the 34 cycle, we had to add a gross hack to double the max allowable timeout for B2G xpcshell tests due to MSG-refactor slowdowns. Now that those have been fixed, I ran a Try push to revert that hack.

Turns out that debug is perma-fail due to test_client.js taking almost 6 min (!!!) to run. Even on opt, it takes about 90 seconds.

Can we split this test up please? That's massive. Or we can skip it on debug if you prefer.
FWIW, test_accounts.js is about the same size and runs in ~40s on B2G debug, so maybe test_clients.js is just doing something particularly awful to make it that slow?
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #1)
> FWIW, test_accounts.js is about the same size and runs in ~40s on B2G debug,
> so maybe test_clients.js is just doing something particularly awful to make
> it that slow?

test_client.js just took 33 seconds on my maxed-out MBP; test_accounts took 12. Yes, we can split it up, but are we trading expensive-but-unmetered developer time for cheap-but-visible hardware costs? Can we profile the 6 minute run, or even just look at the timing in STDOUT, so we know what we're optimizing? What's the value of having it run on debug?
I think this is doing lots of network timeouts and happens on Windows too - I'll take a look.
Flags: needinfo?(mhammond)
(In reply to Mark Hammond [:markh] from comment #3)
> I think this is doing lots of network timeouts and happens on Windows too -
> I'll take a look.

I'm afraid I had my tests confused - this test passes in 1.1 seconds for me locally!  Getting the log from passing xpcshell tests is difficult, so I opened bug 1074014, and if that lands I should be able to push to try and see where the slow platforms are getting stuck.
Flags: needinfo?(mhammond)
Bug 1066735 changed the order of tests a bit and now test_client.js is consistently surpassing 2x the per test timeout. Fyi, I plan on disabling it for b2g debug emulators (it will still run on opt) as bug 1066735 is important to get landed asap.
(In reply to Mark Hammond [:markh] from comment #4)
> I'm afraid I had my tests confused - this test passes in 1.1 seconds for me
> locally!  Getting the log from passing xpcshell tests is difficult, so I
> opened bug 1074014, and if that lands I should be able to push to try and
> see where the slow platforms are getting stuck.

Haven't tested this myself, but looks like you can add --verbose here:
http://dxr.mozilla.org/mozilla-central/source/testing/config/mozharness/b2g_emulator_config.py#45

Note that this will show test output for *all* tests though.. so requestCompleteLog still sounds useful.
It seems the main problem here is the crypto done by these tests, which are extremely CPU intensive.  For example, the code at http://mxr.mozilla.org/mozilla-central/source/services/fxaccounts/Credentials.jsm#128 takes ~45ms on my desktop machine, but ~3400ms on b2g runs - and this code alone is hit 14 times by this test - so around 47 seconds alone is in this function.

Which makes me wonder - are real b2g devices *really* seeing this function take ~3.4 seconds to execute, and if so, is that expected and acceptable?

* If so: This code does 1000 PBKDF2 rounds, and I looked at dropping this for these tests, but this looks tricky - there are lots of pre-computed values the test checks against and these all change.  My next best guess was to split the test up into "does lots of crypto" and "only does a little crypto" and disable the former test in all b2g builds.

* If not: then we have some other bug! :)

Jed/Chris, what do you think?
Flags: needinfo?(jed+bmo)
Flags: needinfo?(ckarlof)
(In reply to Mark Hammond [:markh] from comment #8)
...
> Which makes me wonder - are real b2g devices *really* seeing this function
> take ~3.4 seconds to execute, and if so, is that expected and acceptable?

Overall user latency screen to screen (including the network round trip and everything else involved) is on the order of 2-3 seconds -- perceptible but usable. Especially for an important and infrequently performed operation. There are to my knowledge no performance bugs filed against FxA, nor do I recall any conversations about needing to speed it up.
Flags: needinfo?(jed+bmo)
It was decided that this level of performance was fine:

https://bugzilla.mozilla.org/show_bug.cgi?id=968567#c33

The linked bug was to improve it, but I don't believe it was ever done.
Flags: needinfo?(ckarlof)
Thanks for the update - so it looks like we should just split the test and disable the "slow" version on b2g.  The split is done such that no tests in test_client.js invoke that "slow" code I identified before and all tests in test_client_keys do.

With this patch test_client.js takes ~16 seconds on try, which sounds reasonable.  Requesting review from ckarlof to ensure there are no concerns about not having coverage for these functions on b2g, but feel free to delegate.
Attachment #8504438 - Flags: review?(ckarlof)
I think having non-b2g coverage of these functions should be sufficient, but I'll leave the decision to Sam.
Attachment #8504438 - Flags: review?(ckarlof)
Attachment #8504438 - Flags: review?(spenrose)
Comment on attachment 8504438 [details] [diff] [review]
0001-Bug-1073639-split-test_client.js-and-diesable-the-sp.patch

Thanks for much for taking this on, Mark. I read each moved test and did not see any B2G-specific concerns. There is little enough left in test_client.js -- and it is also not B2G-specific -- that I could also see simply marking the original test_client.js to not run on B2G. There is of course some theoretical risk in that something which shouldn't be different on B2G turns out to be, but it sounds like faster test runs are more important.
Attachment #8504438 - Flags: review?(spenrose) → review+
NP about the review delay - I wish all reviewers only had that delay ;)

(In reply to Sam Penrose from comment #13)
> -- and it is also not B2G-specific -- that I could also see simply marking
> the original test_client.js to not run on B2G. There is of course some
> theoretical risk in that something which shouldn't be different on B2G turns
> out to be, but it sounds like faster test runs are more important.

Yeah, and given the split was somewhat arbitrary, let's just do that.
Attachment #8504438 - Attachment is obsolete: true
Attachment #8505878 - Flags: review?(spenrose)
Comment on attachment 8505878 [details] [diff] [review]
0001-Bug-1073639-disable-test_client.js-on-b2g-due-to-exc.patch

Great, thanks Mark!
Attachment #8505878 - Flags: review?(spenrose) → review+
https://hg.mozilla.org/integration/fx-team/rev/f4a64f0b93a1
Assignee: nobody → mhammond
Status: NEW → ASSIGNED
Points: --- → 3
Flags: qe-verify-
Flags: firefox-backlog+
Iteration: --- → 36.1
https://hg.mozilla.org/mozilla-central/rev/f4a64f0b93a1
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Product: Core → Firefox
Target Milestone: mozilla36 → Firefox 36
You need to log in before you can comment on or make changes to this bug.