Closed Bug 1245891 Opened 4 years ago Closed 4 years ago

Add a talos test stress-testing tab-browsing like the testcase from bug 1245834

Categories

(Testing :: Talos, defect)

Version 3
defect
Not set

Tracking

(e10s+, firefox48 fixed)

RESOLVED FIXED
mozilla48
Tracking Status
e10s + ---
firefox48 --- fixed

People

(Reporter: glandium, Assigned: Yoric, Mentored)

References

(Blocks 1 open bug)

Details

(Whiteboard: [lang=js][good first bug])

Attachments

(1 file)

What I did there was to create a new profile, enable session restore, open 3 windows in it, and in each window, create plenty shitloads of tabs. 600+ in one window, 400+ in another, and just one tab in the third window. For a total of 1105 tabs. And that apparently regressed like 1200% in 3 years.
There are only 83 tabs in there.
would you want to increase the number of tabs, or create the new test?
That's a question for Yoric, I guess. I don't know what regressions the existing test is trying to catch, nor what it does measure (as in from when to when).
Flags: needinfo?(dteller)
Revisiting the test long after having written it, I confirm that what it measures is probably not really meaningful. Basically, it mostly measures how long it takes to read & parse sessionstore.js, then open the relevant windows (not tabs).

Using sessionrestore/StartupPerformance.jsm, we could now easily extend the test to also measure the time to create tabs. I'm not enthusiastic about writing this myself, but I could easily mentor someone to do it.
Mentor: dteller
Flags: needinfo?(dteller)
Whiteboard: [lang=js][good first bug]
If this test is not measuring much meaningful, we should consider turning it off.  It sounds like we have some ideas for what would be useful to measure- maybe spelling that out in more detail here and deciding if we should keep the existing sessionrestore* tests running until we write what is useful, or just turn it off to reduce confusion for developers and resources.
The current test might be not meaningful, but it still shows a steady regression over the last year if you look at the graph (and at a higher rate since a month ago). Why are we not catching those regressions or ignoring them? is the question that should be answered before adding a new test that could just as well be ignored.
a lot of bugs are filed for session restore- it just depends on the platform.  There are a few cases where we have really small regressions, then on uplift end up with 3% or something and we are not sure what caused it.
See Also: → 1250362
So if I understand correctly, the test previously reported how long it took to open a session-restored window, and the patch changes that to measure instead how long it took to "actually restore the tabs"?

What does that mean? until all the tabs started loading? until all the tabs completed loading?

I would think that both results could be useful (and we do have indications that this test does show perf changes over time), so I think that instead it should report two "subtests": one for restoring the window, and another for restoring the tabs (possibly the timer for the latter starts when the former completes).

Also #1, we should be aware that changing the test will set a new baseline for the test, meaning that the result won't be comparable to earlier test results.

Also #2, how is this session-restore test change relates to this bug at all?
Previous version reported until we reached session-restored. In other words, it measured:
- how long it takes to read and parse the session restore file;
- how long it takes to open one (and just one) empty window.

With this patch, we also measure:
- how long it takes to open the empty tabs;
- how long it takes to load the tabs that are loaded during startup.

I agree that it would be best to report two subtests, but when I wrote this talos test, it was told to me that this was not possible. If there is a way, I'm interested.

#1 Yes. If you believe that this is better, we could change the name of the test altogether.

#2 I hope that, by taking loading into account, we captures better the issues reported by glandium.
Oh, also, I suspect that the comparison e10s vs non-e10s will be much more meaningful with this test, as it involves creating and loading tabs, which involves cross-process exchanges.
Thanks.

So the current test is meaningful, but the suggested improvement would make it more meaningful.

If we were living in a perfect world, I'd have said that we want to keep the current test and also measure the additional thingy (loading the tabs).

But adding a new test is a long-ish process, so if David considers the patch as absolutely improving the test, then I'm OK with changing the test and living with the negative consequences (establishing a new results baseline which is not comparable to past results, and losing a proven-useful kind of measurement the test currently has).

Joel, are you OK with that?
Flags: needinfo?(jmaher)
yes, I would prefer we update all our tests to be useful in the browser we have today.  We can rebaseline the numbers without much trouble.  It just takes a little bit of coordination!  For this bug, lets just land and move forward.
Flags: needinfo?(jmaher)
Assignee: nobody → dteller
Well, in that case, I'm waiting for reviews.
Comment on attachment 8722449 [details]
MozReview Request: Bug 1245891 - Changing Session Restore Talos tests to include the time to restore actual tabs;r=avih,mconley

https://reviewboard.mozilla.org/r/36041/#review33359

Note that this changes Firefox, so in order for the test to work also on Aurora/Beta, we'll need to uplift the change to Firefox too.
Attachment #8722449 - Flags: review?(avihpit)
Sorry, r+ with the note at comment 16. I accidentally canceled the review.
Comment on attachment 8722449 [details]
MozReview Request: Bug 1245891 - Changing Session Restore Talos tests to include the time to restore actual tabs;r=avih,mconley

https://reviewboard.mozilla.org/r/36041/#review33421

::: browser/components/sessionstore/StartupPerformance.jsm:96
(Diff revision 1)
> -        if (!this._latestRestoredTimeStamp) {
> +        Services.obs.notifyObservers(null, this.StartupPerformance.RESTORED_TOPIC, "");

I think we can use `StartupPerformance.RESTORED_TOPIC` directly, can't we? Is there a reason we have to get it off of `this`?
https://reviewboard.mozilla.org/r/36041/#review33421

> I think we can use `StartupPerformance.RESTORED_TOPIC` directly, can't we? Is there a reason we have to get it off of `this`?

Actually, I wonder how that works at all. It should be `this.RESTORE_TOPIC`.
Comment on attachment 8722449 [details]
MozReview Request: Bug 1245891 - Changing Session Restore Talos tests to include the time to restore actual tabs;r=avih,mconley

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/36041/diff/1-2/
Attachment #8722449 - Flags: review?(avihpit)
 10:11:17     INFO -  Running cycle 1/10 for sessionrestore test...
 10:11:17     INFO -  Reinstalling /builds/slave/test/build/tests/talos/talos/startup_test/sessionrestore/profile/sessionstore.js on top of /tmp/tmperf0Id/profile/sessionstore.js
 10:11:17     INFO -  Reinstalling /builds/slave/test/build/tests/talos/talos/startup_test/sessionrestore/profile/sessionCheckpoints.json on top of /tmp/tmperf0Id/profile/sessionCheckpoints.json
 10:11:17     INFO -  TEST-INFO | started process 16068 (/builds/slave/test/build/application/firefox/firefox -profile /tmp/tmperf0Id/profile http://localhost:58753/startup_test/sessionrestore/index.html)
 10:12:18     INFO -  PROCESS | 16068 | 1456510338924	addons.productaddons	ERROR	Request failed certificate checks: [Exception... "SSL is required and URI scheme is not https."  nsresult: "0x8000ffff (NS_ERROR_UNEXPECTED)"  location: "JS frame :: resource://gre/modules/CertUtils.jsm :: checkCert :: line 145"  data: no]
 11:12:18     INFO - Automation Error: mozprocess timed out after 3600 seconds running ['/builds/slave/test/build/venv/bin/python', '/builds/slave/test/build/tests/talos/talos/run_tests.py', '--branchName', 'Mozilla-Inbound-Non-PGO', '--suite', 'other', '--executablePath', '/builds/slave/test/build/application/firefox/firefox', '--symbolsPath', 'https://queue.taskcluster.net/v1/task/GooTYacjScCvL2zbtbO1Ww/artifacts/public/build/firefox-47.0a1.en-US.linux-x86_64.crashreporter-symbols.zip', '--title', 'talos-linux64-ix-013', '--webServer', 'localhost', '--log-tbpl-level=debug', '--log-errorsummary=/builds/slave/test/build/blobber_upload_dir/other_errorsummary.log', '--log-raw=/builds/slave/test/build/blobber_upload_dir/other_raw.log']
 11:12:18    ERROR - timed out after 3600 seconds of no output
 11:12:18    ERROR - Return code: -9
 11:12:18  WARNING - setting return code to -9
11:12:18 ERROR - # TBPL FAILURE #
Comment on attachment 8722449 [details]
MozReview Request: Bug 1245891 - Changing Session Restore Talos tests to include the time to restore actual tabs;r=avih,mconley

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/36041/diff/2-3/
Flags: needinfo?(dteller)
Comment on attachment 8722449 [details]
MozReview Request: Bug 1245891 - Changing Session Restore Talos tests to include the time to restore actual tabs;r=avih,mconley

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/36041/diff/3-4/
Talos addons are now signed, so we will have to sign this extension prior to landing:
https://wiki.mozilla.org/EngineeringProductivity/HowTo/SignExtensions
Comment on attachment 8722449 [details]
MozReview Request: Bug 1245891 - Changing Session Restore Talos tests to include the time to restore actual tabs;r=avih,mconley

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/36041/diff/4-5/
Signed the add-on. For some reason, tests seem to timeout reproducibly on Treeherder, while they work reproducibly locally.

Could it be related to this?
 11:41:33 INFO - PROCESS | 2221 | 1456515693175 addons.productaddons ERROR Request failed certificate checks: [Exception... "SSL is required and URI scheme is not https." nsresult: "0x8000ffff (NS_ERROR_UNEXPECTED)" location: "JS frame :: resource://gre/modules/CertUtils.jsm :: checkCert :: line 145" data: no]
Flags: needinfo?(jmaher)
I have seen that on try pushes and wonder if it is related.  I assume you have added the new signed.xpi file to the tree in replacement of the old one?  :Yoric, can you point us to the try push?

:ahal, why would :Yoric be having failures with a signed addon?
Flags: needinfo?(jmaher) → needinfo?(ahalberstadt)
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=b88648a41ed2&selectedJob=22518585

I'm not sure that the failure is due to the signed add-on, mind you. But I can't really test otherwise.
I don't see the new signed .xpi in your try push.  Also your try push is based off a revision from Friday Feb 26th, it would need to be updated to something more recent.
Attachment #8722449 - Flags: review?(avihpit) → review+
I think that error is a red herring. I see it on aurora too, where signed addons haven't landed yet. Usually the error if an addon isn't signed would be something like "Failed to install addon <addon> with signed state 0".
Flags: needinfo?(ahalberstadt)
Attachment #8722449 - Attachment description: MozReview Request: Bug 1245891 - Changing Session Restore Talos tests to include the time to restore actual tabs;r?avih,mconley → MozReview Request: Bug 1245891 - Changing Session Restore Talos tests to include the time to restore actual tabs;r=avih,mconley
Attachment #8722449 - Flags: review?(avihpit)
Comment on attachment 8722449 [details]
MozReview Request: Bug 1245891 - Changing Session Restore Talos tests to include the time to restore actual tabs;r=avih,mconley

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/36041/diff/5-6/
Attachment #8722449 - Flags: review?(avihpit)
Ahah, it passes!
https://hg.mozilla.org/mozilla-central/rev/40100741a9a5
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
we have some regressions here, in e10s only, but in opt they are improvements:
https://treeherder.mozilla.org/perf.html#/alerts?id=410

:Yoric, was this intentional?
Flags: needinfo?(dteller)
Regressions were expected. Improvements... may have rational explanations.
Flags: needinfo?(dteller)
can you confirm the regressions were expected for e10s only?  Since e10s is a focus I want to make sure we understand this difference.
Flags: needinfo?(dteller)
I definitely expected worse numbers for e10s. Note that we are measuring code that needs to do more work, so it's not a regression, just worse numbers.
Flags: needinfo?(dteller)
:Yoric, please explain why non e10s got better then?  This isn't a valid explanation of the numbers.
Flags: needinfo?(dteller)
Prior to this patch, we used to measure time until all windows were opened and all empty tabs were opened, but tabs were not necessarily restored.

After this patch, we measure time until all tabs that are supposed to be restored during startup are effectively restored. I don't remember exactly, but I think that's 3 tabs per window + pinned tabs.

I do not see any good reason why non-e10s would be faster :/
Flags: needinfo?(dteller)
it seems that we have a big difference between opt and e10s:
https://treeherder.mozilla.org/perf.html#/graphs?series=%5Bmozilla-inbound,ebc60b2c69a8a5931274e365c4b3ed0650bf12db,1&series=%5Bmozilla-inbound,7c1f3a79275eb6b31e4360bbd300472def25d022,1%5D

:avih, as driving the e10s vs opt, do you have concerns with the new changes to this test?  It would seem odd that e10s would load the pages so much slower than opt.  Maybe we are measuring things incorrectly in this new test?  Is there anything we should do to investigate this further?
Flags: needinfo?(avihpit)
This test is yoric's, and he knows better than me what would be a good thing to measure WRT session restore.

(In reply to David Rajchenbach-Teller [:Yoric] (please use "needinfo") from comment #43)
> I do not see any good reason why non-e10s would be faster :/

And yet, at least on Linux64, the new numbers suggest that e10s is now about 6 times worse than the non-e10s numbers (3500ms vs 600ms), while before the change they were roughly equal at 1600ms.

Reading comment 11, it seems to me that after the change it should now measure higher numbers, which is apparently not the case at least for Linux 64.

Until we can explain the results on all platforms and confirm that it meet yoric's expectations, we should probably consider the test not ready.
Flags: needinfo?(avihpit)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Looks like this caused a %500 regression with e10s. Hence, this should be backed out unless someone can explain what the heck is going on here. :)


https://treeherder.mozilla.org/perf.html#/graphs
tracking-e10s: --- → ?
sorry, here's the correct link to the perf dashboard showing the regression -

https://treeherder.mozilla.org/perf.html#/e10s?filter=sessionrestore
Well, once again, we're changing what we measure. So numbers before and after are just not comparable and don't mean that we should back out anything.
Depends on: 1259770
Hello?  I'm new to programming I have some C#, html, and js experience and would like to learn more and contribute.  How do I become part of this test?
Oops, looks like we forgot to close this bug once mconley found the explanation we were looking for.

Thanks for your offer, ApexBugFinder, but this bug is effectively fixed already.
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.