Add a short delay between test runs to allow for asynchronous OS cleanup

RESOLVED FIXED in Firefox 61

Status: RESOLVED FIXED
Type: enhancement
Priority: P2
Severity: normal
Reporter: jld
Assignee: igoldan
Version: Version 3
Target Milestone: mozilla61
Firefox Tracking Flags: firefox61 fixed
Whiteboard: [PI:March]
Attachments: 2
The regression in bug 1434927 became insignificant when I added a 500ms sleep between Talos test runs: the added overhead was almost entirely happening in an OS-level background task asynchronously cleaning up after the previous test run, which was blocking startup during the next run.  Smaller sleep times would probably work (in that case the overhead was ~100ms), but I haven't tested that.

There doesn't seem to be anything else we're currently doing on Linux, other than network namespaces, that causes this kind of performance effect (the sleep change didn't show significant time differences when used without the sandbox change in question); I haven't tested other OSes.

I'm assuming that the time it takes to exit the browser and *immediately* start a new one isn't a useful metric (or, at least, should be measured separately).  This might resemble what happens during an update, but my understanding is that the updater will spend some time swapping files between the quit and restart, so that might introduce enough of a delay in practice.
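
For illustration, the idea of pausing between runs can be sketched as below. This is a minimal sketch and not the actual Talos change: `run_cycles`, `run_cycle`, and `CYCLE_DELAY_SECONDS` are hypothetical names, and the 0.5 second value is simply the delay used in the experiment described above.

```python
import time

# Hypothetical constant: the delay tried in the experiment above.
CYCLE_DELAY_SECONDS = 0.5


def run_cycles(run_cycle, cycles, delay=CYCLE_DELAY_SECONDS, sleep=time.sleep):
    """Call run_cycle(i) for each cycle, pausing between consecutive cycles.

    run_cycle is a hypothetical callable that starts the browser, runs one
    test cycle, shuts the browser down, and returns that cycle's result.
    The pause gives asynchronous OS cleanup from the previous cycle a chance
    to finish before the next browser startup is timed.
    """
    results = []
    for i in range(cycles):
        results.append(run_cycle(i))
        # No pause is needed after the final cycle.
        if i < cycles - 1:
            sleep(delay)
    return results
```

The `sleep` parameter is injectable only so the loop can be exercised without actually waiting; a real harness would presumably just call `time.sleep` directly.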
:igoldan, can you take a look at this?
Flags: needinfo?(igoldan)
Whiteboard: [PI:March]
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #1)
> :igoldan, can you take a look at this?

Yes, as soon as I have some time for it.
Flags: needinfo?(igoldan)
Priority: -- → P2
Assignee: nobody → igoldan
OK, so I ran the tests again and came to the same conclusion [1].

Below is a mapping between each delay amount and its comparison view.
I used a recent & regular revision as the baseline.

0.5 seconds delay: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=a63269fd15a9&newProject=try&newRevision=963a94587307&framework=1&filter=linux64%20session
0.25 seconds delay: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=a63269fd15a9&newProject=try&newRevision=ac5c96eeefab&framework=1&filter=linux64%20session
0.1 seconds delay: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=a63269fd15a9&newProject=try&newRevision=f2376e428f17&framework=1&filter=linux64%20session
0.05 seconds delay: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=a63269fd15a9&newProject=try&newRevision=9a4a12fbd466&framework=1&filter=linux64%20session
0.025 seconds delay: https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=a63269fd15a9&newProject=try&newRevision=166536b21ea4&framework=1&filter=linux64%20session

Conclusion of these tests: we can indeed reduce the original 0.5-second delay to 0.1 seconds.
There seems to be no difference between the 0.5, 0.25 and 0.1 second delays.
From the 0.05-second delay downwards, the "perf" improvements lessen, meaning that these durations aren't long enough for the OS to finish its cleanup jobs.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1434927#c11
Comment on attachment 8963881 [details]
Bug 1446215 - Add 100ms delay between Talos cycles

https://reviewboard.mozilla.org/r/232738/#review238168

Which operating system did you test this on?  I ask because this will apply to linux/windows10/osx, and if one OS needs more time, we could stick with a larger value.  I would imagine windows would need the most time, but that is almost a wild guess.
Attachment #8963881 - Flags: review?(jmaher) → review+
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #5)
> Comment on attachment 8963881 [details]
> Bug 1446215 - Add 100ms delay between Talos cycles
> 
> https://reviewboard.mozilla.org/r/232738/#review238168
> 
> which operating system did you test this on?  I ask as this will apply to
> linux/windows10/osx, and if one OS needs more time, we could stick with a
> larger value.  I would imagine windows would need the most time, but that is
> almost a wild guess.

I tested this on Linux only, because the original regression happened on Linux only.
Comment on attachment 8963900 [details]
Bug 1446215 - Increase threshold to 0.25 seconds

https://reviewboard.mozilla.org/r/232754/#review238176
Attachment #8963900 - Flags: review+
Pushed by igoldan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fa72a782c21c
Add 100ms delay between Talos cycles r=jmaher
https://hg.mozilla.org/integration/autoland/rev/4c6551ac004b
Increase threshold to 0.25 seconds r=jmaher
https://hg.mozilla.org/mozilla-central/rev/fa72a782c21c
https://hg.mozilla.org/mozilla-central/rev/4c6551ac004b
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
This only updated our perf baselines -- no real perf improvement:

== Change summary for alert #12425 (as of Fri, 30 Mar 2018 08:59:59 GMT) ==

Improvements:

  8%  sessionrestore_many_windows linux64 pgo e10s stylo     947.08 -> 875.17
  7%  sessionrestore_many_windows linux64 opt e10s stylo     997.75 -> 932.42
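
As a quick check of the percentages above, the relative change can be recomputed from the before and after values. This is only a sketch of the arithmetic; `percent_improvement` is a hypothetical helper, and how Perfherder itself computes and rounds these numbers is not shown in this bug. For a lower-is-better metric, the two rows come out at roughly 7.6% and 6.5%, which match the reported 8% and 7% after rounding.

```python
def percent_improvement(before, after):
    """Relative reduction from before to after, as a percentage.

    Assumes a lower-is-better metric, so a positive result is an improvement.
    """
    return (before - after) / before * 100.0


# Values copied from the alert summary above.
rows = [
    ("sessionrestore_many_windows linux64 pgo e10s stylo", 947.08, 875.17),
    ("sessionrestore_many_windows linux64 opt e10s stylo", 997.75, 932.42),
]

for name, before, after in rows:
    print(f"{percent_improvement(before, after):.1f}%  {name}")
```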

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12425