Revise default delay for endurance test to make scenarios more realistic

RESOLVED FIXED

Status

Mozilla QA Graveyard
Mozmill Automation
P2
normal
RESOLVED FIXED
5 years ago
3 years ago

People

(Reporter: davehunt, Assigned: Daniela Petrovici)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [mozmill-endurance][needs ESX cluster in PHX] s=121001 u=enhancement c=endurance p=1)

Attachments

(1 attachment)

(Reporter)

Description

5 years ago
After investigating bug 784793 it has been suggested that we should revise the default delay for endurance tests. The options discussed so far:

1. Allow individual tests to set a minimum delay
2. Increase the default delay in the automation script
3. Increase the delay used by Mozmill CI
4. Set base delay, so it can be increased via the command line but not decrease

The first option means tests would likely be inconsistent. This isn't necessarily an issue, but it might make reviewing the reports misleading.

Option two allows the delay to be overridden with a lower value for exploratory testing, and keeps the delay consistent.

With option three we would have to explain to anyone using the default value why there may be misleading results due to short delays.

The fourth option sounds similar to option two, but doesn't allow for exploratory testing of shorter delays.
(Reporter)

Comment 1

5 years ago
I think option two would be most suitable, but I'm keen to hear the thoughts of others.

Vlad: Would you be interested in finding a suitable default? You can base investigation on testTabbedBrowsing_OpenNewWindow. We know that 1s is not enough, but 10s may be more than we need.

We also need to consider what this will do to the baselines and duration for endurance tests.
I agree with Dave in making use of option 2. We could put this as an action item for next week.
Priority: -- → P2
Whiteboard: [mozmill-endurance] → [mozmill-endurance] s=q3 u=enhancement c=endurance p=1
I am going to assign this to myself based on comment 1
Assignee: nobody → vlad.mozbugs
Status: NEW → ASSIGNED
After investigating this yesterday, we can observe a considerable memory decrease when having a 5s default delay

http://mozmill-crowd.blargon7.com/#/endurance/report/d11b1de413a0179d904e7372300ccc18

But my opinion is that we modify the delay only for this test, because having 5s delay over other tests do not make them realistic at all. Thats just my opinion.
(Reporter)

Comment 5

5 years ago
Vlad: What other values did you try? Could you provide a few more reports so we can compare them?

Why would having a 5 second delay for other tests not make them more realistic? We're not suggesting that a user would do the same thing repeatedly every 5 seconds, we're just saying that they would probably never perform any action 0.1 seconds after their previous action.

As mentioned in comment 0:
> 1. Allow individual tests to set a minimum delay
> The first option means tests would likely be inconsistent. This isn't necessarily an issue, but it might make reviewing the reports misleading.
OK then I will just follow up with reports then. 
I've stopped at 5 seconds, but would be nice to know a threshold of memory that we want to see for opening new window. That way I can have a clear objective
(Reporter)

Comment 7

5 years ago
Well with the 5s report it appears to me that roughly half of the explicit memory allocated when opening the windows is cleared after closing them. Ideally all memory allocated would be released. You can also see an increasing trend in the resident memory, which would ideally not be present unless there was a memory leak.

If the above is not achievable within a reasonable delay, then we can compromise. At the moment, 5s is looking like a huge improvement over the current default delay.
At 6s already the window test disconnects http://mozmill-crowd.blargon7.com/#/endurance/report/d11b1de413a0179d904e737230149b30
(Reporter)

Comment 9

5 years ago
I suspect that opening a new window is not resetting the Mozmill timeout, and therefore 60+delay (66) seconds is being reached and Mozmill disconnects. Could you raise a separate bug to investigate this, but for now you can increase the value locally here http://hg.mozilla.org/qa/mozmill-automation/file/416592141962/libs/testrun.py#l625
We should never run into this global timeout which would only kick in after 66s in this case. We always have other stuff ongoing and never wait that long.
(Reporter)

Comment 11

5 years ago
That's why we need to open a bug to investigate it - because we are hitting the global timeout, and I have seen this before. When you say "We always have stuff ongoing" we really need to define "stuff". In case of this test, there are no mouse clicks or keyboard interactions, so perhaps the opening/closing of windows does not reset the timeout...
We are using waitFor() which should fire an event through JSBridge. Would you mind to check in Mozmill 1.5 if we really initiate such a pass event? If not then we have to fix it in Mozmill itself. That's the only thing which comes into my mind right now why we are failing here.
Depends on: 794451
(In reply to Dave Hunt (:davehunt) from comment #7)
> Well with the 5s report it appears to me that roughly half of the explicit
> memory allocated when opening the windows is cleared after closing them.
> Ideally all memory allocated would be released. You can also see an
> increasing trend in the resident memory, which would ideally not be present
> unless there was a memory leak.

There are various sources of fragmentation in Firefox, so unfortunately you aren't going to return to the same level of memory after closing all windows.
Dependency on bug 795579 has been fixed and Mozmill 1.5.19 released. So we can go ahead with a longer delay.
Depends on: 795579
Whiteboard: [mozmill-endurance] s=q3 u=enhancement c=endurance p=1 → [mozmill-endurance] s=121001 u=enhancement c=endurance p=1
Created attachment 666921 [details] [diff] [review]
patch v1.0

* I propose a 5s delay due to the following reasons:

* Compared results between 5s and 9s, there is no difference:
  9s result: http://mozmill-crowd.blargon7.com/#/endurance/report/d11b1de413a0179d904e73723068dc49
  5s result: http://mozmill-crowd.blargon7.com/#/endurance/report/d11b1de413a0179d904e7372300ccc18

* We cannot decrease under 5s because of http://mozmill-crowd.blargon7.com/#/endurance/report/d11b1de413a0179d904e73723013bbe4

* 5s is acceptable and we respect the 10s limit
* Dave said on iRC that we are ok to have endurance testrun last longer, it was initially by design
Attachment #666921 - Flags: review?(hskupin)
Attachment #666921 - Flags: review?(dave.hunt)
Comment on attachment 666921 [details] [diff] [review]
patch v1.0

Andrew, I would really appreciate your feedback on my proposal considering the investigation results. Thanks!
Attachment #666921 - Flags: feedback?(continuation)
Comment on attachment 666921 [details] [diff] [review]
patch v1.0

Seems reasonable to me, but :johns would be a better person to look at this, as he's spent a fair amount of time doing similar investigations.
Attachment #666921 - Flags: feedback?(continuation) → feedback?(jschoenick)
Comment on attachment 666921 [details] [diff] [review]
patch v1.0

Review of attachment 666921 [details] [diff] [review]:
-----------------------------------------------------------------

Patch-wise it looks fine. But lets get feedback from John before we check-in the change.
Attachment #666921 - Flags: review?(hskupin)
Attachment #666921 - Flags: review?(dave.hunt)
Attachment #666921 - Flags: review+
Comment on attachment 666921 [details] [diff] [review]
patch v1.0

Review of attachment 666921 [details] [diff] [review]:
-----------------------------------------------------------------

If I recall correctly there are some timers between GC and image throw-away that run longer than 5s, so this wont guarantee that everything that runs based on the wall clock will occur. However, if the idea is just to make the tests more realistic then it should be a big improvement.

Something this might not address is endurance tests that are doing too many actions per iteration -- For instance, if a test opens 30 tabs per iteration, we would want to somehow pause between tab loads as well.

FWIW areweslimyet has a short delay between actions (opening tabs) and a 30s delay between iterations, to ensure any idle-cleanup happens before measuring memory.
Attachment #666921 - Flags: feedback?(jschoenick) → feedback+
Keywords: checkin-needed
(Reporter)

Comment 20

5 years ago
(In reply to John Schoenick [:johns] from comment #19)
> Something this might not address is endurance tests that are doing too many
> actions per iteration -- For instance, if a test opens 30 tabs per
> iteration, we would want to somehow pause between tab loads as well.

The delay is between iterations and also entities, so we should have this covered.

Before we land this, we should ensure that we can afford the endurance tests to take ~85 minutes without preventing other testruns from executing. Also, we should communicate this change so that anyone monitoring the endurance charts is aware of the upcoming changes.
By later today we will have 3 VMs per platform. So I would say it should be possible by then. Also given that the endurance tests have a lower priority.
Keywords: checkin-needed
Whiteboard: [mozmill-endurance] s=121001 u=enhancement c=endurance p=1 → [mozmill-endurance][needs ESX cluster in PHX] s=121001 u=enhancement c=endurance p=1
(Reporter)

Updated

5 years ago
Blocks: 800872
(Reporter)

Comment 22

5 years ago
This was dependant on https://github.com/mozilla/mozmill-ci/issues/173 which is now resolved.
Assignee: vlad.mozbugs → nobody

Updated

5 years ago
Assignee: nobody → dpetrovici
(Assignee)

Comment 23

5 years ago
I checked the patch with the 5 seconds delay and it is still working properly. The report is here:
http://mozmill-crowd.blargon7.com/#/endurance/report/d96621f14ec6678200ea5c05eaf8e38c

Since we previously based the investigation on testTabbedBrowsing_OpenNewWindow/test1.js which is now skipped, I have used the patch from bug 800872 for testing this.

The endurance tests run for about 1.5 hours with the added delay.

I have also tried a smaller delay (4 seconds) and the tests are running in 1.10 hours. The report for the second trial is here: http://mozmill-crowd.blargon7.com/#/endurance/report/a83c700664548dba07298b74bf035197

Please tell me if I need to try something else too.
Flags: needinfo?(dave.hunt)
(Reporter)

Comment 24

5 years ago
Comment on attachment 666921 [details] [diff] [review]
patch v1.0

Review of attachment 666921 [details] [diff] [review]:
-----------------------------------------------------------------

Landed as:
http://hg.mozilla.org/qa/mozmill-automation/rev/164b9e59bdf1
Attachment #666921 - Flags: checkin+
Flags: needinfo?(dave.hunt)
(Reporter)

Updated

5 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: Mozilla QA → Mozilla QA Graveyard
You need to log in before you can comment on or make changes to this bug.