Open Bug 1390884 Opened 2 years ago Updated 7 months ago

Chaos mode makes test verification unreliable

Categories

(Testing :: General, enhancement, P3)

enhancement

Tracking

(Not tracked)

People

(Reporter: gbrown, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(2 files)

Test verification tries to efficiently find intermittent test failures by running just-modified tests repeatedly and in various configurations or environments. The initial implementation includes running tests in chaos mode (with the MOZ_CHAOS_MODE environment variable set).
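To make the repeat-and-vary idea concrete, here is a minimal Python sketch. It runs one test several times under each of a few environment configurations and reports which configurations produced a failure. It is not the actual test-verify harness. The `verify` function, the `CONFIGS` table and the `run_test` callback are hypothetical. Only the MOZ_CHAOS_MODE variable name comes from this bug, and the value "1" is an assumption.

```python
import os
from typing import Callable, Dict, List, Mapping

# Hypothetical configuration table: name -> extra environment variables.
# Only MOZ_CHAOS_MODE comes from this bug; the value "1" is an assumption
# (the bug only says the variable is set; how Gecko interprets the value
# is defined by ChaosMode.h and the startup code, not by this sketch).
CONFIGS: Dict[str, Dict[str, str]] = {
    "default": {},
    "chaos": {"MOZ_CHAOS_MODE": "1"},
}


def verify(
    run_test: Callable[[Mapping[str, str]], bool],
    repeats: int = 10,
    configs: Dict[str, Dict[str, str]] = CONFIGS,
) -> List[str]:
    """Run a test `repeats` times per configuration.

    `run_test` is a hypothetical callback that runs the just-modified test
    with the given environment and returns True on a pass. Returns the
    names of configurations in which at least one run failed.
    """
    failing: List[str] = []
    for name, extra_env in configs.items():
        # Start from the current environment, then layer on the overrides.
        env = dict(os.environ)
        env.update(extra_env)
        for _ in range(repeats):
            if not run_test(env):
                failing.append(name)
                break  # one failure is enough to flag this configuration
    return failing
```

Stopping at the first failure in a configuration keeps the run time down, which fits the goal of finding intermittents efficiently. A real `run_test` might wrap `subprocess.run(cmd, env=env)` and return `result.returncode == 0`.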

Initial tests indicate that many more failures occur in chaos mode than in regular mode. I want to investigate those failures and determine whether chaos mode is appropriate and practical for test verification.
...but first, as a temporary measure, let's remove chaos mode from test verification, so that we can start using test verification now.

I'll mark this bug leave-open for further investigation. Hopefully we can restore this code soon.
Attachment #8897860 - Flags: review?(jmaher)
Comment on attachment 8897860 [details] [diff] [review]
remove chaos mode support from test verification

Review of attachment 8897860 [details] [diff] [review]:
-----------------------------------------------------------------

ok, it was a good idea.
Attachment #8897860 - Flags: review?(jmaher) → review+
Keywords: leave-open
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/3539f73f0f04
Do not use chaos mode for test verification; r=jmaher
The main issue is seen here:

https://public-artifacts.taskcluster.net/Si-ZnmjIRQCW5cNsu5lRDw/0/public/logs/live_backing.log

[task 2017-09-25T14:57:32.092Z] 14:57:32     INFO - TEST-INFO | started process GECKO(12284)
[task 2017-09-25T14:57:32.136Z] 14:57:32     INFO - GECKO(12284) | *** You are running in chaos test mode. See ChaosMode.h. ***
[task 2017-09-25T14:57:33.289Z] 14:57:33     INFO - GECKO(12284) | 1506351453285	Marionette	INFO	Enabled via --marionette
[task 2017-09-25T14:57:35.114Z] 14:57:35     INFO - GECKO(12284) | 1506351455109	Marionette	INFO	Listening on port 2828
[task 2017-09-25T14:57:35.373Z] 14:57:35     INFO - GECKO(12284) | 1506351455366	Marionette	DEBUG	Register listener.js for window 2147483649
[task 2017-09-25T14:57:35.750Z] 14:57:35     INFO -  Traceback (most recent call last):
[task 2017-09-25T14:57:35.750Z] 14:57:35     INFO -    File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 2660, in doTests
[task 2017-09-25T14:57:35.750Z] 14:57:35     INFO -      marionette_args=marionette_args,
[task 2017-09-25T14:57:35.751Z] 14:57:35     INFO -    File "/builds/worker/workspace/build/tests/mochitest/runtests.py", line 2164, in runApp
[task 2017-09-25T14:57:35.751Z] 14:57:35     INFO -      addons.install(create_zip(self.mochijar))
[task 2017-09-25T14:57:35.752Z] 14:57:35     INFO -    File "/builds/worker/workspace/build/venv/local/lib/python2.7/site-packages/marionette_driver/addons.py", line 52, in install
[task 2017-09-25T14:57:35.752Z] 14:57:35     INFO -      raise AddonInstallException(e)
[task 2017-09-25T14:57:35.753Z] 14:57:35     INFO -  AddonInstallException: Could not install add-on at '/tmp/tmpxnNgyy.zip': UnknownError: ERROR_FILE_ACCESS: There was an error accessing the filesystem.
[task 2017-09-25T14:57:35.753Z] 14:57:35     INFO -  stacktrace:
[task 2017-09-25T14:57:35.755Z] 14:57:35     INFO -  	WebDriverError@chrome://marionette/content/error.js:239:5
[task 2017-09-25T14:57:35.756Z] 14:57:35     INFO -  	UnknownError@chrome://marionette/content/error.js:537:5
[task 2017-09-25T14:57:35.757Z] 14:57:35     INFO -  	addon.install@chrome://marionette/content/addon.js:101:11
[task 2017-09-25T14:57:35.758Z] 14:57:35     INFO -  	async*GeckoDriver.prototype.installAddon@chrome://marionette/content/driver.js:3326:10
[task 2017-09-25T14:57:35.759Z] 14:57:35     INFO -  	despatch@chrome://marionette/content/server.js:555:20
[task 2017-09-25T14:57:35.760Z] 14:57:35     INFO -  	async*execute@chrome://marionette/content/server.js:529:11
[task 2017-09-25T14:57:35.761Z] 14:57:35     INFO -  	async*onPacket/<@chrome://marionette/content/server.js:504:15
[task 2017-09-25T14:57:35.765Z] 14:57:35     INFO -  	async*onPacket@chrome://marionette/content/server.js:503:8
[task 2017-09-25T14:57:35.767Z] 14:57:35     INFO -  	_onJSONObjectReady/<@chrome://marionette/content/transport.js:501:9
[task 2017-09-25T14:57:35.767Z] 14:57:35    ERROR - Automation Error: Received unexpected exception while running application
[task 2017-09-25T14:57:35.771Z] 14:57:35    ERROR - 
[task 2017-09-25T14:57:35.772Z] 14:57:35     INFO - Stopping web server
[task 2017-09-25T14:57:35.773Z] 14:57:35     INFO - GECKO(12284) | 1506351455742	addons.xpi	WARN	Failed to install /tmp/tmpxnNgyy.zip from file:///tmp/tmpxnNgyy.zip to /tmp/tmpfIuYKv.mozrunner/extensions/staged/mochikit@mozilla.org.xpi: Unix error 4 during operation pump (Interrupted system call) ((unknown module)) No traceback available
[task 2017-09-25T14:57:35.774Z] 14:57:35     INFO - Stopping web socket server
Test chaos mode has a variety of features -- see ChaosMode.h. It seems like test verification remains reliable if only some features are enabled:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=2451d0577730036c61e1e70750546346da50a055&filter-tier=1&filter-tier=2&filter-tier=3

I'd like to land this, go to tier 2, then circle back here another day to figure out the issues and expand chaos mode support.
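The following sketch shows one way to enable "only some features": treat them as a bitmask and build one environment override per verification step. The feature names and bit values below are a hypothetical mirror of ChaosMode.h and should be checked against that header. Encoding the mask as a hex string in MOZ_CHAOS_MODE is also an assumption. This bug does not say which three subsets the landed patch uses, so `LIMITED_STEPS` is purely illustrative.

```python
from enum import IntFlag
from typing import Dict, List


class ChaosFeature(IntFlag):
    # Hypothetical mirror of the feature bits in ChaosMode.h; names and
    # values are illustrative and must be checked against that header.
    THREAD_SCHEDULING = 0x1
    NETWORK_SCHEDULING = 0x2
    TIMER_SCHEDULING = 0x4
    IO_AMOUNTS = 0x8
    HASH_TABLE_ITERATION = 0x10


def chaos_env(features: ChaosFeature) -> Dict[str, str]:
    """Return the environment override for a given feature subset.

    Assumption: Gecko reads MOZ_CHAOS_MODE as a hexadecimal feature mask.
    """
    return {"MOZ_CHAOS_MODE": format(int(features), "#x")}


# Illustrative limited steps: each enables a small subset of features
# rather than all of them at once.
LIMITED_STEPS: List[ChaosFeature] = [
    ChaosFeature.THREAD_SCHEDULING,
    ChaosFeature.TIMER_SCHEDULING,
    ChaosFeature.THREAD_SCHEDULING | ChaosFeature.TIMER_SCHEDULING,
]
```

Keeping each step to a narrow subset makes it easier to tell which feature is behind a failure. Support can then be widened one bit at a time as the underlying issues are fixed.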
Attachment #8912348 - Flags: review?(jmaher)
Comment on attachment 8912348 [details] [diff] [review]
add back limited (3) chaos mode steps

Review of attachment 8912348 [details] [diff] [review]:
-----------------------------------------------------------------

very cool
Attachment #8912348 - Flags: review?(jmaher) → review+
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/87291aa18bf0
Enable limited test chaos mode in test-verify; r=jmaher
Priority: -- → P3
Assignee: gbrown → nobody
The leave-open keyword is set and there has been no activity for 6 months.
:gbrown, maybe it's time to close this bug?
Flags: needinfo?(gbrown)
I hope to get to this in 2019.
Flags: needinfo?(gbrown)

The leave-open keyword is set and there has been no activity for 6 months.
:gbrown, maybe it's time to close this bug?

Flags: needinfo?(gbrown)

No, comment 11 still stands!

Flags: needinfo?(gbrown)
Keywords: leave-open