reduce web-platform-test TIMEOUT duration
Categories: Testing :: web-platform-tests, enhancement
Tracking: Not tracked
People: Reporter: egao; Assigned: egao
References: Blocks 1 open bug
Attachments: 1 obsolete file
Currently, web-platform-tests has hundreds of tests whose expected outcome is TIMEOUT.
This means that for each such test, execution has to wait the full duration of the TIMEOUT before moving on to the next test.
The duration of this TIMEOUT varies by platform, and is controlled in https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/browsers/firefox.py#46:
def get_timeout_multiplier(test_type, run_info_data, **kwargs):
    if kwargs["timeout_multiplier"] is not None:
        return kwargs["timeout_multiplier"]
    if test_type == "reftest":
        if run_info_data["debug"] or run_info_data.get("asan"):
            return 4
        else:
            return 2
    elif run_info_data["debug"] or run_info_data.get("asan"):
        if run_info_data.get("ccov"):
            return 4
        else:
            return 3
    elif run_info_data["os"] == "android":
        return 4
    # https://bugzilla.mozilla.org/show_bug.cgi?id=1538725
    elif run_info_data["os"] == "win" and run_info_data["processor"] == "aarch64":
        return 4
    return 1
For example, on windows10-aarch64, each expected-TIMEOUT test waits a full ~40,000ms before being marked as having produced the expected outcome. This is a lot of time wasted.
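The ~40,000ms figure above can be reproduced with a small sketch. The 10s/60s base values below are the usual wpt defaults for normal and long testharness tests; they are an assumption for illustration, not values stated in this bug.

```python
# Sketch of how the effective per-test timeout is derived: the platform
# multiplier from get_timeout_multiplier() scales a per-test base timeout.
BASE_TIMEOUT_S = 10   # assumed default for a normal testharness test
LONG_TIMEOUT_S = 60   # assumed default for a test marked timeout=long

def effective_timeout_ms(multiplier, long_timeout=False):
    """Return how long an expected-TIMEOUT test blocks the queue, in ms."""
    base = LONG_TIMEOUT_S if long_timeout else BASE_TIMEOUT_S
    return base * multiplier * 1000

# windows10-aarch64 has multiplier 4, giving the ~40,000ms wait noted above.
print(effective_timeout_ms(4))
```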
Comment 1•5 years ago
- multiplier is 10
Comment 2•5 years ago
I looked at a recent m-c push; we run wpt tests 26 times. I was curious about the expected outcomes of the tests, and I see that we have 4 types:
test-ok
test-skip
test-error
test-timeout
In this case all tests are green.
Using log lines from the raw logs, e.g.:
INFO - TEST-TIMEOUT | /content-security-policy/reporting/report-original-url.sub.html | took 10269ms
I can summarize all the time accrued:
| type | seconds | sec/test | # tests |
|---|---|---|---|
| timeout | 3852 | 13.47 | 286 |
| ok | 11907 | 0.73 | 16254 |
| skip | 0 | 0.00 | 724 |
| error | 406 | 3.94 | 103 |
| total | 16167 | 0.93 | 17367 |
In summary, we spend ~4.5 hours running tests and more than 1 of those hours is spent waiting on timeouts. This is win10-qr opt; I imagine the numbers are different for each config/platform. Simple math yields almost 28 hours of runtime for each m-c run just waiting on timeouts.
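The summary arithmetic can be re-derived from the table values. This is illustrative arithmetic only (values copied from the table above, not a log parser):

```python
# Re-derive the hours quoted above from the per-type totals.
timeout_s, timeout_tests = 3852, 286
total_s, total_tests = 16167, 17367
wpt_runs_per_push = 26  # task count quoted above for one m-c push

hours_total = total_s / 3600                 # total test runtime, ~4.5 h
hours_waiting = timeout_s / 3600             # time spent in timeouts, >1 h
hours_per_push = timeout_s * wpt_runs_per_push / 3600  # ~27.8 h per push

print(round(hours_total, 1), round(hours_waiting, 2), round(hours_per_push, 1))
```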
:jgraham, do you have other suggestions for how to reduce the wasted CPU hours spent waiting on timeouts if reducing the timeout isn't realistic?
Comment 3•5 years ago
So there's a hidden assumption here that time spent waiting on timeouts is entirely wasted. That's not true e.g. in [1] we actually get a bunch of subtest results even though we end up timing out the whole test. Note that we don't record subtests that pass in the ini file, so it's hard to tell just from looking at that where we have cases where the overall test is timing out but some subtests are actually passing.
That said it's obviously true that some fraction of the time we're spending waiting on tests that will always time out is wasted. There are several classes of issue here:
- Tests that TIMEOUT due to a test bug. In this case we are likely missing important test coverage and should fix the test so it doesn't time out.
- Tests that TIMEOUT when they should ERROR. In particular it seems like we don't catch JS exceptions in reftests, so some reftests will wait for the TIMEOUT when they are never going to complete. This seems like a bug we could fix (although technically it's a breaking change, and the most obvious approaches seem racy, so some care is required).
- Tests that TIMEOUT because they are slow to run. These should either be split up or given a longer timeout so that we get a meaningful result out of them.
- Tests that TIMEOUT as a failure mode when they are waiting on an event that never comes. This is a reasonable pattern (the alternative is to set a shorter timeout in the test itself, but that leads to the question of how to pick that shorter timeout, and it turns out there isn't a good way to do it in general).
Possibly in the last case where we know we don't implement the feature and we don't intend to soon, it makes sense to have a shorter timeout. But it makes just as much sense to disable the test in that case, and hope that if we ever change our minds about implementation we remember to re-enable the test.
I guess this isn't the answer you're looking for. "Triage 300 tests" isn't going to give us quick wins in the same way that hacking in a smaller timeout, or disabling all the tests, would. But I think it's a necessary step, and doing it might help us discover a more scalable approach to this problem that could apply in at least some cases.
Comment 4•5 years ago
(In reply to James Graham [:jgraham] from comment #3)
> So there's a hidden assumption here that time spent waiting on timeouts is entirely wasted. That's not true e.g. in [1] we actually get a bunch of subtest results even though we end up timing out the whole test. Note that we don't record subtests that pass in the ini file, so it's hard to tell just from looking at that where we have cases where the overall test is timing out but some subtests are actually passing.
> That said it's obviously true that some fraction of the time we're spending waiting on tests that will always time out is wasted. There are several classes of issue here:
> - Tests that TIMEOUT due to a test bug. In this case we are likely missing important test coverage and should fix the test so it doesn't time out.
> - Tests that TIMEOUT when they should ERROR. In particular it seems like we don't catch JS exceptions in reftests, so some reftests will wait for the TIMEOUT when they are never going to complete. This seems like a bug we could fix (although technically it's a breaking change, and the most obvious approaches seem racy, so some care is required).
> - Tests that TIMEOUT because they are slow to run. These should either be split up or given a longer timeout so that we get a meaningful result out of them.
> - Tests that TIMEOUT as a failure mode when they are waiting on an event that never comes. This is a reasonable pattern (the alternative is to set a shorter timeout in the test itself, but that leads to the question of how to pick that shorter timeout, and it turns out there isn't a good way to do it in general).
> Possibly in the last case where we know we don't implement the feature and we don't intend to soon, it makes sense to have a shorter timeout. But it makes just as much sense to disable the test in that case, and hope that if we ever change our minds about implementation we remember to re-enable the test.
> I guess this isn't the answer you're looking for. "Triage 300 tests" isn't going to give us quick wins in the same way that hacking in a smaller timeout, or disabling all the tests, would. But I think it's a necessary step, and doing it might help us discover a more scalable approach to this problem that could apply in at least some cases.
As a counterpoint, I'd like to attempt to demonstrate that my changes are safe, at least in the context of the results we are currently getting.
Try
intermittent failures
In the try push above, approximately half of the failures are intermittents, e.g. test-windows10-64/debug-web-platform-tests-e10s-8, which when retriggered ran to a PASS state. Other examples of intermittents are test-windows10-64-shippable/opt-web-platform-tests-e10s-5 and test-android-em-7.0-x86/opt-web-platform-tests-1proc-12.
In the case of test-windows10-64/debug-web-platform-tests-e10s-8, the intermittent failure was a TEST-UNEXPECTED-PASS where TEST-TIMEOUT was expected. This can likely be attributed to my changes, though I am not certain how a test expected to TIMEOUT could instead PASS. Case-by-case review of the manifest may be required.
permafails
There appear to be permafails in the try push above, such as test-windows7-32/debug-web-platform-tests-e10s-3 or test-windows7-32/debug-web-platform-tests-e10s-10. These have been replicated on a clean try push on the same base revision, so I believe these failures are not caused by my changes.
Specific examples
testing/web-platform/tests/WebCryptoAPI/wrapKey_unwrapKey/wrapKey_unwrapKey.https.worker.js
This test passes with my TIMEOUT-shortening changes applied in my local environment:
(python27_development) egao-44538:mozilla-central egao$ clear && ./mach wpt --headless --no-manifest-update testing/web-platform/tests/WebCryptoAPI/wrapKey_unwrapKey/wrapKey_unwrapKey.https.worker.js
0:00.12 mozversion INFO application_buildid: 20190513135754
0:00.12 mozversion INFO application_changeset: cb5734727c0a9d88e3e236a3e4a741a051509e0e
0:00.12 mozversion INFO application_display_name: Nightly
0:00.12 mozversion INFO application_id: {ec8030f7-c20a-464f-9b0e-13a3a9e97384}
0:00.12 mozversion INFO application_name: Firefox
0:00.12 mozversion INFO application_remotingname: firefox
0:00.12 mozversion INFO application_vendor: Mozilla
0:00.12 mozversion INFO application_version: 68.0a1
0:00.12 mozversion INFO platform_buildid: 20190513135754
0:00.12 mozversion INFO platform_changeset: cb5734727c0a9d88e3e236a3e4a741a051509e0e
0:00.12 mozversion INFO platform_version: 68.0a1
0:01.49 INFO Using 1 client processes
0:04.53 INFO Starting http server on 127.0.0.1:8000
0:04.53 INFO Starting http server on 127.0.0.1:8001
0:04.54 INFO Starting https server on 127.0.0.1:8443
0:04.69 SUITE_START: web-platform-test - running 1 tests
0:04.69 INFO Running reftest tests
0:04.69 INFO No reftest tests to run
0:04.69 INFO Running wdspec tests
0:04.70 INFO No wdspec tests to run
0:04.70 INFO Running testharness tests
0:04.74 INFO Setting up ssl
0:04.92 certutil Full command: /Users/egao/src/mozilla-central/objdir-desktop/dist/Nightly.app/Contents/MacOS/certutil -N -d /var/folders/m5/t_7g_qbx5lbcbpdgpy1z2_lr0000gn/T/tmp0YaeEh.mozrunner -f /var/folders/m5/t_7g_qbx5lbcbpdgpy1z2_lr0000gn/T/tmp0YaeEh.mozrunner/.crtdbpw
certutil
0:04.96 certutil
0:04.99 certutil
Certificate Nickname Trust Attributes
SSL,S/MIME,JAR/XPI
web-platform-tests CT,,
0:04.99 INFO Application command: /Users/egao/src/mozilla-central/objdir-desktop/dist/Nightly.app/Contents/MacOS/firefox --marionette about:blank -foreground -profile /var/folders/m5/t_7g_qbx5lbcbpdgpy1z2_lr0000gn/T/tmp0YaeEh.mozrunner
0:04.99 INFO Starting runner
0:05.18 pid:22942 Full command: /Users/egao/src/mozilla-central/objdir-desktop/dist/Nightly.app/Contents/MacOS/firefox --marionette about:blank -foreground -profile /var/folders/m5/t_7g_qbx5lbcbpdgpy1z2_lr0000gn/T/tmp0YaeEh.mozrunner
pid:22942 *** You are running in headless mode.
0:05.79 pid:22942 1558021692671 addons.webextension.screenshots@mozilla.org WARN Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: mozillaAddons
0:05.79 pid:22942 1558021692672 addons.webextension.screenshots@mozilla.org WARN Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: resource://pdf.js/
0:05.79 pid:22942 1558021692672 addons.webextension.screenshots@mozilla.org WARN Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: about:reader*
0:05.86 pid:22942 1558021692734 addons.webextension.screenshots@mozilla.org ERROR Loading extension 'screenshots@mozilla.org': Loading locale file _locales/en_GB/messages.json: Error: Error while loading 'file:///Users/egao/src/mozilla-central/objdir-desktop/dist/Nightly.app/Contents/Resources/browser/features/screenshots@mozilla.org/_locales/en_GB/messages.json' (NS_ERROR_FILE_NOT_FOUND)
0:05.86 pid:22942 1558021692734 addons.xpi-utils WARN addMetadata: Add-on screenshots@mozilla.org is invalid: Error: Extension is invalid(resource://gre/modules/addons/XPIInstall.jsm:426:17) JS Stack trace: loadManifestFromWebManifest@XPIInstall.jsm:426:17
0:05.86 pid:22942 async*loadManifest@XPIInstall.jsm:568:19
0:05.86 pid:22942 async*syncLoadManifest/<@XPIInstall.jsm:653:25
0:05.86 pid:22942 syncLoadManifest@XPIInstall.jsm:659:5
0:05.86 pid:22942 addMetadata@XPIDatabase.jsm:2413:32
0:05.86 pid:22942 processFileChanges@XPIDatabase.jsm:2779:26
0:05.86 pid:22942 checkForChanges@XPIProvider.jsm:2722:55
0:05.86 pid:22942 startup@XPIProvider.jsm:2265:12
0:05.86 pid:22942 callProvider@AddonManager.jsm:193:31
0:05.86 pid:22942 _startProvider@AddonManager.jsm:568:5
0:05.86 pid:22942 startup@AddonManager.jsm:723:14
0:05.86 pid:22942 startup@AddonManager.jsm:2731:26
0:05.86 pid:22942 observe@addonManager.js:65:29
0:05.86 pid:22942 1558021692735 addons.xpi-utils WARN Could not uninstall invalid item from locked install location
0:07.34 pid:22942 JavaScript error: resource:///modules/BrowserGlue.jsm, line 1543: TypeError: addon is null
0:07.77 pid:22942 1558021694644 Marionette INFO Listening on port 2828
0:08.02 TEST_START: /WebCryptoAPI/wrapKey_unwrapKey/wrapKey_unwrapKey.https.worker.html
0:08.18 INFO 6.0
0:14.54 TEST_END: TIMEOUT
0:14.54 INFO No more tests
0:14.57 pid:22942 1558021701445 Marionette INFO Stopped listening on port 2828
0:15.20 INFO Browser exited with return code 0
0:15.20 INFO PROCESS LEAKS None
0:15.20 INFO Closing logging queue
0:15.20 INFO queue closed
0:15.20 INFO PROCESS LEAKS None
0:15.21 INFO Got 0 unexpected results
0:15.21 SUITE_END
web-platform-test
~~~~~~~~~~~~~~~~~
Ran 1 checks (1 tests)
Expected results: 1
OK
0:15.23 INFO Closing logging queue
0:15.23 INFO queue closed
I attempted to find the same test on linux64/opt and windows7-32/debug, but was unable to.
windows10-aarch64
This platform is known to be slow, hence the 4x multiplier it was given. If the TIMEOUT reduction were too aggressive, this platform should be one of the first to show it.
With this change, TIMEOUT on windows10-aarch64 is reduced from ~40,000ms to ~4000ms.
In the try push above, results are identical to mozilla-central and a clean try push. The failures in wpt3 and wpt10 are permafails.
My observation
From the data points above, the spot-checking done locally with the changes applied, and the comparison against a clean try push, I feel that this change is safe. Additional try pushes with less aggressive TIMEOUT reductions could be done to further demonstrate the sanity of this change.
Comment 5•5 years ago
In the case where there are subtests that are expected to have some result but where they simply don't run due to the shortened timeout we often won't detect a regression; unless the subtests are all declared up-front the fact that they are missing isn't considered a problem by the harness (this allows e.g. slightly different test names on different platforms). So if some crypto test goes from running N to N/10 subtests that isn't going to show up as a failure in the above try push. Therefore I don't think it's sufficient to demonstrate a lack of problems with the approach.
Also note in the logs that every TEST-TIMEOUT now causes a browser restart (because if the harness triggers a timeout, rather than the JS code, it assumes the browser is unresponsive), which is likely eating into the claimed time savings.
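The detection gap described in comment 5 can be illustrated with a toy sketch. The dict-based harness model here is purely illustrative, not wptrunner's actual data structures: a subtest that never runs produces no result at all, so a status comparison has nothing to flag.

```python
# Model: the harness flags subtests whose status mismatches expectations,
# but a subtest absent from the results is simply not checked.
expected = {"subtest-%d" % i: "PASS" for i in range(10)}

def unexpected_results(expected, actual):
    """Mimic a harness that only flags mismatched statuses, not absences."""
    return [name for name, status in actual.items()
            if expected.get(name) != status]

full_run = {name: "PASS" for name in expected}              # all 10 subtests ran
truncated = {"subtest-%d" % i: "PASS" for i in range(3)}    # timeout cut off 7

# Both runs report zero unexpected results, even though the truncated run
# silently lost 7 subtests' worth of coverage.
print(len(unexpected_results(expected, full_run)),
      len(unexpected_results(expected, truncated)))
```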
Comment 6•5 years ago
Also, what I am doing in this patch is essentially applying an implicit --timeout-multiplier only to tests that have TIMEOUT as the expected outcome. --timeout-multiplier is a supported command-line argument, and if I pass a value like --timeout-multiplier 0.1, it produces basically the same outcome as my change:
(python27_development) egao-44538:mozilla-central egao$ clear && ./mach wpt --timeout-multiplier 0.1 --headless --no-manifest-update testing/web-platform/tests/WebCryptoAPI/wrapKey_unwrapKey/wrapKey_unwrapKey.https.worker.js
0:00.11 mozversion INFO application_buildid: 20190513135754
0:00.11 mozversion INFO application_changeset: cb5734727c0a9d88e3e236a3e4a741a051509e0e
0:00.11 mozversion INFO application_display_name: Nightly
0:00.11 mozversion INFO application_id: {ec8030f7-c20a-464f-9b0e-13a3a9e97384}
0:00.11 mozversion INFO application_name: Firefox
0:00.11 mozversion INFO application_remotingname: firefox
0:00.11 mozversion INFO application_vendor: Mozilla
0:00.11 mozversion INFO application_version: 68.0a1
0:00.11 mozversion INFO platform_buildid: 20190513135754
0:00.11 mozversion INFO platform_changeset: cb5734727c0a9d88e3e236a3e4a741a051509e0e
0:00.11 mozversion INFO platform_version: 68.0a1
0:01.45 INFO Using 1 client processes
0:06.09 INFO Starting http server on 127.0.0.1:8001
0:06.09 INFO Starting http server on 127.0.0.1:8000
0:06.11 INFO Starting https server on 127.0.0.1:8443
0:06.26 SUITE_START: web-platform-test - running 1 tests
0:06.26 INFO Running reftest tests
0:06.27 INFO No reftest tests to run
0:06.27 INFO Running wdspec tests
0:06.27 INFO No wdspec tests to run
0:06.27 INFO Running testharness tests
0:06.30 INFO Setting up ssl
0:06.46 certutil Full command: /Users/egao/src/mozilla-central/objdir-desktop/dist/Nightly.app/Contents/MacOS/certutil -N -d /var/folders/m5/t_7g_qbx5lbcbpdgpy1z2_lr0000gn/T/tmpGDFeo5.mozrunner -f /var/folders/m5/t_7g_qbx5lbcbpdgpy1z2_lr0000gn/T/tmpGDFeo5.mozrunner/.crtdbpw
certutil
0:06.50 certutil
0:06.54 certutil
Certificate Nickname Trust Attributes
SSL,S/MIME,JAR/XPI
web-platform-tests CT,,
0:06.55 INFO Application command: /Users/egao/src/mozilla-central/objdir-desktop/dist/Nightly.app/Contents/MacOS/firefox --marionette about:blank -foreground -profile /var/folders/m5/t_7g_qbx5lbcbpdgpy1z2_lr0000gn/T/tmpGDFeo5.mozrunner
0:06.56 INFO Starting runner
0:06.71 pid:23787 Full command: /Users/egao/src/mozilla-central/objdir-desktop/dist/Nightly.app/Contents/MacOS/firefox --marionette about:blank -foreground -profile /var/folders/m5/t_7g_qbx5lbcbpdgpy1z2_lr0000gn/T/tmpGDFeo5.mozrunner
pid:23787 *** You are running in headless mode.
0:07.21 pid:23787 1558024656004 addons.webextension.screenshots@mozilla.org WARN Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: mozillaAddons
0:07.21 pid:23787 1558024656004 addons.webextension.screenshots@mozilla.org WARN Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: resource://pdf.js/
0:07.21 pid:23787 1558024656004 addons.webextension.screenshots@mozilla.org WARN Loading extension 'screenshots@mozilla.org': Reading manifest: Invalid extension permission: about:reader*
0:07.28 pid:23787 1558024656072 addons.webextension.screenshots@mozilla.org ERROR Loading extension 'screenshots@mozilla.org': Loading locale file _locales/en_GB/messages.json: Error: Error while loading 'file:///Users/egao/src/mozilla-central/objdir-desktop/dist/Nightly.app/Contents/Resources/browser/features/screenshots@mozilla.org/_locales/en_GB/messages.json' (NS_ERROR_FILE_NOT_FOUND)
0:07.28 pid:23787 1558024656072 addons.xpi-utils WARN addMetadata: Add-on screenshots@mozilla.org is invalid: Error: Extension is invalid(resource://gre/modules/addons/XPIInstall.jsm:426:17) JS Stack trace: loadManifestFromWebManifest@XPIInstall.jsm:426:17
0:07.28 pid:23787 async*loadManifest@XPIInstall.jsm:568:19
0:07.28 pid:23787 async*syncLoadManifest/<@XPIInstall.jsm:653:25
0:07.28 pid:23787 syncLoadManifest@XPIInstall.jsm:659:5
0:07.28 pid:23787 addMetadata@XPIDatabase.jsm:2424:32
0:07.28 pid:23787 processFileChanges@XPIDatabase.jsm:2790:26
0:07.28 pid:23787 checkForChanges@XPIProvider.jsm:2722:55
0:07.28 pid:23787 startup@XPIProvider.jsm:2265:12
0:07.28 pid:23787 callProvider@AddonManager.jsm:193:31
0:07.28 pid:23787 _startProvider@AddonManager.jsm:568:5
0:07.28 pid:23787 startup@AddonManager.jsm:723:14
0:07.28 pid:23787 startup@AddonManager.jsm:2778:26
0:07.28 pid:23787 observe@addonManager.js:65:29
0:07.28 pid:23787 1558024656073 addons.xpi-utils WARN Could not uninstall invalid item from locked install location
0:08.74 pid:23787 JavaScript error: resource:///modules/BrowserGlue.jsm, line 1545: TypeError: addon is null
0:08.86 pid:23787 JavaScript error: chrome://extensions/content/parent/ext-webRequest.js, line 94: NS_ERROR_FILE_NOT_FOUND
0:08.86 pid:23787 JavaScript error: chrome://extensions/content/parent/ext-webRequest.js, line 94: NS_ERROR_FILE_NOT_FOUND
0:09.26 pid:23787 1558024658048 Marionette INFO Listening on port 2828
0:09.47 pid:23787 JavaScript error: resource://gre/modules/RemoteController.js, line 74: TypeError: aEnabledCommands is not iterable
0:09.61 pid:23787 JavaScript error: resource://gre/modules/RemoteController.js, line 74: TypeError: aEnabledCommands is not iterable
0:09.67 TEST_START: /WebCryptoAPI/wrapKey_unwrapKey/wrapKey_unwrapKey.https.worker.html
0:09.84 pid:23787 JavaScript error: resource://gre/modules/RemoteController.js, line 74: TypeError: aEnabledCommands is not iterable
0:09.85 INFO timeout is6.0
0:09.85 INFO total timeout is11.0
0:09.97 pid:23787 JavaScript error: resource://gre/modules/RemoteController.js, line 74: TypeError: aEnabledCommands is not iterable
0:09.98 pid:23787 JavaScript error: resource://gre/modules/RemoteController.js, line 74: TypeError: aEnabledCommands is not iterable
0:10.07 pid:23787 JavaScript error: resource://gre/modules/RemoteController.js, line 74: TypeError: aEnabledCommands is not iterable
0:16.20 TEST_END: Test TIMEOUT. Subtests passed 167/257. Unexpected 0
0:16.21 INFO No more tests
0:16.25 pid:23787 1558024665038 Marionette INFO Stopped listening on port 2828
0:16.42 pid:23787 JavaScript error: resource://gre/modules/RemoteController.js, line 74: TypeError: aEnabledCommands is not iterable
0:16.42 pid:23787 JavaScript error: resource://gre/modules/RemoteController.js, line 74: TypeError: aEnabledCommands is not iterable
0:16.42 pid:23787 JavaScript error: resource://gre/modules/RemoteController.js, line 74: TypeError: aEnabledCommands is not iterable
0:17.39 INFO Browser exited with return code 0
0:17.39 INFO PROCESS LEAKS None
0:17.39 INFO Closing logging queue
0:17.39 INFO queue closed
0:17.39 INFO PROCESS LEAKS None
0:17.39 INFO Got 0 unexpected results
0:17.39 SUITE_END
web-platform-test
~~~~~~~~~~~~~~~~~
Ran 258 checks (257 subtests, 1 tests)
Expected results: 258
OK
0:17.41 INFO Closing logging queue
0:17.41 INFO queue closed
If the provision to change the multiplier globally already exists, I am not sure how changing it only for expected-TIMEOUT tests would really affect the integrity of the tests.
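The approach comment 6 describes can be sketched as follows. The function and argument names here are illustrative, not wptrunner's actual API; the point is that the reduction applies per-test rather than globally.

```python
# Sketch: shrink the timeout only for tests whose expected outcome is
# already TIMEOUT, leaving all other tests at the platform multiplier.
REDUCED_MULTIPLIER = 0.1  # the value experimented with above

def multiplier_for_test(expected_outcome, platform_multiplier):
    """Return the multiplier to use for a single test."""
    if expected_outcome == "TIMEOUT":
        return platform_multiplier * REDUCED_MULTIPLIER
    return platform_multiplier

# windows10-aarch64 (4x platform multiplier, assumed 10s base timeout):
# expected-TIMEOUT tests drop from ~40000ms to ~4000ms, as noted in comment 4.
print(multiplier_for_test("TIMEOUT", 4) * 10 * 1000)
print(multiplier_for_test("PASS", 4) * 10 * 1000)
```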
Comment 7•5 years ago
Changing the timeout multiplier certainly can affect the validity of the tests. It's there as a knob to turn to account for the different performance of different systems, but it can obviously be set to values that break tests (e.g. setting it to 0 will cause all tests to time out right away).
Comment 8•5 years ago
I made a spreadsheet showing tests that time out in a recent m-c [1]. We can use that to triage the TIMEOUTs; at first glance it seems like there are quite a few cases where the tests are broken and need to be fixed.