Open Bug 1425595 Opened 6 years ago Updated 2 years ago

Missing coverage on opt-web-platform-tests-wdspec-e10s (Wd)

Categories

(Testing :: Code Coverage, enhancement)

Tracking

(Not tracked)

People

(Reporter: ekyle, Unassigned)

References

(Blocks 1 open bug)

Details

ato says:

> The code coverage results for testing/marionette are lacking coverage
> from the WPT WebDriver conformance test suite.  This job is called 
> opt-web-platform-tests-wdspec-e10s (Wd).  For the coverage results
> to be useful for this component, we would have to include this
> test job.

This data came from ActiveData, so any part of the ETL pipeline may be in error.
Does ActiveData ignore failing test chunks? Or does it collect coverage for failing test chunks too?
There’s only one failing test in that run, which due to the nature
of slow builds could be unstable because it is an interaction test.
We will want to include failing jobs in the code coverage, if this
isn’t the case.
(In reply to Andreas Tolfsen [:ato] from comment #3)
> There’s only one failing test in that run, which due to the nature
> of slow builds could be unstable because it is an interaction test.
> We will want to include failing jobs in the code coverage, if this
> isn’t the case.

Failing jobs are included in the codecov.io reports; I don't know about ActiveData.
Kyle?
Flags: needinfo?(klahnakoski)
ActiveData ingests all the coverage artifacts; it does not look at the job status.

:ato, what file were we looking at again?
Flags: needinfo?(klahnakoski) → needinfo?(ato)
So for example, the GeckoDriver#setWindowRect function [1] is invoked
through the set_window_rect.py WPT test [2].  You can run this file
locally with the following incantation:

	./mach wpt testing/web-platform/tests/webdriver/tests/set_window_rect.py 

It is interesting that the Mn job has another test [3] that also
calls this function.  Maybe there is something more sinister at play
here?

  [1] https://codecov.io/gh/marco-c/gecko-dev/src/3ec05888ca32b2d8a14d700474efb0c63411fca2/testing/marionette/driver.js#L1426
  [2] https://searchfox.org/mozilla-central/source/testing/web-platform/tests/webdriver/tests/set_window_rect.py#14
  [3] https://searchfox.org/mozilla-central/source/testing/marionette/harness/marionette_harness/tests/unit/test_window_rect.py
Flags: needinfo?(ato)
I've run the test locally and the function is shown as covered.

I've also noticed it is covered in this build (https://codecov.io/gh/marco-c/gecko-dev/src/b56a0f8804e00086266672540d7eacb63ae3dbf1/testing/marionette/driver.js#L1407).

It isn't covered in this other build (https://codecov.io/gh/marco-c/gecko-dev/src/0d112b8fadbe994214d9419944ee1e4fba987226/testing/marionette/driver.js#L1407) where Wd failed with:
> [task 2018-01-31T19:13:36.165Z] 19:13:36     INFO - Automation Error: mozprocess timed out after 1000 seconds running ['/builds/worker/workspace/build/venv/bin/python', '-u', '/builds/worker/workspace/build/tests/web-platform/runtests.py', '--log-raw=-', '--log-raw=/builds/worker/workspace/build/blobber_upload_dir/wpt_raw.log', '--log-wptreport=/builds/worker/workspace/build/blobber_upload_dir/wptreport.json', '--log-errorsummary=/builds/worker/workspace/build/blobber_upload_dir/wpt_errorsummary.log', '--binary=/builds/worker/workspace/build/application/firefox/firefox', '--symbols-path=https://queue.taskcluster.net/v1/task/AhYwkCQ3RIqvvAf-gbx7_Q/artifacts/public/build/target.crashreporter-symbols.zip', '--stackwalk-binary=/usr/local/bin/linux64-minidump_stackwalk', '--stackfix-dir=/builds/worker/workspace/build/tests/bin', '--run-by-dir=3', '--no-pause-after-test', '--test-type=wdspec', '--stylo-threads=4', '--webdriver-binary=/builds/worker/workspace/build/tests/bin/geckodriver', '--prefs-root=/builds/worker/workspace/build/tests/web-platform/prefs', '--processes=1', '--config=/builds/worker/workspace/build/tests/web-platform/wptrunner.ini', '--ca-cert-path=/builds/worker/workspace/build/tests/web-platform/certs/cacert.pem', '--host-key-path=/builds/worker/workspace/build/tests/web-platform/certs/web-platform.test.key', '--host-cert-path=/builds/worker/workspace/build/tests/web-platform/certs/web-platform.test.pem', '--certutil-binary=/builds/worker/workspace/build/tests/bin/certutil']
> [task 2018-01-31T19:13:36.171Z] 19:13:36    ERROR - timed out after 1000 seconds of no output
> [task 2018-01-31T19:13:36.171Z] 19:13:36    ERROR - Return code: -15
> [task 2018-01-31T19:13:36.171Z] 19:13:36    ERROR - No suite end message was emitted by this harness.
> [task 2018-01-31T19:13:36.172Z] 19:13:36    ERROR - # TBPL FAILURE #

How are we handling the process on timeout? Are we abruptly killing it? It's possible that, if we are abruptly killing it, we could lose coverage.
(In reply to Marco Castelluccio [:marco] from comment #7)
> How are we handling the process on timeout? Are we abruptly killing it? It's
> possible that, if we are abruptly killing it, we could lose coverage.

We are indeed losing coverage, there are no gcda or jsvm info files generated.
The first question still stands, are we killing the process or are we letting it run and terminating the job? If we are killing it, how are we killing it?
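
To illustrate why an abrupt kill could lose coverage: as far as I know, gcov-style instrumentation writes its counters when the process exits normally, so a process terminated by an unhandled signal never reaches that step. The Python sketch below is only an analogy, not the Firefox or wptrunner code. An `atexit` hook stands in for the coverage dump, and the names `CHILD` and `dump_survives_sigterm` are made up for this example. It assumes a POSIX system.

```python
import os
import subprocess
import sys
import tempfile

# Source of a small child process (hypothetical, for illustration only).
CHILD = r"""
import atexit, signal, sys, time
path, handle_term = sys.argv[1], sys.argv[2] == "1"
# Stand-in for the coverage dump that normally happens at process exit.
atexit.register(lambda: open(path, "w").write("coverage data"))
if handle_term:
    # Turn SIGTERM into a normal exit so the atexit hook still runs.
    signal.signal(signal.SIGTERM, lambda signum, frame: sys.exit(0))
print("ready", flush=True)
time.sleep(60)
"""


def dump_survives_sigterm(handle_term):
    """Return True if the child's exit-time dump exists after SIGTERM."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "dump.txt")
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD, path, "1" if handle_term else "0"],
            stdout=subprocess.PIPE,
        )
        proc.stdout.readline()  # wait until the child has finished its setup
        proc.terminate()        # sends SIGTERM on POSIX
        proc.wait(timeout=30)
        proc.stdout.close()
        return os.path.exists(path)
```

Calling `dump_survives_sigterm(False)` models the unhandled-signal case, where the exit hook never runs; `dump_survives_sigterm(True)` models a process that converts SIGTERM into a normal exit and so still writes its data.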
(In reply to Marco Castelluccio [:marco] from comment #8)
> (In reply to Marco Castelluccio [:marco] from comment #7)
> > How are we handling the process on timeout? Are we abruptly killing it? It's
> > possible that, if we are abruptly killing it, we could lose coverage.
> 
> We are indeed losing coverage, there are no gcda or jsvm info files
> generated.
> The first question still stands, are we killing the process or are we
> letting it run and terminating the job? If we are killing it, how are we
> killing it?

James, do you know?
Flags: needinfo?(james)
The logging

> [task 2018-01-31T19:13:36.171Z] 19:13:36    ERROR - timed out after 1000 seconds of no output
> [task 2018-01-31T19:13:36.171Z] 19:13:36    ERROR - Return code: -15

indicates that taskcluster is killing the whole testrunner with SIGTERM. Generally wpt tries to do a graceful shutdown by first requesting an in-app shutdown and then falling back on SIGTERM and SIGKILL only if required. But the shutdown for wdspec tests is a little different; it looks like we don't necessarily try for a more graceful shutdown than sending SIGTERM in that case. So there is possibly a bug to fix here, but in the specific case mentioned it's nothing to do with wpt and it's not really fixable.
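
A minimal sketch of the escalation described above (in-app shutdown request first, then SIGTERM, then SIGKILL), written in Python with only the standard library. It is not the wptrunner implementation; `stop_process`, `request_shutdown`, and the `grace` timeout are hypothetical names chosen for illustration, and POSIX signal semantics are assumed.

```python
import signal
import subprocess
import sys


def stop_process(proc, request_shutdown=None, grace=5.0):
    """Stop proc, escalating only as far as needed.

    Returns the name of the step after which the process exited.
    """
    steps = []
    if request_shutdown is not None:
        # Hypothetical in-app shutdown hook supplied by the caller.
        steps.append(("in-app", request_shutdown))
    steps.append(("SIGTERM", proc.terminate))
    steps.append(("SIGKILL", proc.kill))

    for name, action in steps:
        if proc.poll() is not None:
            return "already-exited"
        action()
        try:
            # Give the process some time to exit before escalating.
            proc.wait(timeout=grace)
            return name
        except subprocess.TimeoutExpired:
            continue

    # SIGKILL cannot be ignored, so this wait only covers a slow reap.
    proc.wait()
    return "SIGKILL"
```

On POSIX, a child that is stopped at the SIGTERM step reports a return code of -15, the same value seen in the log above.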
Flags: needinfo?(james)
Severity: normal → S3