Closed
Bug 1405428
Opened 7 years ago
Closed 6 years ago
Run test-verify on all supported tests to find and eliminate existing failures
Categories
(Testing :: General, defect, P3)
Testing
General
Tracking
(firefox63 fixed)
RESOLVED
FIXED
mozilla63
Tracking | Status | |
---|---|---|
firefox63 | --- | fixed |
People
(Reporter: gbrown, Assigned: jmaher)
References
(Blocks 1 open bug)
Details
Attachments
(5 files, 3 obsolete files)
164.89 KB,
patch
|
gbrown
:
review+
|
Details | Diff | Splinter Review |
10.75 KB,
patch
|
gbrown
:
review+
|
Details | Diff | Splinter Review |
16.51 KB,
patch
|
gbrown
:
review+
|
Details | Diff | Splinter Review |
35.62 KB,
patch
|
gbrown
:
review+
|
Details | Diff | Splinter Review |
2.20 KB,
patch
|
gbrown
:
review+
|
Details | Diff | Splinter Review |
Follow-up on concern raised in bug 1405338. I'm not sure how feasible this is, but I should have a look at it.
Assignee | ||
Comment 1•6 years ago
|
||
I would imagine we would need to run tests in smaller chunks to make this happen, if we could have a mode where this could happen fairly easily and a way to parse logs or get results, this could be a trackable solution. a few thoughts here: 1) get the list of failing tests and annotate manifests so we can know what to expect as 'ignore TV'- that is easier to query and reduce 2) go suite by suite (like xpcshell, then devtools, etc.) and get all tests green, maybe 1 suite/quarter. 3) possibly enable a simple rule at first and make that single simple rule tier-1 while we work on greening up the rest. I think overall we need to have a starting point of what we expect as passing and failing.
Assignee | ||
Updated•6 years ago
|
Whiteboard: [PI:January]
Reporter | ||
Updated•6 years ago
|
Priority: -- → P3
Comment 2•6 years ago
|
||
Moving this to March since it's a P3 - :gbrown, if you prefer it to be on your February list then go ahead and change the whiteboard tag thanks :)
Flags: needinfo?(gbrown)
Whiteboard: [PI:January] → [PI:February]
Reporter | ||
Comment 3•6 years ago
|
||
I don't have any specific plans for working on this in the near future.
Flags: needinfo?(gbrown)
Whiteboard: [PI:February] → [PI:March]
Assignee | ||
Updated•6 years ago
|
Whiteboard: [PI:March]
Reporter | ||
Comment 4•6 years ago
|
||
We can now mark tests that fail test-verify as skip-if = 'verify' -- that might be useful here (but should be used with caution).
Reporter | ||
Comment 5•6 years ago
|
||
We have sometimes thought of this bug as the major impediment to running TV as tier 1, but that is a misconception. Even if we identify all the current TV failures, new test failures may be introduced by product changes: Since tests are not modified, TV will not run against the broken tests...until the test is modified. Then again, we are back to the place where a TV failure does not always indicate a fault in the push.
Assignee | ||
Comment 6•6 years ago
|
||
good point, I hadn't thought through the the end of the solution like you did. How could we measure how often this happens in our current state of tier-2? If we could do that, it would help understand the risk or the solution to it. Possibly having TV-backfill(<testname>) implemented in treeherder would give us the ability to find the offending cause or |skip-if = verify|.
Assignee | ||
Comment 7•6 years ago
|
||
if we can easily run the failing test on the previous push, then this would help us determine if this is test related or product related and we could find the root cause if we want to backfill far enough :)
Assignee | ||
Comment 8•6 years ago
|
||
I filed bug 1465117 to do the work regarding running test-verify on a previous revision.
Assignee | ||
Comment 9•6 years ago
|
||
Reporter | ||
Comment 10•6 years ago
|
||
Comment on attachment 8984506 [details] [diff] [review] annotate mochitest files for linux and some osx/windows Review of attachment 8984506 [details] [diff] [review]: ----------------------------------------------------------------- I spot-checked some changes -- certainly not an exhaustive review -- and what I looked at looked fine - bravo! One minor curiosity...Most TV failure bugs that I have seen have been TV failures on all/most platforms: If it fails on Linux, I expect to see failures on Windows/osx/Android, and on opt/debug/pgo - usually. Many of your annotations are quite specific: verify && debug && (os == linux). And many seem to be debug-only failures. Is that right? Do you have any insight into why? And one big concern: I'm not *sure* I see the point of this bug (which I filed!). Let's briefly pause to consider whether this is a good idea. On the negative side, with these annotations we won't verify any of these tests, and probably no one will ever investigate why they fail verification. Also it makes the manifests more crowded/less readable. On the positive side, I suppose we increase confidence in TV failures and reduce TV noise: We eliminate most of today's TV failures, so when we subsequently see a TV failure, it's more unexpected and more likely to be related to recent changes. But failures still don't positively indicate a failure caused by the failing push - comment 5. I'd like to hear your thoughts on this trade-off and any other issues you see. But neither concern need hold up check-in: If you are confident this is a good idea, let's proceed before this bit-rots!
Attachment #8984506 -
Flags: review?(gbrown) → review+
Assignee | ||
Comment 11•6 years ago
|
||
high rate of debug failures: leaks/assertions are only found on debug specific platforms: some tests are platform specific or don't run on a platform for other reasons ** I still need to fine tooth comb windows/osx, so some of these might become more generic in due time this is a good point about clutter and the overall point. This did give me pause to reconsider. I do think we should move forward; these will not be too hard to remove if we change our mind. I am thinking someone could chip away at these failures, contributors, etc. I bet half of these would be easy to fix; Then we challenge- what if one of these tests start intermittently failing at a high rate and we run TV to see how stable it is...but it doesn't run, so we cannot test it. Possibly we do not look for skip-if when running from a MOZHARNESS_TEST_PATHS env var.
Comment 12•6 years ago
|
||
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/b7974b9e7221 skip-if = verify on mochitests which do not pass test-verify. r=gbrown
Assignee | ||
Comment 13•6 years ago
|
||
we still have jsreftest/crashtest/reftest, xpcshell, and web-platform-tests
Keywords: leave-open
Comment 14•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b7974b9e7221
Assignee | ||
Comment 15•6 years ago
|
||
xpcshell was much easier than mochitest
Attachment #8985112 -
Flags: review?(gbrown)
Reporter | ||
Comment 16•6 years ago
|
||
Comment on attachment 8985112 [details] [diff] [review] annotate xpcshell files Review of attachment 8985112 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/modules/tests/xpcshell/xpcshell.ini @@ +52,5 @@ > skip-if = toolkit == 'android' > [test_Services.js] > skip-if = toolkit == 'android' > [test_sqlite.js] > +skip-if = toolkit == 'android' || (verify && !debug && os == 'win') I'm not sure if it is important, but you have been inconsistent in using () around os == ... in complex expressions. Consider: skip-if = toolkit == 'android' || (verify && !debug && (os == 'win'))
Attachment #8985112 -
Flags: review?(gbrown) → review+
Assignee | ||
Comment 17•6 years ago
|
||
thanks for calling that out- one is my script which uses (os == ...) so we can chain multiple os clauses, and the other is fine tuning by hand when I sanity check the patch and find a couple other cases to annotate.
Comment 18•6 years ago
|
||
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/5dfe7a1c9e6f skip-if = verify on xpcshell tests which do not pass test-verify. r=gbrown
Comment 20•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/5dfe7a1c9e6f
Reporter | ||
Comment 21•6 years ago
|
||
Comment on attachment 8985236 [details] [diff] [review] annotate crashtests that fail on test-verify Review of attachment 8985236 [details] [diff] [review]: ----------------------------------------------------------------- Almost entirely linux/debug issues here - how odd!
Attachment #8985236 -
Flags: review?(gbrown) → review+
Comment 22•6 years ago
|
||
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/b1fa2cf6ccae annotate crashtest files. r=gbrown
Assignee | ||
Comment 23•6 years ago
|
||
almost all the crashtest issues are filed as bug 1468949.
Comment 24•6 years ago
|
||
Backed out changeset b1fa2cf6ccae (bug 1405428) for frequent failures in tests/dom/media/test/crashtests/crashtests.list Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=183333616&repo=mozilla-inbound Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=b1fa2cf6ccaeed98689a3b35d3b97adf3181a677&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable&selectedJob=183333616 Backout: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=85c1c680af293ab3314a203a4bdcf42328954c79&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable&filter-resultStatus=success
Flags: needinfo?(jmaher)
Assignee | ||
Comment 25•6 years ago
|
||
this time using reftest annotations (not manifestparser). As a bonus I have data from reftest to add :)
Attachment #8985236 -
Attachment is obsolete: true
Flags: needinfo?(jmaher)
Attachment #8986166 -
Flags: review?(gbrown)
Reporter | ||
Updated•6 years ago
|
Attachment #8986166 -
Flags: review?(gbrown) → review+
Comment 26•6 years ago
|
||
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/b1c9becdad64 annotate crashtests and reftests which fail in test-verify mode. r=gbrown
Assignee | ||
Comment 27•6 years ago
|
||
after this patch is just jsreftests, then we are "done". that will get us to 98%, and there are: * tests which landed between running tests and landing annotations * tests which landed/changed/failed prior to tier-1 * tests which have duplicate names * tests which were accidentally overlooked in the generation of test names (I suspect a small volume)
Attachment #8986209 -
Flags: review?(gbrown)
Reporter | ||
Comment 28•6 years ago
|
||
Comment on attachment 8986209 [details] [diff] [review] annotate web-platform-tests that fail on test-verify Review of attachment 8986209 [details] [diff] [review]: ----------------------------------------------------------------- "fails to repeat itself" might be misleading: Verify failures might be intermittent failures, chaos-only failures, relies on earlier test / cannot run by itself, or cannot repeat failures. (I assume you haven't investigated all of these!) s/fails to repeat itself/fails --verify mode/ if that's not too much trouble? I am far from an expert on wpt manifests, but these look okay to me. Please post some try links here, for the record. ::: testing/web-platform/meta/webrtc/RTCDTMFSender-ontonechange.https.html.ini @@ -1,2 @@ > [RTCDTMFSender-ontonechange.https.html] > - disabled: if debug and not e10s and (os == "win") and (version == "6.1.7601") and (processor == "x86") and (bits == 32): Did you intend to remove the original condition?
Attachment #8986209 -
Flags: review?(gbrown) → review+
Assignee | ||
Comment 29•6 years ago
|
||
I intended to remove the disabled clause as we do not run that condition anymore. I agree, |fails --verify mode| sounds better, I will update. here is a near the end try push: https://searchfox.org/mozilla-central/source/testing/talos/talos/tests/motionmark/htmlsuite.manifest original: osx: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2a7b5273415fd1417703a5f7e42b2b35f1ea955d linux: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ec90e9a5704b002b91149fdda80dc0d68c20655d windows: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4e47fcb6c2aff5a5c5ea814f31ad4af38fe6461f
Comment 30•6 years ago
|
||
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/f2909770e169 annotate web-platform-tests which fail to pass test-verify. r=gbrown
Comment 31•6 years ago
|
||
Backed out changeset f2909770e169 (bug 1405428) for wpt failures with no valid output Backout link: https://hg.mozilla.org/integration/mozilla-inbound/rev/5d130a2d962c247af9316edb7506b19165e506dc Failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=f2909770e169489b93445827f00f464feb696228 Recent log: https://treeherder.mozilla.org/logviewer.html#?job_id=183824283&repo=mozilla-inbound
Flags: needinfo?(jmaher)
Comment 32•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b1c9becdad64
Assignee | ||
Comment 33•6 years ago
|
||
relanding reftest/web-platform-tests annotations now that bug 1469583 is merged around
Flags: needinfo?(jmaher)
Comment 34•6 years ago
|
||
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/0b026cd7903d annotate web-platform-tests which fail to pass test-verify. r=gbrown
Comment 35•6 years ago
|
||
Backout by aciure@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/7af2b112f375 Backed out 1 changesets for wpt failures. CLOSED TREE
Assignee | ||
Comment 36•6 years ago
|
||
this was backed out because it failed some regular wpt tests (not test-verify). I had removed a block of annotations in a few files to indicate it was a CRASH all the time and I was wrong, it is not all the time (I think only on debug), so I put that back as it was originally. failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=0b026cd7903d9de7ff047550a73600fe2a8101b7 try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9564b906d3acea07668d8af5a50f90e0cccdec9d
Attachment #8986209 -
Attachment is obsolete: true
Attachment #8987578 -
Flags: review?(gbrown)
Assignee | ||
Comment 37•6 years ago
|
||
once we do that, we can see a lot of green jsreftests: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6df562639e234df6006ed0bfa971bdebfe6b5c34
Attachment #8987580 -
Flags: review?(gbrown)
Reporter | ||
Comment 38•6 years ago
|
||
Comment on attachment 8987580 [details] [diff] [review] fix the jsreftest paths in perfile.py- s/test/tests/ Review of attachment 8987580 [details] [diff] [review]: ----------------------------------------------------------------- ::: taskcluster/taskgraph/util/perfile.py @@ +59,5 @@ > except CalledProcessError: > return 0 > > + # TODO: move this to a static function so we only load it once > + # Hack a method for inserting a file to a try commit that we can run specific tests Did you mean to include this? @@ +101,5 @@ > > if not gpu: > test_count += 1 > > + print("JMAHER: total changed_files: %s, and test_count: %s" % (len(changed_files), test_count)) Accidental inclusion?
Reporter | ||
Updated•6 years ago
|
Attachment #8987578 -
Flags: review?(gbrown) → review+
Assignee | ||
Comment 39•6 years ago
|
||
and this is the real patch, sorry for the confusion.
Attachment #8987580 -
Attachment is obsolete: true
Attachment #8987580 -
Flags: review?(gbrown)
Attachment #8987595 -
Flags: review?(gbrown)
Reporter | ||
Updated•6 years ago
|
Attachment #8987595 -
Flags: review?(gbrown) → review+
Comment 40•6 years ago
|
||
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/3eab1911d04a annotate web-platform-tests which fail to pass test-verify. r=gbrown https://hg.mozilla.org/integration/mozilla-inbound/rev/7afe3bbea78c fix paths for detecting jsreftests. r=gbrown
Assignee | ||
Comment 41•6 years ago
|
||
once these are successfully on trunk, we can call this done
Keywords: leave-open
Comment 42•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/3eab1911d04a https://hg.mozilla.org/mozilla-central/rev/7afe3bbea78c
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
status-firefox63:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
You need to log in
before you can comment on or make changes to this bug.
Description
•