Closed Bug 1405428 Opened 4 years ago Closed 3 years ago

Run test-verify on all supported tests to find and eliminate existing failures


(Testing :: General, defect, P3)



(firefox63 fixed)

Tracking Status
firefox63 --- fixed


(Reporter: gbrown, Assigned: jmaher)


(Depends on 1 open bug, Blocks 1 open bug)



(5 files, 3 obsolete files)

Follow-up on concern raised in bug 1405338.

I'm not sure how feasible this is, but I should have a look at it.
I would imagine we would need to run tests in smaller chunks to make this happen. If we had a mode where this could happen fairly easily, plus a way to parse logs or get results, this could be a tractable solution.
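As a rough illustration of the "parse logs or get results" idea (this is a hypothetical sketch, not an existing tool; only the TEST-UNEXPECTED-FAIL marker is the real Mozilla log convention), a log-scanning pass could collect the set of tests that failed verification:

```python
import re

# Scan a raw test log for per-test failures. "TEST-UNEXPECTED-FAIL" is the
# standard failure-line prefix in Mozilla test logs; the helper itself is
# an illustrative sketch, not an existing harness function.
FAIL_RE = re.compile(r"TEST-UNEXPECTED-FAIL\s*\|\s*([^|]+?)\s*\|")

def failing_tests(log_text):
    """Return the sorted, de-duplicated list of test paths that failed."""
    return sorted({m.group(1) for m in FAIL_RE.finditer(log_text)})

log = """\
TEST-UNEXPECTED-FAIL | dom/tests/test_a.html | leaked window
TEST-OK | dom/tests/test_b.html | took 120ms
TEST-UNEXPECTED-FAIL | dom/tests/test_a.html | assertion count mismatch
"""
print(failing_tests(log))  # ['dom/tests/test_a.html']
```

A list like this could then be diffed against the previous run to decide which failures are pre-existing versus newly introduced.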

A few thoughts here:
1) Get the list of failing tests and annotate manifests so we know what to expect as 'ignore TV' -- that is easier to query and reduce.
2) Go suite by suite (xpcshell, then devtools, etc.) and get all tests green, maybe 1 suite/quarter.
3) Possibly enable a single simple rule at first and make that rule tier-1 while we work on greening up the rest.

I think overall we need to have a starting point of what we expect as passing and failing.
Whiteboard: [PI:January]
Priority: -- → P3
Moving this to March since it's a P3 - :gbrown, if you prefer it to be on your February list then go ahead and change the whiteboard tag thanks :)
Flags: needinfo?(gbrown)
Whiteboard: [PI:January] → [PI:February]
I don't have any specific plans for working on this in the near future.
Flags: needinfo?(gbrown)
Whiteboard: [PI:February] → [PI:March]
Whiteboard: [PI:March]
We can now mark tests that fail test-verify as skip-if = 'verify' -- that might be useful here (but should be used with caution).
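In a manifestparser manifest, such an annotation might look like this (an illustrative entry, not taken from an actual manifest):

```ini
[test_example.js]
# Known test-verify failure; skipped only in verify mode, still runs normally.
skip-if = verify
```

The caution is warranted because the annotation silently exempts the test from verification until someone removes it.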
We have sometimes thought of this bug as the major impediment to running TV as tier 1, but that is a misconception. Even if we identify all the current TV failures, new failures may be introduced by product changes: since those tests are not modified, TV will not run against them...until the test is modified. Then we are back to the place where a TV failure does not always indicate a fault in the push.
good point, I hadn't thought through to the end of the solution like you did.  How could we measure how often this happens in our current tier-2 state?  If we could do that, it would help us understand the risk and possible solutions.  Possibly having TV-backfill(<testname>) implemented in Treeherder would give us the ability to find the offending cause, or we fall back to |skip-if = verify|.
if we can easily run the failing test on the previous push, that would help us determine whether this is test-related or product-related, and we could find the root cause if we backfill far enough :)
I filed bug 1465117 to do the work regarding running test-verify on a previous revision.
Depends on: 1465117
Assignee: nobody → jmaher
Attachment #8984506 - Flags: review?(gbrown)
Comment on attachment 8984506 [details] [diff] [review]
annotate mochitest files for linux and some osx/windows

Review of attachment 8984506 [details] [diff] [review]:

I spot-checked some changes -- certainly not an exhaustive review -- and what I looked at looked fine - bravo!

One minor curiosity...Most TV failure bugs that I have seen have been TV failures on all/most platforms: If it fails on Linux, I expect to see failures on Windows/osx/Android, and on opt/debug/pgo - usually. Many of your annotations are quite specific: verify && debug && (os == linux). And many seem to be debug-only failures. Is that right? Do you have any insight into why?

And one big concern: I'm not *sure* I see the point of this bug (which I filed!). Let's briefly pause to consider whether this is a good idea. On the negative side, with these annotations we won't verify any of these tests, and probably no one will ever investigate why they fail verification. Also it makes the manifests more crowded/less readable. On the positive side, I suppose we increase confidence in TV failures and reduce TV noise: We eliminate most of today's TV failures, so when we subsequently see a TV failure, it's more unexpected and more likely to be related to recent changes. But failures still don't positively indicate a failure caused by the failing push - comment 5. I'd like to hear your thoughts on this trade-off and any other issues you see.

But neither concern need hold up check-in: If you are confident this is a good idea, let's proceed before this bit-rots!
Attachment #8984506 - Flags: review?(gbrown) → review+
high rate of debug failures: leaks/assertions are only detected on debug builds
specific platforms: some tests are platform-specific or don't run on a platform for other reasons
** I still need to go over windows/osx with a fine-tooth comb, so some of these might become more generic in due time

this is a good point about clutter and about the overall purpose, and it did give me pause to reconsider.  I still think we should move forward; these annotations will not be too hard to remove if we change our mind, and someone -- contributors, for example -- could chip away at these failures over time.

I bet half of these would be easy to fix.  Then there is the challenge: what if one of these tests starts failing intermittently at a high rate and we run TV to see how stable it is...but it doesn't run because of the annotation, so we cannot test it.  Possibly we should not honor skip-if when running from a MOZHARNESS_TEST_PATHS env var.
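The MOZHARNESS_TEST_PATHS idea could be sketched like this (a hypothetical harness-side check, not existing harness code -- the function and its callback parameter are invented for illustration):

```python
import os

def should_skip(test, skip_expr_matches):
    """Honor `skip-if = verify` annotations only when the run was NOT
    requested explicitly via MOZHARNESS_TEST_PATHS. If a developer asked
    for specific paths, run the test anyway so its stability can be
    measured. `skip_expr_matches` stands in for manifest evaluation."""
    if os.environ.get("MOZHARNESS_TEST_PATHS"):
        return False  # explicit request: ignore the verify annotation
    return skip_expr_matches(test)

os.environ.pop("MOZHARNESS_TEST_PATHS", None)
print(should_skip("test_a.js", lambda t: True))  # True: annotation honored

os.environ["MOZHARNESS_TEST_PATHS"] = "dom/tests"
print(should_skip("test_a.js", lambda t: True))  # False: runs anyway
```

This keeps the tree quiet by default while still letting a targeted TV run exercise an annotated test.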
Pushed by
skip-if = verify on mochitests which do not pass test-verify. r=gbrown
we still have jsreftest/crashtest/reftest, xpcshell, and web-platform-tests
Keywords: leave-open
xpcshell was much easier than mochitest
Attachment #8985112 - Flags: review?(gbrown)
Comment on attachment 8985112 [details] [diff] [review]
annotate xpcshell files

Review of attachment 8985112 [details] [diff] [review]:

::: toolkit/modules/tests/xpcshell/xpcshell.ini
@@ +52,5 @@
>  skip-if = toolkit == 'android'
>  [test_Services.js]
>  skip-if = toolkit == 'android'
>  [test_sqlite.js]
> +skip-if = toolkit == 'android' || (verify && !debug && os == 'win')

I'm not sure if it is important, but you have been inconsistent in using () around os == ... in complex expressions. Consider:

skip-if = toolkit == 'android' || (verify && !debug && (os == 'win'))
Attachment #8985112 - Flags: review?(gbrown) → review+
thanks for calling that out -- one case is my script, which uses (os == ...) so we can chain multiple os clauses; the other is fine-tuning by hand when I sanity-check the patch and find a couple of other cases to annotate.
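The script's chaining behavior might look roughly like this (a hypothetical reconstruction of the annotation script, not the actual code; the function name and parameters are invented):

```python
def build_skip_if(existing, oses, debug=None):
    """Build a manifestparser skip-if clause. Each os comparison gets its
    own parentheses so additional OS clauses can be OR'd in later without
    ambiguity, matching the (os == ...) style discussed in review."""
    os_clause = " || ".join("(os == '%s')" % o for o in oses)
    if len(oses) > 1:
        os_clause = "(%s)" % os_clause
    parts = ["verify"]
    if debug is True:
        parts.append("debug")
    elif debug is False:
        parts.append("!debug")
    parts.append(os_clause)
    clause = "(%s)" % " && ".join(parts)
    # Append to any hand-written condition already in the manifest.
    return clause if not existing else "%s || %s" % (existing, clause)

print(build_skip_if("toolkit == 'android'", ["win"], debug=False))
# toolkit == 'android' || (verify && !debug && (os == 'win'))
```

The single-os output matches the fully parenthesized form suggested in the review above.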
Pushed by
skip-if = verify on xpcshell tests which do not pass test-verify. r=gbrown
yes, these do crash!
Attachment #8985236 - Flags: review?(gbrown)
Comment on attachment 8985236 [details] [diff] [review]
annotate crashtests that fail on test-verify

Review of attachment 8985236 [details] [diff] [review]:

Almost entirely linux/debug issues here - how odd!
Attachment #8985236 - Flags: review?(gbrown) → review+
almost all the crashtest issues are filed as bug 1468949.
this time using reftest annotations (not manifestparser).  As a bonus I have data from reftest to add :)
Attachment #8985236 - Attachment is obsolete: true
Flags: needinfo?(jmaher)
Attachment #8986166 - Flags: review?(gbrown)
Attachment #8986166 - Flags: review?(gbrown) → review+
Pushed by
annotate crashtests and reftests which fail in test-verify mode. r=gbrown
after this patch only jsreftests remain, then we are "done".  That will get us to 98%, and there are still:
* tests which landed between running tests and landing annotations
* tests which landed/changed/failed prior to tier-1
* tests which have duplicate names
* tests which were accidentally overlooked in the generation of test names (I suspect a small volume)
Attachment #8986209 - Flags: review?(gbrown)
Comment on attachment 8986209 [details] [diff] [review]
annotate web-platform-tests that fail on test-verify

Review of attachment 8986209 [details] [diff] [review]:

"fails to repeat itself" might be misleading: Verify failures might be intermittent failures, chaos-only failures, relies on earlier test / cannot run by itself, or cannot repeat failures. (I assume you haven't investigated all of these!) s/fails to repeat itself/fails --verify mode/ if that's not too much trouble?

I am far from an expert on wpt manifests, but these look okay to me.

Please post some try links here, for the record.

::: testing/web-platform/meta/webrtc/RTCDTMFSender-ontonechange.https.html.ini
@@ -1,2 @@
>  [RTCDTMFSender-ontonechange.https.html]
> -  disabled: if debug and not e10s and (os == "win") and (version == "6.1.7601") and (processor == "x86") and (bits == 32):

Did you intend to remove the original condition?
Attachment #8986209 - Flags: review?(gbrown) → review+
Pushed by
annotate web-platform-tests which fail to pass test-verify. r=gbrown
relanding the reftest/web-platform-tests annotations now that bug 1469583 has merged around
Flags: needinfo?(jmaher)
Pushed by
annotate web-platform-tests which fail to pass test-verify. r=gbrown
Backout by
Backed out 1 changesets for wpt failures. CLOSED TREE
this was backed out because it failed some regular wpt tests (not test-verify).  I had removed a block of annotations in a few files, believing the test was a CRASH all the time; I was wrong -- it is not all the time (I think only on debug), so I restored those annotations as they were originally.


try run:
Attachment #8986209 - Attachment is obsolete: true
Attachment #8987578 - Flags: review?(gbrown)
Comment on attachment 8987580 [details] [diff] [review]
fix the jsreftest paths in s/test/tests/

Review of attachment 8987580 [details] [diff] [review]:

::: taskcluster/taskgraph/util/
@@ +59,5 @@
>      except CalledProcessError:
>          return 0
> +    # TODO: move this to a static function so we only load it once
> +    # Hack a method for inserting a file to a try commit that we can run specific tests

Did you mean to include this?

@@ +101,5 @@
>                  if not gpu:
>                      test_count += 1
> +    print("JMAHER: total changed_files: %s, and test_count: %s" % (len(changed_files), test_count))

Accidental inclusion?
Attachment #8987578 - Flags: review?(gbrown) → review+
and this is the real patch, sorry for the confusion.
Attachment #8987580 - Attachment is obsolete: true
Attachment #8987580 - Flags: review?(gbrown)
Attachment #8987595 - Flags: review?(gbrown)
Attachment #8987595 - Flags: review?(gbrown) → review+
Pushed by
annotate web-platform-tests which fail to pass test-verify. r=gbrown
fix paths for detecting jsreftests. r=gbrown
once these are successfully on trunk, we can call this done
Keywords: leave-open
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Regressions: 1577197