<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Updated

•

2 years ago

Regressed by: 854796

No longer regressions: 854796

Updated

•

2 years ago

Keywords: regression

Comment 3

•

2 years ago

Set release status flags based on info from the regressing bug 854796

status-firefox115: --- → unaffected

status-firefox116: --- → unaffected

status-firefox117: --- → affected

status-firefox-esr102: --- → unaffected

status-firefox-esr115: --- → unaffected

Comment 4

•

2 years ago

I think the failure is in accessible/tests/mochitest/actions/, but I cannot figure out what it is.

Assignee

Comment 5

•

2 years ago

Yeah, me neither; I can't seem to find any TEST-FAIL, despite it reporting one failure in that suite. A hint seems to be that actions/test_link.html takes the longest (at around 1.5 seconds), but that doesn't seem particularly excessive to me. I also don't know what the time for that test file was before. My patch did adjust how we deal with links, and touched that file. I'm having no luck getting this to fail locally, even with --verify.

Comment 6

•

2 years ago

a reported failure without a TEST-FAIL is usually something outside of the harness, like a shutdown timeout, a leak, a shutdown crash, an assertion outside of a test.

I didn't see anything in the logs. This is frustrating that we cannot see the failure- maybe a bit more looking and I or someone else will see the failure.

Comment hidden (Intermittent Failures Robot)

Comment 8

•

2 years ago

hmm, a try push doesn't reproduce this:
https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=60601afff062b14865823839e6d12eedb7c0bb06&selectedTaskRun=HZ9lKzHhRRGJ4THrrSKYRw.0

possibly that is related to an artifact build, I will try a few other pushes

Comment 9

•

2 years ago

I needed to retrigger. I bisected this down a bit, I can get this down to 2 tests. I have 2 combinations that work:

1 (try push):

 [test_keys.xhtml]
 [test_link.html]

2 (try push):

 [test_link.html]
 [test_media.html]

but not (try push):

 [test_keys.html]
 [test_media.html]

I don't see anything in the log to indicate what is wrong, but given the root cause of this and the narrowed down tests, maybe something can be figured out.

Comment hidden (Intermittent Failures Robot)

Assignee

Comment 14

•

2 years ago

Okay, thank you for the help. The cause is not clear to me even after investigation. I think I will try to re-write the seemingly-problematic tests in our newer accessibility testing framework so that I can remove the test file that seems to be causing problems.

Comment 15

•

2 years ago

Set release status flags based on info from the regressing bug 854796

status-firefox118: --- → affected

Comment hidden (Intermittent Failures Robot)

Updated

•

2 years ago

Duplicate of this bug: 1847153

Comment 19

•

2 years ago

FWIW, there are dozens of instances of this failure that are being mis-starred to grab-bag-bug 1538599 because the log parser was seeing "exit status 2" and suggesting bug 1538599 as a result. I'm extending the title of this bug in the hopes that treeherder will pick that up and start suggesting this bug instead.

Summary: Frequent The mochitest suite: mochitest-a11y ran with return status: FAILURE → Frequent The mochitest suite: mochitest-a11y ran with return status: FAILURE (use this bug if you see "exit status 2" for a mochitest-a11y log)

Comment hidden (Intermittent Failures Robot)

[1] https://firefox-source-docs.mozilla.org/testing/mochitest-plain/faq.html#how-can-i-get-the-full-log-output-for-my-test-in-automation-for-debugging

Comment 21

•

2 years ago

•

Edited

One idea for a way forward here... Maybe it'd be worth activating the "full log" for mochitest-a11y-* tests (temporarily at least), to hopefully gather some more data? (That might even make the test-failure line show up in the log, which would be a clue about what's going on and why it's not showing up in the current "more concise" logs.)

In other words: it might be worth trying to effectively call SimpleTest.requestCompleteLog()[1] for every single test in this test-suite. (But probably not actually do that -- just change the default behavior for mochitest-a11y-* or something like that, temporarily.)

(I don't know precisely how to do that, but I'm pretty sure it should be possible.)

Comment 22

•

2 years ago

We have good reason to believe the problematic test is accessible/tests/mochitest/actions/test_link.html. At a minimum, we could at least try enabling the full log for that test. But that might not be so useful because I suspect this is somehow impacting other tests via side effects or similar. Otherwise, we should just be seeing a failure in one specific test.

Joel, is there a way to enable the full log for an entire directory or do we need to land a patch to enable it for every test in the actions directory?

Flags: needinfo?(jmaher)

Updated

•

2 years ago

Depends on: 1847161

Comment 23

•

2 years ago

•

Edited

(In reply to James Teh [:Jamie] from comment #22)

We have good reason to believe the problematic test is accessible/tests/mochitest/actions/test_link.html. At a minimum, we could at least try enabling the full log for that test.

I posted a patch in bug 1847161 to do that for that test and the other two that were mentioned in comment 9. That should be sufficient, based on what jmaher noted there.

Henrik Skupin [:whimboo][⌚️UTC+1]

Comment 24

•

2 years ago

Let's see what that yields.

Flags: needinfo?(jmaher)

Updated

•

2 years ago

Keywords: intermittent-failure

Comment 25

•

2 years ago

thanks :dholbert, I need to add some harness debugging as well to determine where the error might be coming from

Comment 26

•

2 years ago

•

Edited

With the full logs for these tests: comparing one "bad" log vs. one "good" log for the same job type (from the previous treeherder cycle, so testing essentially same code/conditions), I'm not seeing any meaningful differences between the two logs.

"bad": https://treeherder.mozilla.org/logviewer?job_id=424980707&repo=autoland
"good": https://treeherder.mozilla.org/logviewer?job_id=424977939&repo=autoland

After stripping out timestamps and some background noise to make the logs more-usefully-diffable, I'm not seeing any meaningful differences when comparing the "good" vs. "bad" log side-by-side, focusing on the relevant portion which is the portion leading up to this in the bad log:

INFO - Passed:  382
INFO - Failed:  1
INFO - Todo:    1

(The "good" log instead has Failed: 0 but the Passed/Todo values are the same, 382 & 1.)

The portion of the log above that, up to the previous Passed/Failed/Todo results section, looks essentially the same to me between good & bad logs.

Comment 27

•

2 years ago

So: this is looking a lot like it might be a harness bug of some sort (or an extremely weird test bug that somehow produces an increment in the test failure count without logging anything at all).

Assignee

Comment 28

•

2 years ago

•

Edited

Thanks Daniel. Looking narrowly at what my changes actually did that could have caused this, my best guess is that we're dealing with a timing issue of some sort having to do with accessible recreation. I was getting timeouts (with useful error messages) when running the patch through Try, though I ended up fixing those by rewriting the problematic tests as browser tests. The changes I made cause recreation of accessibles if an anchor element gains/loses an href (in the absence of a click listener), or gains/loses a click listener (in the absence of an href). If we have neither, the spec requires us to treat those accessibles as generics, so we recreate them as HyperTextAccessibles. The timing issues that I was seeing on Try before landing my initial patch appeared to be related to markup of the form <a onclick=""></a>. I fixed the issues by adding an href attribute to those anchor elements in the test so that the accessible wouldn't be recreated (the href would force it to be a link from the beginning), and rewriting the pieces of the test that expected no href as a browser test, operating under the theory that the "onclick" attribute was being processed later and forcing recreation of the accessible. The browser tests didn't have any problem with that, and Try was happy.

This is all to say: I don't know what about this confluence of factors causes these mystery failures and I certainly don't understand what the error is now, but my leading suspect is - waves hands vaguely - something to do with accessible recreation, and I think the way forward is to rewrite test_link.html as a browser test, which I'm working on now.

Assignee

Comment 29

•

2 years ago

Attached file Bug 1845221: Remove failing mochitest, migrate tests to browser tests,r?eeejay,jamie — Details

This revision removes test_link.html since it has been failing mysteriously in
CI. It's unclear why this is happening, and no amount of logging or local
testing has revealed the issue. This revision migrates the tests to the browser
tests, lumping it in with similar tests in browser_caching_actions.js.

Phabricator Automation

Updated

•

2 years ago

Assignee: nobody → nlapre

Status: NEW → ASSIGNED

Comment 30

•

2 years ago

•

Edited

FWIW I added some logging to try to get to the bottom of who/what is incrementing the failure count:
https://treeherder.mozilla.org/jobs?repo=try&revision=d69ef28d9bcc9a42108f91f506fc53e75d70b9b2

In at least some cases, the issue seems to be from some test logging a result after SimpleTest.finish() is called:
https://treeherder.mozilla.org/logviewer?job_id=425024689&repo=try&lineNumber=3774-3813

(i.e. we're hitting this line which increments the test failure count:
https://searchfox.org/mozilla-central/rev/b4dadcc4c45a2e381a4866d28ebf3a901decccc1/testing/mochitest/tests/SimpleTest/TestRunner.js#854

TestRunner.updateUI([{ result: false }]);

)

This is probably test_link.html since this happens during that test.

Not every failure triggers my logging for this situation, but some of them (e.g. the log linked in this comment) do seem to be doing this. I know we used to (and probably still intend to?) print a log message when this happens, but apparently that's not happening right now (aside from the logging that I'm adding here).

nlapre, it's probably worth looking into how/why that test is still registering more test-results after SimpleTest.finish() is called (maybe it's listening for events and calling is(), ok() , etc. in those event-handlers; and some events come in and trigger those handlers after the test calls SimpleTest.finish()?)

(I don't know to-what-extent your posted rewrite patch would already address/not-address that issue.)

https://hg.mozilla.org/mozilla-central/rev/1d8ec2085c98

Assignee

Comment 31

•

2 years ago

Huh, interesting! Thanks for that. My knowledge of the fine inner workings of the accessibility event queue tests is limited, but I'm surprised that it manages to keep listening for things even after processing the entire event queue. It's unexpected to me that the failure message doesn't get through, but that your logging does.

(I don't know to-what-extent your posted rewrite patch would already address/not-address that issue.)

I think it would fully address those issues, by way of deleting the test file entirely :) More seriously, I suspect that the introduction of async/await in the browser test rewrite of these tests will render these issues obsolete, since the listeners are better-encapsulated.

Comment hidden (Intermittent Failures Robot)

Pulsebot

Comment 34

•

2 years ago

Pushed by nlapre@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1d8ec2085c98 Remove failing mochitest, migrate tests to browser tests,r=eeejay

Comment hidden (Intermittent Failures Robot)

Iulian Moraru

Comment 36

•

2 years ago

bugherder

Status: ASSIGNED → RESOLVED

Closed: 2 years ago

status-firefox118: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → 118 Branch

Comment 37

•

2 years ago

The patch landed in nightly and beta is affected.
:nlapre, is this bug important enough to require an uplift?

If yes, please nominate the patch for beta approval.
If no, please set status-firefox117 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(nlapre)

Assignee

Updated

•

2 years ago

status-firefox117: affected → wontfix

Flags: needinfo?(nlapre)

Updated

•

2 years ago

Depends on: 1848205

Comment 38

•

2 years ago

(In reply to Daniel Holbert [:dholbert] from comment #30)

In at least some cases, the issue seems to be from some test logging a result after SimpleTest.finish() is called:
https://treeherder.mozilla.org/logviewer?job_id=425024689&repo=try&lineNumber=3774-3813

It's actually a bit more subtle than this. See bug 1848205 to see what was going on. It seems that the issue was that this test was navigating and thereby getting a second callback to the unload handler that checks for results being logged after finish is called; and in the second callback, some data was missing and so the test harness's logic falls over and fails to log, despite incrementing the failure count.

Comment 39

•

2 years ago

Attached file Bug 1845221 part 1: Split up two-part condition. r?jmaher (obsolete) — Details

This patch doesn't affect behavior; it's just splitting one compound check into
two nested ones, and adjusting the indentation.

I'm doing this because the next patch will adjust the inner logic, and
splitting it up this way lets me do that in a more targeted way.

Comment 40

•

2 years ago

Attached file Bug 1845221 part 2: Log an error when mochitests hit unload handler with missing data. r?jmaher (obsolete) — Details

As described in the bug, this can sometimes happen when mochitests navigate,
e.g. due to synthesizing a click on a link to the same document.

Depends on D185943

Comment 41

•

2 years ago

Sorry, copypaste error; I posted those^ patches on the wrong bug. They're meant for bug 1848205.

Phabricator Automation

Comment 42

•

2 years ago

Comment on attachment 9348452 [details]
Bug 1845221 part 1: Split up two-part condition. r?jmaher

Revision D185943 was moved to bug 1848205. Setting attachment 9348452 [details] to obsolete.

Attachment #9348452 - Attachment is obsolete: true

Phabricator Automation

Comment 43

•

2 years ago

Comment on attachment 9348453 [details]
Bug 1845221 part 2: Log an error when mochitests hit unload handler with missing data. r?jmaher

Revision D185944 was moved to bug 1848205. Setting attachment 9348453 [details] to obsolete.

Attachment #9348453 - Attachment is obsolete: true

Comment hidden (Intermittent Failures Robot)

Comment 45

•

2 years ago

•

Edited

(In reply to Intermittent Failures Robot from comment #44)

57 failures in 3341 pushes (0.017 failures/push) were associated with this bug in the last 7 days.

Note, we haven't actually had any failures reported on trunk after the fix landed in comment 36, so don't let comment 44 alarm you. :)

Comment 46

•

2 years ago

•

Edited

Ah, but we did have 29 failures on beta. We should uplift Nathan's patch to beta to avoid spurious test failures there (which may be confusing/difficult-to-star per comment 19)

Assignee

Comment 47

•

2 years ago

Great idea, thanks Daniel. Ryan, could you uplift this patch to beta? If we don't, we'll see spurious test failures causing noise, as Daniel mentioned. It should apply cleanly.

Flags: needinfo?(ryanvm)

Assignee

Updated

•

2 years ago

status-firefox117: wontfix → affected

Updated

•

2 years ago

Flags: needinfo?(ryanvm)

Comment 48

•

2 years ago

Comment on attachment 9347489 [details]
Bug 1845221: Remove failing mochitest, migrate tests to browser tests,r?eeejay,jamie

Beta/Release Uplift Approval Request

User impact if declined: Spurious test failures which might be confusing and difficult to attribute.
Is this code covered by automated tests?: Yes
Has the fix been verified in Nightly?: Yes
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): Only impacts tests. Ports a test from one framework to another.
String changes made/needed:
Is Android affected?: No

Attachment #9347489 - Flags: approval-mozilla-beta?

Comment 49

•

2 years ago

(I think this technically doesn't need approval since it's test-only, but that is a good way to be sure it gets noticed. :) )

Ryan VanderMeulen [:RyanVM]

Comment 50

•

2 years ago

Sorry, I didn't realise you and Nathan had talked about this not needing to go through the normal uplift process. I wasn't aware of the approval exemption for test only patches. My bad.

Comment 51

•

2 years ago

Comment on attachment 9347489 [details]
Bug 1845221: Remove failing mochitest, migrate tests to browser tests,r?eeejay,jamie

Indeed, no approval needed since it's test-only, but I'll rubberstamp it anyway :-)

Attachment #9347489 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Pulsebot

Comment 52

•

2 years ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/dcd75122a622

Ryan VanderMeulen [:RyanVM]

Updated

•

2 years ago

status-firefox117: affected → fixed