Open Bug 1585239 Opened 2 years ago Updated 2 years ago

Intermittent accessible/tests/browser/states/browser_offscreen_element_in_out_of_process_iframe.js | Uncaught exception - OperationError: The operation failed for an operation-specific reason

Categories

(Core :: Disability Access APIs, defect, P5)

defect

Tracking

()

People

(Reporter: intermittent-bug-filer, Assigned: hiro)

References

Details

(Keywords: intermittent-failure, leave-open, regression, Whiteboard: [stockwell unknown])

Attachments

(3 files)

Filed by: aciure [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=269191405&repo=autoland
Full log: https://queue.taskcluster.net/v1/task/NmWBB_88T4OccTIfBWqVYg/runs/0/artifacts/public/logs/live_backing.log


[task 2019-10-01T03:30:25.179Z] 03:30:25 INFO - TEST-START | accessible/tests/browser/states/browser_offscreen_element_in_out_of_process_iframe.js
[task 2019-10-01T03:30:26.033Z] 03:30:26 INFO - GECKO(1718) | JavaScript error: chrome://mochitests/content/browser/accessible/tests/browser/states/head.js, line 55: TypeError: acc.getAccessibleFor(...) is null
[task 2019-10-01T03:30:26.033Z] 03:30:26 INFO - GECKO(1718) | JavaScript error: , line 0: NS_ERROR_UNEXPECTED
[task 2019-10-01T03:30:26.033Z] 03:30:26 INFO - GECKO(1718) | JavaScript error: , line 0: OperationError: The operation failed for an operation-specific reason
[task 2019-10-01T03:30:26.078Z] 03:30:26 INFO - TEST-INFO | started process screencapture
[task 2019-10-01T03:30:26.226Z] 03:30:26 INFO - TEST-INFO | screencapture: exit 0
[task 2019-10-01T03:30:26.226Z] 03:30:26 INFO - Buffered messages logged at 03:30:25
[task 2019-10-01T03:30:26.228Z] 03:30:26 INFO - Entering test bound
[task 2019-10-01T03:30:26.228Z] 03:30:26 INFO - Console message: OpenGL compositor Initialized Succesfully.
[task 2019-10-01T03:30:26.228Z] 03:30:26 INFO - Version: 2.1 INTEL-12.9.22
[task 2019-10-01T03:30:26.228Z] 03:30:26 INFO - Vendor: Intel Inc.
[task 2019-10-01T03:30:26.228Z] 03:30:26 INFO - Renderer: Intel Iris OpenGL Engine
[task 2019-10-01T03:30:26.228Z] 03:30:26 INFO - FBO Texture Target: TEXTURE_2D
[task 2019-10-01T03:30:26.229Z] 03:30:26 INFO - Buffered messages logged at 03:30:26
[task 2019-10-01T03:30:26.229Z] 03:30:26 INFO - Console message: [JavaScript Error: "OperationError: The operation failed for an operation-specific reason"]
[task 2019-10-01T03:30:26.229Z] 03:30:26 INFO - Console message: [JavaScript Error: "TypeError: acc.getAccessibleFor(...) is null" {file: "chrome://mochitests/content/browser/accessible/tests/browser/states/head.js" line: 55}]
[task 2019-10-01T03:30:26.229Z] 03:30:26 INFO - Console message: [JavaScript Error: "NS_ERROR_UNEXPECTED"]
[task 2019-10-01T03:30:26.230Z] 03:30:26 INFO - Buffered messages finished
[task 2019-10-01T03:30:26.230Z] 03:30:26 INFO - TEST-UNEXPECTED-FAIL | accessible/tests/browser/states/browser_offscreen_element_in_out_of_process_iframe.js | Uncaught exception - OperationError: The operation failed for an operation-specific reason
[task 2019-10-01T03:30:26.235Z] 03:30:26 INFO - Not taking screenshot here: see the one that was previously logged
[task 2019-10-01T03:30:26.235Z] 03:30:26 INFO - TEST-UNEXPECTED-FAIL | accessible/tests/browser/states/browser_offscreen_element_in_out_of_process_iframe.js | A promise chain failed to handle a rejection: The operation failed for an operation-specific reason - stack: (No stack available.)
[task 2019-10-01T03:30:26.235Z] 03:30:26 INFO - Rejection date: Tue Oct 01 2019 03:30:25 GMT+0000 (Greenwich Mean Time) - false == true - JS frame :: resource://testing-common/PromiseTestUtils.jsm :: assertNoUncaughtRejections :: line 265
[task 2019-10-01T03:30:26.235Z] 03:30:26 INFO - Stack trace:
[task 2019-10-01T03:30:26.235Z] 03:30:26 INFO - resource://testing-common/PromiseTestUtils.jsm:assertNoUncaughtRejections:265
[task 2019-10-01T03:30:26.235Z] 03:30:26 INFO - chrome://mochikit/content/browser-test.js:Tester_execTest/<:1381
[task 2019-10-01T03:30:26.235Z] 03:30:26 INFO - chrome://mochikit/content/browser-test.js:Tester_execTest:1385
[task 2019-10-01T03:30:26.235Z] 03:30:26 INFO - chrome://mochikit/content/browser-test.js:nextTest/<:1213
[task 2019-10-01T03:30:26.235Z] 03:30:26 INFO - chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:805
[task 2019-10-01T03:30:26.235Z] 03:30:26 INFO - Leaving test bound
[task 2019-10-01T03:30:26.236Z] 03:30:26 INFO - GECKO(1718) | MEMORY STAT | vsize 7045MB | residentFast 339MB | heapAllocated 155MB
[task 2019-10-01T03:30:26.236Z] 03:30:26 INFO - TEST-OK | accessible/tests/browser/states/browser_offscreen_element_in_out_of_process_iframe.js | took 895ms
[task 2019-10-01T03:30:26.236Z] 03:30:26 INFO - checking window state

Duplicate of this bug: 1580751

Hiro or Surkov, can you take a look? From the logs, it seems the getAccessibleFor call at https://searchfox.org/mozilla-central/rev/17756e2a5c180d980a4b08d99f8cc0c97290ae8d/accessible/tests/browser/states/head.js#41 can return null.

Flags: needinfo?(surkov.alexander)
Flags: needinfo?(hikezoe.birchill)
See Also: → 1588266
See Also: → 1588839

Sure, I can take a look. It seems that we need to make sure the accessibility service has been initialized in the cross-process iframe.

Assignee: nobody → hikezoe.birchill
Flags: needinfo?(hikezoe.birchill)

it's a bit weird that there's no any passing logs [1], the logs starts right from the "acc.getAccessibleFor(...) is null" failure. I suppose though that the first one fails [2], which means waitForIFrameUpdates doesn't always work [3]. Also see discussion at [4], where waitForIFrameUpdates was flaky on osx.

So, in order to make a11y feel ok, I think waitForIFrameUpdates method should also check whether iframe accessible document is in good shape yet (checking BUSY state should be ok, similar to [5]).

[1] https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=269191405&repo=autoland&lineNumber=1095-1096
[2] https://searchfox.org/mozilla-central/source/accessible/tests/browser/states/browser_offscreen_element_in_out_of_process_iframe.js#61
[3] https://searchfox.org/mozilla-central/source/accessible/tests/browser/states/head.js#25
[4] https://phabricator.services.mozilla.com/D44951#inline-274332
[5] https://searchfox.org/mozilla-central/source/accessible/tests/mochitest/common.js#152

Flags: needinfo?(surkov.alexander)

There are 22 total failures in the last 7 days on linux64 opt and asan, linux64-shippable opt and macosx1014-64-shippable opt.

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=278353409&repo=autoland&lineNumber=1880

[task 2019-11-27T02:29:31.309Z] 02:29:31 INFO - TEST-START | accessible/tests/browser/states/browser_offscreen_element_in_out_of_process_iframe.js
[task 2019-11-27T02:29:31.914Z] 02:29:31 INFO - GECKO(1909) | JavaScript error: chrome://mochitests/content/browser/accessible/tests/browser/states/head.js, line 42: TypeError: acc.getAccessibleFor(...) is null
[task 2019-11-27T02:29:31.914Z] 02:29:31 INFO - GECKO(1909) | JavaScript error: , line 0: NS_ERROR_UNEXPECTED
[task 2019-11-27T02:29:31.915Z] 02:29:31 INFO - GECKO(1909) | JavaScript error: , line 0: OperationError: The operation failed for an operation-specific reason
[task 2019-11-27T02:29:31.935Z] 02:29:31 INFO - TEST-INFO | started process screentopng
[task 2019-11-27T02:29:32.539Z] 02:29:32 INFO - TEST-INFO | screentopng: exit 0
[task 2019-11-27T02:29:32.542Z] 02:29:32 INFO - Buffered messages logged at 02:29:31
[task 2019-11-27T02:29:32.542Z] 02:29:32 INFO - Entering test bound
[task 2019-11-27T02:29:32.542Z] 02:29:32 INFO - Console message: [JavaScript Error: "TypeError: acc.getAccessibleFor(...) is null" {file: "chrome://mochitests/content/browser/accessible/tests/browser/states/head.js" line: 42}]
[task 2019-11-27T02:29:32.542Z] 02:29:32 INFO - Console message: [JavaScript Error: "NS_ERROR_UNEXPECTED"]
[task 2019-11-27T02:29:32.542Z] 02:29:32 INFO - Console message: [JavaScript Error: "OperationError: The operation failed for an operation-specific reason"]
[task 2019-11-27T02:29:32.542Z] 02:29:32 INFO - Buffered messages finished
[task 2019-11-27T02:29:32.545Z] 02:29:32 INFO - TEST-UNEXPECTED-FAIL | accessible/tests/browser/states/browser_offscreen_element_in_out_of_process_iframe.js | Uncaught exception - OperationError: The operation failed for an operation-specific reason
[task 2019-11-27T02:29:32.545Z] 02:29:32 INFO - Not taking screenshot here: see the one that was previously logged
[task 2019-11-27T02:29:32.545Z] 02:29:32 INFO - TEST-UNEXPECTED-FAIL | accessible/tests/browser/states/browser_offscreen_element_in_out_of_process_iframe.js | A promise chain failed to handle a rejection: The operation failed for an operation-specific reason - stack: (No stack available.)
[task 2019-11-27T02:29:32.545Z] 02:29:32 INFO - Rejection date: Wed Nov 27 2019 02:29:31 GMT+0000 (Coordinated Universal Time) - false == true - JS frame :: resource://testing-common/PromiseTestUtils.jsm :: assertNoUncaughtRejections :: line 265
[task 2019-11-27T02:29:32.546Z] 02:29:32 INFO - Stack trace:
[task 2019-11-27T02:29:32.546Z] 02:29:32 INFO - resource://testing-common/PromiseTestUtils.jsm:assertNoUncaughtRejections:265
[task 2019-11-27T02:29:32.546Z] 02:29:32 INFO - chrome://mochikit/content/browser-test.js:Tester_execTest/<:1100
[task 2019-11-27T02:29:32.546Z] 02:29:32 INFO - chrome://mochikit/content/browser-test.js:Tester_execTest:1104
[task 2019-11-27T02:29:32.547Z] 02:29:32 INFO - chrome://mochikit/content/browser-test.js:nextTest/<:932
[task 2019-11-27T02:29:32.547Z] 02:29:32 INFO - chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:805
[task 2019-11-27T02:29:32.548Z] 02:29:32 INFO - Leaving test bound
[task 2019-11-27T02:29:32.548Z] 02:29:32 INFO - GECKO(1909) | MEMORY STAT | vsize 2844MB | residentFast 357MB | heapAllocated 158MB
[task 2019-11-27T02:29:32.549Z] 02:29:32 INFO - TEST-OK | accessible/tests/browser/states/browser_offscreen_element_in_out_of_process_iframe.js | took 636ms

Hiroyuki can you assign someone to take a look?

Flags: needinfo?(hikezoe.birchill)
Whiteboard: [stockwell needswork:owner]

The "someone" is me. :)

I started looking into this again, and tried the way what :surkov suggested in comment 6. A key factor is waitForFocus in addA11Loadevent, without waiting for focus, the test fails locally even if we check the busy state for the iframe document. )Note that I can't reproduce the failure locally with the current m-c. )

Flags: needinfo?(hikezoe.birchill)

It didn't fix the failure completely. See https://treeherder.mozilla.org/#/jobs?repo=try&revision=90806830abfdadead49ef9b274dcac5c4cf615e4&selectedJob=278549669

The failure told us that even if we can get the accessible document for the cross-process iframe, we still fail to get the accessible element in question in the document. I will add code to try the accessible element until it succeeds.

(In reply to Hiroyuki Ikezoe (:hiro) from comment #15)

It didn't fix the failure completely. See https://treeherder.mozilla.org/#/jobs?repo=try&revision=90806830abfdadead49ef9b274dcac5c4cf615e4&selectedJob=278549669

The failure told us that even if we can get the accessible document for the cross-process iframe, we still fail to get the accessible element in question in the document. I will add code to try the accessible element until it succeeds.

This worked on a try run.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c6b4acfbe261cb3685a6f9c870d4a1fbb2956e4e

We sometime fail to nsIAccessibilityService.getAccessibleFor to get the
accessible element in out-of-process iframe even if the accessible document of
the iframe is ready to be used, so we try to get the accessible element until we
succeed to work around this issue.

Depends on D55067

Note that D55067 doesn't fix the failure completely but I hope it will mitigate it. So I am going to land it now.

Keywords: leave-open
Pushed by hikezoe.birchill@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/120e92938637
Wait for accessibility document ready in iframe. r=surkov

This actually seems worse now than before... Hiro, are you still working on this?

Flags: needinfo?(hikezoe.birchill)

There are 37 total failures in the last 7 days on linux64 opt, linux64-shippable opt, windows7-32-shippable opt

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=280073463&repo=mozilla-central&lineNumber=1353

[task 2019-12-06T22:50:07.760Z] 22:50:07 INFO - TEST-START | accessible/tests/browser/states/browser_offscreen_element_in_out_of_process_iframe.js
[task 2019-12-06T22:50:08.181Z] 22:50:08 INFO - GECKO(4988) | JavaScript error: chrome://mochitests/content/browser/accessible/tests/browser/states/head.js, line 71: TypeError: acc.getAccessibleFor(...) is null
[task 2019-12-06T22:50:08.181Z] 22:50:08 INFO - GECKO(4988) | JavaScript error: , line 0: NS_ERROR_UNEXPECTED
[task 2019-12-06T22:50:08.186Z] 22:50:08 INFO - GECKO(4988) | JavaScript error: , line 0: OperationError: The operation failed for an operation-specific reason
[task 2019-12-06T22:50:08.225Z] 22:50:08 INFO - TEST-INFO | started process screenshot
[task 2019-12-06T22:50:08.304Z] 22:50:08 INFO - TEST-INFO | screenshot: exit 0
[task 2019-12-06T22:50:08.304Z] 22:50:08 INFO - Buffered messages logged at 22:50:07
[task 2019-12-06T22:50:08.304Z] 22:50:08 INFO - Entering test bound
[task 2019-12-06T22:50:08.304Z] 22:50:08 INFO - Buffered messages logged at 22:50:08
[task 2019-12-06T22:50:08.304Z] 22:50:08 INFO - must wait for focus
[task 2019-12-06T22:50:08.304Z] 22:50:08 INFO - Console message: [JavaScript Error: "TypeError: acc.getAccessibleFor(...) is null" {file: "chrome://mochitests/content/browser/accessible/tests/browser/states/head.js" line: 71}]
[task 2019-12-06T22:50:08.305Z] 22:50:08 INFO - Console message: [JavaScript Error: "NS_ERROR_UNEXPECTED"]
[task 2019-12-06T22:50:08.305Z] 22:50:08 INFO - Console message: [JavaScript Error: "OperationError: The operation failed for an operation-specific reason"]
[task 2019-12-06T22:50:08.305Z] 22:50:08 INFO - Buffered messages finished
[task 2019-12-06T22:50:08.305Z] 22:50:08 INFO - TEST-UNEXPECTED-FAIL | accessible/tests/browser/states/browser_offscreen_element_in_out_of_process_iframe.js | Uncaught exception - OperationError: The operation failed for an operation-specific reason
[task 2019-12-06T22:50:08.305Z] 22:50:08 INFO - Not taking screenshot here: see the one that was previously logged
[task 2019-12-06T22:50:08.305Z] 22:50:08 INFO - TEST-UNEXPECTED-FAIL | accessible/tests/browser/states/browser_offscreen_element_in_out_of_process_iframe.js | A promise chain failed to handle a rejection: The operation failed for an operation-specific reason - stack: (No stack available.)
[task 2019-12-06T22:50:08.305Z] 22:50:08 INFO - Rejection date: Fri Dec 06 2019 22:50:08 GMT+0000 (Greenwich Mean Time) - false == true - JS frame :: resource://testing-common/PromiseTestUtils.jsm :: assertNoUncaughtRejections :: line 265
[task 2019-12-06T22:50:08.305Z] 22:50:08 INFO - Stack trace:
[task 2019-12-06T22:50:08.306Z] 22:50:08 INFO - resource://testing-common/PromiseTestUtils.jsm:assertNoUncaughtRejections:265
[task 2019-12-06T22:50:08.306Z] 22:50:08 INFO - chrome://mochikit/content/browser-test.js:Tester_execTest/<:1100
[task 2019-12-06T22:50:08.306Z] 22:50:08 INFO - chrome://mochikit/content/browser-test.js:Tester_execTest:1104
[task 2019-12-06T22:50:08.306Z] 22:50:08 INFO - chrome://mochikit/content/browser-test.js:nextTest/<:932
[task 2019-12-06T22:50:08.306Z] 22:50:08 INFO - chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:SimpleTest.waitForFocus/waitForFocusInner/focusedOrLoaded/<:805
[task 2019-12-06T22:50:08.306Z] 22:50:08 INFO - Leaving test bound
[task 2019-12-06T22:50:08.306Z] 22:50:08 INFO - GECKO(4988) | MEMORY STAT | vsize 2104174MB | vsizeMaxContiguous 66103308MB | residentFast 310MB | heapAllocated 147MB
[task 2019-12-06T22:50:08.306Z] 22:50:08 INFO - TEST-OK | accessible/tests/browser/states/browser_offscreen_element_in_out_of_process_iframe.js | took 496ms

There seems to be a spike since 1st of December, however so far there are no failures since the 8th of December.

Whiteboard: [stockwell unknown] → [stockwell needswork:owner]

I have no further idea other than D55068. It would be nice someone who is familiar with accessibility code looks into this.

Flags: needinfo?(hikezoe.birchill) → needinfo?(surkov.alexander)

(In reply to Hiroyuki Ikezoe (:hiro) from comment #26)

I have no further idea other than D55068. It would be nice someone who is familiar with accessibility code looks into this.

Does suggestion from https://phabricator.services.mozilla.com/D55068#1685788 not work?

If not, then I'd delegate the thing to Jamie, whether the approach form the patch should be taken or should be investigated further (note, it should work, just not optimal).

Flags: needinfo?(surkov.alexander)
Flags: needinfo?(jteh)
Flags: needinfo?(hikezoe.birchill)

It doesn't fix the intermittent failure. It's been already replaced.

Flags: needinfo?(hikezoe.birchill)

I think it'd make sense to tweak this test so that it observes a11y events from the parent process, rather than trying to do it from the child. That's what we do in most of our other e10s/Fission tests. That should make the flow simpler, as well as reducing some potential race conditions. Yura, would you mind taking a look at this when you get a chance?

Flags: needinfo?(jteh) → needinfo?(yzenevich)

(In reply to James Teh [:Jamie] from comment #29)

I think it'd make sense to tweak this test so that it observes a11y events from the parent process, rather than trying to do it from the child. That's what we do in most of our other e10s/Fission tests. That should make the flow simpler, as well as reducing some potential race conditions. Yura, would you mind taking a look at this when you get a chance?

Out of curiosity could you elaborate how listening events in parent process could help? My understanding (per D55068) is there's no accessible for an element in a child process when the test tries to access to it, but its document is not busy (so it's constructed) and there's no DOM/layout mutations, which has to guarantee the element is accessible, and - I must be wrong - but it seems there's no direct correlation with parent/child process architecture, and thus it's not clear how moving the logic into parent may help (extra timing perhaps?).

(In reply to alexander :surkov (:asurkov) from comment #30)

Out of curiosity could you elaborate how listening events in parent process could help?

You're right: it won't help directly. However, the flow as it stands is somewhat hard to follow, and it's also inconsistent with the way we handle this kind of thing in other tests. That means we need to debug the utility functions used only for this test and makes it hard to diagnose/predict possible race conditions, etc.

That said, looking at this again, I just spotted this:
https://searchfox.org/mozilla-central/rev/6305f6935f496b3a302c7afcc579399a4217729c/accessible/tests/browser/states/head.js#26
The Promise gets created, but not returned or awaited. I think that means that waitForReady will just resolve immediately, so we don't actually wait until the busy state gets cleared. That's perhaps the cause of the problem here.

(In reply to James Teh [:Jamie] from comment #31)

(In reply to alexander :surkov (:asurkov) from comment #30)

Out of curiosity could you elaborate how listening events in parent process could help?

You're right: it won't help directly. However, the flow as it stands is somewhat hard to follow, and it's also inconsistent with the way we handle this kind of thing in other tests. That means we need to debug the utility functions used only for this test and makes it hard to diagnose/predict possible race conditions, etc.

fair enough

That said, looking at this again, I just spotted this:
https://searchfox.org/mozilla-central/rev/6305f6935f496b3a302c7afcc579399a4217729c/accessible/tests/browser/states/head.js#26
The Promise gets created, but not returned or awaited. I think that means that waitForReady will just resolve immediately, so we don't actually wait until the busy state gets cleared. That's perhaps the cause of the problem here.

You're right. Hiro, will you be about to check this finding out anytime soon?

Flags: needinfo?(hikezoe.birchill)

Gosh! Thanks for finding it!

I just pushed a try with an await there.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=84e3da21e477e814ec833e7c1801f71261f6057a

Flags: needinfo?(hikezoe.birchill)

Though the try result looks great, but I just realized this intermittent failure has stopped since Dec. 7th. I don't know what has changed.

Cancelling NI Yura for now, since hopefully this will be a simple fix.

Flags: needinfo?(yzenevich)
Attachment #9116902 - Attachment description: Bug 1585239 - Do `await` the Promise for the accessibility document. r?surkov → Bug 1585239 - Return the Promise for the accessibility document in the function runs in the iframe context. r?surkov
Pushed by hikezoe.birchill@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5e8b48c8cd93
Return the Promise for the accessibility document in the function runs in the iframe context. r=surkov
You need to log in before you can comment on or make changes to this bug.