Closed Bug 1754332 Opened 2 years ago Closed 2 years ago

Intermittent org.mozilla.geckoview.test.PanZoomControllerTest#touchEventForResultWithPreventDefault | org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms

Categories

(GeckoView :: General, defect, P1)

Unspecified
All

Tracking

(firefox-esr91 unaffected, firefox97 unaffected, firefox98 unaffected, firefox99 fixed)

RESOLVED FIXED
99 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox97 --- unaffected
firefox98 --- unaffected
firefox99 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jnicol)

References

(Regression)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [geckoview:m99][stockwell unknown])

Attachments

(1 file)

Filed by: mlaza [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=367152305&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/e0BXvpb7Qm2MvKMt4H43Jw/runs/0/artifacts/public/logs/live_backing.log


[task 2022-02-09T00:19:52.174Z] 00:19:52     INFO -  TEST-START | org.mozilla.geckoview.test.PanZoomControllerTest#touchEventForResultWithPreventDefault
[task 2022-02-09T00:20:22.673Z] 00:20:22     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: numtests=875
[task 2022-02-09T00:20:22.673Z] 00:20:22     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: stream=
[task 2022-02-09T00:20:22.673Z] 00:20:22     INFO -  org.mozilla.geckoview.test | Error in touchEventForResultWithPreventDefault(org.mozilla.geckoview.test.PanZoomControllerTest):
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutRunnable.run(UiThreadUtils.java:49)
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | 	at android.os.Handler.handleCallback(Handler.java:751)
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | 	at android.os.Handler.dispatchMessage(Handler.java:95)
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.util.UiThreadUtils.waitForCondition(UiThreadUtils.java:158)
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.util.UiThreadUtils.waitForResult(UiThreadUtils.java:79)
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.rule.GeckoSessionTestRule.waitForResult(GeckoSessionTestRule.java:2602)
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.PanZoomControllerTest.touchEventForResultWithPreventDefault(PanZoomControllerTest.kt:442)
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test |
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: id=AndroidJUnitRunner
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: test=touchEventForResultWithPreventDefault
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: class=org.mozilla.geckoview.test.PanZoomControllerTest
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: stack=org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutRunnable.run(UiThreadUtils.java:49)
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | 	at android.os.Handler.handleCallback(Handler.java:751)
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | 	at android.os.Handler.dispatchMessage(Handler.java:95)
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.util.UiThreadUtils.waitForCondition(UiThreadUtils.java:158)
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.util.UiThreadUtils.waitForResult(UiThreadUtils.java:79)
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.rule.GeckoSessionTestRule.waitForResult(GeckoSessionTestRule.java:2602)
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | 	at org.mozilla.geckoview.test.PanZoomControllerTest.touchEventForResultWithPreventDefault(PanZoomControllerTest.kt:442)
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test |
[task 2022-02-09T00:20:22.675Z] 00:20:22     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: current=424
[task 2022-02-09T00:20:22.676Z] 00:20:22     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS_CODE: -2
[task 2022-02-09T00:20:22.676Z] 00:20:22     INFO -  Printing logcat for test:
[task 2022-02-09T00:20:22.749Z] 00:20:22     INFO -  02-09 00:19:51.401 E/GeckoSessionTestRule( 3484): test_start 1f0befec-3ff2-40ff-89cf-b127eb38b1ec touchEventForResultWithPreventDefault(org.mozilla.geckoview.test.PanZoomControllerTest)
[task 2022-02-09T00:20:22.749Z] 00:20:22     INFO -  02-09 00:19:51.401 E/GeckoSessionTestRule( 3484): before prepareStatement touchEventForResultWithPreventDefault(org.mozilla.geckoview.test.PanZoomControllerTest)
[task 2022-02-09T00:20:22.750Z] 00:20:22     INFO -  02-09 00:19:51.403 D/GeckoViewStartup( 3484): onEvent GeckoView:StorageDelegate:Attached
...
[task 2022-02-09T00:20:22.792Z] 00:20:22     INFO -  02-09 00:20:21.890 E/GeckoSessionTestRule( 3484): 	at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:866)
[task 2022-02-09T00:20:22.792Z] 00:20:22     INFO -  02-09 00:20:21.890 E/GeckoSessionTestRule( 3484): 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:756)
[task 2022-02-09T00:20:22.792Z] 00:20:22     INFO -  02-09 00:20:21.891 D/EGL_emulation(14674): eglMakeCurrent: 0x72373707be00: ver 3 0 (tinfo 0x72373711d0e0)
[task 2022-02-09T00:20:22.792Z] 00:20:22     INFO -  02-09 00:20:21.894 D/GeckoViewContent( 3484): handleEvent: DOMWindowClose
[task 2022-02-09T00:20:22.792Z] 00:20:22     INFO -  02-09 00:20:21.895 D/GeckoViewConsole( 3484): onEvent GeckoView:WebExtension:List null
[task 2022-02-09T00:20:22.793Z] 00:20:22     INFO -  02-09 00:20:21.897 E/GeckoSessionTestRule( 3484): test_end c5ee677f-bc83-49bd-9e28-2d35f3d0f059 touchEventForResultWithPreventDefault(org.mozilla.geckoview.test.PanZoomControllerTest)
[task 2022-02-09T00:20:22.793Z] 00:20:22  WARNING -  TEST-UNEXPECTED-FAIL | org.mozilla.geckoview.test.PanZoomControllerTest#touchEventForResultWithPreventDefault | org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
[task 2022-02-09T00:20:22.793Z] 00:20:22     INFO -  TEST-INFO took 30575ms

resetting priority to reexamine in triage. Huge spike in failures over the last week

Priority: P5 → --

Can you take a look and see if you can find out what caused this to fail?

Flags: needinfo?(agi)

Looks like this started with the changes in Bug 1754129, retriggers here.

Flags: needinfo?(hikezoe.birchill)
Regressed by: 1754129

Nevermind the above comment, it's from Bug 1331109, retriggers here.

Flags: needinfo?(hikezoe.birchill) → needinfo?(jnicol)
Regressed by: 1331109
No longer regressed by: 1754129

Set release status flags based on info from the regressing bug 1331109

Has Regression Range: --- → yes

Thank you Cosmin!

Flags: needinfo?(agi)

Update:
There have been 70 failures within the last 7 days:
• 3 failures on Android 7.0 x86-64 Lite WebRender debug
• 12 failures on Android 7.0 x86-64 Lite WebRender opt
• 4 failures on Android 7.0 x86-64 WebRender debug-isolated-process
• 6 failures on Android 7.0 x86-64 WebRender debug
• 32 failures on Android 7.0 x86-64 WebRender opt
• 6 failures on Android 7.0 x86-64 Lite WebRender Shippable opt
• 7 failures on Android 7.0 x86-64 Shippable WebRender opt
Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=367615340&repo=autoland&lineNumber=9940

[task 2022-02-11T21:28:35.434Z] 21:28:35     INFO -  02-11 21:28:19.478 D/GeckoViewProgress( 3481): receiveMessage: MozAfterPaint
[task 2022-02-11T21:28:35.434Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): Error
[task 2022-02-11T21:28:35.434Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
[task 2022-02-11T21:28:35.434Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutRunnable.run(UiThreadUtils.java:49)
[task 2022-02-11T21:28:35.434Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at android.os.Handler.handleCallback(Handler.java:751)
[task 2022-02-11T21:28:35.434Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at android.os.Handler.dispatchMessage(Handler.java:95)
[task 2022-02-11T21:28:35.434Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at org.mozilla.geckoview.test.util.UiThreadUtils.waitForCondition(UiThreadUtils.java:158)
[task 2022-02-11T21:28:35.435Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at org.mozilla.geckoview.test.util.UiThreadUtils.waitForResult(UiThreadUtils.java:79)
[task 2022-02-11T21:28:35.435Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at org.mozilla.geckoview.test.rule.GeckoSessionTestRule.waitForResult(GeckoSessionTestRule.java:2602)
[task 2022-02-11T21:28:35.435Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at org.mozilla.geckoview.test.PanZoomControllerTest.touchEventForResultWithPreventDefault(PanZoomControllerTest.kt:442)
[task 2022-02-11T21:28:35.435Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at java.lang.reflect.Method.invoke(Native Method)
[task 2022-02-11T21:28:35.435Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
[task 2022-02-11T21:28:35.435Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
[task 2022-02-11T21:28:35.435Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
[task 2022-02-11T21:28:35.435Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
[task 2022-02-11T21:28:35.435Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at org.junit.rules.Verifier$1.evaluate(Verifier.java:35)
[task 2022-02-11T21:28:35.436Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at org.mozilla.geckoview.test.rule.GeckoSessionTestRule$2.lambda$evaluate$0$org-mozilla-geckoview-test-rule-GeckoSessionTestRule$2(GeckoSessionTestRule.java:1451)
[task 2022-02-11T21:28:35.436Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at org.mozilla.geckoview.test.rule.GeckoSessionTestRule$2$$ExternalSyntheticLambda0.run(Unknown Source)
[task 2022-02-11T21:28:35.436Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:428)
[task 2022-02-11T21:28:35.436Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at java.util.concurrent.FutureTask.run(FutureTask.java:237)
[task 2022-02-11T21:28:35.436Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at android.app.Instrumentation$SyncRunnable.run(Instrumentation.java:1950)
[task 2022-02-11T21:28:35.436Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at android.os.Handler.handleCallback(Handler.java:751)
[task 2022-02-11T21:28:35.436Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at android.os.Handler.dispatchMessage(Handler.java:95)
[task 2022-02-11T21:28:35.436Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at android.os.Looper.loop(Looper.java:154)
[task 2022-02-11T21:28:35.436Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at android.app.ActivityThread.main(ActivityThread.java:6077)
[task 2022-02-11T21:28:35.436Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at java.lang.reflect.Method.invoke(Native Method)
[task 2022-02-11T21:28:35.436Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:866)
[task 2022-02-11T21:28:35.436Z] 21:28:35     INFO -  02-11 21:28:34.423 E/GeckoSessionTestRule( 3481): 	at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:756)
[task 2022-02-11T21:28:35.437Z] 21:28:35     INFO -  02-11 21:28:34.424 D/EGL_emulation(14712): eglMakeCurrent: 0x722e9b27bae0: ver 3 0 (tinfo 0x722e90bc0b80)
[task 2022-02-11T21:28:35.437Z] 21:28:35     INFO -  02-11 21:28:34.427 D/GeckoViewContent( 3481): handleEvent: DOMWindowClose
[task 2022-02-11T21:28:35.437Z] 21:28:35     INFO -  02-11 21:28:34.428 D/GeckoViewConsole( 3481): onEvent GeckoView:WebExtension:List null
[task 2022-02-11T21:28:35.437Z] 21:28:35     INFO -  02-11 21:28:34.431 E/GeckoSessionTestRule( 3481): test_end c5ee677f-bc83-49bd-9e28-2d35f3d0f059 touchEventForResultWithPreventDefault(org.mozilla.geckoview.test.PanZoomControllerTest)
[task 2022-02-11T21:28:35.437Z] 21:28:35  WARNING -  TEST-UNEXPECTED-FAIL | org.mozilla.geckoview.test.PanZoomControllerTest#touchEventForResultWithPreventDefault | org.mozilla.geckoview.test.util.UiThreadUtils$TimeoutException: Timed out after 30000ms
[task 2022-02-11T21:28:35.437Z] 21:28:35     INFO -  TEST-INFO took 31202ms
[task 2022-02-11T21:28:35.437Z] 21:28:35     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: numtests=875
[task 2022-02-11T21:28:35.437Z] 21:28:35     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: stream=
[task 2022-02-11T21:28:35.437Z] 21:28:35     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: id=AndroidJUnitRunner
[task 2022-02-11T21:28:35.437Z] 21:28:35     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: test=scrollToHorizontalAuto
[task 2022-02-11T21:28:35.437Z] 21:28:35     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: class=org.mozilla.geckoview.test.PanZoomControllerTest
[task 2022-02-11T21:28:35.437Z] 21:28:35     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS: current=425
[task 2022-02-11T21:28:35.438Z] 21:28:35     INFO -  org.mozilla.geckoview.test | INSTRUMENTATION_STATUS_CODE: 1
[task 2022-02-11T21:28:35.438Z] 21:28:35     INFO -  TEST-START | org.mozilla.geckoview.test.PanZoomControllerTest#scrollToHorizontalAuto
Whiteboard: [stockwell needswork:owner]

Phew. :)

That said, I will have to jump in this failure since I wrote the test.

Priority: -- → P1
Whiteboard: [stockwell needswork:owner] → [stockwell needswork:owner][geckoview:m99]

Admittedly I don't really understand what this test is doing, but I think I can see what's going wrong.

The test spoofs an input event which leads to us calling IAPZCTreeManager::AddInputBlockCallback, and we time out waiting for the callback to be fired. What's happening is that InputQueue::ContentReceivedInputBlock() is being called before InputQueue::AddInputBlockCallback. This means when we ProcessQueue() at the end of ContentReceivedInputBlock() there is no callback to fire. When AddInputBlockCallback() is eventually called it is too late, the InputBlockState has already been removed from mQueuedInputs, so the callback will never be fired.

The GPU process means that APZCTreeManager is cross-process, which adds enough latency to AddInputBlockCallback that sometimes this occurs. I guess it was theoretically possible before, though.

I'll try to figure out the best way to avoid this, though perhaps Botond might have a suggestion?

Flags: needinfo?(jnicol) → needinfo?(botond)

Thanks for the diagnosis, Jamie! I agree this is potentially a real bug.

I can think of two possible solution approaches:

Approach 1

If no input block callback is found here, save the input block id --> APZHandledResult mapping in a map mInputBlockResults.

In AddInputBlockCallback(), first check mInputBlockResults for a mapping, and if one is found, invoke the callback right away and remove the mapping. (This handles the case where the AddInputBlockCallback() happens after the processing of the input block.) If not, add the callback to mInputBlockCallbacks as today.

We'd probably want some mechanism to expire old mappings from mInputBlockResults for cases where the caller never installs an input block callback. Possibly we could store a timestamp as part of the mapping, and periodically (e.g. on every new input block?) remove mappings older than a certain age.

We can restrict the use of mInputBlockResults to Android and/or to touch events to avoid unnecessary footprint.

Approach 2

Combine ReceiveInputEvent and AddInputBlockCallback into a single atomic operation. This would involve ReceiveInputBlock taking an optional input block callback parameter.

The logic of whether the input block callback needs to be actually installed (e.g. checks like this and this) would need to move into the implementation of ReceiveInputEvent. Importantly, the two IPC messages PAPZInputBridgeChild::ReceiveMultiTouchInputEvent and PAPZCTreeManagerChild::AddInputBlockCallback would be collapsed into one, ensuring that the input block can't be processed in between the two. I believe this would ensure the race that is at issue here never happens.

Flags: needinfo?(botond)

I had thought of approach 1 myself, wasn't sure about the best approach for expiring the old results. But I like approach 2 much better, seems more explicit and robust. Will give that a go. Thanks Botond!

Rather than using a separate function
APZCTreeManager::AddInputBlockCallback(), allow an optional callback
argument to be passed to APZInputBridge::ReceiveInputEvent().

This avoids a race condition where the input block is processed before
the callback has a chance to be added to the input queue, meaning the
callback will never be fired. With the GPU process enabled the added
latency of adding the callback cross-process meant this occured
frequently, causing intermittent junit test failures.

This works similarily to before, with the in-process APZCTreeManager
implementation simply calling InputQueue::AddInputBlockCallback() and
the InputQueue will fire the callback when the input block has been
processed. For the cross-process implementation, APZInputBridgeChild
maintains a local map of the callbacks, and APZInputBridgeParent adds
an intermediate callback to its local InputQueue. This intermediate
callback will call APZInputBridgeParent::SendCallInputBlockCallback(),
which tells the APZInputBridgeChild to fire the real callback.

Care must be taken in both APZInputBridgeChild and Parent to only add
their respective callbacks if we determine that it is required, eg not
already handled by the root APZ.

Assignee: nobody → jnicol
Status: NEW → ASSIGNED
Pushed by jnicol@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1c7fabe9662f
Add callback argument to APZInputBridge::ReceiveInputEvent. r=botond,geckoview-reviewers,agi
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 99 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: