Closed Bug 1780229 Opened 2 years ago Closed 2 years ago

Intermittent toolkit/components/extensions/test/xpcshell/test_ext_native_messaging.js | test_pending_sendNativeMessageReply_resets_bgscript_idle_timeout - [test_pending_sendNativeMessageReply_resets_bgscript_idle_timeout : 88] Extension left running at

Categories

(WebExtensions :: General, defect, P5)

defect

Tracking

(firefox-esr91 unaffected, firefox-esr102 unaffected, firefox102 unaffected, firefox103 unaffected, firefox104 fixed)

RESOLVED FIXED
104 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- unaffected
firefox102 --- unaffected
firefox103 --- unaffected
firefox104 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: rpl)

References

(Regression)

Details

(Keywords: intermittent-failure, regression)

Attachments

(1 file)

Filed by: abutkovits [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=384789404&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/DjedsPHTSXa0gQELQUy52g/runs/0/artifacts/public/logs/live_backing.log
Reftest URL: https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/DjedsPHTSXa0gQELQUy52g/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1


[task 2022-07-19T14:14:15.268Z] 14:14:15     INFO -  TEST-PASS | toolkit/components/extensions/test/xpcshell/test_ext_native_messaging.js | test_pending_sendNativeMessageReply_resets_bgscript_idle_timeout - [test_pending_sendNativeMessageReply_resets_bgscript_idle_timeout : 831] Initial background context is still available as expected - 66 == 66
[task 2022-07-19T14:14:15.268Z] 14:14:15     INFO -  "wait for NativeApp to reply to all messages sent"
[task 2022-07-19T14:14:15.269Z] 14:14:15  WARNING -  TEST-UNEXPECTED-FAIL | toolkit/components/extensions/test/xpcshell/test_ext_native_messaging.js | test_pending_sendNativeMessageReply_resets_bgscript_idle_timeout - [test_pending_sendNativeMessageReply_resets_bgscript_idle_timeout : 854] Got the expected reply for the first message sent - "echo-all-queued-messages" == "delayed-echo"
[task 2022-07-19T14:14:15.269Z] 14:14:15     INFO -  /builds/worker/workspace/build/tests/xpcshell/tests/toolkit/components/extensions/test/xpcshell/test_ext_native_messaging.js:testSendNativeMessage:854
[task 2022-07-19T14:14:15.269Z] 14:14:15     INFO -  /builds/worker/workspace/build/tests/xpcshell/head.js:_do_main:238
[task 2022-07-19T14:14:15.270Z] 14:14:15     INFO -  /builds/worker/workspace/build/tests/xpcshell/head.js:_execute_test:595
[task 2022-07-19T14:14:15.270Z] 14:14:15     INFO -  -e:null:1
[task 2022-07-19T14:14:15.270Z] 14:14:15     INFO -  exiting test
[task 2022-07-19T14:14:15.271Z] 14:14:15     INFO -  "Clearing pref "extensions.eventPages.enabled""
[task 2022-07-19T14:14:15.271Z] 14:14:15     INFO -  Unexpected exception NS_ERROR_ABORT:
[task 2022-07-19T14:14:15.272Z] 14:14:15     INFO -  _abort_failed_test@/builds/worker/workspace/build/tests/xpcshell/head.js:866:20
[task 2022-07-19T14:14:15.272Z] 14:14:15     INFO -  do_report_result@/builds/worker/workspace/build/tests/xpcshell/head.js:967:5
[task 2022-07-19T14:14:15.273Z] 14:14:15     INFO -  Assert<@/builds/worker/workspace/build/tests/xpcshell/head.js:73:21
[task 2022-07-19T14:14:15.273Z] 14:14:15     INFO -  proto.report@resource://testing-common/Assert.jsm:228:10
[task 2022-07-19T14:14:15.274Z] 14:14:15     INFO -  equal@resource://testing-common/Assert.jsm:270:8
[task 2022-07-19T14:14:15.274Z] 14:14:15     INFO -  testSendNativeMessage@/builds/worker/workspace/build/tests/xpcshell/tests/toolkit/components/extensions/test/xpcshell/test_ext_native_messaging.js:854:8
[task 2022-07-19T14:14:15.274Z] 14:14:15     INFO -  _do_main@/builds/worker/workspace/build/tests/xpcshell/head.js:238:6
[task 2022-07-19T14:14:15.275Z] 14:14:15     INFO -  _execute_test@/builds/worker/workspace/build/tests/xpcshell/head.js:595:5
[task 2022-07-19T14:14:15.275Z] 14:14:15     INFO -  @-e:1:1
[task 2022-07-19T14:14:15.275Z] 14:14:15     INFO -  exiting test
[task 2022-07-19T14:14:15.276Z] 14:14:15  WARNING -  TEST-UNEXPECTED-FAIL | toolkit/components/extensions/test/xpcshell/test_ext_native_messaging.js | test_pending_sendNativeMessageReply_resets_bgscript_idle_timeout - [test_pending_sendNativeMessageReply_resets_bgscript_idle_timeout : 88] Extension left running at test shutdown - "running" == "unloaded"
[task 2022-07-19T14:14:15.276Z] 14:14:15     INFO -  resource://testing-common/ExtensionXPCShellUtils.jsm:ExtensionWrapper/<:88
[task 2022-07-19T14:14:15.277Z] 14:14:15     INFO -  /builds/worker/workspace/build/tests/xpcshell/head.js:_execute_test/<:659
[task 2022-07-19T14:14:15.277Z] 14:14:15     INFO -  /builds/worker/workspace/build/tests/xpcshell/head.js:_execute_test:668
[task 2022-07-19T14:14:15.277Z] 14:14:15     INFO -  -e:null:1
[task 2022-07-19T14:14:15.278Z] 14:14:15     INFO -  exiting test
[task 2022-07-19T14:14:15.278Z] 14:14:15     INFO -  NS_ERROR_ABORT:
[task 2022-07-19T14:14:15.278Z] 14:14:15     INFO -  _abort_failed_test@/builds/worker/workspace/build/tests/xpcshell/head.js:866:20
[task 2022-07-19T14:14:15.279Z] 14:14:15     INFO -  do_report_result@/builds/worker/workspace/build/tests/xpcshell/head.js:967:5
[task 2022-07-19T14:14:15.279Z] 14:14:15     INFO -  Assert<@/builds/worker/workspace/build/tests/xpcshell/head.js:73:21
[task 2022-07-19T14:14:15.280Z] 14:14:15     INFO -  proto.report@resource://testing-common/Assert.jsm:228:10
[task 2022-07-19T14:14:15.280Z] 14:14:15     INFO -  equal@resource://testing-common/Assert.jsm:270:8
[task 2022-07-19T14:14:15.281Z] 14:14:15     INFO -  ExtensionWrapper/<@resource://testing-common/ExtensionXPCShellUtils.jsm:88:24
[task 2022-07-19T14:14:15.281Z] 14:14:15     INFO -  _execute_test/<@/builds/worker/workspace/build/tests/xpcshell/head.js:659:28
[task 2022-07-19T14:14:15.282Z] 14:14:15     INFO -  _execute_test@/builds/worker/workspace/build/tests/xpcshell/head.js:668:5
[task 2022-07-19T14:14:15.282Z] 14:14:15     INFO -  @-e:1:1
[task 2022-07-19T14:14:15.283Z] 14:14:15     INFO -  "Force a GC"
[task 2022-07-19T14:14:15.283Z] 14:14:15     INFO -  Error: Found unexpected files in temporary directory: generated-extension.xpi at resource://testing-common/AddonTestUtils.jsm:336
[task 2022-07-19T14:14:15.284Z] 14:14:15     INFO -  init/<@resource://testing-common/AddonTestUtils.jsm:336:15
[task 2022-07-19T14:14:15.284Z] 14:14:15     INFO -  _execute_test/<@/builds/worker/workspace/build/tests/xpcshell/head.js:659:28
[task 2022-07-19T14:14:15.284Z] 14:14:15     INFO -  _execute_test@/builds/worker/workspace/build/tests/xpcshell/head.js:671:16
[task 2022-07-19T14:14:15.285Z] 14:14:15     INFO -  @-e:1:1
[task 2022-07-19T14:14:15.285Z] 14:14:15     INFO -  <<<<<<<
[task 2022-07-19T14:14:15.286Z] 14:14:15     INFO -  TEST-START | toolkit/components/extensions/test/xpcshell/test_ext_native_messaging.js
Regressed by: 1770696

Set release status flags based on info from the regressing bug 1770696

:rpl, since you are the author of the regressor, bug 1770696, could you take a look?
For more information, please visit auto_nag documentation.

Flags: needinfo?(lgreco)
Summary: Intermittent TV toolkit/components/extensions/test/xpcshell/test_ext_native_messaging.js | test_pending_sendNativeMessageReply_resets_bgscript_idle_timeout - [test_pending_sendNativeMessageReply_resets_bgscript_idle_timeout : 88] Extension left running at → Intermittent toolkit/components/extensions/test/xpcshell/test_ext_native_messaging.js | test_pending_sendNativeMessageReply_resets_bgscript_idle_timeout - [test_pending_sendNativeMessageReply_resets_bgscript_idle_timeout : 88] Extension left running at

(In reply to Iulian Moraru from comment #3)

This started to fail on tier 1: https://treeherder.mozilla.org/logviewer?job_id=384891860&repo=autoland&lineNumber=3908

I manage to reproduce locally the windows 10 one, and I'm on it.

From what I see in comment 3 on linux we may have a similar intermittent failure our of the TV job when fission is disabled, I got a similar failure triggered on windows 10 while investigating it and so I'll take a look if I can also trigger it on Linux with fission disabled and make sure that the patch I'm going to attach will cover both.

Flags: needinfo?(lgreco)
Assignee: nobody → lgreco
Status: NEW → ASSIGNED

This patch includes the following tweaks to the new test cases as introduced from Bug 1770696:

  • Added an explicit await waitForSubprocessExit(); to the two new test cases which are starting
    a NativeApplication as part of the test case, to make sure to wait for the subprocess to have
    been fully exited, otherwise that triggers a failure on windows 10 because the files that should
    be removed when the test task is exiting would still be in use (and trying to remove them fails
    only on the windows platform)

  • Reworked the NativeApp used by the testSendNativeMessage test task to don't wrongly assume the
    NativeApp handling all the messages sent to be the same one even if one is still running to
    handle a delayed message pending a reply:

    • Each sendNativeMessage call will be spawning its own NativeApp instance and so the test native
      app is now resorting to a time.sleep(...) call to delay replying to any message received,
      to give some time for the test case to cover the behavior expected while a native message is
      still pending a reply.
    • Added Some stderr logs to the NativeApp, to make it easier to see at which point of
      the test case the NativeApp is being started, replying to a message and terminated, along
      with including the process pid to more easily determine if the NativeApp instance is still
      the one we expect to be running and replying to the native messages received.

The changes in comment 5 contains the changes I've applied and tested locally to prevent the intermittent failures that I've been able to reproduce locally (for the linux one I had to add a .only() on the test case that was triggering it to force it to be reproduced locally, while the one on windows was actually triggered often enough without any actual change to the test case).

Pushed to try here to double-check if the same kind of intermittents (or new ones) are still triggered when running the new version of the test on the build infrastructure:

The previous push to try did show that the previous version of the attached patch was not able to fully fix one of the two underlying reasons for the intermittent failures, and so I digged more into it.

In my previous investigation I missed to notice that the NativeApp instance handling each of the sendNativeMessage request was actually processed by its own instance, and that instance gets killed when the message pending a reply is handled.

I've update the patch to keep this into account (and also added some NativeApp logs including the NativeApp process id to make it easier to see in the trace logs when the native app got started, handled the message and then got terminated), unfortunately I also had to resort to make the delayedecho NativeApp to simply reply to the message received with an artificial delay using time.sleep(...), which I'm not really that trilled about.

I've pushed the new version to try here to double-check if this version still fails intermittently:

Pushed by luca.greco@alcacoop.it:
https://hg.mozilla.org/integration/autoland/rev/b45962248b4c
Prevent test_ext_native_messaging.js intermittent failures. r=mixedpuppy
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 104 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: