Intermittent mochitest-JP "0 ERROR TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 330 seconds with no output" after "Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIXPCComponents_Utils.nukeSandbox]"

RESOLVED INCOMPLETE

Status

defect
P1
normal
RESOLVED INCOMPLETE
4 years ago
2 years ago

People

(Reporter: RyanVM, Unassigned)

Tracking

({intermittent-failure})

Dependency tree / graph

Firefox Tracking Flags

(firefox45 wontfix, firefox46 wontfix, firefox47 wontfix, firefox48 wontfix, firefox49 affected, firefox-esr45 wontfix, firefox50 affected)

Details

Attachments

(4 attachments)

12:55:16 INFO - JavaScript error: resource://gre/modules/addons/XPIProvider.jsm -> jar:file:///c:/users/cltbld/appdata/local/temp/tmpufvkip.mozrunner/extensions/test-page-mod-debugger@jetpack.xpi!/bootstrap.js, line 303: NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIXPCComponents_Utils.nukeSandbox]
12:55:20 INFO - 1426190120163 Browser.Experiments.Experiments TRACE Experiments #0::enabled=true, true
12:55:20 INFO - 1426190120164 Browser.Experiments.Experiments TRACE Experiments #0::Registering instance with Addon Manager.
12:55:20 INFO - 1426190120165 Browser.Experiments.Experiments TRACE Experiments #0::Registering previous experiment add-on provider.
12:55:20 INFO - 1426190120165 Browser.Experiments.Experiments TRACE PreviousExperimentProvider #0::startup()
12:55:20 INFO - 1426190120166 Browser.Experiments.Experiments TRACE Experiments #0::_loadFromCache
12:55:20 INFO - 1426190120171 Browser.Experiments.Experiments TRACE Experiments #0::_loadTask finished ok
12:55:20 INFO - 1426190120171 Browser.Experiments.Experiments TRACE Experiments #0::_run
12:55:20 INFO - 1426190120172 Browser.Experiments.Experiments TRACE Experiments #0::_main iteration
12:55:20 INFO - 1426190120172 Browser.Experiments.Experiments TRACE Experiments #0::_evaluateExperiments
12:55:20 INFO - 1426190120174 Browser.Experiments.Experiments TRACE Experiments #0::_main finished, scheduling next run
13:00:50 INFO - 0 ERROR TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 330 seconds with no output
13:00:50 INFO - TEST-INFO | screenshot: exit status 0
13:00:50 INFO - TEST-INFO | crashinject: exit status 0
13:00:51 INFO - TEST-INFO | Main app process: exit status 1
13:00:51 INFO - 1 ERROR TEST-UNEXPECTED-FAIL | automation.py | application terminated with exit code 1
Dave, this is a manual star on Treeherder thanks to the generic nature of the failure. Any chance this can be prioritized for investigation?
Flags: needinfo?(dtownsend)
Summary: Intermittent mochitest-JP automation.py | application timed out after 330 seconds with no output → Intermittent mochitest-JP "0 ERROR TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 330 seconds with no output"
Posted patch logging patchSplinter Review
Looks like something is failing when the harness attempts to uninstall one test add-on and install the next. This adds some extra logging and catches any errors and logs them so we might have better information to go on.

If anything goes wrong installing/uninstalling an add-on I've assumed we'll be in an inconsistent state and just ended the test run there.
Flags: needinfo?(dtownsend)
Attachment #8577293 - Flags: review?(evold)
Attachment #8577293 - Flags: review?(evold) → review+
https://hg.mozilla.org/mozilla-central/rev/6c269cbf361c
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla39 → ---
Priority: -- → P1
This was exacerbated by the recent Addon SDK update in bug 1171638, which I've since backed out.
This makes me think the issue is in automation.py, since the error is about no output in 330 when there was clearly output:

05:37:11 INFO - TEST-PASS | simple-prefs-l10n/main.testAOMLocalization | This test is done.
05:37:11 INFO - TEST-PASS | simple-prefs-l10n/main.testAOMLocalization | There was a clean UI.
05:37:11 INFO - TEST-END | simple-prefs-l10n/main.testAOMLocalization
05:37:11 INFO - 3 of 3 tests passed.
05:37:11 INFO - TEST-INFO | jetpack-addon-harness.js | Uninstalling test add-on addon-sdk/source/test/addons/simple-prefs-l10n
05:37:11 INFO - 1433507831710 Browser.Experiments.Experiments TRACE Experiments #0::onUninstalled() - addon id: test-simple-prefs-l10n@jetpack
05:42:41 WARNING - TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 330 seconds with no output
05:42:41 INFO - TEST-INFO | screenshot: exit status 0
05:42:41 INFO - TEST-INFO | crashinject: exit status 0
05:42:43 INFO - TEST-INFO | Main app process: exit status 1
05:42:43 WARNING - TEST-UNEXPECTED-FAIL | automation.py | application terminated with exit code 1
I could use some help with this one, I've never looked at this automation.py file before.
(In reply to Erik Vold [:erikvold] (please needinfo? me) from comment #188)
> I could use some help with this one, I've never looked at this automation.py
> file before.

It looks like this error is coming from https://github.com/mozilla/gecko-dev/blob/6e929ef81c0807a969f9064449736f6b92966e12/testing/mochitest/runtests.py#L2331
Assignee: nobody → evold
Attachment #8616173 - Flags: review?(dtownsend) → review+
https://hg.mozilla.org/mozilla-central/rev/c0177adc8763
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Given the number of OrangeFactor Robot comments in here, this should probably be re-opened.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
With my patch from bug 1231784 applied, this is:
osx 10.10 opt: 100% reproducible
osx 10.10 debug: not reproducible
osx 10.6 opt: intermittent
osx 10.6 debug: not reproducible
linux opt/debug: not reproducible

Given the hit or miss nature of where this is reproducible, this is likely a race condition. I believe the "fix" from last time simply changed the timing enough to hide the race condition for a bit. This also has nothing to do with automation.py, I believe the relevant line from the log is here:
 15:24:46 INFO - JavaScript error: resource://gre/modules/addons/XPIProvider.jsm -> jar:file:///var/folders/2_/5bvqc7_51034r2hmy_t0wjpm00000w/T/tmpUew2qN.mozrunner/extensions/test-addon-author-email@jetpack.xpi!/bootstrap.js, line 299: NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIXPCComponents_Utils.nukeSandbox]

(Please note that this bug is being used as a sort of catch all by OrangeFactor Robot.. but I'm actually hitting the error from comment 0 here).
Blocks: 1231784
Summary: Intermittent mochitest-JP "0 ERROR TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 330 seconds with no output" → Intermittent mochitest-JP "0 ERROR TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 330 seconds with no output" after "Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIXPCComponents_Utils.nukeSandbox]"
Oh hm.. that nukeSandbox error shows up in passing logs too. I'm not sure anymore. All I know for certain is that this is the last output we get:
https://treeherder.mozilla.org/logviewer.html#?job_id=16343364&repo=try#L1547
I'm no JS expert, but I believe this error handler is attached to the wrong Promise:
https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/jetpack-addon-harness.js#227

Try run coming shortly which should hopefully give a clue to what's happening.
That didn't make a difference. I'm flailing about here :/

Dave, is there anyone who could help me dig into this a bit? It looks like Erik isn't around anymore. Thanks!
Flags: needinfo?(dtownsend)
The catch block never gets hit in this patch:
https://hg.mozilla.org/try/rev/2ab8fcaa6a3b

So I think that the nukeSandbox JavaScript error is fatal with my patch, but non-fatal without my patch. I don't understand why.
const declarations are no longer accessible from outside the script that makes
them so in order for bootstrap.js to be able to access the loaderSandbox
declaration we must use var.

Review commit: https://reviewboard.mozilla.org/r/34031/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34031/
Attachment #8717069 - Flags: review?(jsantell)
jetpack-addon-harness.js runs in a browser window scope so it already has the
setTimeout functions available to it. Loading Timer.jsm overrides the DOM
timer functions with those from Timer.jsm. Any other code that used setTimeout
previously will have timer IDs from the DOM functions which don't match those
in Timer.jsm. If this other code attempts to clear a timer it can then end up
clearing an unrelated timer. In the intermittent failure here the
browser-thumbnails code manages to clear the timer that is waiting to resolve
the promise that makes tests continue.

I've also added an additional timer that throws an exception and so ends tests
if the add-on uninstall doesn't actually complete in a reasonable time as well
as removing the add-on listener.

Review commit: https://reviewboard.mozilla.org/r/34033/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34033/
Attachment #8717070 - Flags: review?(jsantell)
Attachment #8717069 - Flags: review?(jsantell) → review+
Comment on attachment 8717069 [details]
MozReview Request: Bug 1142734: Allow unloading the loader sandbox module. r?jsantell

https://reviewboard.mozilla.org/r/34031/#review30733
Comment on attachment 8717070 [details]
MozReview Request: Bug 1142734: Stop using Timer.jsm to avoid replacing the browser window setTimout and clearTimeout functions. r?jsantell

https://reviewboard.mozilla.org/r/34033/#review30735

::: testing/mochitest/jetpack-addon-harness.js:115
(Diff revision 1)
> +    }, 10000);

Maybe make this a constant at the top of the file?
Attachment #8717070 - Flags: review?(jsantell) → review+
Assignee: evold → dtownsend
Flags: needinfo?(dtownsend)
https://hg.mozilla.org/mozilla-central/rev/9c63922674ac
https://hg.mozilla.org/mozilla-central/rev/5afb767f3591
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
OrangeFactor says this is still happening on both trunk and Aurora.
Assignee: dtownsend → nobody
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla41 → ---
(In reply to Ryan VanderMeulen [:RyanVM] from comment #253)
> OrangeFactor says this is still happening on both trunk and Aurora.
This seems to be a new issue. It's M(oth) and not M(JP) and the logs don't mention |nukeSandbox|. Is that a cookie test? See https://treeherder.mozilla.org/logviewer.html#?job_id=23461353&repo=mozilla-inbound#L33562
https://bugzilla.mozilla.org/show_bug.cgi?id=1399562
Status: REOPENED → RESOLVED
Closed: 4 years ago2 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.