Closed Bug 1142734 Opened 10 years ago Closed 8 years ago

Intermittent mochitest-JP "0 ERROR TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 330 seconds with no output" after "Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIXPCComponents_Utils.nukeSandbox]"

Categories

(Add-on SDK Graveyard :: General, defect, P1)

x86
Windows XP
defect

Tracking

(firefox45 wontfix, firefox46 wontfix, firefox47 wontfix, firefox48 wontfix, firefox49 affected, firefox-esr45 wontfix, firefox50 affected)

RESOLVED INCOMPLETE
Tracking Status
firefox45 --- wontfix
firefox46 --- wontfix
firefox47 --- wontfix
firefox48 --- wontfix
firefox49 --- affected
firefox-esr45 --- wontfix
firefox50 --- affected

People

(Reporter: RyanVM, Unassigned)

References

Details

(Keywords: intermittent-failure)

Attachments

(4 files)

12:55:16 INFO - JavaScript error: resource://gre/modules/addons/XPIProvider.jsm -> jar:file:///c:/users/cltbld/appdata/local/temp/tmpufvkip.mozrunner/extensions/test-page-mod-debugger@jetpack.xpi!/bootstrap.js, line 303: NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIXPCComponents_Utils.nukeSandbox] 12:55:20 INFO - 1426190120163 Browser.Experiments.Experiments TRACE Experiments #0::enabled=true, true 12:55:20 INFO - 1426190120164 Browser.Experiments.Experiments TRACE Experiments #0::Registering instance with Addon Manager. 12:55:20 INFO - 1426190120165 Browser.Experiments.Experiments TRACE Experiments #0::Registering previous experiment add-on provider. 12:55:20 INFO - 1426190120165 Browser.Experiments.Experiments TRACE PreviousExperimentProvider #0::startup() 12:55:20 INFO - 1426190120166 Browser.Experiments.Experiments TRACE Experiments #0::_loadFromCache 12:55:20 INFO - 1426190120171 Browser.Experiments.Experiments TRACE Experiments #0::_loadTask finished ok 12:55:20 INFO - 1426190120171 Browser.Experiments.Experiments TRACE Experiments #0::_run 12:55:20 INFO - 1426190120172 Browser.Experiments.Experiments TRACE Experiments #0::_main iteration 12:55:20 INFO - 1426190120172 Browser.Experiments.Experiments TRACE Experiments #0::_evaluateExperiments 12:55:20 INFO - 1426190120174 Browser.Experiments.Experiments TRACE Experiments #0::_main finished, scheduling next run 13:00:50 INFO - 0 ERROR TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 330 seconds with no output 13:00:50 INFO - TEST-INFO | screenshot: exit status 0 13:00:50 INFO - TEST-INFO | crashinject: exit status 0 13:00:51 INFO - TEST-INFO | Main app process: exit status 1 13:00:51 INFO - 1 ERROR TEST-UNEXPECTED-FAIL | automation.py | application terminated with exit code 1
Dave, this is a manual star on Treeherder thanks to the generic nature of the failure. Any chance this can be prioritized for investigation?
Flags: needinfo?(dtownsend)
Summary: Intermittent mochitest-JP automation.py | application timed out after 330 seconds with no output → Intermittent mochitest-JP "0 ERROR TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 330 seconds with no output"
Attached patch logging patch — — Splinter Review
Looks like something is failing when the harness attempts to uninstall one test add-on and install the next. This adds some extra logging and catches any errors and logs them so we might have better information to go on. If anything goes wrong installing/uninstalling an add-on I've assumed we'll be in an inconsistent state and just ended the test run there.
Flags: needinfo?(dtownsend)
Attachment #8577293 - Flags: review?(evold)
Attachment #8577293 - Flags: review?(evold) → review+
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla39 → ---
Priority: -- → P1
This was exacerbated by the recent Addon SDK update in bug 1171638, which I've since backed out.
This makes me think the issue is in automation.py, since the error is about no output in 330 when there was clearly output: 05:37:11 INFO - TEST-PASS | simple-prefs-l10n/main.testAOMLocalization | This test is done. 05:37:11 INFO - TEST-PASS | simple-prefs-l10n/main.testAOMLocalization | There was a clean UI. 05:37:11 INFO - TEST-END | simple-prefs-l10n/main.testAOMLocalization 05:37:11 INFO - 3 of 3 tests passed. 05:37:11 INFO - TEST-INFO | jetpack-addon-harness.js | Uninstalling test add-on addon-sdk/source/test/addons/simple-prefs-l10n 05:37:11 INFO - 1433507831710 Browser.Experiments.Experiments TRACE Experiments #0::onUninstalled() - addon id: test-simple-prefs-l10n@jetpack 05:42:41 WARNING - TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 330 seconds with no output 05:42:41 INFO - TEST-INFO | screenshot: exit status 0 05:42:41 INFO - TEST-INFO | crashinject: exit status 0 05:42:43 INFO - TEST-INFO | Main app process: exit status 1 05:42:43 WARNING - TEST-UNEXPECTED-FAIL | automation.py | application terminated with exit code 1
I could use some help with this one, I've never looked at this automation.py file before.
(In reply to Erik Vold [:erikvold] (please needinfo? me) from comment #188) > I could use some help with this one, I've never looked at this automation.py > file before. It looks like this error is coming from https://github.com/mozilla/gecko-dev/blob/6e929ef81c0807a969f9064449736f6b92966e12/testing/mochitest/runtests.py#L2331
Assignee: nobody → evold
Attachment #8616173 - Flags: review?(dtownsend)
Attachment #8616173 - Flags: review?(dtownsend) → review+
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Given the number of OrangeFactor Robot comments in here, this should probably be re-opened.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
With my patch from bug 1231784 applied, this is: osx 10.10 opt: 100% reproducible osx 10.10 debug: not reproducible osx 10.6 opt: intermittent osx 10.6 debug: not reproducible linux opt/debug: not reproducible Given the hit or miss nature of where this is reproducible, this is likely a race condition. I believe the "fix" from last time simply changed the timing enough to hide the race condition for a bit. This also has nothing to do with automation.py, I believe the relevant line from the log is here: 15:24:46 INFO - JavaScript error: resource://gre/modules/addons/XPIProvider.jsm -> jar:file:///var/folders/2_/5bvqc7_51034r2hmy_t0wjpm00000w/T/tmpUew2qN.mozrunner/extensions/test-addon-author-email@jetpack.xpi!/bootstrap.js, line 299: NS_ERROR_ILLEGAL_VALUE: Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIXPCComponents_Utils.nukeSandbox] (Please note that this bug is being used as a sort of catch all by OrangeFactor Robot.. but I'm actually hitting the error from comment 0 here).
Blocks: 1231784
Summary: Intermittent mochitest-JP "0 ERROR TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 330 seconds with no output" → Intermittent mochitest-JP "0 ERROR TEST-UNEXPECTED-TIMEOUT | automation.py | application timed out after 330 seconds with no output" after "Component returned failure code: 0x80070057 (NS_ERROR_ILLEGAL_VALUE) [nsIXPCComponents_Utils.nukeSandbox]"
Oh hm.. that nukeSandbox error shows up in passing logs too. I'm not sure anymore. All I know for certain is that this is the last output we get: https://treeherder.mozilla.org/logviewer.html#?job_id=16343364&repo=try#L1547
I'm no JS expert, but I believe this error handler is attached to the wrong Promise: https://dxr.mozilla.org/mozilla-central/source/testing/mochitest/jetpack-addon-harness.js#227 Try run coming shortly which should hopefully give a clue to what's happening.
That didn't make a difference. I'm flailing about here :/ Dave, is there anyone who could help me dig into this a bit? It looks like Erik isn't around anymore. Thanks!
Flags: needinfo?(dtownsend)
The catch block never gets hit in this patch: https://hg.mozilla.org/try/rev/2ab8fcaa6a3b So I think that the nukeSandbox JavaScript error is fatal with my patch, but non-fatal without my patch. I don't understand why.
const declarations are no longer accessible from outside the script that makes them so in order for bootstrap.js to be able to access the loaderSandbox declaration we must use var. Review commit: https://reviewboard.mozilla.org/r/34031/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/34031/
Attachment #8717069 - Flags: review?(jsantell)
jetpack-addon-harness.js runs in a browser window scope so it already has the setTimeout functions available to it. Loading Timer.jsm overrides the DOM timer functions with those from Timer.jsm. Any other code that used setTimeout previously will have timer IDs from the DOM functions which don't match those in Timer.jsm. If this other code attempts to clear a timer it can then end up clearing an unrelated timer. In the intermittent failure here the browser-thumbnails code manages to clear the timer that is waiting to resolve the promise that makes tests continue. I've also added an additional timer that throws an exception and so ends tests if the add-on uninstall doesn't actually complete in a reasonable time as well as removing the add-on listener. Review commit: https://reviewboard.mozilla.org/r/34033/diff/#index_header See other reviews: https://reviewboard.mozilla.org/r/34033/
Attachment #8717070 - Flags: review?(jsantell)
Attachment #8717069 - Flags: review?(jsantell) → review+
Comment on attachment 8717069 [details] MozReview Request: Bug 1142734: Allow unloading the loader sandbox module. r?jsantell https://reviewboard.mozilla.org/r/34031/#review30733
Comment on attachment 8717070 [details] MozReview Request: Bug 1142734: Stop using Timer.jsm to avoid replacing the browser window setTimout and clearTimeout functions. r?jsantell https://reviewboard.mozilla.org/r/34033/#review30735 ::: testing/mochitest/jetpack-addon-harness.js:115 (Diff revision 1) > + }, 10000); Maybe make this a constant at the top of the file?
Attachment #8717070 - Flags: review?(jsantell) → review+
Assignee: evold → dtownsend
Flags: needinfo?(dtownsend)
Status: REOPENED → RESOLVED
Closed: 10 years ago9 years ago
Resolution: --- → FIXED
OrangeFactor says this is still happening on both trunk and Aurora.
Assignee: dtownsend → nobody
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla41 → ---
(In reply to Ryan VanderMeulen [:RyanVM] from comment #253) > OrangeFactor says this is still happening on both trunk and Aurora. This seems to be a new issue. It's M(oth) and not M(JP) and the logs don't mention |nukeSandbox|. Is that a cookie test? See https://treeherder.mozilla.org/logviewer.html#?job_id=23461353&repo=mozilla-inbound#L33562
Status: REOPENED → RESOLVED
Closed: 9 years ago8 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: