Closed Bug 865391 Opened 11 years ago Closed 10 years ago

Intermittent Jetpack TEST-UNEXPECTED-FAIL | unknown | Test output exceeded timeout (300s).

Categories

(Add-on SDK Graveyard :: General, defect, P2)

defect

Tracking

(firefox24 affected, firefox25 affected, firefox26 affected)

RESOLVED WORKSFORME
Tracking Status
firefox24 --- affected
firefox25 --- affected
firefox26 --- affected

People

(Reporter: RyanVM, Unassigned)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files, 1 obsolete file)

https://tbpl.mozilla.org/php/getParsedLog.php?id=22183878&tree=Mozilla-Inbound

WINNT 6.2 mozilla-inbound opt test jetpack on 2013-04-24 02:58:19 PDT for push 095979278426
slave: t-w864-ix-083

Testing all available packages: C:\slave\test\build\jetpack.
Testing C:\slave\test\build\jetpack...
Using binary at 'firefox/firefox.exe'.
Using profile at 'c:\users\cltbld~1.t-w\appdata\local\temp\tmpuayqg2.mozrunner'.
TEST-UNEXPECTED-FAIL | unknown | Test output exceeded timeout (60s).
Traceback (most recent call last):
  File "jetpack/bin/cfx", line 33, in <module>
    cuddlefish.run()
  File "C:\slave\test\build\jetpack\python-lib\cuddlefish\__init__.py", line 602, in run
    test_all_packages(env_root, defaults=options.__dict__)
  File "C:\slave\test\build\jetpack\python-lib\cuddlefish\__init__.py", line 484, in test_all_packages
    env_root=env_root)
  File "C:\slave\test\build\jetpack\python-lib\cuddlefish\__init__.py", line 928, in run
    bundle_sdk=options.bundle_sdk)
  File "C:\slave\test\build\jetpack\python-lib\cuddlefish\runner.py", line 743, in run_app
    OUTPUT_TIMEOUT, test_name, parseable)
Exception: Test output exceeded timeout (60s).
program finished with exit code 1
Blocks: 904063
Assignee: nobody → zer0
It seems that the failure happens before any test is actually running – that's why the "unknown" part in the error message –, so basically it seems that from when runner.py is executed, to the accessing at the log file for the first time, on try server, sometimes, a minute is passed.

I have no past experience on try server, but gabor said that it could be that it hangs for minutes. It could be also because some sync I/O operation, I don't know. There is no actually test to fix or disable here, so my blind shot is just increase the output's timeout and see if it's actually depends by slowness of try server machine or not.
Attachment #789126 - Flags: review?(dtownsend+bugmail)
Pointer to Github pull-request
Comment on attachment 789142 [details]
Pointer to Github pull request: https://github.com/mozilla/addon-sdk/pull/1162

*sigh*
Attachment #789142 - Attachment is obsolete: true
Attachment #789126 - Flags: review?(dtownsend+bugmail) → review+
Commits pushed to master at https://github.com/mozilla/addon-sdk

https://github.com/mozilla/addon-sdk/commit/98880f600a0906fc07fe10cd466fdcf9dbc8ae8f
Bug 865391 - Intermittent Jetpack TEST-UNEXPECTED-FAIL | unknown | Test output exceeded timeout (60s)

Increased the timeout from 1 minute to 5 minutes - hoping it would be enough

https://github.com/mozilla/addon-sdk/commit/52b1103d65690c5dbc562bf1cae812aa84573330
Merge pull request #1162 from ZER0/runner-fail/865391

Bug 865391 - Intermittent Jetpack TEST-UNEXPECTED-FAIL | unknown | Test ... r=@Mossop
Merged, but leave the bug open because we're not sure yet if that's enough to fix the failure.
We use an automated tool for marking bugs. [leave open] on the whiteboard is the magic phrase you're looking for.
Whiteboard: [leave open]
I think I'm actually going to close this for the simple reason that this is rare so we're not going to know if it has been fixed unless it crops up again. So better to mark it as fixed and reopen it if it re-occurs than to leave it and forget to close it later.
Whiteboard: [leave open]
So I guess I should actually do that...
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
https://tbpl.mozilla.org/php/getParsedLog.php?id=26650771&tree=Fx-Team

Or not.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: Intermittent Jetpack TEST-UNEXPECTED-FAIL | unknown | Test output exceeded timeout (60s). → Intermittent Jetpack TEST-UNEXPECTED-FAIL | unknown | Test output exceeded timeout (300s).
Target Milestone: mozilla26 → ---
Gabor told that try server could holds around 5 - 10 minutes; I tried 5 minutes at first shot to avoid to wait longer. One thing I can do is put 12 minutes or more and see if something will change. I could probably set some exaggerate timeout (30 minutes or 1 hour maybe, like the execution timeout) and see what's happening: in that case if it's still present, we could safely assume is not due the fact that the machine is terrible slow.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #24)
> https://tbpl.mozilla.org/php/getParsedLog.php?id=26650771&tree=Fx-Team
> 
> Or not.

This log appears different to the others and has an additional error that may point to the source of the problem:

console.error: addon-sdk: 
  Message: TypeError: browser(...) is undefined
  Stack:
    onWindowActivation@resource://gre/modules/XPIProvider.jsm -> jar:file:///var/folders/61/xnng5xgx64zd5m3hgq331vwr00000w/T/tmpP6f2UP.mozrunner/extensions/3b01a5af-2ea5-454c-a7d7-31af4f22d677@jetpack.xpi!/bootstrap.js -> resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/windows/firefox.js:160
_emitOnObject@resource://gre/modules/XPIProvider.jsm -> jar:file:///var/folders/61/xnng5xgx64zd5m3hgq331vwr00000w/T/tmpP6f2UP.mozrunner/extensions/3b01a5af-2ea5-454c-a7d7-31af4f22d677@jetpack.xpi!/bootstrap.js -> resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/deprecated/events.js:153
_emit@resource://gre/modules/XPIProvider.jsm -> jar:file:///var/folders/61/xnng5xgx64zd5m3hgq331vwr00000w/T/tmpP6f2UP.mozrunner/extensions/3b01a5af-2ea5-454c-a7d7-31af4f22d677@jetpack.xpi!/bootstrap.js -> resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/deprecated/events.js:123
handleEvent@resource://gre/modules/XPIProvider.jsm -> jar:file:///var/folders/61/xnng5xgx64zd5m3hgq331vwr00000w/T/tmpP6f2UP.mozrunner/extensions/3b01a5af-2ea5-454c-a7d7-31af4f22d677@jetpack.xpi!/bootstrap.js -> resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/commonjs/sdk/windows/observer.js:36
Assignee: zer0 → evold
Afaict the problem is occuring before any tests start (ie TEST-START is not reached) so I can't imagine how https://github.com/mozilla/addon-sdk/pull/1162  would fix this issue.  I'd like to revert that to start with, is there so other reason why the change was made?  60 seconds is a very long time as it is, I can't see why we'd ever need to wait 5 mins instead.
Flags: needinfo?(dtownsend+bugmail)
So I think the problem is https://github.com/mozilla/addon-sdk/blob/master/lib/sdk/addon/runner.js#L114-L116

We wait for the add/window to load, but if it does not load, then we get no error and the add-on does not run.  I made bug 906369 about logging an error if the addon/window does not load, but it doesn't seem possible to detect if the the window hasn't loaded, without using a setTimeout.

I was against addon/window from the beginning and I'm fairly certain I argued against requiring a background window for every add-on.. I'd love to rip this out.
Flags: needinfo?(rFobic)
(In reply to Erik Vold [:erikvold] [:ztatic] from comment #27)
> Afaict the problem is occuring before any tests start (ie TEST-START is not
> reached) so I can't imagine how
> https://github.com/mozilla/addon-sdk/pull/1162  would fix this issue.  I'd
> like to revert that to start with, is there so other reason why the change
> was made?  60 seconds is a very long time as it is, I can't see why we'd
> ever need to wait 5 mins instead.

Yep I made all of these comments here: https://bugzilla.mozilla.org/show_bug.cgi?id=820953#c13
(In reply to Erik Vold [:erikvold] [:ztatic] from comment #28)
> So I think the problem is
> https://github.com/mozilla/addon-sdk/blob/master/lib/sdk/addon/runner.
> js#L114-L116
> 
> We wait for the add/window to load, but if it does not load, then we get no
> error and the add-on does not run.  I made bug 906369 about logging an error
> if the addon/window does not load, but it doesn't seem possible to detect if
> the the window hasn't loaded, without using a setTimeout.
> 
> I was against addon/window from the beginning and I'm fairly certain I
> argued against requiring a background window for every add-on.. I'd love to
> rip this out.

Oops, I made these comments here https://bugzilla.mozilla.org/show_bug.cgi?id=820953#c13

;)
Note that we only appear to be running into this issue on non Jetpack trees, so if are tests are removed we likely will not be able to get the information that we need in order to solve the issue.

Also there is no test that can be disabled here, because the problem is that our tests are not running, so that is not an option.
Attachment #791748 - Flags: review?(dtownsend+bugmail)
(In reply to Erik Vold [:erikvold] [:ztatic] from comment #32)
> Created attachment 791748 [details]
> Pointer to Github pull request:
> https://github.com/mozilla/addon-sdk/pull/1182
> 
> Pointer to Github pull-request

Note if this is r+'d we should uplift it asap because the issue seems to happen on non Jetpack trees only so far.
OS: Windows 8 → All
Hardware: x86_64 → All
Ahh so the addon/window issue could be one of two problems, either the window doesn't load or it loads before we've attached the DOMContentLoaded listener.
(In reply to Erik Vold [:erikvold] [:ztatic] from comment #27)

> Afaict the problem is occuring before any tests start (ie TEST-START is not
> reached) so I can't imagine how
> https://github.com/mozilla/addon-sdk/pull/1162  would fix this issue.  I'd
> like to revert that to start with, is there so other reason why the change
> was made?  60 seconds is a very long time as it is, I can't see why we'd
> ever need to wait 5 mins instead.

I'm not an expert about try server; as I wrote before in accordance with Gabor one minute is not much on those machines. Consider that we had also increase the timeout of the running tests up to 1 hours and half. I do not mind to revert - it was just an attempt to check if actually the machine are actually so slow there were some block maybe on I/O part. It's exactly because the TEST-START wasn't reach that I tried with the timeout before, to see if the culprit was outside our javascript code, and was on the test runner / try server instead.
(In reply to Erik Vold [:erikvold] [:ztatic] from comment #28)

> We wait for the add/window to load, but if it does not load, then we get no
> error and the add-on does not run.  I made bug 906369 about logging an error
> if the addon/window does not load, but it doesn't seem possible to detect if
> the the window hasn't loaded, without using a setTimeout.

In this scenario I think a timeout is legitimate: at the end we want to reject the promise if it's not "ready" before an acceptable amount of time: is the same mechanism we have in the runner, where we do not want that the test hangs.
(In reply to Erik Vold [:erikvold] [:ztatic] from comment #33)
> (In reply to Erik Vold [:erikvold] [:ztatic] from comment #32)
> > Created attachment 791748 [details]
> > Pointer to Github pull request:
> > https://github.com/mozilla/addon-sdk/pull/1182
> > 
> > Pointer to Github pull-request
> 
> Note if this is r+'d we should uplift it asap because the issue seems to
> happen on non Jetpack trees only so far.

Not sure what are you suggesting: it seems your PR is just to collect data, so why uplift here?
(In reply to Erik Vold [:erikvold] [:ztatic] from comment #28)
> So I think the problem is
> https://github.com/mozilla/addon-sdk/blob/master/lib/sdk/addon/runner.
> js#L114-L116
> 
> We wait for the add/window to load, but if it does not load, then we get no
> error and the add-on does not run.  

Can you elaborate more on case where window won't load ? I would expect this to be deterministic specially since it's always a same document, if add-on window is not loaded add-on failed so we can not really do much.

> I made bug 906369 about logging an error
> if the addon/window does not load, but it doesn't seem possible to detect if
> the the window hasn't loaded, without using a setTimeout.
>

If window can fail to load than we should make sure platform has an API to report that, which we'll be able to hook into. In the meantime using a workaround with a timer or whatever else sounds reasonable way to me. However I don't see how timer here would be any better than timer we have for tests in general, if window failed to load add-on is not going to work.  

> 
> I was against addon/window from the beginning and I'm fairly certain I
> argued against requiring a background window for every add-on.. I'd love to
> rip this out.
>

This was added for a good reasons and I'm happy to iterate over them again, I think idea of ripping it out is very harsh, not to mention that every add-on did had hidden frame and windows anyway they just were not put under same addon/window iframe.


Including gabor and eddy to this thread as they may have better insight on weather iframe could fail to load a document and if there is a way to detect that was a case.
Flags: needinfo?(rFobic)
Flags: needinfo?(gkrizsanits)
Flags: needinfo?(ejpbruel)
(In reply to Erik Vold [:erikvold] [:ztatic] from comment #34)
> Ahh so the addon/window issue could be one of two problems, either the
> window doesn't load or it loads before we've attached the DOMContentLoaded
> listener.

My expectation was that document loading is async operation and since listener is attached right after location is changed it should be executed. Although I guess it's easy enough to check `document.readyState` and resolve promise rightaway if it's already interactive and `document.URL` is one that we expect.
Erik one more thing regarding addon/window, note that current implementation is a temporary workaround until we switch to permanent windowless docshells (see Bug 565388 for more details), which was designed to do exactly what we were using hidden iframes and all that workarounds.
(In reply to Erik Vold [:erikvold] [:ztatic] from comment #27)
> like to revert that to start with, is there so other reason why the change
> was made?  60 seconds is a very long time as it is, I can't see why we'd
> ever need to wait 5 mins instead.

Pleas don't. On a windows vm during mem page swapping, while all the cpu's are at 100%, loading a browser in debug mode... 1 minute is not at all an unreachable latency. I would not expect any frequent intermittent because of this, but can cause annoyances sometimes. So for one I would sleep better if it stayed 5 min. unless you have a good reason against it.

(In reply to Erik Vold [:erikvold] [:ztatic] from comment #28)
> I was against addon/window from the beginning and I'm fairly certain I
> argued against requiring a background window for every add-on.. I'd love to
> rip this out.

I hope when you say background window, you mean an iframe in the hidden window, and we are still not spawning any more hidden windows... How would it be better if there were only one iframe for all the addons? It could still produce the same failure, no?

(In reply to Irakli Gozilalishvili [:irakli] [:gozala] [@gozala] from comment #38)
> Can you elaborate more on case where window won't load ? I would expect this
> to be deterministic specially since it's always a same document, if add-on
> window is not loaded add-on failed so we can not really do much.

Hmm... I've seen some intermittent hidden window loading issues in the past a few times... You could try it again in case of failure, but I don't think that's the way to go here ofc, so let's not do that. What fails exactly? I mean is the hidden window loaded, and then we attach an iframe to it with some trivial content (some data url) and that iframe fails to load? Or do we have some more complex document? Can we check if the hidden window is loaded correctly?

> If window can fail to load than we should make sure platform has an API to
> report that, which we'll be able to hook into. In the meantime using a
> workaround with a timer or whatever else sounds reasonable way to me.
> However I don't see how timer here would be any better than timer we have
> for tests in general, if window failed to load add-on is not going to work.  

Still not sure what fails exactly.

> This was added for a good reasons and I'm happy to iterate over them again,

I would love to hear that some time, I had some concerns too, but this is an off topic (and might be even too late for changing back anyway) so let's not discuss it here.

OK so what should we do? Since I assume there is no way we can reproduce this locally, I'd add some extra logging to the test runner on try, to try to figure out what is failing exactly (hidden window state change events, addon window state change events, anything that might reveals some more info about this issue)

I'd love to be more helpful but I'm really out of ideas based on the limited amount of info we have.
Flags: needinfo?(gkrizsanits)
(In reply to Erik Vold [:erikvold] [:ztatic] from comment #27)
> Afaict the problem is occuring before any tests start (ie TEST-START is not
> reached) so I can't imagine how
> https://github.com/mozilla/addon-sdk/pull/1162  would fix this issue.  I'd
> like to revert that to start with, is there so other reason why the change
> was made?  60 seconds is a very long time as it is, I can't see why we'd
> ever need to wait 5 mins instead.

That timeout catches when nothing is logged for a time. If the VM is running slowly it isn't impossible that it could go longer. I agree I thought it was unlikely to fix it but I also don't see a reason to back it out, 5 minutes seems fine to me.
Flags: needinfo?(dtownsend+bugmail)
Comment on attachment 791748 [details]
Pointer to Github pull request: https://github.com/mozilla/addon-sdk/pull/1182

Let's see what new info this gets us
Attachment #791748 - Flags: review?(dtownsend+bugmail) → review+
Gabor and Irakli.

You both seem to be saying that before the addon/window module landed that every jetpack add-on had a hidden window.  As far as I know that is not the case and the hidden window iframes were only used if the add-on used the panel module or the the page-worker module.

Can you explain explicitly why I am wrong? in other words, for what purpose were add-ons previously using a hidden window?
Flags: needinfo?(rFobic)
Flags: needinfo?(gkrizsanits)
Commits pushed to master at https://github.com/mozilla/addon-sdk

https://github.com/mozilla/addon-sdk/commit/6da2ee4d28548c34ffd44106d5e6e3c7962a9634
Bug 865391 Adding debugging information to test my theory that addon/window is the cause of this intermittent failure

https://github.com/mozilla/addon-sdk/commit/1ab0df2193e9b5cfb9ef7b0699e586c4abee1bed
Merge pull request #1182 from erikvold/865391

Bug 865391 Adding debugging information to test my theory that addon/window is the cause of this intermittent failure r=@Mossop
(In reply to Erik Vold [:erikvold] [:ztatic] from comment #45)
> You both seem to be saying that before the addon/window module landed that
> every jetpack add-on had a hidden window.  As far as I know that is not the

I more like asking... I'm sure you know the SDK better than me. But my concern here is that now something can go wrong with the addon window and as a result no addon will work. With your fix this would turn into: something can go wrong and the majority of the addons don't work (that uses panel, worker, etc). I don't think that is a proper fix, although it would probably make this bug off our map, but I would prefer understanding what's going on here first, instead of blind fixing it and leaving behind some sneaky ghost in the shell type of bug. By the way, in general I always prefer to stay as much independent form the hidden window as possible.

> 
> Can you explain explicitly why I am wrong? in other words, for what purpose
> were add-ons previously using a hidden window?

I would really like to understand this change. What was before and after in terms of addon start-up. Previously if an addon relied on the hidden window (because it used panel or whatever), would that have run into this very same problem?
Flags: needinfo?(gkrizsanits)
(In reply to Erik Vold [:erikvold] [:ztatic] from comment #45)
> Gabor and Irakli.
> 
> You both seem to be saying that before the addon/window module landed that
> every jetpack add-on had a hidden window.  As far as I know that is not the
> case and the hidden window iframes were only used if the add-on used the
> panel module or the the page-worker module.
> 
> Can you explain explicitly why I am wrong? in other words, for what purpose
> were add-ons previously using a hidden window?

Those were the API using hidden iframes, but all the platforms other than OSX were
creating hidden iframe in the hidden window in order to load XUL document in order
to be able to later used by panel and page-worker to create an actual XUL iframes.
That host iframe also had different document depending on a platform causing some
platform specific akwardness. With addon/window we just switched to creating iframe
regardless of platform and started loading same document into it.

Either way reverting that change at this point is pretty much unrealistic, not to mention that we will just hide problem we're running into under the rug & possibly
make it platform specific.
Flags: needinfo?(rFobic)
Gabor I think Erik can explain better under "window failing to load". In fact I was just wondering if that even can happen and hoped you may have a better clue what that can be.
Flags: needinfo?(evold)
(In reply to Irakli Gozilalishvili [:irakli] [:gozala] [@gozala] from comment #49)
> (In reply to Erik Vold [:erikvold] [:ztatic] from comment #45)
> > Gabor and Irakli.
> > 
> > You both seem to be saying that before the addon/window module landed that
> > every jetpack add-on had a hidden window.  As far as I know that is not the
> > case and the hidden window iframes were only used if the add-on used the
> > panel module or the the page-worker module.
> > 
> > Can you explain explicitly why I am wrong? in other words, for what purpose
> > were add-ons previously using a hidden window?
> 
> Those were the API using hidden iframes, but all the platforms other than
> OSX were
> creating hidden iframe in the hidden window in order to load XUL document in
> order
> to be able to later used by panel and page-worker to create an actual XUL
> iframes.
> That host iframe also had different document depending on a platform causing
> some
> platform specific akwardness. With addon/window we just switched to creating
> iframe
> regardless of platform and started loading same document into it.
> 
> Either way reverting that change at this point is pretty much unrealistic,
> not to mention that we will just hide problem we're running into under the
> rug & possibly
> make it platform specific.

You seemed to have missed my point Irakli.

I was not talking about removing the hidden window/frame and braking panel or page-workers..
Flags: needinfo?(evold)
Erik, I think I may have some clue about what may be going wrong here.
Gabor's comments made me realize that we assume that at `final-ui-startup` `appShellService.hiddenDOMWindow` is already loaded:
https://github.com/mozilla/addon-sdk/blob/master/lib/sdk/addon/runner.js#L81-L100

And that is when we create iframe with addon/window in it:
https://github.com/mozilla/addon-sdk/blob/master/lib/sdk/addon/window.js#L19-L29

Maybe on this terribly slow machines `final-ui-startup` happens while `appShellService.hiddenDOMWindow` is still "about:blank" or not interactive yet
causing issues we've observing.


I guess we can validate this suspicion by logging URI and readyState of the
`appShellService.hiddenDOMWindow.document` before creating an iframe.

Alternatively we could just wait for `DOMContentLoaded` along with `final-ui-startup` before proceeding further.

P.S.: Once Bug 565388 is finally fixed and we switch to it we won't have to wait for `appShellService.hiddenDOMWindow` or `final-ui-startup` to move along.
(In reply to Irakli Gozilalishvili [:irakli] [:gozala] [@gozala] from comment #52)
> Erik, I think I may have some clue about what may be going wrong here.
> Gabor's comments made me realize that we assume that at `final-ui-startup`
> `appShellService.hiddenDOMWindow` is already loaded:
> https://github.com/mozilla/addon-sdk/blob/master/lib/sdk/addon/runner.js#L81-
> L100
> 
> And that is when we create iframe with addon/window in it:
> https://github.com/mozilla/addon-sdk/blob/master/lib/sdk/addon/window.js#L19-
> L29
> 
> Maybe on this terribly slow machines `final-ui-startup` happens while
> `appShellService.hiddenDOMWindow` is still "about:blank" or not interactive
> yet
> causing issues we've observing.
> 
> 
> I guess we can validate this suspicion by logging URI and readyState of the
> `appShellService.hiddenDOMWindow.document` before creating an iframe.
> 
> Alternatively we could just wait for `DOMContentLoaded` along with
> `final-ui-startup` before proceeding further.
> 
> P.S.: Once Bug 565388 is finally fixed and we switch to it we won't have to
> wait for `appShellService.hiddenDOMWindow` or `final-ui-startup` to move
> along.

Sounds possible.
Comment on attachment 794042 [details]
Pointer to Github pull request: https://github.com/mozilla/addon-sdk/pull/1217

We already had a similar issue but related to different causes, so I think it would be good take in account such scenario.

Irakli, do you also know if bug 565388 will be fixed anytime soon?
Attachment #794042 - Flags: review?(rFobic)
All yours Matteo, Dave said you had something here.
Assignee: evold → nobody
Flags: needinfo?(zer0)
I didn't have more than the PR I already attached, following your insight and Irakli ones; that I think makes a lot of sense because we had already a similar issues incidentally when we handled the Iceweasel thing. It could be worthy to land, if the patch makes sense.
Flags: needinfo?(zer0)
(In reply to Irakli Gozilalishvili [:irakli] [:gozala] [@gozala] from comment #38)
> (In reply to Erik Vold [:erikvold] [:ztatic] from comment #28)
> > So I think the problem is
> > https://github.com/mozilla/addon-sdk/blob/master/lib/sdk/addon/runner.
> > js#L114-L116
> > 
> > We wait for the add/window to load, but if it does not load, then we get no
> > error and the add-on does not run.  
> 
> Can you elaborate more on case where window won't load ? I would expect this
> to be deterministic specially since it's always a same document, if add-on
> window is not loaded add-on failed so we can not really do much.
> 
> > I made bug 906369 about logging an error
> > if the addon/window does not load, but it doesn't seem possible to detect if
> > the the window hasn't loaded, without using a setTimeout.
> >
> 
> If window can fail to load than we should make sure platform has an API to
> report that, which we'll be able to hook into. In the meantime using a
> workaround with a timer or whatever else sounds reasonable way to me.
> However I don't see how timer here would be any better than timer we have
> for tests in general, if window failed to load add-on is not going to work.  
> 
> > 
> > I was against addon/window from the beginning and I'm fairly certain I
> > argued against requiring a background window for every add-on.. I'd love to
> > rip this out.
> >
> 
> This was added for a good reasons and I'm happy to iterate over them again,
> I think idea of ripping it out is very harsh, not to mention that every
> add-on did had hidden frame and windows anyway they just were not put under
> same addon/window iframe.
> 
> 
> Including gabor and eddy to this thread as they may have better insight on
> weather iframe could fail to load a document and if there is a way to detect
> that was a case.

I really don't know all that much about how iframe loading could fail. That's the kind of question I would ask bz. Might as well ask him directly.
Assignee: nobody → bzbarsky
Flags: needinfo?(ejpbruel) → needinfo?(bzbarsky)
Assignee: bzbarsky → nobody
(In reply to Irakli Gozilalishvili [:irakli] [:gozala] [@gozala] from comment #52)
> Erik, I think I may have some clue about what may be going wrong here.
> Gabor's comments made me realize that we assume that at `final-ui-startup`
> `appShellService.hiddenDOMWindow` is already loaded:
> https://github.com/mozilla/addon-sdk/blob/master/lib/sdk/addon/runner.js#L81-
> L100

WARNING: NS_ENSURE_TRUE(mHiddenWindow) failed: file ../../../../xpfe/appshell/src/nsAppShellService.cpp, line 643

I wonder what happens if the hidden window fails to load... I vaguely remember that it tries to load early and if that fails tries it again later?

(In reply to Eddy Bruel [:ejpbruel] from comment #58)
> I really don't know all that much about how iframe loading could fail.
> That's the kind of question I would ask bz. Might as well ask him directly.

I guess it can fail many ways... the real question is what can/should we do about it. But I'm still not sure what is happening here tbh.

(In reply to Irakli Gozilalishvili [:irakli] [:gozala] [@gozala] from comment #52)
> I guess we can validate this suspicion by logging URI and readyState of the
> `appShellService.hiddenDOMWindow.document` before creating an iframe.

Yes. That's the info we need as I suggested earlier.
I revert my changes in the PR, adds only log information as Irakli suggested there.

As I mentioned before, we already faced a similar issue before, for different reasons, and there was some alternative solution before we found the right culprit. It's the Bug 851426. Notice however, that in this specific case, the code was running and we got a runtime error. It could be because we used a different topic.

Notice that we can also waiting for the `hiddenWindow` to be ready, if this is can fix the issue. But then, `exports.window` becomes async (see Bug 851426).
It depends on how you define "load".

For example, what happens if it's an HTTP load that comes back with status 204?  What about coming back with a content-type we don't handle?  Or a redirect that's blocked for security reasons?  There are probably other interesting situations, even assuming all of our code is working correctly, we're not out of memory, etc.
Flags: needinfo?(bzbarsky)
(In reply to Boris Zbarsky [:bz] from comment #61)
> It depends on how you define "load".

If I get it correctly, we create a new iframe in the hidden window then load a data url in it like this:

docShell.createAboutBlankContentViewer(addonPrincipal);
let window = docShell.contentViewer.DOMDocument.defaultView;
window.location = "data:application/vnd.mozilla.xul+xml;charset=utf-8,<window/>";

and then we're waiting for DOMContentLoaded event which we don't always get.

source:
https://github.com/mozilla/addon-sdk/blob/master/lib/sdk/addon/window.js#L39
That's... really odd.  That should deterministically either always fire the event or always not fire it, I would think.
(In reply to [github robot] from comment #46)
> Commits pushed to master at https://github.com/mozilla/addon-sdk
> 
> https://github.com/mozilla/addon-sdk/commit/
> 6da2ee4d28548c34ffd44106d5e6e3c7962a9634
> Bug 865391 Adding debugging information to test my theory that addon/window
> is the cause of this intermittent failure
> 
> https://github.com/mozilla/addon-sdk/commit/
> 1ab0df2193e9b5cfb9ef7b0699e586c4abee1bed
> Merge pull request #1182 from erikvold/865391
> 
> Bug 865391 Adding debugging information to test my theory that addon/window
> is the cause of this intermittent failure r=@Mossop

I reverted this because it apparently caused bug 909330 in https://github.com/mozilla/addon-sdk/commit/f29d2bb38c91d04e7733088308823c34fc390b6d
This last two have a js call stack. 

exception is from here:
http://mxr.mozilla.org/mozilla-central/source/addon-sdk/source/lib/sdk/windows/firefox.js#160

which can only happen if this guy here is undefined:
http://mxr.mozilla.org/mozilla-central/source/addon-sdk/source/lib/sdk/windows/firefox.js#172

That can happen two ways, one is that we are too early and it has not been set, but that's very unlikely given the fact that we register a listener for it only a one line before it... Is it possible that Trait.resolve(...).compose(...) fails? Like an exception is thrown, or returns undefined for some other reason?

On IRC Irakli mentioned that he is going to land an addon window refactoring patch that might or might not makes this problem go away.

Since this signature can be a signature for many different and unrelated issues, can we somehow change our timeout mechanism in a way that it also appends the last exception to this: "Test output exceeded timeout (300s)." ?
Comment on attachment 794042 [details]
Pointer to Github pull request: https://github.com/mozilla/addon-sdk/pull/1217

You can use a=@gozala on patches that just add more logging to investigate into issue. But please remove lines you've duplicated by accident
Attachment #794042 - Flags: review?(rFobic)
Commits pushed to master at https://github.com/mozilla/addon-sdk

https://github.com/mozilla/addon-sdk/commit/b6050126e3d6af9cc379923f203ee10656bf2cd7
Bug 865391 - Intermittent Jetpack TEST-UNEXPECTED-FAIL | unknown | Test output exceeded timeout (300s).

- Log infos

https://github.com/mozilla/addon-sdk/commit/f60e6df873c475e44a679853283d71a258866e1e
Merge pull request #1217 from ZER0/unknown-test/865391

Bug 865391 - Intermittent Jetpack TEST-UNEXPECTED-FAIL | unknown | Test output exceeded timeout (300s). a=@gozala
We didn't have this bug for a while, I will close it and reopen it in case it will happens again regularly.
Status: REOPENED → RESOLVED
Closed: 11 years ago10 years ago
Resolution: --- → FIXED
Resolution: FIXED → WORKSFORME
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: