Closed Bug 816099 Opened 7 years ago Closed 7 years ago

window_manager.js exceptions

Categories

(Firefox OS Graveyard :: Gaia, defect, P1)

All
Gonk (Firefox OS)
defect

Tracking

(blocking-basecamp:+)

RESOLVED FIXED
blocking-basecamp +

People

(Reporter: rsomani, Assigned: ochameau)

References

Details

(Whiteboard: QARegressExclude)

Attachments

(4 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:16.0) Gecko/20100101 Firefox/16.0
Build ID: 20121024073032

Steps to reproduce:

We have a test script that runs over Marionette. It launches a few applications and sends them click events. On recent b2g builds, it has started failing very early in its execution, with Marionette throwing an exception:
I/Gecko   (  127): 1354088708776           Marionette       INFO    sendToClient: {"from":"conn0.marionette1","error":{"message":{},"status":17,"stacktrace":null}}, undefined, null



Actual results:

According to the Marionette source code, status 17 maps to a JavaScript exception.
The following JavaScript errors appear in logcat alongside the Marionette error above:
 
E/GeckoConsole( 127): [JavaScript Error: "TypeError: app is undefined" {file: "app://system.example.com/js/window_manager.js" line: 1070}]
E/GeckoConsole( 125): [JavaScript Error: "TypeError: openFrame is null" {file: "app://system.example.com/js/window_manager.js" line: 661}]
E/GeckoConsole( 125): [JavaScript Error: "TypeError: openFrame is null" {file: "app://system.example.com/js/window_manager.js" line: 326}]
E/GeckoConsole( 125): [JavaScript Error: "TypeError: closeFrame is null" {file: "app://system.example.com/js/window_manager.js" line: 700}]

These seem to be real issues with window_manager. We tried adding null checks at the lines where it was failing. The exceptions went away, but the script still failed elsewhere.

Can someone check why window_manager may be running into these exceptions? And more importantly, why they are causing Marionette to exit by throwing an exception?
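
For reference, the error packet in the Marionette log line above can be pulled apart with a few lines of Python. This is only a sketch for reading the log: `parse_send_to_client` and `JAVASCRIPT_ERROR_STATUS` are hypothetical names, not part of Marionette, and the only status code assumed is the 17 mentioned in this report.

```python
import json

# Hypothetical constant: the report states that status 17 maps to a
# JavaScript exception. No other status codes are assumed here.
JAVASCRIPT_ERROR_STATUS = 17


def parse_send_to_client(line):
    """Return the JSON packet from a 'sendToClient:' log line, or None."""
    marker = "sendToClient: "
    start = line.find(marker)
    if start == -1:
        return None
    # raw_decode stops at the end of the first JSON value, so the trailing
    # ", undefined, null" in the log line is ignored.
    packet, _ = json.JSONDecoder().raw_decode(line[start + len(marker):])
    return packet


LOG_LINE = (
    'I/Gecko   (  127): 1354088708776           Marionette       INFO    '
    'sendToClient: {"from":"conn0.marionette1","error":{"message":{},'
    '"status":17,"stacktrace":null}}, undefined, null'
)

packet = parse_send_to_client(LOG_LINE)
error = packet["error"]
# prints: True {} None
print(error["status"] == JAVASCRIPT_ERROR_STATUS, error["message"], error["stacktrace"])
```

The packet carries an empty message and no stack trace, so the GeckoConsole lines in logcat are the only place where the failing file and line show up.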



Expected results:

Marionette shouldn't get any exception.
Status: UNCONFIRMED → NEW
blocking-basecamp: --- → ?
Ever confirmed: true
OS: All → Gonk (Firefox OS)
Hi Alive,
Are you suggesting that bug 814583 is the root cause of this issue?
(In reply to Michael Vines [:m1] from comment #2)
> Hi Alive,
> Are you suggesting that bug 814583 is the root cause of this issue?

Hi, sorry, I should have said something in comment 1.
And yes, I spent some time last week finding out what happened to the window manager.
basecamp- as we don't block on tests. Please renom with justification if you think that this should block. Note that no one on platform triage has access to bug 802677.
blocking-basecamp: ? → -
re-noming.   This bug is blocking our stability testing and needs to be addressed immediately.
blocking-basecamp: - → ?
(In reply to Lawrence Mandel [:lmandel] from comment #4)
> Note that no one on platform triage has access to bug 802677.

I just added you. Let me know who else needs access
rsomani, can you please bisect what changeset in gecko or gaia broke this? That would help tremendously in diagnosing this.
Flags: needinfo?(rsomani)
Malini, this is an urgent partner request. Can you take a look?
Assignee: nobody → mdas
Priority: -- → P1
blocking-basecamp: ? → +
rsomani, can you include a reproducible test case? 

It looks like the app returned from window_manager doesn't exist, so marionette can't do anything with it, but I'll need a test case so I can investigate.
https://bugzilla.mozilla.org/show_bug.cgi?id=814583#c0 I believe this is the root cause, and since it's already fixed, please check whether the error still happens.
Whiteboard: [needs retest by CAF]
(In reply to Malini Das [:mdas] from comment #9)
> rsomani, can you include a reproducible test case? 
> 
> It looks like the app returned from window_manager doesn't exist, so
> marionette can't do anything with it, but I'll need a test case so I can
> investigate.

We prepared a minimal sample that reproduces what we see at https://www.codeaurora.org/patches/quic/b2g/SAMPLE_19408_Marionette_App_Launch_20121130.tar.gz

Hopefully this helps!
Flags: needinfo?(rsomani)
Whiteboard: [needs retest by CAF]
(In reply to Alive Kuo [:alive] from comment #10)
> https://bugzilla.mozilla.org/show_bug.cgi?id=814583#c0 I believe this is
> root cause,and,since it's already fixed, please check if the error still
> happens.

Unfortunately, it looks like this didn't help.
I just ran this on an unagi build from yesterday (Dec 2), with production gaia I built today (Dec 3), and I have no errors using the launchapp.py script. It happily runs init_launchApp and continues to launch apps forever (there's a while(1) in there, so is this a stress test of some sort?).

In any case, I can't reproduce it with the latest builds. Can you try to update your B2G and gaia and rerun the test?
Attached file b2g build manifest
FYI, I've attached the b2g manifest used to build the version I'm running.

On top of this, I pushed the production version of gaia, with commit hash: fa58b6a600838fe6e9c55c7929f29043e95a5fe6
need to verify and close this asap!  added keyword.
Keywords: qawanted
We'll try to reproduce the issue here again with Malini's input ASAP.
We are still able to reproduce this issue on our end at the tip of beta/nightly. We are continuing to look for differences between the two build environments that might explain why it can't be reproduced.
So I've been running this several times.  Often, but not always, it dies due to this error:

E/GeckoConsole(  438): [JavaScript Error: "Firefox can't establish a connection to the server at ws://test-agent.gaiamobile.org:8789/." {file: "app://test-agent.gaiamobile.org/common/vendor/test-agent/test-agent.js" line: 1214}]

The test-agent app doesn't behave nicely when launched without the test agent being active.

It would help if you could attach a logcat of a failing run, or even a partial one:

adb logcat | grep Gecko

This would help us determine which app is responsible for the error you're seeing.
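
Once such a logcat is captured to a file, the JavaScript errors can be grouped by source file and line with a short Python sketch like the one below. The regular expression is derived only from the GeckoConsole lines quoted in this bug, so the line format is an assumption, and `summarize_js_errors` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern inferred from the GeckoConsole lines quoted in this bug; other
# logcat formats may need a different expression.
JS_ERROR = re.compile(
    r'\[JavaScript Error: "(?P<message>.*?)" '
    r'\{file: "(?P<file>[^"]*)" line: (?P<line>\d+)\}\]'
)


def summarize_js_errors(logcat_text):
    """Count JavaScript errors per (file, line, message) in logcat output."""
    counts = Counter()
    for raw in logcat_text.splitlines():
        match = JS_ERROR.search(raw)
        if match:
            key = (match.group("file"), int(match.group("line")), match.group("message"))
            counts[key] += 1
    return counts


# Sample input: the four error lines from the original report.
SAMPLE = '''
E/GeckoConsole( 127): [JavaScript Error: "TypeError: app is undefined" {file: "app://system.example.com/js/window_manager.js" line: 1070}]
E/GeckoConsole( 125): [JavaScript Error: "TypeError: openFrame is null" {file: "app://system.example.com/js/window_manager.js" line: 661}]
E/GeckoConsole( 125): [JavaScript Error: "TypeError: openFrame is null" {file: "app://system.example.com/js/window_manager.js" line: 326}]
E/GeckoConsole( 125): [JavaScript Error: "TypeError: closeFrame is null" {file: "app://system.example.com/js/window_manager.js" line: 700}]
'''

counts = summarize_js_errors(SAMPLE)
for (file_name, line_no, message), seen in counts.items():
    print(seen, file_name, line_no, message)
```

Grouping by file makes it easier to tell whether a failing run is hitting the system app (window_manager.js) or another app such as test-agent.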

Also, are you using an engineering or user build?  I tested with an engineering build made against mozilla-inbound and tip gaia.
Attached file Test Run 1
Hi Griffin,

We tried with userdebug builds and also with production builds on otoro.
We consistently see this issue within 5 minutes.
Please find the attached logs from multiple test runs.
Attached file Test Run 2
(In reply to Jonathan Griffin (:jgriffin) from comment #18)
> So I've been running this several times.  Often, but not always, it dies due
> to this error:
> 
> E/GeckoConsole(  438): [JavaScript Error: "Firefox can't establish a
> connection to the server at ws://test-agent.gaiamobile.org:8789/." {file:
> "app://test-agent.gaiamobile.org/common/vendor/test-agent/test-agent.js"
> line: 1214}]
> 
> The test-agent app doesn't behave nicely when launched without the test
> agent being active.
> 
> It would help if you could attach a logcat of a failing run, or even a
> partial one:
> 
> adb logcat | grep Gecko
> 
> This would help us determine which app is responsible for the error you're
> seeing.
> 
> Also, are you using an engineering or user build?  I tested with an
> engineering build made against mozilla-inbound and tip gaia.

Right, that's why I specified in Comment 14 to use the production gaia, so you won't get any issues. It doesn't seem like running our developer version is what they want.
(In reply to vasanth from comment #21)
> Created attachment 688723 [details]
> Test Run 2

Are you using the build versions I stated in Comment 14? I still can't reproduce this problem. I do get the same test-agent.js test error that jgriffin mentioned, but that's only if I use non-production gaia.

Looking at the log of Test Run 2, the error you're running into is:

E/GeckoConsole(  106): [JavaScript Error: "TypeError: openFrame is null" {file: "app://system.example.com/js/window_manager.js" line: 653}]

where, if you look at https://github.com/mozilla-b2g/gaia/blob/master/apps/system/js/window_manager.js#L653, it's because openFrame doesn't seem to get set.

Also, the line number in the error doesn't match the one you would get if you used the gaia version I specified in Comment 14 (you'd get line 654 in that case). Please update your gaia and gecko and try to reproduce.

In any case, this is an error with window_manager.js, not marionette.
(In reply to Malini Das [:mdas] from comment #23)
> Also, the line number doesn't match with line error you would get if you
> used the gaia version I specified in Comment 14(you'd get line 654 in this
> case). Please update your gaia and gecko and try to reproduce.

We're using:
https://github.com/mozilla-b2g/gaia/commit/60f8ed91c1e02e8fd9f3197af99f3c7004afb524
https://github.com/mozilla/mozilla-central/commit/223842c5fb45121a228033e85f3e5ee9ee76594e

These SHA1s are close enough to the current tip of beta/nightly that updating *again* should be unnecessary, unless you are aware of a fix that has landed since then in either project that could correct this problem.


> In any case, this is an error with window_manager.js, not marionette.

Who can work on this error?
I've cc'd Alexander and Etienne, two Gaia developers who have worked on the window manager.  Can either of you shed any light on the error that's being seen?
Wild guess: is it possible that we're launching app2 while we're still animating the opening of app1?

Looks like it may cause us to set openFrame to null.
(In reply to Michael Vines [:m1] from comment #11)
> We prepared a minimal sample that reproduces what we see at
> https://www.codeaurora.org/patches/quic/b2g/
> SAMPLE_19408_Marionette_App_Launch_20121130.tar.gz
> 
> Hopefully this helps!

I was finally able to run this test, and I am seeing the exceptions you mentioned.
It looks like launching apps while others are still being loaded messes up the window manager code. I'll try to address that.
I have a patch that seems to fix all the window manager exceptions, but the launchapp.py program still says:
  Launch Apps script started..
  Got Some Exception exception :{}
Is this expected? If not, can I get some more information?
Here is a tentative fix. It allows the Python test provided in an earlier comment to run indefinitely (or at least for a very long time).
Various exceptions were being thrown when apps were launched so quickly that their animations collided. For example, the openCallback function was called multiple times.
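
To illustrate the kind of collision described here, below is a toy Python model of an open transition with a guard against late and duplicate completion events. It is not the code from the patch and not the Gaia window manager: `TransitionGuard`, `launch`, and `on_animation_end` are hypothetical names, and the model only assumes what this bug states (a second launch can arrive while the first app is still animating, and the open callback could fire more than once).

```python
class TransitionGuard:
    """Toy model of an app-open transition; not the Gaia implementation."""

    def __init__(self):
        self.open_frame = None   # frame currently animating open, if any
        self.displayed = None    # frame shown once a transition completes
        self.callback_runs = 0   # how many times the open callback ran

    def launch(self, frame):
        # A launch arriving mid-animation replaces the pending frame, so
        # there is always at most one transition in flight.
        self.open_frame = frame

    def on_animation_end(self, frame):
        # Ignore events for a frame that is no longer pending: these are
        # either stale (superseded by a later launch) or duplicates.
        if self.open_frame is None or frame != self.open_frame:
            return False
        self.displayed = self.open_frame
        self.open_frame = None
        self.callback_runs += 1
        return True


guard = TransitionGuard()
guard.launch("app1")
guard.launch("app2")                       # app2 launched while app1 animates
stale = guard.on_animation_end("app1")     # late event for app1 is ignored
first = guard.on_animation_end("app2")     # app2 finishes opening
repeat = guard.on_animation_end("app2")    # duplicate event is ignored
print(stale, first, repeat, guard.displayed, guard.callback_runs)
```

Without the check at the top of `on_animation_end`, the duplicate event would reach the code that uses `open_frame` after it has been reset, which corresponds to the "openFrame is null" errors in the logs.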
Assignee: mdas → poirot.alex
Attachment #689251 - Flags: review?(alive)
(In reply to Alexandre Poirot (:ochameau) from comment #29)
> I have a patch that seems to fix all window manager exception, but the
> launchapp.py program still says:
>   Launch Apps script started..
>   Got Some Exception exception :{}
> Is it expected ? If not, can I get some more information ?

This usually indicates that some JS error got caught in window.onerror; the way we're running code in a sandbox does not allow us to extract the stack trace.  Often, these errors will appear in logcat, however.
Comment on attachment 689251 [details]
Pull request 6860 - tentative fix

This looks like added error handling, so I will give it r+.
But let's see whether timdream has some feedback on this, because this is the window manager and we need to be careful.
Attachment #689251 - Flags: review?(alive)
Attachment #689251 - Flags: review+
Attachment #689251 - Flags: feedback?(timdream+bugs)
Comment on attachment 689251 [details]
Pull request 6860 - tentative fix

Thank you for the clean up!
Attachment #689251 - Flags: feedback?(timdream+bugs) → feedback+
Keywords: qawanted
Summary: window_manager.js exception → window_manager.js exceptions
https://github.com/mozilla-b2g/gaia/commit/33536a037db21d3cfe769877d8f3c385dd76367b
Merged, going to mark fixed. Reopen it if this doesn't resolve your trouble.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Component: General → Gaia
Whiteboard: QARegressExclude