Closed Bug 1046878 Opened 10 years ago Closed 10 years ago

JavaScript Error: "tabParent is null" {file: "chrome://marionette/content/marionette-listener.js" line: 121}

Categories

(Remote Protocol :: Marionette, defect, P1)

Other
Gonk (Firefox OS)
defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: zcampbell, Unassigned)

References

Details

(Whiteboard: [affects=b2g])

Attachments

(2 files)

I/Gecko   ( 5803): 1406826256923 Marionette: emitting Touch event of type touchend to element with id:  and tag name: IMG at coordinates (270, 224) relative to the viewport
E/GeckoConsole( 5461): [JavaScript Error: "tabParent is null" {file: "chrome://marionette/content/marionette-listener.js" line: 121}]
This bug only occurs when using the Flame in 319mb mode.

The test case is test_gallery_flick and involves pushing 4 images and swiping between them. 

The swipe fails but the device is still active.

This is causing QA smoketests to fail.



Device    Flame
Memory    319mb
Gaia      25e998814ba89f30fe44cd2fdfbb44d160a04641
Gecko     https://hg.mozilla.org/mozilla-central/rev/08c23f12a43e
BuildID   20140730045709
Version   34.0a1
ro.build.version.incremental=110
ro.build.date=Fri Jun 27 15:57:58 CST 2014
B1TC00011230
OS: Linux → Gonk (Firefox OS)
Hardware: x86_64 → Other
Summary: tabParent is null → JavaScript Error: "tabParent is null" {file: "chrome://marionette/content/marionette-listener.js" line: 121}
I can't replicate this exactly manually, but doing this action manually does cause some memory pressure events to occur (noted in logcat).
Attached file tabparent.log
Here is an unmodified logcat. Unfortunately I can't see anything suspicious.
Hi Olli, khuey and I have been talking about this bug. He observed it and agreed that it seemed to be memory related.

Have you any idea what might be causing the tabParent to be null?
Flags: needinfo?(bugs)
If we kill the child process/tab, tabParent ends up null:
http://mxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp?rev=6ca74bed32d8&mark=344-344#328
calls
http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsFrameLoader.cpp?rev=5132e8ddbbda&mark=1366-1366#1360
and after that the tabParent getter returns null:
http://mxr.mozilla.org/mozilla-central/source/content/base/src/nsFrameLoader.cpp?rev=5132e8ddbbda&mark=2693-2694#2691
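In rough terms it looks like this (an illustrative sketch only, not the listener's actual code; `iframe` stands for whatever element Marionette resolved):

    // Illustrative sketch: once the child process behind a mozbrowser frame
    // is killed, the frame loader's TabParent is destroyed and the getter
    // starts returning null.
    let frameLoader = iframe
      .QueryInterface(Components.interfaces.nsIFrameLoaderOwner)
      .frameLoader;
    let tabParent = frameLoader.tabParent;
    if (tabParent === null) {
      // Any further use of tabParent here (e.g. injecting a touch event)
      // throws the "tabParent is null" error seen in the log.
    }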

In that log I can see
I/Gecko   (14695): ###!!! [Parent][MessageChannel] Error: Channel error: cannot send/recv
I/Gecko   (14695): 
E/OomLogger(14695): [Kill]: send sigkill to 14818 (OperatorVariant), adj 734, size 2929
Flags: needinfo?(bugs)
The relevant process is still running though.  Maybe marionette is poking at the wrong process?
jgriffin and I discussed the possibility that the index here has gone stale, he said he'd look into it.

http://dxr.mozilla.org/mozilla-central/source/testing/marionette/marionette-listener.js#118
Flags: needinfo?(jgriffin)
We track frames by index in this code, so if some other frame disappears (i.e., crashes) after we've switched into a different frame, it could have this effect.

The question is, should we be bullet-proof against other frames crashing?  That does indicate a real problem, even though theoretically we might make ourselves immune to this case.
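Roughly, the hazard is something like this (hypothetical sketch; `curFrame` and `rememberedIndex` are illustrative names, not the listener's real bookkeeping):

    // Hypothetical sketch of index-based frame tracking going stale.
    // The listener remembers the frame it switched into by its position in
    // the document's frame list; if another frame is OOM-killed afterwards,
    // the list changes and the remembered index can resolve to a different
    // frame, or to one whose child process is already gone.
    let frames = curFrame.document.getElementsByTagName("iframe");
    let iframe = frames[rememberedIndex]; // may no longer be the frame we switched to
    let tabParent = iframe
      .QueryInterface(Components.interfaces.nsIFrameLoaderOwner)
      .frameLoader.tabParent;             // null if that frame's child has been killed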
Flags: needinfo?(jgriffin)
If a frame crashes for some other reason then the test case will pick that up. I don't think that's the job of Marionette.
OOMing/killing some background app is a perfectly valid thing, so testing frameworks should be able to deal with it.
You could try logging frameLoader.ownerElement.src to see if you're still pointing at a frame with the correct URL.
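Something along these lines, for whoever picks this up (sketch only; `iframe` stands for the element the listener resolved from its index):

    // Sketch: before using tabParent, log the URL of the frame we think we
    // are pointing at, to confirm the index still resolves to the right app.
    let frameLoader = iframe
      .QueryInterface(Components.interfaces.nsIFrameLoaderOwner)
      .frameLoader;
    dump("frame src: " + frameLoader.ownerElement.src + "\n");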
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #11)
> You could try logging frameLoader.ownerElement.src to see if you're still
> pointing at a frame with the correct URL.

Thanks, Kyle - is that suggestion for jgriffin?

All - what's next, here?  For context, we're being asked to have our primary Gaia UI Tests (Python) run in the 319MB memory mode (rather than the default 1GB), and as one of the issues precluding us from doing so, I'd like to strike it off the list :-)
Mdas is going to look at this this week.
Flags: needinfo?(mdas)
P1 for blocking testing
Flags: needinfo?(mdas)
Priority: -- → P1
(In reply to Stephen Donner [:stephend] from comment #12)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #11)
> > You could try logging frameLoader.ownerElement.src to see if you're still
> > pointing at a frame with the correct URL.
> 
> Thanks, Kyle - is that suggestion for jgriffin?

Yes.  Or whoever is investigating this.

> All - what's next, here?  For context, we're being asked to have our primary
> Gaia UI Tests (Python) run in the 319MB memory mode (rather than the default
> 1GB), and as one of the issues precluding us from doing so, I'd like to
> strike it off the list :-)

If you're seeing this more frequently on low-memory devices, then it's more likely that the problem is the index, since apps are more likely to get killed in low-memory conditions.
Zac, I'm trying to reproduce this using a 319mb ram flame with both today's master build and a build from the 31st, but I'm not successful. I'm running via:

gaiatest --address=localhost:2828 --testvars=/Users/mdas/testvars.json  --restart ./gaiatest/tests/functional/gallery/test_gallery_flick.py

But it's successful each time. How have you been running it?

Also, if I remove the --repeat, I sometimes get a stale element reference problem on the second shot, which seems like a separate problem.
^ forgot to flag you
Flags: needinfo?(zcampbell)
I got it to fail on the first attempt, and it is also failing on our CI. Last week when I filed this, I'd say the failure rate was about 1 in 4 or 5.
Are you positive you're on 319MB of RAM?




Gaia      d85bbae28dd9ab9679b42d8d37c84810059e097c
Gecko     https://hg.mozilla.org/mozilla-central/rev/191e834ff32b
BuildID   20140805160201
Version   34.0a1
ro.build.version.incremental=110
ro.build.date=Fri Jun 27 15:57:58 CST 2014
B1TC00011230
Flags: needinfo?(zcampbell)
(In reply to Zac C (:zac) from comment #18)
> I got it to fail on the first attempt and it is also failing on our CI but
> last week when I filed this I'd say it's about 1 in 4 or 5.
> Are you positive you're on 319mb of ram?

Yup, fastboot getvar ram shows me: mem: 319m

I'll try again with today's m-c
Viorela, can you + the team try this again, and if you can reproduce, need-info :mdas with the details and attach a logcat, etc.?  Thanks!
Flags: needinfo?(viorela.ioia)
I've been able to reproduce this failure locally in the latest master build, using a 319mb ram flame.
The test is failing frequently - 5 out of 10 times. I'll attach a logcat of the issue.

Gaia      8f955d80d175e52283275d3197e4eae2574b389f
Gecko     https://hg.mozilla.org/mozilla-central/rev/391f42c733fc
BuildID   20140811160202
Version   34.0a1
ro.build.version.incremental=109
ro.build.date=Mon Jun 16 16:51:29 CST 2014
B1TC00011220

Malini, can you please take another look over this? Thanks!
Flags: needinfo?(viorela.ioia) → needinfo?(mdas)
Attached file logcat.txt
(In reply to Zac C (:zac) from comment #23)
> It's still a pretty consistent failure on Jenkins:
> 
> http://jenkins1.qa.scl3.mozilla.com/job/flame-319.mozilla-central.ui.
> functional.smoke/lastCompletedBuild/testReport/test_gallery_flick/
> TestGallery/test_gallery_full_screen_image_flicks/history/

Has anyone tried comment 11?
We need mdas to try comment #11 :)
Yes, I'm focusing on another bug at the moment; I'll get on this tomorrow.
Flags: needinfo?(mdas)
Flags: needinfo?(mdas)
Whiteboard: [affects=b2g]
I see this failing on a normal build too:

Gaia      3584b2723412ed3299c6761f465885d80651c87e
Gecko     https://hg.mozilla.org/mozilla-central/rev/dac8b4a0bd7c
BuildID   20140821040201
Version   34.0a1
ro.build.version.incremental=110
ro.build.date=Fri Jun 27 15:57:58 CST 2014
B1TC00011230

http://jenkins1.qa.scl3.mozilla.com/job/flame.mozilla-central.ui.functional.smoke/83/HTML_Report/
I've tried with today's m-c build using 319mb and 512mb of ram, but I still can't reproduce this. I've run it with and without --restart, like so: "gaiatest --address=localhost:2828 --testvars=/Users/mdas/Code/testvarshome.json --repeat=10 --restart  ./gaiatest/tests/functional/gallery/test_gallery_flick.py" but the only errors I ever get in the logcat are "I/Gecko   ( 4041): 1408710887775	Marionette	INFO	sendToClient: {"from":"0","error":{"message":"Unable to locate element: #homescreen[loading-state=false]","status":7,"stacktrace":null}}, {12185f00-63d3-4444-8072-84fc37904fd9}, {12185f00-63d3-4444-8072-84fc37904fd9}"

I've run the test over 40 times without seeing this failure... is there *anything* else I need to do to reproduce this? Is it possible to check out one of the CI phones to test on?
Flags: needinfo?(mdas)
A bit out of ideas myself, then. Maybe you could even push down to 256MB of RAM; you'd definitely see apps being killed then!
I'm also unable to replicate this on my Flame with a full flash of latest mozilla-central nightly and 319MB memory. I ran five repeats with zero failures. I'll try again with 256MB.
Also unable to replicate with 256MB memory.
Closing as WORKSFORME. Please reopen if we see this in Jenkins again or if we're able to reproduce it locally. I have not seen this in Jenkins since switching the jobs to use 319MB memory.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Product: Testing → Remote Protocol