Closed Bug 1058158 Opened 10 years ago Closed 10 years ago

Marionette WebAPI tests should run in either a test container or a clean environment

Tracking

(blocking-b2g:2.1+, firefox33 wontfix, firefox34 fixed, firefox35 fixed, b2g-v2.1 fixed, b2g-v2.2 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla35

Project Flags:

blocking-b2g

2.1+

Tracking Flags:

Tracking

Status

firefox33

---

wontfix

firefox34

---

fixed

firefox35

---

fixed

b2g-v2.1

---

fixed

b2g-v2.2

---

fixed

People

(Reporter: qdot, Assigned: mdas)

References

Details

Attachments

(3 files, 9 obsolete files)

path to use test_container app when testing in b2g 10 years ago Malini Das [:mdas] - Away, not checking bugmail 9.69 KB, patch		Details \| Diff \| Splinter Review
use test_container app when testing in b2g 10 years ago Malini Das [:mdas] - Away, not checking bugmail 10.04 KB, patch		Details \| Diff \| Splinter Review
run in test-container 10 years ago Malini Das [:mdas] - Away, not checking bugmail 15.83 KB, patch		Details \| Diff \| Splinter Review
run in test-container 10 years ago Malini Das [:mdas] - Away, not checking bugmail 15.83 KB, patch		Details \| Diff \| Splinter Review
testcon 10 years ago Malini Das [:mdas] - Away, not checking bugmail 15.80 KB, patch		Details \| Diff \| Splinter Review
run in test-container 10 years ago Malini Das [:mdas] - Away, not checking bugmail 15.93 KB, patch		Details \| Diff \| Splinter Review
run in test-container 10 years ago Malini Das [:mdas] - Away, not checking bugmail 16.32 KB, patch	jgriffin : review-	Details \| Diff \| Splinter Review
run in test-container, remove oop option 10 years ago Malini Das [:mdas] - Away, not checking bugmail 22.17 KB, patch	jgriffin : review+	Details \| Diff \| Splinter Review
run in test-container, remove oop option 10 years ago Malini Das [:mdas] - Away, not checking bugmail 22.22 KB, patch	jgriffin : review+	Details \| Diff \| Splinter Review
test container patch for aurora 10 years ago Malini Das [:mdas] - Away, not checking bugmail 22.20 KB, patch		Details \| Diff \| Splinter Review
test container patch for aurora (v2, with navigation test fix) 10 years ago Kyle Machulis [:qdot] [:kmachulis] (INACTIVE) 22.53 KB, patch		Details \| Diff \| Splinter Review
Emulator log showing test_emulator_crash failure 10 years ago Kyle Machulis [:qdot] [:kmachulis] (INACTIVE) 306.43 KB, text/x-log		Details

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Description

•

10 years ago

[Blocking Requested - why for this release]: Blocks settings API (bug 900551)

In bug 968709, a patch was landed to reset the gecko environment to the system app so we can make sure we're running in something similar to B2G, since prior unit tests can put the system into odd states due to navigating away from the system app. The problem is that, while this makes the tests work, this leaves the process in an undefined state.

To fix this, we need to either run webapi tests in a test container, or restart the b2g process before running webapi tests.

Jonathan Griffin (:jgriffin)

Comment 1

•

10 years ago

WebAPI tests get run in two ways:  with and without OOP.  With OOP, they're run in a gaia test-container app, but without OOP, they're run in place of the system app.

Another way to fix this, then, would be to run the non-OOP tests in a different test-container that doesn't get run OOP.

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 2

•

10 years ago

Attached patch path to use test_container app when testing in b2g (obsolete) — Details — Splinter Review

Try is closed, and I have to head out for the day. I've only tested this locally against my phone and browser, but no chance to test on emulator. qDot, if you have time, can you test this? It's a patch on top of m-c (should work on m-i too).

If you're running the test by calling runtests.py directly, please make sure that 'b2g' is somewhere in your --type list.

Flags: needinfo?(kyle)

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Comment 3

•

10 years ago

Unfortunately, on the emulator this makes test_import_script crash the test container app, at least locally. I'm patching and pushing to try since it's open again, will post results URL here.

Flags: needinfo?(kyle)

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Comment 4

•

10 years ago

https://tbpl.mozilla.org/?tree=Try&rev=23fefaef8313

Gregor Wagner [:gwagner]

Updated

•

10 years ago

blocking-b2g: 2.1? → 2.1+

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 5

•

10 years ago

Attached patch use test_container app when testing in b2g (obsolete) — Details — Splinter Review

updated patch.

Attachment #8478584 - Attachment is obsolete: true

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Updated

•

10 years ago

Blocks: 1059956

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 6

•

10 years ago

Attached patch run in test-container (obsolete) — Details — Splinter Review

testing out the latest patch on try https://tbpl.mozilla.org/?tree=Try&rev=6ca42ba28455, will r? if it passes

Attachment #8479304 - Attachment is obsolete: true

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 7

•

10 years ago

Attached patch run in test-container (obsolete) — Details — Splinter Review

This is more up to date (https://tbpl.mozilla.org/?tree=Try&rev=abc3a219fe29)

Attachment #8480860 - Attachment is obsolete: true

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 8

•

10 years ago

So these try runs are passing up until they all hit this timeout at some point:

command timed out: 7200 seconds elapsed running ['/tools/buildbot/bin/python', 'scripts/scripts/marionette.py', '--cfg', 'marionette/automation_emulator_config.py', '--blob-upload-branch', 'try', '--download-symbols', 'ondemand'], attempting to kill

It's unclear to me why this is happening, and I'm heading out for a week's PTO today. Has anyone seen this before?

Jonathan Griffin (:jgriffin)

Comment 9

•

10 years ago

Basically, the tests have become much slower...and they're hitting the buildbot 2-hr timeout.  Without this change, the tests take ~90 min, for some reason this change makes them take > 2 hr.  :(

Jonathan Griffin (:jgriffin)

Comment 10

•

10 years ago

A couple of potential solutions:  we could try running them on faster VM's (but that typically boosts speed by only about 10%), or we could chunk the tests.

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 11

•

10 years ago

Attached patch testcon (obsolete) — Details — Splinter Review

Thanks for the response. The original patch would start and kill a test container per test case, this will instead preserve it until it sees a case that doesn't require the test container. It will then look for the test container app and kill it if it exists.

Try:  https://tbpl.mozilla.org/?tree=Try&rev=5dd399f37ce0

If this approach is still too slow, how about we separate the Marionette unit tests from the webapi tests, and enable the test container app using --test-container or using a separate manifest with test_container = true? That way, we keep the system app untouched when running the Mn tests, so we keep the test environment as pristine as possible.

https://bugzilla.mozilla.org/show_bug.cgi?id=1022862 is the separation effort but not much as been done since June.

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Updated

•

10 years ago

Attachment #8480867 - Attachment is obsolete: true

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 12

•

10 years ago

Attached patch run in test-container (obsolete) — Details — Splinter Review

"manager" wasn't defined in gaia (let manager = window.wrappedJSObject.AppWindowManager || window.wrappedJSObject.WindowManager;). Maybe from dom bluetooth's test_navigate_to_default_url.py (https://mxr.mozilla.org/mozilla-central/source/dom/bluetooth/tests/marionette/test_navigate_to_default_url.py#16) doing weird stuff to gaia state? In any case, here's a retry that worked well locally:

https://tbpl.mozilla.org/?tree=Try&rev=4947fdcd616f

Attachment #8481613 - Attachment is obsolete: true

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 13

•

10 years ago

Attached patch run in test-container (obsolete) — Details — Splinter Review

https://tbpl.mozilla.org/?tree=Try&rev=a49620d09800

Finally green with this patch! Now I can stop thinking about this and go enjoy my vacation :P

Attachment #8481798 - Attachment is obsolete: true

Attachment #8482260 - Flags: review?(jgriffin)

Jonathan Griffin (:jgriffin)

Comment 14

•

10 years ago

Comment on attachment 8482260 [details] [diff] [review]
run in test-container

Review of attachment 8482260 [details] [diff] [review]:
-----------------------------------------------------------------

This patch basically removes the 'oop' feature of the runner, but leaves a bunch of related code in place.  It causes all of our unit tests to be run OOP, regardless of what you specify in the manifest, but allows webapi tests to run in whatever state the phone happens to be in.

I think removing this feature is ok, as I looked and nothing outside of our unit tests currently uses it, but I think we should be more explicit about tearing it out.

We could land this if this patch is urgent and clean it up after mdas returns, or we could just wait until then.

Attachment #8482260 - Flags: review?(jgriffin) → review-

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 15

•

10 years ago

Attached patch run in test-container, remove oop option (obsolete) — Details — Splinter Review

Removed the oop code from the runner.

Ran in try: https://tbpl.mozilla.org/?tree=Try&rev=db38e9ce01c5

The red, failed build is because I aborted it. The orange is an intermittent.

Attachment #8482260 - Attachment is obsolete: true

Attachment #8486493 - Flags: review?(jgriffin)

Jonathan Griffin (:jgriffin)

Comment 16

•

10 years ago

Comment on attachment 8486493 [details] [diff] [review]
run in test-container, remove oop option

Review of attachment 8486493 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, looks good.

Attachment #8486493 - Flags: review?(jgriffin) → review+

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 17

•

10 years ago

pushed to m-i:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c72a49a66333

Phil Ringnalda (:philor)

Comment 18

•

10 years ago

Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/bebc8e9a6da6 - b2g's Mnw hasn't finished yet to know how it'll fare, but Mn on all non-b2g says "| KeyError: 'b2g'" for every test.

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 19

•

10 years ago

Thanks, I forgot to test that patch locally on browser! Retrying with a new patch:

Mn:
 https://tbpl.mozilla.org/?tree=Try&rev=84a07e0dd1f8
Mnw:
https://tbpl.mozilla.org/?tree=Try&rev=1e9544ee90f2

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 20

•

10 years ago

Attached patch run in test-container, remove oop option — Details — Splinter Review

the only change from the previous patch is line 388 of marionette_test.py. This patch ran green on Mn and Mnw

Attachment #8486493 - Attachment is obsolete: true

Attachment #8487489 - Flags: review?(jgriffin)

Jonathan Griffin (:jgriffin)

Comment 21

•

10 years ago

Comment on attachment 8487489 [details] [diff] [review]
run in test-container, remove oop option

Review of attachment 8487489 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for pointing out the difference with the previous patch, makes it easy to review!

Attachment #8487489 - Flags: review?(jgriffin) → review+

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 22

•

10 years ago

Thanks, pushed to m-i
https://hg.mozilla.org/integration/mozilla-inbound/rev/d96632436134

Ryan VanderMeulen [:RyanVM]

Comment 23

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/d96632436134

Status: NEW → RESOLVED

Closed: 10 years ago

Flags: in-testsuite+

Resolution: --- → FIXED

Target Milestone: --- → mozilla35

Ryan VanderMeulen [:RyanVM]

Comment 24

•

10 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/5a7b47cb1498

status-b2g-v2.1: --- → fixed

status-b2g-v2.2: --- → fixed

status-firefox33: --- → wontfix

status-firefox34: --- → fixed

status-firefox35: --- → fixed

Ryan VanderMeulen [:RyanVM]

Comment 25

•

10 years ago

Backed out from Aurora for Mnw permafail.
https://hg.mozilla.org/releases/mozilla-aurora/rev/549846c956b8

https://tbpl.mozilla.org/php/getParsedLog.php?id=48211120&tree=Mozilla-Aurora

status-b2g-v2.1: fixed → affected

status-firefox34: fixed → affected

Flags: needinfo?(mdas)

Keywords: branch-patch-needed

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Updated

•

10 years ago

Blocks: 1061805

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Comment 26

•

10 years ago

It's looking like this may break the test_switch_frame.py unit tests on Aurora. I'm not sure how the other two patches would affect that functionality, but I'm not ruling that out either. Will run another try with just the marionette patches.

https://tbpl.mozilla.org/?tree=Try&rev=7da5f9669516

Good news is, this doesn't seem to break the bluetooth tests anymore. So we've at least fixed one thing.

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Comment 27

•

10 years ago

Here's with only the marionette patches and navigate removal.

https://tbpl.mozilla.org/?tree=Try&rev=0197d4f2975b

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Comment 28

•

10 years ago

So everything is green on b2g-i/m-i/m-c, so there's gotta be something else we need to backport to aurora to fix this. Since the frame switch stuff is marionette, is there possibly some marionette patch we should backport? It doesn't look like there's many between the aurora branch and now.

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Updated

•

10 years ago

Blocks: 1068077

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 29

•

10 years ago

Attached patch test container patch for aurora (obsolete) — Details — Splinter Review

So test_switch_frame.py isn't supposed to be run.  Bug 977893 is responsible for adding the 'b2g=false' line to the manifest, which shuts off this test. Adding this patch wholesale is a bit much for this. We can either uplift that patch to m-a, or this modified version of this patch that adds that line.

I'm running the modified version of this on aurora with your navigate patch applied: https://tbpl.mozilla.org/?tree=Try&rev=2b58b78c6fcd

Flags: needinfo?(mdas)

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Comment 30

•

10 years ago

It looks like the patch for test_navigation.py is screwed up between the m-c and aurora patches? We're not doing a skip_if_b2g on test_navigate_frame, which makes the test app crash. That's what's causing the reds, and it's failing locally too.

Here's a new try, with the test skip moved.

https://tbpl.mozilla.org/?tree=Try&rev=609b9f1311fd

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Comment 31

•

10 years ago

Attached patch test container patch for aurora (v2, with navigation test fix) — Details — Splinter Review

Attachment #8490809 - Attachment is obsolete: true

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Updated

•

10 years ago

Depends on: 1069115

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Comment 32

•

10 years ago

So along with the failures in navigation and frame, I was noticing landing this up'd our intermittents in bluetooth. I found that I had forgotten to change some settings functions in the webapi tests to use transaction events when landing bug 900551. There's a good chance that if we landed without these fixes, we'd just get backed right out again due to obviousness in orange factor. I got them fixed and have a new try running with those patches on top. I really, really hope this is it.

https://tbpl.mozilla.org/?tree=Try&rev=fe009c7736f2

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Comment 33

•

10 years ago

Oook, well locally bug 1069115 is causing everything to fail, so maybe I stopped that prior build to soon. Here's a try with just the fixed navigation tests again. >.>

https://tbpl.mozilla.org/?tree=Try&rev=90437d10da50

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Comment 34

•

10 years ago

Ok, back to running 1058158 and 1068077 locally against the emulator. I'm getting intermittent failures in telephony tests and wifi. This is the error that shows up for telephony:

09-18 23:35:52.042 E/GeckoConsole(   44): [JavaScript Error: "uncaught exception: RadioNotAvailable"]

For wifi, I get a lot of promise failures due to missing fields.

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Comment 35

•

10 years ago

Attached file Emulator log showing test_emulator_crash failure — Details

ni?'ing a couple of people on the RIL team, got names from blaming telephony ril code.

To get the ni?'ers up to date: in this bug, we're trying to run some tests in a container app to make sure each webapi test run is done in a clean environment. While we've landed this patch to m-c, unfortunately this is causing intermittents on Aurora. What I'm wondering about is how we can get to a state where a Promise is getting back RadioNotAvailable error as shown in this attachment. This is running a reduced test set of a few system tests, then the bluetooth test, then the telephony tests. About 80% of the time I can get the crash_emulator and a lot of the outbound telephony tests to fail on this message. Inbound seems to work ok, oddly enough. Just need to figure out where to start tracking this.

Flags: needinfo?(szchen)

Flags: needinfo?(htsai)

Szu-Yu Chen [:aknow] (inactive)

Comment 36

•

10 years ago

(In reply to Kyle Machulis [:kmachulis] [:qdot] (USE NEEDINFO?) from comment #35)
> Created attachment 8491934 [details]
> Emulator log showing test_emulator_crash failure
> 
> ni?'ing a couple of people on the RIL team, got names from blaming telephony
> ril code.
> 
> To get the ni?'ers up to date: in this bug, we're trying to run some tests
> in a container app to make sure each webapi test run is done in a clean
> environment. While we've landed this patch to m-c, unfortunately this is
> causing intermittents on Aurora. What I'm wondering about is how we can get
> to a state where a Promise is getting back RadioNotAvailable error as shown
> in this attachment. This is running a reduced test set of a few system
> tests, then the bluetooth test, then the telephony tests. About 80% of the
> time I can get the crash_emulator and a lot of the outbound telephony tests
> to fail on this message. Inbound seems to work ok, oddly enough. Just need
> to figure out where to start tracking this.

Kyle,

How do you run the test? I tried to apply the patch locally on aurora and then execute:
  ./mach marionette-webapi <telephony-test-suit>
The result is good. How could I reproduce your result? Do I miss some options?

The error "RadioNotAvailable" means that the radio on emulator is not yet turned on.
Does system app runs on your test environment? Ideally, when the emulator power up, system app (gaia/apps/system/js/radio.js) will sent a request to gecko to turn on the radio. All the telephony tests assume that the radio is default on.

Flags: needinfo?(szchen)

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Comment 37

•

10 years ago

I've been running the tests using runtests.py, but marionette-webapi does basically the same thing. If you run /just/ the telephony tests, it will usually pass. The problem comes when you try to run the tbpl bank of tests (in testing/marionette/client/marionette/tests/unit-test.ini). 

The way I've been running these is to run 1 or 2 tests out of testing/marionette/client/marionette/tests/unit/unit-tests.ini (just comment most of them out, leave a couple uncommented), just to make sure we run something in the test container. After this, bluetooth and telephony should run, and that's usually how telephony fails. That's the quickest way to repro this, as running all of the tests takes well over an hour.

Flags: needinfo?(szchen)

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Comment 38

•

10 years ago

Ok. Well. Stacked ALL of the patches I need to land to aurora to get settings stablized EXCEPT for bug 1068077 (just because I forgot to move it to before the try in the mq) and ran a try:

https://tbpl.mozilla.org/?tree=Try&rev=e3c5b655b26a

Mnw, as well as everything else, passes.

Ignore the M-e10s, those don't run on m-a. So with this, it looks like we could land everything? I'm retriggering to find out.

The weird part is that this works while still leaving in the navigate test (which is what bug 1068077 removed), which is what this was supposed to remove. I guess we can leave that for later if this works though.

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Comment 39

•

10 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/2c38a719e7a7

status-b2g-v2.1: affected → fixed

Ryan VanderMeulen [:RyanVM]

Updated

•

10 years ago

status-firefox34: affected → fixed

Keywords: branch-patch-needed

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Comment 40

•

10 years ago

Ok, this landed to aurora with no problems, so I'm betting the issues I was seeing with telephony and other tests were due to bug 1068077, since we didn't re-establish the system app before starting the webapi tests. I guess I'll leave bug 1068077 open for right now, but I'll probably close it as a WONTFIX once we divide the webapi tests out from the unit test manifest in bug 1022862.

:aknow, thanks for the help thus far. :)

Flags: needinfo?(szchen)

Flags: needinfo?(htsai)

Kyle Machulis [:qdot] [:kmachulis] (INACTIVE)

Reporter

Updated

•

10 years ago

Blocks: 1107982

BMO Automation

Updated

•

1 year ago

Product: Testing → Remote Protocol

You need to log in before you can comment on or make changes to this bug.