Closed Bug 1051986 Opened 11 years ago Closed 11 years ago

[mozdevice] Make B2GDeviceRunner's stop method kill the remote process instead of the initial adb process

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

mozilla35

People

(Reporter: davehunt, Assigned: davehunt)

References

Details

Attachments

(1 file, 7 obsolete files)

Kill the correct b2g process on device. v1.0 11 years ago Dave Hunt [:davehunt] [he/him] ⌚BST 1.63 KB, patch	ahal : review+	Details \| Diff \| Splinter Review
Kill the correct b2g process on device. v1.1 11 years ago Dave Hunt [:davehunt] [he/him] ⌚BST 2.55 KB, patch	davehunt : review+	Details \| Diff \| Splinter Review
Kill the correct b2g process on device. v1.2 11 years ago Dave Hunt [:davehunt] [he/him] ⌚BST 3.85 KB, patch	ahal : review+	Details \| Diff \| Splinter Review
Kill the correct b2g process on device. v1.3 11 years ago Dave Hunt [:davehunt] [he/him] ⌚BST 5.93 KB, patch		Details \| Diff \| Splinter Review
Kill the correct b2g process on device. v1.4 11 years ago Dave Hunt [:davehunt] [he/him] ⌚BST 6.72 KB, patch	ahal : review-	Details \| Diff \| Splinter Review
Kill the correct b2g process on device. v1.5 11 years ago Dave Hunt [:davehunt] [he/him] ⌚BST 6.00 KB, patch		Details \| Diff \| Splinter Review
Kill the correct b2g process on device. v1.6 11 years ago Dave Hunt [:davehunt] [he/him] ⌚BST 6.23 KB, patch	ahal : review+	Details \| Diff \| Splinter Review
Kill the correct b2g process on device. v1.7 11 years ago Dave Hunt [:davehunt] [he/him] ⌚BST 5.54 KB, patch	ahal : review+	Details \| Diff \| Splinter Review

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Description

•

11 years ago

Currently the stop method on a B2GDeviceRunner instance will kill the remote_binary process, which is '/system/bin/b2g.sh'. The remote process is actually '/system/b2g/b2g', and a kill on the remote_binary is not producing a crash report. We should create a new stop method that first kills the remote process, and then kills the remote_binary. I have tested this locally and am able to produce crash reports.

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Updated

•

11 years ago

OS: Mac OS X → Gonk (Firefox OS)

Hardware: x86 → ARM

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 1

•

11 years ago

Attached patch Kill the correct b2g process on device. v1.0 (obsolete) — Details — Splinter Review

Assignee: nobody → dave.hunt

Status: NEW → ASSIGNED

Attachment #8470931 - Flags: review?(ahalberstadt)

Andrew Halberstadt [:ahal]

Comment 2

•

11 years ago

Comment on attachment 8470931 [details] [diff] [review] Kill the correct b2g process on device. v1.0 Review of attachment 8470931 [details] [diff] [review]: ----------------------------------------------------------------- Thanks! r+ with comments addressed. ::: testing/mozbase/mozrunner/mozrunner/base/device.py @@ +96,5 @@ > if not self.device.wait_for_net(): > raise Exception("Failed to get a network connection") > > + def stop(self, sig=None): > + sig = sig or signal.SIGABRT I feel like SIGKILL should be the default (which mozdevice should already do if you just pass in None). @@ +114,5 @@ > if not self.app_ctx.dm.processExist(self.app_ctx.remote_process): > break > time.sleep(1) > else: > print("timed out waiting for '%s' process to exit" % self.app_ctx.remote_process) I think this while loop should also get moved into stop(). Seems like there is less of a chance for race conditions that way.

Attachment #8470931 - Flags: review?(ahalberstadt) → review+

James Graham [:jgraham]

Comment 3

•

11 years ago

So it feels kind of like the model's broken here. mozrunner seems to assume that a Runner corresponds to a particular process. But for B2G that isn't really the case, or at least the process isn't local. So it violates the assumed invariant that, in the absence of any crashes, process_handler is a live process until .stop is called. It seems like the fix for this should really be either to remove that assumption or to write something like a proxy for the remote b2g process so it can be used like a local process.

James Graham [:jgraham]

Comment 4

•

11 years ago

When I say "the process isn't local" I mean that there exists a process on the device that could be used ("b2g") but the one that's actually used is the adb process that will start adb then exit.

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 5

•

11 years ago

Attached patch Kill the correct b2g process on device. v1.1 (obsolete) — Details — Splinter Review

Carried r+ and pushed to try: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=5c1466009e35 I won't take any further action until James' comments are addressed, though I'm not sure I have much useful input to add myself.

Attachment #8470931 - Attachment is obsolete: true

Attachment #8470952 - Flags: review+

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 6

•

11 years ago

I understand now that my original comment was wrong. As James points out, stop currently tries to kill the adb command that started the remote process. I suspect that fixing the model could be taken care of in a separate bug, and that we can temporarily workaround it to unblock bug 1039626. I have also found that the remote process is somewhat resilient to being killed. I have found that first running stop before killing the remote process makes it much more obedient. I'll have an updated patch for review after some more testing.

Summary: [mozdevice] Make B2GDeviceRunner's stop method kill the remote process instead of the startup script → [mozdevice] Make B2GDeviceRunner's stop method kill the remote process instead of the initial adb process

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 7

•

11 years ago

Attached patch Kill the correct b2g process on device. v1.2 (obsolete) — Details — Splinter Review

Requesting review again due to the changes. Try run: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=b32634f64472

Attachment #8470952 - Attachment is obsolete: true

Attachment #8471571 - Flags: review?(ahalberstadt)

Jonathan Griffin (:jgriffin)

Comment 8

•

11 years ago

Yes, if you kill the process without running 'adb stop' on it first, the system will automatically try to restart it.

Andrew Halberstadt [:ahal]

Comment 9

•

11 years ago

Comment on attachment 8471571 [details] [diff] [review] Kill the correct b2g process on device. v1.2 Review of attachment 8471571 [details] [diff] [review]: ----------------------------------------------------------------- Looks good to me, r+ with one comment. ::: testing/mozbase/mozrunner/mozrunner/base/device.py @@ +105,5 @@ > + if not self.is_running() == remote_pid: > + break > + time.sleep(1) > + else: > + print("timed out waiting for '%s' process to exit" % self.app_ctx.remote_process) I think we should call BaseRunner.stop() at the end here so that process_handler gets set to None.

Attachment #8471571 - Flags: review?(ahalberstadt) → review+

Andrew Halberstadt [:ahal]

Comment 10

•

11 years ago

Also have you tested this in the case that we are replacing the existing profile? CC'ing mdas just in case this breaks stuff for her.

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 11

•

11 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #9) > I think we should call BaseRunner.stop() at the end here so that > process_handler gets set to None. The process has already stopped, so including the BaseRunner.stop() pollutes the console with a message indicating that the process has already been killed. Perhaps in the case where we use ADB to launch a remote process we should set the process_handler to None after a successful start? (In reply to Andrew Halberstadt [:ahal] from comment #10) > Also have you tested this in the case that we are replacing the existing > profile? CC'ing mdas just in case this breaks stuff for her. The profile is recreated each time we call runner.start(). Could you expand on your question, as I'm not sure I understand it yet. I also discovered from speaking to :jgraham and :dhylands that calling stop on the b2g process will kill it, so we're somewhat relying on a race condition here that the process is still available after calling stop so we can then kill it with a specified signal. As mentioned in comment 8 we are unable to simply kill the process as it will restart, which seems to be particularly disruptive during boot.

Andrew Halberstadt [:ahal]

Comment 12

•

11 years ago

(In reply to Dave Hunt (:davehunt) from comment #11) > The profile is recreated each time we call runner.start(). Could you expand > on your question, as I'm not sure I understand it yet. One of the motivating factors of implementing DeviceRunner the way it is was so that we could push a custom profile to the device and modify profiles.ini to point to it. My understanding was that you didn't need this feature for your work. But e.g the Gij tests that Malini is working on are heavily dependent on the ability to push customized profiles. I just wanted to make sure that we aren't breaking her work.

James Graham [:jgraham]

Comment 13

•

11 years ago

(I also depend on the profiles feature)

Malini Das [:mdas] - Away, not checking bugmail

Comment 14

•

11 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #12) > (In reply to Dave Hunt (:davehunt) from comment #11) > > The profile is recreated each time we call runner.start(). Could you expand > > on your question, as I'm not sure I understand it yet. > > One of the motivating factors of implementing DeviceRunner the way it is was > so that we could push a custom profile to the device and modify profiles.ini > to point to it. My understanding was that you didn't need this feature for > your work. But e.g the Gij tests that Malini is working on are heavily > dependent on the ability to push customized profiles. I just wanted to make > sure that we aren't breaking her work. Seems like it shouldn't affect it at all since we set the B2GDeviceRunner object's profile in start_runner (https://github.com/mozilla-b2g/gaia/blob/master/tests/python/runner-service/runner_service/handlers/runner.py#L125). Unless we destroy that object, it should push the custom profile.

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 15

•

11 years ago

Attached patch Kill the correct b2g process on device. v1.3 (obsolete) — Details — Splinter Review

This version of the patch now calls the BaseRunner.stop() method too as I'm no longer seeing the issues I previously saw with that. I'm also now storing if the runner has witnessed a crash, which makes it possible for me to write a test that causes a crash. Previously I was able to do this by directly checking for crashes in the test, but now the on_finish method is where the crash is being picked up so it's missed. I'm still unsure about calling stop immediately followed by killing the process. While this has worked flawlessly for me I'm conscious that it would seem to be a race condition. See comment 11 for more details.

Attachment #8471571 - Attachment is obsolete: true

Attachment #8477310 - Flags: review?(ahalberstadt)

Attachment #8477310 - Flags: feedback?(james)

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 16

•

11 years ago

Comment on attachment 8477310 [details] [diff] [review] Kill the correct b2g process on device. v1.3 I refreshed the wrong patch. Please hold.

Attachment #8477310 - Attachment is obsolete: true

Attachment #8477310 - Flags: review?(ahalberstadt)

Attachment #8477310 - Flags: feedback?(james)

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 17

•

11 years ago

Attached patch Kill the correct b2g process on device. v1.4 (obsolete) — Details — Splinter Review

Try run: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=2f20ee2935c6

Attachment #8477324 - Flags: review?(ahalberstadt)

Attachment #8477324 - Flags: feedback?(james)

Andrew Halberstadt [:ahal]

Comment 18

•

11 years ago

Comment on attachment 8477324 [details] [diff] [review] Kill the correct b2g process on device. v1.4 Review of attachment 8477324 [details] [diff] [review]: ----------------------------------------------------------------- This will break crash stacks on timeout, otherwise it looks good. ::: testing/mozbase/mozrunner/mozrunner/base/device.py @@ +98,5 @@ > > + def stop(self, sig=None): > + try: > + remote_pid = self.is_running() > + self.app_ctx.stop_application() Is this necessary? I think that stopping the application this way makes the 'sig' parameter useless (i.e we won't trigger the crashreporter on timeout). When testing locally, calling runner.on_timeout() should generate and process minidumps. @@ +122,5 @@ > if match: > self.last_test = match[-1] > > def on_timeout(self): > + self.stop() You need to pass in sig=signal.SIGABRT here. Otherwise we won't get stacks if there is a timeout.

Attachment #8477324 - Flags: review?(ahalberstadt) → review-

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 19

•

11 years ago

(In reply to Andrew Halberstadt [:ahal] from comment #18) > Comment on attachment 8477324 [details] [diff] [review] > Kill the correct b2g process on device. v1.4 > > Review of attachment 8477324 [details] [diff] [review]: > ----------------------------------------------------------------- > > This will break crash stacks on timeout, otherwise it looks good. > > ::: testing/mozbase/mozrunner/mozrunner/base/device.py > @@ +98,5 @@ > > > > + def stop(self, sig=None): > > + try: > > + remote_pid = self.is_running() > > + self.app_ctx.stop_application() > > Is this necessary? I think that stopping the application this way makes the > 'sig' parameter useless (i.e we won't trigger the crashreporter on timeout). > When testing locally, calling runner.on_timeout() should generate and > process minidumps. Yes, see comment 8. If we don't first stop the b2g process then it will restart as soon as we kill it. This is a concern as mentioned in comment 11 and comment 15, but I'm not sure what else we should try. > @@ +122,5 @@ > > if match: > > self.last_test = match[-1] > > > > def on_timeout(self): > > + self.stop() > > You need to pass in sig=signal.SIGABRT here. Otherwise we won't get stacks > if there is a timeout. I'll fix that and trigger a new try run. My last try run was very sad and I've yet to investigate why, so will try to reproduce the failures today on my Ubuntu VM.

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 20

•

11 years ago

Attached patch Kill the correct b2g process on device. v1.5 (obsolete) — Details — Splinter Review

Try: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=ce5d3337c0a2

Attachment #8477324 - Attachment is obsolete: true

Attachment #8477324 - Flags: feedback?(james)

Flags: needinfo?(dave.hunt)

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 21

•

11 years ago

Attached patch Kill the correct b2g process on device. v1.6 (obsolete) — Details — Splinter Review

Okay, I think I finally worked this out. The patch was causing us not to stop b2g on emulator at all and therefore not pick up the new profile, etc. This version of the patch works for me on device and emulator. Try: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=0b878a3dd796

Attachment #8479876 - Attachment is obsolete: true

Attachment #8481593 - Flags: review?(ahalberstadt)

Flags: needinfo?(dave.hunt)

Andrew Halberstadt [:ahal]

Comment 22

•

11 years ago

Comment on attachment 8481593 [details] [diff] [review] Kill the correct b2g process on device. v1.6 Review of attachment 8481593 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for digging into this!

Attachment #8481593 - Flags: review?(ahalberstadt) → review+

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Updated

•

11 years ago

Keywords: checkin-needed

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 23

•

11 years ago

Investigating potential regression in try results.

Flags: needinfo?(ahalberstadt)

Keywords: checkin-needed

Andrew Halberstadt [:ahal]

Comment 24

•

11 years ago

https://tbpl.mozilla.org/php/getParsedLog.php?id=47239041&tree=Try&full=1#error0 Yep, this is a regression from the current patch and I see what is happening. In mozprocess.wait() we join the outputThread here: http://mxr.mozilla.org/mozilla-central/source/testing/mozbase/mozprocess/mozprocess/processhandler.py#820 Normally this wait() function is called by the main thread and everything is fine. The "onTimeout" handlers however, are called from within the outThread here: http://mxr.mozilla.org/mozilla-central/source/testing/mozbase/mozprocess/mozprocess/processhandler.py#787 Because of this patch, the on_timeout handler in mozrunner now calls mozrunner.stop(), which then calls mozprocess.kill(), which then calls mozprocess.wait(). This results in the outThread trying to join with itself. I think you could fix this by not calling BaseRunner.stop() anymore (I know I asked you to call it originally... sorry about that!). Alternatively we could fix mozprocess so that it won't try to join if we are in the outThread context.

Flags: needinfo?(ahalberstadt)

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Comment 25

•

11 years ago

Attached patch Kill the correct b2g process on device. v1.7 — Details — Splinter Review

Try: https://tbpl.mozilla.org/?tree=Try&rev=c285ff92cdae

Attachment #8481593 - Attachment is obsolete: true

Attachment #8486521 - Flags: review?(ahalberstadt)

Andrew Halberstadt [:ahal]

Comment 26

•

11 years ago

Comment on attachment 8486521 [details] [diff] [review] Kill the correct b2g process on device. v1.7 Review of attachment 8486521 [details] [diff] [review]: ----------------------------------------------------------------- If this fixes the intermittent, lgtm!

Attachment #8486521 - Flags: review?(ahalberstadt) → review+

Dave Hunt [:davehunt] [he/him] ⌚BST

Assignee

Updated

•

11 years ago

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 27

•

11 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/55d2bb1cbe23

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 28

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/55d2bb1cbe23

Status: ASSIGNED → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla35

You need to log in before you can comment on or make changes to this bug.