Closed Bug 750548 Opened 8 years ago Closed 7 years ago

Talos check for process close is incorrect on Android

Categories

(Testing :: Talos, defect)

ARM
Android
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rnewman, Assigned: jmaher)

References

Details

Attachments

(1 file)

Alas, Android is in control of the lifecycle of some of our processes; quitting Fennec does not terminate all of the processes that we caused to exist.

Fennec's main process is something like org.mozilla.fennec.

Another is a content provider process, which until today was org.mozilla.gecko.PasswordsProvider.

Talos tests by name for when our process is closed, which until now didn't pick up the org.mozilla.gecko name.

Now we've changed it to be based on the package: org.mozilla.fennec.PasswordsProvider, where the package matches the channel.

Now we get:

  utils.talosError: 'browser failed to close after being initialized'

Unfortunately, the regex in ffprocess_remote is:

  .*$stuff.*

so we can't even prefix the package name.

We need to find an alternative way of testing for Fennec having shut down on Android.
We can look for multiple process names in talos.  Is it safe to assume we don't change the list of processes very often?  Is it also safe to assume that if a given list of processes are not found in 'ps', that we are closed?
(In reply to Joel Maher (:jmaher) from comment #2)
> We can look for multiple process names in talos.  Is it safe to assume we
> don't change the list of processes very often?  Is it also safe to assume
> that if a given list of processes are not found in 'ps', that we are closed?

I believe that we only care about the main process; that if it's missing, we're closed; that it doesn't change often, and that it has a predictable name -- it might be enough to remove the trailing .* from the regex.

(The problem here is one of false positives, not false negatives.)

But I don't actually know what exact values these things have when running in practice, so I can't really make a concrete recommendation.

Really, I don't see too much value in checking for process close on Android. Uninstalling/reinstalling/replacing an APK will ensure total shutdown, and we've even discussed removing the Quit menu option. Again, though, I lack a lot of context here.
there are many reasons to wait for a quit.  If we cannot determine that a quit happened, we will timeout and fail.  Some types of tests we can determine we have reached the end of the test by parsing the log file, but as it stands all harnesses expect to see the process quit before calling the test done.

There are scenarios where we crash or fail to run anything.  These tests run in a minute or two total runtime and if we couldn't detect if the process was terminated we would wait for 2400 seconds and report a timeout, not a crash.

I think it is safe to assume we can remove the trailing '.*' in the regex.  If we do that for talos, we should do it for all the other automation as well.
I am sort of cheating here by slipping in the env vars to the process, but in looking over the process name stuff and testing it, I found that we had some simple very related cleanup that was necessary.
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Attachment #669681 - Flags: review?(jhammel)
Comment on attachment 669681 [details] [diff] [review]
allow a proper process name to be managed in talos and pass env vars through (1.0)

TBH, I'm not sure if I'm qualified to review this, but looks fine to me
Attachment #669681 - Flags: review?(jhammel) → review+
http://hg.mozilla.org/build/talos/rev/d15e777b8a82
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Depends on: 801633
You need to log in before you can comment on or make changes to this bug.