Tegras which ran on https://hg.mozilla.org/try/rev/cd5bc37f35ac have a stray fennec process still running ("talosError: 'Found processes still running: [NNNN] org.mozilla.fennec. Please close them before running talos.'")

RESOLVED FIXED

Status

Infrastructure & Operations
CIDuty
RESOLVED FIXED
6 years ago
a month ago

People

(Reporter: philor, Unassigned)

Tracking

({intermittent-failure})

Details

(Whiteboard: [buildduty])

Attachments

(1 attachment)

(Reporter)

Description

6 years ago
Apparently talos checks for running fennec processes more thoroughly than the end-of-run process killing does. After https://hg.mozilla.org/try/rev/cd5bc37f35ac ran tests on a build with some sort of odd branding that produced an unexpected process name, every tegra involved in that run has been failing talos runs on other trees, such as https://tbpl.mozilla.org/php/getParsedLog.php?id=18558678&tree=Firefox, with "talosError: 'Found processes still running: [1574] org.mozilla.fennec. Please close them before running talos.'"
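To make the mismatch concrete, here is a minimal sketch. It is not the actual talos or sut_tools code. It assumes that the pre-run check flags any process whose name starts with the app name, while the end-of-run cleanup kills only exact name matches. The function names `find_stray` and `cleanup_exact` and the sample process table are hypothetical.

```python
# Hypothetical sketch: a loose "is anything still running?" check versus a
# strict exact-match cleanup. Not the real talos or sut_tools implementation.

def find_stray(processes, app_name):
    """Loose check: report any (pid, name) whose name starts with app_name."""
    return [(pid, name) for pid, name in processes if name.startswith(app_name)]


def cleanup_exact(processes, app_names):
    """Strict cleanup: 'kill' only names that exactly equal an entry in
    app_names, and return the processes that are left running."""
    return [(pid, name) for pid, name in processes if name not in app_names]


# Sample process table; the second entry stands in for a build whose
# branding produced an unexpected package name.
processes = [(1200, "org.mozilla.fennec"), (1574, "org.mozilla.fennec_")]
remaining = cleanup_exact(processes, ["org.mozilla.fennec"])
print(find_stray(remaining, "org.mozilla.fennec"))
# prints [(1574, 'org.mozilla.fennec_')]
```

The exact-match cleanup leaves the oddly named process alive. The prefix-based check on the next run still finds that process and aborts the job.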

Updated

6 years ago
Whiteboard: [buildduty]
(Reporter)

Updated

6 years ago
Keywords: intermittent-failure

Comment 1

6 years ago
I've gracefully shut down tegra-038 and tegra-189 and will restart the clientproxy process on each when they are idle.
(Reporter)

Comment 2

6 years ago
tegra-267
tegra-120
tegra-054
tegra-173
tegra-129
tegra-233
tegra-053
(Reporter)

Comment 3

6 years ago
tegra-271
tegra-098
tegra-037
tegra-076
tegra-357
tegra-139
(Reporter)

Comment 4

6 years ago
tegra-266
tegra-073
(Reporter)

Comment 5

6 years ago
tegra-313
(Reporter)

Comment 6

6 years ago
tegra-223
tegra-105
(Reporter)

Comment 7

6 years ago
tegra-215
tegra-168
tegra-114

Comment 8

6 years ago
tegra-073
tegra-166

Updated

6 years ago
Summary: Tegras which ran on https://hg.mozilla.org/try/rev/cd5bc37f35ac have a stray fennec process still running → Tegras which ran on https://hg.mozilla.org/try/rev/cd5bc37f35ac have a stray fennec process still running ("talosError: 'Found processes still running: [NNNN] org.mozilla.fennec. Please close them before running talos.'")

Comment 9

6 years ago
tegra-313
tegra-357
tegra-310
Also interesting: tegra-166 has |package:org.mozilla.catlee_firefox| installed as well, which is odd.
Created attachment 699046
[tools] solve it with code

ben, you should be able to run this manually at your discretion (before we even deploy this) from any foopy with:

for i in 123 456 789; do echo CLEANING tegra-$i; SUT_NAME=tegra-$i python cleanup.py; done

from a sut_tools directory containing a cleanup.py with this patch applied.

That said, if I get review today I'll deploy this today, so that these can self-correct.  (Unsure what to do about the catlee_firefox case I saw on tegra-166)
Attachment #699046 - Flags: review?(jmaher)
Attachment #699046 - Flags: feedback?(bhearsum)
For why that patch is correct and useful, see:
http://mxr.mozilla.org/mozilla-central/source/mobile/android/branding/unofficial/configure.sh#5

which makes the package name different for unofficial builds.
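Here is a minimal sketch of that naming rule. It assumes, without verifying it here, that the unofficial branding's configure.sh builds the Android package name as `org.mozilla.fennec_` followed by the building user's name. `unofficial_package_name` is a hypothetical helper and does not come from the tree.

```python
import os


def unofficial_package_name(user=None):
    """Hypothetical mirror of an 'org.mozilla.fennec_$USER' style rule.

    Falls back to the USER environment variable (or "") when no user is given.
    """
    if user is None:
        user = os.environ.get("USER", "")
    return "org.mozilla.fennec_" + user


print(unofficial_package_name("kats"))  # prints org.mozilla.fennec_kats
print(unofficial_package_name(""))      # prints org.mozilla.fennec_
```

Under this assumption, a cleanup that only knows the exact name `org.mozilla.fennec` misses every unofficial build. An empty user name yields the bare `org.mozilla.fennec_` prefix.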
tegra-159
tegra-192
Could we also add some cleanup to where we check for running processes on the next run?
(In reply to Ed Morley [:edmorley UTC+0] from comment #14)
> Could we also add some cleanup to where we check for running processes on
> the next run?

Oh, of course: cleanup.py runs at both the start and the end of each run, so the attached patch already does this :-)
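Running cleanup at both ends means a process left behind by one job is removed before the next job's talos check. The sketch below shows that ordering. `run_with_cleanup` is a hypothetical wrapper for illustration. The real jobs run cleanup.py as separate buildbot steps, not through a function like this.

```python
# Hypothetical sketch: cleanup runs before the job and again after it,
# even if the job raises. Not a sut_tools or buildbot API.

def run_with_cleanup(cleanup, job):
    cleanup()          # pre-run: clears anything a previous job left behind
    try:
        return job()
    finally:
        cleanup()      # post-run: runs whether or not the job succeeded


calls = []
result = run_with_cleanup(lambda: calls.append("cleanup"),
                          lambda: calls.append("job") or "done")
print(calls, result)  # prints ['cleanup', 'job', 'cleanup'] done
```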
tegra-232
tegra-357
tegra-173
tegra-105
tegra-253
tegra-101
tegra-282
kats, not sure if anyone mentioned this on IRC to you, but the branding changes you made in https://hg.mozilla.org/try/rev/cd5bc37f35ac and the other similar pushes broke cleanup on the tegras, so has been burning later jobs on them.

Until the patch lands here, please don't send any more branding changes like that to Try :-)
Can somebody explain the problem we have here? I really don't understand what happened. I see that org.mozilla.fennec is still running, and we already expect to clean that up. What was the process name that didn't get cleaned up? What was the process name that we thought we were cleaning up?
Comment on attachment 699046
[tools] solve it with code

Review of attachment 699046:
-----------------------------------------------------------------

::: sut_tools/cleanup.py
@@ +53,3 @@
>          for proc in processNames:
> +            if package_basename == "%s" % proc or \
> +               package_basename.startswith("%s_" % proc):

I am not sure what this solves. Are you looking for org.mozilla.fennec_aurora | org.mozilla.fennec_XXXXXX? I am not sure this really solves the problem, since the bug description says org.mozilla.fennec is running.
Attachment #699046 - Flags: review?(jmaher) → review-
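To illustrate what the reviewed condition accepts, here is a standalone sketch that reimplements only the two-line check from the hunk above. `matches_process` and the sample names are hypothetical, and the surrounding cleanup.py loop is not reproduced. For string names, `"%s" % proc` in the patch is equivalent to `proc`, so the sketch drops the formatting.

```python
def matches_process(package_basename, proc):
    """True for an exact match, or for proc followed by an underscore suffix."""
    return package_basename == proc or package_basename.startswith(proc + "_")


for name in ["org.mozilla.fennec",          # exact match
             "org.mozilla.fennec_aurora",   # underscore suffix
             "org.mozilla.fennec_",         # bare underscore suffix
             "org.mozilla.fennecfoo",       # no underscore: not matched
             "org.mozilla.catlee_firefox"]: # different prefix: not matched
    print(name, matches_process(name, "org.mozilla.fennec"))
```

This matches both the `fennec_aurora` and `fennec_XXXXXX` forms. The underscore requirement keeps unrelated names that merely share the prefix from being killed. The catlee_firefox package seen on tegra-166 is still not matched.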
tegra-165
(In reply to Ed Morley [:edmorley UTC+0] from comment #20)
> kats, not sure if anyone mentioned this on IRC to you, but the branding
> changes you made in https://hg.mozilla.org/try/rev/cd5bc37f35ac and the
> other similar pushes broke cleanup on the tegras, so has been burning later
> jobs on them.
> 
> Until the patch lands here, please don't send any more branding changes like
> that to Try :-)

Yeah, philor told me what happened. Sorry about that, I was just trying to run an aurora tree on try :(
It's ok, totally not something you should have foreseen :-)
Comment on attachment 699046
[tools] solve it with code

Review of attachment 699046:
-----------------------------------------------------------------

Changing to r+. Callek informed me that the process was 'org.mozilla.fennec_' and that we never got the username. This would not have been caught no matter what, and this patch is a pretty good safety solution.
Attachment #699046 - Flags: review- → review+
Please can we land this as soon as possible?
It's burning dozens of jobs an hour :-(
Comment on attachment 699046
[tools] solve it with code

Review of attachment 699046:
-----------------------------------------------------------------

http://hg.mozilla.org/build/tools/rev/d9d0bfb239a8
Attachment #699046 - Flags: feedback?(bhearsum) → checked-in+
(In reply to Justin Wood (:Callek) from comment #28)
> http://hg.mozilla.org/build/tools/rev/d9d0bfb239a8

And all foopies updated from 6216e233938b to d9d0bfb239a8

Leaving open for final verification that this did, in fact, help.
Thank you :-)
I'll let you know if I see any more in jobs started from now on.
What's the actionable item here?
(Reporter)

Updated

6 years ago
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
(Assignee)

Updated

5 years ago
Product: mozilla.org → Release Engineering

Updated

a month ago
Product: Release Engineering → Infrastructure & Operations