adjust clientproxy to halt if halt.flg found

RESOLVED WONTFIX

Status

P3
normal
RESOLVED WONTFIX
7 years ago
7 months ago

People

(Reporter: bear, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2840] )

Attachments

(2 attachments, 2 obsolete attachments)

(Reporter)

Description

7 years ago
We have been needing a way to get clientproxy to stop, since buildbot doesn't have an easy way to "notice" when a graceful shutdown has happened (to clientproxy it looks like a normal buildbot shutdown).

So, with this patch, just before clientproxy would start buildbot, it looks for halt.flg and exits if the file is found.
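
A minimal sketch of the check described above, not the code from the attached patch. It assumes the flag file sits in the buildbot directory (the `bbpath` shown in the logs); the names `halt_requested`, `next_action` and `demo` are hypothetical and only serve the illustration.

```python
import os
import tempfile


def halt_requested(bbpath, flag_name="halt.flg"):
    """Return True if the halt flag file exists under bbpath (assumed location)."""
    return os.path.isfile(os.path.join(bbpath, flag_name))


def next_action(bbpath):
    """Decide what to do at the point where buildbot would be started."""
    if halt_requested(bbpath):
        return "exit"            # operator asked clientproxy to stop
    return "start-buildbot"      # no flag present, carry on as usual


def demo():
    """Show the decision before and after the flag file is created."""
    with tempfile.TemporaryDirectory() as bbpath:
        before = next_action(bbpath)
        # Create an empty halt.flg, the way an operator might with `touch`.
        open(os.path.join(bbpath, "halt.flg"), "w").close()
        after = next_action(bbpath)
    return before, after
```

Checking the flag only at the start-buildbot step means a running job is never interrupted; the flag takes effect the next time clientproxy would launch buildbot.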
(Reporter)

Updated

7 years ago
Assignee: nobody → bear
Whiteboard: [android][tegra][sut_tools]
(Reporter)

Comment 1

7 years ago
Created attachment 611551 [details] [diff] [review]
make clientproxy look for halt.flg just before starting buildbot
Attachment #611551 - Flags: review?(bugspam.Callek)
Created attachment 611556 [details] [diff] [review]
alternate implementation
Attachment #611556 - Flags: review?(bear)
(Reporter)

Updated

7 years ago
Attachment #611551 - Flags: review?(bugspam.Callek) → review-
(Reporter)

Comment 3

7 years ago
Comment on attachment 611556 [details] [diff] [review]
alternate implementation

callek +1 - can you merge my patch and yours into a single patch?
Attachment #611556 - Flags: review?(bear) → review+
(Reporter)

Comment 4

7 years ago
Created attachment 611574 [details] [diff] [review]
make clientproxy look for halt.flg just before starting buildbot (merged)
Attachment #611551 - Attachment is obsolete: true
Attachment #611556 - Attachment is obsolete: true
(Reporter)

Updated

7 years ago
Attachment #611574 - Flags: review?(bugspam.Callek)
Comment on attachment 611574 [details] [diff] [review]
make clientproxy look for halt.flg just before starting buildbot (merged)

>diff --git a/sut_tools/clientproxy.py b/sut_tools/clientproxy.py
>             elif state == 'start':
>-                if tegraActive and not bbActive:
>+                if tegraActive and not bbActive not os.path.isfile(haltFile):

r+ assuming you fix the typo here (missing 'and')

Also I don't feel like this is necessary with the above sys.exit(1) for when bbActive is False but I don't mind being explicit.
Attachment #611574 - Flags: review?(bugspam.Callek) → review+
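
The typo called out in the review is more than cosmetic: without the `and`, the condition is not valid Python at all. The sketch below checks this with the built-in `compile()`. The variable names come from the quoted diff; the helper `parses` and the surrounding assignment form are assumptions made for the illustration.

```python
# The condition as it appears in the patch, and with the missing 'and' added.
BROKEN = "ok = tegraActive and not bbActive not os.path.isfile(haltFile)"
FIXED = "ok = tegraActive and not bbActive and not os.path.isfile(haltFile)"


def parses(source):
    """Return True if the source compiles, False on a SyntaxError."""
    try:
        # compile() only parses; the names do not need to exist yet.
        compile(source, "<patch>", "exec")
        return True
    except SyntaxError:
        return False


broken_ok = parses(BROKEN)
fixed_ok = parses(FIXED)
```

After `bbActive`, the parser treats the second `not` as the start of a `not in` comparison and fails when `in` does not follow, so the module would not even import with the line as posted.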
Comment on attachment 611574 [details] [diff] [review]
make clientproxy look for halt.flg just before starting buildbot (merged)

Testing showed problems with this. I'll articulate tomorrow and get a follow-up patch up.
Attachment #611574 - Flags: review+ → review-
Created attachment 613430 [details] [diff] [review]
WIP [deadlocks]

Soooooo, I can't find a simple solution to the issue I'm encountering here; it looks like whatever I do with this, we deadlock cp.

f? simply in case either of you has any simple ideas.

Tested with setting halt.flg before starting cp, setting halt.flg only after we started running, adding numerous log.debug() calls, etc. I think if we want to do this we'll need more eyes on it, and I think it's best for me to defer further work on this until later, since halt.flg is a convenience.

Attached is where I got to so far, log follows.

------- 
2012-04-09 16:25:16,128 INFO    MainProcess: debug level is on
2012-04-09 16:25:16,138 INFO    MainProcess: 30525: ourIP 10.250.48.199 tegra tegra-011 tegraIP 10.250.51.248 bbpath /builds/tegra-011
2012-04-09 16:25:16,142 INFO    MainProcess: process shutting down
2012-04-09 16:25:16,144 DEBUG   MainProcess: running all "atexit" finalizers with priority >= 0
2012-04-09 16:25:16,144 DEBUG   MainProcess: running the remaining "atexit" finalizers
2012-04-09 16:25:16,145 INFO    MainProcess: process shutting down
2012-04-09 16:25:16,147 DEBUG   MainProcess: running all "atexit" finalizers with priority >= 0
2012-04-09 16:25:16,147 DEBUG   MainProcess: running the remaining "atexit" finalizers
2012-04-09 16:25:16,155 INFO    MainProcess: monitoring started (process pid 30528)
2012-04-09 16:25:16,162 DEBUG   MainProcess: EVENT: None
2012-04-09 16:25:16,167 DEBUG   dialback: Queue._after_fork()
2012-04-09 16:25:16,169 INFO    dialback: child process calling self.run()
2012-04-09 16:25:16,170 INFO    dialback: Binding to 0.0.0.0 port 42011
2012-04-09 16:25:16,181 DEBUG   MainProcess: socket data [20120409-16:25:07 trace output]
2012-04-09 16:25:16,181 INFO    MainProcess: heartbeat detected
2012-04-09 16:25:16,182 DEBUG   MainProcess: Queue._start_thread()
2012-04-09 16:25:16,183 DEBUG   MainProcess: doing self._thread.start()
2012-04-09 16:25:16,189 DEBUG   MainProcess: starting thread to feed data to pipe
2012-04-09 16:25:16,190 DEBUG   MainProcess: ... done self._thread.start()
2012-04-09 16:25:16,190 DEBUG   MainProcess: bbActive False tegraActive False
2012-04-09 16:25:16,191 DEBUG   MainProcess: EVENT: None
2012-04-09 16:26:16,284 DEBUG   MainProcess: socket data [20120409-16:26:07 Thump thump - 00:26:e8:d]:3f:54
2012-04-09 16:26:16,284 INFO    MainProcess: heartbeat detected
2012-04-09 16:26:16,284 DEBUG   MainProcess: bbActive False tegraActive False
2012-04-09 16:26:16,285 DEBUG   MainProcess: EVENT: active
2012-04-09 16:26:16,285 DEBUG   MainProcess: event active hbFails 0 / 50
2012-04-09 16:26:16,286 DEBUG   MainProcess: bbActive False tegraActive True
2012-04-09 16:26:16,286 DEBUG   MainProcess: EVENT: active
2012-04-09 16:26:16,286 DEBUG   MainProcess: event active hbFails 0 / 50
2012-04-09 16:26:16,287 DEBUG   MainProcess: bbActive False tegraActive True
2012-04-09 16:26:16,287 DEBUG   MainProcess: EVENT: verify
2012-04-09 16:26:16,287 DEBUG   MainProcess: event verify hbFails 0 / 50
2012-04-09 16:26:16,287 INFO    MainProcess: Running verify code
2012-04-09 16:26:50,578 WARNING MainProcess: INFO: attempting to ping tegra
2012-04-09 16:26:50,578 WARNING MainProcess: INFO: updateSUT.py: Connecting to: tegra-011
2012-04-09 16:26:50,579 WARNING MainProcess: reconnecting socket
2012-04-09 16:26:50,579 WARNING MainProcess: INFO: updateSUT.py: We're running SUTAgentAndroid Version 1.07
2012-04-09 16:26:50,579 WARNING MainProcess: INFO: Got expected SUTAgent version '1.07'
2012-04-09 16:26:50,579 WARNING MainProcess: INFO: attempting to create file /mnt/sdcard/writetest
2012-04-09 16:26:50,579 WARNING MainProcess: removing file: /mnt/sdcard/writetest
2012-04-09 16:26:50,580 WARNING MainProcess: devroot /mnt/sdcard/tests
2012-04-09 16:26:50,580 WARNING MainProcess: removeDir() returned [Deleting file(s) from /mnt/sdcard/tests
2012-04-09 16:26:50,580 WARNING MainProcess: Deleting file(s) from /mnt/sdcard/tests/profile
2012-04-09 16:26:50,580 WARNING MainProcess: Deleting file(s) from /mnt/sdcard/tests/profile/startupCache
2012-04-09 16:26:50,580 WARNING MainProcess:    Unable to delete â'â?¿A§i\â¢i!.v↑
2012-04-09 16:26:50,580 WARNING MainProcess: Unable to delete directory /mnt/sdcard/tests/profile/startupCache
2012-04-09 16:26:50,581 WARNING MainProcess: Unable to delete directory /mnt/sdcard/tests/profile
2012-04-09 16:26:50,581 WARNING MainProcess: Unable to delete directory /mnt/sdcard/tests
2012-04-09 16:26:50,581 WARNING MainProcess: $> ]
2012-04-09 16:26:50,581 WARNING MainProcess: /builds/tegra-011/error.flg
2012-04-09 16:26:50,581 WARNING MainProcess: Remote Device Error: Unable to properly remove /mnt/sdcard/tests
2012-04-09 16:26:50,581 WARNING MainProcess: verify.py returned with errors
2012-04-09 16:26:50,582 DEBUG   MainProcess: bbActive False tegraActive True
2012-04-09 16:26:50,582 INFO    MainProcess: HALT requested, stopping clientproxy
2012-04-09 16:26:50,582 DEBUG   MainProcess: MONITOR-PID: 30528
2012-04-09 16:26:50,582 INFO    MainProcess: monitor stopped
2012-04-09 16:26:50,584 INFO    MainProcess: process shutting down
2012-04-09 16:26:50,585 DEBUG   MainProcess: running all "atexit" finalizers with priority >= 0
2012-04-09 16:26:50,585 DEBUG   MainProcess: telling queue thread to quit
2012-04-09 16:26:50,585 DEBUG   MainProcess: feeder thread got sentinel -- exiting
2012-04-09 16:26:50,585 INFO    MainProcess: calling join() for process dialback

-- 
We deadlock here, and interestingly the pid we have at the start of __main__ is not what ps shows:

foopy05:~ cltbld$ ps auxwwww | grep tegra-011 | grep clientproxy.py
cltbld   30529   0.0  0.1  2446748   2820   ??  S     4:25PM   0:00.04 /opt/local/Library/Frameworks/Python.framework/Versions/2.6/Resources/Python.app/Contents/MacOS/Python clientproxy.py -b --tegra=tegra-011 --debug
cltbld   30528   0.0  0.1  2455964   3600   ??  S     4:25PM   0:00.03 /opt/local/Library/Frameworks/Python.framework/Versions/2.6/Resources/Python.app/Contents/MacOS/Python clientproxy.py -b --tegra=tegra-011 --debug
Attachment #613430 - Flags: feedback?(bear)
Attachment #613430 - Flags: feedback?(armenzg)

Comment 8

7 years ago
I am trying to match the end of your output to code to try to understand this better.
From your output it seems that we get "monitor stopped" when the monitorEvents function finishes, but shutdownDialback does not get called at exit (I'm trying to write out my understanding explicitly).

It seems that the remaining output comes from calling "_exit_function()":
pydoc.net/multiprocessing/2.6.2.1/multiprocessing.util
>    for p in active_children():
>        info('calling join() for process %s', p.name)
>        p.join()

Did you have two processes at the beginning?

NOTE: As much as I would like to help you with this, I think I should not, as I have a countdown to my trip next week and other work. I wanted to look at it so that I can at least follow where you guys are heading.

###### reference code
>    db = Process(name='dialback', target=handleDialback, args=(sutDialbackPort, eventQueue))
>
>    signal.signal(signal.SIGTERM, handleSigTERM)
>    atexit.register(shutdownDialback, db) 
>
>    db.start()
>    monitorEvents(options, eventQueue)
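
One possible reading of the hang at "calling join() for process dialback", offered as a sketch rather than a diagnosis: the multiprocessing exit handler joins every non-daemon child, and a child that loops forever never lets that join() return. The example below assumes the POSIX-only "fork" start method (roughly what Python 2.6 did here) and uses hypothetical names (`serve_forever`, `shutdown_child`, `demo`); it is not the shutdownDialback from clientproxy.py.

```python
import multiprocessing
import time

# Assumption: "fork" start method, available on POSIX systems only.
_ctx = multiprocessing.get_context("fork")


def serve_forever():
    """Stand-in for handleDialback: a loop that never returns on its own."""
    while True:
        time.sleep(0.1)


def shutdown_child(proc, timeout=5.0):
    """Stop a child that will not exit by itself, then reap it."""
    if proc.is_alive():
        proc.terminate()   # sends SIGTERM so the join below can complete
    proc.join(timeout)     # bounded wait instead of an open-ended join()
    return not proc.is_alive()


def demo():
    child = _ctx.Process(name="dialback", target=serve_forever)
    child.start()
    # A bare child.join() at this point would block indefinitely, because
    # serve_forever() never returns.
    return shutdown_child(child)
```

If something like this is what happens in cp, then either terminating the child before the interpreter exits or marking it `daemon = True` before `start()` would avoid the open-ended join; which of the two fits clientproxy is not something this thread settles.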

(In reply to Justin Wood (:Callek) from comment #7)
> 2012-04-09 16:26:50,582 INFO    MainProcess: monitor stopped
> 2012-04-09 16:26:50,584 INFO    MainProcess: process shutting down
> 2012-04-09 16:26:50,585 DEBUG   MainProcess: running all "atexit" finalizers
> with priority >= 0
> 2012-04-09 16:26:50,585 DEBUG   MainProcess: telling queue thread to quit
> 2012-04-09 16:26:50,585 DEBUG   MainProcess: feeder thread got sentinel --
> exiting
> 2012-04-09 16:26:50,585 INFO    MainProcess: calling join() for process
> dialback
> 
> -- 
> We deadlock here, and interesting the pid we have at the start of __main__
> is not what ps shows:
> 
> foopy05:~ cltbld$ ps auxwwww | grep tegra-011 | grep clientproxy.py
> cltbld   30529   0.0  0.1  2446748   2820   ??  S     4:25PM   0:00.04
> /opt/local/Library/Frameworks/Python.framework/Versions/2.6/Resources/Python.
> app/Contents/MacOS/Python clientproxy.py -b --tegra=tegra-011 --debug
> cltbld   30528   0.0  0.1  2455964   3600   ??  S     4:25PM   0:00.03
> /opt/local/Library/Frameworks/Python.framework/Versions/2.6/Resources/Python.
> app/Contents/MacOS/Python clientproxy.py -b --tegra=tegra-011 --debug
(Reporter)

Updated

7 years ago
Attachment #613430 - Flags: feedback?(bear)
Attachment #613430 - Flags: feedback?(armenzg)
Attachment #613430 - Flags: feedback+
Bear's not working on this, and I have other things on my plate before I would prioritize it high enough to take, but it is a nice-to-have. [though last we tried, had probs]
Assignee: bear → nobody
Priority: -- → P3
(Reporter)

Comment 10

6 years ago
resolving as INCOMPLETE because these bugs are most likely not even being worked on or known about anymore
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → INCOMPLETE
Whiteboard: [android][tegra][sut_tools]
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Product: mozilla.org → Release Engineering

Updated

5 years ago
Duplicate of this bug: 941503

Updated

4 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2832]

Updated

4 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2832] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2839]

Updated

4 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2839] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2840]
Probably won't be fixing this because buildapi is going to be deprecated after all scheduling moves to Taskcluster.
Flags: needinfo?(bugspam.Callek)
(In reply to Ben Hearsum [:bhearsum] from comment #12)
> Probably won't be fixing this because buildapi is going to be deprecated
> after all scheduling moves to Taskcluster.

Oops, meant to ask if this is still an issue!
clientproxy itself is dead
Status: REOPENED → RESOLVED
Last Resolved: 6 years ago → 4 years ago
Flags: needinfo?(bugspam.Callek)
Resolution: --- → WONTFIX
(Assignee)

Updated

7 months ago
Component: General Automation → General
Product: Release Engineering → Release Engineering