Closed Bug 741503 Opened 13 years ago Closed 10 years ago

adjust clientproxy to halt if halt.flg found

Categories

(Release Engineering :: General, defect, P3)

ARM
Android
defect

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: bear, Unassigned)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2840] )

Attachments

(2 files, 2 obsolete files)

We have been needing a way to get clientproxy to stop, since buildbot doesn't have an easy way to "notice" when a graceful shutdown has happened (to clientproxy it looks like a normal buildbot shutdown). With this patch, just before clientproxy would start buildbot, it looks for halt.flg and exits if the file is found.
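A minimal sketch of the check described above, not the attached patch itself. It assumes halt.flg lives in the tegra's buildbot directory (bbpath) and that exiting with status 1 is acceptable; the helper names `halt_requested` and `start_buildbot_unless_halted` are hypothetical and do not come from clientproxy.py.

```python
import os
import sys

HALT_FILENAME = "halt.flg"  # file name from this bug; its location is an assumption


def halt_requested(bbpath):
    """Return True if a halt flag file exists in the given directory."""
    return os.path.isfile(os.path.join(bbpath, HALT_FILENAME))


def start_buildbot_unless_halted(bbpath, start_buildbot):
    """Call start_buildbot() unless halt.flg is present; otherwise exit.

    start_buildbot is a caller-supplied callable (hypothetical stand-in
    for whatever clientproxy does to launch buildbot).
    """
    if halt_requested(bbpath):
        # Exit status 1 is an assumption for this sketch.
        sys.exit(1)
    start_buildbot()
```

Checking immediately before the start call keeps the flag from interfering with a buildbot that is already running; only the next start attempt is blocked.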
Assignee: nobody → bear
Whiteboard: [android][tegra][sut_tools]
Attachment #611551 - Flags: review?(bugspam.Callek)
Attached patch alternate implementation (obsolete) — Splinter Review
Attachment #611556 - Flags: review?(bear)
Attachment #611551 - Flags: review?(bugspam.Callek) → review-
Comment on attachment 611556 [details] [diff] [review]
alternate implementation

callek +1 - can you merge my patch and yours into a single
Attachment #611556 - Flags: review?(bear) → review+
Attachment #611551 - Attachment is obsolete: true
Attachment #611556 - Attachment is obsolete: true
Attachment #611574 - Flags: review?(bugspam.Callek)
Comment on attachment 611574 [details] [diff] [review]
make clientproxy look for halt.flg just before starting buildbot (merged)

>diff --git a/sut_tools/clientproxy.py b/sut_tools/clientproxy.py
> elif state == 'start':
>- if tegraActive and not bbActive:
>+ if tegraActive and not bbActive not os.path.isfile(haltFile):

r+ assuming you fix the typo here (missing 'and')

Also I don't feel like this is necessary with the above sys.exit(1) for when bbActive is False but I don't mind being explicit.
Attachment #611574 - Flags: review?(bugspam.Callek) → review+
Comment on attachment 611574 [details] [diff] [review]
make clientproxy look for halt.flg just before starting buildbot (merged)

Testing showed problems with this, I'll articulate tomorrow, and get a followup patch up.
Attachment #611574 - Flags: review+ → review-
Attached patch WIP [deadlocks]Splinter Review
Soooooo I can't find a simple solution to the issue I'm encountering here; whatever I do with this, we end up deadlocking cp. f? simply in case either of you have any simple ideas.

Tested with setting halt.flg before starting cp, setting halt.flg only after we started running, adding numerous log.debug() calls, etc.

I think if we want to do this we'll need more eyes on it, and I think it's best for me to defer further work on this to later, since halt.flg is a convenience.

Attached is where I got to so far, log follows.

-------
2012-04-09 16:25:16,128 INFO MainProcess: debug level is on
2012-04-09 16:25:16,138 INFO MainProcess: 30525: ourIP 10.250.48.199 tegra tegra-011 tegraIP 10.250.51.248 bbpath /builds/tegra-011
2012-04-09 16:25:16,142 INFO MainProcess: process shutting down
2012-04-09 16:25:16,144 DEBUG MainProcess: running all "atexit" finalizers with priority >= 0
2012-04-09 16:25:16,144 DEBUG MainProcess: running the remaining "atexit" finalizers
2012-04-09 16:25:16,145 INFO MainProcess: process shutting down
2012-04-09 16:25:16,147 DEBUG MainProcess: running all "atexit" finalizers with priority >= 0
2012-04-09 16:25:16,147 DEBUG MainProcess: running the remaining "atexit" finalizers
2012-04-09 16:25:16,155 INFO MainProcess: monitoring started (process pid 30528)
2012-04-09 16:25:16,162 DEBUG MainProcess: EVENT: None
2012-04-09 16:25:16,167 DEBUG dialback: Queue._after_fork()
2012-04-09 16:25:16,169 INFO dialback: child process calling self.run()
2012-04-09 16:25:16,170 INFO dialback: Binding to 0.0.0.0 port 42011
2012-04-09 16:25:16,181 DEBUG MainProcess: socket data [20120409-16:25:07 trace output]
2012-04-09 16:25:16,181 INFO MainProcess: heartbeat detected
2012-04-09 16:25:16,182 DEBUG MainProcess: Queue._start_thread()
2012-04-09 16:25:16,183 DEBUG MainProcess: doing self._thread.start()
2012-04-09 16:25:16,189 DEBUG MainProcess: starting thread to feed data to pipe
2012-04-09 16:25:16,190 DEBUG MainProcess: ... done self._thread.start()
2012-04-09 16:25:16,190 DEBUG MainProcess: bbActive False tegraActive False
2012-04-09 16:25:16,191 DEBUG MainProcess: EVENT: None
2012-04-09 16:26:16,284 DEBUG MainProcess: socket data [20120409-16:26:07 Thump thump - 00:26:e8:d]:3f:54
2012-04-09 16:26:16,284 INFO MainProcess: heartbeat detected
2012-04-09 16:26:16,284 DEBUG MainProcess: bbActive False tegraActive False
2012-04-09 16:26:16,285 DEBUG MainProcess: EVENT: active
2012-04-09 16:26:16,285 DEBUG MainProcess: event active hbFails 0 / 50
2012-04-09 16:26:16,286 DEBUG MainProcess: bbActive False tegraActive True
2012-04-09 16:26:16,286 DEBUG MainProcess: EVENT: active
2012-04-09 16:26:16,286 DEBUG MainProcess: event active hbFails 0 / 50
2012-04-09 16:26:16,287 DEBUG MainProcess: bbActive False tegraActive True
2012-04-09 16:26:16,287 DEBUG MainProcess: EVENT: verify
2012-04-09 16:26:16,287 DEBUG MainProcess: event verify hbFails 0 / 50
2012-04-09 16:26:16,287 INFO MainProcess: Running verify code
2012-04-09 16:26:50,578 WARNING MainProcess: INFO: attempting to ping tegra
2012-04-09 16:26:50,578 WARNING MainProcess: INFO: updateSUT.py: Connecting to: tegra-011
2012-04-09 16:26:50,579 WARNING MainProcess: reconnecting socket
2012-04-09 16:26:50,579 WARNING MainProcess: INFO: updateSUT.py: We're running SUTAgentAndroid Version 1.07
2012-04-09 16:26:50,579 WARNING MainProcess: INFO: Got expected SUTAgent version '1.07'
2012-04-09 16:26:50,579 WARNING MainProcess: INFO: attempting to create file /mnt/sdcard/writetest
2012-04-09 16:26:50,579 WARNING MainProcess: removing file: /mnt/sdcard/writetest
2012-04-09 16:26:50,580 WARNING MainProcess: devroot /mnt/sdcard/tests
2012-04-09 16:26:50,580 WARNING MainProcess: removeDir() returned [Deleting file(s) from /mnt/sdcard/tests
2012-04-09 16:26:50,580 WARNING MainProcess: Deleting file(s) from /mnt/sdcard/tests/profile
2012-04-09 16:26:50,580 WARNING MainProcess: Deleting file(s) from /mnt/sdcard/tests/profile/startupCache
2012-04-09 16:26:50,580 WARNING MainProcess: Unable to delete â'â?¿A§i\â¢i!.v↑
2012-04-09 16:26:50,580 WARNING MainProcess: Unable to delete directory /mnt/sdcard/tests/profile/startupCache
2012-04-09 16:26:50,581 WARNING MainProcess: Unable to delete directory /mnt/sdcard/tests/profile
2012-04-09 16:26:50,581 WARNING MainProcess: Unable to delete directory /mnt/sdcard/tests
2012-04-09 16:26:50,581 WARNING MainProcess: $> ]
2012-04-09 16:26:50,581 WARNING MainProcess: /builds/tegra-011/error.flg
2012-04-09 16:26:50,581 WARNING MainProcess: Remote Device Error: Unable to properly remove /mnt/sdcard/tests
2012-04-09 16:26:50,581 WARNING MainProcess: verify.py returned with errors
2012-04-09 16:26:50,582 DEBUG MainProcess: bbActive False tegraActive True
2012-04-09 16:26:50,582 INFO MainProcess: HALT requested, stopping clientproxy
2012-04-09 16:26:50,582 DEBUG MainProcess: MONITOR-PID: 30528
2012-04-09 16:26:50,582 INFO MainProcess: monitor stopped
2012-04-09 16:26:50,584 INFO MainProcess: process shutting down
2012-04-09 16:26:50,585 DEBUG MainProcess: running all "atexit" finalizers with priority >= 0
2012-04-09 16:26:50,585 DEBUG MainProcess: telling queue thread to quit
2012-04-09 16:26:50,585 DEBUG MainProcess: feeder thread got sentinel -- exiting
2012-04-09 16:26:50,585 INFO MainProcess: calling join() for process dialback

--
We deadlock here, and interestingly the pid we have at the start of __main__ is not what ps shows:

foopy05:~ cltbld$ ps auxwwww | grep tegra-011 | grep clientproxy.py
cltbld 30529 0.0 0.1 2446748 2820 ?? S 4:25PM 0:00.04 /opt/local/Library/Frameworks/Python.framework/Versions/2.6/Resources/Python.app/Contents/MacOS/Python clientproxy.py -b --tegra=tegra-011 --debug
cltbld 30528 0.0 0.1 2455964 3600 ?? S 4:25PM 0:00.03 /opt/local/Library/Frameworks/Python.framework/Versions/2.6/Resources/Python.app/Contents/MacOS/Python clientproxy.py -b --tegra=tegra-011 --debug
Attachment #613430 - Flags: feedback?(bear)
Attachment #613430 - Flags: feedback?(armenzg)
I am trying to match the end of your output to the code to understand this better.

From your output it seems that we get "monitor stopped" when the monitorEvents function finishes, but shutdownDialback does not get called at exit (I'm trying to write out my understanding explicitly).

It seems that the remaining output comes from calling "_exit_function()":
pydoc.net/multiprocessing/2.6.2.1/multiprocessing.util
> for p in active_children():
>     info('calling join() for process %s', p.name)
>     p.join()

Did you have two processes at the beginning?

NOTE: As much as I would like to help you with this, I think I should not, as I have a countdown to my trip next week and other work. I wanted to look at it so at least I can follow where you guys are heading.

###### reference code
> db = Process(name='dialback', target=handleDialback, args=(sutDialbackPort, eventQueue))
>
> signal.signal(signal.SIGTERM, handleSigTERM)
> atexit.register(shutdownDialback, db)
>
> db.start()
> monitorEvents(options, eventQueue)

(In reply to Justin Wood (:Callek) from comment #7)
> 2012-04-09 16:26:50,582 INFO MainProcess: monitor stopped
> 2012-04-09 16:26:50,584 INFO MainProcess: process shutting down
> 2012-04-09 16:26:50,585 DEBUG MainProcess: running all "atexit" finalizers
> with priority >= 0
> 2012-04-09 16:26:50,585 DEBUG MainProcess: telling queue thread to quit
> 2012-04-09 16:26:50,585 DEBUG MainProcess: feeder thread got sentinel --
> exiting
> 2012-04-09 16:26:50,585 INFO MainProcess: calling join() for process
> dialback
>
> --
> We deadlock here, and interesting the pid we have at the start of __main__
> is not what ps shows:
>
> foopy05:~ cltbld$ ps auxwwww | grep tegra-011 | grep clientproxy.py
> cltbld 30529 0.0 0.1 2446748 2820 ?? S 4:25PM 0:00.04
> /opt/local/Library/Frameworks/Python.framework/Versions/2.6/Resources/Python.
> app/Contents/MacOS/Python clientproxy.py -b --tegra=tegra-011 --debug
> cltbld 30528 0.0 0.1 2455964 3600 ?? S 4:25PM 0:00.03
> /opt/local/Library/Frameworks/Python.framework/Versions/2.6/Resources/Python.
> app/Contents/MacOS/Python clientproxy.py -b --tegra=tegra-011 --debug
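For illustration only: the hang at "calling join() for process dialback" is consistent with multiprocessing's exit handler joining a child that never returns, though that cause was not confirmed in this bug. A minimal sketch of one way to avoid such a wait is to stop the child explicitly before the parent exits. `serve_forever` and `stop_child` are hypothetical names standing in for handleDialback and shutdownDialback, whose real bodies are not shown here.

```python
import multiprocessing
import time


def serve_forever():
    # Hypothetical stand-in for the dialback listener: loops until killed.
    while True:
        time.sleep(0.1)


def stop_child(proc, timeout=5.0):
    """Terminate proc if it is still alive, then join with a timeout.

    Returns True if the process is no longer alive afterwards.
    """
    if proc.is_alive():
        proc.terminate()
    proc.join(timeout)
    return not proc.is_alive()


if __name__ == "__main__":
    db = multiprocessing.Process(name="dialback", target=serve_forever)
    db.start()
    # Stop the child before normal interpreter exit, so the exit-time
    # join() on active children has nothing left to wait for.
    print("dialback stopped:", stop_child(db))
```

Note that the multiprocessing documentation warns that terminating a process which is using a queue or pipe can leave that queue or pipe unusable, so this only suits the case where the parent is about to exit anyway.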
Attachment #613430 - Flags: feedback?(bear)
Attachment #613430 - Flags: feedback?(armenzg)
Attachment #613430 - Flags: feedback+
Bear's not working on this, and I have other things on my plate before I would prioritize it high enough to take, but it is a nice-to-have. [though last we tried, had probs]
Assignee: bear → nobody
Priority: -- → P3
resolving as INCOMPLETE because these bugs are most likely not even being worked on or known about anymore
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → INCOMPLETE
Whiteboard: [android][tegra][sut_tools]
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Product: mozilla.org → Release Engineering
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2832]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2832] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2839]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2839] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2840]
Probably won't be fixing this because buildapi is going to be deprecated after all scheduling moves to Taskcluster.
Flags: needinfo?(bugspam.Callek)
(In reply to Ben Hearsum [:bhearsum] from comment #12)
> Probably won't be fixing this because buildapi is going to be deprecated
> after all scheduling moves to Taskcluster.

Oops, meant to ask if this is still an issue!
clientproxy itself is dead
Status: REOPENED → RESOLVED
Closed: 13 years ago → 10 years ago
Flags: needinfo?(bugspam.Callek)
Resolution: --- → WONTFIX
Component: General Automation → General