Closed Bug 641475 Opened 13 years ago Closed 13 years ago

Android Tegras are all purple

Categories
(Release Engineering :: General, defect)

Platform: x86_64 Linux
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mfinkle, Assigned: jmaher)

References

Details

Attachments

(3 files, 6 obsolete files)

All the Android tegras used for talos and testing seem to be purple.
Assignee: nobody → aki
I've found some bugs in our SUT helper scripts.
Working on production systems is never ideal, but as our production systems are currently all broken, I'm hoping it's faster than iterating in staging.
This is what I've got running in production, which seems to have fixed it for now.

Notably:

1) I had failed to start the buildslaves with SUT_IP or SUT_NAME defined (and check.py -r didn't start the buildslave at all).  So I added support for figuring these values out, though the Talos remotePerfConfigurator step still needs SUT_IP.  We can probably just use the slave name property and get rid of these env vars.

2) I changed installApp.py to use uninstallAppAndReboot() + installApp(), because:

 a) updateApp() assumed that we would always reboot. I strongly suspect that if the apk is the exact same apk that is currently installed, nothing happens, and then the callback server waits for a ping that never comes and the tegra goes purple.

 b) I don't trust the callback server at all at this point (see reboot.py, below).  So I'm waiting til devRoot is back, then checking to see if Fennec is installed (actually I check this in 3 places, once before uninstalling, once after uninstalling, and once after installing).

3) reboot.py is broken. The only reason it works is that the dm.reboot() timeout is generally longer than the time it takes to reboot, and shorter than the time it takes for buildbot to error out.  It relies on the callback server but never gets a callback.  I manually telnetted to the tegra while reboot.py was still waiting and ran |info uptime|, which reported an uptime of <5min.  I'm going to change this to poll with |info uptime| instead, which will probably be more reliable (see the sketch after this list).

4) At this point I'm not entirely sure what we gain by having clientproxy.py and check.py since I didn't really see buildslaves being affected by these at all.  I'm thinking, going forward,

 a) Add a step at the very beginning of all remote tests that checks tegra connectivity, retrying for X amount of time, then kicks the job into RETRY mode and takes the buildslave offline.

 b) Edit check.py to actually start the buildslave if it's -r and it's not running

 c) Change the test factories and scripts to use the slavename rather than SUT_IP and SUT_NAME.  Also, the scripts should all be usable from the command line in a user-friendly way with the expected behavior.
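
Below is a minimal sketch of the polling I have in mind for 3).  It assumes the agent answers |info uptime| on its usual port (20701, the same one I telnetted to) once the tegra is back up; the helper name, prompt handling, and parsing are illustrative only, not what would land in sut_tools:

  import telnetlib
  import time

  def wait_for_reboot(tegra_ip, port=20701, step=30, timeout=600):
      """Poll the agent with |info uptime| until it answers again (sketch only)."""
      waited = 0
      while waited < timeout:
          try:
              tn = telnetlib.Telnet(tegra_ip, port, 10)
              tn.read_until('$>', 10)       # agent prompt; separator details glossed over
              tn.write('info uptime\n')
              reply = tn.read_until('$>', 10)
              tn.close()
              if reply:
                  # The agent is answering again; the caller can sanity-check the
                  # reported uptime (e.g. <5min means we really did reboot).
                  return reply
          except Exception:
              pass                          # not back yet, keep polling
          time.sleep(step)
          waited += step
      return None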

Any comments/questions/objections?
Attached patch devicemanager patch on foopy01 (obsolete) — Splinter Review
This is the existing devicemanager patch on foopy01.  I don't think it's all necessary or very clean, but it seems to be working atm.
http://tinderbox.mozilla.org/showbuilds.cgi?tree=Mobile looks a lot better.
The immediate workaround is in, though un-landed.

We (ateam+releng+anyone else who wants to) need to also discuss comment 2.

I'm going to start working out of user repos to get this stuff sorted out in a cleaner fashion.
Refactoring into a functions.py, and I keep wishing I had a central config.
Currently trying not to rewrite the sut factories in mozharness, but it may come to that.
Ok, now that things look like they're moving toward resolution, here's a summary of where things are as I understand them.

(In reply to comment #2)
> 1) I had failed to start the buildslaves with SUT_IP or SUT_NAME defined (and
> check.py -r didn't start the buildslave at all).  So I wrote in support for
> figuring this stuff out, though the Talos remotePerfConfigurator step still
> needs SUT_IP.  We can probably just use the slave name property and get rid of
> these env vars.

As Bear pointed out in IRC (though I argued rather loudly), I failed to start clientproxy properly... I did a

  cd builds
  for i in tegra-*; do
    echo $i
    cd $i
    python clientproxy.py -b --tegra=`pwd`
    cd ..
  done

when that should be --tegra=$i

However, we really need a way to start these guys up... I rebooted foopy01 to clear any old crufty processes and had to write this for loop while trying to debug breakage.

I've fixed up foopy01 (reboot, restart clientproxy correctly, verify that the buildslaves start automatically) since there was a lull in test runs.

>  b) I don't trust the callback server at all at this point (see reboot.py,
> below).

The callback server had a bug -- the global variable needed to be reset every run, but was only set at first run.  Clint has a patch for devicemanager coming, which should fix installApp.py and reboot.py without changes.

Some of my changes may still be wanted for some future version of the scripts (e.g. pulling functions out so they can be shared across scripts, guessing the tegra name so it's easier to call the scripts from the command line), but if we can get things working with a small delta, I'm perfectly happy to leave things the way they are for now.
(In reply to comment #7)
> However, we really need a way to start these guys up...

This may just mean having a start_cps.sh that does the above [corrected] for loop.
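
Something like the sketch below, which is just the corrected for loop wrapped in a script.  The /builds path and the assumption that every tegra-* directory on the foopy has its own clientproxy.py are guesses about the layout, not gospel:

  #!/bin/bash
  # start_cps.sh - (re)start clientproxy for every tegra directory on this foopy
  cd /builds || exit 1
  for i in tegra-*; do
    echo "$i"
    cd "$i"
    python clientproxy.py -b --tegra=$i
    cd ..
  done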

Currently foopy01 has the patched sut_tools, to avoid the devicemanager callback server.  Once we get the devicemanager patch, we can revert the sut_tools patch and apply the devicemanager patch and things should still work.
This patch contains all of aki's changes from attachment 519277 [details] [diff] [review].

It also includes the fix for the callbackserver: we didn't reset gCallbackData, so if we re-instantiated the callbackServer object we'd blow right through our wait loop, causing the device and devicemanager.py to get out of sync.

Also, while debugging this I found that the processExist code had not been updated to account for the env string being the first member of the string passed to processExist.  So I changed it to look for the application name in the second element instead (processExist is passed exactly the command line we send to the agent, so it has the form "<envstring> <app> <redirect>").

Running a burn in test now, using the same apk file and the same port number.  Will attach my test.
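
To illustrate the command-line shape described above (the values are made up; only the "<envstring> <app> <redirect>" layout matters):

  # Illustration only: roughly what processExist receives as its appname argument.
  cmdline = '"FOO=1,BAR=2" /data/data/org.mozilla.fennec/fennec args'
  pieces = cmdline.split(' ')
  # pieces[0] is the quoted env string, so the app path is pieces[1]:
  app = pieces[1].split('/')[-1]    # -> 'fennec'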
Attachment #519277 - Attachment is obsolete: true
Attachment #519566 - Flags: review?(jmaher)
Attached file The test
This is the test.  Needs a little cleanup to run in our test harness.
Comment on attachment 519566 [details] [diff] [review]
Patch with Aki's changes and a fix for the callback server


>     if len(parts[0].strip('"').split('=')) > 1:
>       appname = ' '.join(parts[1:])
>   
>     pieces = appname.split(' ')
>-    parts = pieces[0].split('/')
>+    # We need pieces[1] because pieces[0] is now the env string
>+    parts = pieces[1].split('/')
>     app = parts[-1]
>     procre = re.compile('.*' + app + '.*')

This pieces[0] has worked for thousands of tests; I think we are fixing the wrong thing.  I bet in the scenario where we have a "" for env, we end up with an extra ' ' at the front of the appname, which causes us to screw up on split(' ').  I propose something more like:
pieces = appname.strip().split(' ') <- assuming strip() removes the leading whitespace.
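
A quick illustration of that leading-space failure mode (made-up string; only the split behaviour matters):

  >>> ' org.mozilla.fennec args'.split(' ')
  ['', 'org.mozilla.fennec', 'args']
  >>> ' org.mozilla.fennec args'.strip().split(' ')
  ['org.mozilla.fennec', 'args']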


>+      else:
>+        print '.',
>       time.sleep(step)
>       t += step

All print statements should have an if self.debug clause.  This one should probably be a self.debug >= 2 condition.

>     try:
>       if (self.debug >= 3): print "Shutting down server now"
>       self.server.shutdown()
>     except:
>       print "Unable to shutdown callback server - check for a connection on port: " + str(self.port)
>     return gCallbackData

same here about the print and self.debug.

> 
>   class myhandler(SocketServer.BaseRequestHandler):
>     def handle(self):
>       global gCallbackData
>       gCallbackData = self.request.recv(1024)
>-      #print "Callback Handler got data: " + str(gCallbackData)
>+      print "Callback Handler got data: " + str(gCallbackData)
>       self.request.send("OK")

and here again.


r- only for the processExist stuff.  I will work on an updated patch here shortly.
Attachment #519566 - Flags: review?(jmaher) → review-
Yeah, the prints are leftover from my debugging.  We don't actually need them, but if we had the |print '.',| inside an |if self.debug > 3| that would help us make sure long waits don't hit buildbot timeouts.
(In reply to comment #7)
> Ok, now that things are looking like they're moving to resolution, a summary of
> where things are as I understand them.
> 
> (In reply to comment #2)
> > 1) I had failed to start the buildslaves with SUT_IP or SUT_NAME defined (and
> > check.py -r didn't start the buildslave at all).  So I wrote in support for
> > figuring this stuff out, though the Talos remotePerfConfigurator step still
> > needs SUT_IP.  We can probably just use the slave name property and get rid of
> > these env vars.
> 
> As Bear pointed out in IRC (though I argued rather loudly) I failed to do start
> clientproxy properly... I did a

no more loudly than I was, either

the startup was on my todo list because it's something that I don't expect a buildduty person to have to remember - I have a prototype ready that also does some other maintenance items, but I wasn't ready to test it yet.

> 
>   cd builds
>   for i in tegra-*; do
>     echo $i
>     cd $i
>     python clientproxy.py -b --tegra=`pwd`
>     cd ..
>   done
> 
> when that should be --tegra=$i
> 
> However, we really need a way to start these guys up... I rebooted foopy01 to
> clear any old crufty processes and had to write this for loop while trying to
> debug breakage.
> 
> I've fixed up foopy01 (reboot, restart clientproxy correctly, verify that the
> buildslaves start automatically) since there was a lull in test runs.
> 
> >  b) I don't trust the callback server at all at this point (see reboot.py,
> > below).
> 
> The callback server had a bug -- the global variable needed to be reset every
> run, but was only set at first run.  Clint has a patch for devicemanager
> coming, which should fix installApp.py and reboot.py without changes.
> 
> Some of my changes may still be wanted for some future version of the scripts
> (e.g. pulling functions out so they can be shared across scripts, guessing the
> tegra name so it's easier to call the scripts from manual command line) but if
> we can get to working with a small delta, I'm perfectly happy to leave things
> the way they are for now.

looking at your changes on the plane, I love the way they were heading - I was being very conservative in changing how sut_tools worked (for silly bear reasons, in retrospect).

I would love to continue with any change you started if the end goal is making things easier and simpler for buildduty.
Updated the patch to add some smarts to processExist.  Tested with mochitest and reftest, as well as some hand testing of a few bogus appname variables.
Attachment #519566 - Attachment is obsolete: true
Attachment #519587 - Flags: review?(ctalbert)
The unit tests on test-master01 are running but purple due to reboot.py disconnects.  Either clientproxy is killing buildbot or something else is up.
A fix may be changing the reboot step in the factory to use DisconnectStep.
Attachment #519587 - Attachment is obsolete: true
Attachment #519587 - Flags: review?(ctalbert)
Attachment #519707 - Flags: review?(ctalbert)
Attachment #519707 - Flags: feedback?(aki)
Comment on attachment 519707 [details] [diff] [review]
updated patch for r- comments and reverting reboot() (2.0)

I'd love a

if (self.debug >= 3):
  print ".",

somewhere in the while (count < timeout): loop, so we can extend past the buildbot timeout.

I'm wondering why you're doing a global gCallbackData and not self.gCallbackData ?  Do we access gCallbackData by other means than either callbackServer or the return value from callbackServer.disconnect() ?

But this looks like it'll work for us, even without the above.
Attachment #519707 - Flags: feedback?(aki) → feedback+
This is a second option to reboot(wait=True).  It puts a file in the sutagent folder which, upon reboot, forces a callback.  No agent change needed.  Tested in a loop locally as well as one at a time.  I prefer this because it uses the callback for all reboots we do, with no polling.
Attachment #519732 - Flags: review?(ctalbert)
added the ',' to the end of print '.' :)
Attachment #519732 - Attachment is obsolete: true
Attachment #519732 - Flags: review?(ctalbert)
Attachment #519736 - Flags: review?(ctalbert)
Attachment #519736 - Flags: feedback?(aki)
Comment on attachment 519736 [details] [diff] [review]
updated with hack to reboot to force callback server (1.1)

>+      else:
>+        if (self.debug >= 3): print '.'

No comma! D:
Comment on attachment 519732 [details] [diff] [review]
updated with hack to reboot to force callback server (1.0)

>diff --git a/build/mobile/devicemanager.py b/build/mobile/devicemanager.py
>--- a/build/mobile/devicemanager.py
>+++ b/build/mobile/devicemanager.py
>@@ -567,21 +567,24 @@ class DeviceManager:
>   # iterates process list and returns pid if exists, otherwise None
>   # external function
>   # returns:
>   #  success: pid
>   #  failure: None
>   def processExist(self, appname):
>     pid = None
> 
>-    #remove the environment variables in the cli if they exist
>+    #filter out extra spaces
>     parts = filter(lambda x: x != '', appname.split(' '))
>+    appname = ' '.join(parts[:])
> 
>-    if len(parts[0].strip('"').split('=')) > 1:
>-      appname = ' '.join(parts[1:])
>+    #filter out the quoted env string if it exists
>+    parts = appname.split('"')
>+    if (len(parts) > 2):
>+      appname = ' '.join(parts[2:]).strip()
This looks a lot better than my change. Nice.
>@@ -1284,42 +1291,47 @@ class DeviceManager:
> 
> gCallbackData = ''
> 
> class myServer(SocketServer.TCPServer):
>   allow_reuse_address = True
> 
> class callbackServer():
>   def __init__(self, ip, port, debuglevel):
>+    global gCallbackData
>     self.ip = ip
>     self.port = port
>     self.connected = False
>     self.debug = debuglevel
>     if (self.debug >= 3) : print "Creating server with " + str(ip) + ":" + str(port)
>     self.server = myServer((ip, port), self.myhandler)
>     self.server_thread = Thread(target=self.server.serve_forever) 
>     self.server_thread.setDaemon(True)
>     self.server_thread.start()
>+    gCallbackData = ''
Put this ^ right after the 'global gCallbackData' up at the top of this function.  Putting it after thread.start() gives us a very slight race condition.
> 
>   def disconnect(self, step = 60, timeout = 600):
>+    global gCallbackData
You don't need this ^ statement because you're not writing to the gCallbackData variable in this function, you can remove it.
>     t = 0
>     if (self.debug >= 3): print "Calling disconnect on callback server"
>     while t < timeout:
>       if (gCallbackData):
>         # Got the data back
>         if (self.debug >= 3): print "Got data back from agent: " + str(gCallbackData)
>         break
>+      else:
>+        if (self.debug >= 3): print '.'
>       time.sleep(step)
>       t += step
> 
>     try:
>       if (self.debug >= 3): print "Shutting down server now"
>       self.server.shutdown()
>     except:
>-      print "Unable to shutdown callback server - check for a connection on port: " + str(self.port)
>+      if (self.debug >= 1): print "Unable to shutdown callback server - check for a connection on port: " + str(self.port)
>     return gCallbackData
> 
>   class myhandler(SocketServer.BaseRequestHandler):
>     def handle(self):
>       global gCallbackData
>       gCallbackData = self.request.recv(1024)
>       #print "Callback Handler got data: " + str(gCallbackData)
>       self.request.send("OK")
Attachment #519732 - Flags: review+
Comment on attachment 519736 [details] [diff] [review]
updated with hack to reboot to force callback server (1.1)

>diff --git a/build/mobile/devicemanager.py b/build/mobile/devicemanager.py
>--- a/build/mobile/devicemanager.py
>+++ b/build/mobile/devicemanager.py
>@@ -567,21 +567,24 @@ class DeviceManager:
>   # iterates process list and returns pid if exists, otherwise None
>   # external function
>   # returns:
>   #  success: pid
>   #  failure: None
>   def processExist(self, appname):
>     pid = None
> 
>-    #remove the environment variables in the cli if they exist
>+    #filter out extra spaces
>     parts = filter(lambda x: x != '', appname.split(' '))
>+    appname = ' '.join(parts[:])
> 
>-    if len(parts[0].strip('"').split('=')) > 1:
>-      appname = ' '.join(parts[1:])
>+    #filter out the quoted env string if it exists
>+    parts = appname.split('"')
>+    if (len(parts) > 2):
>+      appname = ' '.join(parts[2:]).strip()
This looks much better than my change.

>@@ -1284,42 +1291,47 @@ class DeviceManager:
> 
> gCallbackData = ''
> 
> class myServer(SocketServer.TCPServer):
>   allow_reuse_address = True
> 
> class callbackServer():
>   def __init__(self, ip, port, debuglevel):
>+    global gCallbackData
>     self.ip = ip
>     self.port = port
>     self.connected = False
>     self.debug = debuglevel
>     if (self.debug >= 3) : print "Creating server with " + str(ip) + ":" + str(port)
>     self.server = myServer((ip, port), self.myhandler)
>     self.server_thread = Thread(target=self.server.serve_forever) 
>     self.server_thread.setDaemon(True)
>     self.server_thread.start()
>+    gCallbackData = ''
Move this ^ up to just after the 'global gCallbackData' line at the top of this function.  Putting it after thread.start() causes a small race condition.

> 
>   def disconnect(self, step = 60, timeout = 600):
>+    global gCallbackData
Don't need this ^ cause you're not writing to gCallbackData in this function anymore.

r+ with those changes.  Looks really good and is testing well on my tegra.
Attachment #519736 - Flags: review?(ctalbert) → review+
Comment on attachment 519707 [details] [diff] [review]
updated patch for r- comments and reverting reboot() (2.0)

I think I'd rather not use the polling, because that hasn't been sufficient in the past, so let's go forward with the callbackserver approach for reboot, as done in the newer patch.
Attachment #519707 - Flags: review?(ctalbert) → review-
fixed nits, carryover r+ from previous patch.
Assignee: aki → jmaher
Attachment #519707 - Attachment is obsolete: true
Attachment #519736 - Attachment is obsolete: true
Attachment #519736 - Flags: feedback?(aki)
Attachment #519750 - Flags: review+
Ok. We're getting to the next error, which is a good sign.
Looks like we're installing, getting a callback, returning the correct status, and then installApp.py tries to push app.ini and dies:

Shutting down server now
INFO: updateApp: got status back: update started org.mozilla.fennec /data/data/com.mozilla.SUTAgentAndroid/files/tests/fennec-4.0b6pre.multi.eabi-arm.apk
uninst complete [1]
install complete [1]
Success

updateApp() call returned update started org.mozilla.fennec /data/data/com.mozilla.SUTAgentAndroid/files/tests/fennec-4.0b6pre.multi.eabi-arm.apk
uninst complete [1]
install complete [1]
Success

in push file with: ../talos-data/fennec/application.ini, and: /data/data/org.mozilla.fennec/application.ini
reconnecting socket
remote hash returned: 'd41d8cd98f00b204e9800998ecf8427e'
local hash returned: '014f44856c68554c84c46d1b7e512c12'
failed making directory: /data/data/org.mozilla.fennec
unable to make dirs: /data/data/org.mozilla.fennec/application.ini
/builds/tegra-078/talos-data/../error.flg
Remote Device Error: unable to push ../talos-data/fennec/application.ini


I don't know why this is failing since we haven't touched the pushFile portion.
Some guesses:

1) The callback happens when the device is online, but it's not ready to take any commands yet. I've inserted a time.sleep(60) at the end of callbacksvr::disconnect() right before the return, but this isn't helping.  Either that's not it or it needs a longer sleep.

 2) The processExist function wouldn't be trying to kill something with an argument that contains org.mozilla.fennec, correct?  (Pretty sure this isn't it.)

3) Some weird permission issue
foopy04:builds cltbld$ ls tegra-081/talos-data/fennec
META-INF		defaults		platform.ini
chrome			greprefs.js		plugin-container
chrome.manifest		lib			res
classes.dex		lib.id			resources.arsc
components		modules			update.locale

Or application.ini is missing =P
Oh, this may be us cleaning up the directory.
Stopping work on this to build rc1.
Not sure where things are atm, but it seemed like updateApp wasn't installing fennec at all, hence the pushFile error.

Quick thoughts:

1) To investigate: OSX firewall settings on foopy0{1..4}

2) Question: Do we believe that the callback server is a superior solution to polling?  If so, why?
(In reply to comment #29)
> Not sure where things are atm, but it seemed like updateApp wasn't installing
> fennec at all, hence the pushFile error.

From the outside it does appear that something is not right with updateApp.

We tried install, reboot, update, reboot, and also uninstall, reboot, update, and some other combinations.  What is odd is that jmaher is able to use the sut_tools installApp scripts with the current patched devicemanager and have it work, so we need to get tegra-002 over to bmoss (or ateam) so that they can confirm it is set up properly.

> 
> Quick thoughts:
> 
> 1) To investigate: OSX firewall settings on foopy0{1..4}
> 
> 2) Question: Do we believe that the callback server is a superior solution to
> polling?  If so, why?

callback is preferred, IMO, because it's discrete and immediate - with polling you need loops and delays and flags/variables to remember what the state is/was in other parts of the loop.

not to say that if polling works right now we shouldn't use it, with an eye towards moving back to callbacks later
(In reply to comment #30)
> callback is preferred, IMO, because it's discrete and immediate - with polling
> you loops and delays and flags/variables to remember what the state is/was in
> other parts of the loop.
> 
> not to say if polling works right now we shouldn't use it with an eye towards
> moving back to callbacks later

Callback has many moving parts, and requires both sides to be set up correctly at the same time.  If one side isn't set up correctly, or the message comes when the listener isn't listening, it's gone forever and we have no logs to help us.

If we poll, one side is active and the other listens on the 20701 port it always listens on. We can log to our hearts' content, and if we miss a poll we can always wait and try again, or log the reason why the poll failed.

What do we gain by having a temporal callback that has no logging attached except "waiting for message that isn't coming" ?
(In reply to comment #31)
> (In reply to comment #30)
> > callback is preferred, IMO, because it's discrete and immediate - with polling
> > you loops and delays and flags/variables to remember what the state is/was in
> > other parts of the loop.
> > 
> > not to say if polling works right now we shouldn't use it with an eye towards
> > moving back to callbacks later
> 
> Callback has many moving parts, and requires both sides to be set up correctly
> at the same time.  If one side isn't set up correctly, or the message comes
> when the listener isn't listening, it's gone forever and we have no logs to
> help us.
> 
> If we poll, one side is active and the other listens on the 20701 port it
> always listens on. We can log to our hearts' content, and if we miss a poll we
> can always wait and try again, or log the reason why the poll failed.
> 
> What do we gain by having a temporal callback that has no logging attached
> except "waiting for message that isn't coming" ?

The only thing that dialback callback gave me that polling couldn't was notification when a reboot happened that was not planned - that shortened the time the code was sitting in an event loop waiting for something that never would have happened.  This is what is making me want to have callback available even if it isn't the primary method.
(In reply to comment #32)
> The only thing that dialback callback gave me that polling couldn't was
> notification when a reboot happened that was not planned - that shortened the
> time the code was sitting in an event loop waiting for something that never
> would have happened.  This is what is making me want to have callback available
> even if it isn't the primary method.

If polling is smart, it can do |info uptime| as well as checking devRoot.
I'm aware it's slower by design, but it's also less fragile by design.
I will take that tradeoff in a heartbeat.
No longer all purple.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering