Closed
Bug 853961
Opened 11 years ago
Closed 11 years ago
Revamp Autophone logging
Categories
(Testing Graveyard :: Autophone, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bc, Assigned: bc)
References
Details
Attachments
(13 files, 5 obsolete files)
500 bytes, patch | mcote: review+
1.36 KB, patch | mcote: review+
2.10 KB, patch | mcote: review+
532 bytes, patch | mcote: review+
36.15 KB, application/zip
65.25 KB, patch | mcote: feedback+
1.04 KB, patch | mcote: review+
332 bytes, patch | mcote: review+
629 bytes, patch | mcote: review+
2.87 KB, patch
9.97 KB, patch
3.94 KB, patch | mcote: review+
3.97 KB, patch | mcote: review+
I've been looking into the logging in Autophone. Some of the ideas I've been considering:

* At a minimum we should use TimedRotatingFileHandlers to rotate our logs every day to keep them from growing without bound.
* By using exc_info=1 in logging calls we can automatically get exception information in the message, or we can use logging.exception to get it as well and clean up some of our error handling code.
* We can use StreamHandlers to customize which log messages are sent to the console and eliminate almost all uses of print. For example, we could say that only ERROR log messages are printed to the console while all messages are written to the file log.
* It is possible to use SMTPHandlers and send specific log messages via email if we override the emit method to use smtplib.SMTP_SSL rather than smtplib.SMTP with smtplib.starttls(), which doesn't work with mail.mozilla.com.
* In individual tests, we often log messages that do not contain any additional metadata about what the test is doing. For example, in s1s2 we might say 'running fennec' but not say which build is being tested, whether caching is enabled or not, etc. It should be possible to use special formatters for our tests so that we can easily include all of the relevant metadata.
* We sometimes use print statements (not often) and also call into mozbase.mozdevice.deviceManagerSUT, which does use print. As a result we capture the stdout for workers and report it to .out text files. I really hate this, partly due to the additional files but also the loss of any timing/sequence information between the messages in the log file and the out file. I suppose we could capture stdout from dm calls and log them with the other worker log messages, but I'm open to suggestions.
* Currently the workers post status messages back to the Autophone process using a multiprocessing.Queue, which are then logged into autophone's main log file. It would be possible, I think, to use a log server in the autophone process and a SocketHandler in the worker processes so that log messages are routed back to the Autophone process and reported there. Right now each worker logs its results to a separate log file. Losing that granularity is an issue, but it might be possible to use a different handler in Autophone to report worker log messages to a separate file. *This is the most problematic idea and one that I can't justify.*

Mark, Jeff: what do you think?
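Several of these ideas map directly onto the stdlib logging module. A minimal sketch of the rotation, console-level, and exc_info points (handler choices and the 'autophone.log' path here are illustrative, not Autophone's actual configuration):

```python
import logging
from logging.handlers import TimedRotatingFileHandler

logger = logging.getLogger('autophone')
logger.setLevel(logging.DEBUG)

# Rotate the file log at midnight and keep a week of backups so it
# cannot grow without bound.
filehandler = TimedRotatingFileHandler('autophone.log',
                                       when='midnight', backupCount=7)
filehandler.setFormatter(
    logging.Formatter('%(asctime)s|%(levelname)s|%(message)s'))
logger.addHandler(filehandler)

# Only ERROR and above reach the console; everything goes to the file.
consolehandler = logging.StreamHandler()
consolehandler.setLevel(logging.ERROR)
logger.addHandler(consolehandler)

try:
    1 / 0
except ZeroDivisionError:
    # exc_info=1 (or logger.exception) appends the traceback automatically,
    # which removes a lot of hand-rolled error-formatting code.
    logger.error('job failed', exc_info=1)
```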
Comment 1•11 years ago
I am not super-knowledgeable of the problem space, but what you outlined above seems well thought out and a great place to start. I would like to curate lessons learned to mozlog, but it is a bit of a stub class so I wouldn't want to impede progress with it either. WRT logging the print statements of device manager:

* Personally, when using it as an API, I don't generally think DM should print anything, at least as a blanket solution. Maybe you pass it callbacks for whatever levels make sense internal to DM. That said, I don't know the code as well as others, nor do I use the code as much as others, so it's at best a back-seat opinion.
* It is possible to capture stdout/stderr for logging purposes. In fact, I would say that in reality you do want to capture stdout/stderr for logging purposes. Doing this is easy (monkey patch e.g. sys.stdout), but making sure everything continues to work is hard, e.g. cleanup, what if someone else is patching them, etc. I've found this an annoying problem that I've "solved" with continuous plugging of leaks. That said, I'd love to see a good solid solution to the problem.

I don't really have an opinion on one log per worker vs message-passing and one log total. I would be concerned with messages that don't get received due to system failure, but I don't know the system, here again.
Comment 2•11 years ago
Yup, all good ideas. I would rank SMTP support lower than the others if our system is reasonably stable, i.e., only if it would really help debug errors. I think the fact that AutoPhone is now physically maintainable reduces the need for this. Looks like using proper logging in DeviceManager is close to finished (bug 793202). And yeah, the socketserver is interesting, but I don't think it really solves any particular problem right now (YAGNI). We already redirect stdout in worker.py (top of PhoneWorkerSubProcess.run()). We could probably change that to use the worker's logger instead without too much trouble.
Assignee
Comment 3•11 years ago
I have several patches unrelated to logging which apply upon one another so I'll just attach them here instead of filing independent bugs for each. Hope you don't mind.
Assignee
Comment 4•11 years ago
Attachment #731826 -
Flags: review?(mcote)
Assignee
Comment 5•11 years ago
Catch operational errors when the sqlite jobs db is locked.
Attachment #731828 -
Flags: review?(mcote)
Assignee
Comment 6•11 years ago
When I updated the GUIDs in quitter, I stupidly limited the version to 19+. This changes it to 2+. *binary patch - use git apply*
Attachment #731831 -
Flags: review?(mcote)
Assignee
Comment 7•11 years ago
Attachment #731833 -
Flags: review?(mcote)
Assignee
Comment 8•11 years ago
Adds hierarchical log names, log decorators, and multiprocessing wrappers for the Stream and TimedRotatingFile handlers. I'll attach examples in a bit.
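The decorator idea can be sketched with logging.LoggerAdapter; the class name, field names, and example values below are illustrative, not the patch's exact implementation:

```python
import logging

class LogDecorator(logging.LoggerAdapter):
    """Prepend fixed per-test metadata to every message."""
    def __init__(self, logger, extradict, extraformat):
        logging.LoggerAdapter.__init__(self, logger, extradict)
        self.extraformat = extraformat

    def process(self, msg, kwargs):
        # Merge the constant metadata with the per-call message.
        fields = dict(self.extra)
        fields['msg'] = msg
        return self.extraformat % fields, kwargs

logging.basicConfig(level=logging.INFO)
# Hierarchical name: records propagate up to the 'autophone' logger.
logger = logging.getLogger('autophone.worker.s1s2')
loggerdeco = LogDecorator(logger,
                          {'phoneid': 'nexus-s-1',
                           'buildid': '20130401030931'},
                          '%(phoneid)s|%(buildid)s|%(msg)s')
loggerdeco.info('running fennec')
# logs: nexus-s-1|20130401030931|running fennec
```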
Attachment #731835 -
Flags: review?(mcote)
Assignee
Comment 9•11 years ago
This removes the 90 second max time which caused more trouble than it was worth.
Attachment #731837 -
Flags: review?(mcote)
Assignee
Comment 10•11 years ago
Short examples of logs.
Comment 11•11 years ago
Comment on attachment 731826 [details] [diff] [review]
fix retryLimit typo in worker.py

Review of attachment 731826 [details] [diff] [review]:
-----------------------------------------------------------------

Cool, thanks. I wonder why we even have **kwargs as an argument to DeviceManagerSUT.__init__(). We don't use it, and it just hides errors like this. Going to ask someone about that.
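The hazard described here is easy to demonstrate; `connect` below is a made-up stand-in for DeviceManagerSUT.__init__(), not its real signature:

```python
def connect(host, retrylimit=5, **kwargs):
    # The catch-all silently swallows any misspelled keyword argument.
    return retrylimit

print(connect('phone', retryLimit=10))  # prints 5: the typo vanished into kwargs
print(connect('phone', retrylimit=10))  # prints 10
```

Without the `**kwargs`, the first call would raise a TypeError and the typo would have been caught immediately.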
Attachment #731826 -
Flags: review?(mcote) → review+
Comment 12•11 years ago
Comment on attachment 731828 [details] [diff] [review]
catch sqlite3.OperationalError's in jobs.py

Review of attachment 731828 [details] [diff] [review]:
-----------------------------------------------------------------

So this makes sense for fetching the next job and checking for pending jobs, since the worker will later retry, but I'm concerned about the calls that post a new job, clear the queue, and mark a job as completed. I suppose it's not terrible to rerun jobs (the potential results of the latter two), although it would be a waste of time, but a failed call to post a new job will result in that job being skipped altogether. I think these should be in a retry loop, and I'd even say they should be retried forever, probably with an email notification if it takes more than X attempts, since in that case something is really wrong (frozen process locking the db or something).
Attachment #731828 -
Flags: review?(mcote) → review-
Comment 13•11 years ago
Comment on attachment 731831 [details] [diff] [review]
fix quitter's version limits

Review of attachment 731831 [details] [diff] [review]:
-----------------------------------------------------------------

Gonna take your word on this one. :)
Attachment #731831 -
Flags: review?(mcote) → review+
Comment 14•11 years ago
Comment on attachment 731833 [details] [diff] [review]
make sure worker's locks are released

Review of attachment 731833 [details] [diff] [review]:
-----------------------------------------------------------------

This looks okay, but given that any exception in autophone.py should terminate the main process, is this really needed?
Attachment #731833 -
Flags: review?(mcote) → review+
Assignee
Comment 15•11 years ago
(In reply to Mark Côté (:mcote) from comment #14)

> This looks okay, but given that any exception in autophone.py should
> terminate the main process, is this really needed?

In my experience I was getting disconnected phones which would not recover and which would not respond to commands sent to autophone, even though autophone would say it had received them. Part of the fix may have been the job exceptions patch, but I don't think it was the full solution. Reviewing the code, I can't justify it offhand and don't remember if I had a particular reason when I added it other than eliminating possible causes of disconnected phones being unrecoverable. If I recall correctly though, with this patch plus the job exceptions patch my disconnected phones would reconnect and I didn't have further situations where I had to restart autophone.py to recover a phone. I don't think the job exceptions patch was sufficient on its own, but I won't swear to it.
Assignee
Comment 16•11 years ago
PS. Once the logging patch lands, I can add some additional logging statements and see if/when I hit these finally clauses.
Comment 17•11 years ago
Comment on attachment 731835 [details] [diff] [review]
logging

Review of attachment 731835 [details] [diff] [review]:
-----------------------------------------------------------------

Okay, now that I mostly understand what you're doing, this looks good. :) Some nits below.

::: autophone.py

@@ +31,4 @@
> from mailer import Mailer
> from worker import PhoneWorker
>
> +from multiprocessinghandlers import MultiprocessingStreamHandler, MultiprocessingTimedRotatingFileHandler

Move this up with the other local "from" imports.

@@ +153,1 @@
> 'values.' % phoneid)

Fix indentation. Same with a few places below.

@@ +155,5 @@
> worker.user_cfg)
> del self.restart_workers[phoneid]
> continue
>
> + self.console_logger.error('Worker %s died!' % phoneid)

We don't want to log this to autophone.log anymore?

@@ +431,5 @@
> + logger.setLevel(loglevel)
> +
> + filehandler = MultiprocessingTimedRotatingFileHandler(logfile,
> + when='midnight',
> + backupCount=7)

Fix indentation.

@@ +432,5 @@
> +
> + filehandler = MultiprocessingTimedRotatingFileHandler(logfile,
> + when='midnight',
> + backupCount=7)
> + #fileformatter = logging.Formatter('%(asctime)s|%(levelname)s|%(message)s')

Don't think this line is needed anymore at all.

@@ +438,5 @@
> + # '|%(filename)s|%(funcName)s'
> + # '|%(lineno)d|%(name)s'
> + # '|%(process)d|%(processName)s'
> + # '|%(thread)d|%(threadName)s'
> + # '|%(message)s')

I'm not crazy about leaving commented-out code in there--it has a tendency to bitrot or be checked in accidentally. Maybe we can control this with an option?

::: logdecorator.py

@@ +13,5 @@
> + if re.search('(%[(][a-zA-Z]+[)][^a-z]|%[(][a-zA-Z]+[)]$)', extraformat):
> + raise ValueError('format string contains a %(attribute)'
> + 'pattern without a type specifier.')
> +
> +

Only one blank line between functions.

@@ +21,5 @@
> + extramsg = self._extraformat % extradict
> + if '%(msg)s' in extramsg:
> + import traceback
> + import StringIO
> + import sys

Why aren't these imported at the top of the file?

@@ +31,5 @@
> + sys.stderr.write('msg: %s\n' % msg)
> + sys.stderr.write('extradict: %s\n' % extradict)
> + sys.stderr.write('extraformat: %s\n' % self._extraformat)
> + sys.stderr.write('extramsg: %s\n' % extramsg)
> + sys.stderr.write(80*'-' + '\n')

This should probably have (a) a comment in the code explaining why it's here and (b) a comment in the stderr output explaining why this is being output.

::: multiprocessinghandlers.py

@@ +10,5 @@
> +
> + lock = multiprocessing.Lock()
> +
> + def emit(self, record):
> +

Remove blank line (same below).

@@ +15,5 @@
> + self.lock.acquire()
> + try:
> + logging.StreamHandler.emit(self, record)
> + except Exception, e:
> + pass

Why are exceptions being completely ignored here (and below)?

::: tests/runtestsremote.py

@@ +30,5 @@
> class UnitTest(PhoneTest):
>
> def runjob(self, build_metadata, worker_subprocess):
> + logger = self.logger
> + loggerdeco = self.loggerdeco

We do this in every test now. Can it be moved up to PhoneTest somehow?

@@ +45,5 @@
> + self.logger = logger
> + self.loggerdeco = loggerdeco
> +
> + def setup_and_runtest(self, build_metadata, worker_subprocess):
> +

Remove blank line.

::: tests/s1s2test.py

@@ +80,5 @@
> + 'testname': testname},
> + '%(phoneid)s|%(phoneip)s|%(buildid)s|'
> + 'Cache: %(cache_enabled)s|'
> + '%(testname)s|%(msg)s')
> + self.loggerdeco.info('Running test (%d/%d) for %s iterations' %

Just realized the %s should be a %d here.

@@ +89,5 @@
> attempt = 0
> while not success and attempt < 3:
> # Set status
> self.set_status(msg='Test %d/%d, run %s, attempt %s for url %s' %
> + (testnum, len(self._urls.keys()), i, attempt,

Same with the first two %s's.

::: worker.py

@@ +515,5 @@
> + # '|%(filename)s|%(funcName)s'
> + # '|%(lineno)d|%(name)s'
> + # '|%(process)d|%(processName)s'
> + # '|%(thread)d|%(threadName)s'
> + # '|%(message)s')

As with autophone.py, maybe we can control this with the same option?
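For context, the handler wrapper under review boils down to roughly this (a sketch, not the patch itself); per the later discussion in this bug, the try/except around emit() was dropped so that logging's own handleError machinery decides what to do with failures:

```python
import logging
import multiprocessing

class MultiprocessingStreamHandler(logging.StreamHandler):
    """StreamHandler whose emit() is serialized across processes
    with a shared multiprocessing lock, so that lines written by
    several workers do not interleave mid-record."""
    # Created at class-definition time, so fork()ed children inherit it.
    lock = multiprocessing.Lock()

    def emit(self, record):
        with self.lock:
            logging.StreamHandler.emit(self, record)
```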
Attachment #731835 -
Flags: review?(mcote) → review+
Comment 18•11 years ago
Comment on attachment 731837 [details] [diff] [review]
analyze logcat

Review of attachment 731837 [details] [diff] [review]:
-----------------------------------------------------------------

::: tests/s1s2test.py

@@ +252,4 @@
> else:
> fennec_crashed = False
> if throbstart and throbstop == 0 and not fennec_crashed:
> + #throbstop = int(throbstart) + max_time * 1000

Just take out the commented-out line altogether. We can always go back through history if we want to know what we used to do, but I can't see us bringing it back anytime soon.
Attachment #731837 -
Flags: review?(mcote) → review+
Assignee
Comment 19•11 years ago
(In reply to Mark Côté (:mcote) from comment #17)

> I'm not crazy about leaving commented-out code in there--it has a tendency
> to bitrot or be checked in accidentally. Maybe we can control this with an
> option?

It really isn't that useful. It was mostly educational for me to see which loggers were emitting statements. I mostly left it in for you in case you wanted to see the multiprocess behavior. I'd rather just remove it altogether really.

> @@ +21,5 @@
> > + extramsg = self._extraformat % extradict
> > + if '%(msg)s' in extramsg:
> > + import traceback
> > + import StringIO
> > + import sys
>
> Why aren't these imported at the top of the file?
>
> @@ +31,5 @@
> > + sys.stderr.write('msg: %s\n' % msg)
> > + sys.stderr.write('extradict: %s\n' % extradict)
> > + sys.stderr.write('extraformat: %s\n' % self._extraformat)
> > + sys.stderr.write('extramsg: %s\n' % extramsg)
> > + sys.stderr.write(80*'-' + '\n')
>
> This should probably have (a) a comment in the code explaining why it's here
> and (b) a comment in the stderr output explaining why this is being output.

They were a debugging aid before I added the ValueError in the init. I think I should just remove it altogether since, if I'm correct, the ValueError in init would block this from ever happening.

> ::: multiprocessinghandlers.py
> @@ +15,5 @@
> > + self.lock.acquire()
> > + try:
> > + logging.StreamHandler.emit(self, record)
> > + except Exception, e:
> > + pass
>
> Why are exceptions being completely ignored here (and below)?

http://docs.python.org/2/library/logging.html#logging.Handler.handleError
I was following the advice that logging exceptions be ignored, but really I think I should remove the try: and let the logging system decide how to handle it as described in that link.

> ::: tests/runtestsremote.py
> @@ +30,5 @@
> > class UnitTest(PhoneTest):
> >
> > def runjob(self, build_metadata, worker_subprocess):
> > + logger = self.logger
> > + loggerdeco = self.loggerdeco
>
> We do this in every test now. Can it be moved up to PhoneTest somehow?

Let me think about it a bit. I need to be careful to reset the values after they have changed. I may not need the logger part though.

> ::: worker.py
> @@ +515,5 @@
> > + # '|%(filename)s|%(funcName)s'
> > + # '|%(lineno)d|%(name)s'
> > + # '|%(process)d|%(processName)s'
> > + # '|%(thread)d|%(threadName)s'
> > + # '|%(message)s')
>
> As with autophone.py, maybe we can control this with the same option?

Just remove it altogether, as above?
Comment 20•11 years ago
(In reply to Bob Clary [:bc:] from comment #19)

> (In reply to Mark Côté (:mcote) from comment #17)
> > I'm not crazy about leaving commented-out code in there--it has a tendency
> > to bitrot or be checked in accidentally. Maybe we can control this with an
> > option?
>
> It really isn't that useful. It was mostly educational for me to see which
> loggers were emitting statements. I mostly left it in for you in case you
> wanted to see the multiprocess behavior. I'd rather just remove it
> altogether really.

Fine by me.

> > Why are exceptions being completely ignored here (and below)?
>
> http://docs.python.org/2/library/logging.html#logging.Handler.handleError
> I was following the advice that logging exceptions be ignored but really I
> think I should remove the try: and let the logging system decide how to
> handle it as described in that link.

Ok.

> > As with autophone.py, maybe we can control this with the same option?
>
> just remove it altogether as in above?

Yeah, I think so.
Assignee
Comment 21•11 years ago
I moved this to apply after the logging patch so we could use the new logging here.
Attachment #731828 -
Attachment is obsolete: true
Attachment #733903 -
Flags: review?(mcote)
Assignee
Comment 22•11 years ago
FYI, hopefully this covers all of your review comments. Feel free to do a new review if you want. ;-)
Attachment #731835 -
Attachment is obsolete: true
Attachment #733905 -
Flags: feedback?(mcote)
Assignee
Comment 23•11 years ago
(In reply to Bob Clary [:bc:] from comment #22)

> Created attachment 733905 [details] [diff] [review]
> logging

There is a problem with this patch. When the day rolls over, the worker processes continue to write to the original autophone.log file, which has been renamed autophone.log.YYYY-MM-DD. Their private logs roll over correctly. Trying to figure this out.
Assignee
Comment 24•11 years ago
This patch contains mostly a bunch of logging statements. The meat is in main_loop, where we recover a phone if it has_error() and sleep for a bit if all commands and jobs are processed. Without the sleep we get into an infinite loop that starves the cpu and prevents us from getting anything from the multiprocessing queue. The 'main_loop: Commands and jobs processed' log message can be pretty verbose in the log, so we might want to take that out, but it's debug only so really shouldn't be a problem in production runs.
Attachment #734670 -
Flags: review?(mcote)
Assignee
Comment 25•11 years ago
Attachment #734671 -
Flags: review?(mcote)
Assignee
Comment 26•11 years ago
Attachment #734673 -
Flags: review?(mcote)
Assignee
Comment 27•11 years ago
When failing to get the symbols file, the IOError exception, ioerror, does not always contain the duplicated ioerror.strerror.strerror attribute properties. I'm not really sure what's up with the ftp error handling, but this patch converts the exception to a string and checks for the 550 in that.
Attachment #734677 -
Flags: review?(mcote)
Comment 28•11 years ago
Comment on attachment 733903 [details] [diff] [review]
catch sqlite3.OperationalErrors in jobs.py

Review of attachment 733903 [details] [diff] [review]:
-----------------------------------------------------------------

A few comments below. The r- is specifically for the code duplication, which will be a pain to maintain.

::: jobs.py

@@ +44,5 @@
> + 'connection errors.')
> + logger.exception('Attempt %d failed to connect to jobs '
> + 'database. Waiting for %d seconds.' %
> + (attempt, self.SQL_RETRY_DELAY))
> + time.sleep(self.SQL_RETRY_DELAY)

A few comments about these blocks:

Waiting a full minute seems a bit excessive, no?

I'm thinking an exception should be raised if it can't get a connection after a certain number of retries. You'd probably have to shut down everything anyway if one or more processes got this stuck.

This code is repeated several times. I'm sure you could refactor it, possibly into a decorator.

Finally, why have this wrapper in both _conn() *and* the functions that call it?
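The decorator refactoring suggested here could look something like the sketch below; the names and retry policy are illustrative, not jobs.py's actual code:

```python
import functools
import sqlite3
import time

def retry_on_operational_error(max_attempts=5, delay=60):
    """Retry a jobs-db operation while sqlite3 raises OperationalError
    (e.g. 'database is locked'), re-raising after the last attempt."""
    def wrap(func):
        @functools.wraps(func)
        def wrapped(*args, **kwargs):
            for attempt in range(1, max_attempts + 1):
                try:
                    return func(*args, **kwargs)
                except sqlite3.OperationalError:
                    if attempt == max_attempts:
                        raise
                    time.sleep(delay)
        return wrapped
    return wrap

@retry_on_operational_error(max_attempts=3, delay=0)
def get_next_job():
    ...  # open the connection and query; retried on lock contention
```

Each wrapped function then contains only its own logic, and the retry policy lives in one place instead of being repeated per call site.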
Attachment #733903 -
Flags: review?(mcote) → review-
Comment 29•11 years ago
Comment on attachment 733905 [details] [diff] [review]
logging, v2

Just renaming this so I can try an interdiff.
Attachment #733905 -
Attachment description: logging → logging, v2
Assignee
Comment 30•11 years ago
(In reply to Mark Côté (:mcote) from comment #28)

> Waiting a full minute seems a bit excessive, no?

Not really, I think. With 3G network emulation it takes several minutes to install a build. I'm not entirely clear how and where the sqlite db gets locked. One minute seemed safe, not too short, not too long...

> I'm thinking an exception should be raised if it can't get a connection
> after a certain number of retries. You'd probably have to shut down
> everything anyway if one or more processes got this stuck.

(In reply to Mark Côté (:mcote) from comment #12)

> will result in that job being skipped altogether. I think these should be
> in a retry loop, and I'd even say they should be retried forever, probably
> with an email notification if it takes more than X attempts, since in that
> case something is really wrong (frozen process locking the db or something).

> This code is repeated several times. I'm sure you could refactor it,
> possibly into a decorator.

Not sure what you mean about a decorator.

> Finally, why have this wrapper in both _conn() *and* the functions that call
> it?

Because I wanted to distinguish the different cases.
Assignee
Comment 31•11 years ago
Attachment #734847 -
Flags: review?(mcote)
Comment 32•11 years ago
Comment on attachment 733905 [details] [diff] [review]
logging, v2

Review of attachment 733905 [details] [diff] [review]:
-----------------------------------------------------------------

::: autophone.py

@@ +22,4 @@
> from manifestparser import TestManifest
> from mozdevice.devicemanager import NetworkTools
> from pulsebuildmonitor import start_pulse_monitor
> +from multiprocessinghandlers import MultiprocessingStreamHandler, MultiprocessingTimedRotatingFileHandler

Heh, I didn't mean up this far--I meant with the other "from" imports of files in the same directory, e.g. mailer and worker.
Attachment #733905 -
Flags: feedback?(mcote) → feedback+
Comment 33•11 years ago
Comment on attachment 734670 [details] [diff] [review]
fix hang in worker main_loop

Review of attachment 734670 [details] [diff] [review]:
-----------------------------------------------------------------

r+ing only as a temporary workaround, as discussed below.

::: worker.py

@@ +483,4 @@
> self.ping()
> + else:
> + try:
> + self.loggerdeco.debug('handle_cmd: Unknown request' % request[0])

Missing a %s here.

@@ +484,5 @@
> + else:
> + try:
> + self.loggerdeco.debug('handle_cmd: Unknown request' % request[0])
> + except:
> + self.loggerdeco.exception('handle_cmd: Unknown request')

Why the blanket except here?

@@ +519,5 @@
> if job:
> self.handle_job(job)
> + else:
> + self.loggerdeco.debug('main_loop: Commands and jobs processed')
> + time.sleep(10)

As discussed, I don't think this should be necessary; I structured the loop such that it should block only on self.cmd_queue.get(...). But if you want to add it in just to temporarily fix your system, I can try to figure it out next week.
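The missing %s is not just cosmetic: %-formatting a string with an argument but no conversion specifier raises a TypeError, which is presumably what the blanket except was papering over. A quick demonstration:

```python
try:
    # No %s in the format string, but an argument is supplied:
    'handle_cmd: Unknown request' % 'ping'
except TypeError as e:
    print(e)  # not all arguments converted during string formatting

# With the specifier in place it formats as intended:
print('handle_cmd: Unknown request %s' % 'ping')
```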
Attachment #734670 -
Flags: review?(mcote) → review+
Comment 34•11 years ago
Comment on attachment 734671 [details] [diff] [review]
fix zipfile.BadZipfile typo in builds.py

Review of attachment 734671 [details] [diff] [review]:
-----------------------------------------------------------------

Nice catch. Stupid dynamic language.
Attachment #734671 -
Flags: review?(mcote) → review+
Comment 35•11 years ago
Comment on attachment 734673 [details] [diff] [review]
fix missing DMError import in s1s2test.py

Review of attachment 734673 [details] [diff] [review]:
-----------------------------------------------------------------

Another good catch. Not sure why pyflakes isn't showing it to me.
Attachment #734673 -
Flags: review?(mcote) → review+
Comment 36•11 years ago
Comment on attachment 734677 [details] [diff] [review]
fix ftp 550 Failed to change directory getting symbols

Review of attachment 734677 [details] [diff] [review]:
-----------------------------------------------------------------

::: builds.py

@@ +280,4 @@
> symbols_zipfile.extractall(symbols_path)
> symbols_zipfile.close()
> except IOError, ioerror:
> + if '550 Failed to change directory' in ioerror.__str__():

I *think* str(ioerror) is the preferred way to do this, as it expressly tries to get a string out of the target object. __str__() is more of an implementation detail, one way to create this string. If this returns the same thing here, that is.
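A quick check confirms the two spellings agree for this exception, with str() being the idiomatic one:

```python
err = IOError('550 Failed to change directory')

# str() is preferred: it goes through the type's protocol and reads
# naturally; __str__() is an implementation detail. They agree here.
assert str(err) == err.__str__()

if '550 Failed to change directory' in str(err):
    print('symbols path missing on the ftp server; skipping symbols')
```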
Attachment #734677 -
Flags: review?(mcote) → review+
Assignee
Comment 37•11 years ago
Attachment #733903 -
Attachment is obsolete: true
Assignee
Comment 38•11 years ago
I changed the loops to a single loop and it seems much more readable to me.
Attachment #734670 -
Attachment is obsolete: true
Attachment #735200 -
Flags: review?(mcote)
Assignee
Comment 39•11 years ago
(In reply to Mark Côté (:mcote) from comment #36)

> I *think* str(ioerror) is the preferred way to do this, as it expressly
> tries to get a string out of the target object

Right. I'll update locally.
Assignee
Comment 40•11 years ago
(In reply to Bob Clary [:bc:] from comment #31)

> Created attachment 734847 [details] [diff] [review]
> logging follow up patch to fix rollovers

fwiw, this worked fine overnight with 8 workers. I like it. :-)
Assignee
Comment 41•11 years ago
Comment on attachment 734847 [details] [diff] [review]
logging follow up patch to fix rollovers

Not quite ready for prime time. I still have a concurrency issue around rollover.
Attachment #734847 -
Flags: review?(mcote)
Assignee
Comment 42•11 years ago
If the log file is deleted by another process during rollover, wait until it is recreated before returning True to indicate the log should be reopened. I thought of using filelock and possibly other approaches, but I think this is cleanest. I have tested it somewhat but need to do some heavier testing with quicker rollovers to be absolutely sure this is it. You'll probably want me to take out the comments about Why? and Wait?
Attachment #736805 -
Flags: review?(mcote)
Comment 43•11 years ago
Comment on attachment 735200 [details] [diff] [review]
fix hang in worker main_loop, v2

Review of attachment 735200 [details] [diff] [review]:
-----------------------------------------------------------------

I like this; in particular, the removal of the complicated condition at the top of the old loop improves readability. I haven't tested it, but I mentally went through the loop a few times and it looks fine. Just a couple little tweaks.

::: worker.py

@@ +501,5 @@
> + except Queue.Empty:
> + request = None
> + if self.has_error():
> + self.recover_phone()
> + else:

I think this should be "if not self.has_error()" rather than the "else" in case self.recover_phone() succeeded.

@@ +504,5 @@
> + self.recover_phone()
> + else:
> + job = self.jobs.get_next_job()
> + if job:
> + self.handle_job(job)

I think in this case, we should use "continue" or put the next bit in an "else". Otherwise, if there is another job queued up but no commands, we will wait for the timeout for no reason.
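The resulting single-loop shape, with the two tweaks from this review folded in, might look like the sketch below. The worker interface here is mocked up from the comments in this bug, not the actual worker.py; the only blocking point is cmd_queue.get, so no extra sleep is needed:

```python
import queue  # the Queue module in Python 2

def main_loop(worker):
    while not worker.stopped:
        try:
            request = worker.cmd_queue.get(timeout=worker.poll_timeout)
        except queue.Empty:
            request = None
        if request:
            worker.handle_cmd(request)
            continue
        if worker.has_error():
            worker.recover_phone()
        # Re-check rather than 'else', in case recover_phone() succeeded.
        if not worker.has_error():
            job = worker.get_next_job()
            if job:
                worker.handle_job(job)
                # Loop straight back for the next queued job instead of
                # waiting out the queue timeout for no reason.
                continue
```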
Attachment #735200 -
Flags: review?(mcote) → review+
Comment 44•11 years ago
Comment on attachment 736805 [details] [diff] [review]
logging follow up patch to fix rollovers, v2

Review of attachment 736805 [details] [diff] [review]:
-----------------------------------------------------------------

This seems almost right, with the change from filesize to device/inode comparison, but I am concerned about the unexpected OSError. We need to figure that out.

::: multiprocessinghandlers.py

@@ +55,5 @@
> + """Check if the log file has already been rotated by another
> + process by comparing the file size of the logfile name to the
> + file size of the file pointed to by the stream attribute. If
> + they differ, the log file named by baseFilename has been
> + rotated, and the stream now points to the previous version."""

As discussed, the right way to do this is stat(self.baseFilename), fstat(self.stream.fileno()), and seeing if st_ino and st_dev are equal. If so, self.stream is pointing to self.baseFilename.

@@ +64,5 @@
> + current_size = self.stream.tell()
> + if current_size != baseFilename_size:
> + return True
> + return False
> + except OSError, oserror:

Narrow the scope of this try/except so we know exactly where the OSError is coming from (presumably when trying to access self.baseFilename, but let's be sure).

@@ +65,5 @@
> + if current_size != baseFilename_size:
> + return True
> + return False
> + except OSError, oserror:
> + if oserror.errno != 2:

Use the "errno" module instead of raw numbers, since they are platform dependent. In this case it would be errno.ENOENT.

@@ +69,5 @@
> + if oserror.errno != 2:
> + raise
> + # The logfile has been deleted during a rollover by
> + # another process. Why didn't the multiprocessing.lock
> + # save me? Wait for the logfile to be recreated?

We need to figure out why this is happening, rather than potentially masking a real problem that may present itself elsewhere later.
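The stat/fstat comparison, combined with the errno handling also requested in this review, could be sketched as follows (the function name is illustrative, not the patch's method):

```python
import errno
import os

def log_was_rotated(stream, base_filename):
    """Return True if base_filename no longer names the file that
    `stream` has open, i.e. another process rotated the log and the
    stream should be reopened."""
    try:
        st_base = os.stat(base_filename)
    except OSError as oserror:
        if oserror.errno != errno.ENOENT:
            raise
        # The named file vanished: another process is mid-rotation.
        return True
    st_stream = os.fstat(stream.fileno())
    # Same (device, inode) means the stream still points at the file
    # currently named base_filename.
    return ((st_base.st_dev, st_base.st_ino) !=
            (st_stream.st_dev, st_stream.st_ino))
```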
Attachment #736805 -
Flags: review?(mcote) → review-
Comment 45•11 years ago
If you are having trouble replicating the missing-file problem, then you can leave that loop in, but it would be nice to log the problem (with another emit call or something). I would like to figure it out.
Assignee
Comment 46•11 years ago
This has been running locally since 4-13 but I haven't seen the rollover issue again. :-(
Attachment #736805 -
Attachment is obsolete: true
Attachment #738136 -
Flags: review?(mcote)
Comment 47•11 years ago
Comment on attachment 738136 [details] [diff] [review]
logging follow up patch to fix rollovers, v3

Review of attachment 738136 [details] [diff] [review]:
-----------------------------------------------------------------

Looks great!
Attachment #738136 -
Flags: review?(mcote) → review+
Assignee
Comment 48•11 years ago
https://github.com/mozilla/autophone/commit/645b2ab48231e78269d82e08dcee97d40228f978
https://github.com/mozilla/autophone/commit/71846cb33c1dad1b7734d0e6531455d23b90614a
https://github.com/mozilla/autophone/commit/0726e2d98be82c8d7aef9eb35cf585cc7f4e7de2
https://github.com/mozilla/autophone/commit/1102e039d60fe19dee5f2438b4ac289bdc3eb7e1
https://github.com/mozilla/autophone/commit/7b8a21687beeb5ff027bc9eb4be0f98fe78cb07e
https://github.com/mozilla/autophone/commit/645d88fb69d21ffaec1783ba29ecb842b40b8027
https://github.com/mozilla/autophone/commit/8ae96cd943d7c6ee1da0ac7920d28c78f124b0da
https://github.com/mozilla/autophone/commit/31a25041f6e7c6338bf395d5202b065cef0f261d
https://github.com/mozilla/autophone/commit/fac1755a44f18ca362e0436d5658c4ce940ab8d9
https://github.com/mozilla/autophone/commit/bb64435c6df362369ae8a67cd288d7846160ba0b
https://github.com/mozilla/autophone/commit/f247b50034010127c87c33a90fc83a0e52eeade6

done.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•2 years ago
Product: Testing → Testing Graveyard