Closed
Bug 1131228
Opened 9 years ago
Closed 9 years ago
SymbolLRUCacheManager occasionally raising OSError trying to stat files
Categories
(Socorro :: Backend, task)
Socorro
Backend
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: rhelmer, Assigned: rhelmer)
Details
Not sure why this is happening, as the file appears to be there when I check it: Exception in thread Thread-1: Traceback (most recent call last): File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner self.run() File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/pyinotify.py", line 1542, in run self.loop() File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/pyinotify.py", line 1528, in loop self.process_events() File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/pyinotify.py", line 1328, in process_events self._default_proc_fun(revent) File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/pyinotify.py", line 967, in __call__ return _ProcessEvent.__call__(self, event) File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/pyinotify.py", line 687, in __call__ return meth(event) File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/processor/symbol_cache_manager.py", line 76, in process_IN_OPEN self.monitor._update_cache(event.pathname) File "/data/socorro/socorro-virtualenv/lib/python2.6/site-packages/socorro-master-py2.6.egg/socorro/processor/symbol_cache_manager.py", line 192, in _update_cache size = os.stat(path).st_size OSError: [Errno 2] No such file or directory: '/tmp/symbols/firefox.pdb/20885C58B68D486A83F2D7503F9D35662/firefox.sym'
Assignee | ||
Comment 1•9 years ago
|
||
This stops the cache cleaning from happening so we should probably handle more errors here (e.g. some other process coming along and wiping out files), but I'd like to know why this happens...
Comment 2•9 years ago
|
||
puppet?
Assignee | ||
Comment 3•9 years ago
|
||
(In reply to Chris Lonnen :lonnen from comment #2) > puppet? Maaaybe... it could be that a new stackwalker instance is downloading the file so I am wrong about it being there when the stat() fails. Or maybe we're running SymbolLRUCacheManager on more than one thread accidentally? phrawzty, we're trying to keep a temporary symbol cache (for symbols downloaded from S3) in /tmp/symbols/ and they seem to be disappearing out from under our cache cleaner code - do you think this is an OK place to keep these, if not can you suggest a better one? :)
Flags: needinfo?(dmaher)
Comment 4•9 years ago
|
||
(In reply to Robert Helmer [:rhelmer] from comment #3) > phrawzty, we're trying to keep a temporary symbol cache (for symbols > downloaded from S3) in /tmp/symbols/ and they seem to be disappearing out > from under our cache cleaner code - do you think this is an OK place to keep > these, if not can you suggest a better one? :) You didn't note the name of a machine, so I guessed "socorro-processor1.stage.metrics.phx1". The rest of this comment refers to that machine. First off, I note that /tmp/ has a bunch of files of glob "symbol??????" (ex. "symbolrQi0eV"). Is that normal? Also, there's no particular reason that Puppet would be causing this behaviour, so I'd look elsewhere first. As for whether /tmp/symbols/ is an acceptable place to put these things - yeah, it's fine. The tmpwatch runs daily but it's configured to only remove things between 10 and 30 days old (depending on the item). Some quick searching seems to suggest that pynotify can be buggy, though most of the references to this fact are for older versions. That said, if does make me wonder if there's something screwy going on there - for example, if pynotify is triggering before the file has been written entirely? Alternatively, yes, something else is wiping out the file - somewhat ironically, an inotify-based tool might be useful in logging this behaviour, though I'd sooner employ audit.
Flags: needinfo?(dmaher)
Comment 5•9 years ago
|
||
My best guess here is that the file gets opened and then removed before SymbolLRUCacheManager gets the open notification. I wouldn't think that was likely with the way things are written in HTTPSymbolSupplier, since it downloads to a temp file (the /tmp/symbol???? files phrawzty is describing above) and then renames to the final file destination after successfully downloading: https://github.com/mozilla/socorro/blob/master/minidump-stackwalk/http_symbol_supplier.cc#L289 ...which means that the stackwalker shouldn't ever delete a file under /tmp/symbols, so I'm not sure what's up. If you think that tmpwatch thing is screwing things up you could try changing the location of the symbols cache to somwhere outside of /tmp.
Comment 6•9 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #5) > If you think that tmpwatch thing is > screwing things up you could try changing the location of the symbols cache > to somwhere outside of /tmp. No, I *don't* think tmpwatch is screwing things up - unless the files that are "disappearing" more than 10 days old (which seems unlikely).
Comment 7•9 years ago
|
||
"if pynotify is triggering before the file has been written entirely" This is my guess. I'm tempted to say "I told you so" on pyinotify. :) I think we should make our SymbolLRUCacheManager class more defensive and careful. A simple `if os.path.isfile(...)` should do. That pyinotify says something should only be considered a hint/clue. Not a fact.
Assignee | ||
Comment 8•9 years ago
|
||
(In reply to Peter Bengtsson [:peterbe] from comment #7) > "if pynotify is triggering before the file has been written entirely" > This is my guess. > > I'm tempted to say "I told you so" on pyinotify. :) > > I think we should make our SymbolLRUCacheManager class more defensive and > careful. A simple `if os.path.isfile(...)` should do. That pyinotify says > something should only be considered a hint/clue. Not a fact. Yeah, the exception is being thrown when we're trying to stat the file so doing that (or catching the exception) would work: https://github.com/mozilla/socorro/blob/master/socorro/processor/symbol_cache_manager.py#L192
Assignee | ||
Comment 9•9 years ago
|
||
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #5) > My best guess here is that the file gets opened and then removed before > SymbolLRUCacheManager gets the open notification. I wouldn't think that was > likely with the way things are written in HTTPSymbolSupplier, since it > downloads to a temp file (the /tmp/symbol???? files phrawzty is describing > above) and then renames to the final file destination after successfully > downloading: > https://github.com/mozilla/socorro/blob/master/minidump-stackwalk/ > http_symbol_supplier.cc#L289 I think the reason these are getting left behind is that we're killing stackwalker after 30s while it's in the middle of downloading symbols (since we're remote from S3 and the cache is cold), filed bug 1131296 for that.
Comment 10•9 years ago
|
||
I'm still not sure why that would result in the file not existing. If we've called open() on it it should exist on disk, right?
Assignee | ||
Comment 11•9 years ago
|
||
PR: https://github.com/mozilla/socorro/pull/2629
Status: NEW → ASSIGNED
Comment 12•9 years ago
|
||
Commits pushed to master at https://github.com/mozilla/socorro https://github.com/mozilla/socorro/commit/43fafddc04ba46c2bdc4433a27ee6a2de067dd37 fix bug 1131228 - catch unexpected OSError and return early https://github.com/mozilla/socorro/commit/b5f09025afe9862b54442015bccd247209b41584 Merge pull request #2629 from rhelmer/bug1131228-catch-oserror fix bug 1131228 - catch unexpected OSError and return early
Updated•9 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•