consider adding more metrics around stackwalking
Categories
(Socorro :: Processor, task, P2)
Tracking
(Not tracked)
People
(Reporter: willkg, Assigned: willkg)
Attachments
(4 files)
We periodically see slowness with the stackwalker, but it's difficult to drill down on whether there's something about the minidumps it's processing that's causing the slowness.
It would be great to get additional metrics from the stackwalker:
- minidump size
- number of modules
- number of threads
- total number of frames across threads
Maybe there are other things we'd be interested in? Maybe we can add product and release channel tags, too? (NB release channel is user provided and not validated, so we'd want to only emit a tag value if it's within a specific set of known values.)
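A minimal sketch of the tag filtering described above, assuming a hypothetical helper `build_tags` and an illustrative set of known channels (the real set of accepted values is not specified in this bug):

```python
# Illustrative allow-list; the actual set of known channels is an assumption.
KNOWN_CHANNELS = frozenset({"release", "beta", "nightly", "esr"})


def build_tags(product, release_channel):
    """Return metric tags, emitting a channel tag only for known values.

    The release channel is user provided and not validated, so anything
    outside the allow-list is dropped rather than emitted as a tag value.
    """
    tags = []
    if product:
        tags.append(f"product:{product}")
    channel = (release_channel or "").strip().lower()
    if channel in KNOWN_CHANNELS:
        tags.append(f"channel:{channel}")
    return tags
```

Dropping unknown values keeps the number of distinct tag values bounded, which matters for most metrics backends.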
Comment 1•2 years ago
It'd be good to get metrics like "number of zombie/orphaned processes" and "number of open files" and other numbers from inside the container as well.
The threaded task manager has a wait_for_completion method that ticks every second. We could add some processing and metrics emission to that. Even something cheap like "number of processes" per instance could be illuminating.
Grabbing this to implement something really basic that we can expand on over time.
Comment 2•2 years ago
I think in a first pass, I want to add psutil to the requirements and emit metrics for these things:
- total processes
- total processes with tag:type (processor, cache manager, stackwalker, other?)
- total processes with tag:status (https://psutil.readthedocs.io/en/latest/#process-status-constants)
- total open_files across all processes
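A rough sketch of how the process totals in the list above could be computed, not the code that landed. The function names are hypothetical. `summarize_processes` takes plain dicts so it can be exercised without psutil; `live_process_infos` shows one way to feed it from `psutil.process_iter`.

```python
from collections import Counter


def summarize_processes(infos):
    """Count processes overall and per status.

    `infos` is an iterable of dicts with a "status" key, such as the
    `info` dicts psutil attaches when process_iter() is given attrs.
    """
    total = 0
    by_status = Counter()
    for info in infos:
        total += 1
        # A status of None means psutil could not read it for that process.
        by_status[info.get("status") or "unknown"] += 1
    return {"total": total, "by_status": dict(by_status)}


def live_process_infos():
    """Yield info dicts for the processes visible to this process."""
    import psutil  # third-party; imported lazily so the summary is testable alone

    return (proc.info for proc in psutil.process_iter(["status", "cmdline"]))
```

Each entry in `by_status` would then be emitted as a gauge tagged with `status:<value>`, and `total` as an untagged gauge.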
Comment 3•2 years ago
Comment 4•2 years ago
Comment 5•2 years ago
Comment 6•2 years ago
Comment 7•2 years ago
https://mozilla.sentry.io/issues/4444674193/
IndexError: list index out of range
File "socorro/processor/processor_app.py", line 303, in heartbeat
if cmdline[0] in ["/bin/sh", "/bin/bash"]:
It's unclear why cmdline is an empty list.
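One way to guard against the empty list is sketched below. This is an assumption-laden example: the helper name `process_type`, the keyword matching, and the handling of the shell wrapper are all hypothetical and not taken from `processor_app.py`. As a possible explanation, on Linux `/proc/<pid>/cmdline` is empty for zombie processes and kernel threads, so psutil can return `[]` for those.

```python
SHELLS = ("/bin/sh", "/bin/bash")


def process_type(cmdline):
    """Map a command line (list of strings) to a coarse type tag.

    An empty cmdline is treated as "other" instead of being indexed.
    """
    if not cmdline:
        return "other"
    if cmdline[0] in SHELLS:
        # Hypothetical choice: look past the shell wrapper at what it runs.
        cmdline = cmdline[1:]
    joined = " ".join(cmdline)
    # Keyword matching here is illustrative only.
    if "stackwalk" in joined:
        return "stackwalker"
    if "cache_manager" in joined:
        return "cache_manager"
    if "processor" in joined:
        return "processor"
    return "other"
```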
Comment 8•2 years ago
Comment 9•2 years ago
Comment 10•2 years ago
https://mozilla.sentry.io/issues/4446062241/
PermissionError: [Errno 13] Permission denied: '/proc/4166/fd'
File "psutil/_pslinux.py", line 1653, in wrapper
return fun(self, *args, **kwargs)
File "psutil/_pslinux.py", line 2196, in open_files
files = os.listdir("%s/%s/fd" % (self._procfs_path, self.pid))
AccessDenied: (pid=4166)
File "socorro/processor/processor_app.py", line 316, in heartbeat
open_files += len(proc.open_files())
File "__init__.py", line 1147, in open_files
return self._proc.open_files()
File "psutil/_pslinux.py", line 1655, in wrapper
raise AccessDenied(self.pid, self._name)
We should handle that error and skip the process. I'm not sure why there are processes running inside the docker container that have a different uid; that's curious.
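A sketch of skipping inaccessible processes while summing open files, assuming a hypothetical helper `count_open_files`; it is not the fix that landed. It catches `psutil.AccessDenied` and `psutil.NoSuchProcess` (the latter for processes that exit mid-iteration), and falls back to `PermissionError` alone when psutil is not installed so the example still runs.

```python
try:
    import psutil  # third-party

    SKIPPABLE = (psutil.AccessDenied, psutil.NoSuchProcess, PermissionError)
except ImportError:
    # Fallback so the sketch can run without psutil installed.
    SKIPPABLE = (PermissionError,)


def count_open_files(procs):
    """Sum open file counts, skipping processes we cannot inspect.

    `procs` is an iterable of objects with an open_files() method,
    for example the result of psutil.process_iter().
    Returns (total_open_files, skipped_process_count).
    """
    total = 0
    skipped = 0
    for proc in procs:
        try:
            total += len(proc.open_files())
        except SKIPPABLE:
            skipped += 1
    return total, skipped
```

Returning the skipped count as well makes it possible to emit it as its own metric, which would show how often processes with a different uid appear.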
Comment 11•2 years ago
Comment 12•2 years ago
Comment 13•2 years ago
I changed the scope of this in comment #2. I think that's sufficient for now.
I deployed this to prod just now in bug #1851648. Marking as FIXED.