Closed Bug 1848002 Opened 2 years ago Closed 2 years ago

consider adding more metrics around stackwalking

Categories

(Socorro :: Processor, task, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: willkg, Assigned: willkg)

Attachments

(4 files)

We periodically see slowness with the stackwalker, but it's difficult to tell whether something about the minidumps it's processing is causing the slowness.

It would be great to get additional metrics from the stackwalker:

  • minidump size
  • number of modules
  • number of threads
  • total number of frames across threads
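
A rough sketch of how those four numbers could be gathered. It assumes the stackwalker output is available as a dict with a "modules" list and a "threads" list whose entries each carry a "frames" list; that layout, the function name, and the metric names are assumptions for illustration and are not taken from the Socorro code.

```python
import os


def stackwalk_metrics(minidump_path, stackwalker_output):
    """Return the four proposed numbers as a dict of metric name -> value.

    Assumed (hypothetical) output layout:
    {"modules": [...], "threads": [{"frames": [...]}, ...]}
    """
    modules = stackwalker_output.get("modules") or []
    threads = stackwalker_output.get("threads") or []
    # Threads with a missing or null "frames" entry contribute zero frames.
    total_frames = sum(len(thread.get("frames") or []) for thread in threads)
    return {
        "minidump_size": os.path.getsize(minidump_path),
        "module_count": len(modules),
        "thread_count": len(threads),
        "frame_count": total_frames,
    }
```

Returning a plain dict keeps the counting separate from whatever metrics client ends up emitting the values.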

Maybe there are other things we'd be interested in? Maybe we can add product and release channel tags, too? (NB: release channel is user-provided and not validated, so we'd want to emit a tag value only if it's within a specific set of known values.)
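
A minimal sketch of the "only emit known values" idea for the release channel tag. The allowlist contents, the tag keys, and the function name are all hypothetical; the real set of accepted channels would need to come from the project.

```python
# Hypothetical allowlist; the real set of known channels is not given in this bug.
KNOWN_CHANNELS = frozenset({"release", "beta", "nightly", "esr"})


def build_metric_tags(product, release_channel):
    """Build tags for a metric, dropping unrecognized release channels."""
    tags = {"product": product}
    # Release channel is user-provided, so only pass through known values.
    # Anything else is omitted to keep tag cardinality bounded.
    if isinstance(release_channel, str) and release_channel in KNOWN_CHANNELS:
        tags["channel"] = release_channel
    return tags
```

Omitting the tag (rather than emitting the raw value) means a bad or malicious channel string never reaches the metrics backend.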

It'd be good to get metrics like "number of zombie/orphaned processes" and "number of open files" and other numbers from inside the container as well.

The threaded task manager has a wait_for_completion method that ticks every second. We could add some processing and metrics emission to that. Even something cheap like "number of processes" per instance could be illuminating.
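
To illustrate the idea, here is a sketch of a loop that ticks once a second and calls a heartbeat on each tick. The function below is a hypothetical stand-in, not the task manager's actual wait_for_completion, and the "processes" metric name and gauge callable are assumptions. psutil is imported lazily so the loop itself has no third-party dependency.

```python
import time


def wait_with_heartbeat(is_done, heartbeat, tick_seconds=1.0):
    """Call heartbeat() once per tick until is_done() returns True.

    Returns the number of ticks, which is handy for testing.
    """
    ticks = 0
    while not is_done():
        heartbeat()
        ticks += 1
        time.sleep(tick_seconds)
    return ticks


def emit_process_count(gauge):
    """Example heartbeat body: report how many processes are running.

    gauge is a hypothetical callable taking (metric_name, value).
    """
    import psutil  # third-party; imported here so the loop works without it

    gauge("processes", len(psutil.pids()))
```

A caller could pass something like lambda: emit_process_count(my_gauge) as the heartbeat, where my_gauge is whatever the metrics client provides.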

Grabbing this to implement something really basic that we can expand on over time.

Assignee: nobody → willkg
Blocks: 1795017
Status: NEW → ASSIGNED
Summary: consider adding more metrics to MinidumpStackwalkerRule → consider adding more metrics around stackwalking

I think in a first pass, I want to add psutil to the requirements and emit metrics for these things:

https://pypi.org/project/psutil/

https://psutil.readthedocs.io/en/latest/#processes

https://mozilla.sentry.io/issues/4444674193/

IndexError: list index out of range
  File "socorro/processor/processor_app.py", line 303, in heartbeat
    if cmdline[0] in ["/bin/sh", "/bin/bash"]:

It's unclear why cmdline is an empty list.
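
Whatever the cause turns out to be, the check can be made safe against an empty list. One possible explanation, not confirmed in this bug, is that on Linux the command line reads as empty for kernel threads and zombie processes. A minimal sketch, with a hypothetical helper name:

```python
SHELLS = ("/bin/sh", "/bin/bash")


def is_shell_process(cmdline):
    """Return True if the first argument is a known shell.

    Guards against an empty (or missing) cmdline instead of indexing blindly.
    """
    return bool(cmdline) and cmdline[0] in SHELLS
```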

https://mozilla.sentry.io/issues/4446062241/

PermissionError: [Errno 13] Permission denied: '/proc/4166/fd'
  File "psutil/_pslinux.py", line 1653, in wrapper
    return fun(self, *args, **kwargs)
  File "psutil/_pslinux.py", line 2196, in open_files
    files = os.listdir("%s/%s/fd" % (self._procfs_path, self.pid))

AccessDenied: (pid=4166)
  File "socorro/processor/processor_app.py", line 316, in heartbeat
    open_files += len(proc.open_files())
  File "__init__.py", line 1147, in open_files
    return self._proc.open_files()
  File "psutil/_pslinux.py", line 1655, in wrapper
    raise AccessDenied(self.pid, self._name)

We should handle that and skip that process. I'm not sure why there are processes running inside the Docker container that have a different uid. That's curious.
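
A sketch of what skipping those processes could look like. The helper names are hypothetical and this is not the code in processor_app.py. The counting function takes the process iterable and the exception types as arguments so it can be exercised without psutil; the wrapper shows the assumed psutil wiring, also ignoring processes that exit mid-iteration.

```python
def count_open_files(procs, ignored_errors):
    """Sum open files across procs, skipping any that raise ignored_errors.

    Returns (open_file_count, skipped_process_count).
    """
    open_files = 0
    skipped = 0
    for proc in procs:
        try:
            open_files += len(proc.open_files())
        except ignored_errors:
            # For example, a process owned by another uid, or one that has exited.
            skipped += 1
    return open_files, skipped


def count_open_files_with_psutil():
    """Wire the helper to psutil; requires the third-party psutil package."""
    import psutil

    return count_open_files(
        psutil.process_iter(),
        (psutil.AccessDenied, psutil.NoSuchProcess),
    )
```

Returning the skipped count as well makes it possible to emit it as its own metric, which could help answer the uid question.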

I changed the scope of this in comment #2. I think that's sufficient for now.

I deployed this to prod just now in bug #1851648. Marking as FIXED.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED