Open Bug 1296756 Opened 4 years ago Updated 10 months ago

Record number of open file descriptors in crash report

Categories

(Toolkit :: Crash Reporting, defect)

defect
Not set

Tracking

()

People

(Reporter: billm, Assigned: kanru, NeedInfo)

References

Details

Attachments

(1 file)

It would be nice to know if we're crashing because we've run out of file descriptors. This seems to happen a lot on Macs especially. It would be nice to report the number of open file descriptors in the crash report. We'll have to be careful to avoid opening new ones, though, since that's likely to fail.
I'll take this. The approach will be first getting the number of fds we can open then test each one with fstat. So we can get the number of opened fds and their type.
Assignee: nobody → kchen
Blocks: 1051567
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a13a128aad19

Should be able to catch some Mac Mn-e10s errors.
Comment on attachment 8785239 [details]
Bug 1296756 - Record number of open file descriptors in crash report.

https://reviewboard.mozilla.org/r/74514/#review72516

This looks good to me. I'm still a little concerned that we won't have enough fds to generate a crash report. Maybe it would be good to "reserve" a few at startup by opening a pipe or something, and then closing them when we crash. I'll leave that up to Ted though since he knows that stuff much better than me.

::: toolkit/crashreporter/nsExceptionHandler.cpp:67
(Diff revision 1)
>  #include "client/linux/handler/exception_handler.h"
>  #include "common/linux/eintr_wrapper.h"
>  #include <fcntl.h>
>  #include <sys/types.h>
>  #include <sys/wait.h>
> +#include <sys/stat.h>

Duplicated.

::: toolkit/crashreporter/nsExceptionHandler.cpp:2164
(Diff revision 1)
> +  struct stat sb;
> +  int ok;

Should be able to declare |sb| and |ok| inside the loop.
Attachment #8785239 - Flags: review?(wmccloskey) → review+
Comment on attachment 8785239 [details]
Bug 1296756 - Record number of open file descriptors in crash report.

https://reviewboard.mozilla.org/r/74514/#review72724

::: toolkit/crashreporter/nsExceptionHandler.cpp:2154
(Diff revision 1)
> +  kFileTotal
> +};
> +
> +void AnnotateOpenedFileDescriptors()
> +{
> +#if defined(XP_MAC) || defined(XP_LINUX)

Should be XP_MACOSX
Comment on attachment 8785239 [details]
Bug 1296756 - Record number of open file descriptors in crash report.

https://reviewboard.mozilla.org/r/74514/#review72734
We've hit issues around open FDs before: bug 869917.
Comment on attachment 8785239 [details]
Bug 1296756 - Record number of open file descriptors in crash report.

https://reviewboard.mozilla.org/r/74514/#review77322

I don't love the format, I think it's a little hard to interpret, but having this information definitely seems useful. Could you add a unit test for this in testing/crashreporter/tests/unit? The only tricky bit would be that if you can't reliably simulate these conditions you might have to expose `AnnotateOpenedFileDescriptors` on the `nsICrashreporter` interface so you could call it from JS. Aside from that you'd just write a test that uses `do_crash` like:
https://dxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/test/unit/test_crashreporter_crash.js

Adding the annotation in the first function (where that test passes null) and then checking that it's set and has sensible data in the second function.

::: toolkit/crashreporter/nsExceptionHandler.h:85
(Diff revision 1)
>  nsresult SetGarbageCollecting(bool collecting);
>  void SetEventloopNestingLevel(uint32_t level);
>  
> +void AnnotateOpenedFileDescriptors();
> +
>  void AnnotateTexturesSize(size_t size);

nit: just get rid of the blank lines before and after the new line you're adding.

::: toolkit/crashreporter/nsExceptionHandler.cpp:2154
(Diff revision 1)
> +  kFileTotal
> +};
> +
> +void AnnotateOpenedFileDescriptors()
> +{
> +#if defined(XP_MAC) || defined(XP_LINUX)

Should this just be `#ifdef XP_UNIX`?
Attachment #8785239 - Flags: review?(ted) → review+
I do think that low-FD conditions will cause issues in writing crash reports, since at the very least we open an FD to write the minidump and we open the executable and each shared library to get some information out of them. That's sort of orthogonal to this patch though.

I wonder if we could safely annotate this in all crash reports (in some way that didn't involve allocating memory) if we'd see an upper bound in the crash reports we collect, where we'd just fail to write the dump when there aren't enough available FDs?
Comment on attachment 8785239 [details]
Bug 1296756 - Record number of open file descriptors in crash report.

https://reviewboard.mozilla.org/r/74514/#review77328

::: toolkit/crashreporter/nsExceptionHandler.cpp:2159
(Diff revision 1)
> +#if defined(XP_MAC) || defined(XP_LINUX)
> +  // Get maximum number of FDs we can create
> +  int maxFds = getdtablesize();
> +  if (maxFds < 0) {
> +    // It's unlikely that getdtablesize() can fail. Make a best guess.
> +    maxFds = 65536;

I think you should report this number as well, since it could vary depending on the `ulimit`. It's hard to know if the number of open FDs is a problem unless we know how many there are to work with!
bug 532863 is a duplicate?
Flags: needinfo?(kanru)
You need to log in before you can comment on or make changes to this bug.