Open Bug 1541200 Opened 6 years ago Updated 6 months ago

Avoid nsExceptionHandler::SetupExtraData() doing main thread IO super early on startup

Categories

(Toolkit :: Crash Reporting, defect, P2)

defect

Tracking

()

Performance Impact medium
Tracking Status
firefox68 --- affected

People

(Reporter: Gijs, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: main-thread-io, perf:startup, Whiteboard: [fxperfsize:S])

There is main thread IO to several different files in:

https://searchfox.org/mozilla-central/rev/44a212460990ffffecf50a8e972d3cbde2e7216b/toolkit/crashreporter/nsExceptionHandler.cpp#1771-1833

We should either move the IO to the crash reporter itself, or move it off the main thread and (ideally) later in startup (which I understand might be problematic for crash report info).

If we're storing info this way, perhaps it's possible to do so in env vars to avoid touching the disk, like we do for the crashreporter override ini path.

Gabriele, can you comment briefly with how hard it would be to do this, and what the realistic options are?

Flags: needinfo?(gsvelto)

With the exception of the "Pending Pings" directory most of that code is very, very old and I'm not 100% sure why it was designed that way. We certainly need to get hang of the "Crash Reports" directory very early during startup because we use it in the exception handler so there's no way to avoid that. For "LastCrash" we could just set the path into an environment variable, hand it over to the crash reporter client and let it read it before submitting a crash. "InstallTime" on the other hand needs to be dealt with during startup. It's clunky though, so if there's a better solution I'm happy to go with it. Finally "Pending Pings" can probably be dealt with by the crash reporter client directly, there's no real need to set it up at startup.

Flags: needinfo?(gsvelto)

I'm marking it as P2 but when we start on it, I suggest that we split this bug further to separate the easier and the harder parts of it. Every main-thread-IO removal is useful, so I think it's worth doing the easiest half without feeling blocked by the hardest one.

Whiteboard: [fxperf] → [fxperf:p2]

Marking this small (<= 1 week) to at least get the easier bits out of the way. comment #2 being on point; we can file follow-ups for the harder bits.

Whiteboard: [fxperf:p2] → [fxperf:p2] [fxperfsize:S]

(In reply to Gabriele Svelto [:gsvelto] from comment #1)

We certainly need to get hang of the "Crash Reports" directory very early during startup because we use it in the exception handler so there's no way to avoid that.

Well, we can get the path - but why do we need to do the IO at that point? Can the exception handler not create the folder if it doesn't exist? What happens if the folder gets deleted between Firefox starting and the exception handler running? :-)

"InstallTime" on the other hand needs to be dealt with during startup. It's clunky though, so if there's a better solution I'm happy to go with it.

Similar naive question here - why do we need to do IO here? What's the bigger picture, ie what's the "point" of these InstallTime[BUILDID] files with an installation time in them?

Flags: needinfo?(gsvelto)

(In reply to :Gijs (he/him) from comment #4)

Well, we can get the path - but why do we need to do the IO at that point? Can the exception handler not create the folder if it doesn't exist?

Right now in the context of the exception handler we can't create the folder. This is both a limitation of breakpad and of certain platforms (e.g. we can't take another exception when we're in the exception handler, we also can't touch the heap).

What happens if the folder gets deleted between Firefox starting and the exception handler running? :-)

The minidumps will not be generated and the crash reporter will only suggest the user to restart Firefox since there won't be anything to be sent. Note that this all applies to main browser crashes. The issue here is that the process of writing the minidump and launching the crash reporter client all happens from within the exception handler with all its limitations. We don't have this issue for content process crashes because we handle them outside of the crashed process; something we can't do with the main process. We've been talking for years of moving this out into a separate monitoring process - not unlike Chrome does on some platforms - but it hasn't happened yet mostly because it's a lot of work and it's never been prioritized.

Similar naive question here - why do we need to do IO here? What's the bigger picture, ie what's the "point" of these InstallTime[BUILDID] files with an installation time in them?

InstallTime is one of the early annotations we put in a crash report and it's used to tell if different crashes are coming from the same Firefox installation. The reason we're doing it this way is that we need a value that's available very early during startup, that allows you to tell if different crash reports are coming from the same installation without making it easy to correlate the reports with one user. The installation time works well for this because it's almost unique without exposing any PII data.

Flags: needinfo?(gsvelto)

I've spent some time going over this code again and I think we can certainly get rid of creating the "Pending Pings" directory. It's only used in the crash reporter client so we just need the path, not the actual directory which can be created lazily at crash time.

For "Crash Reports" and InstallTime* I can investigate if there's an exception-safe way of creating them lazily. There might be but it could be fragile so I'm a bit wary of that unless it's absolutely necessary.

Priority: -- → P1

Demoting this to P2 because depending on how much time I'll have available I might rewrite this code in Rust soon and move the folder creation out-of-process.

Priority: P1 → P2
Performance Impact: --- → P2
Keywords: perfperf:startup
Whiteboard: [fxperf:p2] [fxperfsize:S] → [fxperfsize:S]
Severity: normal → S3

Based on a new performance review, this issue still stands as a 'Performance Impact: Medium' bug - are there any new updates on this bug?

Flags: needinfo?(gsvelto)

This will be fixed once I'm done with bug 587729 which should happen before the end of this year.

Flags: needinfo?(gsvelto)
You need to log in before you can comment on or make changes to this bug.