Closed Bug 1422087 Opened 3 years ago Closed 3 years ago

Purge startup caches after startup crash

Categories

(Core :: XPCOM, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: kmag, Assigned: kmag)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

We tend to see a lot of repeated startup crashes from the same user, either in decoding XDR scripts, or executing what appear to be corrupted, decoded scripts.

Assuming that most of these crashes are due to corrupted XDR cache data, we should be able to mitigate the problem by purging the startup caches on startup if we detect that the last session had a startup crash.
As discovered in Bug 1373934, another way this manifests is due to disk corruption causing repeated startup crashes as the mmap'd file tries to swap in a page and that fails.
Yeah, I've expected disk corruption for a while. Even if it weren't for the page fault issues, we know there are people who repeatedly get startup crashes that go away when we purge the startup crashes (for whatever reason), so this is something we should definitely be doing.
Priority: -- → P1
(This is easier done than tested...)
Comment on attachment 8956581 [details]
Bug 1422087: Purge startup caches after an incomplete startup.

https://reviewboard.mozilla.org/r/225494/#review231452

::: toolkit/components/startup/nsAppStartup.cpp:63
(Diff revision 1)
>  #define kPrefLastSuccess "toolkit.startup.last_success"
>  #define kPrefMaxResumedCrashes "toolkit.startup.max_resumed_crashes"
>  #define kPrefRecentCrashes "toolkit.startup.recent_crashes"
>  #define kPrefAlwaysUseSafeMode "toolkit.startup.always_use_safe_mode"
>  
> +#define FILE_STARTUP_INCOMPLETE NS_LITERAL_STRING(".startup-incomplete")

If we can't put this in a shared place please add a comment to each define referencing the other so we know to keep them in sync.

::: toolkit/xre/nsAppRunner.cpp:3823
(Diff revision 1)
> +  // exists, this fails, and we know the last startup was a success. If it
> +  // doesn't already exist, it is created, and will be removed at the end of
> +  // the startup crash detection window.
> +  AutoFDClose fd;
> +  Unused << crashFile->OpenNSPRFileDesc(PR_WRONLY | PR_CREATE_FILE | PR_EXCL,
> +                                        0666, &fd.rwget());

Why not just use crashFile->Create here?

::: toolkit/xre/nsAppRunner.cpp:4842
(Diff revision 1)
>    bool exit = false;
>    int result = XRE_mainInit(&exit);
>    if (result != 0 || exit)
>      return result;
>  
> +  // If we exit gracefully, remove the startup crash canary file.

Why would the startup file still be around if we successfully made it through the startup process which deletes the file?
Attachment #8956581 - Flags: review?(dtownsend)
Comment on attachment 8956581 [details]
Bug 1422087: Purge startup caches after an incomplete startup.

https://reviewboard.mozilla.org/r/225494/#review231452

> Why not just use crashFile->Create here?

Two reasons:

1) Oddly enough, it didn't occur to me until after I wrote the NSPR version.
2) nsIFile::Create tries to recursively create the entire directory tree for the file, which can be expensive, and is unnecessary since we already know the profile directory exists at this poin. OpenNSPRFile does the whole check in a single, atomic operation unless a stale file already exists.

> Why would the startup file still be around if we successfully made it through the startup process which deletes the file?

There are other successful ways for the app to exit after this point. We might process a command line flag and then quit. We might decide to restart into safe mode. Or we might just be using a non-Firefox app that never calls trackStartupCrashEnd.

This way, we always delete the file on a clean shutdown, even if we don't wind up completing the normal browser startup process.
Comment on attachment 8956581 [details]
Bug 1422087: Purge startup caches after an incomplete startup.

https://reviewboard.mozilla.org/r/225494/#review232202
Attachment #8956581 - Flags: review?(dtownsend) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/6c2569d760cd117c7fc12fba9b1ffc9d9077082a
Bug 1422087: Follow-up: Add null check for when running from GTests. r=bustage CLOSED TREE
https://hg.mozilla.org/mozilla-central/rev/686b970032b4
https://hg.mozilla.org/mozilla-central/rev/6c2569d760cd
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
You need to log in before you can comment on or make changes to this bug.