Open Bug 1554979 Opened 6 years ago Updated 2 years ago

Improve developer workflow to track down why restored pages may fail to load

Categories

(Firefox :: Session Restore, task, P2)

People

(Reporter: mikedeboer, Unassigned)

References

(Blocks 1 open bug)

Details

Jason went through a heroic effort in bug 1535674 to track down what might've caused tabs and their respective documents to fail loading (intermittently) when they've been restored by Session Restore - or rather: right after startup. The problem with this, imho, is that it needed a heroic effort at all!

This problem occurs often enough that it warrants spending time to improve our set of tools to help us track down this kind of failure better. Right now I have the following questions for our experts:

Jason, what would have helped your analysis in bug 1535674? Some kind of end-to-end logging, perhaps? Where would you expect these log points to be, and what kind of information would you expect from them? Something else entirely, perhaps?

Dave, I believe the Browser Architecture Group might be interested in getting the full picture here, right?
We ace the flow of 'open a tab' > 'enter URL' > 'load URL' > 'render document', but can't paint a good picture of what happens when we're loading multiple documents somewhere during browser startup. Race conditions as demonstrated in bug 1535674 are hard to uncover and I think they shouldn't be, because, well, we're supposed to be acing doing the browsing thing. What I'm looking for is ideas to make it easier to uncover tab loading issues using a uniform approach. Maybe this means that we need to rethink our startup sequence and centrally manage the plethora of notifications.

There's a similar story for browser shutdown, by the way. I'm getting quite a number of reports that Nightly didn't manage to save the full session before finally quitting, which meant losing quite a number of windows/tabs upon restarting. No one is able to reproduce it and it's not happening for all our Nightly users, but it's worrying me nonetheless. Whatever system we think up to instrument browser startup might help with tracing browser shutdown issues as well. I would like to be able to say: 'Here, this is the pref you need to flip or the WebExtension you need to opt in to in order to start collecting useful data. Once the issue has occurred for you, please hit Send.'

Flags: needinfo?(jorendorff)
Flags: needinfo?(dtownsend)

I got lucky. I'm terrible at debugging Firefox -- it isn't necessary for what I usually work on.

One thing that helped a ton:

  • --disable-e10s, when attaching lldb. Without this, or failing that, tons of hand-holding and support (which was not forthcoming), I wouldn't have been able to begin to make progress.

Other things that would help:

  • An easy way to save a profile that reproduces a problem would have saved a ton of time. I tried just copying the profile folder, but the problem then sometimes failed to reproduce. I had to write down a super-detailed STR including the whole setup. That probably took an hour; the steps took at least 8 minutes per attempt.

  • Pervasive logging.

  • Any kind of documentation about how to debug this kind of bug, possibly including stuff like:

    • how to dump the async stack (someone told me this is in Error().stack but I did not see that when debugging this issue);
    • some way to list async "processes", particularly ones that haven't made progress recently;
    • whether it's a good idea to try to use GDB on Mac;
    • how to use the DevTools Debugger to debug Gecko and whether it's worthwhile;
    • detailed documentation about how loading works and how tasks are split between chrome and content processes.
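
To make the "list async processes that haven't made progress recently" idea a bit more concrete, here is a minimal sketch in TypeScript. Nothing in it is an existing Firefox API: `AsyncTaskRegistry`, `track`, `stalled` and the task names are all hypothetical, and it assumes the code being instrumented can be wrapped and can call a `progress` callback whenever it reaches a checkpoint. It also records `new Error().stack` when the task is created; whether that stack contains async frames depends on the engine and its settings.

```typescript
// Hypothetical helper for illustration only; not an existing Firefox API.
interface TaskInfo {
  name: string;
  startedAt: number;      // clock value when the task was registered
  lastProgressAt: number; // clock value of the most recent progress() call
  createdStack: string;   // stack captured where the task was started
}

class AsyncTaskRegistry {
  private tasks = new Map<number, TaskInfo>();
  private nextId = 1;
  private now: () => number;

  // The clock is injectable so that the registry can be tested deterministically.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Run an async task and keep it listed until it settles (resolves or rejects).
  // The task receives a progress() callback to call at its checkpoints.
  async track<T>(
    name: string,
    run: (progress: () => void) => Promise<T>
  ): Promise<T> {
    const id = this.nextId++;
    const t = this.now();
    const info: TaskInfo = {
      name,
      startedAt: t,
      lastProgressAt: t,
      createdStack: new Error().stack || "",
    };
    this.tasks.set(id, info);
    const progress = () => {
      info.lastProgressAt = this.now();
    };
    try {
      return await run(progress);
    } finally {
      this.tasks.delete(id);
    }
  }

  // List unsettled tasks with no progress in the last `thresholdMs` milliseconds.
  stalled(thresholdMs: number): TaskInfo[] {
    const cutoff = this.now() - thresholdMs;
    const result: TaskInfo[] = [];
    this.tasks.forEach((task) => {
      if (task.lastProgressAt <= cutoff) {
        result.push(task);
      }
    });
    return result;
  }
}
```

A restore step would then be wrapped as something like `registry.track("restore-tab", progress => doRestore(progress))`, where `doRestore` is again hypothetical, and a debugging command or timer could print `registry.stalled(10000)` to show which steps have been silent for ten seconds, along with where they were started.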

I should have kept better notes.

Flags: needinfo?(jorendorff)

I did talk a bit about this with the team recently, but nothing in particular came out of it. We agree that startup (and shutdown!) could really use some work, both to make them more performant and to make issues and performance problems easier to debug. Zibi has a proposal around somewhere to split startup into various phases as a part of this, but so far it hasn't risen above the level of other things that need doing.

Flags: needinfo?(dtownsend)
Severity: normal → S3