Add logging for session restore prefs and decision making
Categories
(Firefox :: Session Restore, task)
Tracking
()
Tracking | Status | |
---|---|---|
firefox127 | --- | fixed |
People
(Reporter: dao, Assigned: sfoster)
References
(Blocks 2 open bugs)
Details
(Whiteboard: [fidefe-session-restore])
Attachments
(2 files)
While trying to figure out bug 1853704, Sam has brought up that having logging for what's going on internally in session restore might be helpful here and for future incidents.
Additional notes from gijs:
I think adding logging for the decision we make on startup, and what prompted it (pref value, crash, something else), tracking the last time the pref was changed, and tracking any failures in reading session store state in the same log, would help.
even if they don't point to a smoking gun they would help exclude some possibilities.
and although you're right that session store is not directly responsible for the pref, even if it had a permanent pref observer that logged changes, at this point that would help (ie we could determine if this had happened, when the pref changed, and could potentially correlate that with sync logs or machine reboots or similar)
Updated•1 year ago
|
Assignee | ||
Comment 2•11 months ago
|
||
i'm going to un-dupe this as bug 1874742 is really focused on some telemetry additions. Separately, I think we want a pref-controlled logger we can turn on (and ask bug filers to turn on) as needed to capture more verbose debug details across browser restarts.
Updated•10 months ago
|
Updated•10 months ago
|
Updated•10 months ago
|
Assignee | ||
Comment 3•10 months ago
|
||
Assignee | ||
Comment 4•10 months ago
|
||
Depends on D208815
Updated•10 months ago
|
Assignee | ||
Comment 5•9 months ago
|
||
I've got a WIP patch on here to add the necessary prefs and some logging to the session restore flow. There's a couple of questions I could use some feedback on before I get too much further into this:
-
The default log levels: we want logging to be verbose enough by default to be useful when a user encounters a problem that can't be reproduced. But we don't want to incur a performance hit or needlessly spam the disk when all is going well. Do the prefs and levels I've set there look reasonable?
-
I'm not sure when the best time to flush the log buffer to disk is. We want to capture that stuff in the event of a crash or something, but again, we don't want to be writing to disk during startup. And do I flush often, or just at some key milestones in the application lifecycle?
-
Is there any value in having the console appender as well as the dump and file appender? dump is kinda nice because it goes straight to stdout but I'm not sure if its used much in practice? I can tweak that at the pref/loglevel
-
It looks like the way the LogManager works right now is that if any error is logged, a flag is tripped so that we log more verbosely until the next flush. I think that same logic will work in our favor with the session restore use case, but I'm not sure. There are a lot of "errors" in the session restore code like files possibly not existing which are expected, so I think we'll need to be careful (and probably iterate quite a bit) with what and how we log stuff.
Assignee | ||
Updated•9 months ago
|
Comment 7•9 months ago
|
||
This looks good to me - I left a few comments/questions.
Comment 8•9 months ago
|
||
(In reply to Sam Foster [:sfoster] (he/him) from comment #5)
- The default log levels: we want logging to be verbose enough by default to be useful when a user encounters a problem that can't be reproduced. But we don't want to incur a performance hit or needlessly spam the disk when all is going well.
IIRC, the disk never gets hit until the log is flushed (which obviously has memory implications, but that's beyond what's being discussed here). In the case if sync, the "all is going well" case should never cause IO - we don't keep "success" logs by default. So if your scenario is similar (ie, no need for logs on the file-system when all is going well) then I think that will be fine. Sync has a pref which does keep success logs, which defaults to false on release and true elsewhere, and it's not uncommon for us to ask users to flip this pref, then reproduce their issue, then send us those logs. I'm not sure if that makes sense for your use-case.
- I'm not sure when the best time to flush the log buffer to disk is. We want to capture that stuff in the event of a crash or something, but again, we don't want to be writing to disk during startup. And do I flush often, or just at some key milestones in the application lifecycle?
This has similar considerations to the above - the flushing is IO, so too often is clearly bad, too rarely risks losing the logs. For sync, we assume sync isn't going to be the cause of crashes, so tend to take an optimistic view. Without knowing too much about why you are writing persistent logs, I assume it's similar to sync - you want to ensure you have logs for the feature mis-behaving, but don't expect that mis-behaviour to result in crashes. If that's true, it sounds like erring on the side of flushing less often is right for your use-case.
- Is there any value in having the console appender as well as the dump and file appender? dump is kinda nice because it goes straight to stdout but I'm not sure if its used much in practice? I can tweak that at the pref/loglevel
Sync uses a console appender with a default value of "error" - which can be useful so that errors are more obvious - you don't need to open a file just to determine if an error log was written. The "dump" appender really just has internal value and has a use-case almost identical to the console appender, it's just that the output is somewhere slightly different. I personally find it useful because can see the logs directly in the terminal when using ./mach run
, but if it has no value to your team internally, it seems fine to avoid it.
- It looks like the way the LogManager works right now is that if any error is logged, a flag is tripped so that we log more verbosely until the next flush.
IIRC, it doesn't change the verbosity of any logs, it just flags the fact an error happened. Combined with the fact mentioned above that sync doesn't write "success" logs to disk by default, the end result of this is just to ensure that if an error is logged then we can be sure a log file will be written, even if sync thinks it was entirely successful (ie, in the case where some deep component logged the error, but decided to continue and ended up "succeeding" anyway)
Reporter | ||
Comment 10•9 months ago
|
||
(In reply to Mark Hammond [:markh] [:mhammond] from comment #8)
- Is there any value in having the console appender as well as the dump and file appender? dump is kinda nice because it goes straight to stdout but I'm not sure if its used much in practice? I can tweak that at the pref/loglevel
Sync uses a console appender with a default value of "error" - which can be useful so that errors are more obvious - you don't need to open a file just to determine if an error log was written. The "dump" appender really just has internal value and has a use-case almost identical to the console appender, it's just that the output is somewhere slightly different. I personally find it useful because can see the logs directly in the terminal when using
./mach run
, but if it has no value to your team internally, it seems fine to avoid it.
Probably makes sense for session restore to use dump so you can still see stuff when quitting / across restart.
Updated•9 months ago
|
Assignee | ||
Comment 11•9 months ago
|
||
Comment on attachment 9398963 [details]
Bug 1854373 Use the loggers to capture errors and debug details during session restore.r?#sessionstore-reviewers!
Beta/Release Uplift Approval Request
- User impact if declined: No visible user impact, though we'll start writing log files into the profile directory. Without the patch, the log files are not created.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce: I don't think this needs extra QA. If we do want to verify, we would just want to confirm that a session restore error results in a error log files being written to {profile}/sessionstore-logs/.
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): No user-visible changes, no functional changes, the patch just adds new logging to the session restore process.
- String changes made/needed: None
- Is Android affected?: No
Assignee | ||
Updated•9 months ago
|
Comment 12•9 months ago
|
||
:sfoster, I'm wondering about the uplift request here.
Fx126 is now in release, Fx127 moves to beta next week.
I see there are two patches on this. One patch (D208815) has landed and the other patch (D208816) is pending.
Can you confirm expectations around the beta uplift request?
Do you plan on landing it before Monday? if so it will ride the train to beta with Fx127, no beta uplift will be required.
Assignee | ||
Comment 13•9 months ago
|
||
(In reply to Donal Meehan [:dmeehan] from comment #12)
:sfoster, I'm wondering about the uplift request here.
Fx126 is now in release, Fx127 moves to beta next week.
I see there are two patches on this. One patch (D208815) has landed and the other patch (D208816) is pending.
Can you confirm expectations around the beta uplift request?
Do you plan on landing it before Monday? if so it will ride the train to beta with Fx127, no beta uplift will be required.
The first patch was some ground work and that is already in 127 and is good to go with or without the 2nd patch.
The 2nd patch has r+ and is ready to land as of this morning. It would be nice to get that into 127. I can wait for the merge and uplift, or I could go ahead and land during the soft-freeze. Which do you prefer?
Comment 14•9 months ago
|
||
It would be nice to get that into 127. I can wait for the merge and uplift, or I could go ahead and land during the soft-freeze. Which do you prefer?
If the intention is to get it into 127 beta, then I think it's better to land now. It will get some coverage in Nightly, and ride the train to beta with Fx127.
Your uplift request mentions it's low risk, so it seems like something that's still ok to land during soft code freeze.
Comment 16•9 months ago
|
||
Assignee | ||
Comment 17•9 months ago
|
||
Comment on attachment 9398963 [details]
Bug 1854373 Use the loggers to capture errors and debug details during session restore.r?#sessionstore-reviewers!
Cancelling uplift request for now. With luck this will land before the merge and not be necessary.
Comment 18•9 months ago
|
||
bugherder |
Description
•