Closed Bug 1854373 Opened 1 year ago Closed 9 months ago

Add logging for session restore prefs and decision making

Tracking

()

Status:

RESOLVED FIXED

Milestone:

127 Branch

Tracking Flags:

Tracking

Status

firefox127

---

fixed

People

(Reporter: dao, Assigned: sfoster)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [fidefe-session-restore])

Attachments

(2 files)

Bug 1854373 - Move logmanager.sys.mjs from services/common to toolkit/modules. r?standard8,markh 10 months ago Sam Foster [:sfoster] (he/him) 48 bytes, text/x-phabricator-request		Details \| Review
Bug 1854373 Use the loggers to capture errors and debug details during session restore.r?#sessionstore-reviewers! 10 months ago Sam Foster [:sfoster] (he/him) 48 bytes, text/x-phabricator-request		Details \| Review

Dão Gottwald [:dao]

Reporter

Description

•

1 year ago

While trying to figure out bug 1853704, Sam has brought up that having logging for what's going on internally in session restore might be helpful here and for future incidents.

Additional notes from gijs:

I think adding logging for the decision we make on startup, and what prompted it (pref value, crash, something else), tracking the last time the pref was changed, and tracking any failures in reading session store state in the same log, would help.

even if they don't point to a smoking gun they would help exclude some possibilities.

and although you're right that session store is not directly responsible for the pref, even if it had a permanent pref observer that logged changes, at this point that would help (ie we could determine if this had happened, when the pref changed, and could potentially correlate that with sync logs or machine reboots or similar)

Dão Gottwald [:dao]

Reporter

Updated

•

1 year ago

Updated

•

1 year ago

Updated

•

1 year ago

Status: NEW → RESOLVED

Closed: 1 year ago

Duplicate of bug: 1874742

Resolution: --- → DUPLICATE

Sam Foster [:sfoster] (he/him)

Assignee

Comment 2

•

11 months ago

i'm going to un-dupe this as bug 1874742 is really focused on some telemetry additions. Separately, I think we want a pref-controlled logger we can turn on (and ask bug filers to turn on) as needed to capture more verbose debug details across browser restarts.

Status: RESOLVED → REOPENED

No longer duplicate of bug: 1874742

Resolution: DUPLICATE → ---

Sarah Clements [:sclements]

Updated

•

10 months ago

Whiteboard: [fidefe-session-restore]

Jira Integration Bot

Updated

•

10 months ago

See Also: → https://mozilla-hub.atlassian.net/browse/FIDEFE-4873

Stuart Colville [:muffinresearch]

Updated

•

10 months ago

Assignee: nobody → sfoster

Sam Foster [:sfoster] (he/him)

Assignee

Comment 3

•

10 months ago

Attached file Bug 1854373 - Move logmanager.sys.mjs from services/common to toolkit/modules. r?standard8,markh — Details

Sam Foster [:sfoster] (he/him)

Assignee

Comment 4

•

10 months ago

Attached file Bug 1854373 Use the loggers to capture errors and debug details during session restore.r?#sessionstore-reviewers! — Details

Depends on D208815

Phabricator Automation

Updated

•

10 months ago

Attachment #9398962 - Attachment description: WIP: Bug 1854373 - Move logmanager.sys.mjs from services/common to toolkit/modules. r?standard8,markh → Bug 1854373 - Move logmanager.sys.mjs from services/common to toolkit/modules. r?standard8,markh

Sam Foster [:sfoster] (he/him)

Assignee

Comment 5

•

9 months ago

I've got a WIP patch on here to add the necessary prefs and some logging to the session restore flow. There's a couple of questions I could use some feedback on before I get too much further into this:

The default log levels: we want logging to be verbose enough by default to be useful when a user encounters a problem that can't be reproduced. But we don't want to incur a performance hit or needlessly spam the disk when all is going well. Do the prefs and levels I've set there look reasonable?
I'm not sure when the best time to flush the log buffer to disk is. We want to capture that stuff in the event of a crash or something, but again, we don't want to be writing to disk during startup. And do I flush often, or just at some key milestones in the application lifecycle?
Is there any value in having the console appender as well as the dump and file appender? dump is kinda nice because it goes straight to stdout but I'm not sure if its used much in practice? I can tweak that at the pref/loglevel
It looks like the way the LogManager works right now is that if any error is logged, a flag is tripped so that we log more verbosely until the next flush. I think that same logic will work in our favor with the session restore use case, but I'm not sure. There are a lot of "errors" in the session restore code like files possibly not existing which are expected, so I think we'll need to be careful (and probably iterate quite a bit) with what and how we log stuff.

Flags: needinfo?(sclements)

Flags: needinfo?(markh)

Flags: needinfo?(dao+bmo)

Sam Foster [:sfoster] (he/him)

Assignee

Updated

•

9 months ago

Keywords: leave-open

Pulsebot

Comment 6

•

9 months ago

Pushed by sfoster@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/53e6aedfaca2 Move logmanager.sys.mjs from services/common to toolkit/modules. r=Standard8,markh

Sarah Clements [:sclements]

Comment 7

•

9 months ago

This looks good to me - I left a few comments/questions.

Flags: needinfo?(sclements)

Mark Hammond [:markh] [:mhammond]

Comment 8

•

9 months ago

(In reply to Sam Foster [:sfoster] (he/him) from comment #5)

The default log levels: we want logging to be verbose enough by default to be useful when a user encounters a problem that can't be reproduced. But we don't want to incur a performance hit or needlessly spam the disk when all is going well.

IIRC, the disk never gets hit until the log is flushed (which obviously has memory implications, but that's beyond what's being discussed here). In the case if sync, the "all is going well" case should never cause IO - we don't keep "success" logs by default. So if your scenario is similar (ie, no need for logs on the file-system when all is going well) then I think that will be fine. Sync has a pref which does keep success logs, which defaults to false on release and true elsewhere, and it's not uncommon for us to ask users to flip this pref, then reproduce their issue, then send us those logs. I'm not sure if that makes sense for your use-case.

I'm not sure when the best time to flush the log buffer to disk is. We want to capture that stuff in the event of a crash or something, but again, we don't want to be writing to disk during startup. And do I flush often, or just at some key milestones in the application lifecycle?

This has similar considerations to the above - the flushing is IO, so too often is clearly bad, too rarely risks losing the logs. For sync, we assume sync isn't going to be the cause of crashes, so tend to take an optimistic view. Without knowing too much about why you are writing persistent logs, I assume it's similar to sync - you want to ensure you have logs for the feature mis-behaving, but don't expect that mis-behaviour to result in crashes. If that's true, it sounds like erring on the side of flushing less often is right for your use-case.

Is there any value in having the console appender as well as the dump and file appender? dump is kinda nice because it goes straight to stdout but I'm not sure if its used much in practice? I can tweak that at the pref/loglevel

Sync uses a console appender with a default value of "error" - which can be useful so that errors are more obvious - you don't need to open a file just to determine if an error log was written. The "dump" appender really just has internal value and has a use-case almost identical to the console appender, it's just that the output is somewhere slightly different. I personally find it useful because can see the logs directly in the terminal when using ./mach run, but if it has no value to your team internally, it seems fine to avoid it.

It looks like the way the LogManager works right now is that if any error is logged, a flag is tripped so that we log more verbosely until the next flush.

IIRC, it doesn't change the verbosity of any logs, it just flags the fact an error happened. Combined with the fact mentioned above that sync doesn't write "success" logs to disk by default, the end result of this is just to ensure that if an error is logged then we can be sure a log file will be written, even if sync thinks it was entirely successful (ie, in the case where some deep component logged the error, but decided to continue and ended up "succeeding" anyway)

Flags: needinfo?(markh)

pstanciu

Comment 9

•

9 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/53e6aedfaca2

Dão Gottwald [:dao]

Reporter

Comment 10

•

9 months ago

(In reply to Mark Hammond [:markh] [:mhammond] from comment #8)

Is there any value in having the console appender as well as the dump and file appender? dump is kinda nice because it goes straight to stdout but I'm not sure if its used much in practice? I can tweak that at the pref/loglevel

Sync uses a console appender with a default value of "error" - which can be useful so that errors are more obvious - you don't need to open a file just to determine if an error log was written. The "dump" appender really just has internal value and has a use-case almost identical to the console appender, it's just that the output is somewhere slightly different. I personally find it useful because can see the logs directly in the terminal when using ./mach run, but if it has no value to your team internally, it seems fine to avoid it.

Probably makes sense for session restore to use dump so you can still see stuff when quitting / across restart.

Flags: needinfo?(dao+bmo)

Phabricator Automation

Updated

•

9 months ago

Attachment #9398963 - Attachment description: WIP: Bug 1854373 Use the loggers to capture errors and debug details during session restore.r?#sessionstore-reviewers → Bug 1854373 Use the loggers to capture errors and debug details during session restore.r?#sessionstore-reviewers!

Sam Foster [:sfoster] (he/him)

Assignee

Comment 11

•

9 months ago

Comment on attachment 9398963 [details]
Bug 1854373 Use the loggers to capture errors and debug details during session restore.r?#sessionstore-reviewers!

Beta/Release Uplift Approval Request

User impact if declined: No visible user impact, though we'll start writing log files into the profile directory. Without the patch, the log files are not created.
Is this code covered by automated tests?: No
Has the fix been verified in Nightly?: No
Needs manual test from QE?: No
If yes, steps to reproduce: I don't think this needs extra QA. If we do want to verify, we would just want to confirm that a session restore error results in a error log files being written to {profile}/sessionstore-logs/.
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): No user-visible changes, no functional changes, the patch just adds new logging to the session restore process.
String changes made/needed: None
Is Android affected?: No

Attachment #9398963 - Flags: approval-mozilla-beta?

Sam Foster [:sfoster] (he/him)

Assignee

Updated

•

9 months ago

Keywords: leave-open

Donal Meehan [:dmeehan]

Comment 12

•

9 months ago

:sfoster, I'm wondering about the uplift request here.
Fx126 is now in release, Fx127 moves to beta next week.
I see there are two patches on this. One patch (D208815) has landed and the other patch (D208816) is pending.
Can you confirm expectations around the beta uplift request?
Do you plan on landing it before Monday? if so it will ride the train to beta with Fx127, no beta uplift will be required.

Flags: needinfo?(sfoster)

Sam Foster [:sfoster] (he/him)

Assignee

Comment 13

•

9 months ago

(In reply to Donal Meehan [:dmeehan] from comment #12)

:sfoster, I'm wondering about the uplift request here.
Fx126 is now in release, Fx127 moves to beta next week.
I see there are two patches on this. One patch (D208815) has landed and the other patch (D208816) is pending.
Can you confirm expectations around the beta uplift request?
Do you plan on landing it before Monday? if so it will ride the train to beta with Fx127, no beta uplift will be required.

The first patch was some ground work and that is already in 127 and is good to go with or without the 2nd patch.
The 2nd patch has r+ and is ready to land as of this morning. It would be nice to get that into 127. I can wait for the merge and uplift, or I could go ahead and land during the soft-freeze. Which do you prefer?

Flags: needinfo?(sfoster) → needinfo?(dmeehan)

Donal Meehan [:dmeehan]

Comment 14

•

9 months ago

It would be nice to get that into 127. I can wait for the merge and uplift, or I could go ahead and land during the soft-freeze. Which do you prefer?

If the intention is to get it into 127 beta, then I think it's better to land now. It will get some coverage in Nightly, and ride the train to beta with Fx127.
Your uplift request mentions it's low risk, so it seems like something that's still ok to land during soft code freeze.

Flags: needinfo?(dmeehan) → needinfo?(sfoster)

Sam Foster [:sfoster] (he/him)

Assignee

Comment 15

•

9 months ago

Ok that's queued for landing. Thanks!

Flags: needinfo?(sfoster)

Pulsebot

Comment 16

•

9 months ago

Pushed by sfoster@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/195d3e733db6 Use the loggers to capture errors and debug details during session restore.r=sessionstore-reviewers,markh,sclements

Sam Foster [:sfoster] (he/him)

Assignee

Comment 17

•

9 months ago

Comment on attachment 9398963 [details]
Bug 1854373 Use the loggers to capture errors and debug details during session restore.r?#sessionstore-reviewers!

Cancelling uplift request for now. With luck this will land before the merge and not be necessary.

Attachment #9398963 - Flags: approval-mozilla-beta?

Natalia Csoregi [:nataliaCs]

Comment 18

•

9 months ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/195d3e733db6

Status: REOPENED → RESOLVED

Closed: 1 year ago → 9 months ago

status-firefox127: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 127 Branch

Sam Foster [:sfoster] (he/him)

Assignee

Updated

•

9 months ago

Blocks: 1445547

Sam Foster [:sfoster] (he/him)

Assignee

Updated

•

9 months ago

Blocks: 1896791

You need to log in before you can comment on or make changes to this bug.