1804692 - Telemetry shutdown happens after profile-before-change and wants to write files but osfile is already shut down

Reporter

Description

•

2 years ago

In many logfiles we can see a sequence like

[task 2022-12-06T23:04:04.445Z] 23:04:04     INFO - GECKO(5360) | WARNING: A blocker encountered an error while we were waiting.
[task 2022-12-06T23:04:04.445Z] 23:04:04     INFO - GECKO(5360) |           Blocker:  Waiting for ping task
[task 2022-12-06T23:04:04.445Z] 23:04:04     INFO - GECKO(5360) |           Phase: TelemetryController: Waiting for pending ping activity
[task 2022-12-06T23:04:04.445Z] 23:04:04     INFO - GECKO(5360) |           State: (none)
[task 2022-12-06T23:04:04.445Z] 23:04:04     INFO - GECKO(5360) | WARNING: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-06T23:04:04.445Z] 23:04:04     INFO - GECKO(5360) | WARNING: addBlocker@resource://gre/modules/AsyncShutdown.jsm:718:15
[task 2022-12-06T23:04:04.446Z] 23:04:04     INFO - GECKO(5360) | addBlocker@resource://gre/modules/AsyncShutdown.jsm:523:26
[task 2022-12-06T23:04:04.446Z] 23:04:04     INFO - GECKO(5360) | addBlocker@resource://gre/modules/AsyncShutdown.jsm:462:15
[task 2022-12-06T23:04:04.446Z] 23:04:04     INFO - GECKO(5360) | setupShutdown@resource://gre/modules/osfile/osfile_async_front.jsm:1548:28
[task 2022-12-06T23:04:04.448Z] 23:04:04     INFO - GECKO(5360) | @resource://gre/modules/osfile/osfile_async_front.jsm:1568:16
[task 2022-12-06T23:04:04.449Z] 23:04:04     INFO - GECKO(5360) | @resource://gre/modules/osfile.jsm:12:30
[task 2022-12-06T23:04:04.450Z] 23:04:04     INFO - GECKO(5360) | @resource://gre/modules/TelemetryStorage.jsm:19:28
[task 2022-12-06T23:04:04.450Z] 23:04:04     INFO - GECKO(5360) | promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:103:5
[task 2022-12-06T23:04:04.451Z] 23:04:04     INFO - GECKO(5360) | promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:63:33
[task 2022-12-06T23:04:04.452Z] 23:04:04     INFO - GECKO(5360) | _submitPingLogic@resource://gre/modules/TelemetryControllerParent.jsm:540:48
[task 2022-12-06T23:04:04.452Z] 23:04:04     INFO - GECKO(5360) | send@resource://gre/modules/TelemetryControllerParent.jsm:628:24
[task 2022-12-06T23:04:04.453Z] 23:04:04     INFO - GECKO(5360) | submitExternalPing@resource://gre/modules/TelemetryControllerParent.jsm:182:17
[task 2022-12-06T23:04:04.454Z] 23:04:04     INFO - GECKO(5360) | sendPing@resource://gre/modules/EventPing.jsm:51:30
[task 2022-12-06T23:04:04.454Z] 23:04:04     INFO - GECKO(5360) | _submitPing@resource://gre/modules/EventPing.jsm:229:12
[task 2022-12-06T23:04:04.455Z] 23:04:04     INFO - GECKO(5360) | shutdown@resource://gre/modules/EventPing.jsm:112:10
[task 2022-12-06T23:04:04.456Z] 23:04:04     INFO - GECKO(5360) | _cleanupOnShutdown@resource://gre/modules/TelemetryControllerParent.jsm:940:31
[task 2022-12-06T23:04:04.457Z] 23:04:04     INFO - GECKO(5360) | shutdown/<@resource://gre/modules/TelemetryControllerParent.jsm:1010:24
[task 2022-12-06T23:04:04.457Z] 23:04:04     INFO - GECKO(5360) | observe@resource://gre/modules/AsyncShutdown.jsm:572:16

where the telemetry wants to archive remaining pings on shutdown, but osfile has been shut down and while trying to re-surrect it, adding a blocker for the already passed phase fails. This seems to not have any bad consequences, but it bloats log files significantly.

The shutdown blocker of telemetry.

The shutdown blocker setting from osfile.

It might just be the case that we want osfile to shutdown later, but I am not sure what other side effects this might have.

Chris H-C :chutten

Comment 1

•

2 years ago

May be related to some of the fun issues Barret found while they were trying to remove osfile from t/c/t in bug 1772938?

Flags: needinfo?(brennie)

Shane Hughes [:aminomancer]

Comment 2

•

2 years ago

•

Edited

Attached file mochitest error log — Details

I think this might be related to an issue I'm having - mochitests not working on Windows 10. It started today EDIT: My mochitest issue was unrelated to the log spam. If anyone else sees this issue and does the same search I did, here's how I resolved it. I figured the port was blocked so I just restarted my PC, but then it happened again so I assumed it must not be the port. Apparently hyper-V reserves a random range of ports, and I guess that just coincidentally included 2828 both times. Running this basically clears the port exclusion: reg add HKLM\SYSTEM\CurrentControlSet\Services\hns\State /v EnableExcludedPortRange /d 0 /f netsh int ipv[46] set dynamic tcp start=49152 num=16384 net stop winnat net start winnat

Beth Rennie [:beth] (she/her)

Comment 3

•

2 years ago

OK Here is what is happening:

Shutdown is triggered. At this point, no one has imported osfile.jsm, so it has not had a chance to set up its Shutdown blockers.

We hit this part of the stack trace: promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:103:5:

return lazy.TelemetryStorage.saveArchivedPing(ping);

We then load TelemetryStorage, which on line 19 imports osfile.jsm

Since this is the first import os osfile.jsm, we try to set up its shutdown blockers, but we're already in shutdown so we end up throwing the above error.

We can bandaid this with a check in osfile to not set up its shutdown handlers if we are already in shutdown and this is the first time it was imported.

Alternatively, bug 1802551 will solve this (patch is WIP locally -- haven't had a chance to finish it up yet).

Flags: needinfo?(brennie)

Jan-Erik Rediger [:janerik]

Updated

•

2 years ago

Duplicate of this bug: 1805153

Henrik Skupin [:whimboo][⌚️UTC+2]

Updated

•

2 years ago

Blocks: 1580652

Henrik Skupin [:whimboo][⌚️UTC+2]

Comment 5

•

2 years ago

(In reply to Jens Stutte [:jstutte] from comment #0)

where the telemetry wants to archive remaining pings on shutdown, but osfile has been shut down and while trying to re-surrect it, adding a blocker for the already passed phase fails. This seems to not have any bad consequences, but it bloats log files significantly.

It seems to delay the shutdown by around 5s and due to mochitests starting and quitting Firefox quite often cause a lot of jobs to run into a task timeout (see bug 1580652).

Jens Stutte [:jstutte]

Reporter

Comment 6

•

2 years ago

(In reply to Barret Rennie [:barret] (they/them) from comment #3)

OK Here is what is happening:

Shutdown is triggered. At this point, no one has imported osfile.jsm, so it has not had a chance to set up its Shutdown blockers.

Are we running an imported JS only once on the ever first import? IMHO I think we would see the same if we already passed the shutdown blocker, removed it (correctly) and then run some JS that wants to import osfile.jsm again (and thus sets up the blocker again). AFAICS there is nothing that prevents osfile from adding it again here? (Actually to my reading we are re-adding the same shutdown blocker at each import?)

We hit this part of the stack trace: promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:103:5:
return lazy.TelemetryStorage.saveArchivedPing(ping);
We then load TelemetryStorage, which on line 19 imports osfile.jsm

Since this is the first import os osfile.jsm, we try to set up its shutdown blockers, but we're already in shutdown so we end up throwing the above error.

We can bandaid this with a check in osfile to not set up its shutdown handlers if we are already in shutdown and this is the first time it was imported.

Yes, though I'd assume we need it to throw an error that should be handled more gently by whoever uses us and/or be prepared to receive subsequent calls for file operations that we should just refuse?

Alternatively, bug 1802551 will solve this (patch is WIP locally -- haven't had a chance to finish it up yet).

I think osfile should be more resilient on its own, though having telemetry not need to rely on it is probably a good thing.

Beth Rennie [:beth] (she/her)

Comment 7

•

2 years ago

(In reply to Jens Stutte [:jstutte] from comment #6)

(In reply to Barret Rennie [:barret] (they/them) from comment #3)

OK Here is what is happening:

Shutdown is triggered. At this point, no one has imported osfile.jsm, so it has not had a chance to set up its Shutdown blockers.

Are we running an imported JS only once on the ever first import? IMHO I think we would see the same if we already passed the shutdown blocker, removed it (correctly) and then run some JS that wants to import osfile.jsm again (and thus sets up the blocker again). AFAICS there is nothing that prevents osfile from adding it again here? (Actually to my reading we are re-adding the same shutdown blocker at each import?)

JSMs will only ever execute once.

We hit this part of the stack trace: promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:103:5:
return lazy.TelemetryStorage.saveArchivedPing(ping);
We then load TelemetryStorage, which on line 19 imports osfile.jsm

Since this is the first import os osfile.jsm, we try to set up its shutdown blockers, but we're already in shutdown so we end up throwing the above error.

We can bandaid this with a check in osfile to not set up its shutdown handlers if we are already in shutdown and this is the first time it was imported.
Yes, though I'd assume we need it to throw an error that should be handled more gently by whoever uses us and/or be prepared to receive subsequent calls for file operations that we should just refuse?

Alternatively, bug 1802551 will solve this (patch is WIP locally -- haven't had a chance to finish it up yet).

I think osfile should be more resilient on its own, though having telemetry not need to rely on it is probably a good thing.

osfile is going to be removed in bug 1776480 once bug 1737308 is finished, so anything we add to it will only be temporary.

Jens Stutte [:jstutte]

Reporter

Comment 8

•

2 years ago

(In reply to Barret Rennie [:barret] (they/them) from comment #7)

JSMs will only ever execute once.

Thanks, TIL.

osfile is going to be removed in bug 1776480 once bug 1737308 is finished, so anything we add to it will only be temporary.

Thanks for working on that!

Comment hidden (Intermittent Failures Robot)

Alessio Placitelli [:Dexter]

Comment 10

•

2 years ago

Hey Chris, would you kindly set the appropriate severity field for this? I'm getting scary emails :)

Flags: needinfo?(chutten)

Chris H-C :chutten

Comment 11

•

2 years ago

Happy to dupe this against bug 1802551 if that's where brennie's taking care of it.

Status: NEW → RESOLVED

Closed: 2 years ago

Duplicate of bug: 1802551

Flags: needinfo?(chutten)

Resolution: --- → DUPLICATE

Bugzilla

Telemetry shutdown happens after profile-before-change and wants to write files but osfile is already shut down

Categories

(Toolkit :: Telemetry, defect)

Tracking

()

People

(Reporter: jstutte, Unassigned)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Updated

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Attachment

General

Description

File Name

Content Type