Closed Bug 1804692 Opened 2 years ago Closed 2 years ago

Telemetry shutdown happens after profile-before-change and wants to write files but osfile is already shut down

Categories

(Toolkit :: Telemetry, defect)

defect

Tracking

()

RESOLVED DUPLICATE of bug 1802551

People

(Reporter: jstutte, Unassigned)

References

Details

Attachments

(1 file)

In many logfiles we can see a sequence like

[task 2022-12-06T23:04:04.445Z] 23:04:04     INFO - GECKO(5360) | WARNING: A blocker encountered an error while we were waiting.
[task 2022-12-06T23:04:04.445Z] 23:04:04     INFO - GECKO(5360) |           Blocker:  Waiting for ping task
[task 2022-12-06T23:04:04.445Z] 23:04:04     INFO - GECKO(5360) |           Phase: TelemetryController: Waiting for pending ping activity
[task 2022-12-06T23:04:04.445Z] 23:04:04     INFO - GECKO(5360) |           State: (none)
[task 2022-12-06T23:04:04.445Z] 23:04:04     INFO - GECKO(5360) | WARNING: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-06T23:04:04.445Z] 23:04:04     INFO - GECKO(5360) | WARNING: addBlocker@resource://gre/modules/AsyncShutdown.jsm:718:15
[task 2022-12-06T23:04:04.446Z] 23:04:04     INFO - GECKO(5360) | addBlocker@resource://gre/modules/AsyncShutdown.jsm:523:26
[task 2022-12-06T23:04:04.446Z] 23:04:04     INFO - GECKO(5360) | addBlocker@resource://gre/modules/AsyncShutdown.jsm:462:15
[task 2022-12-06T23:04:04.446Z] 23:04:04     INFO - GECKO(5360) | setupShutdown@resource://gre/modules/osfile/osfile_async_front.jsm:1548:28
[task 2022-12-06T23:04:04.448Z] 23:04:04     INFO - GECKO(5360) | @resource://gre/modules/osfile/osfile_async_front.jsm:1568:16
[task 2022-12-06T23:04:04.449Z] 23:04:04     INFO - GECKO(5360) | @resource://gre/modules/osfile.jsm:12:30
[task 2022-12-06T23:04:04.450Z] 23:04:04     INFO - GECKO(5360) | @resource://gre/modules/TelemetryStorage.jsm:19:28
[task 2022-12-06T23:04:04.450Z] 23:04:04     INFO - GECKO(5360) | promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:103:5
[task 2022-12-06T23:04:04.451Z] 23:04:04     INFO - GECKO(5360) | promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:63:33
[task 2022-12-06T23:04:04.452Z] 23:04:04     INFO - GECKO(5360) | _submitPingLogic@resource://gre/modules/TelemetryControllerParent.jsm:540:48
[task 2022-12-06T23:04:04.452Z] 23:04:04     INFO - GECKO(5360) | send@resource://gre/modules/TelemetryControllerParent.jsm:628:24
[task 2022-12-06T23:04:04.453Z] 23:04:04     INFO - GECKO(5360) | submitExternalPing@resource://gre/modules/TelemetryControllerParent.jsm:182:17
[task 2022-12-06T23:04:04.454Z] 23:04:04     INFO - GECKO(5360) | sendPing@resource://gre/modules/EventPing.jsm:51:30
[task 2022-12-06T23:04:04.454Z] 23:04:04     INFO - GECKO(5360) | _submitPing@resource://gre/modules/EventPing.jsm:229:12
[task 2022-12-06T23:04:04.455Z] 23:04:04     INFO - GECKO(5360) | shutdown@resource://gre/modules/EventPing.jsm:112:10
[task 2022-12-06T23:04:04.456Z] 23:04:04     INFO - GECKO(5360) | _cleanupOnShutdown@resource://gre/modules/TelemetryControllerParent.jsm:940:31
[task 2022-12-06T23:04:04.457Z] 23:04:04     INFO - GECKO(5360) | shutdown/<@resource://gre/modules/TelemetryControllerParent.jsm:1010:24
[task 2022-12-06T23:04:04.457Z] 23:04:04     INFO - GECKO(5360) | observe@resource://gre/modules/AsyncShutdown.jsm:572:16

where the telemetry wants to archive remaining pings on shutdown, but osfile has been shut down and while trying to re-surrect it, adding a blocker for the already passed phase fails. This seems to not have any bad consequences, but it bloats log files significantly.

The shutdown blocker of telemetry.

The shutdown blocker setting from osfile.

It might just be the case that we want osfile to shutdown later, but I am not sure what other side effects this might have.

May be related to some of the fun issues Barret found while they were trying to remove osfile from t/c/t in bug 1772938?

Flags: needinfo?(brennie)
Attached file mochitest error log
I think this might be related to an issue I'm having - mochitests not working on Windows 10. It started today EDIT: My mochitest issue was unrelated to the log spam. If anyone else sees this issue and does the same search I did, here's how I resolved it. I figured the port was blocked so I just restarted my PC, but then it happened again so I assumed it must not be the port. Apparently hyper-V reserves a random range of ports, and I guess that just coincidentally included 2828 both times. Running this basically clears the port exclusion: reg add HKLM\SYSTEM\CurrentControlSet\Services\hns\State /v EnableExcludedPortRange /d 0 /f netsh int ipv[46] set dynamic tcp start=49152 num=16384 net stop winnat net start winnat

OK Here is what is happening:

Shutdown is triggered. At this point, no one has imported osfile.jsm, so it has not had a chance to set up its Shutdown blockers.

We hit this part of the stack trace: promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:103:5:

return lazy.TelemetryStorage.saveArchivedPing(ping);

We then load TelemetryStorage, which on line 19 imports osfile.jsm

Since this is the first import os osfile.jsm, we try to set up its shutdown blockers, but we're already in shutdown so we end up throwing the above error.

We can bandaid this with a check in osfile to not set up its shutdown handlers if we are already in shutdown and this is the first time it was imported.

Alternatively, bug 1802551 will solve this (patch is WIP locally -- haven't had a chance to finish it up yet).

Flags: needinfo?(brennie)
Duplicate of this bug: 1805153

(In reply to Jens Stutte [:jstutte] from comment #0)

where the telemetry wants to archive remaining pings on shutdown, but osfile has been shut down and while trying to re-surrect it, adding a blocker for the already passed phase fails. This seems to not have any bad consequences, but it bloats log files significantly.

It seems to delay the shutdown by around 5s and due to mochitests starting and quitting Firefox quite often cause a lot of jobs to run into a task timeout (see bug 1580652).

(In reply to Barret Rennie [:barret] (they/them) from comment #3)

OK Here is what is happening:

Shutdown is triggered. At this point, no one has imported osfile.jsm, so it has not had a chance to set up its Shutdown blockers.

Are we running an imported JS only once on the ever first import? IMHO I think we would see the same if we already passed the shutdown blocker, removed it (correctly) and then run some JS that wants to import osfile.jsm again (and thus sets up the blocker again). AFAICS there is nothing that prevents osfile from adding it again here? (Actually to my reading we are re-adding the same shutdown blocker at each import?)

We hit this part of the stack trace: promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:103:5:

return lazy.TelemetryStorage.saveArchivedPing(ping);

We then load TelemetryStorage, which on line 19 imports osfile.jsm

Since this is the first import os osfile.jsm, we try to set up its shutdown blockers, but we're already in shutdown so we end up throwing the above error.

We can bandaid this with a check in osfile to not set up its shutdown handlers if we are already in shutdown and this is the first time it was imported.

Yes, though I'd assume we need it to throw an error that should be handled more gently by whoever uses us and/or be prepared to receive subsequent calls for file operations that we should just refuse?

Alternatively, bug 1802551 will solve this (patch is WIP locally -- haven't had a chance to finish it up yet).

I think osfile should be more resilient on its own, though having telemetry not need to rely on it is probably a good thing.

(In reply to Jens Stutte [:jstutte] from comment #6)

(In reply to Barret Rennie [:barret] (they/them) from comment #3)

OK Here is what is happening:

Shutdown is triggered. At this point, no one has imported osfile.jsm, so it has not had a chance to set up its Shutdown blockers.

Are we running an imported JS only once on the ever first import? IMHO I think we would see the same if we already passed the shutdown blocker, removed it (correctly) and then run some JS that wants to import osfile.jsm again (and thus sets up the blocker again). AFAICS there is nothing that prevents osfile from adding it again here? (Actually to my reading we are re-adding the same shutdown blocker at each import?)

JSMs will only ever execute once.

We hit this part of the stack trace: promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:103:5:

return lazy.TelemetryStorage.saveArchivedPing(ping);

We then load TelemetryStorage, which on line 19 imports osfile.jsm

Since this is the first import os osfile.jsm, we try to set up its shutdown blockers, but we're already in shutdown so we end up throwing the above error.

We can bandaid this with a check in osfile to not set up its shutdown handlers if we are already in shutdown and this is the first time it was imported.

Yes, though I'd assume we need it to throw an error that should be handled more gently by whoever uses us and/or be prepared to receive subsequent calls for file operations that we should just refuse?

Alternatively, bug 1802551 will solve this (patch is WIP locally -- haven't had a chance to finish it up yet).

I think osfile should be more resilient on its own, though having telemetry not need to rely on it is probably a good thing.

osfile is going to be removed in bug 1776480 once bug 1737308 is finished, so anything we add to it will only be temporary.

(In reply to Barret Rennie [:barret] (they/them) from comment #7)

JSMs will only ever execute once.

Thanks, TIL.

osfile is going to be removed in bug 1776480 once bug 1737308 is finished, so anything we add to it will only be temporary.

Thanks for working on that!

Hey Chris, would you kindly set the appropriate severity field for this? I'm getting scary emails :)

Flags: needinfo?(chutten)

Happy to dupe this against bug 1802551 if that's where brennie's taking care of it.

Status: NEW → RESOLVED
Closed: 2 years ago
Duplicate of bug: 1802551
Flags: needinfo?(chutten)
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: