Telemetry shutdown happens after profile-before-change and wants to write files but osfile is already shut down
Categories
(Toolkit :: Telemetry, defect)
Tracking
()
People
(Reporter: jstutte, Unassigned)
References
Details
Attachments
(1 file)
9.89 KB,
text/plain
|
Details |
In many logfiles we can see a sequence like
[task 2022-12-06T23:04:04.445Z] 23:04:04 INFO - GECKO(5360) | WARNING: A blocker encountered an error while we were waiting.
[task 2022-12-06T23:04:04.445Z] 23:04:04 INFO - GECKO(5360) | Blocker: Waiting for ping task
[task 2022-12-06T23:04:04.445Z] 23:04:04 INFO - GECKO(5360) | Phase: TelemetryController: Waiting for pending ping activity
[task 2022-12-06T23:04:04.445Z] 23:04:04 INFO - GECKO(5360) | State: (none)
[task 2022-12-06T23:04:04.445Z] 23:04:04 INFO - GECKO(5360) | WARNING: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
[task 2022-12-06T23:04:04.445Z] 23:04:04 INFO - GECKO(5360) | WARNING: addBlocker@resource://gre/modules/AsyncShutdown.jsm:718:15
[task 2022-12-06T23:04:04.446Z] 23:04:04 INFO - GECKO(5360) | addBlocker@resource://gre/modules/AsyncShutdown.jsm:523:26
[task 2022-12-06T23:04:04.446Z] 23:04:04 INFO - GECKO(5360) | addBlocker@resource://gre/modules/AsyncShutdown.jsm:462:15
[task 2022-12-06T23:04:04.446Z] 23:04:04 INFO - GECKO(5360) | setupShutdown@resource://gre/modules/osfile/osfile_async_front.jsm:1548:28
[task 2022-12-06T23:04:04.448Z] 23:04:04 INFO - GECKO(5360) | @resource://gre/modules/osfile/osfile_async_front.jsm:1568:16
[task 2022-12-06T23:04:04.449Z] 23:04:04 INFO - GECKO(5360) | @resource://gre/modules/osfile.jsm:12:30
[task 2022-12-06T23:04:04.450Z] 23:04:04 INFO - GECKO(5360) | @resource://gre/modules/TelemetryStorage.jsm:19:28
[task 2022-12-06T23:04:04.450Z] 23:04:04 INFO - GECKO(5360) | promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:103:5
[task 2022-12-06T23:04:04.451Z] 23:04:04 INFO - GECKO(5360) | promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:63:33
[task 2022-12-06T23:04:04.452Z] 23:04:04 INFO - GECKO(5360) | _submitPingLogic@resource://gre/modules/TelemetryControllerParent.jsm:540:48
[task 2022-12-06T23:04:04.452Z] 23:04:04 INFO - GECKO(5360) | send@resource://gre/modules/TelemetryControllerParent.jsm:628:24
[task 2022-12-06T23:04:04.453Z] 23:04:04 INFO - GECKO(5360) | submitExternalPing@resource://gre/modules/TelemetryControllerParent.jsm:182:17
[task 2022-12-06T23:04:04.454Z] 23:04:04 INFO - GECKO(5360) | sendPing@resource://gre/modules/EventPing.jsm:51:30
[task 2022-12-06T23:04:04.454Z] 23:04:04 INFO - GECKO(5360) | _submitPing@resource://gre/modules/EventPing.jsm:229:12
[task 2022-12-06T23:04:04.455Z] 23:04:04 INFO - GECKO(5360) | shutdown@resource://gre/modules/EventPing.jsm:112:10
[task 2022-12-06T23:04:04.456Z] 23:04:04 INFO - GECKO(5360) | _cleanupOnShutdown@resource://gre/modules/TelemetryControllerParent.jsm:940:31
[task 2022-12-06T23:04:04.457Z] 23:04:04 INFO - GECKO(5360) | shutdown/<@resource://gre/modules/TelemetryControllerParent.jsm:1010:24
[task 2022-12-06T23:04:04.457Z] 23:04:04 INFO - GECKO(5360) | observe@resource://gre/modules/AsyncShutdown.jsm:572:16
where the telemetry wants to archive remaining pings on shutdown, but osfile has been shut down and while trying to re-surrect it, adding a blocker for the already passed phase fails. This seems to not have any bad consequences, but it bloats log files significantly.
The shutdown blocker of telemetry.
The shutdown blocker setting from osfile.
It might just be the case that we want osfile to shutdown later, but I am not sure what other side effects this might have.
Comment 1•2 years ago
|
||
May be related to some of the fun issues Barret found while they were trying to remove osfile from t/c/t in bug 1772938?
Comment 2•2 years ago
•
|
||
Comment 3•2 years ago
|
||
OK Here is what is happening:
Shutdown is triggered. At this point, no one has imported osfile.jsm, so it has not had a chance to set up its Shutdown blockers.
We hit this part of the stack trace: promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:103:5
:
return lazy.TelemetryStorage.saveArchivedPing(ping);
We then load TelemetryStorage, which on line 19 imports osfile.jsm
Since this is the first import os osfile.jsm, we try to set up its shutdown blockers, but we're already in shutdown so we end up throwing the above error.
We can bandaid this with a check in osfile to not set up its shutdown handlers if we are already in shutdown and this is the first time it was imported.
Alternatively, bug 1802551 will solve this (patch is WIP locally -- haven't had a chance to finish it up yet).
Comment 5•2 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #0)
where the telemetry wants to archive remaining pings on shutdown, but osfile has been shut down and while trying to re-surrect it, adding a blocker for the already passed phase fails. This seems to not have any bad consequences, but it bloats log files significantly.
It seems to delay the shutdown by around 5s and due to mochitests starting and quitting Firefox quite often cause a lot of jobs to run into a task timeout (see bug 1580652).
Reporter | ||
Comment 6•2 years ago
|
||
(In reply to Barret Rennie [:barret] (they/them) from comment #3)
OK Here is what is happening:
Shutdown is triggered. At this point, no one has imported osfile.jsm, so it has not had a chance to set up its Shutdown blockers.
Are we running an imported JS only once on the ever first import? IMHO I think we would see the same if we already passed the shutdown blocker, removed it (correctly) and then run some JS that wants to import osfile.jsm again (and thus sets up the blocker again). AFAICS there is nothing that prevents osfile from adding it again here? (Actually to my reading we are re-adding the same shutdown blocker at each import?)
We hit this part of the stack trace:
promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:103:5
:return lazy.TelemetryStorage.saveArchivedPing(ping);
We then load TelemetryStorage, which on line 19 imports osfile.jsm
Since this is the first import os osfile.jsm, we try to set up its shutdown blockers, but we're already in shutdown so we end up throwing the above error.
We can bandaid this with a check in osfile to not set up its shutdown handlers if we are already in shutdown and this is the first time it was imported.
Yes, though I'd assume we need it to throw an error that should be handled more gently by whoever uses us and/or be prepared to receive subsequent calls for file operations that we should just refuse?
Alternatively, bug 1802551 will solve this (patch is WIP locally -- haven't had a chance to finish it up yet).
I think osfile should be more resilient on its own, though having telemetry not need to rely on it is probably a good thing.
Comment 7•2 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #6)
(In reply to Barret Rennie [:barret] (they/them) from comment #3)
OK Here is what is happening:
Shutdown is triggered. At this point, no one has imported osfile.jsm, so it has not had a chance to set up its Shutdown blockers.
Are we running an imported JS only once on the ever first import? IMHO I think we would see the same if we already passed the shutdown blocker, removed it (correctly) and then run some JS that wants to import osfile.jsm again (and thus sets up the blocker again). AFAICS there is nothing that prevents osfile from adding it again here? (Actually to my reading we are re-adding the same shutdown blocker at each import?)
JSMs will only ever execute once.
We hit this part of the stack trace:
promiseArchivePing@resource://gre/modules/TelemetryArchive.jsm:103:5
:return lazy.TelemetryStorage.saveArchivedPing(ping);
We then load TelemetryStorage, which on line 19 imports osfile.jsm
Since this is the first import os osfile.jsm, we try to set up its shutdown blockers, but we're already in shutdown so we end up throwing the above error.
We can bandaid this with a check in osfile to not set up its shutdown handlers if we are already in shutdown and this is the first time it was imported.
Yes, though I'd assume we need it to throw an error that should be handled more gently by whoever uses us and/or be prepared to receive subsequent calls for file operations that we should just refuse?
Alternatively, bug 1802551 will solve this (patch is WIP locally -- haven't had a chance to finish it up yet).
I think osfile should be more resilient on its own, though having telemetry not need to rely on it is probably a good thing.
osfile is going to be removed in bug 1776480 once bug 1737308 is finished, so anything we add to it will only be temporary.
Reporter | ||
Comment 8•2 years ago
|
||
(In reply to Barret Rennie [:barret] (they/them) from comment #7)
JSMs will only ever execute once.
Thanks, TIL.
osfile is going to be removed in bug 1776480 once bug 1737308 is finished, so anything we add to it will only be temporary.
Thanks for working on that!
Comment hidden (Intermittent Failures Robot) |
Comment 10•2 years ago
|
||
Hey Chris, would you kindly set the appropriate severity field for this? I'm getting scary emails :)
Comment 11•2 years ago
|
||
Happy to dupe this against bug 1802551 if that's where brennie's taking care of it.
Description
•