Bug 1832089 Comment 1 Edit History

Note: The actual edited comment in the bug view page will always show the original commenter’s name and original timestamp.

Note also that there are things that potentially do something **after** the `RunBackgroundTask` returned (thus after any timeout we add there):

1. [`AddScreenWakeLockListener();`](https://searchfox.org/mozilla-central/rev/373d05f4eabdb90a6480d5d36d983b8bff36c9d8/widget/windows/nsAppShell.cpp#612) (I wonder if that can ever be useful for background tasks?)
2. [`GetTopSEHFilter();`](https://searchfox.org/mozilla-central/rev/445a5ab684b73eb56d807d0f3b2fabcc85a7c3dd/ipc/chromium/src/base/message_loop.cc#354-359) (not sure how to follow the code path here, but setting exception handlers might have OS level side effects hard to control?)

I wonder if we should better wrap the background task into a runnable we dispatch to the main thread, letting it be executed later on top of our normal message loop run? Apart from the above, if ever something was able to dispatch events asynchronously to MT before we come here, we would run them before our payload runs.

Moving the discussion from bug 1798904:

(In reply to Nick Alexander :nalexander [he/him] from comment #12)
> (In reply to Jens Stutte [:jstutte] from comment #9)
> > Interestingly we made a quite good job in [avoiding AsyncShutdownTimeout for background tasks](https://crash-stats.mozilla.org/search/?async_shutdown_timeout=%21__null__&background_task_name=%21__null__&product=Firefox&date=%3E%3D2023-04-09T07%3A23%3A00.000Z&date=%3C2023-05-09T07%3A23%3A00.000Z&_facets=signature&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature), see for example bug 1820517, bug 1814104, bug 1799421. The remaining hangs are probably "things that start and were never even designed to interact with shutdown" such that the terminator catches them.
> > 
> > I think we have several paths here (and might want all):
> > 
> > 1. Mitigate the impact by having an (arbitrary) timeout at the end of the background task before starting  the shutdown sequence of the process. This is an easy hack but obviously not really nice. And we should check the `Shutdown Reason` annotation if we see many cases of OS shutdown (my gut feeling is: no, but it is not yet aggregatable, so we should take a closer look) - delaying process end could delay OS shutdown. But I assume the `backgroundupdate` is something we run anytime during normal operations, IIUC.
> > Edit regarding 1.: I assume [here in `runBackgroundTaskNamed`](https://searchfox.org/mozilla-central/rev/373d05f4eabdb90a6480d5d36d983b8bff36c9d8/toolkit/components/backgroundtasks/BackgroundTasksManager.sys.mjs#237-239) could be a good place to add a waiting promise.
> 
> `backgroundupdate` has an arbitrary short wait as it tries to let Glean uploads proceed: https://searchfox.org/mozilla-central/rev/373d05f4eabdb90a6480d5d36d983b8bff36c9d8/toolkit/mozapps/update/BackgroundTask_backgroundupdate.sys.mjs#414-416.  Glean has improved greatly so this is no longer neccessary.
> 
> I'd be happy to lift this delay to a per-task-configured delay (like the per-task-configured timeout) in the tasks manager, and to extend it, if it might help.  But I doubt that it would help in a non-linear fashion: we have racing startup and shutdown and racers gonna race.  I'd like to see more labeling of activities at startup and shutdown to help us understand ones that are failing in the wild.

I think we might want to be able to try around a bit with different values for that delay. 500ms reads pretty short, at least on low end machines.
Note also that there are things that potentially do something **after** the `RunBackgroundTask` returned (thus after any timeout we add there):

1. [`AddScreenWakeLockListener();`](https://searchfox.org/mozilla-central/rev/373d05f4eabdb90a6480d5d36d983b8bff36c9d8/widget/windows/nsAppShell.cpp#612) (I wonder if that can ever be useful for background tasks?)
2. [`GetTopSEHFilter();`](https://searchfox.org/mozilla-central/rev/445a5ab684b73eb56d807d0f3b2fabcc85a7c3dd/ipc/chromium/src/base/message_loop.cc#354-359) (not sure how to follow the code path here, but setting exception handlers might have OS level side effects hard to control?)

I wonder if we should better [wrap the background task into a runnable](https://searchfox.org/mozilla-central/rev/445a5ab684b73eb56d807d0f3b2fabcc85a7c3dd/toolkit/xre/nsAppRunner.cpp#5651) we dispatch to the main thread, letting it be executed later on top of our normal message loop run? Apart from the above, if ever something was able to dispatch events asynchronously to MT before we come here, we would run them before our payload runs.

Moving the discussion from bug 1798904:

(In reply to Nick Alexander :nalexander [he/him] from comment #12)
> (In reply to Jens Stutte [:jstutte] from comment #9)
> > Interestingly we made a quite good job in [avoiding AsyncShutdownTimeout for background tasks](https://crash-stats.mozilla.org/search/?async_shutdown_timeout=%21__null__&background_task_name=%21__null__&product=Firefox&date=%3E%3D2023-04-09T07%3A23%3A00.000Z&date=%3C2023-05-09T07%3A23%3A00.000Z&_facets=signature&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature), see for example bug 1820517, bug 1814104, bug 1799421. The remaining hangs are probably "things that start and were never even designed to interact with shutdown" such that the terminator catches them.
> > 
> > I think we have several paths here (and might want all):
> > 
> > 1. Mitigate the impact by having an (arbitrary) timeout at the end of the background task before starting  the shutdown sequence of the process. This is an easy hack but obviously not really nice. And we should check the `Shutdown Reason` annotation if we see many cases of OS shutdown (my gut feeling is: no, but it is not yet aggregatable, so we should take a closer look) - delaying process end could delay OS shutdown. But I assume the `backgroundupdate` is something we run anytime during normal operations, IIUC.
> > Edit regarding 1.: I assume [here in `runBackgroundTaskNamed`](https://searchfox.org/mozilla-central/rev/373d05f4eabdb90a6480d5d36d983b8bff36c9d8/toolkit/components/backgroundtasks/BackgroundTasksManager.sys.mjs#237-239) could be a good place to add a waiting promise.
> 
> `backgroundupdate` has an arbitrary short wait as it tries to let Glean uploads proceed: https://searchfox.org/mozilla-central/rev/373d05f4eabdb90a6480d5d36d983b8bff36c9d8/toolkit/mozapps/update/BackgroundTask_backgroundupdate.sys.mjs#414-416.  Glean has improved greatly so this is no longer neccessary.
> 
> I'd be happy to lift this delay to a per-task-configured delay (like the per-task-configured timeout) in the tasks manager, and to extend it, if it might help.  But I doubt that it would help in a non-linear fashion: we have racing startup and shutdown and racers gonna race.  I'd like to see more labeling of activities at startup and shutdown to help us understand ones that are failing in the wild.

I think we might want to be able to try around a bit with different values for that delay. 500ms reads pretty short, at least on low end machines.
Note also that there are things that potentially do something **after** the `RunBackgroundTask` returned (thus after any timeout we add there):

1. [`AddScreenWakeLockListener();`](https://searchfox.org/mozilla-central/rev/373d05f4eabdb90a6480d5d36d983b8bff36c9d8/widget/windows/nsAppShell.cpp#612) (I wonder if that can ever be useful for background tasks?)
2. [`GetTopSEHFilter();`](https://searchfox.org/mozilla-central/rev/445a5ab684b73eb56d807d0f3b2fabcc85a7c3dd/ipc/chromium/src/base/message_loop.cc#354-359) (not sure how to follow the code path here, but setting exception handlers might have OS level side effects hard to control?)

I wonder if we should better [wrap the background task into a runnable](https://searchfox.org/mozilla-central/rev/445a5ab684b73eb56d807d0f3b2fabcc85a7c3dd/toolkit/xre/nsAppRunner.cpp#5651) we dispatch to the main thread, letting it be executed later on top of our normal message loop run? Apart from the above, if ever something was able to dispatch events asynchronously to MT before we come here, we would run them before our payload runs.

Moving the discussion from bug 1798904:

(In reply to Nick Alexander :nalexander [he/him] from comment #12)
> (In reply to Jens Stutte [:jstutte] from comment #9)
> > Interestingly we made a quite good job in [avoiding AsyncShutdownTimeout for background tasks](https://crash-stats.mozilla.org/search/?async_shutdown_timeout=%21__null__&background_task_name=%21__null__&product=Firefox&date=%3E%3D2023-04-09T07%3A23%3A00.000Z&date=%3C2023-05-09T07%3A23%3A00.000Z&_facets=signature&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature), see for example bug 1820517, bug 1814104, bug 1799421. The remaining hangs are probably "things that start and were never even designed to interact with shutdown" such that the terminator catches them.
> > 
> > I think we have several paths here (and might want all):
> > 
> > 1. Mitigate the impact by having an (arbitrary) timeout at the end of the background task before starting  the shutdown sequence of the process. This is an easy hack but obviously not really nice. And we should check the `Shutdown Reason` annotation if we see many cases of OS shutdown (my gut feeling is: no, but it is not yet aggregatable, so we should take a closer look) - delaying process end could delay OS shutdown. But I assume the `backgroundupdate` is something we run anytime during normal operations, IIUC.
> > Edit regarding 1.: I assume [here in `runBackgroundTaskNamed`](https://searchfox.org/mozilla-central/rev/373d05f4eabdb90a6480d5d36d983b8bff36c9d8/toolkit/components/backgroundtasks/BackgroundTasksManager.sys.mjs#237-239) could be a good place to add a waiting promise.
> 
> `backgroundupdate` has an arbitrary short wait as it tries to let Glean uploads proceed: https://searchfox.org/mozilla-central/rev/373d05f4eabdb90a6480d5d36d983b8bff36c9d8/toolkit/mozapps/update/BackgroundTask_backgroundupdate.sys.mjs#414-416.  Glean has improved greatly so this is no longer neccessary.
> 
> I'd be happy to lift this delay to a per-task-configured delay (like the per-task-configured timeout) in the tasks manager, and to extend it, if it might help.  But I doubt that it would help in a non-linear fashion: we have racing startup and shutdown and racers gonna race.  I'd like to see more labeling of activities at startup and shutdown to help us understand ones that are failing in the wild.

I think we might want to be able to try around a bit with different values for that delay. 500ms reads pretty short, at least on low end machines. And we might want to consider also having a "launch delay" for the above runnable, to give more things the occasion to dispatch their events to MT before we block it with our payload.
Note also that there are things that potentially do something **after** the `RunBackgroundTask` returned (~~thus after any timeout we add there~~ this is not true, as any async/await will actually make us return and dispatch a new delayed runnable that will then continue the execution. So we might return even in the middle of the execution and then enter our main loop):

1. [`AddScreenWakeLockListener();`](https://searchfox.org/mozilla-central/rev/373d05f4eabdb90a6480d5d36d983b8bff36c9d8/widget/windows/nsAppShell.cpp#612) (I wonder if that can ever be useful for background tasks?)
2. [`GetTopSEHFilter();`](https://searchfox.org/mozilla-central/rev/445a5ab684b73eb56d807d0f3b2fabcc85a7c3dd/ipc/chromium/src/base/message_loop.cc#354-359) (not sure how to follow the code path here, but setting exception handlers might have OS level side effects hard to control?)

I wonder if we should better [wrap the background task into a runnable](https://searchfox.org/mozilla-central/rev/445a5ab684b73eb56d807d0f3b2fabcc85a7c3dd/toolkit/xre/nsAppRunner.cpp#5651) we dispatch to the main thread, letting it be executed later on top of our normal message loop run? Apart from the above, if ever something was able to dispatch events asynchronously to MT before we come here, we would run them before our payload runs.

Moving the discussion from bug 1798904:

(In reply to Nick Alexander :nalexander [he/him] from comment #12)
> (In reply to Jens Stutte [:jstutte] from comment #9)
> > Interestingly we made a quite good job in [avoiding AsyncShutdownTimeout for background tasks](https://crash-stats.mozilla.org/search/?async_shutdown_timeout=%21__null__&background_task_name=%21__null__&product=Firefox&date=%3E%3D2023-04-09T07%3A23%3A00.000Z&date=%3C2023-05-09T07%3A23%3A00.000Z&_facets=signature&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature), see for example bug 1820517, bug 1814104, bug 1799421. The remaining hangs are probably "things that start and were never even designed to interact with shutdown" such that the terminator catches them.
> > 
> > I think we have several paths here (and might want all):
> > 
> > 1. Mitigate the impact by having an (arbitrary) timeout at the end of the background task before starting  the shutdown sequence of the process. This is an easy hack but obviously not really nice. And we should check the `Shutdown Reason` annotation if we see many cases of OS shutdown (my gut feeling is: no, but it is not yet aggregatable, so we should take a closer look) - delaying process end could delay OS shutdown. But I assume the `backgroundupdate` is something we run anytime during normal operations, IIUC.
> > Edit regarding 1.: I assume [here in `runBackgroundTaskNamed`](https://searchfox.org/mozilla-central/rev/373d05f4eabdb90a6480d5d36d983b8bff36c9d8/toolkit/components/backgroundtasks/BackgroundTasksManager.sys.mjs#237-239) could be a good place to add a waiting promise.
> 
> `backgroundupdate` has an arbitrary short wait as it tries to let Glean uploads proceed: https://searchfox.org/mozilla-central/rev/373d05f4eabdb90a6480d5d36d983b8bff36c9d8/toolkit/mozapps/update/BackgroundTask_backgroundupdate.sys.mjs#414-416.  Glean has improved greatly so this is no longer neccessary.
> 
> I'd be happy to lift this delay to a per-task-configured delay (like the per-task-configured timeout) in the tasks manager, and to extend it, if it might help.  But I doubt that it would help in a non-linear fashion: we have racing startup and shutdown and racers gonna race.  I'd like to see more labeling of activities at startup and shutdown to help us understand ones that are failing in the wild.

I think we might want to be able to try around a bit with different values for that delay. 500ms reads pretty short, at least on low end machines. And we might want to consider also having a "launch delay" for the above runnable, to give more things the occasion to dispatch their events to MT before we block it with our payload.

Back to Bug 1832089 Comment 1