Open Bug 1707020 Opened 3 years ago Updated 2 years ago

`--backgroundtask backgroundupdate` starts BITS job but doesn't exit as expected

Categories

(Toolkit :: Application Update, defect)

defect

Tracking

()

Tracking Status
firefox-esr78 --- unaffected
firefox88 --- unaffected
firefox89 --- disabled
firefox90 --- wontfix
firefox91 --- fix-optional

People

(Reporter: nalexander, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: leave-open, regression, Whiteboard: [fidedi-ope])

Attachments

(5 files)

:mhowell reports a wedged background update task, and provides logs. What's interesting is that the logs rather clearly show a BITS job being started and the task not exiting as expected multiple times, like:

2021-04-16 07:30:09.708000 UTC - [Parent 6256: Main Thread]: D/Dump [Backstage.Dump] *** AUS:SVC Downloader:downloadUpdate - Starting BITS download with url: https://archive.mozilla.org/pub/firefox/nightly/partials/2021/04/2021-04-15-21-46-43-mozilla-central/firefox-mozilla-central-89.0a1-win64-en-US-20210415133200-20210415214643.partial.mar, updateDir: C:\ProgramData\Mozilla\updates\6F193CCC56814779\updates\downloading, filename: update.mar
2021-04-16 07:30:09.755000 UTC - [Parent 6256: Main Thread]: D/Dump [Backstage.Dump] *** AUS:SVC Downloader:downloadUpdate - BITS download running. BITS ID: {2637EB5C-6171-4C04-8744-D832A4A449D1}
2021-04-16 08:30:09.629000 UTC - [Parent 6256: Main Thread]: D/Dump [Backstage.Dump] *** AUS:SVC Downloader:onStopRequest - End of update download detected and ignored because we are restricted to update downloads this session. We will continue with this update next session.

Note the hour of wall time in between the last two messages.

Similarly, with the wedged task:

2021-04-20 10:11:42.287000 UTC - [Parent 3500: Main Thread]: D/Dump [Backstage.Dump] *** AUS:SVC Downloader:downloadUpdate - Starting BITS download with url: https://archive.mozilla.org/pub/firefox/nightly/partials/2021/04/2021-04-19-22-16-26-mozilla-central/firefox-mozilla-central-90.0a1-win64-en-US-20210419094740-20210419221626.partial.mar, updateDir: C:\ProgramData\Mozilla\updates\6F193CCC56814779\updates\downloading, filename: update.mar
2021-04-20 10:11:43.028000 UTC - [Parent 3500: Main Thread]: D/Dump [Backstage.Dump] *** AUS:SVC Downloader:downloadUpdate - BITS download running. BITS ID: {97212DF7-6111-4E67-8C26-7B108C9ABFEA}
2021-04-20 11:11:42.438000 UTC - [Parent 3500: Main Thread]: D/Dump [Backstage.Dump] *** AUS:SVC Downloader:onStopRequest - End of update download detected and ignored because we are restricted to update downloads this session. We will continue with this update next session.

There are at least two things to address here:

  1. Figure out why the approach in Bug 1700156 isn't succeeding.
  2. Limit the run time of the background task, either in the task itself or in the registration with the Windows Task Scheduler.

Here's a log that I just collected while the machine was awake. This time, the exit logic worked, but to a kind of unfortunate result; all the bytes were downloaded before the task exited, as shown by the logged onProgress, but the task doesn't appear to have noticed. When I started the browser again afterwards, it immediately showed me an update badge, I assume because the BITS job was ready but hadn't yet been completed, so the update couldn't be applied during that startup.

A possibly relevant fact is that, when the task that hung was started, the computer was on standby (as in, it's a laptop and the lid was closed). When I collected this one, it was awake. I'm going to let it run again overnight once or twice and see if I can get this to reproduce that way.

Set release status flags based on info from the regressing bug 1700156

Here's the logs from the overnight runs. These appear to have done the same thing as the previous one I posted, but there's no exit entry in the log file for the last one, and task scheduler says it terminated unexpectedly. I don't see anything that indicates to me why that might have happened.

I'm not sure why this would happen. My best guess is that being on standby is screwing up the background threads that we use to query BITS. Perhaps BITS simply does not respond to our queries in standby mode? We do have a timeout for how much time we allow to expire before we report an error, but it looks like it has no mechanism for interrupting the BITS API (relevant code).

However, I see that we are still entering a DOWNLOADING state. So perhaps if we made the exit conditions a bit more flexible, we could avoid the problem that way. I think this could be done simply by changing this conditional to this:

} else if (
  status == AppUpdater.STATUS.DOWNLOADING &&
  (
    UpdateService.onlyDownloadUpdatesThisSession ||
    (progress !== undefined && progressMax !== undefined)
  )
) {
Depends on: 1708752
Assignee: nobody → ksteuber
Status: NEW → ASSIGNED
Pushed by ksteuber@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fa14c7491898
Exit Background Update Task even if we never get a progress update r=nalexander

I think I'd like to leave this bug open after this patch merges, at least for now. We don't really know the root cause of the issue, so I think it's quite possible that this patch will not fix it in every case.

This patch's early exit relies on this being the instance of the task that kicked off the download. But if the download is halfway done when the Background Update Task runs, it should exit immediately, but I suspect that, in standby mode, it may instead suffer the same problem we saw here.

Attached file moz_log content

This issue is still reproducible on my end, I've let a task for several days, while doing usual things. My PC was in different states, including standby mode around the time the task ran and the error was triggered, but I can't say for sure what was the exact state when it was actually ran).

As far as I can tell, that log does not describe any instances of this problem. I'm guessing (please correct me if I'm wrong here), that you are concerned that the log ends with:

AUS:SVC Downloader:onStopRequest - End of update download detected and ignored because we are restricted to update downloads this session. We will continue with this update next session.

But that's not actually an error; it's just an informational message. Just a few lines away from that message, we see these messages:

BackgroundTasksManager: Backgroundtask named 'backgroundupdate' completed with exit code 0
BackgroundTasksManager: Invoking Services.startup.quit(..., 0)

These suggest that the task was not stuck. It appears to have exited, as expected.

I also don't see an exceptionally long time gap between any of the messages, like we did with some of the messages from Molly's logs. As near as I can tell, the task ran and exited quickly.

So, I don't think anything unexpected or undesired is going on in this log. That being said, I do still think that it is possible to encounter this bug through the mechanism I described in Comment 7. I think that we are ok with that for the time being. I'm working on a patch for Bug 1708752, which should mitigate this problem if we do encounter it.

(In reply to Kirk Steuber (he/him) [:bytesized] from comment #10)

As far as I can tell, that log does not describe any instances of this problem. I'm guessing (please correct me if I'm wrong here), that you are concerned that the log ends with:

AUS:SVC Downloader:onStopRequest - End of update download detected and ignored because we are restricted to update downloads this session. We will continue with this update next session.

Yes, that's right. It crossed my mind initially that what happens here is correct, but I wasn't sure, so I've thought better to signaled a behavior on which I have uncertainty than to let it slip.
Thank you for taking time to clarify this for me!

Does this bug need to be open still?

Flags: needinfo?(ksteuber)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #12)

Does this bug need to be open still?

I would like it to remain open, because the patch that merged only mitigated the problem, but did not actually solve it. We may want to figure out a way to solve this problem in the future.

The mechanism by which this bug could still occur is described in Comment 7.

Flags: needinfo?(ksteuber)
Assignee: ksteuber → nobody
Status: ASSIGNED → NEW

Can you assign a severity to this bug? Did you unassign yourself because you think it is low?

Flags: needinfo?(ksteuber)

(In reply to Marco Castelluccio [:marco] from comment #14)

Did you unassign yourself because you think it is low?

The severity is low, and it's not clear at the moment how to address the remaining issue.

These are the reasons why I think this is very low severity:

  • I believe that I fixed the most likely case where it could occur (in this bug).
  • This only seems to happens when the computer enters standby, and I never really had expectations that we would update successfully in that situation.
  • Even if it does hang, we will eventually run it again. So we aren't prevented from updating, just delayed.
  • Any hangs were mitigated by the work in Bug 1708752, which implements 2 timeout mechanisms to ensure that the task never runs for more than 20 minutes.
Severity: -- → S4
Flags: needinfo?(ksteuber)

Please don't mess with flags.

The leave-open keyword is there and there is no activity for 6 months.
:bhearsum, maybe it's time to close this bug?

Flags: needinfo?(bhearsum)

(In reply to Release mgmt bot [:sylvestre / :calixte / :marco for bugbug] from comment #17)

The leave-open keyword is there and there is no activity for 6 months.
:bhearsum, maybe it's time to close this bug?

Comment #15 confirms we should keep this open.

Flags: needinfo?(bhearsum)
Has Regression Range: --- → yes
Whiteboard: [fidedi-ope]

The leave-open keyword is there and there is no activity for 6 months.
:nrishel, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

Flags: needinfo?(nrishel)

(In reply to Release mgmt bot [:suhaib / :marco/ :calixte] from comment #19)

The leave-open keyword is there and there is no activity for 6 months.
:nrishel, maybe it's time to close this bug?
For more information, please visit auto_nag documentation.

comment #15 - again

Flags: needinfo?(nrishel)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: