757835 - Background updates should fall back to not using the service if there is a service error

Reporter

Description

•

12 years ago

If there is a service error, then updates should fallback to using a normal update without the service.  This no longer works with background updates. 

See the relevant code here:
http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#1471

Steps to reproduce:
1) Install a background update build
2) If the fix for bug 757711 hasn't landed, then go to 3).  If it has landed, then manually install an older service before the background updates landed.  (Note you can use any service error, such as signature check, but I'm just giving steps for 1 particular one though).
3) Go to the about dialog.
4) Says update failed. Download the latest version
5) Look at logs and other info:
  - Maintenance service log:  Error checking if the updaters are the same.
  - Update.status contains failed: 27
6) On the next browser restart sets the update.status to pending, but the MAR doesn't exist.
6.alt) If you go back to the about dialog the update fails again.

Expected results:
Both 6) and 6.alt) should fallback to not using the service.
For some reason the MAR is being deleted in this case and it shouldn't be, not deleting the MAR on these types of errors might be all that's needed to fix this. 

You can reproduce this if the fix for Bug 757711 hasn't landed yet.  You can also reproduce this by manually uninstalling

(no longer active)

Assignee

Comment 1

•

12 years ago

Attached patch Possible fix — Details — Splinter Review

Assignee: nobody → ehsan

(no longer active)

Assignee

Updated

•

12 years ago

Blocks: bgupdates

No longer depends on: bgupdates

(no longer active)

Assignee

Updated

•

12 years ago

Attachment #626465 - Flags: review?(robert.bugzilla)

(no longer active)

Assignee

Comment 2

•

12 years ago

Brian, can you please download this Nightly which has the patch attached here and see if it fixes the problem for you?

http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-05-23-09-34-09-oak/

Thanks!

Brian R. Bondy [:bbondy]

Reporter

Comment 3

•

12 years ago

> Brian, can you please download this Nightly which has the patch attached here > and see if it fixes the problem for you?
>
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-05-23-09-34-09-oak/

I tried it with this build.
It still fails but in a different way now from Comment 0.  
Before it would say update failed.  Now it just hangs on applying update...

Steps to reproduce:
1) Install http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-05-23-09-34-09-oak/
2) Uninstalled maintenance service from control panel
3) Run "C:\Program Files (x86)\Aurora\maintenanceservice_installer.exe" which installs an older service
4) Open Firefox and go to About, it'll hang on applying...
5) A single maintenanceservice.log is generated:
 
> Executing service command software-update, ID: 62f8128a-64c7-44f0-a856-1ec822f37a89
> Error checking if the updaters are the same.
> Path 1: C:\Users\bbondy\AppData\Local\Mozilla\Firefox\Nightly\updates\0\updater.exe
> Path 2: C:\Program Files (x86)\Nightly\updated\updater.exe
> Service command software-update complete.
> service command MozillaMaintenance complete with result: Failure.

I tried to check update.status at:
C:\Users\bbondy\AppData\Local\Mozilla\Firefox\Nightly\updates\0
But it did not exist.

(no longer active)

Assignee

Comment 4

•

12 years ago

I suspect that comment 3 will be fixed with bug 757885.  New nightlies with that patch are on the way.

Robert Strong (they/them - no direct email)

Comment 5

•

12 years ago

Comment on attachment 626465 [details] [diff] [review]
Possible fix

Since there is no response to comment #3 r-'ing

Attachment #626465 - Flags: review?(robert.bugzilla) → review-

Robert Strong (they/them - no direct email)

Comment 6

•

12 years ago

bah... I hadn't refreshed the page so I didn't see comment #4. Bug 757885 needs a new patch though so leaving as r-

(no longer active)

Assignee

Comment 7

•

12 years ago

FWIW, we're waiting for builds in order to verify if this has been fixed or not.

Robert Strong (they/them - no direct email)

Comment 8

•

12 years ago

Comment on attachment 626465 [details] [diff] [review]
Possible fix

Looking

Attachment #626465 - Flags: review- → review?

Robert Strong (they/them - no direct email)

Comment 9

•

12 years ago

Comment on attachment 626465 [details] [diff] [review]
Possible fix

The only concern I have is that staging the update will increment the fail count and stop the service from being used when it is able to apply the update.

I think they should be separate but I'd like to get bbondy and your opinion about whether they should be separated. If we think they should then it can be done in a followup bug.

Attachment #626465 - Flags: review? → review+

(no longer active)

Assignee

Comment 10

•

12 years ago

Hmm...  So I thought about this a bit, and I don't think that they should be separate.  The service failing to stage an update is almost as serious as any other failure scenario, so separating them doesn't make a lot of sense to me.  This will indeed cut down the number of tries on update operations in half if the service is completely broken for a user, but unless there's some real science behind why we chose 10 as the threshold number, I don't feel particularly bad about that.

But of course Brian should weigh in here.

Robert Strong (they/them - no direct email)

Comment 11

•

12 years ago

The issue I have with that is if staging consistently fails (for example, if a user is near the limit on disk space available) and bypassing the UAC consistently succeeds then the bypassing of the UAC will be disabled for the user. From the point of view of "it should all just work" I agree with your assessment but from the point of view of "minimizing user pain" I think it might make sense to go with separate preferences for controlling this.

(no longer active)

Assignee

Comment 12

•

12 years ago

Hmm, yeah OK this is a good point which I had not thought about before.  If Brian agrees, I'm willing to file a follow-up.

Just one question, would this block us re-enabling background updates?

Brian R. Bondy [:bbondy]

Reporter

Comment 13

•

12 years ago

So the service related errors that increment the fail count are:

> SERVICE_UPDATER_COULD_NOT_BE_STARTED
> SERVICE_NOT_ENOUGH_COMMAND_LINE_ARGS
> SERVICE_UPDATER_SIGN_ERROR
> SERVICE_UPDATER_COMPARE_ERROR 
> SERVICE_UPDATER_IDENTITY_ERROR
> SERVICE_STILL_APPLYING_ON_SUCCESS
> SERVICE_STILL_APPLYING_ON_FAILURE
> SERVICE_UPDATER_NOT_FIXED_DRIVE
> SERVICE_COULD_NOT_LOCK_UPDATER
> SERVICE_INSTALLDIR_ERROR

These errors could happen from a software bug or from a releng related problem like something wasn't signed correctly.
Or from someone trying to do an attack.

I don't think that something like low disk space would hit any of these errors. 

What would having 2 preferences for staved vs replace solve? When either one of them reached max we are better off just skipping background updates completely, and I think we already do.  Maybe having a app.update.service.backroundupdates.errors that when it reaches 10 would disable background updates but not the service.  The service would then disable itself on its own merits.

(no longer active)

Assignee

Comment 14

•

12 years ago

(In reply to Brian R. Bondy [:bbondy] from comment #13)
> What would having 2 preferences for staved vs replace solve? When either one
> of them reached max we are better off just skipping background updates
> completely, and I think we already do.  Maybe having a
> app.update.service.backroundupdates.errors that when it reaches 10 would
> disable background updates but not the service.  The service would then
> disable itself on its own merits.

That would only be useful if the causes which would make the service fail to stage an update were inherently different than the causes which would make the service fail to apply an update on a restart.  But the opposite seems to be the case to me.

Brian R. Bondy [:bbondy]

Reporter

Comment 15

•

12 years ago

I think from Comment 14 my Comment 13 wasn't understood correctly.

I was suggesting that maybe background updates (both staging + replacing) will have a bug and cause some of those errors.  But using the service without background updates wouldn't hit those errors because it didn't have that same bug.  So if we were doing a new count it should be on background updates vs not background updates.
And not a count on staged vs replacing.

(no longer active)

Assignee

Updated

•

12 years ago

Blocks: 758101

Robert Strong (they/them - no direct email)

Comment 16

•

12 years ago

(In reply to Brian R. Bondy [:bbondy] from comment #13)
> So the service related errors that increment the fail count are:
> 
> > SERVICE_UPDATER_COULD_NOT_BE_STARTED
> > SERVICE_NOT_ENOUGH_COMMAND_LINE_ARGS
> > SERVICE_UPDATER_SIGN_ERROR
> > SERVICE_UPDATER_COMPARE_ERROR 
> > SERVICE_UPDATER_IDENTITY_ERROR
> > SERVICE_STILL_APPLYING_ON_SUCCESS
> > SERVICE_STILL_APPLYING_ON_FAILURE
> > SERVICE_UPDATER_NOT_FIXED_DRIVE
> > SERVICE_COULD_NOT_LOCK_UPDATER
> > SERVICE_INSTALLDIR_ERROR
> 
> These errors could happen from a software bug or from a releng related
> problem like something wasn't signed correctly.
> Or from someone trying to do an attack.
> 
> I don't think that something like low disk space would hit any of these
> errors. 
If during the staging process we ran out of disk space the error count would increment... would it not?

Brian R. Bondy [:bbondy]

Reporter

Comment 17

•

12 years ago

No I don't think the error count would increment.  I think we'd probably get a write error (WRITE_ERROR 7) in that case.  The fail count is only for error codes which can come from the service only and cannot happen from normal updates.

Brian R. Bondy [:bbondy]

Reporter

Comment 18

•

12 years ago

What I was thinking above is of having some kind of fail safety for disabling background updates for lots of errors but not disabling silent updates using the service. It would be independent of the current fail count.

Brian R. Bondy [:bbondy]

Reporter

Comment 19

•

12 years ago

If we do a background specific fail count like I was suggesting though, I think the error codes should include more than just the service related error codes though as Rob mentioned.

Brian R. Bondy [:bbondy]

Reporter

Comment 20

•

12 years ago

Actually it would increment in the background update specific errors only and not have any service errors at all.  That way we aren't double tolerant for errors like a signing problem by having 2 counts.

Brian R. Bondy [:bbondy]

Reporter

Comment 21

•

12 years ago

I tried this with today's Nightlies and it's still failing.
The problem is that we're deleting the MAR file and we shouldn't be.

Here are more detailed steps, please read:
0) I turnd on app.update.stage.enabled;true (since it is disabled by default currently)
1) Followed steps 1-5 in Comment 3, the maintenanceservice.log is the same as in comment 3, it says "Update failed. Download the latest version" and the update.status file exists and says failed: 27 
2) The MAR no longer exists.
3) If I close and re-open the about dialog it downloads, applies and gets the same error as I mentioned in #1 of this comment.
4) If I close the browser while update.status has failed: 27, it tries to fallback by setting the update.status to pending.  On the next browser restart after that it should take the update, but since no MAR exists the update fails and it clears the update directory.  If you go back to the about dialog you loop back to #1 of this comment.

(no longer active)

Assignee

Comment 22

•

12 years ago

(In reply to Brian R. Bondy [:bbondy] from comment #20)
> Actually it would increment in the background update specific errors only
> and not have any service errors at all.  That way we aren't double tolerant
> for errors like a signing problem by having 2 counts.

What do you consider as background update specific errors?  Any errors (including service errors) which happen during staging?

(no longer active)

Assignee

Comment 23

•

12 years ago

(In reply to Brian R. Bondy [:bbondy] from comment #21)
> I tried this with today's Nightlies and it's still failing.
> The problem is that we're deleting the MAR file and we shouldn't be.
> 
> Here are more detailed steps, please read:
> 0) I turnd on app.update.stage.enabled;true (since it is disabled by default
> currently)
> 1) Followed steps 1-5 in Comment 3, the maintenanceservice.log is the same
> as in comment 3, it says "Update failed. Download the latest version" and
> the update.status file exists and says failed: 27 
> 2) The MAR no longer exists.
> 3) If I close and re-open the about dialog it downloads, applies and gets
> the same error as I mentioned in #1 of this comment.
> 4) If I close the browser while update.status has failed: 27, it tries to
> fallback by setting the update.status to pending.  On the next browser
> restart after that it should take the update, but since no MAR exists the
> update fails and it clears the update directory.  If you go back to the
> about dialog you loop back to #1 of this comment.

You should test this with an Oak nightly such as <http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-05-23-17-03-39-oak/>

Brian R. Bondy [:bbondy]

Reporter

Comment 24

•

12 years ago

Oops ya Comment 21 was tested on m-c, I re-tested on Oak and the fallback is working great.  So Comment 3 is no longer a concern once the related patches land on m-c.

Brian R. Bondy [:bbondy]

Reporter

Comment 25

•

12 years ago

> What do you consider as background update specific errors?  
> Any errors (including service errors) which happen during staging?

No I think the service errors would continue to update the app.update.service.errors no matter where they came from. I don't think any of your code specifically changes how those errors in particular would happen. 

If we did do a new pref though...
If it is any error that is thrown from within the new code that you added for background updates, then it would update another pref app.update.stage.errors.
Such as an error when moving the folder for example.   
I'm fine with not doing this other pref though.

(no longer active)

Assignee

Comment 26

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/9a223634160b

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla15

Brian R. Bondy [:bbondy]

Reporter

Comment 27

•

12 years ago

Re: app.update.stage.errors

If we did that maybe for all errors except for service errors if that's easier.

(no longer active)

Assignee

Comment 28

•

12 years ago

(In reply to Brian R. Bondy [:bbondy] from comment #27)
> Re: app.update.stage.errors
> 
> If we did that maybe for all errors except for service errors if that's
> easier.

I'd be fine with that.

Robin Steuber (they/them) [:bytesized]

Updated

•

10 months ago

Updated

•

10 months ago