Closed Bug 757835 Opened 9 years ago Closed 9 years ago

Background updates should fall back to not using the service if there is a service error

Categories

(Toolkit :: Application Update, defect)

12 Branch
x86_64
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla15

People

(Reporter: bbondy, Assigned: ehsan)

References

Details

Attachments

(1 file)

If there is a service error, then updates should fallback to using a normal update without the service.  This no longer works with background updates. 

See the relevant code here:
http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#1471

Steps to reproduce:
1) Install a background update build
2) If the fix for bug 757711 hasn't landed, then go to 3).  If it has landed, then manually install an older service before the background updates landed.  (Note you can use any service error, such as signature check, but I'm just giving steps for 1 particular one though).
3) Go to the about dialog.
4) Says update failed. Download the latest version
5) Look at logs and other info:
  - Maintenance service log:  Error checking if the updaters are the same.
  - Update.status contains failed: 27
6) On the next browser restart sets the update.status to pending, but the MAR doesn't exist.
6.alt) If you go back to the about dialog the update fails again.

Expected results:
Both 6) and 6.alt) should fallback to not using the service.
For some reason the MAR is being deleted in this case and it shouldn't be, not deleting the MAR on these types of errors might be all that's needed to fix this. 

You can reproduce this if the fix for Bug 757711 hasn't landed yet.  You can also reproduce this by manually uninstalling
Attached patch Possible fixSplinter Review
Assignee: nobody → ehsan
Blocks: bgupdates
No longer depends on: bgupdates
Attachment #626465 - Flags: review?(robert.bugzilla)
Brian, can you please download this Nightly which has the patch attached here and see if it fixes the problem for you?

http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-05-23-09-34-09-oak/

Thanks!
> Brian, can you please download this Nightly which has the patch attached here > and see if it fixes the problem for you?
>
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-05-23-09-34-09-oak/

I tried it with this build.
It still fails but in a different way now from Comment 0.  
Before it would say update failed.  Now it just hangs on applying update...

Steps to reproduce:
1) Install http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-05-23-09-34-09-oak/
2) Uninstalled maintenance service from control panel
3) Run "C:\Program Files (x86)\Aurora\maintenanceservice_installer.exe" which installs an older service
4) Open Firefox and go to About, it'll hang on applying...
5) A single maintenanceservice.log is generated:
 
> Executing service command software-update, ID: 62f8128a-64c7-44f0-a856-1ec822f37a89
> Error checking if the updaters are the same.
> Path 1: C:\Users\bbondy\AppData\Local\Mozilla\Firefox\Nightly\updates\0\updater.exe
> Path 2: C:\Program Files (x86)\Nightly\updated\updater.exe
> Service command software-update complete.
> service command MozillaMaintenance complete with result: Failure.

I tried to check update.status at:
C:\Users\bbondy\AppData\Local\Mozilla\Firefox\Nightly\updates\0
But it did not exist.
I suspect that comment 3 will be fixed with bug 757885.  New nightlies with that patch are on the way.
Comment on attachment 626465 [details] [diff] [review]
Possible fix

Since there is no response to comment #3 r-'ing
Attachment #626465 - Flags: review?(robert.bugzilla) → review-
bah... I hadn't refreshed the page so I didn't see comment #4. Bug 757885 needs a new patch though so leaving as r-
FWIW, we're waiting for builds in order to verify if this has been fixed or not.
Comment on attachment 626465 [details] [diff] [review]
Possible fix

Looking
Attachment #626465 - Flags: review- → review?
Comment on attachment 626465 [details] [diff] [review]
Possible fix

The only concern I have is that staging the update will increment the fail count and stop the service from being used when it is able to apply the update.

I think they should be separate but I'd like to get bbondy and your opinion about whether they should be separated. If we think they should then it can be done in a followup bug.
Attachment #626465 - Flags: review? → review+
Hmm...  So I thought about this a bit, and I don't think that they should be separate.  The service failing to stage an update is almost as serious as any other failure scenario, so separating them doesn't make a lot of sense to me.  This will indeed cut down the number of tries on update operations in half if the service is completely broken for a user, but unless there's some real science behind why we chose 10 as the threshold number, I don't feel particularly bad about that.

But of course Brian should weigh in here.
The issue I have with that is if staging consistently fails (for example, if a user is near the limit on disk space available) and bypassing the UAC consistently succeeds then the bypassing of the UAC will be disabled for the user. From the point of view of "it should all just work" I agree with your assessment but from the point of view of "minimizing user pain" I think it might make sense to go with separate preferences for controlling this.
Hmm, yeah OK this is a good point which I had not thought about before.  If Brian agrees, I'm willing to file a follow-up.

Just one question, would this block us re-enabling background updates?
So the service related errors that increment the fail count are:

> SERVICE_UPDATER_COULD_NOT_BE_STARTED
> SERVICE_NOT_ENOUGH_COMMAND_LINE_ARGS
> SERVICE_UPDATER_SIGN_ERROR
> SERVICE_UPDATER_COMPARE_ERROR 
> SERVICE_UPDATER_IDENTITY_ERROR
> SERVICE_STILL_APPLYING_ON_SUCCESS
> SERVICE_STILL_APPLYING_ON_FAILURE
> SERVICE_UPDATER_NOT_FIXED_DRIVE
> SERVICE_COULD_NOT_LOCK_UPDATER
> SERVICE_INSTALLDIR_ERROR

These errors could happen from a software bug or from a releng related problem like something wasn't signed correctly.
Or from someone trying to do an attack.

I don't think that something like low disk space would hit any of these errors. 

What would having 2 preferences for staved vs replace solve? When either one of them reached max we are better off just skipping background updates completely, and I think we already do.  Maybe having a app.update.service.backroundupdates.errors that when it reaches 10 would disable background updates but not the service.  The service would then disable itself on its own merits.
(In reply to Brian R. Bondy [:bbondy] from comment #13)
> What would having 2 preferences for staved vs replace solve? When either one
> of them reached max we are better off just skipping background updates
> completely, and I think we already do.  Maybe having a
> app.update.service.backroundupdates.errors that when it reaches 10 would
> disable background updates but not the service.  The service would then
> disable itself on its own merits.

That would only be useful if the causes which would make the service fail to stage an update were inherently different than the causes which would make the service fail to apply an update on a restart.  But the opposite seems to be the case to me.
I think from Comment 14 my Comment 13 wasn't understood correctly.

I was suggesting that maybe background updates (both staging + replacing) will have a bug and cause some of those errors.  But using the service without background updates wouldn't hit those errors because it didn't have that same bug.  So if we were doing a new count it should be on background updates vs not background updates.
And not a count on staged vs replacing.
Blocks: 758101
(In reply to Brian R. Bondy [:bbondy] from comment #13)
> So the service related errors that increment the fail count are:
> 
> > SERVICE_UPDATER_COULD_NOT_BE_STARTED
> > SERVICE_NOT_ENOUGH_COMMAND_LINE_ARGS
> > SERVICE_UPDATER_SIGN_ERROR
> > SERVICE_UPDATER_COMPARE_ERROR 
> > SERVICE_UPDATER_IDENTITY_ERROR
> > SERVICE_STILL_APPLYING_ON_SUCCESS
> > SERVICE_STILL_APPLYING_ON_FAILURE
> > SERVICE_UPDATER_NOT_FIXED_DRIVE
> > SERVICE_COULD_NOT_LOCK_UPDATER
> > SERVICE_INSTALLDIR_ERROR
> 
> These errors could happen from a software bug or from a releng related
> problem like something wasn't signed correctly.
> Or from someone trying to do an attack.
> 
> I don't think that something like low disk space would hit any of these
> errors. 
If during the staging process we ran out of disk space the error count would increment... would it not?
No I don't think the error count would increment.  I think we'd probably get a write error (WRITE_ERROR 7) in that case.  The fail count is only for error codes which can come from the service only and cannot happen from normal updates.
What I was thinking above is of having some kind of fail safety for disabling background updates for lots of errors but not disabling silent updates using the service. It would be independent of the current fail count.
If we do a background specific fail count like I was suggesting though, I think the error codes should include more than just the service related error codes though as Rob mentioned.
Actually it would increment in the background update specific errors only and not have any service errors at all.  That way we aren't double tolerant for errors like a signing problem by having 2 counts.
I tried this with today's Nightlies and it's still failing.
The problem is that we're deleting the MAR file and we shouldn't be.

Here are more detailed steps, please read:
0) I turnd on app.update.stage.enabled;true (since it is disabled by default currently)
1) Followed steps 1-5 in Comment 3, the maintenanceservice.log is the same as in comment 3, it says "Update failed. Download the latest version" and the update.status file exists and says failed: 27 
2) The MAR no longer exists.
3) If I close and re-open the about dialog it downloads, applies and gets the same error as I mentioned in #1 of this comment.
4) If I close the browser while update.status has failed: 27, it tries to fallback by setting the update.status to pending.  On the next browser restart after that it should take the update, but since no MAR exists the update fails and it clears the update directory.  If you go back to the about dialog you loop back to #1 of this comment.
(In reply to Brian R. Bondy [:bbondy] from comment #20)
> Actually it would increment in the background update specific errors only
> and not have any service errors at all.  That way we aren't double tolerant
> for errors like a signing problem by having 2 counts.

What do you consider as background update specific errors?  Any errors (including service errors) which happen during staging?
(In reply to Brian R. Bondy [:bbondy] from comment #21)
> I tried this with today's Nightlies and it's still failing.
> The problem is that we're deleting the MAR file and we shouldn't be.
> 
> Here are more detailed steps, please read:
> 0) I turnd on app.update.stage.enabled;true (since it is disabled by default
> currently)
> 1) Followed steps 1-5 in Comment 3, the maintenanceservice.log is the same
> as in comment 3, it says "Update failed. Download the latest version" and
> the update.status file exists and says failed: 27 
> 2) The MAR no longer exists.
> 3) If I close and re-open the about dialog it downloads, applies and gets
> the same error as I mentioned in #1 of this comment.
> 4) If I close the browser while update.status has failed: 27, it tries to
> fallback by setting the update.status to pending.  On the next browser
> restart after that it should take the update, but since no MAR exists the
> update fails and it clears the update directory.  If you go back to the
> about dialog you loop back to #1 of this comment.

You should test this with an Oak nightly such as <http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-05-23-17-03-39-oak/>
Oops ya Comment 21 was tested on m-c, I re-tested on Oak and the fallback is working great.  So Comment 3 is no longer a concern once the related patches land on m-c.
> What do you consider as background update specific errors?  
> Any errors (including service errors) which happen during staging?

No I think the service errors would continue to update the app.update.service.errors no matter where they came from. I don't think any of your code specifically changes how those errors in particular would happen. 

If we did do a new pref though...
If it is any error that is thrown from within the new code that you added for background updates, then it would update another pref app.update.stage.errors.
Such as an error when moving the folder for example.   
I'm fine with not doing this other pref though.
https://hg.mozilla.org/mozilla-central/rev/9a223634160b
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla15
Re: app.update.stage.errors

If we did that maybe for all errors except for service errors if that's easier.
(In reply to Brian R. Bondy [:bbondy] from comment #27)
> Re: app.update.stage.errors
> 
> If we did that maybe for all errors except for service errors if that's
> easier.

I'd be fine with that.
Duplicate of this bug: 757717
You need to log in before you can comment on or make changes to this bug.