Last Comment Bug 765227 - bgupdates upgrades the MozillaMaintenance twice, should only run on replace
: bgupdates upgrades the MozillaMaintenance twice, should only run on replace
Status: RESOLVED FIXED
:
Product: Toolkit
Classification: Components
Component: Application Update (show other bugs)
: Trunk
: x86_64 Windows 7
: -- normal (vote)
: mozilla16
Assigned To: :Ehsan Akhgari
:
: Robert Strong [:rstrong] (use needinfo to contact me)
Mentors:
Depends on:
Blocks: bgupdates
  Show dependency treegraph
 
Reported: 2012-06-15 07:03 PDT by Brian R. Bondy [:bbondy]
Modified: 2012-06-21 16:30 PDT (History)
3 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
+
fixed
+
fixed


Attachments
Patch (v1) (3.29 KB, patch)
2012-06-18 13:08 PDT, :Ehsan Akhgari
netzen: review+
akeybl: approval‑mozilla‑aurora+
Details | Diff | Splinter Review

Description Brian R. Bondy [:bbondy] 2012-06-15 07:03:50 PDT
During a bgupdate I noticed that maintenanceservice_installer gets executed at the apply stage and not the replace stage.
I'm not opposed to this but it made me think that I should make sure the correct maintenanceservice_installer.exe was being run. 
I ran procmon during a bgupdate and noticed that maintenanceservice_installer.exe gets executed from the installationdir and not the insallationdir\updated directory.
This means that we do execute the maintenanceservice updagrade with staged updates, but not with the correct maintenanceservice.exe.  It'll always be one version behind.

Expected results:
We should instead run maintenanceservice_installer.exe from the installdir/updated directory or do the whole service upgrade on the /replace stage.
I think from the installdir/updated directory is the easiest fix here.
The maintenanceservice upgrade should happen on the new files and not the old files.
Comment 1 :Ehsan Akhgari 2012-06-17 11:02:23 PDT
As far as I can see, we should be running maintenanceservice_installer.exe both during the staging and replace steps, from here: <http://mxr.mozilla.org/mozilla-central/source/toolkit/components/maintenanceservice/workmonitor.cpp#418>.  If the service does not get used, we only run the installer in the replace step <http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/updater/updater.cpp#2867> (note that in the staging step, we'll never have a callback application.)

I think it makes sense to only run the installer in the replace step, and we indeed use the wrong directory (the installation directory) when running it in the staging step in the first case above, but first I want to figure out why you're not seeing the installer get run in the replace step...
Comment 2 Brian R. Bondy [:bbondy] 2012-06-17 12:30:13 PDT
>  but first I want to figure out why you're not seeing the installer get run in the replace step...

Just verified and it runs both times, I previously only ran the test only up until the apply stage.  I updated the title to indicate that it should only run on the replace step.
Comment 3 :Ehsan Akhgari 2012-06-18 13:08:48 PDT
Created attachment 634150 [details] [diff] [review]
Patch (v1)
Comment 4 Brian R. Bondy [:bbondy] 2012-06-18 13:11:06 PDT
Comment on attachment 634150 [details] [diff] [review]
Patch (v1)

I'll take this review.
Comment 5 Brian R. Bondy [:bbondy] 2012-06-18 13:27:58 PDT
Comment on attachment 634150 [details] [diff] [review]
Patch (v1)

Review of attachment 634150 [details] [diff] [review]:
-----------------------------------------------------------------

When determining whether or not to run PostUpdate you do a check:
bool backgroundUpdate = (argc == 4 && !wcscmp(argv[3], L"-1"));

It would be better to call UpdateBeingStaged here to avoid code duplication.


I seen you pushed to Oak already, if tests pass and upgrades work you're good to go.

::: toolkit/components/maintenanceservice/workmonitor.cpp
@@ +88,5 @@
> + * @param argv    The argv value normally sent to updater.exe
> + * @param boolean True if we're staging an update
> + */
> +static bool
> +UpdateBeingStaged(int argc, LPWSTR *argv)

nit: IsUpdateBeingStaged

@@ +90,5 @@
> + */
> +static bool
> +UpdateBeingStaged(int argc, LPWSTR *argv)
> +{
> +  return (argc == 4 && !wcscmp(argv[3], L"-1"));

nit: there's no need for the surrounding parentheses.

@@ +112,5 @@
>    // Make sure that the path does not include trailing backslashes
>    if (backSlash && (backSlash[1] == L'\0')) {
>      *backSlash = L'\0';
>    }
>    // PID will be set to -1 if we're supposed to perform a background update.

nit: This comment would be better inside the IsUpdateBeingStaged function above the return.
Comment 6 :Ehsan Akhgari 2012-06-18 13:48:04 PDT
OK, will land with the nits addressed once I can confirm that the patch works correctly.
Comment 7 :Ehsan Akhgari 2012-06-19 09:04:11 PDT
I'm trying to test this with no success.  Updates are failing with error 19, and I don't have the fallback key (I double checked).  Not sure what's going on. :(
Comment 8 Brian R. Bondy [:bbondy] 2012-06-19 09:06:00 PDT
I'll give the build a spin now and report back.
Comment 9 Brian R. Bondy [:bbondy] 2012-06-19 09:13:46 PDT
So I tried using the second most recent build, but the first most recent isn't done yet.  I used the 3rd most recent upgrading to the 2nd most recent and the upgrade succeeded.  But the revision ID was not the tip, it was 91ba8802ad45.

Anyway I assume you did the same test on Oak and got error code 19.  I know you said you don't have the fallback key and that you double checked but maybe you should triple check! :P J/K (kindof).  Maybe you're using a VM and you checked the host registry instead of the VM's or something?

So a way you can debug this further on your machine is to open the updater.exe that you have the problem with in the resource editor and extract the embedded primary DER file which is used to verify the update.  Then check to see which one it matches in toolkit\mozapps\update\updater.
Comment 10 Brian R. Bondy [:bbondy] 2012-06-19 09:15:26 PDT
In Comment 9 I meant the revision number I started with was c1c0ad689e33.
Comment 11 :Ehsan Akhgari 2012-06-19 10:16:58 PDT
(In reply to Brian R. Bondy [:bbondy] from comment #9)
> So I tried using the second most recent build, but the first most recent
> isn't done yet.  I used the 3rd most recent upgrading to the 2nd most recent
> and the upgrade succeeded.  But the revision ID was not the tip, it was
> 91ba8802ad45.

91ba8802ad45 is oak's tip, isn't it?

> Anyway I assume you did the same test on Oak and got error code 19.  I know
> you said you don't have the fallback key and that you double checked but
> maybe you should triple check! :P J/K (kindof).  Maybe you're using a VM and
> you checked the host registry instead of the VM's or something?

I'll find another machine after lunch and will test it there.

> So a way you can debug this further on your machine is to open the
> updater.exe that you have the problem with in the resource editor and
> extract the embedded primary DER file which is used to verify the update. 
> Then check to see which one it matches in toolkit\mozapps\update\updater.

OK, will do.
Comment 12 :Ehsan Akhgari 2012-06-19 10:17:10 PDT
(In reply to Brian R. Bondy [:bbondy] from comment #10)
> In Comment 9 I meant the revision number I started with was c1c0ad689e33.

That's fine, isn't it?
Comment 13 Brian R. Bondy [:bbondy] 2012-06-19 10:20:28 PDT
It is the original patch landed ya, but I just wanted to indicate that in case you did a push afterwards, possibly addressing the nits that could have some other effect.
Comment 14 :Ehsan Akhgari 2012-06-19 10:37:51 PDT
Oh, no I was talking about the original patch.  I'll land the nits part now, but they shouldn't change the behavior.  We can re-verify on central if needed.
Comment 16 Brian R. Bondy [:bbondy] 2012-06-19 10:57:01 PDT
Did you find the reason for the problem on the failing machine by the way?
Comment 17 :Ehsan Akhgari 2012-06-19 11:06:52 PDT
The primary cert is nightly_aurora_level3_primary.der, and the backup cert is nightly_aurora_level3_secondary.der.
Comment 18 Brian R. Bondy [:bbondy] 2012-06-19 11:12:47 PDT
That looks correct. 
Could you respond to the following:

1) Were you testing with the exact same build on both machines?
2) Can you reproduce this problem with an older Oak build that is known to not get this error?
3) Could you triple verify that this does not exist on the machine with the error 19 in regedit.exe? SOFTWARE\Mozilla\MaintenanceService\3932ecacee736d366d6436db0f55bce4
Comment 19 :Ehsan Akhgari 2012-06-19 11:42:38 PDT
(In reply to Brian R. Bondy [:bbondy] from comment #18)
> That looks correct. 
> Could you respond to the following:
> 
> 1) Were you testing with the exact same build on both machines?
> 2) Can you reproduce this problem with an older Oak build that is known to
> not get this error?
> 3) Could you triple verify that this does not exist on the machine with the
> error 19 in regedit.exe?
> SOFTWARE\Mozilla\MaintenanceService\3932ecacee736d366d6436db0f55bce4

This happens with every oak build that I've tried on this machine, and the same builds work fine elsewhere.  The registry key is definitely not there (I usually rename the 3932ecacee736d366d6436db0f55bce4 key to xxxx-3932ecacee736d366d6436db0f55bce4 or something, and the latter name is in the registry but that shouldn't matter).

Anyways, I think I'll stop pursuing this in favor of more important work!
Comment 20 Brian R. Bondy [:bbondy] 2012-06-19 12:00:46 PDT
Telemetry data shows that error code 19 reports is virtually non-existent, so I'll chalk it up as something is wrong with your machine. If you do believe it to be some bug though please post a task and I can investigate further.
Comment 21 :Ehsan Akhgari 2012-06-19 12:46:06 PDT
Hmm, I looked using ProcMon, and the updater is indeed getting "not found" errors when it tries to open the fallback reg key path.  If I give you remote access to the machine, can you please take a look at it?
Comment 22 Brian R. Bondy [:bbondy] 2012-06-19 12:49:20 PDT
will do, let me know the info.  Just email me it.
Comment 23 Brian R. Bondy [:bbondy] 2012-06-19 13:46:59 PDT
We found that it was due to using an override app update service serving unsigned mars.  So the sign error was correct and there is no additional issue to investigate.
Comment 24 :Ehsan Akhgari 2012-06-21 13:44:57 PDT
Comment on attachment 634150 [details] [diff] [review]
Patch (v1)

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 307181
User impact if declined: The maintenance service would remain one version behind, all the time.
Testing completed (on m-c, etc.): locally and on m-c
Risk to taking this patch (and alternatives if risky): minimal
String or UUID changes made by this patch: none
Comment 25 Alex Keybl [:akeybl] 2012-06-21 15:38:53 PDT
Comment on attachment 634150 [details] [diff] [review]
Patch (v1)

[Triage Comment]
Approving for Aurora 15 in support of landing bug 767125 in the next few days.

Note You need to log in before you can comment on or make changes to this bug.