Closed Bug 735713 Opened 8 years ago Closed 8 years ago

Version downgrade checks are wrong causing upgrades to be broken

Categories

(Toolkit :: Application Update, defect, critical)

x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla14
Tracking Status
firefox12 + verified
firefox13 + verified
firefox14 + verified

People

(Reporter: ehsan, Assigned: ehsan)

References

Details

(Keywords: regression, Whiteboard: [qa+])

Attachments

(2 files, 2 obsolete files)

The NS_CompareVersion call which we do is broken.  We check for -1, which happens when appVersion is *less* than the version of the mar file, but in that case we return downgrade error.

This is affecting Nightly upgrades from 13.0a1 to 14.0a1, which means that we should probably reach out to Nightly users somehow and ask them to reinstall Firefox.  We could also downgrade central to 13.0a1 so that the updates start to work again, but that won't fix the problem for users who don't update in the period where m-c is downgraded.

The bug is also on Aurora, but it won't kick in until the next migration where the Aurora version is bumped, so we can easily fix it there.

I have a patch, which is coming up shortly.
Is bug 735693 a dupe of this or vice versa?
Attached patch Patch (v1) (obsolete) — Splinter Review
I will also see if I can write a test for this.
Assignee: nobody → ehsan
Status: NEW → ASSIGNED
Attachment #605794 - Flags: review?(robert.bugzilla)
hrm I tested a major upgrade on elm
Attached patch Patch (v1)Splinter Review
This is the right patch!
Attachment #605794 - Attachment is obsolete: true
Attachment #605794 - Flags: review?(robert.bugzilla)
Attachment #605800 - Flags: review?(robert.bugzilla)
Duplicate of this bug: 735693
We can fix this by just hardcoding a value in the generated mar files of 13.0a1
I'm confused though because I did try a version upgrade on elm here:
https://tbpl.mozilla.org/?tree=Elm&rev=d3befcff0d50 and it passed.

The test suite does cover a version change but it fails the sign check so passes for that reason. I'll submit a patch in another ticket for the test suite.
I've shut off Windows nightly updates:
[ffxbld@dm-ausstage01 mozilla-central]$ ls -l
total 256
drwxr-xr-x 13 ffxbld users 20480 Mar 14 09:32 Darwin_x86_64-gcc3
lrwxrwxrwx  1 ffxbld users    18 Sep 27  2010 Darwin_x86_64-gcc3-u-i386-x86_64 -> Darwin_x86_64-gcc3
lrwxrwxrwx  1 root   root     18 Oct  4  2010 Darwin_x86-gcc3-u-i386-x86_64 -> Darwin_x86_64-gcc3
lrwxrwxrwx  1 root   root     18 Oct  1  2010 Darwin_x86-gcc3-u-ppc-i386 -> Darwin_x86_64-gcc3
drwxr-xr-x 13 ffxbld users 69632 Mar 14 04:46 Linux_x86_64-gcc3
drwxr-xr-x 13 ffxbld users 73728 Mar 14 04:41 Linux_x86-gcc3
drwx------ 13 ffxbld users 12288 Mar 14 05:52 WINNT_x86_64-msvc
drwx------ 13 ffxbld users 73728 Mar 14 07:56 WINNT_x86-msvc

Need to chmod back to 755 when we're ready.
So firstly I think Ehsan should land his fix on m-c.

---

I mentioned a plan about hardcoding a version in MAR files that would cover most everyone.  

The following though is another plan that would leave no users:

Could we return a special MAR from the AUS server if the current version is v13 and the release channel is mozilla-central?

1) Strip the official win32 x86 MAR's signature:
mar -r Official_Signed_MC_Mar_With_Ehsans_fix.mar not_signed.mar

2) Hard code 13.0a1 into that official MAR:
mar -H firefox-mozilla-central -V 13.0a1 -i not_signed.mar

3) Re-sign that MAR:
mar -d NSSConfigDir -n certname -s not_signed.mar special_signed.mar


When a request comes in from a v13 x86 build running on m-c direct the user to special_signed.mar
(In reply to Brian R. Bondy [:bbondy] from comment #8)
> So firstly I think Ehsan should land his fix on m-c.
> 
> ---
> 
> I mentioned a plan about hardcoding a version in MAR files that would cover
> most everyone.  
> 
> The following though is another plan that would leave no users:
> 
> Could we return a special MAR from the AUS server if the current version is
> v13 and the release channel is mozilla-central?

This might be difficult because of how nightly updates work on AUS...Nick, do you think this is possible, and safe?
(In reply to Brian R. Bondy [:bbondy] from comment #6)
> We can fix this by just hardcoding a value in the generated mar files of
> 13.0a1
> I'm confused though because I did try a version upgrade on elm here:
> https://tbpl.mozilla.org/?tree=Elm&rev=d3befcff0d50 and it passed.

I constructed artificial version numbers 13.0a1 and 13.0a2 in the debugger and verified that NS_CompareVersions still returns -1.  It is also easy to verify by looking at its implementation.  So I'm not sure how that happened.  :/
I'm not sure, I did that push purposely to try a Nightly upgrade with version increased. Maybe I hit another nightly upgrade after it I'm not sure.  I verified manually as well and your patch is correct ehsan.
https://hg.mozilla.org/mozilla-central/rev/a888f210af4e
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla14
Attached patch Hardcoding 13.0a1 (obsolete) — Splinter Review
This patch is related to, and only applicable to if we use:

bbondy:
> I mentioned a plan about hardcoding a version in MAR files that would cover most everyone.  

---

Should we decide to do that, 13.0a1 users will be able to consume the MAR files with the fix to it.  We should leave this temporary push on for however long we feel safe that users will be updated.  

In that time period users who manually download 14.0a1 from the web will get version downgrade errors until we revert this patch.
Attachment #605848 - Flags: feedback?(robert.bugzilla)
Comment on attachment 605800 [details] [diff] [review]
Patch (v1)

[Approval Request Comment]
Regression caused by (bug #): 708690
User impact if declined: Nobody would be able to upgrade to future versions on Windows.
Testing completed (on m-c, etc.): Verified the fix in a debugger
Risk to taking this patch (and alternatives if risky): No evident risks.
String changes made by this patch: None.
Attachment #605800 - Flags: approval-mozilla-beta?
Attachment #605800 - Flags: approval-mozilla-aurora?
btw since this is already marked as resolved/fixed Ill post another bug with a patch once we decide on which option we are moving forward with.
(In reply to Brian R. Bondy [:bbondy] from comment #8)
> When a request comes in from a v13 x86 build running on m-c direct the user
> to special_signed.mar

We should do the same for x64 as well, right?
(In reply to Brian R. Bondy [:bbondy] from comment #15)
> btw since this is already marked as resolved/fixed Ill post another bug with
> a patch once we decide on which option we are moving forward with.

Good idea, especially since the rest of this doesn't affect Aurora or Beta.
Brian said that he is going to fix the tests so that they would be able to catch this kind of thing in the future.
Flags: in-testsuite?
> We should do the same for x64 as well, right?

No, the checks are only being done on x86.  The check is gated by MOZ_VERIFY_MAR_SIGNATURE which is only set for windows x86 builds.
Blocks: 708690
Keywords: regression
Blocks: 735784
Comment on attachment 605848 [details] [diff] [review]
Hardcoding 13.0a1

Tracking this patch in Bug 735784
Attachment #605848 - Attachment is obsolete: true
Attachment #605848 - Flags: feedback?(robert.bugzilla)
Comment on attachment 605848 [details] [diff] [review]
Hardcoding 13.0a1

I *think* it would be better to create a single mar to fix the installations affected instead of going this route and since productVersion can be changed on the command line this patch shouldn't be necessary. As stated previously, what would be needed is to direct all nightly users using 13.* or the subset that require the version check to this mar. I'd like nthomas to weigh in on this before going other routes.
Attachment #605848 - Attachment is obsolete: false
Comment on attachment 605848 [details] [diff] [review]
Hardcoding 13.0a1

Re-hiding this patch since it is avail in: Bug 735784 
I think the obsolete flag was removed when feedback was added unintentionally.
Attachment #605848 - Attachment is obsolete: true
Just wanted to mention for RelEng that once this lands on beta and aurora we should spin a new nightly but not change the version numbers to indicate a new version.  Example if it is 13.0a1 on aurora it should remain 13.0a1.  If it is 12.0a1 on beta it should remain 12.0a1.
Versions have already been bumped on aurora and beta. 12.0b1 is in progress with the code that merged yesterday from aurora. Is that going to be a problem?

And on aurora, updates are disabled right now, and we could keep them that way until we have a fix in hand. But if the broken code has been on aurora for awhile already I don't know what to do.
I don't think there is a problem on aurora nor beta as long as the version numbers are not bumped further.  

People who had beta didn't have the the new security checks, they will be upgraded now to a build that does have the security checks but there is no problem if the version stays the same.

People who had aurora had the security checks but if they failed it would fall back to using an update without the security checks with a uac prompt on the next browser restart.
In case Comment 25 was a bit misleading...
Ehsan's fix should be applied as soon as possible to aurora and beta, I was just explaining about why people aren't affected today on those channels.
Comment on attachment 605800 [details] [diff] [review]
Patch (v1)

[Triage Comment]
Approved for Aurora 13 to unbreak future updates.
Attachment #605800 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 605800 [details] [diff] [review]
Patch (v1)

[Triage Comment]
Approved for Beta 12 as well. How can we 100% verify whether we need to re-spin our first beta Brian?
Attachment #605800 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
> How can we 100% verify whether we need to re-spin our first beta Brian?

If you don't re-spin with the same version number, then the b1 users will try to upgrade to b2 with the service and fail because of the version downgrade error. They will however fallback to using the old update process without the security checks on the next browser restart. So they will be able to get the good b2 either way.

Relating to aurora users:
They need to upgrade to a nightly build with an unchanged version number.  There is no fallback since the m-c code was migrated to aurora. 

To be 100% I think we would need to setup a cloned / signed in the same way branch and test out the scenarios.
I backed this out from all branches for test failures like this:

https://tbpl.mozilla.org/php/getParsedLog.php?id=10070674&tree=Firefox&full=1#error0

It seems like the version inside the mar file is an older one, and the update was being applied previously because of this bug, and the fix causes the update to not be applied any more.

Brian said he'll investigate the version of the mar files.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
The mar files have 12.0a1 so ehsan's patch is correct but the test is testing the opposite of what it should.  I will strip the MAR signature, bump up the version number, resign and push my patch and ehsan's to try.
I suggested that we should use "*" as the mar version number which is the highest possible version that NS_CompareVersions knows about.
Attached patch New MAR filesSplinter Review
This is simply new MAR files generated with a max version number to go along with ehsan's patch.

Ehsan, could you conditionally r+ this as long as it passes with your patch in try?  I will carry forward the a+ since it is covered by your patch.
Attachment #606002 - Flags: review?(ehsan)
It uses * as the version number and the signatures have changed.  Otherwise they are the same as before.
Comment on attachment 606002 [details] [diff] [review]
New MAR files

r+ based on Brian's description (nope, didn't actually review the binary content ;)
Attachment #606002 - Flags: review?(ehsan) → review+
(In reply to Ben Hearsum [:bhearsum] from comment #7)
> I've shut off Windows nightly updates:
> [ffxbld@dm-ausstage01 mozilla-central]$ ls -l
> total 256
> drwxr-xr-x 13 ffxbld users 20480 Mar 14 09:32 Darwin_x86_64-gcc3
> lrwxrwxrwx  1 ffxbld users    18 Sep 27  2010
> Darwin_x86_64-gcc3-u-i386-x86_64 -> Darwin_x86_64-gcc3
> lrwxrwxrwx  1 root   root     18 Oct  4  2010 Darwin_x86-gcc3-u-i386-x86_64
> -> Darwin_x86_64-gcc3
> lrwxrwxrwx  1 root   root     18 Oct  1  2010 Darwin_x86-gcc3-u-ppc-i386 ->
> Darwin_x86_64-gcc3
> drwxr-xr-x 13 ffxbld users 69632 Mar 14 04:46 Linux_x86_64-gcc3
> drwxr-xr-x 13 ffxbld users 73728 Mar 14 04:41 Linux_x86-gcc3
> drwx------ 13 ffxbld users 12288 Mar 14 05:52 WINNT_x86_64-msvc
> drwx------ 13 ffxbld users 73728 Mar 14 07:56 WINNT_x86-msvc
> 
> Need to chmod back to 755 when we're ready.

I chmod'ed these back yesterday after working nightlies had been produced.
Target Milestone: mozilla14 → ---
Summary: Version upgrades are broken → Version downgrade checks are wrong causing upgrades to be broken
Whoops, accidentally removed a bunch of flags. I can't restore the tracking ones, but I've fixed the rest.
Summary: Version downgrade checks are wrong causing upgrades to be broken → Version upgrades are broken
Target Milestone: --- → mozilla14
Summary: Version upgrades are broken → Version downgrade checks are wrong causing upgrades to be broken
From telemetry data:

Only 0.971% of users of Nightly builds updated fast enough to hit this error.
Of these users who updated this quickly, likely pretty much all of them will update again within the next 2 weeks which is the range we're leaving the work around for.

I've confirmed the work around works and also have had a couple other people tell me the updates work now. 

More than 99% of these affected 0.971% of users will upgrade again within the next 2 weeks based on some other data we have, but these 0.971% of users are the ones who update fast most likely.
Restoring lost tracking flags.
No longer blocks: 738061
What is the testcase for this bug? I assume something like:

1) Install fallback update from 11 -> 12
2) Install fallback update from 12 -> 13
3) Install fallback update from 13 -> 14
Whiteboard: [qa?]
That's correct.  It can even ben something like 12.0a1 -> 12.0a2
Whiteboard: [qa?] → [qa+]
Mozilla/5.0 (Windows NT 6.1; rv:12.0) Gecko/20100101 Firefox/12.0

Verified that updates work:
- from Firefox 11 beta 8  to Firefox 12 beta 3
and
- from Firefox 12 beta 2 to Firefox 12 beta 3.
I previously mentioned to release drivers that I would provide the final stats in this bug.  I provided those details in bug 735969 instead.
Mozilla/5.0 (Windows NT 5.1; rv:13.0) Gecko/20100101 Firefox/13.0
Mozilla/5.0 (Windows NT 6.1; rv:13.0) Gecko/20100101 Firefox/13.0
Mozilla/5.0 (Windows NT 6.0; rv:13.0) Gecko/20100101 Firefox/13.0

Verified that updates work (Windows XP, Windows 7 and Windows Vista):
- from Firefox 12 beta 6 to Firefox 13 beta 2 
and
- from Firefox 13 beta 1 to Firefox 13 beta 2.
Verified that updates work (on Windows XP, Windows Vista and on Windows 7):
- from Firefox 13 beta 7 to Firefox 14 beta 7 
and 
- from Firefox 14 beta 6 to Firefox 14 beta 7.

Mozilla/5.0 (Windows NT 6.1; rv:14.0) Gecko/20100101 Firefox/14.0
Mozilla/5.0 (Windows NT 6.0; rv:14.0) Gecko/20100101 Firefox/14.0
Mozilla/5.0 (Windows NT 5.1; rv:14.0) Gecko/20100101 Firefox/14.0
You need to log in before you can comment on or make changes to this bug.