Closed
Bug 732526
Opened 13 years ago
Closed 12 years ago
Partial Updates failing on OS X (Contents/MacOS/updater.app/Contents/MacOS/updater - destination file crc 1370062889 does not match expected crc -1219750014)
Categories
(Release Engineering :: General, defect, P4)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: rcampbell, Unassigned)
References
Details
(Whiteboard: [updates])
Attachments
(5 files)
I've seen a few partial updates fail this week. Updater falls back to full update.
See attached logs.
nightly originally downloaded from nightly.mozilla.org.
Reporter | ||
Comment 1•13 years ago
|
||
Comment 2•13 years ago
|
||
I am unable to reproduce. I downloaded the Mac nightly for 3/1 and updated multiple times to Mac nightly 3/2 using the partial mar. I also did the same with the bundle set to launch in 32 bit mode.
Is anyone else seeing this?
Is there anything that you know of that might have modified the following file?
Contents/MacOS/updater.app/Contents/MacOS/updater
Comment 3•13 years ago
|
||
Seeing the same. Just tried to update my Mac OS X Nightly from 2012-03-01 to 2012-03-02 and it failed saying the partial update could not be applied. I created a copy of the bundle while Nightly was still displaying the "Update Failed" dialog.
http://timtaubert.de/wp-content/Nightly.zip
Updated•13 years ago
|
Summary: Partial Updates failing on OS X → Partial Updates failing on OS X (Contents/MacOS/updater.app/Contents/MacOS/updater - destination file crc 1370062889 does not match expected crc -1219750014)
Comment 4•13 years ago
|
||
I'm also seeing this on Nightly with the same crc's. Full update worked.
Comment 5•13 years ago
|
||
There are definitely differences in the updater binary on Mac OS X when comparing the downloaded bundle for a specific build and a bundle that was updated to the same build. What - if any - operations are performed on the updater binary that are different between the creation of the universal or the dmg and the creation of the mar file?
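One way to check the observation above is to checksum the updater binary in both bundles and compare the results. The sketch below is illustrative only. The bundle paths in the usage comment are hypothetical, and it uses zlib's CRC-32, which is an assumption and may not be the CRC variant the updater itself uses, so its output is meant for comparing two files with each other rather than against the numbers in the update log. The `to_signed32` helper shows how an unsigned 32-bit checksum ends up printed as a negative number such as -1219750014.

```python
import zlib


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def file_crc32(path: str) -> int:
    """CRC-32 (zlib variant, an assumption) of a file, as a signed 32-bit int."""
    crc = 0
    with open(path, "rb") as fh:
        # Read in chunks so large binaries do not have to fit in memory.
        for chunk in iter(lambda: fh.read(65536), b""):
            crc = zlib.crc32(chunk, crc)
    return to_signed32(crc)


def same_binary(path_a: str, path_b: str) -> bool:
    """True if both files have the same checksum."""
    return file_crc32(path_a) == file_crc32(path_b)


# Hypothetical usage, with one bundle from the .dmg and one that was updated:
#   rel = "Contents/MacOS/updater.app/Contents/MacOS/updater"
#   same_binary("/Volumes/Nightly/Nightly.app/" + rel,
#               "/Applications/Nightly.app/" + rel)
```

If the two checksums differ for bundles that claim to be the same build, the binaries really are different files, which is what a partial update's source check would trip over.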
Comment 6•13 years ago
|
||
Any possibility this is happening on the releng side -- did anything change recently? I see bug 400296 (sign OS X builds) is close to landing, maybe some deps or an accidental push? I could see a bug like this happening if, say, partial mars were expecting an unsigned updater but the full mars were a signed updater...
Pure conjecture, not trying to preassign blame to the fine folks in releng. :)
Comment 7•13 years ago
|
||
Huh. This seems odd. I wanted to poke at the differences between the updater in my existing Nightly (which failed the partial update), and the updater in the full .dmg download...
My Nightly w/usual profile is, from About dialog, 2012-03-01. So I downloaded firefox-13.0a1.en-US.mac.dmg from ftp://ftp.mozilla.org/pub/firefox/nightly/2012-03-01-11-39-32-mozilla-central/ [There's also a ftp://ftp.mozilla.org/pub/firefox/nightly/2012-03-01-11-39-32-mozilla-central/ with just some linux stuff, what's up with that?]
This screenshot shows about:buildconfig for my existing Nightly on the left, and the one from my .dmg download on the right. Both report 2012-03-01 in the About dialog, and both report being built from 1c3b291d0830.
So why does one say it was built on moz2-darwin10-slave45, and the other moz2-darwin10-slave50?
Comment 8•13 years ago
|
||
dolske, your two dirs are the same; which ones did you mean?
From the screenshot it looks like two nightlies have been triggered with the same revision, probably using the same buildID, so that partial generation gets mucked up. This may be related to bug 730548 causing a build to go red after the update was published, with a second build then being started manually or automatically.
Component: Application Update → Release Engineering: Automation
Product: Toolkit → mozilla.org
QA Contact: application.update → catlee
Version: unspecified → other
Comment 9•13 years ago
|
||
Whooops.
ftp://ftp.mozilla.org/pub/firefox/nightly/2012-03-01-11-39-32-mozilla-central/ (the link I used twice) is the dir with just some Linux stuff in it.
ftp://ftp.mozilla.org/pub/firefox/nightly/2012-03-01-03-11-35-mozilla-central/ (which I meant to use at least once!) is the dir from which I downloaded the .DMG.
Comment 10•13 years ago
|
||
(In reply to Justin Dolske [:Dolske] from comment #6)
> Any possibility this is happening on the releng side -- did anything change
> recently? I see bug 400296 (sign OS X builds) is close to landing, maybe
> some deps or an accidental push? I could see a bug like this happening if,
> say, partial mars were expecting an unsigned updater but the full mars were
> a signed updater...
>
> Pure conjecture, not trying to preassign blame to the fine folks in releng.
> :)
None of the OSX signing code has landed yet. (It will probably be landing early this week.) So bug 400296 should be unrelated to this one.
Comment 11•13 years ago
|
||
Mine keeps blowing up. I click the "Apply Update" button, Aurora shuts down, the dialog goes away...then it just stops. The app hangs ("Application Not Responding"), it starts eating my CPU (99.9-100%), and my fans spin up.
This time when I killed the app I got a dialog to report the problem to Apple. I'll attach the content of that report plus my app bundle here.
My updates are failing basically every time.
Comment 12•13 years ago
|
||
Comment 13•13 years ago
|
||
App bundle is in my dropbox here: http://dl.dropbox.com/u/12775088/Aurora.zip
Reporter | ||
Comment 14•13 years ago
|
||
note that deb's running Aurora (Fx12). I haven't checked the update logs in that bundle but would be very curious to know if it's the same or a different updater bug.
Comment 15•13 years ago
|
||
I think an updater crash is probably unrelated to this. Might be related to some of Brian Bondy's work though...
Comment 16•13 years ago
|
||
dria, what you are experiencing is different than this bug. Please file a new bug under toolkit -> application update. Thanks
Comment 17•13 years ago
|
||
Re comment 15:
Pretty much everything added to the normal updater code for the security enhancements is gated behind #ifdef MOZ_VERIFY_MAR_SIGNATURE, which is currently defined only on Windows x86. But it's always possible.
Updated•13 years ago
|
Priority: -- → P2
Whiteboard: [updates]
Comment 18•13 years ago
|
||
Just to be clear... What's the state of this / who owns it?
Comment 8 makes this sound like a releng bug. Having broken updates makes me rather nervous, if we can at least confirm what the problem is for sure that would be a fine first step.
Comment 19•13 years ago
|
||
Could I get an updates.xml from one or more people who saw failed partials, along with an indication of which days you had issues. Thanks!
Comment 20•13 years ago
|
||
(In reply to Justin Dolske [:Dolske] from comment #18)
> Just to be clear... What's the state of this / who owns it?
>
> Comment 8 makes this sound like a releng bug. Having broken updates makes me
> rather nervous, if we can at least confirm what the problem is for sure that
> would be a fine first step.
Made me nervous as well, but after investigating it was found that the nightly didn't have this problem (comment #2), and there have been no other reports, which indicates it was likely due to what Nick stated in comment #8. The crc's for the broken updater bin were also the same between broken builds, which suggests there were two different binaries for the same build and that the partial was generated against the other one. For these reasons, along with the lack of bug reports, I am not terribly worried.
(In reply to Nick Thomas [:nthomas] from comment #19)
> Could I get an updates.xml from one or more people who saw failed partials,
> along with an indication of which days you had issues. Thanks!
The following contains an active-update.xml and I'll attach it in a sec.
http://timtaubert.de/wp-content/Nightly.zip
Comment 21•13 years ago
|
||
Nick, I don't think a second active-update.xml for a bundle that is experiencing this problem will provide anything in addition to this active-update.xml.
Comment 22•13 years ago
|
||
Good point Rob. It turns out there were two nightlies built on 2012-03-01, and Tim got updated to the first one. The partial to the 2012-03-02 nightly expects the second, and doesn't apply to the first.
We got two nightlies because of bug 730548. The first one failed to trigger the l10n nightlies, which happens at the very end of the job after we've uploaded the dmg, partial & complete mar and enabled that update path. The second was a rebuild (via tbpl/self-serve) of the original build, so we get the same buildID and revision. The dmg and complete mar get overwritten on the ftp server, and a very small partial is generated. I'm not sure if that partial ever gets offered or not.
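To make the failure mode concrete, here is a minimal sketch (not the real updater or MAR code) of why a partial generated against the rebuilt 2012-03-01 nightly would refuse to apply to an install of the first one, even though both carry the same buildID. The `Build` and `Partial` types, the placeholder buildIDs, the byte strings, and the use of zlib's CRC-32 are all hypothetical stand-ins.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Build:
    build_id: str          # placeholder, not a real buildID
    updater_binary: bytes  # stand-in for the file on disk


@dataclass(frozen=True)
class Partial:
    from_build_id: str
    expected_crc: int      # checksum of the file the patch was diffed against
    new_binary: bytes


def make_partial(old: Build, new: Build) -> Partial:
    """Record the checksum of the source file the patch expects to find."""
    return Partial(old.build_id, zlib.crc32(old.updater_binary), new.updater_binary)


def apply_update(installed: Build, partial: Partial) -> str:
    """Return which update path would be used for this install."""
    crc_ok = zlib.crc32(installed.updater_binary) == partial.expected_crc
    if installed.build_id == partial.from_build_id and crc_ok:
        return "partial"
    # Source file does not match what the partial expects: fall back.
    return "complete"


first = Build("same-build-id", b"updater-A")    # original nightly
rebuild = Build("same-build-id", b"updater-B")  # rebuild, same buildID
nxt = Build("next-build-id", b"updater-C")      # the following nightly

partial = make_partial(rebuild, nxt)  # generated against the rebuild
print(apply_update(first, partial))   # complete
print(apply_update(rebuild, partial)) # partial
```

The buildID check alone cannot tell the two 2012-03-01 builds apart; only the per-file checksum catches the mismatch, and by then the only remaining option is the complete update.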
This was a somewhat unusual situation where the buildbot master got into a confused state, which we can certainly resolve much faster in the future. We could also adjust self-serve so that
* it's not possible to rebuild nightly jobs (eg using the + icon on tbpl)
* it's not possible to rebuild nightly jobs if the 'make upload' step has succeeded
In both cases self-serve could still allow a new set of nightlies on the same revision (or not if we choose) and we'd get a different buildID. The potential downside is that we get all platforms. Comments ?
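The two options above can be written down as a small policy check. This is only a sketch of the decision logic under discussion. It does not reflect self-serve's actual code or API, and the `Job`, `RebuildPolicy`, and `may_rebuild` names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class RebuildPolicy(Enum):
    NEVER = "never"                      # option 1: nightlies can never be rebuilt
    UNLESS_UPLOADED = "unless_uploaded"  # option 2: blocked once 'make upload' succeeded


@dataclass(frozen=True)
class Job:
    is_nightly: bool
    upload_succeeded: bool  # did the 'make upload' step finish successfully?


def may_rebuild(job: Job, policy: RebuildPolicy) -> bool:
    """Decide whether a rebuild (same buildID) of this job is allowed."""
    if not job.is_nightly:
        return True  # non-nightly jobs are unaffected by either option
    if policy is RebuildPolicy.NEVER:
        return False
    # A nightly that never uploaded has published nothing to overwrite.
    return not job.upload_succeeded
```

Under either policy, a fresh set of nightlies on the same revision would still be possible, since that path produces a different buildID.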
Comment 23•13 years ago
|
||
I don't think m-c should have the ability to rebuild nightly builds via the + icon on tbpl if 'make upload' succeeded, or possibly at all. That is just my opinion, based on the belief that we don't need to do this all that often on m-c and that other repos that support nightly builds should be used for the type of testing where developers need to do this.
Comment 24•13 years ago
|
||
If we believe that no steps after make upload constitute failure, we should probably think about not having them mark a build as having failed.
Comment 25•13 years ago
|
||
If it's important to respin a nightly, is the side effect of the partial failing bad enough to warrant not allowing the respin?
Comment 26•13 years ago
|
||
(In reply to Aki Sasaki [:aki] from comment #25)
> If it's important to respin a nightly, is the side effect of the partial
> failing bad enough to warrant not allowing the respin?
we can respin a nightly by doing a new build on the same revision, rather than rebuilding the failed nightly. doing it this way will create a new buildid, which should avoid these issues.
Comment 27•13 years ago
|
||
so, what's actionable here? change TBPL to put up a big warning that re-building nightlies is probably not what you want to do?
Updated•13 years ago
|
Priority: P2 → P4
Comment 28•13 years ago
|
||
(In reply to Chris AtLee [:catlee] from comment #27)
> so, what's actionable here? change TBPL to put up a big warning that
> re-building nightlies is probably not what you want to do?
or make "re-building nightlies" not break updates, per your comment 26?
Though really, I don't think anyone ever really cares to re-build nightly builds using the same revision. I assume that by far the most common use case is "create new nightlies now" (from the tip, using whatever green-changeset-selection algorithm is used for normally triggered nightlies).
Reporter | ||
Comment 29•13 years ago
|
||
agreed. I think "make a new nightly from tip" is probably what people will want 90+% of the time.
Comment 30•13 years ago
|
||
That's not actually the case. I don't keep track of either how often people intentionally trigger a new nightly on every platform or of how often I retrigger a single platform because only that one platform failed, but I'd guess that it's closer to 5:1 me to them.
It may very well be the case that we'll decide it's too much trouble to fix, and that it's better to tell me to waste both build and bandwidth resources by needlessly replacing perfectly good nightlies on other platforms in order to fix the problem that one platform failed, serving far more full updates in the process to all the people who missed updating to the first one (possibly after a heated discussion about whether or not buildduty is 24/7, or even 6-that-matter/7), but it's not at all the case that nobody wants to retrigger one platform's nightly on the same rev where all the other platforms succeeded.
Comment 31•13 years ago
|
||
I recognize the value in being able to trigger nightlies only for a single platform. However, I don't recognize the value in having that single-platform-nightly be "on the same revision as other nightlies" - if it failed because the revision is bad, then you'll want a newer revision anyways, and if it failed for some other reason, might as well rebuild the latest code. The nightlies being out of sync across platforms isn't really a problem, particularly if it only occurs in exceptional cases.
Comment 32•12 years ago
|
||
This bug is big and long and without any clear action. Concrete actions should be spun out into specific bugs.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
Assignee | ||
Updated•11 years ago
|
Product: mozilla.org → Release Engineering
Assignee | ||
Updated•7 years ago
|
Component: General Automation → General