Closed Bug 732526 Opened 13 years ago Closed 12 years ago

Partial Updates failing on OS X (Contents/MacOS/updater.app/Contents/MacOS/updater - destination file crc 1370062889 does not match expected crc -1219750014)

Categories

(Release Engineering :: General, defect, P4)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: rcampbell, Unassigned)

References

Details

(Whiteboard: [updates])

Attachments

(5 files)

Attached file backup.log
I've seen a few partial updates fail this week. The updater falls back to a full update. See attached logs. The Nightly was originally downloaded from nightly.mozilla.org.
Attached file last-update.log
I am unable to reproduce. I downloaded the Mac nightly for 3/1 and updated multiple times to the Mac nightly for 3/2 using the partial mar. I also did the same with the bundle set to launch in 32 bit mode. Is anyone else seeing this? Is there anything that you know of that might have modified the following file: Contents/MacOS/updater.app/Contents/MacOS/updater?
Seeing the same. Just tried to update my Mac OS X Nightly from 2012-03-01 to 2012-03-02 and it failed saying the partial update could not be applied. I created a copy of the bundle while Nightly was still displaying the "Update Failed" dialog. http://timtaubert.de/wp-content/Nightly.zip
Summary: Partial Updates failing on OS X → Partial Updates failing on OS X (Contents/MacOS/updater.app/Contents/MacOS/updater - destination file crc 1370062889 does not match expected crc -1219750014)
I'm also seeing this on Nightly with the same CRCs. The full update worked.
There are definitely differences in the updater binary on Mac OS X when comparing the downloaded bundle for a specific build with a bundle that was updated to the same build. What operations, if any, are performed on the updater binary that differ between the creation of the universal binary or the dmg and the creation of the mar file?
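Aside: a quick way to compare an on-disk updater binary against the values in the log is to compute its CRC-32 directly. The assumption here, not taken from the updater source, is that the log prints CRC-32 as a signed 32-bit integer, so the -1219750014 in the summary would correspond to the unsigned value 3075217282. A minimal, self-contained sketch using zlib:

// Minimal sketch: compute the CRC-32 of a file with zlib and print it both
// unsigned and as a signed 32-bit integer, matching the assumption above.
// Build with: c++ crc32sum.cpp -lz -o crc32sum
#include <zlib.h>
#include <cstdint>
#include <cstdio>

int main(int argc, char** argv) {
  if (argc < 2) {
    fprintf(stderr, "usage: %s <file>\n", argv[0]);
    return 1;
  }
  FILE* f = fopen(argv[1], "rb");
  if (!f) {
    perror("fopen");
    return 1;
  }
  uLong crc = crc32(0L, Z_NULL, 0);
  unsigned char buf[64 * 1024];
  size_t n;
  while ((n = fread(buf, 1, sizeof(buf), f)) > 0) {
    crc = crc32(crc, buf, static_cast<uInt>(n));
  }
  fclose(f);
  uint32_t value = static_cast<uint32_t>(crc);
  printf("crc32 unsigned: %u  signed: %d\n", value, static_cast<int32_t>(value));
  return 0;
}

Running it against Contents/MacOS/updater.app/Contents/MacOS/updater from a freshly downloaded bundle and from an updated bundle should show whether the two binaries really differ, which is what the failing patch check implies.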
Any possibility this is happening on the releng side -- did anything change recently? I see bug 400296 (sign OS X builds) is close to landing, maybe some deps or an accidental push? I could see a bug like this happening if, say, partial mars were expecting an unsigned updater but the full mars were a signed updater... Pure conjecture, not trying to preassign blame to the fine folks in releng. :)
Attached image Headscratch?
Huh. This seems odd. I wanted to poke at the differences between the updater in my existing Nightly (which failed the partial update) and the updater in the full .dmg download. My Nightly w/usual profile is, from the About dialog, 2012-03-01. So I downloaded firefox-13.0a1.en-US.mac.dmg from ftp://ftp.mozilla.org/pub/firefox/nightly/2012-03-01-11-39-32-mozilla-central/ There's also a ftp://ftp.mozilla.org/pub/firefox/nightly/2012-03-01-11-39-32-mozilla-central/ with just some Linux stuff, what's up with that? This screenshot shows about:buildconfig for my existing Nightly on the left, and the one from my .dmg download on the right. Both report 2012-03-01 in the About dialog, and both report being built from 1c3b291d0830. So why does one say it was built on moz2-darwin10-slave45, and the other moz2-darwin10-slave50?
dolske, your two dirs are the same, which ones did you mean? From the screenshot it looks like two nightlies were triggered with the same revision, and probably with the same buildID, so partial generation gets mucked up. This may be related to bug 730548 causing a build to go red after the update was published, with a second build then being started manually or automatically.
Component: Application Update → Release Engineering: Automation
Product: Toolkit → mozilla.org
QA Contact: application.update → catlee
Version: unspecified → other
Whooops. ftp://ftp.mozilla.org/pub/firefox/nightly/2012-03-01-11-39-32-mozilla-central/ (the link I used twice) is the dir with just some Linux stuff in it. ftp://ftp.mozilla.org/pub/firefox/nightly/2012-03-01-03-11-35-mozilla-central/ (which I meant to use at least once!) is the dir from which I downloaded the .DMG.
(In reply to Justin Dolske [:Dolske] from comment #6)
> Any possibility this is happening on the releng side -- did anything change
> recently? I see bug 400296 (sign OS X builds) is close to landing, maybe
> some deps or an accidental push? I could see a bug like this happening if,
> say, partial mars were expecting an unsigned updater but the full mars were
> a signed updater...
>
> Pure conjecture, not trying to preassign blame to the fine folks in releng.
> :)

None of the OSX signing code has landed yet. (It will probably be landing early this week.) So bug 400296 should be unrelated to this one.
Mine keeps blowing up. I click the "Apply Update" button, Aurora shuts down, the dialog goes away...then it just stops. The app hangs ("Application Not Responding"), it starts eating my CPU (99.9-100%), and my fans spin up. This time when I killed the app I got a dialog to report the problem to Apple. I'll attach the content of that report plus my app bundle here. My updates are failing basically every time.
Note that deb's running Aurora (Fx12). I haven't checked the update logs in that bundle but would be very curious to know if it's the same or a different updater bug.
I think an updater crash is probably unrelated to this. Might be related to some of Brian Bondy's work though...
dria, what you are experiencing is different from this bug. Please file a new bug under toolkit -> application update. Thanks!
Re comment 15: Pretty much everything added to the normal updater code for the security enhancements is gated behind #ifdef MOZ_VERIFY_MAR_SIGNATURE, which is currently only defined on Windows x86. But it's always possible.
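For readers unfamiliar with that sort of gating, here is a small, self-contained sketch of what compile-time gating on a define like MOZ_VERIFY_MAR_SIGNATURE looks like. The define name is the real one mentioned above; the function below is hypothetical, and this is not the actual updater source.

// Conceptual sketch of compile-time gating (hypothetical function; not the
// Mozilla updater source). When built with -DMOZ_VERIFY_MAR_SIGNATURE the
// verification path is compiled in; without it, the code doesn't exist in
// the binary at all, so it can't affect platforms where it isn't defined.
#include <cstdio>

static bool VerifyMarSignature() {
  // Stand-in for the real signature check; always succeeds in this sketch.
  return true;
}

int main() {
#ifdef MOZ_VERIFY_MAR_SIGNATURE
  if (!VerifyMarSignature()) {
    fprintf(stderr, "MAR signature verification failed\n");
    return 1;
  }
  printf("MAR signature verified\n");
#else
  printf("signature verification compiled out on this platform\n");
#endif
  return 0;
}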
Priority: -- → P2
Whiteboard: [updates]
Just to be clear... What's the state of this / who owns it? Comment 8 makes this sound like a releng bug. Having broken updates makes me rather nervous; if we can at least confirm what the problem is for sure, that would be a fine first step.
Could I get an updates.xml from one or more people who saw failed partials, along with an indication of which days you had issues? Thanks!
(In reply to Justin Dolske [:Dolske] from comment #18)
> Just to be clear... What's the state of this / who owns it?
>
> Comment 8 makes this sound like a releng bug. Having broken updates makes me
> rather nervous; if we can at least confirm what the problem is for sure, that
> would be a fine first step.

Made me nervous as well, but after investigating it was found that the nightly itself didn't have this problem, which indicates that the problem was likely (comment #2), and there have been no other reports, which indicates it was likely due to what Nick stated in comment #8. The CRCs for the broken updater bin were also the same between broken builds, which suggests they were different for the same build and that the partial was for a different build. For these reasons, along with the lack of bug reports, I am not terribly worried.

(In reply to Nick Thomas [:nthomas] from comment #19)
> Could I get an updates.xml from one or more people who saw failed partials,
> along with an indication of which days you had issues? Thanks!

The following contains an active-update.xml and I'll attach it in a sec. http://timtaubert.de/wp-content/Nightly.zip
Attached file active-update.xml
Nick, I don't think a second active-update.xml for a bundle that is experiencing this problem will provide anything in addition to this active-update.xml.
Good point Rob. It turns out there were two nightlies built on 2012-03-01, and Tim got updated to the first one. The partial to the 2012-03-02 nightly expects the second, and doesn't apply to the first.

We got two nightlies because of bug 730548. The first one failed to trigger the l10n nightlies, which happens at the very end of the job, after we've uploaded the dmg, partial & complete mar and enabled that update path. The second was a rebuild (via tbpl/self-serve) of the original build, so we get the same buildID and revision. The dmg and complete mar get overwritten on the ftp server, and a very small partial is generated. I'm not sure if that partial ever gets offered or not.

This was a somewhat unusual situation where the buildbot master got into a confused state, which we can certainly resolve much faster in the future. We could also adjust self-serve so that
* it's not possible to rebuild nightly jobs (eg using the + icon on tbpl), or
* it's not possible to rebuild nightly jobs if the 'make upload' step has succeeded

In both cases self-serve could still allow a new set of nightlies on the same revision (or not, if we choose) and we'd get a different buildID. The potential downside is that we get all platforms. Comments?
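To make that failure mode concrete, here is a purely conceptual sketch; it is not the real update-server logic, and the buildID and strings are placeholders. The point is that partials are offered based on the buildID the client reports, so once two different binaries share one buildID, the server can no longer tell which binary a client actually has.

// Conceptual illustration only (not the actual update-server code; buildID
// and strings are placeholders). Clients on the "other" binary with the same
// buildID get a partial generated against a binary they don't have, and
// applying it fails the destination-file CRC check.
#include <cstdio>
#include <map>
#include <string>

int main() {
  // Partial MARs offered to clients, keyed by the buildID they report.
  std::map<std::string, std::string> partialsByBuildID;

  // The partial was generated against the *second* build with this buildID.
  partialsByBuildID["20120301000000"] =
      "partial.mar generated against the second (rebuilt) nightly";

  // A client running the *first* build reports the same buildID...
  std::string reportedBuildID = "20120301000000";

  // ...so it is offered that partial even though its updater binary differs.
  printf("offered: %s\n", partialsByBuildID[reportedBuildID].c_str());
  return 0;
}

When that happens the client falls back to the complete MAR, which is why the full update still worked for the people in this bug.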
I don't think m-c should have the ability to rebuild nightly builds via the + icon on tbpl if 'make upload' succeeded, or possibly at all, though that is just my opinion, based on the belief that we don't need to do this all that often on m-c and that other repos that support nightly builds should be used for the kind of testing where developers need to do this.
If we believe that no steps after make upload constitute failure, we should probably think about not having them mark a build as having failed.
If it's important to respin a nightly, is the side effect of the partial failing bad enough to warrant not allowing the respin?
(In reply to Aki Sasaki [:aki] from comment #25)
> If it's important to respin a nightly, is the side effect of the partial
> failing bad enough to warrant not allowing the respin?

We can respin a nightly by doing a new build on the same revision, rather than rebuilding the failed nightly. Doing it this way will create a new buildID, which should avoid these issues.
So, what's actionable here? Change TBPL to put up a big warning that re-building nightlies is probably not what you want to do?
Priority: P2 → P4
(In reply to Chris AtLee [:catlee] from comment #27)
> So, what's actionable here? Change TBPL to put up a big warning that
> re-building nightlies is probably not what you want to do?

Or make "re-building nightlies" not break updates, per your comment 26? Though really, I don't think anyone ever really cares to re-build nightly builds using the same revision. I assume that by far the most common use case is "create new nightlies now" (from the tip, using whatever green-changeset-selection algorithm is used for normally triggered nightlies).
Agreed. I think "make a new nightly from tip" is probably what people will want 90+% of the time.
That's not actually the case. I don't keep track of either how often people intentionally trigger a new nightly on every platform or of how often I retrigger a single platform because only that one platform failed, but I'd guess that it's closer to 5:1 me to them. It may very well be the case that we'll decide it's too much trouble to fix, and that it's better to tell me to waste both build and bandwidth resources by needlessly replacing perfectly good nightlies on other platforms in order to fix the problem that one platform failed, serving far more full updates in the process to all the people who missed updating to the first one (possibly after a heated discussion about whether or not buildduty is 24/7, or even 6-that-matter/7). But it's not at all the case that nobody wants to retrigger one platform's nightly on the same rev where all the other platforms succeeded.
I recognize the value in being able to trigger nightlies only for a single platform. However, I don't recognize the value in having that single-platform nightly be "on the same revision as other nightlies" - if it failed because the revision is bad, then you'll want a newer revision anyways, and if it failed for some other reason, you might as well rebuild the latest code. The nightlies being out of sync across platforms isn't really a problem, particularly if it only occurs in exceptional cases.
Blocks: 758975
This bug is big and long and without any clear action. Concrete actions should be spun out into specific bugs.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE
Product: mozilla.org → Release Engineering
Component: General Automation → General