Closed
Bug 1430535
Opened 6 years ago
Closed 6 years ago
Partial updates failing with "destination file crc X does not match expected crc Y"
Categories
(Release Engineering :: General, defect)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Unassigned)
References
Details
I first started looking at this because of update verify failures in DevEdition 59.0b1, for partial updates but not completes. Then I found it's present on Nightly too. While this doesn't block updates it does slow them down a lot. Here's how it manifests, eg in a last-update.log ... PREPARE PATCH Contents/Info.plist EXECUTE PATCH Contents/_CodeSignature/CodeResources LoadSourceFile: destination file crc -1001769479 does not match expected crc 262245848 LoadSourceFile failed ### execution failed ... update.status contains "failed: 4". The specific file being patched is dependent on the platform; on win64 it's xul.dll.sig. Firefox/Nightly goes on to download the complete update and apply it. We see this in telemetry for Nightly: https://telemetry.mozilla.org/new-pipeline/evo.html#!aggregates=bucket-4!bucket-9&cumulative=0&end_date=2018-01-12&keys=!__none__!__none__&max_channel_version=nightly%252F59&measure=UPDATE_STATUS_ERROR_CODE_PARTIAL_STARTUP&min_channel_version=nightly%252F59&processType=*&product=Firefox&sanitize=1&sort_keys=submissions&start_date=2017-11-13&trim=1&use_submission_date=0 with the number of submissions growing rapidly on 2018-01-11. The regression range appears to be > https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=4db166f0442dddc5b9011c722d7499501fedf283&tochange=e61c4485494ec4823da22217f665d73858c57e35 because it's OK in the 20180110221942 mac nightly, and fails in 20180111100722. It's non-trival to reproduce this unless you have access to the update server, and can set up a test channel that ensures a partial update (or apply mar files manually). Nothing is popping out to me from that set of changes. The two candidates I initially suspected Bug 1418425 - Remove bz2 and use the zlib crc32 function instead Bug 1401995 - Migrate funsize to python3 are too late and early respectively. Simon, were there any other funsize changes which might fit ? Perhaps we can find something in the logs for generating the funsize images.
Flags: needinfo?(sfraser)
Comment 1•6 years ago
|
||
This is probably me. I saw that during the dev phase when I was trying to replace bz2.crc by zlib.crc. This is why we didn't made that change yet (we are just using the crctable from bz2 https://hg.mozilla.org/mozilla-central/rev/2e95ebdcae6d) However, I did update bsdiff as I didn't see any issue in the CI. https://hg.mozilla.org/mozilla-central/rev/47fbcaf5ad05 Maybe this is the issue.
Comment 2•6 years ago
|
||
(In reply to Nick Thomas [:nthomas] (UTC+13) from comment #0) > > Nothing is popping out to me from that set of changes. The two candidates I > initially suspected > Bug 1418425 - Remove bz2 and use the zlib crc32 function instead > Bug 1401995 - Migrate funsize to python3 > are too late and early respectively. > > Simon, were there any other funsize changes which might fit ? Perhaps we can > find something in the logs for generating the funsize images. The async one is plausible, if there are odd side effects running them in parallel, but if bsdiff changed then that's a strong candidate, as that's the tool that does the grind work
Flags: needinfo?(sfraser)
Reporter | ||
Comment 3•6 years ago
|
||
It does look like it is bsdiff. With that change first in 20180112220334 I tried these (partial) update paths: 20180111100722 --> 20180111220102 works 20180111100722 --> 20180112100121 does too 20180111100722 --> 20180112220334 does as well, unexpectedly with bsdiff modified now 20180111100722 --> 20180113100255 doesn't IIRC we build mbsdiff as part of nightlies and upload it to https://archive.mozilla.org/pub/firefox/nightly/latest-mozilla-central/mar-tools/linux64/mbsdiff We pull that back into funsize to generate the partial and there's a 4 hour max-age header on that, so it's kinda plausible any change in the crc function would show up off-by-1 nightly.
Comment 6•6 years ago
|
||
hitting this on 58.0 update testing: https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-release-win32/release-mozilla-release_firefox_win32_update_verify-bm72-build1-build274.txt.gz as well as 59.0b1 again. Unclear how since it was assumed backout was in place prior to start of release
Comment 7•6 years ago
|
||
We needed to back out, then have a nightly run to build a new copy of mbsdiff - it looks like the next nightly was having intermittent signing errors so it may not have uploaded the right files yet. We did ask cloudops to replace mbsdiff in ftp.m.o, which they did, but something may be caching it.
Comment 8•6 years ago
|
||
s3://tc-gp-private-1d-us-east-1/releng/mbsdiff-cache/ is the new hypothesis, confidence 90%. I added S3 caching of mbsdiff artifacts to speed up funsize tasks, and of course the source and destination files are the same, so it's not even running the new, fixed, mbsdiff, it's just using the cached copy of the results. Short term: Will ask taskcluster team to purge the cache, so we don't have to wait for the 24h expiry period. Longer term: I will improve the caching so that it takes account of tooling differences.
Comment 10•6 years ago
|
||
I've just added https://github.com/mozilla-releng/releasewarrior-2.0/wiki/Purging-the-Partials-Cache as a note on how to clear the partials artifact cache.
Reporter | ||
Comment 11•6 years ago
|
||
We had green update verify jobs in 59.0b1 build10 \o/. Telemetry for nightly isn't conclusive yet as it only goes up to Sunday builds.
Comment 12•6 years ago
|
||
I think we can close this specific bug out that was blocking releases. If there is a longer term fix, perhaps we file a follow up?
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Comment 13•6 years ago
|
||
I will report follow up bugs. Just one question, Johan told me that the update verification is going to be part of the regular CI in the next couple of months. Correct?
Comment 14•6 years ago
|
||
I just tried to update from the 1/16/2018 build to today's Nightly and I'm getting 'Up to date' message again.
Comment 15•6 years ago
|
||
Reopening due to failure to update again today.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Comment 16•6 years ago
|
||
That may be related to bug 1431110. It would be helpful if you could make sure app.update.log is set to true, and report the https://aus5.mozilla.org URL being used (look in the Browser Console).
Comment 17•6 years ago
|
||
Thanks Nick, - this is the spew from the app.update.log: 1516231454012 Toolkit.Telemetry WARN GCTelemetry::Number of properties exceeded in the GC telemetry data ping AUS:SVC Checker: checkForUpdates, force: true AUS:SVC Checker:getUpdateURL - update URL: https://aus5.mozilla.org/update/6/Firefox/59.0a1/20180116220110/WINNT_x86_64-msvc-x64/en-US/nightly/Windows_NT%2010.0.0.0.16299.192%20(x64)(noBug1296630v1)(nowebsense)/ISET:SSE4_2,MEM:8096/default/default/update.xml?force=1 AUS:SVC Checker:checkForUpdates - sending request to: https://aus5.mozilla.org/update/6/Firefox/59.0a1/20180116220110/WINNT_x86_64-msvc-x64/en-US/nightly/Windows_NT%2010.0.0.0.16299.192%20(x64)(noBug1296630v1)(nowebsense)/ISET:SSE4_2,MEM:8096/default/default/update.xml?force=1 AUS:SVC Checker:onLoad - request completed downloading document AUS:SVC Checker:onLoad - number of updates available: 0
Reporter | ||
Comment 18•6 years ago
|
||
OK, it's definitely bug 1431110 preventing data getting into the update server. The latest it knows about for win64 is the build you're using; same for win32. Linux & Mac are up to date.
Comment 19•6 years ago
|
||
Closing this again. The unrelated windows update issue should also be fixed now, per the other bug.
Status: REOPENED → RESOLVED
Closed: 6 years ago → 6 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•