Closed Bug 805812 Opened 13 years ago Closed 13 years ago

Caching of l10n files results in bad repacks (and Pulse messages for builds offering an update to the same build)

Categories

(Release Engineering :: General, defect)

Type: defect
Priority: Not set
Severity: major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: davehunt, Unassigned)

Details

(Whiteboard: [mozmill-test-failure])

Attachments

(3 files)

Automated Mozmill update tests for localisation failed because the current and target build IDs are identical: firefox-18.0a2.[locale].win32.partial.20121025042010-20121025042010.mar
Whiteboard: [mozmill-test-failure]
It looks like all the locale builds got another mar file beside the old one:
  firefox-18.0a2.fr.win32.partial.20121023042010-20121025042010.mar  25-Oct-2012 13:15  3.4M
  firefox-18.0a2.fr.win32.partial.20121025042010-20121025042010.mar  25-Oct-2012 23:15  18K
18K looks kinda broken too. But more interesting is that we do not have an update available from Oct 24th.
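A partial mar whose "from" and "to" build IDs match can be spotted from the filename alone. The following is a minimal sketch, assuming the naming pattern seen in the filenames quoted in this bug (`...partial.<from>-<to>.mar` with 14-digit build IDs); the function names are hypothetical and not part of any Mozilla tool.

```python
import re

# Assumed pattern, inferred from the filenames quoted in this bug:
#   <anything>.partial.<from_buildid>-<to_buildid>.mar
PARTIAL_IDS_RE = re.compile(r"\.partial\.(\d{14})-(\d{14})\.mar$")


def partial_build_ids(filename):
    """Return (from_id, to_id) for a partial mar filename, or None."""
    match = PARTIAL_IDS_RE.search(filename)
    if match is None:
        return None
    return match.group(1), match.group(2)


def is_self_update(filename):
    """True if the partial mar updates a build to itself."""
    ids = partial_build_ids(filename)
    return ids is not None and ids[0] == ids[1]
```

Run over a directory listing, a check like this would flag the 18K file above while leaving the 3.4M one alone.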
I suspect that's fallout from missing Windows builds. The other thing could be that the stub installers are now on, not sure if that's surprising the "figure out last good build" logic.
I don't think it is related to the stub installer, because updates have been working fine for the last few days since it landed.
Can we get any feedback from RelEng about this issue? It now happened again 2 days ago for the fr locale.
Summary: Locale updates for Windows are offering the same build for updating from/to → Updates for localized builds are offering the same build for updating from/to
This is happening more and more. Yesterday it even happened with an en-US Nightly build, which was the second one on that day. Could it be that we mix up the IDs when producing the Pulse message for builds on the same day?
Severity: normal → major
Summary: Updates for localized builds are offering the same build for updating from/to → Pulse messages for builds offering an update to the same build are sent out regularly
Here is a copy of the properties:
> INFO:automation: completeMarFilename: firefox-19.0a1.en-US.linux-x86_64.complete.mar
> INFO:automation: appName: Firefox
> INFO:automation: build_failed: False
> INFO:automation: got_revision: 87928cd21b40
> INFO:automation: forced_clobber: False
> INFO:automation: packageSize: 31034779
> INFO:automation:partialsnippetFilename: build/obj-firefox/dist/update/partial.update.snippet
> INFO:automation: purge_actual: 12.51GB
> INFO:automation: previous_buildid: 20121114102936
> INFO:automation: previousMarFilename: firefox-19.0a1.en-US.linux-x86_64.complete.mar
> INFO:automation: slavename: bld-linux64-ec2-018
> INFO:automation: jsshellUrl: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64/1352917776/jsshell-linux-x86_64.zip
> INFO:automation: project:
> INFO:automation: builddir: m-cen-lnx64-ntly
> INFO:automation: filepath: None
> INFO:automation: packageFilename: firefox-19.0a1.en-US.linux-x86_64.tar.bz2
> INFO:automation: basedir: /builds/slave/m-cen-lnx64-ntly
> INFO:automation:completesnippetFilename: build/obj-firefox/dist/update/complete.update.snippet
> INFO:automation: appVersion: 19.0a1
> INFO:automation: comments:
> INFO:automation: purge_target: 7GB
> INFO:automation: platform: linux64
> INFO:automation: master: http://buildbot-master35.srv.releng.scl3.mozilla.com:8001/
> INFO:automation: testresults: [[u'num_ctors', u'num_ctors', 198, u'198']]
> INFO:automation: branch: mozilla-central
> INFO:automation: partialMarFilename: firefox-19.0a1.en-US.linux-x86_64.partial.20121114102936-20121114102936.mar
> INFO:automation: stage_platform: linux64
> INFO:automation: revision: 87928cd21b40
> INFO:automation: product: firefox
> INFO:automation: completeMarSize: 30979099
> INFO:automation: repository:
> INFO:automation: buildername: Linux x86-64 mozilla-central nightly
> INFO:automation: buildid: 20121114102936
> INFO:automation: completeMarUrl: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/11/2012-11-14-10-29-36-mozilla-central/firefox-19.0a1.en-US.linux-x86_64.complete.mar
> INFO:automation: packageHash: 7dd778da6e69cbed4e59eb14bb3e49eebbe0fa0e33e31efe3ffefbcc5ad83cc7e2365402afdafa46566c8219b2de661f4935f27132673df29c8284cbc6cdca2c
> INFO:automation: completeMarHash: 8898ce4f8c0100b0082da2211cae5bc5cb9487f7a05fbe05813e1fed27ef50fea11686e3bf2f8fb1cc23e2e23beddcefb3bfe4f293561efdc505d6e3e46ba419
> INFO:automation: hashType: sha512
> INFO:automation: previous_inipath: previous/application.ini
> INFO:automation: toolsdir: /builds/slave/m-cen-lnx64-ntly/tools
> INFO:automation: num_ctors: 198
> INFO:automation: symbolsUrl: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64/1352917776/firefox-19.0a1.en-US.linux-x86_64.crashreporter-symbols.zip
> INFO:automation: packageUrl: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64/1352917776/firefox-19.0a1.en-US.linux-x86_64.tar.bz2
> INFO:automation: partialMarUrl: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/11/2012-11-14-10-29-36-mozilla-central/firefox-19.0a1.en-US.linux-x86_64.partial.20121114102936-20121114102936.mar
> INFO:automation: purged_clobber: True
> INFO:automation: nightly_build: True
> INFO:automation: buildnumber: 59
> INFO:automation: testsUrl: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64/1352917776/firefox-19.0a1.en-US.linux-x86_64.tests.zip
> INFO:automation: periodic_clobber: False
> INFO:automation: partialMarHash: db3a7d7e05c3a639ef6b233d4cbf83417bfd4c784bf25f396dfbeb7e2194c86d9af74ae1bc7af1b33da32715d80fa0dcfa20ca8f1b083a516aef664308bb202f
> INFO:automation: partialMarSize: 2032964
> INFO:automation: builduid: 675b29fb3cfc481a97e8dca9e39d3448
> INFO:automation: slavebuilddir: m-cen-lnx64-ntly
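The telltale sign in these properties is that previous_buildid equals buildid. A consumer of such a log could check for that before running update tests. The sketch below assumes the `INFO:automation: key: value` line shape shown in this comment; the helper names are hypothetical and this is not how Pulse or Mozmill actually process the message.

```python
import re

# Matches "INFO:automation: key: value" and also the variant without a
# space after "INFO:automation:" that appears in the log above.
PROPERTY_RE = re.compile(r"INFO:automation:\s*(\w+):\s*(.*)$")


def parse_properties(log_text):
    """Collect build properties from log lines into a dict of strings."""
    props = {}
    for line in log_text.splitlines():
        match = PROPERTY_RE.search(line)
        if match:
            props[match.group(1)] = match.group(2).strip()
    return props


def offers_same_build(props):
    """True if the properties describe an update from a build to itself."""
    previous = props.get("previous_buildid")
    current = props.get("buildid")
    return previous is not None and previous == current
```

For the properties quoted here, both IDs are 20121114102936, so such a check would report a self-update.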
Can we please get feedback from RelEng on that problem? It's kind of annoying.
Flags: needinfo?(catlee)
If you could provide one more example that would be great. We have to go digging on the masters within a few days, otherwise the logs are deleted and it's very hard to make progress.
I will do so once I see it again. But to be honest, I'm not sure when this will happen, so it might take some days.
Attached file pulse message (11/30)
Nick, this pulse message is from Nov 30th and hopefully not too late.
Can't get absolutely everything, but hopefully enough to figure it out. There were two en-US nightlies that day:
* regular nightly
  * buildID 20121129030820
  * windows was done 06:37 Pacific
  * bits at http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/11/2012-11-29-03-08-20-mozilla-central/
* rebuild by edmorley
  * buildID 20121129074803
  * windows was done 11:18 Pacific
  * bits at http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/11/2012-11-29-07-48-03-mozilla-central/
Both of those trigger l10n builds, which probably buildbot won't merge. Two l10n builds ran for win32 fr locale.
* the first was fine
  * ran 10:48 until 11:01 so caught the first nightly, mw32-ix-slave12
  * generated firefox-20.0a1.fr.win32.partial.20121128030742-20121129030820.mar
  * log: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/11/2012-11-29-03-08-20-mozilla-central-l10n/mozilla-central-win32-l10n-nightly-fr-bm32-build1-build4043.txt.gz
* the second wasn't so great
  * ran from 16:26 to 16:38, mw32-ix-slave07 (different slave)
  * generated firefox-20.0a1.fr.win32.partial.20121129030820-20121129030820.mar
  * log: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/11/2012-11-29-03-08-20-mozilla-central-l10n/mozilla-central-win32-l10n-nightly-fr-bm30-build1-build6844.txt.gz
So that has reused the older en-US nightly, and I'm pointing the finger at caching on ftp.m.o.
Flags: needinfo?(catlee)
There are a total of 161 mar files matching *20121129030820-20121129030820.mar in http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/11/2012-11-29-03-08-20-mozilla-central-l10n/, so win32 fr isn't an isolated example. We make requests like this:
  wget --no-cache -nv -N <url>
but if I create a test file and modify it, --no-cache doesn't force the new content to be returned. Adding a random query string like ?asdf does.
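The query-string workaround mentioned above can be sketched as follows. This only illustrates the idea of making each request URL unique so a cache keyed on the full URL cannot serve a stale copy; the parameter name `nocache` and the helper name are assumptions, and this is not the fix that was adopted in this bug.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busted(url, param="nocache"):
    """Return url with an extra random query parameter appended.

    The parameter name is arbitrary (an assumption for this sketch);
    only the uniqueness of the resulting URL matters.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, uuid.uuid4().hex))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

The drawback is that every caller has to remember to do this, which is one reason a server-side Cache-Control header is the more robust option.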
Jake, any objections to a .htaccess containing
  Header set Cache-Control "max-age=0"
in /pub/mozilla.org/{firefox,thunderbird,seamonkey}/nightly/latest*-l10n/ ? It should not be a large number of hits from internal machines, but I can't speak to external hits.
Where not large means tiny, one or two hits per file per day.
Flags: needinfo?(nmaul)
Okay with me for that volume... I doubt external hits are very significant either... and offhand I don't think that anyone will be hitting the CDN for those, so there's minimal risk of a poor cache hit ratio causing us trouble.
Flags: needinfo?(nmaul)
Thanks. I've done firefox and thunderbird. Existing caches expire after an hour (at most). Callek, could you do
  cd /pub/mozilla.org/seamonkey/nightly/
  echo 'Header set Cache-Control "max-age=0"' > latest-comm-central-l10n/.htaccess
  echo 'Header set Cache-Control "max-age=0"' > latest-comm-aurora-l10n/.htaccess
Summary: Pulse messages for builds offering an update to the same build are sent out regularly → Caching of l10n files results in bad repacks (and Pulse messages for builds offering an update to the same build)
(In reply to Nick Thomas [:nthomas] from comment #17)
> Callek, could you do
> cd /pub/mozilla.org/seamonkey/nightly/
> echo 'Header set Cache-Control "max-age=0"' > latest-comm-central-l10n/.htaccess
Doesn't exist (latest-comm-central-trunk-l10n does)
> echo 'Header set Cache-Control "max-age=0"' > latest-comm-aurora-l10n/.htaccess
[seabld@upload1.dmz.scl3 nightly]$ cd /pub/mozilla.org/seamonkey/nightly/
[seabld@upload1.dmz.scl3 nightly]$ echo 'Header set Cache-Control "max-age=0"' >latest-comm-aurora-l10n/.htaccess
[seabld@upload1.dmz.scl3 nightly]$ echo 'Header set Cache-Control "max-age=0"' > latest-comm-central-trunk-l10n/.htaccess
[seabld@upload1.dmz.scl3 nightly]$ ls -al latest-comm-central-trunk-l10n/.htaccess
-rw-rw-r-- 1 seabld seamonkey 37 Dec 5 18:22 latest-comm-central-trunk-l10n/.htaccess
[seabld@upload1.dmz.scl3 nightly]$ ls -al latest-comm-aurora-l10n/.htaccess
-rw-rw-r-- 1 seabld seamonkey 37 Dec 5 18:20 latest-comm-aurora-l10n/.htaccess
OK, let's call this fixed.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Thanks Nick! I will watch whether it has stopped for our Mozmill update tests.
This is not fixed and happened again today for mozilla-central builds, in this case for the fr build we are testing. I will attach the pulse log. The properties mention an update: 20121206030737 -> 20121206030737
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Need more information to troubleshoot this as a caching problem. Specifically, can you provide the exact URL that is giving old data? At least one example. Here's my test, which shows a 3600-second max-age.
[jakemaul@jakebook ~]$ curl -I 'http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-mozilla-central/firefox-20.0a1.en-US.langpack.xpi'
HTTP/1.1 200 OK
Server: Apache
X-Backend-Server: ftp3.dmz.scl3.mozilla.com
Cache-Control: max-age=3600
Content-Type: application/x-xpinstall
Date: Mon, 10 Dec 2012 22:41:13 GMT
Expires: Mon, 10 Dec 2012 23:41:13 GMT
Accept-Ranges: bytes
Access-Control-Allow-Origin: *
ETag: "1fac759-4b4ed-4d080963acab6"
Last-Modified: Mon, 10 Dec 2012 14:43:21 GMT
Content-Length: 308461
Connection: Keep-Alive
X-Cache-Info: cached
Note this is for ftp.mozilla.org - no CDN involved. The only caching is at the Zeus layer. I've just checked, and it has a cap of 3600 seconds for HTTP requests, and 600 seconds for HTTPS requests. Beneath those caps, it will obey Cache-Control headers.
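The capping rule described here (obey Cache-Control, but never beyond 3600 seconds for HTTP or 600 seconds for HTTPS) can be modelled roughly as below. This is a sketch for reasoning about expected cache lifetimes, not the load balancer's actual logic; the function names are hypothetical, and the behaviour when no max-age is present is an assumption.

```python
import re

# Caps quoted in the comment above, treated here as fixed constants.
CACHE_CAP_SECONDS = {"http": 3600, "https": 600}


def max_age(cache_control):
    """Extract max-age (in seconds) from a Cache-Control value, or None."""
    match = re.search(r"max-age\s*=\s*(\d+)", cache_control or "", re.IGNORECASE)
    return int(match.group(1)) if match else None


def effective_ttl(scheme, cache_control):
    """Lifetime the cache would use for a response under this model."""
    cap = CACHE_CAP_SECONDS[scheme]
    age = max_age(cache_control)
    # Assumption: without a max-age, the object is kept up to the cap.
    return cap if age is None else min(age, cap)
```

Under this model a response with `max-age=0` is never served from cache, whereas the header in the curl output above allows a full hour over HTTP.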
I don't know where I can help out here. I have no idea where Pulse is getting this data. Most likely from the build machines, right? We on the Mozmill side are simply following the information we get. So you probably want feedback from Chris or Nick?
I think the problem here is that the .htaccess I added for latest-mozilla-central-l10n got removed by a cron job.
This is what mobile is doing already.
Attachment #692061 - Flags: review?(aki)
Attachment #692061 - Flags: review?(aki) → review+
Comment on attachment 692061
[svn] exclude .htaccess files in latest-*-l10n from deletion
Sending ffxbld
Transmitting file data .
Committed revision 54399.
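The patch itself is not reproduced in this thread, so the following is only an illustration of the rule its title states: a cleanup job should skip .htaccess files that live in latest-*-l10n directories. The function names are hypothetical and the real cron job may be implemented quite differently.

```python
import fnmatch
import posixpath


def is_protected(path):
    """True for a .htaccess file directly inside a latest-*-l10n directory."""
    directory, name = posixpath.split(path)
    parent = posixpath.basename(directory)
    return name == ".htaccess" and fnmatch.fnmatchcase(parent, "latest-*-l10n")


def deletable(paths):
    """Filter a list of cleanup candidates, dropping protected files."""
    return [path for path in paths if not is_protected(path)]
```

With a filter like this in place, the Cache-Control override survives the periodic cleanup of the directory's other contents.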
Attachment #692061 - Flags: checked-in+
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Something might have changed, but we got a ton of failures today across Windows and Linux for fr Aurora builds:
Linux: http://mm-ci-master.qa.scl3.mozilla.com:8080/job/mozilla-aurora_update/91/
Windows: http://mm-ci-master.qa.scl3.mozilla.com:8080/job/mozilla-aurora_update/95/
In both cases the previous and current build IDs refer to 20130117110358, while the previous one should have been 20130117042017.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Gah, the .htaccess files are gone again. Was that the first instance of failures? I can probably use that to track down which cron job is responsible.
Yes, we haven't had this before. So here is a possible time frame: http://mozmill-ci.blargon7.com/#/update/reports?branch=20.0&platform=All&from=2013-01-18&to=2013-01-18
Last working update: 2013-01-18T00:34:24.000Z
Last failing update: 2013-01-18T02:47:44.000Z
I hope that helps to track this down.
OK, so it might be something to do with the two nightlies on the 17th. Turns out the .htaccess files are still there and functional.
Ok, tracked this bug down. It's a different issue which I've filed bug 832510 on.
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → General