Closed Bug 502612 Opened 15 years ago Closed 15 years ago

Setup nightly update system in staging

Categories

(Release Engineering :: General, defect, P2)

x86
Linux

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: coop, Assigned: armenzg)

References

Details

Last week, I tried to push nightly updates for l10n into production (bug 480081). Armen and Nick had done some one-off patch generation testing using the production system (prometheus-vm), which seemed to work fine. When we flipped the switch on Friday, however, the patch generation system lost its mind. When it tried to generate updates in bulk, the patcher script started getting locales confused: e.g., when trying to build an fr partial, patcher started downloading an hr complete MAR. This would likely have been caught and fixed if we had patcher running in staging. Since I have a Q3 goal of unifying and sanitizing the patch generation process, I'll need a proper staging environment to test any changes. The first step is to clone the production partial patch generation VM (prometheus-vm).
One thing I want to do early on is get patcher reading config data from an external file (e.g. config.py) so we can have both production and staging configs living side-by-side.
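A minimal sketch of keeping production and staging settings side by side, written in shell for illustration only. It assumes a hypothetical plain `key = value` file per environment named `patcher-<env>.cfg`; neither the format nor the file names come from patcher itself.

```shell
# config_get FILE KEY
# Print the first value assigned to KEY in a simple "key = value" file.
# Hypothetical format, not patcher's real config syntax.
config_get() {
    file=$1
    key=$2
    # Skip leading whitespace, match "KEY =", and print whatever follows.
    sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*//p" "$file" | head -n 1
}

# config_for_env [ENV]
# Map an environment name to its (hypothetical) config file name,
# defaulting to production, e.g. patcher-staging.cfg.
config_for_env() {
    printf 'patcher-%s.cfg\n' "${1:-production}"
}
```

With this split, a staging run would read `$(config_get "$(config_for_env staging)" ftp_host)` while production keeps its own file untouched.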
Status: NEW → ASSIGNED
Priority: -- → P2
Summary: Setup patcher is staging → Setup patcher in staging
Depends on: 502614
Summary: Setup patcher in staging → Setup nightly update system in staging
Blocks: 444050
staging-nightly-updates is set up now. I made some simple changes to the patcher script to point it at staging-stage. I also set up the missing Thunderbird build dirs on staging-stage; rsync was failing silently ($? == 0) on those missing dirs. Armen has a new Windows nightly running in staging right now, after which the l10n nightlies will run. With any luck, later today we'll see either successful partial generation or a script failure similar to the one in production.
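Since rsync's exit status did not reveal the missing directories, one option is a pre-flight check that fails loudly before syncing. A minimal sketch; `check_dirs` and whatever directory list is passed to it are hypothetical:

```shell
# check_dirs DIR...
# Report every directory in the argument list that does not exist and return
# non-zero if any are missing, so the caller can stop before running rsync.
check_dirs() {
    missing=0
    for d in "$@"; do
        if [ ! -d "$d" ]; then
            echo "missing: $d" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_dirs "$DEST/thunderbird/nightly" || exit 1` ahead of the rsync call, where `$DEST` stands in for the sync's destination root.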
The VM seems to be doing what is expected and generating partial MARs. The problem is that I have not yet been able to perform either a complete or a partial update.

This is the build that I have tried:
http://staging-master.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009-07-05-19-mozilla-central-l10n/firefox-3.6a1pre.af.win32.zip

I am following these instructions to make it look in staging-stage for the updates:
https://bugzilla.mozilla.org/show_bug.cgi?id=496196#c29

The snippets are:
http://staging-master.build.mozilla.org/aus2/build/0/Firefox/mozilla-central/WINNT_x86-msvc/20090705195410/af/complete.txt
http://staging-master.build.mozilla.org/aus2/incoming/2/Firefox/mozilla-central/WINNT_x86-msvc/20090705195410/af/complete.txt
http://staging-master.build.mozilla.org/aus2/incoming/2/Firefox/mozilla-central/WINNT_x86-msvc/20090705195410/af/partial.txt

and they should match this update URL (but they don't):
http://staging-stage.build.mozilla.org/update/3/Firefox/3.6a1pre/20090705195410/WINNT_x86-msvc/af/nightly/Windows_NT%205.0/default/default/update.xml?force=1

An example of the uploaded partial MARs for win32:
http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-05-19-mozilla-central-l10n/

At this point only win32 has had a new set of builds:

    [cltbld@staging-stage mozilla-central]$ pwd
    /opt/aus2/incoming/2/Firefox/mozilla-central
    [cltbld@staging-stage mozilla-central]$ find . -iname partial.txt | grep es-ES
    ./WINNT_x86-msvc/20090709091211/es-ES/partial.txt
    ./WINNT_x86-msvc/20090705195410/es-ES/partial.txt
    ./Darwin_Universal-gcc3/20090707102620/es-ES/partial.txt
    ./Linux_x86-gcc3/20090705052137/es-ES/partial.txt
    [cltbld@staging-stage mozilla-central]$ cat WINNT_x86-msvc/20090705195410/af/partial.txt | grep url
    url=http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-05-19-mozilla-central-l10n/firefox-3.6a1pre.af.win32.partial.20090705195410-20090709091211.mar

Another thing I have found is that the script is generating the en-US partials from what is on ftp.mozilla.org, since there were no dated dirs on staging for en-US:

    [cltbld@staging-stage mozilla-central]$ cat Darwin_Universal-gcc3/20090703031521/en-US/complete.txt | grep url
    url=http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-04-03-mozilla-central/firefox-3.6a1pre.en-US.mac.complete.mar

Check them here:
http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-03-03-mozilla-central/
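To compare a snippet against the update URL a build will request, it can help to assemble that URL from its parts. A sketch, assuming the field order of the example update URL in this comment (product, version, build ID, build target, locale, channel, OS version, then two `default` fields); `update_url` and `snippet_url` are hypothetical helpers:

```shell
# update_url HOST PRODUCT VERSION BUILDID TARGET LOCALE CHANNEL OS_VERSION
# Print an AUS "update/3" URL laid out like the example in this comment:
#   /update/3/<product>/<version>/<buildid>/<target>/<locale>/<channel>/<os>/default/default/update.xml
# OS_VERSION must already be URL-encoded (e.g. Windows_NT%205.0). The two
# trailing "default" fields are copied from the example, not parameterized.
update_url() {
    printf 'http://%s/update/3/%s/%s/%s/%s/%s/%s/%s/default/default/update.xml?force=1\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8"
}

# snippet_url FILE
# Print the MAR URL from a snippet that uses "url=..." lines, like the
# files under /opt/aus2/incoming/2 shown above.
snippet_url() {
    sed -n 's/^url=//p' "$1"
}
```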
(In reply to comment #3)
> Another thing that I have found is that the script is generating the en-US
> partials out of what is on ftp.mozilla.org since there were no dated dirs on
> staging for en-US.
> [cltbld@staging-stage mozilla-central]$ cat
> Darwin_Universal-gcc3/20090703031521/en-US/complete.txt | grep url
> url=http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-04-03-mozilla-central/firefox-3.6a1pre.en-US.mac.complete.mar
> Check them here:
> http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-03-03-mozilla-central/

This looks to be a problem with the patcher script, i.e. it's fixing up the URLs based on a hard-coded 'ftp.mozilla.org' string. I've changed that regexp to use an $ftp_host param from the config file. I've also added Armen to the list of recipients for the reports from the staging patcher script (set in /builds/nightly-partial-generation/app/patch-packager-cron.sh).
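The actual fix lives in patcher's Perl code; the shell analogue below only illustrates substituting a configurable host for the hard-coded one. `rewrite_host` is a hypothetical helper, and it assumes the host contains no `|` or `&` characters (which `sed` would interpret):

```shell
# rewrite_host URL FTP_HOST
# Replace the host part of an http:// URL with FTP_HOST, e.g. to point a
# snippet URL at staging-stage instead of a hard-coded ftp.mozilla.org.
rewrite_host() {
    printf '%s\n' "$1" | sed "s|^http://[^/]*/|http://$2/|"
}
```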
The cronjob for cleaning up staging-stage has bitten us again: it deleted the MAR files that were about to be downloaded. I have deleted all the snippets that had a URL pointing to a missing MAR file.

How could we avoid this? Can the snippets in /opt/aus2/build be removed once they have been processed by the script? I hope tonight's run goes well and that the weekend will be better.

    Touching /builds/nightly-partial-generation/aus/0/Firefox/mozilla-central/Darwin_Universal-gcc3/20090707102620/pa-IN/complete.txt... done
    Downloading http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-10-03-mozilla-central-l10n/firefox-3.6a1pre.pa-IN.mac.complete.mar...
    --16:55:02-- http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-10-03-mozilla-central-l10n/firefox-3.6a1pre.pa-IN.mac.complete.mar
               => `to.mar'
    Resolving staging-stage.build.mozilla.org... 10.2.71.82
    Connecting to staging-stage.build.mozilla.org|10.2.71.82|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 17,605,584 (17M) [text/plain]
        0K ........ ........ ........ ........ ........ ........ 17% 53.32 MB/s
     3072K ........ ........ ........ ........ ........ ........ 35% 55.68 MB/s
     6144K ........ ........ ........ ........ ........ ........ 53% 54.84 MB/s
     9216K ........ ........ ........ ........ ........ ........ 71% 61.69 MB/s
    12288K ........ ........ ........ ........ ........ ........ 89% 68.68 MB/s
    15360K ........ ........ ........ .... 100% 68.41 MB/s
    16:55:03 (59.26 MB/s) - `to.mar' saved [17605584/17605584]
    done
    Touching /builds/nightly-partial-generation/aus/0/Firefox/mozilla-central/Darwin_Universal-gcc3/20090707102620/pa-IN/partial.txt... done
    Downloading http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-07-10-mozilla-central-l10n/firefox-3.6a1pre.pa-IN.mac.complete.mar...
    --16:55:03-- http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-07-10-mozilla-central-l10n/firefox-3.6a1pre.pa-IN.mac.complete.mar
               => `from.mar'
    Resolving staging-stage.build.mozilla.org... 10.2.71.82
    Connecting to staging-stage.build.mozilla.org|10.2.71.82|:80... connected.
    HTTP request sent, awaiting response... 404 Not Found
    16:55:03 ERROR 404: Not Found.
(In reply to comment #5)
> The cronjob for cleaning up staging-stage has bit us again.
> It has deleted the MAR files that were about to be downloaded.
> I have deleted all the snippets that had a URL pointing to a missing MAR file.
>
> How could we avoid this? Can the snippets in /opt/aus2/build be removed once
> processed by the script? I hope a good run goes tonight and over the weekend
> will be better

That might have been over-aggressive cleanup on my part, but I'm not sure. We had filled up /builds completely with our repeated attempts over the last few days, and the cleanup cronjobs weren't keeping up. They expire files older than 5 days. /opt/aus2/build is tiny... only 2.2M at present.
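To see what a 5-day expiry would remove before it runs, a dry run along these lines may help. It approximates the rule described above with `find -mtime`; the real cleanup cronjob's command is not in this bug, and `expiry_candidates` is a hypothetical name:

```shell
# expiry_candidates DIR DAYS
# Dry run: list regular files under DIR whose age, counted in whole days
# (find discards the remainder), exceeds DAYS. Nothing is deleted.
expiry_candidates() {
    find "$1" -type f -mtime +"$2" -print | LC_ALL=C sort
}
```

Note that `-mtime +5` only matches files that are at least 6 full days old, which is worth keeping in mind when comparing the cron's window against the age of the "from" builds a partial still needs.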
I have been able to get both a partial and a complete update for the "af" locale on staging. I downloaded
http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-10-04-mozilla-central-l10n/firefox-3.6a1pre.af.win32.zip
The update URL it checked was:
http://staging-stage.build.mozilla.org/update/3/Firefox/3.6a1pre/20090710044907/WINNT_x86-msvc/af/nightly/Windows_NT%205.0/default/default/update.xml?force=1
and it got updated accordingly. Here is the partial MAR I was offered:
http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-10-04-mozilla-central-l10n/firefox-3.6a1pre.af.win32.partial.20090710044907-20090713044326.mar

The only "unnatural" thing I did was to wget the latest-mozilla-central en-US binaries onto staging-stage, since the en-US nightlies never got their turn to run on staging over the weekend. Tonight I will schedule the en-US runs on staging to happen naturally after midnight, and do a run of many more locales than just "af" early in the morning. I will also test the scenario for Darwin and Linux.
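The file and directory names in this bug follow a regular pattern, which makes it easy to predict where a given MAR should appear. A sketch of that pattern as inferred from the URLs quoted here; the helpers are hypothetical and not part of patcher:

```shell
# complete_mar_name PRODUCT VERSION LOCALE PLATFORM
# e.g. firefox-3.6a1pre.pa-IN.mac.complete.mar
complete_mar_name() {
    printf '%s-%s.%s.%s.complete.mar\n' "$1" "$2" "$3" "$4"
}

# partial_mar_name PRODUCT VERSION LOCALE PLATFORM FROM_BUILDID TO_BUILDID
# e.g. firefox-3.6a1pre.af.win32.partial.<from>-<to>.mar
partial_mar_name() {
    printf '%s-%s.%s.%s.partial.%s-%s.mar\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# dated_dir BUILDID SUFFIX
# Turn a 14-digit build ID (YYYYMMDDhhmmss) into the nightly directory name
# YYYY-MM-DD-hh-SUFFIX, e.g. 2009-07-10-04-mozilla-central-l10n.
dated_dir() {
    prefix=$(printf '%s\n' "$1" | sed 's/^\(....\)\(..\)\(..\)\(..\).*$/\1-\2-\3-\4/')
    printf '%s-%s\n' "$prefix" "$2"
}
```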
Assignee: ccooper → armenzg
Partials properly generated for the 15th in:
http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-15-01-mozilla-central-l10n/
http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-15-04-mozilla-central-l10n/

* Today is the first day that the removal script has not removed more than it should have. I hope no more fine-tuning is required.
* I have been able to test the update for es-AR on Windows and Mac from the 15th and the 16th. If someone could test it on Linux that would be awesome (it requires a VPN connection to mpt).
* 2 out of the 6 locales for Windows were not completed, since the slave was lost while they were being generated.
* I wonder why the "ar" locale on Windows did not get a partial. I should look at the snippets for the runs of the 15th and the 16th.
* The only bug that I am aware of, and that I still have to confirm, is that the "make wget-en-US" target on Mac did not download the latest en-US (regardless of the output), and therefore the buildid was from the 15th rather than the 16th. I ran wget as shown in the log (without going through the make target) on the slave, and it got updated to the latest. I have the feeling that wget might behave slightly differently on Mac than on Windows and Linux. I will have to confirm this.
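One way to catch a stale en-US download automatically is to compare the date embedded in the build ID (the first eight digits of YYYYMMDDhhmmss) with the day the run is expected to use. A minimal sketch; the helper names are hypothetical, and where the build ID is read from (for example the `BuildID` entry in the build's `application.ini`) is left to the caller:

```shell
# buildid_date BUILDID
# Print the YYYYMMDD part of a 14-digit build ID.
buildid_date() {
    printf '%s\n' "$1" | cut -c1-8
}

# check_buildid_date BUILDID EXPECTED_YYYYMMDD
# Return non-zero, with a message on stderr, if the build was not produced
# on the expected day (e.g. yesterday's en-US build was fetched instead).
check_buildid_date() {
    actual=$(buildid_date "$1")
    if [ "$actual" != "$2" ]; then
        echo "stale build: buildid $1 is from $actual, expected $2" >&2
        return 1
    fi
}
```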
Partials properly generated for the 16th in:
http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-16-01-mozilla-central-l10n/
http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/07/2009-07-16-03-mozilla-central-l10n/

Tested the updates from the 16th to the 17th on Mac and Windows.

(In reply to comment #10)
> * I wonder why the locale "ar" on windows did not get a partial for it. I
> should look at the snippets for the run of the 15th and the 16th

The snippets were not generated, since the slave got lost before uploading them.

> * The only bug that I am aware of and that I have to confirm is that the "make
> wget-en-US" target on mac did not download the latest en-US (regarless of the
> output) and therefore the buildid was of the 15th rather than 16th. I run wget
> as per the log (without going through the make target) in the slave and it got
> updated to the latest. I have the feeling that wget might be slightly different
> on mac than on windows and linux. I will have to confirm this.

I have confirmed that wget is working properly on Mac without any intervention on my part. The version is 1.11 instead of 1.10.2, but that is about it. I have checked the logs of the last 3 days and seen that the first run of every day did what it was supposed to. With so much going on during the group gathering, I might not have looked properly at the start times of the jobs.
I am pretty satisfied with the results, as it only required fine-tuning the removal scripts on staging-stage. When can we turn this live?
(In reply to comment #12)
> I am pretty satisfied with the results as it just required the fine tuning of
> the removing scripts on staging-stage.
>
> When can we turn this live?

coop/catlee/bhearsum: what kind of review/testing do we need to go through before trying this in production again? These landings have bounced off a few times already, so I'd like to be cautious.

Also, does this require a downtime or just a reconfig?
The only remark before we turn this live is that we have to remove any snippets left in AUS from previous runs. Patcher was trying to generate updates for MAR files that had already been deleted from stage (this caught us off guard).

(In reply to comment #13)
> Also, does this require a downtime or just a reconfig?

A reconfig (no downtime).
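A dry run for finding leftover snippets before going live, assuming the `<platform>/<buildid>/<locale>/<snippet>.txt` layout visible in the paths in this bug. `stale_snippets` is a hypothetical helper; it only prints paths, so the list can be reviewed before anything is deleted:

```shell
# stale_snippets AUS_ROOT CUTOFF_BUILDID
# Print snippet files (complete.txt / partial.txt) that sit under a 14-digit
# build ID directory older than CUTOFF_BUILDID. Nothing is removed.
stale_snippets() {
    find "$1" -type f \( -name complete.txt -o -name partial.txt \) | LC_ALL=C sort |
    while IFS= read -r f; do
        # The build ID is the grandparent directory of the snippet file.
        buildid=$(basename "$(dirname "$(dirname "$f")")")
        case $buildid in
            [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
            *) continue ;;  # not a build ID directory; leave it alone
        esac
        if [ "$buildid" -lt "$2" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

For example, `stale_snippets /opt/aus2/build/0/Firefox 20090716000000` lists candidates; only after reviewing them would one pipe the output to `xargs rm`.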
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
I forgot to mention that I am closing this since the system is set up and working. Nothing left to be done.
There has been some weirdness with the version bumping and with the WinCE-related work (it might be related to the version bumping as well). I fixed it by running:

    # Delete complete.txt snippets modified in the last 10 days whose MAR URL now returns 404.
    for file in `find . -name 'complete.txt' -type f -mtime -10`; do
        url=`grep ^http $file`    # the MAR URL line of the snippet
        # Print "<file> <HTTP status line>"; keep going only if it is a 404.
        echo $file `curl -sI $url | grep ^HTTP` | grep 404 || continue
        rm $file
    done

in

    [cltbld@staging-nightly-updates Firefox]$ pwd
    /builds/nightly-partial-generation/build/0/Firefox

and

    [cltbld@staging-stage Firefox]$ pwd
    /opt/aus2/build/0/Firefox

I believe the lesson here is that a snippet pointing to a missing MAR file should not stop partial MAR generation; the script should skip it and continue.

OUTPUT:

    ##############################
    Building patches for Firefox/mozilla-central/WINCE_arm-msvc/en-US/20090813031716 => Updating to Firefox/mozilla-central/WINCE_arm-msvc/en-US/20090814042320
    Touching /builds/nightly-partial-generation/aus/0/Firefox/mozilla-central/WINCE_arm-msvc/20090813031716/en-US/complete.txt... done
    Downloading http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/08/2009-08-14-04-mozilla-central/firefox-3.7a1pre.en-US.wince-arm.complete.mar...
    --06:20:06-- http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/08/2009-08-14-04-mozilla-central/firefox-3.7a1pre.en-US.wince-arm.complete.mar
               => `to.mar'
    Resolving staging-stage.build.mozilla.org... 10.2.71.82
    Connecting to staging-stage.build.mozilla.org|10.2.71.82|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 10,262,720 (9.8M) [text/plain]
        0K ........ ........ ........ ........ ........ ........ 30% 97.37 MB/s
     3072K ........ ........ ........ ........ ........ ........ 61% 104.00 MB/s
     6144K ........ ........ ........ ........ ........ ........ 91% 103.00 MB/s
     9216K ........ .... 100% 107.48 MB/s
    06:20:06 (101.84 MB/s) - `to.mar' saved [10262720/10262720]
    done
    Touching /builds/nightly-partial-generation/aus/0/Firefox/mozilla-central/WINCE_arm-msvc/20090813031716/en-US/partial.txt... done
    Downloading http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/08/2009-08-13-03-mozilla-central/firefox-3.6a2pre.en-US.wince-arm.complete.mar...
    --06:20:06-- http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/08/2009-08-13-03-mozilla-central/firefox-3.6a2pre.en-US.wince-arm.complete.mar
               => `from.mar'
    Resolving staging-stage.build.mozilla.org... 10.2.71.82
    Connecting to staging-stage.build.mozilla.org|10.2.71.82|:80... connected.
    HTTP request sent, awaiting response... 404 Not Found
    06:20:06 ERROR 404: Not Found.
    create_partial_patch: 'wget -O from.mar http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/08/2009-08-13-03-mozilla-central/firefox-3.6a2pre.en-US.wince-arm.complete.mar' returned non-zero! at /builds/nightly-partial-generation/app/patch-packager.pl line 640.
    ##############################
    Building patches for Firefox/mozilla-1.9.1/Linux_x86-gcc3/en-US/20090811032917 => Updating to Firefox/mozilla-1.9.1/Linux_x86-gcc3/en-US/20090814021625
    Touching /builds/nightly-partial-generation/aus/0/Firefox/mozilla-1.9.1/Linux_x86-gcc3/20090811032917/en-US/complete.txt... done
    Downloading http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/08/2009-08-14-02-mozilla-1.9.1/firefox-3.5.3pre.en-US.linux-i686.complete.mar...
    --06:40:03-- http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/08/2009-08-14-02-mozilla-1.9.1/firefox-3.5.3pre.en-US.linux-i686.complete.mar
               => `to.mar'
    Resolving staging-stage.build.mozilla.org... 10.2.71.82
    Connecting to staging-stage.build.mozilla.org|10.2.71.82|:80... connected.
    HTTP request sent, awaiting response... 200 OK
    Length: 9,932,605 (9.5M) [text/plain]
        0K ........ ........ ........ ........ ........ ........ 31% 32.21 MB/s
     3072K ........ ........ ........ ........ ........ ........ 63% 99.19 MB/s
     6144K ........ ........ ........ ........ ........ ........ 95% 93.59 MB/s
     9216K ....... 100% 71.45 MB/s
    06:40:03 (58.46 MB/s) - `to.mar' saved [9932605/9932605]
    done
    Touching /builds/nightly-partial-generation/aus/0/Firefox/mozilla-1.9.1/Linux_x86-gcc3/20090811032917/en-US/partial.txt... done
    Downloading http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/08/2009-08-11-03-mozilla-1.9.1/firefox-3.5.3pre.en-US.linux-i686.complete.mar...
    --06:40:03-- http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/08/2009-08-11-03-mozilla-1.9.1/firefox-3.5.3pre.en-US.linux-i686.complete.mar
               => `from.mar'
    Resolving staging-stage.build.mozilla.org... 10.2.71.82
    Connecting to staging-stage.build.mozilla.org|10.2.71.82|:80... connected.
    HTTP request sent, awaiting response... 404 Not Found
    06:40:03 ERROR 404: Not Found.
    create_partial_patch: 'wget -O from.mar http://staging-stage.build.mozilla.org/pub/mozilla.org/firefox/nightly/2009/08/2009-08-11-03-mozilla-1.9.1/firefox-3.5.3pre.en-US.linux-i686.complete.mar' returned non-zero! at /builds/nightly-partial-generation/app/patch-packager.pl line 640.
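The log shows patch-packager.pl reporting an error from `create_partial_patch` whenever the `from.mar` download fails. The skip-and-continue behaviour suggested above can be sketched in shell as follows; this is not patcher's Perl code, and `fetch`, `make_partial` and `build_partials` are hypothetical names (`make_partial` is only a placeholder for the real diff step):

```shell
# fetch URL DEST
# Download URL to DEST; non-zero exit status on failure (e.g. an HTTP 404).
fetch() {
    wget -q -O "$2" "$1"
}

# make_partial FROM_MAR TO_MAR
# Hypothetical placeholder for the real partial-MAR generation step.
make_partial() {
    echo "would build a partial from $1 to $2" >&2
}

# build_partials FROM_URL TO_URL [FROM_URL TO_URL ...]
# For each (from, to) pair of complete MAR URLs, download to.mar and from.mar
# into the current directory. If either download fails, warn and move on to
# the next pair instead of aborting the whole run. Prints a one-line summary.
build_partials() {
    built=0
    skipped=0
    while [ "$#" -ge 2 ]; do
        from_url=$1
        to_url=$2
        shift 2
        # Same order as the log: fetch the target build first, then the source.
        if ! fetch "$to_url" to.mar || ! fetch "$from_url" from.mar; then
            echo "skipping: could not fetch $from_url or $to_url" >&2
            skipped=$((skipped + 1))
            continue
        fi
        make_partial from.mar to.mar
        built=$((built + 1))
    done
    echo "built=$built skipped=$skipped"
}
```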
Product: mozilla.org → Release Engineering