Closed Bug 757122 Opened 13 years ago Closed 13 years ago

Pulse notifications for finished builds should not be sent out before the builds are available for download

Categories

(Release Engineering :: General, defect)

defect
Not set
critical

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: whimboo, Unassigned)

Details

Lately our Mozmill tests for daily builds fail a lot because the builds are not yet available when the pulse message has been sent:

[mozilla-aurora_functional] $ cmd.exe /C '"mozmill-env\run mozmill-automation\download.py --type=%BUILD_TYPE% --branch=mozilla-aurora --platform=%PLATFORM% --locale=%LOCALE% --build-id=%BUILD_ID% --directory=builds && exit %%ERRORLEVEL%%"'
Retrieving list of builds from https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/05/
Traceback (most recent call last):
  File "mozmill-automation/download.py", line 180, in <module>
    main()
  File "mozmill-automation/download.py", line 177, in main
    build.download()
  File "c:\jenkins\workspace\mozilla-aurora_functional\mozmill-automation\libs\scraper.py", line 238, in download
    if os.path.isfile(os.path.abspath(self.target)):
  File "c:\jenkins\workspace\mozilla-aurora_functional\mozmill-automation\libs\scraper.py", line 208, in target
    self.build_filename(self.binary))
  File "c:\jenkins\workspace\mozilla-aurora_functional\mozmill-automation\libs\scraper.py", line 149, in binary
    raise NotFoundException("Binary not found in folder", self.path)
libs.scraper.NotFoundException: Binary not found in folder: https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/05/2012-05-21-04-20-08-mozilla-aurora-l10n

We already wait about 3-4 seconds before we start the appropriate jobs, but the build is still not available at that time. Given the unknown delay, and for the sake of consistency, the pulse notifications should really be sent out *after* the build is available on FTP.
Just to make it clear: our download script tries to get the builds from the nightly subfolder of the FTP server: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/ I'm not sure whether those notifications are sent out once the builds are available in the appropriate tinderbox folder. If that's the case, we should really delay them until the builds have also been copied to the nightly folder.
The basic sequence on the buildbot side is
* setup, build, upload, finish up job
and after that
* upload logs, add job to history database, send pulse message
so we're already doing the right thing AIUI. I am fairly confident that the problem is that ftp.m.o is caching the response, which it appears to do for 300 seconds (shorter than I expected). Given that's fairly short, can you add a slightly longer delay to verify that?
So what amount of time are you expecting us to use for testing? While this is fine for investigation, would it be possible to reduce this caching time on your side, or would that be counterproductive? When it comes to testing rapid beta nightly builds, I really don't want to have to wait about 300s until we can start testing the new builds.
There are two things that make this impossible to fix: many of the processes here happen asynchronously, and we have little to no control over the caching policies of ftp. Any distributed system needs to be tolerant of failures and have retries built into the system. I suggest you make your scripts delay a bit longer and work around the ftp cache by adding appropriate cache control headers to your request.
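A minimal sketch of the "retries built into the system" suggestion, for illustration only. The names `retry` and `NotFoundError` are hypothetical (the latter stands in for the scraper's `NotFoundException`), and the attempt count and delay are assumed values, not anything the automation is known to use.

```python
import time


class NotFoundError(Exception):
    """Hypothetical stand-in for the scraper's NotFoundException."""


def retry(action, attempts=5, delay=60.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts run out.

    Only NotFoundError is retried; any other exception propagates at once.
    The sleep function is injectable so the helper can be tested quickly.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except NotFoundError:
            if attempt == attempts:
                raise  # out of attempts, surface the last failure
            sleep(delay)  # assumed fixed delay between attempts
```

The download step would be passed in as `action`, so a build that shows up late on the server is picked up by a later attempt instead of failing the job immediately.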
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
(In reply to Chris AtLee [:catlee] from comment #4)
> Any distributed system needs to be tolerant of failures and have retries
> built into the system. I suggest you make your scripts delay a bit longer

I raised the value to 60s yesterday and it didn't work. Now I have raised it to 300s for testing. Let's see if that works.

> and work around the ftp cache by adding appropriate cache control headers to
> your request.

Does that mean we should do something like this?

import httplib2
h = httplib2.Http("cache")
resp, content = h.request(FTP, headers={'cache-control':'no-cache'})
(In reply to Henrik Skupin (:whimboo) from comment #5)
> (In reply to Chris AtLee [:catlee] from comment #4)
> > Any distributed system needs to be tolerant of failures and have retries
> > built into the system. I suggest you make your scripts delay a bit longer
>
> I have raised the value to 60s yesterday and it didn't work. Now I have
> raised it to 300s for testing. Lets see if that works.
>
> > and work around the ftp cache by adding appropriate cache control headers to
> > your request.
>
> Means we should do something like?
>
> import httplib2
> h = httplib2.Http("cache")
> resp, content = h.request(FTP, headers={'cache-control':'no-cache'})

I think you should actually be sending "Cache-Control: max-age=0". You could also experiment by adding random query strings at the end of the url to prevent caching, e.g. http://ftp.mozilla.org/.../firefox-15.0a1-win32.zip?r=12345
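The two suggestions above (a "Cache-Control: max-age=0" header and a random query string) could be combined roughly as follows. This is only a sketch: the function name `cache_busting_request` and the parameter name `r` are illustrative assumptions, and whether the cache in front of the server honours either technique is not confirmed anywhere in this bug.

```python
import random
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def cache_busting_request(url, rng=random):
    """Return (url, headers) intended to discourage a cached response.

    Appends a random "r" query parameter (hypothetical name) and sets the
    Cache-Control request header suggested in the comment above.
    """
    parts = urlsplit(url)
    # Keep any existing query parameters and add the random one at the end.
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("r", str(rng.randint(0, 10**9))))
    busted = urlunsplit(parts._replace(query=urlencode(query)))
    headers = {"Cache-Control": "max-age=0"}
    return busted, headers
```

The returned pair can then be handed to whichever HTTP client the script already uses, for example `httplib2.Http().request(url, headers=headers)`.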
So the problem here was, once again, failed builds, of which we have had a lot in the last couple of hours. With the latest update to our CI we now check whether a build has been created successfully and trigger tests only in that case. Sorry for the noise. Looks like we are fine now.
Product: mozilla.org → Release Engineering
Component: General Automation → General