Bug 757122
Opened 13 years ago
Closed 13 years ago
Pulse notifications for finished builds should not be sent out before the builds are available for download
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
WONTFIX
People
(Reporter: whimboo, Unassigned)
Details
Lately our Mozmill tests for daily builds have been failing a lot because the builds are not yet available when the pulse message is sent:
[mozilla-aurora_functional] $ cmd.exe /C '"mozmill-env\run mozmill-automation\download.py --type=%BUILD_TYPE% --branch=mozilla-aurora --platform=%PLATFORM% --locale=%LOCALE% --build-id=%BUILD_ID% --directory=builds && exit %%ERRORLEVEL%%"'
Retrieving list of builds from https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/05/
Traceback (most recent call last):
  File "mozmill-automation/download.py", line 180, in <module>
    main()
  File "mozmill-automation/download.py", line 177, in main
    build.download()
  File "c:\jenkins\workspace\mozilla-aurora_functional\mozmill-automation\libs\scraper.py", line 238, in download
    if os.path.isfile(os.path.abspath(self.target)):
  File "c:\jenkins\workspace\mozilla-aurora_functional\mozmill-automation\libs\scraper.py", line 208, in target
    self.build_filename(self.binary))
  File "c:\jenkins\workspace\mozilla-aurora_functional\mozmill-automation\libs\scraper.py", line 149, in binary
    raise NotFoundException("Binary not found in folder", self.path)
libs.scraper.NotFoundException: Binary not found in folder: https://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/05/2012-05-21-04-20-08-mozilla-aurora-l10n
We already wait about 3-4 seconds before we start the appropriate jobs, but the build is still not available at that point. Since the required delay is unknown, and for the sake of consistency, the pulse notifications should really be sent out *after* the build is available on FTP.
Comment 1 (Reporter) • 13 years ago
So just to make it clear: our download script tries to get the builds from the nightly subfolder of the FTP server:
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/
I'm not sure whether those notifications are sent out once the builds are available in the appropriate tinderbox folder. If that's the case, we should really delay them until the builds have also been copied to the nightly folder.
Comment 2 • 13 years ago
The basic sequence on the buildbot side is
* setup, build, upload, finish up job
and after that
* upload logs, add job to history database, send pulse message
so we're already doing the right thing, AIUI. I am fairly confident that the problem is that ftp.m.o is caching the response, which it appears to do for 300 seconds (shorter than I expected). Given that's fairly short, can you add a slightly longer delay to verify that?
Comment 3 (Reporter) • 13 years ago
So what amount of time are you expecting us to use for testing? While a longer delay is fine for investigation, would it be possible to reduce the caching time on your side, or would that be counterproductive?
When it comes to testing rapid beta nightly builds, I really don't want to have to wait about 300s before we can start testing the new builds.
Comment 4 • 13 years ago
There are two things that make this impossible to fix: many of the processes involved happen asynchronously, and we have little to no control over the caching policies of ftp.
Any distributed system needs to be tolerant of failures and have retries built into the system. I suggest you make your scripts delay a bit longer and work around the ftp cache by adding appropriate cache control headers to your request.
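The retry suggestion above could be sketched roughly as follows. This is a minimal illustration, not the actual mozmill-automation code: `NotFoundError` stands in for the scraper's `NotFoundException`, and `retry`, `fetch`, and the delay values are hypothetical names and numbers chosen for the example.

```python
import time


class NotFoundError(Exception):
    """Hypothetical stand-in for the scraper's NotFoundException."""


def retry(fetch, attempts=5, initial_delay=30.0, factor=2.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts are used up.

    The wait between attempts grows by `factor` each time, so a build
    that shows up late is still picked up without hammering the server.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except NotFoundError:
            if attempt == attempts:
                # Out of attempts: let the caller see the failure.
                raise
            sleep(delay)
            delay *= factor
```

A download step would then be wrapped as `retry(lambda: build.download())`, assuming the download raises the not-found exception while the binary is still missing. The `sleep` parameter exists only so the waiting can be replaced in tests.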
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
Comment 5 (Reporter) • 13 years ago
(In reply to Chris AtLee [:catlee] from comment #4)
> Any distributed system needs to be tolerant of failures and have retries
> built into the system. I suggest you make your scripts delay a bit longer
I raised the value to 60s yesterday and it didn't work. Now I have raised it to 300s for testing. Let's see if that works.
> and work around the ftp cache by adding appropriate cache control headers to
> your request.
Does that mean we should do something like this?
import httplib2
h = httplib2.Http("cache")
resp, content = h.request(FTP, headers={'cache-control':'no-cache'})
Comment 6 • 13 years ago
(In reply to Henrik Skupin (:whimboo) from comment #5)
> (In reply to Chris AtLee [:catlee] from comment #4)
> > Any distributed system needs to be tolerant of failures and have retries
> > built into the system. I suggest you make your scripts delay a bit longer
>
> I have raised the value to 60s yesterday and it didn't work. Now I have
> raised it to 300s for testing. Lets see if that works.
>
> > and work around the ftp cache by adding appropriate cache control headers to
> > your request.
>
> Means we should do something like?
>
> import httplib2
> h = httplib2.Http("cache")
> resp, content = h.request(FTP, headers={'cache-control':'no-cache'})
I think you should actually be sending "Cache-Control: max-age=0". You could also experiment by adding random query strings at the end of the url to prevent caching, e.g.
http://ftp.mozilla.org/.../firefox-15.0a1-win32.zip?r=12345
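Both suggestions can be combined in one small helper. The sketch below uses Python 3's standard `urllib` rather than the `httplib2` call quoted above; `build_uncached_request` and the `r` parameter name are made up for illustration, and whether the ftp cache actually honors the header or the query string is not confirmed in this bug.

```python
import random
import urllib.parse
import urllib.request


def build_uncached_request(url, bust_query=False):
    """Build a request that asks caches to revalidate the response.

    With bust_query=True, a random `r` query parameter is appended so
    that the URL differs from any previously cached one.
    """
    headers = {"Cache-Control": "max-age=0"}
    if bust_query:
        parts = urllib.parse.urlsplit(url)
        # Keep any existing query parameters and add the random one.
        query = urllib.parse.parse_qsl(parts.query, keep_blank_values=True)
        query.append(("r", str(random.randint(0, 10**9))))
        url = urllib.parse.urlunsplit(
            parts._replace(query=urllib.parse.urlencode(query))
        )
    return urllib.request.Request(url, headers=headers)
```

The returned object can be passed to `urllib.request.urlopen()` to perform the actual fetch of the directory listing or the build.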
Comment 7 (Reporter) • 13 years ago
So the problem here was, once again, failed builds, of which we have had a lot in the last couple of hours. With the latest update to our CI, we now check whether a build has been created successfully and trigger tests only in that case. Sorry for the noise. Looks like we are fine now.
Updated (Assignee) • 12 years ago
Product: mozilla.org → Release Engineering
Updated (Assignee) • 7 years ago
Component: General Automation → General