Closed Bug 862724 Opened 11 years ago Closed 9 years ago

Additional pulse message for same nightly builds sent (with previous_buildid == buildid)

Product/Component: Release Engineering :: General
Type: defect
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED WONTFIX
Reporter: whimboo
Assignee: Unassigned
Whiteboard: [qa-automation-wanted]

As seen today, we got multiple pulse messages for the same build. The second one is broken: its previous_buildid is the same as its buildid, in this case '20130416004017'. The two messages were sent about 3.5h apart:

-rw-rw-r-- 1 mozauto mozauto 1015 Apr 16 06:38 log/mozilla-aurora/20130416004017_firefox_en-US_linux_build.mozilla-aurora-linux-nightly.18.log_uploaded.log

-rw-rw-r-- 1 mozauto mozauto 1016 Apr 16 02:51 log/mozilla-aurora/20130416004017_firefox_en-US_linux_build.mozilla-aurora-linux-nightly.92.log_uploaded.log

18:
{"locale": "en-US", "testsurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-linux/1366098017/firefox-22.0a2.en-US.linux-i686.tests.zip", "previous_buildid": "20130416004017", "job_number": 18, "build_number": null, "builddate": 1366098017, "buildername": "Linux mozilla-aurora nightly", "platform": "linux", "version": "22.0a2", "revision": "59a419eca6359683a2eb031f3de01946d162594c", "status": 0, "buildtype": "opt", "product": "firefox", "slave": "bld-linux64-ix-023", "tags": ["nightly"], "buildid": "20130416004017", "timestamp": "2013-04-16T13:38:08Z", "key": "build.mozilla-aurora-linux-nightly.18.log_uploaded", "logurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/04/2013-04-16-00-40-17-mozilla-aurora/mozilla-aurora-linux-nightly-bm12-build1-build18.txt.gz", "tree": "mozilla-aurora", "buildurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-linux/1366098017/firefox-22.0a2.en-US.linux-i686.tar.bz2", "release": null}

92:
{"locale": "en-US", "testsurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-linux/1366098017/firefox-22.0a2.en-US.linux-i686.tests.zip", "previous_buildid": "20130415004014", "job_number": 92, "build_number": null, "builddate": 1366098017, "buildername": "Linux mozilla-aurora nightly", "platform": "linux", "version": "22.0a2", "revision": "59a419eca6359683a2eb031f3de01946d162594c", "status": 2, "buildtype": "opt", "product": "firefox", "slave": "bld-linux64-ec2-457", "tags": ["nightly"], "buildid": "20130416004017", "timestamp": "2013-04-16T09:51:33Z", "key": "build.mozilla-aurora-linux-nightly.92.log_uploaded", "logurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/04/2013-04-16-00-40-17-mozilla-aurora/mozilla-aurora-linux-nightly-bm49-build1-build92.txt.gz", "tree": "mozilla-aurora", "buildurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-linux/1366098017/firefox-22.0a2.en-US.linux-i686.tar.bz2", "release": null}

The broken message here is the one with job number 18.
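For reference, a minimal sketch of how a consumer could flag such a message. It uses trimmed copies of the two payloads above (only the fields needed here); the helper names `is_self_referential` and `sent_at` are made up for illustration and are not part of any pulse client.

```python
import json
from datetime import datetime


def is_self_referential(message):
    """Return True when a build names itself as its own previous build."""
    buildid = message.get("buildid")
    return buildid is not None and message.get("previous_buildid") == buildid


def sent_at(message):
    """Parse the UTC 'timestamp' field of a pulse payload."""
    return datetime.strptime(message["timestamp"], "%Y-%m-%dT%H:%M:%SZ")


# Trimmed copies of the two messages quoted above.
job_18 = json.loads(
    '{"job_number": 18, "status": 0, "previous_buildid": "20130416004017",'
    ' "buildid": "20130416004017", "timestamp": "2013-04-16T13:38:08Z"}'
)
job_92 = json.loads(
    '{"job_number": 92, "status": 2, "previous_buildid": "20130415004014",'
    ' "buildid": "20130416004017", "timestamp": "2013-04-16T09:51:33Z"}'
)

for msg in (job_92, job_18):
    print(msg["job_number"], "broken" if is_self_referential(msg) else "ok")

# Time between the two notifications for the same buildid.
gap = sent_at(job_18) - sent_at(job_92)
print(gap)
```

Only job 18 is flagged, and the two timestamps are a little under four hours apart.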
Whiteboard: [qa-automation-wanted]
The nightly builds on aurora were rebuilt yesterday. Build 92 is the first build that failed to publish snippets properly. Build 18 succeeded after being manually re-triggered by sheriffs.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Chris, do you have a bug # which describes the problem in publishing the snippets? Also, why does job 18 use its own build id as the previous build id?

This behavior looks broken, given that the message gives us no useful information on which to base our automated testing.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
This particular instance was caused by a network outage which prevented the first build (92) from uploading snippets. This is why it has a result of "2" which means "failed".

Rebuilding this job was the correct thing to do, and the new build (18) would use the same buildid as before. Since the previous build was uploaded successfully, it became the new "previous build".
That means uploading the snippets is decoupled from the build process at the moment? And whenever it fails we set the results to '2' but update the previous_buildid anyway? Shouldn't we only update this id when the snippets have been successfully uploaded?
We got this again today. Would be nice to get an answer to my last question.
Flags: needinfo?(catlee)
Yes, snippet uploading is decoupled from publishing the build. And previous buildid discovery is based on what builds are published.
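To make the consequence concrete, here is a toy model of that behavior. It is not the actual buildbot code: `discover_previous` and `run_nightly` are hypothetical names, and "published" is reduced to a plain set of buildids. The assumption is only what is stated above: the build is published independently of the snippet upload, and the previous buildid is taken from whatever is published.

```python
def discover_previous(published):
    """Pick the newest published buildid (14-digit strings sort by date)."""
    return max(published) if published else None


def run_nightly(buildid, published, snippets_ok):
    """Simulate one nightly job under the assumptions described above."""
    previous = discover_previous(published)
    # The build itself is published even if the snippet upload fails later.
    published.add(buildid)
    status = 0 if snippets_ok else 2  # 2 means "failed"
    return {"buildid": buildid, "previous_buildid": previous, "status": status}


published = {"20130415004014"}

# First attempt: build published, snippet upload fails.
first = run_nightly("20130416004017", published, snippets_ok=False)
# Rebuild with the same buildid: it now finds itself as the newest build.
retry = run_nightly("20130416004017", published, snippets_ok=True)

print(first)
print(retry)
```

Under this model the retry reports status 0 with previous_buildid equal to buildid, which is the shape of the job 18 message.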
Flags: needinfo?(catlee)
Product: mozilla.org → Release Engineering
This happened again yesterday with Aurora on Windows. Both previous_buildid and buildid are "20140109004001":

{"locale": "en-US", "testsurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-win32/1389256801/firefox-28.0a2.en-US.win32.tests.zip", "previous_buildid": "20140109004001", "job_number": 4, "build_number": null, "builddate": 1389256801, "buildername": "WINNT 5.2 mozilla-aurora nightly", "platform": "win32", "version": null, "revision": "2c8f8683bd0d08b8f549bc139176677daaa99fa7", "status": 0, "buildtype": "opt", "product": "firefox", "slave": "w64-ix-slave24", "tags": ["nightly"], "buildid": "20140109004001", "timestamp": "2014-01-09T16:45:17Z", "key": "build.mozilla-aurora-win32-nightly.4.log_uploaded", "locales": null, "logurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/01/2014-01-09-00-40-01-mozilla-aurora/mozilla-aurora-win32-nightly-bm85-build1-build4.txt.gz", "repack": null, "tree": "mozilla-aurora", "buildurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-win32/1389256801/firefox-28.0a2.en-US.win32.zip", "release": null}
(In reply to Chris AtLee [:catlee] from comment #6)
> Yes, snippet uploading is decoupled from publishing the build. And previous
> buildid discovery is based on what builds are published.

So we have a glitch here which our automation cannot know about. Why can't we include the snippet upload results in the build results? Isn't it part of the whole build process everyone should know about? If we cannot upload the update snippets and don't report failures back, how can we make sure that people are getting updated to this build?
Status: REOPENED → NEW
Flags: needinfo?(catlee)
This happened again with aurora. I ran it locally and it failed; then I checked on ftp and it was the latest build, and then on the job status, where BUILD_ID was the same as TARGET_BUILD_ID.
Flags: needinfo?(nthomas)
I presume you're talking about Linux64, which had two builds today. The first one failed while compiling, so it didn't upload anything. The second looks entirely normal and found the right previous_buildID (20140129004017) according to the properties in buildbot. The partial was named 
  firefox-28.0a2.en-US.linux-x86_64.partial.20140129004017-20140130004003.mar
The full log is at
http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-01-30-00-40-03-mozilla-aurora/mozilla-aurora-linux64-nightly-bm82-build1-build4.txt.gz
if you want to poke at it.

Without more information I can't tell anything more.
Flags: needinfo?(nthomas)
Oh hey, you're talking about win32, but on January 29. Please provide that information next time.

Three builds - https://tbpl.mozilla.org/?tree=Mozilla-Aurora&jobname=nightly&rev=7d1173c4b173

First one interrupted by a network glitch, no uploads.
Second time we uploaded the mar files, and pushed the updates to Balrog. The last build step before finishing is cleaning up the build dir, and this was interrupted by network again.
Third time finished normally, but it makes the silly partial with the identical buildIDs.
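Such a partial can be spotted from its file name alone, assuming the naming pattern seen earlier in this bug (`...partial.<from buildid>-<to buildid>.mar`). A rough sketch; the function names are made up, and the second file name in the example is invented for illustration, not taken from a real build.

```python
import re

# Matches the ".partial.<from>-<to>.mar" suffix with 14-digit buildids.
PARTIAL_RE = re.compile(r"\.partial\.(?P<from_id>\d{14})-(?P<to_id>\d{14})\.mar$")


def partial_buildids(filename):
    """Return (from_buildid, to_buildid), or None if the name does not match."""
    match = PARTIAL_RE.search(filename)
    if match is None:
        return None
    return match.group("from_id"), match.group("to_id")


def is_noop_partial(filename):
    """True when a partial would 'update' a build to itself."""
    ids = partial_buildids(filename)
    return ids is not None and ids[0] == ids[1]


# File name quoted earlier in this bug.
good = "firefox-28.0a2.en-US.linux-x86_64.partial.20140129004017-20140130004003.mar"
# Invented example of a partial with identical buildIDs.
silly = "firefox-28.0a2.en-US.win32.partial.20140129004001-20140129004001.mar"

print(partial_buildids(good), is_noop_partial(good))
print(partial_buildids(silly), is_noop_partial(silly))
```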

Buildbot automatically retries on network disconnections, so that's where the 2nd and 3rd builds came from. Maybe we could exempt nightlies from that?
Or you could tell us how our tests should behave. We could easily skip the update tests whenever both buildids are the same. But I'm a bit worried that it could mask real underlying problems.
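A minimal sketch of what that check could look like on the test side. The function name `should_run_update_test` and the logger name are hypothetical. Logging a warning instead of skipping silently is one way to keep the skipped cases visible, so they are less likely to hide a real problem.

```python
import logging

log = logging.getLogger("update-tests")  # hypothetical logger name


def should_run_update_test(message):
    """Decide whether a pulse message describes a testable update path."""
    buildid = message.get("buildid")
    previous = message.get("previous_buildid")

    if not buildid or not previous:
        log.warning("Skipping update test: missing buildid fields in %r", message)
        return False

    if previous == buildid:
        # Record the skip so repeated occurrences can still be investigated.
        log.warning(
            "Skipping update test: previous_buildid equals buildid (%s)", buildid
        )
        return False

    return True


print(should_run_update_test(
    {"buildid": "20140109004001", "previous_buildid": "20140109004001"}
))
print(should_run_update_test(
    {"buildid": "20130416004017", "previous_buildid": "20130415004014"}
))
```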

I might have forgotten, but how does this behave for end users? Are they served an update to the same build, or is this simply a problem with how we create the pulse notifications? Personally, I think that whenever something fails in the build process, even after the mar files have been uploaded, clicking the rebuild button should invalidate everything that build has already published.
This failed again yesterday when we ran the update tests for aurora, buildID 20140319004002
No update was given under the AUS file.
Failed on 19/03/2014 around 19.00.
http://mozmill-daily.blargon7.com/#/update/reports?app=Firefox&branch=30.0&platform=All&from=2014-03-19&to=2014-03-19
This happened again today for the 20150521030204 build of Nightly on OS X. We got an extra pulse notification with the mentioned buildid also set as previous_buildid.
Flags: needinfo?(catlee)
We are going to stop generating partial mars and publishing complete mars as a part of bug 1173459.
Status: NEW → RESOLVED
Closed: 11 years ago → 9 years ago
Resolution: --- → WONTFIX
Component: General Automation → General