862724 - Additional pulse message for same nightly builds sent (with previous_buildid == buildid)

Reporter

Description

•

12 years ago

As seen today, we got multiple pulse messages for the same build. The second one is broken: its previous_buildid is the same as its buildid, in this case '20130416004017'. The two messages were sent out about 3.5h apart:

-rw-rw-r-- 1 mozauto mozauto 1015 Apr 16 06:38 log/mozilla-aurora/20130416004017_firefox_en-US_linux_build.mozilla-aurora-linux-nightly.18.log_uploaded.log
-rw-rw-r-- 1 mozauto mozauto 1016 Apr 16 02:51 log/mozilla-aurora/20130416004017_firefox_en-US_linux_build.mozilla-aurora-linux-nightly.92.log_uploaded.log

18:
{"locale": "en-US", "testsurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-linux/1366098017/firefox-22.0a2.en-US.linux-i686.tests.zip", "previous_buildid": "20130416004017", "job_number": 18, "build_number": null, "builddate": 1366098017, "buildername": "Linux mozilla-aurora nightly", "platform": "linux", "version": "22.0a2", "revision": "59a419eca6359683a2eb031f3de01946d162594c", "status": 0, "buildtype": "opt", "product": "firefox", "slave": "bld-linux64-ix-023", "tags": ["nightly"], "buildid": "20130416004017", "timestamp": "2013-04-16T13:38:08Z", "key": "build.mozilla-aurora-linux-nightly.18.log_uploaded", "logurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/04/2013-04-16-00-40-17-mozilla-aurora/mozilla-aurora-linux-nightly-bm12-build1-build18.txt.gz", "tree": "mozilla-aurora", "buildurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-linux/1366098017/firefox-22.0a2.en-US.linux-i686.tar.bz2", "release": null}

92:
{"locale": "en-US", "testsurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-linux/1366098017/firefox-22.0a2.en-US.linux-i686.tests.zip", "previous_buildid": "20130415004014", "job_number": 92, "build_number": null, "builddate": 1366098017, "buildername": "Linux mozilla-aurora nightly", "platform": "linux", "version": "22.0a2", "revision": "59a419eca6359683a2eb031f3de01946d162594c", "status": 2, "buildtype": "opt", "product": "firefox", "slave": "bld-linux64-ec2-457", "tags": ["nightly"], "buildid": "20130416004017", "timestamp": "2013-04-16T09:51:33Z", "key": "build.mozilla-aurora-linux-nightly.92.log_uploaded", "logurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/04/2013-04-16-00-40-17-mozilla-aurora/mozilla-aurora-linux-nightly-bm49-build1-build92.txt.gz", "tree": "mozilla-aurora", "buildurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-linux/1366098017/firefox-22.0a2.en-US.linux-i686.tar.bz2", "release": null}

The broken message here is the one with job number 18.

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Updated

•

12 years ago

Whiteboard: [qa-automation-wanted]

Chris AtLee [:catlee]

Comment 1

•

12 years ago

The nightly builds on aurora were rebuilt yesterday. Build 92 was the first build, and it failed to publish snippets properly. Build 18 succeeded after being manually re-triggered by sheriffs.

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → WORKSFORME

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 2

•

12 years ago

Chris, do you have a bug # which describes the problem in publishing the snippets? Also, why does job 18 use its own build id as the previous build id? Such behavior looks broken, given that we do not get useful information on which to base our automated testing.

Status: RESOLVED → REOPENED

Resolution: WORKSFORME → ---

Chris AtLee [:catlee]

Comment 3

•

12 years ago

This particular instance was caused by a network outage which prevented the first build (92) from uploading snippets. This is why it has a result of "2" which means "failed". Rebuilding this job was the correct thing to do, and the new build (18) would use the same buildid as before. Since the previous build was uploaded successfully, it became the new "previous build".
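The mechanism described here can be sketched as a small model. This is only an illustration, not the actual buildbot code: the function name `discover_previous_buildid`, the `published` list, and the assumption that discovery runs before a build publishes itself are all hypothetical. The build IDs are the ones from the two pulse messages in the description.

```python
def discover_previous_buildid(published_buildids):
    """Return the newest published build ID, or None if nothing is published.

    Build IDs are YYYYMMDDHHMMSS strings, so the lexicographic maximum is
    also the most recent build.
    """
    return max(published_buildids, default=None)


# Published nightlies before the 2013-04-16 run (ID taken from job 92).
published = ["20130415004014"]

# Job 92: discovery runs, then the build is published, then the snippet
# upload fails (status 2). This ordering is an assumption for illustration.
previous_for_job_92 = discover_previous_buildid(published)
published.append("20130416004017")

# Job 18: the rebuild reuses buildid 20130416004017 and finds that ID
# already published, so it reports itself as the previous build.
previous_for_job_18 = discover_previous_buildid(published)

print(previous_for_job_92)
print(previous_for_job_18)
```

Under these assumptions the first run sees the prior day's build, while the rebuild sees its own build ID, which matches the two payloads in the report.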

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 4

•

12 years ago

That means uploading the snippets is decoupled from the build process at the moment? And whenever it fails we set the results to '2' but update the previous_buildid anyway? Shouldn't we only update this id when the snippets have been successfully uploaded?

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 5

•

12 years ago

We got this again today. Would be nice to get an answer to my last question.

Flags: needinfo?(catlee)

Chris AtLee [:catlee]

Comment 6

•

12 years ago

Yes, snippet uploading is decoupled from publishing the build. And previous buildid discovery is based on what builds are published.

Flags: needinfo?(catlee)

Nobody; OK to take it and work on it

Assignee

Updated

•

12 years ago

Product: mozilla.org → Release Engineering

Andreea Matei [:AndreeaMatei]

Comment 7

•

12 years ago

This happened again yesterday with Aurora on Windows. Both previous_buildid and buildid are "20140109004001":

{"locale": "en-US", "testsurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-win32/1389256801/firefox-28.0a2.en-US.win32.tests.zip", "previous_buildid": "20140109004001", "job_number": 4, "build_number": null, "builddate": 1389256801, "buildername": "WINNT 5.2 mozilla-aurora nightly", "platform": "win32", "version": null, "revision": "2c8f8683bd0d08b8f549bc139176677daaa99fa7", "status": 0, "buildtype": "opt", "product": "firefox", "slave": "w64-ix-slave24", "tags": ["nightly"], "buildid": "20140109004001", "timestamp": "2014-01-09T16:45:17Z", "key": "build.mozilla-aurora-win32-nightly.4.log_uploaded", "locales": null, "logurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/01/2014-01-09-00-40-01-mozilla-aurora/mozilla-aurora-win32-nightly-bm85-build1-build4.txt.gz", "repack": null, "tree": "mozilla-aurora", "buildurl": "http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-win32/1389256801/firefox-28.0a2.en-US.win32.zip", "release": null}

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 8

•

12 years ago

(In reply to Chris AtLee [:catlee] from comment #6)
> Yes, snippet uploading is decoupled from publishing the build. And previous
> buildid discovery is based on what builds are published.

So we have a glitch here which our automation cannot know about. Why can't we include the snippet upload results in the build results? Isn't it part of the whole build process everyone should know about? If we cannot upload the update snippets and don't report failures back, how can we make sure that people are getting updated to this build?

Status: REOPENED → NEW

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Updated

•

12 years ago

Flags: needinfo?(catlee)

Cosmin Malutan, [:cosmin-malutan]

Comment 10

•

12 years ago

This happened again with aurora. I ran it locally and it failed, then I checked on ftp and it was the latest build, then I checked the job status, where the BUILD_ID was the same as the TARGET_BUILD_ID.

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Updated

•

12 years ago

Flags: needinfo?(nthomas)

Nick Thomas [:nthomas] (UTC+12)

Comment 11

•

12 years ago

I presume you're talking about Linux64, which had two builds today. The first one failed while compiling, so it didn't upload anything. The second looks entirely normal and found the right previous_buildID (20140129004017) according to the properties in buildbot. The partial was named firefox-28.0a2.en-US.linux-x86_64.partial.20140129004017-20140130004003.mar.

The full log is at http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-01-30-00-40-03-mozilla-aurora/mozilla-aurora-linux64-nightly-bm82-build1-build4.txt.gz if you want to poke at it. Without more information I can't tell anything more.
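For anyone checking partials by hand, the two build IDs can be read back out of the partial's file name. The following is a rough sketch that assumes the naming pattern seen in this one example (`.partial.<from>-<to>.mar` with 14-digit build IDs) holds in general; the helper name `partial_buildids` is made up for illustration.

```python
import re

# Assumed pattern, generalized from the single file name quoted above.
PARTIAL_MAR = re.compile(r"\.partial\.(?P<from_id>\d{14})-(?P<to_id>\d{14})\.mar$")


def partial_buildids(filename):
    """Return (from_buildid, to_buildid) for a partial MAR name, else None."""
    match = PARTIAL_MAR.search(filename)
    if match is None:
        return None
    return match.group("from_id"), match.group("to_id")


name = "firefox-28.0a2.en-US.linux-x86_64.partial.20140129004017-20140130004003.mar"
ids = partial_buildids(name)
print(ids)

# A partial whose two IDs are identical would update a build to itself.
is_self_update = ids is not None and ids[0] == ids[1]
print(is_self_update)
```

For the file name above the two IDs differ, which is consistent with the build having found the right previous build.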

Flags: needinfo?(nthomas)

Nick Thomas [:nthomas] (UTC+12)

Comment 12

•

12 years ago

Oh hey, you're talking about win32, but on January 29. Please provide that information next time.

Three builds - https://tbpl.mozilla.org/?tree=Mozilla-Aurora&jobname=nightly&rev=7d1173c4b173
The first one was interrupted by a network glitch, no uploads. The second time we uploaded the mar files and pushed the updates to Balrog. The last build step before finishing is cleaning up the build dir, and this was interrupted by the network again. The third time finished normally, but it makes the silly partial with the identical buildIDs.

Buildbot automatically retries on network disconnections, so that's where the 2nd and 3rd builds came from. Maybe we could exempt nightlies from that?

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 13

•

12 years ago

Or you could tell us how our tests should behave. We could easily skip the update tests whenever both buildids are the same. But I'm a bit worried that this could mask real underlying problems. I might have forgotten, but how does this behave for end-users? Do they get an update served to the same build, or is this simply a problem with how we create the pulse notifications? Personally I think that whenever something fails in the build process, even after uploading the mar files, clicking on the rebuild button should invalidate everything the failed build produced.
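A minimal sketch of the check suggested in this comment, written so that skipped runs are recorded instead of silently dropped. The function name `plan_update_test` and the `skipped` list are hypothetical and not part of any existing automation; the payloads are trimmed to the two fields that matter, with values from this bug.

```python
def plan_update_test(message):
    """Decide whether an update test makes sense for one pulse message.

    Returns a (decision, reason) tuple, where decision is "run" or "skip".
    """
    buildid = message.get("buildid")
    previous = message.get("previous_buildid")
    if not buildid or not previous:
        return ("skip", "buildid or previous_buildid is missing")
    if previous == buildid:
        return ("skip", "previous_buildid equals buildid " + buildid)
    return ("run", "update from " + previous + " to " + buildid)


# Trimmed-down payloads with values from this bug.
rebuild = {"buildid": "20140109004001", "previous_buildid": "20140109004001"}
normal = {"buildid": "20130416004017", "previous_buildid": "20130415004014"}

skipped = []  # kept so that skips stay visible in reports
for msg in (rebuild, normal):
    decision, reason = plan_update_test(msg)
    if decision == "skip":
        skipped.append(reason)

print(skipped)
```

Keeping the skip reasons around is one way to address the concern about masking real problems: a run of skips for the same tree would still show up in whatever report consumes the list.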

Cosmin Malutan, [:cosmin-malutan]

Comment 14

•

12 years ago

This failed again yesterday when we ran the update tests for aurora, buildID 20140319004002. No update was given under the AUS file. It failed on 19/03/2014 around 19:00. http://mozmill-daily.blargon7.com/#/update/reports?app=Firefox&branch=30.0&platform=All&from=2014-03-19&to=2014-03-19

Henrik Skupin [:whimboo][⌚️UTC+1]

Reporter

Comment 15

•

10 years ago

This happened again today for the 20150521030204 build of Nightly on OS X. We got an extra pulse notification with the mentioned buildid also set as previous_buildid.

Chris AtLee [:catlee]

Updated

•

10 years ago

Flags: needinfo?(catlee)

Rail Aliiev [:rail]

Comment 16

•

10 years ago

We are going to stop generating partial mars and publishing complete mars as a part of bug 1173459.

Status: NEW → RESOLVED

Closed: 12 years ago → 10 years ago

Resolution: --- → WONTFIX

Nobody; OK to take it and work on it

Assignee

Updated

•

7 years ago

Component: General Automation → General


Categories

(Release Engineering :: General, defect)

Tracking

(Not tracked)

People

(Reporter: whimboo, Unassigned)

References

Details

(Whiteboard: [qa-automation-wanted])

Crash Data

Security

(public)
