Closed
Bug 781129
Opened 12 years ago
Closed 11 years ago
Notifications for outdated builds are being sent via Pulse
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: whimboo, Unassigned)
Details
(Whiteboard: [mozmill-test-failure][qa-automation-blocked])
Attachments
(1 file)
26.79 KB,
text/plain
I have seen this a couple of times already but wasn't able to nail it down so far because I was too late. Today I saw another report from our Mozmill CI system which caught a failure in the update report: http://mozmill-ci.blargon7.com/#/update/report/3491a2617d5af3ec9bb5c88aee015de5

In that failure we were trying to upgrade a build from 20120801030520 to 20120802030533. Thankfully I log the messages we receive via Pulse to the console. The following entry is visible:

INFO:automation:2012-08-02T05:23:06+01:00 - Product: firefox, Branch: mozilla-central, Platform: macosx64, Locale: fr
INFO:automation:Trigger tests for firefox 17.0a1 mac fr 20120802030533 20120801030520

I will attach the whole notification in a bit. There were also some more notifications which we do not act on. Here are some examples:

INFO:automation:2012-08-02T05:23:29+01:00 - Product: firefox, Branch: mozilla-central, Platform: macosx64, Locale: it
INFO:automation:2012-08-02T05:23:53+01:00 - Product: firefox, Branch: mozilla-central, Platform: macosx64, Locale: kk
INFO:automation:2012-08-02T05:24:00+01:00 - Product: thunderbird, Branch: comm-central, Platform: win32, Locale: sr
INFO:automation:2012-08-02T05:24:16+01:00 - Product: firefox, Branch: mozilla-central, Platform: macosx64, Locale: hr
INFO:automation:2012-08-02T05:24:53+01:00 - Product: thunderbird, Branch: comm-aurora, Platform: linux64, Locale: pl
INFO:automation:2012-08-02T05:25:42+01:00 - Product: firefox, Branch: mozilla-central, Platform: macosx64, Locale: kn

It looks like messages get stuck somewhere and are sent out at a random time.
Comment 1•12 years ago
I'm sorry, I don't really understand what you mean. In what way are the builds outdated?
Comment 2•12 years ago
possibly related to bug 781128?
Reporter | ||
Comment 3•12 years ago
Well, if build-finished notifications are being sent through Pulse on Aug 8th for builds from 2012-08-02T05:23:06+01:00, I would call those builds and notifications outdated. I'm not sure whether this is related to the bug you pointed out, given that I don't know the details. CC'ing Ed for possibly better input.
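A minimal sketch of how a consumer could flag such a notification, assuming the 14-digit buildid encodes the build time as YYYYMMDDHHMMSS and that the receive time is expressed in the same timezone. The names parse_buildid, is_outdated and max_age are hypothetical and not taken from the Mozmill CI code:

```python
from datetime import datetime, timedelta

BUILDID_FORMAT = "%Y%m%d%H%M%S"  # e.g. 20120802030533


def parse_buildid(buildid):
    """Convert a 14-digit build ID into a naive datetime."""
    return datetime.strptime(buildid, BUILDID_FORMAT)


def is_outdated(buildid, received_at, max_age=timedelta(days=1)):
    """Return True if the build is older than max_age when the message arrives.

    max_age is an assumed threshold; pick whatever fits the build cadence.
    """
    return received_at - parse_buildid(buildid) > max_age


# A notification for build 20120802030533 that arrives on Aug 8th.
print(is_outdated("20120802030533", datetime(2012, 8, 8, 12, 0)))
```

Such a check only hides the symptom on the consumer side; it does not explain why the message was delivered late.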
Comment 4•12 years ago
I'm sorry, I don't know anything about Pulse; this is a releng issue rather than a sheriffing issue.
Comment 5•12 years ago
The referenced bug was that new builds would use old buildids, which broke a whole bunch of stuff. Unless this is still happening, I'm going to blame bug 781128 for this.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 6•12 years ago
Thanks Chris. I will check that. I haven't seen such a situation in the last couple of weeks.
Reporter | ||
Comment 7•12 years ago
Not fixed. I have seen it again right now:
> INFO:automation: jsshellUrl: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64/1348999570/jsshell-mac.zip
> INFO:automation: project:
> INFO:automation: builddir: m-cen-osx64-ntly
> INFO:automation: filepath: None
> INFO:automation: packageFilename: firefox-18.0a1.en-US.mac.dmg
> INFO:automation: basedir: /builds/slave/m-cen-osx64-ntly
> INFO:automation:completesnippetFilename: build/obj-firefox/i386/dist/update/complete.update.snippet
> INFO:automation: appVersion: 18.0a1
> INFO:automation: comments:
> INFO:automation: purge_target: 12GB
> INFO:automation: platform: macosx64
> INFO:automation: master: http://buildbot-master30.srv.releng.scl3.mozilla.com:8001/
> INFO:automation: branch: mozilla-central
> INFO:automation: partialMarFilename: firefox-18.0a1.en-US.mac.partial.20120929191424-20120930030610.mar
> INFO:automation: stage_platform: macosx64
> INFO:automation: revision: a680fd777c3b92d81650dd51c8cb3e9e5faf6398
> INFO:automation: product: firefox
> INFO:automation: completeMarSize: 45477330
> INFO:automation: repository:
> INFO:automation: buildername: OS X 10.7 mozilla-central nightly
> INFO:automation: buildid: 20120930030610
> INFO:automation: completeMarUrl: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/09/2012-09-30-03-06-10-mozilla-central/firefox-18.0a1.en-US.mac.complete.mar
> INFO:automation: packageHash: 295e54cf07c17901542b9c26c56a5af34ecd7bd6c98b71aa974ce16e9c9a2938bf28cc40e55467d8761d36ab271d7c6e0b7cd1e5be53497185475263de63b907
> INFO:automation: completeMarHash: ef06fba7bbfd2b0ca6171d520a616263d24e2e73d48806cb7e2a66c47977f9d7af6cc62c5a615d9e2799c2fe450af1fac9c43cd59bf392a863b651085a1e2157
> INFO:automation: hashType: sha512
> INFO:automation: previous_inipath: previous/Contents/MacOS/application.ini
> INFO:automation: scheduler: mozilla-central nightly
> INFO:automation: symbolsUrl: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64/1348999570/firefox-18.0a1.en-US.mac.crashreporter-symbols.zip
> INFO:automation: packageUrl: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64/1348999570/firefox-18.0a1.en-US.mac.dmg
> INFO:automation: partialMarUrl: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012/09/2012-09-30-03-06-10-mozilla-central/firefox-18.0a1.en-US.mac.partial.20120929191424-20120930030610.mar
> INFO:automation: purged_clobber: False
> INFO:automation: nightly_build: True
> INFO:automation: buildnumber: 33
> INFO:automation: testsUrl: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64/1348999570/firefox-18.0a1.en-US.mac.tests.zip
> INFO:automation: periodic_clobber: False
> INFO:automation: partialMarHash: 54044a09e7306d749c4dbaabf184d28463a555d5790cb945c33826f987951b5c5eae277b0f6f2e756f84c20a58221abd58a4663b3d7b7c6a6d15168dccad3925
> INFO:automation: partialMarSize: 1930419
> INFO:automation: builduid: 3a2d6e8e187b4abb822a7f6db9e3043c
> INFO:automation: slavebuilddir: m-cen-osx64-ntly
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 8•12 years ago
AIUI rabbit makes no guarantees about message delivery order or delay. Our systems have not received that pulse message within the past 7 days. Did you restart your pulse consumer around the same time? Perhaps this was an un-ack'ed message that got re-delivered?
Comment 9•12 years ago
http://www.rabbitmq.com/semantics.html describes message order guarantees
Reporter | ||
Comment 10•12 years ago
(In reply to Chris AtLee [:catlee] from comment #8)
> AIUI rabbit makes no guarantees about message delivery order or delay.

That would be pretty bad. So why do we rely on Pulse then? If that's the case I see a big gap here. :/

> Did you restart your pulse consumer around the same time? Perhaps this was
> an un-ack'ed message that got re-delivered?

I can't say for sure, but I don't think so. We acknowledge messages right away, so if that were the case we would see it more often. Also, we do not use a persistent queue but get a new one on each reconnect.
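If re-delivery of un-ack'ed messages does turn out to be involved, one possible consumer-side guard is to remember which builds have already been handled and skip repeats. A minimal sketch, assuming that some field of the message (for example builduid, possibly combined with the locale) uniquely identifies a notification. SeenBuilds and limit are hypothetical names and not part of pulse.py:

```python
from collections import OrderedDict


class SeenBuilds:
    """Remember the most recent notification keys so repeats can be skipped."""

    def __init__(self, limit=1000):
        self.limit = limit          # maximum number of keys to remember
        self._seen = OrderedDict()  # insertion order doubles as age order

    def is_new(self, key):
        """Return True the first time a key is seen, False for repeats."""
        if key in self._seen:
            return False
        self._seen[key] = True
        if len(self._seen) > self.limit:
            self._seen.popitem(last=False)  # forget the oldest key
        return True


seen = SeenBuilds()
print(seen.is_new("3a2d6e8e187b4abb822a7f6db9e3043c"))
print(seen.is_new("3a2d6e8e187b4abb822a7f6db9e3043c"))
```

Because the memory is bounded and lives only in the consumer process, this would not catch a repeat that arrives after a restart or after more than limit other messages.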
Comment 11•12 years ago
The mozmill queues have collectively over 900 unack'd messages, that may be the root of this issue.

qa-auto@mozilla.com|mozmill_daily|mm-ci-master - 143 unack'd messages
qa-auto@mozilla.com|mozmill_daily|release3.qa.mtv1.mozilla.com - 226
qa-auto@mozilla.com|mozmill_l10n|release4-osx-106.qa.mtv1.mozilla.com - 179
qa-auto@mozilla.com|mozmill_release|mm-ci-master - 239
qa-auto@mozilla.com|mozmill_release|release3.qa.mtv1.mozilla.com - 204
Reporter | ||
Comment 12•12 years ago
I really can't see why that happens. Acknowledging is the first thing we do when receiving a new message:

https://github.com/whimboo/mozmill-ci/blob/master/pulse.py#L206
https://github.com/whimboo/mozmill-ci/blob/master/pulse.py#L171

As jgriffin mentioned on IRC, it's a very slow increase. So might this be something on the Pulse side?
Comment 13•12 years ago
I doubt it is on the pulse side, because the only queues I see with unack'd messages are the mozmill queues. Could this be due to network problems between the machines hosting the mozmill automation and pulse? I.e., the network is interrupted between the time pulse delivers the message and it gets acknowledged, or there are problems delivering the ack?
Comment 14•12 years ago
whimboo, can you figure out why there are so many unack'ed messages in your queues?
Assignee: nobody → hskupin
Reporter | ||
Comment 15•12 years ago
As discussed in our Automation Developer Meeting, we want to look into using pulsebuildmonitor. I filed the issue directly against our CI and will hopefully have time next week to look at this. https://github.com/mozilla/mozmill-ci/issues/176
Comment 17•11 years ago
Happened again today on Mac OS X 10.7.5 (x86_64) in: /testDirectUpdate/test3.js http://mozmill-ondemand.blargon7.com/#/update/report/ad726b5c70cf80fbf8135edfca1a9522 and /testFallbackUpdate/test4.js: http://mozmill-ondemand.blargon7.com/#/update/report/ad726b5c70cf80fbf8135edfca1a4a12
Comment 18•11 years ago
And on Linux (normal runs not Ondemand) with mozilla-central 21.0a1: http://mozmill-ci.blargon7.com/#/update/report/ad726b5c70cf80fbf8135edfca2b0120 http://mozmill-ci.blargon7.com/#/update/report/ad726b5c70cf80fbf8135edfca2b2c58
Reporter | ||
Comment 19•11 years ago
If ondemand builds are failing, that's most likely a misconfiguration by the QA person who triggered the builds. For the other jobs, which are triggered by Pulse, you do not have to report more issues. We know about them and I'm working on getting us moved to pulsebuildmonitor. This will happen in a couple of days. If we still see this problem after the switch, a comment here would be helpful. Thanks.
Reporter | ||
Comment 20•11 years ago
We have switched to pulsebuildmonitor now. So hopefully this bug should be fixed. We will reopen if it happens again.
Status: REOPENED → RESOLVED
Closed: 12 years ago → 11 years ago
Resolution: --- → FIXED
Reporter | ||
Comment 21•11 years ago
This is not fixed. Today we got a pulse message for the following build:

Firefox 23.0a2 en-US on Linux Ubuntu 12.10 32bit (20130520004018)

This build is two days old and there was an en-US build yesterday: http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2013/05/2013-05-21-00-40-18-mozilla-aurora/

I will retrieve and attach the pulse message in a bit.
Assignee: hskupin → nobody
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Reporter | ||
Updated•11 years ago
Whiteboard: [mozmill-test-failure] → [mozmill-test-failure][qa-automation-blocked]
Reporter | ||
Comment 22•11 years ago
Drop that. From what I was able to see, the buildid contained in the pulse message is smaller than the previous_buildid. I will file a new bug.
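A minimal sketch of a sanity check for that condition. It assumes, based on the log in comment 7, that a partial MAR filename carries the two 14-digit IDs as previous-target (the second ID there matches the logged buildid). The helper names are hypothetical:

```python
import re

# Matches "...partial.<previous_buildid>-<buildid>.mar"
PARTIAL_MAR_RE = re.compile(r"\.partial\.(\d{14})-(\d{14})\.mar$")


def buildids_from_partial_mar(filename):
    """Return (previous_buildid, buildid) from a partial MAR name, or None."""
    match = PARTIAL_MAR_RE.search(filename)
    return match.groups() if match else None


def is_forward_update(buildid, previous_buildid):
    """True only if the target build is strictly newer than the previous one."""
    return int(buildid) > int(previous_buildid)


name = "firefox-18.0a1.en-US.mac.partial.20120929191424-20120930030610.mar"
previous_buildid, buildid = buildids_from_partial_mar(name)
print(is_forward_update(buildid, previous_buildid))
```

A consumer could skip triggering update tests whenever is_forward_update returns False, since an update to an older build is not a meaningful test target.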
Status: REOPENED → RESOLVED
Closed: 11 years ago → 11 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•11 years ago
Product: mozilla.org → Release Engineering
Assignee | ||
Updated•6 years ago
Component: General Automation → General