Closed Bug 714806 Opened 9 years ago Closed 8 years ago

Pulse message for nightly builds do not contain previous_buildid

Categories

(Release Engineering :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: whimboo, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [pulse])

Attachments

(1 file, 1 obsolete file)

To be able to run our daily update tests against en-US builds, we need the previous_buildid in the build properties of the pulse message. All build machines for en-US builds are affected, e.g. 'build.mozilla-aurora-linux.0.finished'.

Without that information it is really hard for us to figure out what the last build was. Given that we have this information for l10n build machines, I assume it shouldn't be too hard to fix for en-US build machines too.
Well, it's strange. In the past week I have had notifications where no 'previous_buildid' was available, but I can't give an example right now. I will watch over the next day to see if something comes up.
As of today all en-US build notifications for OS X contained the buildid and the previous_buildid. Sadly I forgot to save the notifications for all other platforms. I will do that over the next day.
Priority: -- → P3
Whiteboard: [pulse]
As you can see from the results, update tests are working perfectly. The previous id is available everywhere:

http://mozmill-ci.blargon7.com/#/update/reports?branch=14.0&platform=All&from=2012-03-27&to=2012-03-30

So let's get this bug closed as WFM.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
We hit this again recently, on May 18th, for the WinNT 5.2 builder. For that Nightly build of Firefox no previous_buildid was present.

I will attach the pulse message right now.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Attached file Example pulse message (obsolete) —
Just to add: this is important information which should always be present. If this happens in the future for beta nightly builds, our update tests could miss a regression or some other kind of failure.
Severity: normal → major
{
    "payload": {
        "build": {
            "slave": "w64-ix-slave13",
            "builderName": "WINNT 5.2 mozilla-central nightly",
            "text": ["failed", "make_buildsymbols", "slave", "lost"],
            "number": 7,
            "currentStep": null,
            "results": 2,
...

i.e. the build failed while generating symbols for Socorro, which happens after the compile but before generating updates or uploading anything. I suggest you add a check for a successful build early in processing the message, e.g. using the existence of the properties packageUrl, partialMarUrl, and/or completeMarUrl.
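
A rough sketch of such a check, in Python. It assumes the build properties arrive as a list of [name, value, source] entries under payload.build.properties; that layout is not shown in the truncated message above, so adjust it to whatever your consumer actually receives. The function names are made up for illustration.

```python
# Properties whose presence suggests the build got as far as packaging/upload.
UPLOAD_PROPERTIES = ("packageUrl", "partialMarUrl", "completeMarUrl")


def build_properties(message):
    """Return the build properties of a pulse message as a dict.

    Assumption: properties are a list of [name, value, source] entries.
    """
    build = message.get("payload", {}).get("build", {})
    props = {}
    for entry in build.get("properties", []):
        if len(entry) >= 2:
            props[entry[0]] = entry[1]
    return props


def is_successful_build(message):
    """True if at least one of the upload-related properties is set."""
    props = build_properties(message)
    return any(props.get(name) for name in UPLOAD_PROPERTIES)
```

A consumer would call is_successful_build(message) first and drop the message if it returns False, before looking at previous_buildid at all.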
Status: REOPENED → RESOLVED
Closed: 9 years ago → 9 years ago
Resolution: --- → WORKSFORME
So we hit this problem again. This time we have the check for a successful build in place and only run tests if the build succeeded. Still, in some cases the previous_buildid is missing. I will attach the pulse notification for yesterday's fr Nightly build on win32.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Attachment #625941 - Attachment is obsolete: true
Alright, the problem seems to be in determining where to get the previous nightly's mar:

bash -c 'ssh -l ffxbld -i ~/.ssh/ffxbld_dsa stage.mozilla.org ls -1t /pub/mozilla.org/firefox/nightly/latest-mozilla-central-l10n | grep .fr.win32.complete.mar$ | head -n 1'

ls: reading directory /pub/mozilla.org/firefox/nightly/latest-mozilla-central-l10n: Too many levels of symbolic links

After that error, the rest of the steps for generating updates are skipped and we only get a complete update.
Summary: Pulse message for en-US nightly builds do not contain previous_buildid → Pulse message for nightly builds do not contain previous_buildid
Can we get an update on this issue? Is it something which can easily be fixed? Any ETA yet?
Status: REOPENED → NEW
I don't think it's fixable. There will always be cases where the previous nightly doesn't exist or can't be found. Our behaviour in these cases is to skip creating a partial update.
(In reply to Chris AtLee [:catlee] from comment #13)
> I don't think it's fixable. There will always be cases where the previous
> nightly doesn't exist or can't be found. Our behaviour in these cases is to
> skip creating a partial update.

So how can we handle that for rapid betas then? If it becomes an issue on that branch, how should we handle those failures? IMO it will be clearly broken. I don't want to stop running update tests, and I'm not sure that simply ignoring those builds is the solution.
It's not broken; you just fall back to having a complete update.
No, the fact is that whenever we get a notification that a build has finished, we know nothing about which previous build to use. That means we cannot test that the update we push to our users works.

If we want to test updates of rapid betas without manual intervention we would have to fix that.
Nightly repacks fail for all sorts of reasons, and generally we don't force them to go through unless there's a specific need for them. Discovery of the previous repack will fail in these cases, or for other reasons as well. In all these situations, there will be no previous build id. Your automation needs to cope with this. These pulse events have very little past state in them - the previous build information in the properties is a side effect of how we propagate information from one buildbot step to the next. If you need more state than is recorded in the pulse message, it's your automation's responsibility to keep that state.

You're also assuming that rapid betas will use the same automation as nightlies. Our plans right now involve using the regular release automation for rapid betas. It's unknown how we would treat the failure of a single repack in rapid betas.
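
For what it's worth, keeping that state on the consumer side could be as small as the sketch below. It is only an illustration: the class name, the JSON file and the branch/platform/locale key are all assumptions, not anything our automation provides.

```python
import json
from pathlib import Path


class BuildHistory:
    """Remembers the last seen buildid per branch/platform/locale.

    Hypothetical helper for a pulse consumer; state lives in a JSON file.
    """

    def __init__(self, path):
        self.path = Path(path)
        if self.path.exists():
            self._state = json.loads(self.path.read_text())
        else:
            self._state = {}

    @staticmethod
    def _key(branch, platform, locale):
        return "/".join((branch, platform, locale))

    def previous_buildid(self, branch, platform, locale, reported=None):
        """Prefer the id from the pulse message, else fall back to our record."""
        return reported or self._state.get(self._key(branch, platform, locale))

    def record(self, branch, platform, locale, buildid):
        """Store the buildid of a finished build and persist it to disk."""
        self._state[self._key(branch, platform, locale)] = buildid
        self.path.write_text(json.dumps(self._state, indent=2))
```

The consumer would call previous_buildid() with whatever the message reports (possibly nothing), run its tests, and then record() the new buildid so the next notification has something to fall back on.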
Status: NEW → RESOLVED
Closed: 9 years ago → 8 years ago
Resolution: --- → WONTFIX
In those cases we will not be able to perform update tests for those builds. But let's wait and see how it works for rapid betas.
Product: mozilla.org → Release Engineering