mozharness mach builds are failing to the correct upload dir

RESOLVED FIXED

Status

Release Engineering
Applications: MozharnessCore
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: jlund, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3253] )

Attachments

(2 attachments, 1 obsolete attachment)

(Reporter)

Comment 1

4 years ago
Created attachment 8510618 [details] [diff] [review]
141023_mozharness_mach_buildid_fix.patch

what's happening is mozharness is grepping application.ini if it exists for buildid. but this will result in grabbing the previous buildid if the build was not clobbered.

I've changed the logic so we rely on buildbot props if we are building in automation. 

I've also added a check to ensure that the buildid set from buildbot_config matches the buildid that is generated in application.ini from the current build. If it doesn't we throw an error and trigger the build orange so it gets eyes on it but doesn't burn the job for devs.

I've triggered a nightly from each platform on ash. will post results and request a reviewer once those turn up green.
(Reporter)

Comment 2

4 years ago
Comment on attachment 8510618 [details] [diff] [review]
141023_mozharness_mach_buildid_fix.patch

patch looks good:

snippet:
15:17:17     INFO - Verifying buildid from application.ini matches buildid from buildbot
15:17:17     INFO - Getting output from command: ['python', '/builds/slave/ash-lx-ntly-000000000000000000/build/src/config/printconfigsetting.py', '/builds/slave/ash-lx-ntly-000000000000000000/build/src/obj-firefox/dist/bin/application.ini', 'App', 'BuildID'] in /builds/slave/ash-lx-ntly-000000000000000000
15:17:17     INFO - Copy/paste: python /builds/slave/ash-lx-ntly-000000000000000000/build/src/config/printconfigsetting.py /builds/slave/ash-lx-ntly-000000000000000000/build/src/obj-firefox/dist/bin/application.ini App BuildID
15:17:17     INFO - Reading from file tmpfile_stdout
15:17:17     INFO - Output received:
15:17:17     INFO -  20141023040200
15:17:17     INFO - buildid from application.ini: "20141023040200". buildid from buildbot properties: "20141023040200"
15:17:17     INFO - buildids match.

linux log: https://tbpl.mozilla.org/php/getParsedLog.php?id=50925293&full=1&branch=ash
osx log: https://tbpl.mozilla.org/php/getParsedLog.php?id=50929719&full=1&branch=ash
win log: https://tbpl.mozilla.org/php/getParsedLog.php?id=50927009&full=1&branch=ash


Note, the comments where it says:
# finally, let's resort to making a buildid this will happen whe ... etc

are now removed too. they are part of the old way.
Attachment #8510618 - Flags: review?(mgervasini)
(Reporter)

Comment 3

4 years ago
Comment on attachment 8510618 [details] [diff] [review]
141023_mozharness_mach_buildid_fix.patch

rail, mgerva is PTO this afternoon and, since this is live, I'd like to try to land this. Can you r? it or bounce it to someone.

We don't have an abundance of mozharn folk in releng these days but you are not new to reviewing patches for it :)

the logic should mimic MozillaBuildFactory in that, if we don't have a buildid/builduid[1] by the time the builder starts (it's not in buildbot properties), create one.

I've added a check in the end to ensure that this bug doesn't happen again by confirming buildbot's buildid matches the buildid from application.ini

[1] http://mxr.mozilla.org/build/source/buildbotcustom/process/factory.py#849
Attachment #8510618 - Flags: review?(mgervasini) → review?(rail)
Comment on attachment 8510618 [details] [diff] [review]
141023_mozharness_mach_buildid_fix.patch

Review of attachment 8510618 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozharness/mozilla/building/buildbase.py
@@ +651,2 @@
>              self.info("Creating builduid through uuid hex")
> +            builduid = uuid.uuid4().hex

Even though this is a single line, it sounds like a good candidate for a separate method. Something like gernerate_build_UID().

@@ +676,5 @@
>              #  buildbot has not made one, and we are running this script for
>              # the first time in a clean clobbered state
>              self.info("Creating buildid through current time")
> +            buildid = time.strftime("%Y%m%d%H%M%S",
> +                                    time.localtime(time.time()))

The same here.

@@ +1211,5 @@
> +        if self.config.get('is_automation'):
> +            self.info("Verifying buildid from application.ini matches buildid "
> +                      "from buildbot")
> +            app_ini_buildid = self._query_build_prop_from_app_ini('BuildID')
> +            buildbot_buildid = self.buildbot_properties.get('buildid', 'None')

We talked about this in IRC, copying here for the record.
 
In overall the patch looks good to me. The line above looks a bit weird. I'd avoid using 'None' as a string. It bet it'll hit us at some point. Can you use built-in None instead?

@@ +1216,5 @@
> +            self.info(
> +                'buildid from application.ini: "%s". buildid from buildbot '
> +                'properties: "%s"' % (app_ini_buildid, buildbot_buildid)
> +            )
> +            if app_ini_buildid == buildbot_buildid:

Probably need to handle a case where app_ini_buildid == buildbot_buildid == None.

@@ +1222,5 @@
> +            else:
> +                self.error('buildids do not match.')
> +                # set the build to orange if not worse
> +                self.return_code = self.worst_level(
> +                    1,  self.return_code, AUTOMATION_EXIT_CODES[::-1]

Can you use the const value instead of 1? EXIT_STATUS_DICT[TBPL_WARNING] I believe?
Attachment #8510618 - Flags: review?(rail) → review-
(Reporter)

Comment 5

4 years ago
Created attachment 8512131 [details] [diff] [review]
141027_bug_1087101_buildid_upload_dir_fix-mozharness.patch

addresses concerns from r-

tested on push: https://tbpl.mozilla.org/?tree=Ash&rev=2bee4e7591f9
Attachment #8512131 - Flags: review?(rail)
Comment on attachment 8512131 [details] [diff] [review]
141027_bug_1087101_buildid_upload_dir_fix-mozharness.patch

Review of attachment 8512131 [details] [diff] [review]:
-----------------------------------------------------------------

\o/
Attachment #8512131 - Flags: review?(rail) → review+
(Reporter)

Comment 7

4 years ago
Comment on attachment 8512131 [details] [diff] [review]
141027_bug_1087101_buildid_upload_dir_fix-mozharness.patch

https://hg.mozilla.org/build/mozharness/rev/9a8c1b667aa2
Attachment #8512131 - Flags: checked-in+
(Reporter)

Updated

4 years ago
Attachment #8510618 - Attachment is obsolete: true
Looks like it's still uploading nonunified build logs to the unified directory rather than to -nonunified, though.
Created attachment 8512684 [details] [diff] [review]
fix_imports.diff

Before:
$ pyflakes mozharness/mozilla/building/buildbase.py
mozharness/mozilla/building/buildbase.py:32: 'TBPL_FAILURE' imported but unused
mozharness/mozilla/building/buildbase.py:97: undefined name 'TBPL_SUCCESS'
mozharness/mozilla/building/buildbase.py:123: undefined name 'TBPL_WORST_LEVEL_TUPLE'

After:
$ pyflakes mozharness/mozilla/building/buildbase.py
$ echo $?
0
Attachment #8512684 - Flags: review?

Updated

4 years ago
Attachment #8512684 - Flags: review? → review+

Updated

4 years ago
Blocks: 1090186
(Reporter)

Updated

4 years ago
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
(Reporter)

Comment 12

4 years ago
(In reply to Phil Ringnalda (:philor) from comment #8)
> Looks like it's still uploading nonunified build logs to the unified
> directory rather than to -nonunified, though.

hmm, I thought this part is fixed via: https://bugzilla.mozilla.org/show_bug.cgi?id=1055918#c29

see log is now isolated but appears under linux64 opt: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/ash-linux64/1414891806/

I forgot to add:

diff --git a/scripts/fx_desktop_build.py b/scripts/fx_desktop_build.py
index e40921c..205d9ef 100755
--- a/scripts/fx_desktop_build.py
+++ b/scripts/fx_desktop_build.py
@@ -96,7 +96,9 @@ class FxDesktopBuild(BuildScript, object):
                 if c.get('pgo_build'):
                     platform_for_log_url += '-pgo'
                 # postrun.py uses stage_platform buildbot prop as part of the log url
-                self.set_buildbot_property('stage_platform', platform_for_log_url)
+                self.set_buildbot_property('stage_platform',
+                                           platform_for_log_url,
+                                           write_to_file=True)
             else:
                 self.fatal("'stage_platform' not determined and is required in your config")
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Updated

4 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3243]

Updated

4 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3243] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3248]

Updated

4 years ago
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3248] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/3253]
(Reporter)

Comment 13

4 years ago
fixed with: https://bugzilla.mozilla.org/show_bug.cgi?id=1055918#c35

http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/ash-linux64-nonunified/1415586616/
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.