Closed Bug 1118778 Opened 9 years ago Closed 9 years ago

Better error reporting when 'make upload' fails

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set
normal

Tracking

(firefox43 fixed)

RESOLVED FIXED
Tracking Status
firefox43 --- fixed

People

(Reporter: mshal, Assigned: mshal)

Details

Attachments

(1 file, 1 obsolete file)

In bug 1116605, 'make upload' was failing, though the actual error lines are reported as INFO:

13:55:23 INFO - ssh: connect to host stage.mozilla.org port 22: Connection timed out
13:55:36 INFO - Command '['scp', '-o', 'IdentityFile=/home/mock_mozilla/.ssh/ffxbld_rsa', '/builds/slave/fx-team-lx-d-00000000000000000/build/src/obj-firefox/dist/firefox-37.0a1.en-US.linux-i686.mozinfo.json', 'ffxbld@stage.mozilla.org:/tmp/tmp.UKUGoK0ylz/']' returned non-zero exit status 1
13:55:36 INFO - gmake[2]: *** [upload] Error 2
13:55:36 INFO - gmake[2]: Leaving directory `/builds/slave/fx-team-lx-d-00000000000000000/build/src/obj-firefox/browser/installer'
13:55:36 INFO - gmake[1]: *** [upload] Error 2

Mozharness also reports an error that the build properties can't be found:

13:55:37 ERROR - could not determine packageUrl property to use against sendchange. Was it set after 'mach build'?

I think we should do something like:

1) Report those INFO lines as ERROR, since they are the actual failures (I think we need additional log regexes?)

2) If 'mach build' returns non-zero, we shouldn't look for or report errors that the build properties can't be found. The build properties are only generated if 'mach build' succeeds.
> 1) Report those INFO lines as ERROR, since they are the actual failures (I
> think we need additional log regexes?)

makes sense, though setting the log lines to error: self.error(msg) doesn't change the overall status/color of the build. it would only serve to highlight the error for treeherder. and looking at https://treeherder.mozilla.org/logviewer.html#?repo=b2g-inbound&job_id=1123122 it seems like treeherder already highlights those lines

> 
> 2) If 'mach build' returns non-zero, we shouldn't look for or report errors
> that the build properties can't be found. The build properties are only
> generated if 'mach build' succeeds.

hm, I thought our latest logic was that if 'mach build' returns a non-zero, we stop running the build with a fatal and turn it red? Maybe the logic is broken?

see http://mxr.mozilla.org/build/source/mozharness/mozharness/mozilla/building/buildbase.py?rev=f014178380c6#1395
(In reply to Jordan Lund (:jlund) from comment #1)
> > 1) Report those INFO lines as ERROR, since they are the actual failures (I
> > think we need additional log regexes?)
> 
> makes sense, though setting the log lines to error: self.error(msg) doesn't
> change the overall status/color of the build. it would only serve to
> highlight the error for treeherder. and looking at
> https://treeherder.mozilla.org/logviewer.html#?repo=b2g-
> inbound&job_id=1123122 it seems like treeherder already highlights those
> lines

Ahh, good that treeherder highlighted them. I was thinking that bug 1116605 really should be called "Intermittent make upload error" or "Intermittent ssh connection timed out" rather than the error message about packageUrl not being set. The packageUrl thing is a side-effect of the real error.

> > 2) If 'mach build' returns non-zero, we shouldn't look for or report errors
> > that the build properties can't be found. The build properties are only
> > generated if 'mach build' succeeds.
> 
> hm, I thought our latest logic was that if 'mach build' returns a non-zero,
> we stop running the build with a fatal and turn it red? Maybe the logic is
> broken?
> 
> see
> http://mxr.mozilla.org/build/source/mozharness/mozharness/mozilla/building/
> buildbase.py?rev=f014178380c6#1395

Hmm, I thought so too. I wonder why we see this:

13:55:37 ERROR - could not determine packageUrl property to use against sendchange. Was it set after 'mach build'?

And not this:

1400             self.fatal("'mach build' did not run successfully. Please check "
1401                        "log for errors.")
ah, looking at an example from 1116605 it seems like 'mach build' has been returning 0 even if make upload fails: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/b2g-inbound-linux64/1420760879/b2g-inbound-linux64-bm91-build1-build15.txt.gz

snippet:
17:01:18     INFO -  We know it took a while, but your build finally finished successfully!
17:01:18     INFO -  To take your build for a test drive, run: |mach run|
17:01:18     INFO -  For more information on what to do now, see https://developer.mozilla.org/docs/Developer_Guide/So_You_Just_Built_Firefox
17:01:18     INFO - Return code: 0

Maybe this is a good thing. We probably don't want to burn the build if 'make compile' and all other targets succeed but we hit an infra error trying to upload.

I think the solution is to either: 1) have a specific return code for 'mach build' if 'upload' fails. Otherwise, we 2) simply update the log message in the sendchange step to suggest that missing the packageUrl may mean that 'make upload' failed during 'mach build'.
(In reply to Jordan Lund (:jlund) from comment #3)
> ah, looking at an example from 1116605 it seems like 'mach build' has been
> returning 0 even if make upload fails:
> http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/b2g-inbound-
> linux64/1420760879/b2g-inbound-linux64-bm91-build1-build15.txt.gz
> 
> snippet:
> 17:01:18     INFO -  We know it took a while, but your build finally
> finished successfully!
> 17:01:18     INFO -  To take your build for a test drive, run: |mach run|
> 17:01:18     INFO -  For more information on what to do now, see
> https://developer.mozilla.org/docs/Developer_Guide/So_You_Just_Built_Firefox
> 17:01:18     INFO - Return code: 0

Ugh, this is because the upload command is actually something like 'make upload | tee upload-output.txt', and we're just getting the return value of tee instead of make. Do we require bash already? If so we could use PIPESTATUS. Otherwise maybe it'd be better to just move the getUrlProperties function from build/gen_mach_buildprops.py to build/upload.py, and then gen_mach_buildprops doesn't need upload-output.txt...
This moves the property generation from gen_mach_buildprops.py to upload.py, so that we don't need to save the upload output and parse it later. Then we can remove the '| tee' and actually get any error codes from 'make upload' to bubble up.
Assignee: nobody → mshal
Attachment #8646089 - Flags: review?(mh+mozilla)
Comment on attachment 8646089 [details] [diff] [review]
0001-Bug-1118778-Write-upload-properties-from-upload.py.patch

Review of attachment 8646089 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/moz-automation.mk
@@ +97,5 @@
>  automation/l10n-check: automation/pretty-l10n-check
>  automation/update-packaging: automation/pretty-update-packaging
>  
>  automation/build: $(addprefix automation/,$(MOZ_AUTOMATION_TIERS))
> +	$(PYTHON) $(topsrcdir)/build/gen_mach_buildprops.py --complete-mar-file $(DIST)/$(COMPLETE_MAR) $(addprefix --partial-mar-file ,$(wildcard $(DIST)/$(PARTIAL_MAR))) --upload-properties $(AUTOMATION_UPLOAD_PROPERTIES) --upload-files $(abspath $(UPLOAD_FILES)) --package $(PACKAGE)

The only thing using the --package argument now in gen_mach_buildprops.py is the line that does:
   json_data['packageFilename'] = args.package

That should arguably move to upload.py as well, in which case you don't need the --package argument in this script at all.

::: build/upload.py
@@ +64,5 @@
>      Necessary because we call out to ssh/scp, which are MSYS binaries
>      and expect MSYS paths."""
> +    # If we're not on Windows, or if we already have an MSYS path (starting
> +    # with '/' instead of 'c:' or something), then just return.
> +    if sys.platform != 'win32' or path.startswith('/'):

How is this hunk related?

@@ +138,5 @@
> +        ('testPackagesUrl', lambda m: m.endswith('test_packages.json')),
> +        ('packageUrl', lambda m: m.endswith(package)),
> +    ]
> +    url_re = re.compile(r'''^(https?://.*?\.(?:tar\.bz2|dmg|zip|apk|rpm|mar|tar\.gz|json))$''')
> +    properties = {}

I know you're just copying this over, but note that the whole thing would probably be less dreadful if it was written as a large (verbose) regexp with named captures.

@@ +226,5 @@
>      parser.add_option("-b", "--base-path",
>                        action="store", dest="base_path",
>                        help="Preserve file paths relative to this path when uploading. If unset, all files will be uploaded directly to UPLOAD_PATH.")
> +    parser.add_option("--properties-file",
> +                      action="store", dest="properties_file",

isn't the dest automatic?
Attachment #8646089 - Flags: review?(mh+mozilla) → review+
(In reply to Mike Hommey [:glandium] from comment #6)
> ::: build/moz-automation.mk
> @@ +97,5 @@
> >  automation/l10n-check: automation/pretty-l10n-check
> >  automation/update-packaging: automation/pretty-update-packaging
> >  
> >  automation/build: $(addprefix automation/,$(MOZ_AUTOMATION_TIERS))
> > +	$(PYTHON) $(topsrcdir)/build/gen_mach_buildprops.py --complete-mar-file $(DIST)/$(COMPLETE_MAR) $(addprefix --partial-mar-file ,$(wildcard $(DIST)/$(PARTIAL_MAR))) --upload-properties $(AUTOMATION_UPLOAD_PROPERTIES) --upload-files $(abspath $(UPLOAD_FILES)) --package $(PACKAGE)
> 
> The only thing using the --package argument now in gen_mach_buildprops.py is
> the line that does:
>    json_data['packageFilename'] = args.package
> 
> That should arguably move to upload.py as well, in which case you don't need
> the --package argument in this script at all.

Good idea - I moved this over, and did the same with uploadFiles. It turns out the checksum files aren't in UPLOAD_FILES, but are passed in to upload.py separately as CHECKSUM_FILES, so having upload.py generate the uploadFiles property also gets us the checksum files in the list. Requesting re-review for this change.

> 
> ::: build/upload.py
> @@ +64,5 @@
> >      Necessary because we call out to ssh/scp, which are MSYS binaries
> >      and expect MSYS paths."""
> > +    # If we're not on Windows, or if we already have an MSYS path (starting
> > +    # with '/' instead of 'c:' or something), then just return.
> > +    if sys.platform != 'win32' or path.startswith('/'):
> 
> How is this hunk related?

I don't fully understand the msys/native fun, but running 'make upload' will result in UPLOAD_SSH_KEY being set to '/c/Users/cltbld/.ssh/ffxbld_rsa' in upload.py, while running 'make upload | tee file.txt' will result in UPLOAD_SSH_KEY being set to 'c:/Users/cltbld/.ssh/ffxbld_rsa'. The WindowsPathToMsysPath function works with the latter, but sets the ssh key to something like '/c/c/Users/...' in the former, so it breaks. I suspect the WindowsPathToMsysPath function may no longer be necessary if we're not piping output, but I made it work with both kinds of paths just in case.

> @@ +138,5 @@
> > +        ('testPackagesUrl', lambda m: m.endswith('test_packages.json')),
> > +        ('packageUrl', lambda m: m.endswith(package)),
> > +    ]
> > +    url_re = re.compile(r'''^(https?://.*?\.(?:tar\.bz2|dmg|zip|apk|rpm|mar|tar\.gz|json))$''')
> > +    properties = {}
> 
> I know you're just copying this over, but note that the whole thing would
> probably be less dreadful if it was written as a large (verbose) regexp with
> named captures.

I don't want to address that as part of this bug. Feel free to file a followup.

> 
> @@ +226,5 @@
> >      parser.add_option("-b", "--base-path",
> >                        action="store", dest="base_path",
> >                        help="Preserve file paths relative to this path when uploading. If unset, all files will be uploaded directly to UPLOAD_PATH.")
> > +    parser.add_option("--properties-file",
> > +                      action="store", dest="properties_file",
> 
> isn't the dest automatic?

Yes, I've removed those.
Attachment #8646089 - Attachment is obsolete: true
Attachment #8646703 - Flags: review?(mh+mozilla)
Comment on attachment 8646703 [details] [diff] [review]
0001-Bug-1118778-Write-upload-properties-from-upload.py.patch

Review of attachment 8646703 [details] [diff] [review]:
-----------------------------------------------------------------

::: build/upload.py
@@ +64,5 @@
>      Necessary because we call out to ssh/scp, which are MSYS binaries
>      and expect MSYS paths."""
> +    # If we're not on Windows, or if we already have an MSYS path (starting
> +    # with '/' instead of 'c:' or something), then just return.
> +    if sys.platform != 'win32' or path.startswith('/'):

I'm not going to block you on this particular thing, but this all seems fishy to me and I think it should be addressed. As a matter of fact, I suspect this function is not necessary, and that the cause of all problems is process/factory.py setting UPLOAD_SSH_KEY with a ~ in it and expecting msys to expand it at some point.
Attachment #8646703 - Flags: review?(mh+mozilla) → review+
I filed bug 1193897 for the msys path issue.
Could this be the reason why l10n nightlies are busted? See bug 1194950.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: