Last Comment Bug 1230150 - In case of early aborts (e.g. failed download) log files are not always copied to upload dir
: In case of early aborts (e.g. failed download) log files are not always copie...
Status: RESOLVED FIXED
:
Product: Release Engineering
Classification: Other
Component: Mozharness (show other bugs)
: unspecified
: Unspecified Unspecified
-- normal (vote)
: ---
Assigned To: Henrik Skupin (:whimboo) [away 02/18 - 02/27]
: Jordan Lund (:jlund)
:
Mentors:
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2015-12-03 06:46 PST by Henrik Skupin (:whimboo) [away 02/18 - 02/27]
Modified: 2016-04-04 03:13 PDT (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
wontfix
fixed
fixed


Attachments
log output (6.18 KB, text/plain)
2015-12-03 06:46 PST, Henrik Skupin (:whimboo) [away 02/18 - 02/27]
no flags Details
Patch v1 (firefox-ui-tests only) (2.78 KB, patch)
2015-12-03 09:07 PST, Henrik Skupin (:whimboo) [away 02/18 - 02/27]
no flags Details | Diff | Splinter Review
Always copy logs (6.52 KB, patch)
2016-01-13 05:59 PST, Henrik Skupin (:whimboo) [away 02/18 - 02/27]
no flags Details | Diff | Splinter Review
Always copy logs v1.1 (4.66 KB, patch)
2016-01-13 11:39 PST, Henrik Skupin (:whimboo) [away 02/18 - 02/27]
jlund: review+
aki: feedback+
Details | Diff | Splinter Review
backport patch for mozilla-esr45 (7.29 KB, patch)
2016-03-30 13:30 PDT, Henrik Skupin (:whimboo) [away 02/18 - 02/27]
hskupin: review+
Details | Diff | Splinter Review

Description User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2015-12-03 06:46:55 PST
Created attachment 8695293 [details]
log output

Yesterday I have seen that mozharness does not always copy the log files to the upload dir. This happens e.g. when a download fails. See the attached log file. The problem here is that CI systems which rely on log files to be present in the uploaded dir will not upload the log file, so Treeherder stays blank. :(

In our case for firefox-ui-tests I implemented a post-run listener for the report files and this is correctly executed even in the above case. You can also see that in the log output.

So I would assume the registering of the listener for copying the log files is not done correctly and that's why the method is not called.
Comment 1 User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2015-12-03 08:37:36 PST
The `copy_logs_to_upload_dir()` method is part of the BaseScript which means that every existent mozharness entry level script makes use of it. What I do not see is that we register it as a postrun method! 

A change like the following would do it:

> +    @PostScriptRun
>      def copy_logs_to_upload_dir(self):

But it looks like there are some other scripts which directly call this method:

* https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/scripts/b2g_build.py?from=b2g_build.py#788
* https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/scripts/vcs-sync/initial_beagle.py#441
* https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/scripts/vcs-sync/vcs_sync.py#595
* https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/scripts/vcs-sync/vcs_sync.py#1080

So I'm not sure if we really want to get this added to BaseScript. I could easily overwrite the method for firefox-ui-tests and mark it with the PostScriptRun decorator which would make it also work.
Comment 2 User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2015-12-03 09:07:17 PST
Created attachment 8695342 [details] [diff] [review]
Patch v1 (firefox-ui-tests only)

This would be the fix to fix it in our Firefox ui tests. Please be aware that I also set the dev mode so we do not run the download method of the proxxy.py file.
Comment 3 User image Jordan Lund (:jlund) 2015-12-03 15:40:20 PST
so I'm not sure why aki/gps[1] didn't put this[2] into the finally block. That would force it to run every time regardless of what happens within the try block that executes. It's not clear to me why we explicitly call it within other script's subclasses. maybe as a work around? maybe aki wanted it optional..

[1] https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/base/script.py#1737 
[2] https://hg.mozilla.org/build/mozharness/rev/87a8c74982cf
Comment 4 User image Jordan Lund (:jlund) 2015-12-03 15:40:52 PST
(In reply to Jordan Lund (:jlund) from comment #3)
> so I'm not sure why aki/gps[1] didn't put this[2] into the finally block.

typo: [1] and  [2] are swapped
Comment 5 User image Jordan Lund (:jlund) 2015-12-03 15:47:44 PST
Comment on attachment 8695342 [details] [diff] [review]
Patch v1 (firefox-ui-tests only)

Review of attachment 8695342 [details] [diff] [review]:
-----------------------------------------------------------------

I think you can do this from either post_fatal() or via like you are with a PostScriptRun decorator. I think post_fatal might be better so that we only call copy_logs_to_upload_dir() once in the case where we don't fail to download / call self.fatal()..
Comment 6 User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2015-12-04 06:26:13 PST
(In reply to Jordan Lund (:jlund) from comment #3)
> so I'm not sure why aki/gps[1] didn't put this[2] into the finally block.
> That would force it to run every time regardless of what happens within the
> try block that executes. It's not clear to me why we explicitly call it
> within other script's subclasses. maybe as a work around? maybe aki wanted
> it optional..

Lets get an answer from aki and gps then. :)
Comment 7 User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2015-12-04 06:34:54 PST
(In reply to Jordan Lund (:jlund) from comment #5)
> I think you can do this from either post_fatal() or via like you are with a
> PostScriptRun decorator. I think post_fatal might be better so that we only
> call copy_logs_to_upload_dir() once in the case where we don't fail to
> download / call self.fatal()..

Good call. But I will wait for the reply from gps / aki first to see what we really have to do here.
Comment 8 User image Aki Sasaki [:aki] 2015-12-04 12:19:19 PST
(In reply to Jordan Lund (:jlund) from comment #3)
> so I'm not sure why aki/gps[1] didn't put this[2] into the finally block.
> That would force it to run every time regardless of what happens within the
> try block that executes. It's not clear to me why we explicitly call it
> within other script's subclasses. maybe as a work around? maybe aki wanted
> it optional..
> 
> [1]
> https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/
> base/script.py#1737 
> [2] https://hg.mozilla.org/build/mozharness/rev/87a8c74982cf

It's been a while since I've looked at this, but https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/base/script.py#1737 is inside the outer |finally|, not the inner |finally|.  Since the inner try/except doesn't throw, we should hit the final lines of the outer |finally|.

(In reply to Henrik Skupin (:whimboo) from comment #1)
> The `copy_logs_to_upload_dir()` method is part of the BaseScript which means
> that every existent mozharness entry level script makes use of it. What I do
> not see is that we register it as a postrun method! 
> 
> A change like the following would do it:
> 
> > +    @PostScriptRun
> >      def copy_logs_to_upload_dir(self):

This is probably the desired behavior.

> But it looks like there are some other scripts which directly call this
> method:

Some history that may or may not be completely accurate:

* upload_dir was in my head since maybe 2005-6.  At a previous company, our uploads happened all over the place in the scripts, and we kept on running into issues where the directory wasn't created yet, or we wanted to change a file that was uploaded and we had to scour the entire codebase to find where/why/when it happened.  Having a single upload dir on the client, that gets uploaded to the server at the end, was my solution to make it sane.
* I coded the client-side portion of upload_dir into mozharness but never really used it directly in most? scripts, since Firefox builds have their own |make upload| syntax/process.  So upload_dir was just a non-functioning stub for most scripts, with the intention of making it useful at some point in the future.
** I used upload_dir in a few scripts, like the vcs-sync ones, and so copying the logs on fatal was something I noticed and fixed there.
* When TaskCluster became a thing, we intended to make use of upload_dir as a real thing.  I never saw this become reality at Mozilla, but if things went according to plan, this may now be true.
* Now that it's in use in more scripts, bugs like this are becoming more evident, and we can fix them properly.

I think the proper fix is the above PostScriptRun copy_logs_to_upload_dir, and removing the script-specific ones.  I'm lacking current context, so there may be additional side effects I'm not thinking about, but I think it'll be good overall.
Comment 9 User image Jordan Lund (:jlund) 2015-12-04 13:58:54 PST
> I think the proper fix is the above PostScriptRun copy_logs_to_upload_dir,
> and removing the script-specific ones.  I'm lacking current context, so
> there may be additional side effects I'm not thinking about, but I think
> it'll be good overall.

thanks aki. +1 to this approach.
Comment 10 User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2016-01-13 05:59:54 PST
Created attachment 8707426 [details] [diff] [review]
Always copy logs

Ok, so I updated my patch so we cover all possibly involved scripts. I was able to test this with firefox-ui-tests and it works great. Some questions remain and I hope you can answer them:

1. When I override copy_logs_to_upload_dir in my own class, I absolutely have to add @PostScriptRun again? If I don't do so, the method won't be noticed anymore and is not getting run during the post steps.

2. I'm unclear about this code: https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/scripts/vcs-sync/vcs_sync.py#1079 - Is this still necessary? Where is `preflight_upload()` used? I cannot find any reference in mozharness. If we have to keep it I feel the if condition is wrong and the not has to be removed.

3. How can I test if the vcs changes are fine? For b2g_build.py I would see results via try but not sure about the vcs ones. Hal, could you help here?

Once the above questions have been cleared I will trigger a try build with those changes.
Comment 11 User image Hal Wine [:hwine] (use NI) 2016-01-13 10:11:54 PST
(In reply to Henrik Skupin (:whimboo) from comment #10)
 
> 3. How can I test if the vcs changes are fine? For b2g_build.py I would see
> results via try but not sure about the vcs ones. Hal, could you help here?

There are several different chunks of 'vcs' code in mozharness.

You need not worry about any of the following code:
  mozharness/scripts/vcs-sync/*
  mozharness/scripts/b2g_bumper.py
  mozharness/configs/b2g_bumper/*
  mozharness/configs/vcs_sync/*
Those will be removed with bug 1225364.

As for testing, the 'tox' tests still work, as described in the README.txt file. (I'm not a mozharness developer, just a user.)
Comment 12 User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2016-01-13 11:34:26 PST
Great info. Thank you Hal. I think in this case I will simply revert the changes in those files. I will upload a new version of the patch in a minute.
Comment 13 User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2016-01-13 11:39:16 PST
Created attachment 8707570 [details] [diff] [review]
Always copy logs v1.1
Comment 14 User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2016-01-13 11:44:07 PST
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2c3f0312f1eb
Comment 16 User image Aki Sasaki [:aki] 2016-01-14 14:31:42 PST
Comment on attachment 8707570 [details] [diff] [review]
Always copy logs v1.1

It's been a while, but that passes the eyeball test.
Comment 17 User image Carsten Book [:Tomcat] 2016-01-15 02:55:36 PST
https://hg.mozilla.org/mozilla-central/rev/999cdf95280c
Comment 18 User image Jordan Lund (:jlund) 2016-01-15 13:22:17 PST
in reviewing another patch, I noticed that 1 test is failing now. I suspect by the nature of the test that this bug is related:

$cd testing/mozharness
$source some/venv && pip install tox
$tox

1) FAIL: test_decorators_registered (test_base_script.TestScriptDecorators)

   Traceback (most recent call last):
    test/test_base_script.py line 707 in test_decorators_registered
      self.assertEqual(len(self.s._listeners['post_run']), 2)
   AssertionError: 3 != 2
   -------------------- >> begin captured logging << --------------------
   root: INFO: MultiFileLogger online at 20160115 12:56:23 in /Users/jlund/devel/mozilla/dirtyRepos/mozilla-inbound/testing/mozharness
   root: INFO: Run as /Users/jlund/devel/mozilla/dirtyRepos/mozilla-inbound/testing/mozharness/.tox/py27-hg2.6/bin/nosetests -v --with-xunit --rednose --force-color
   --------------------- >> end captured logging << ---------------------
178 tests run in 50.1 seconds.
1 FAILED, 2 skipped (175 tests passed)
ERROR: InvocationError: '/Users/jlund/devel/mozilla/dirtyRepos/mozilla-inbound/testing/mozharness/.tox/py27-hg2.6/bin/coverage run --source configs,mozharness,scripts --branch /Users/jlund/devel/mozilla/dirtyRepos/mozilla-inbound/testing/mozharness/.tox/py27-hg2.6/bin/nosetests -v --with-xunit --rednose --force-color'
____________________________________________________________________________________ summary _____________________________________________________________________________________
ERROR:   py27-hg2.6: commands failed
Comment 19 User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2016-01-19 02:03:52 PST
While also working on the unpack feature on bug 1237706 I noticed the following:

11:01:24     INFO - Using ZipFile to extract /mozilla/code/gecko/testing/mozharness/a/info_busted.zip to /mozilla/code/gecko/testing/mozharness/b
11:01:24    FATAL - File name in directory "info.txt" and header "info.sxt" differ.: /mozilla/code/gecko/testing/mozharness/a/info_busted.zip
11:01:24    FATAL - Running post_fatal callback...
11:01:24    FATAL - Exiting 2

When we call unpack for a broken archive within the constructor of the script, we still don't get the log files copied to the upload folder.
Comment 20 User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2016-01-21 12:49:28 PST
(In reply to Henrik Skupin (:whimboo) from comment #19)
> When we call unpack for a broken archive within the constructor of the
> script, we still don't get the log files copied to the upload folder.

I think that this is something we might not be able to change. That's why all probably breaking code should be part of an actual build step.

I will have a look at the test fixes, and would like to also have an uplift for aurora, so we can have this important fix for the 45ESR branch. Jordan, do you think that this works? So far we haven't seen any breakage
Comment 21 User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2016-03-30 07:30:53 PDT
(In reply to Jordan Lund (:jlund) from comment #18)
> in reviewing another patch, I noticed that 1 test is failing now. I suspect
> by the nature of the test that this bug is related:

Sorry for the delay here. I wanted to fix it now but noticed that this test failure got fixed along with bug 1244781. So nothing left to do here.
Comment 22 User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2016-03-30 13:30:57 PDT
Created attachment 8736457 [details] [diff] [review]
backport patch for mozilla-esr45

Patch for mozilla-esr45 with the remaining test fix included. Mozharness unit tests are passing, but to ensure it doesn't bust anything I started a try build:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=41d2fc7305f3

As callek mentioned on IRC it could be that it wont pass in case the build configs drifted apart. So lets see.
Comment 23 User image Henrik Skupin (:whimboo) [away 02/18 - 02/27] 2016-03-31 01:43:01 PDT
The try build has lots of test failures and some aborts including crashes. But none of them looks to be related to my above change.

So can some please land attachment 8736457 [details] [diff] [review] as a test-only change on mozilla-esr45? Thanks.
Comment 24 User image Carsten Book [:Tomcat] 2016-04-04 03:13:47 PDT
https://hg.mozilla.org/releases/mozilla-esr45/rev/7327a6b02d2e

Note You need to log in before you can comment on or make changes to this bug.