Closed Bug 962011 Opened 10 years ago Closed 10 years ago

l10n testruns don't run on mac

Categories

(Mozilla QA Graveyard :: Infrastructure, defect, P1)

All
macOS
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: cosmin-malutan, Assigned: cosmin-malutan)

Details

l10n testruns don't run on Mac because the downloaded build gets a literal "$DATE" prepended to its filename, which makes the testrun script treat the path to the build as an empty variable.

I checked the mm-osx-108-1 node, where for example under the mozilla-aurora_l10n/builds we have "$DATE-mozilla-aurora-firefox-28.0a2.hu.mac.dmg"

When I tried to download the build manually with mozdownload, filling in real values for all the parameters except $DATE, it worked fine: it didn't add "$DATE" to the name of the build.
Here is a log from jenkins:
> 97% |############################################  | ETA:  00:00:00  28.80 M/s
> 98% |############################################# | ETA:  00:00:00  28.85 M/s
> 99% |############################################# | ETA:  00:00:00  28.85 M/s
>100% |##############################################| Time: 00:00:02  28.99 M/s
>11:28:52 [mozilla-aurora_l10n] $ mozmill-env-$ENV_PLATFORM/run testrun_l10n --repository=mozmill-tests --junit=report.xml --workspace=data --report=$REPORT_URL builds/
>11:28:54 hdiutil: attach failed - No such file or directory
>11:28:54 *** Installing build: /Users/mozauto/jenkins/workspace/mozilla-aurora_l10n/builds/$DATE-mozilla-aurora-firefox-28.0a2.ru.mac.dmg
>11:28:54 Traceback (most recent call last):
>11:28:54   File "/Users/mozauto/jenkins/workspace/mozilla-aurora_l10n/mozmill-env-mac/bin/testrun_l10n", line 8, in <module>
>11:28:54     load_entry_point('mozmill-automation==2.0.3', 'console_scripts', 'testrun_l10n')()
>11:28:54   File "/Users/mozauto/jenkins/workspace/mozilla-aurora_l10n/mozmill-env-mac/python-lib/mozmill_automation/testrun.py", line 751, in l10n_cli
>11:28:54     exec_testrun(L10nTestRun)
>11:28:54   File "/Users/mozauto/jenkins/workspace/mozilla-aurora_l10n/mozmill-env-mac/python-lib/mozmill_automation/testrun.py", line 730, in exec_testrun
>11:28:54     cls().run()
>11:28:54   File "/Users/mozauto/jenkins/workspace/mozilla-aurora_l10n/mozmill-env-mac/python-lib/mozmill_automation/testrun.py", line 357, in run
>11:28:54     print "*** Uninstalling build: %s" % self._folder
>11:28:54 AttributeError: 'L10nTestRun' object has no attribute '_folder'
>11:28:54 Build step 'Invoke XShell command' marked build as failure
>11:28:54 Archiving artifacts
>11:28:54 Recording test results
>11:28:54 IRC notifier plugin: Sending notification to: #automation
This is the wrong excerpt of the log from Jenkins! The problem is not in the testrun script but in mozdownload. How does it work on staging? Have you checked that Cosmin?

I would love if we could output the real values and not the PARAMETERS. But that looks like a bug in the xshell plugin. Dave, what do you think?
Flags: needinfo?(dave.hunt)
Flags: needinfo?(cosmin.malutan)
You should be able to see this in the XShell log: https://github.com/jenkinsci/xshell-plugin/blob/master/src/main/java/hudson/plugins/xshell/XShellBuilder.java#L119 you can also see the values of the parameters in the Parameters page for the specified job.

We should also consider adding some logging to mozdownload to indicate the target filename.
Flags: needinfo?(dave.hunt)
We have the same issue on staging.
Flags: needinfo?(cosmin.malutan)
(In reply to Dave Hunt (:davehunt) from comment #3)
> You should be able to see this in the XShell log:
> https://github.com/jenkinsci/xshell-plugin/blob/master/src/main/java/hudson/
> plugins/xshell/XShellBuilder.java#L119 you can also see the values of the
> parameters in the Parameters page for the specified job.

I know that you can see the parameters there, but that doesn't tell us what xshell is doing or how it calls the process, so it is not helpful. Activating the log for xshell didn't give me any clue either, because even there the parameters have not been replaced by their values.

http://mm-ci-staging.qa.scl3.mozilla.com:8080/log/xshell/

(In reply to Cosmin Malutan from comment #4)
> We have the same issue on staging.

So please start the Jenkins JNLP client from within a sourced mozmill-env on that box. That way you can add print statements to the mozdownload .py files and print the command line, so we know how mozdownload gets called.
As soon as the OptionParser instance in mozdownload/scraper.py returns the arguments the date option will be "$DATE". At the time we instantiate the TinderboxScraper class the date argument will be "$DATE" so it won't default to None.
https://github.com/mozilla/mozdownload/blob/master/mozdownload/scraper.py#L964
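
The behaviour described above can be reproduced in isolation. The following is only a minimal sketch: the option definition is an assumption and not copied from mozdownload/scraper.py. It shows that an unexpanded placeholder reaches the code as an ordinary string, so a default of None is never used.

```python
from optparse import OptionParser

parser = OptionParser()
# Assumed option definition; the real one lives in mozdownload/scraper.py.
parser.add_option("--date", dest="date", default=None)

# Placeholder passed through unexpanded, as in the Jenkins log.
options, _ = parser.parse_args(["--date=$DATE"])
literal_date = options.date

# Option omitted entirely, so the default applies.
options, _ = parser.parse_args([])
missing_date = options.date
```

Here `literal_date` holds the string "$DATE" while `missing_date` is None, which is why dropping the option from the job changes the outcome.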

As a workaround we can remove the --date=$DATE argument from:
https://github.com/mozilla/mozmill-ci/blob/master/jenkins-master/jobs/mozilla-aurora_l10n/config.xml#L99
That will make the l10n testruns run with the latest build, exactly as they usually should and as they did on Windows and Ubuntu. I'm not sure why we pass a date argument that we expect to be None, when leaving it out defaults to None anyway.
(In reply to Cosmin Malutan from comment #6)
> As soon as the OptionParser instance in mozdownload/scraper.py returns the
> arguments the date option will be "$DATE". At the time we instantiate the
> TinderboxScraper class the date argument will be "$DATE" so it won't default
> to None.

So Jenkins does not replace it with the real value then. That means we call mozdownload like `--date=$DATE`. Cosmin, please give us the exact command which gets executed. You can output it directly in mozmill-env-mac/run.sh

> As a workaround we can remove the --date=$DATE argument from:
> https://github.com/mozilla/mozmill-ci/blob/master/jenkins-master/jobs/
> mozilla-aurora_l10n/config.xml#L99
> This will determine l10n testruns to run with the latest build, exactly as
> it should usually and as it ran on windows and ubuntu. I'm not sure why we
> give a date argument that we expect to be None so we will default to None.

Right, that was a solution I also proposed to Dave, but he was not happy with it. As of now we will most likely never run tests for older tinderbox builds, which don't exist anyway! So we should be totally safe in removing this option.
But let's give Dave a chance to reply.
Flags: needinfo?(dave.hunt)
The primary reason support for tinderbox builds was added was so we could find regression ranges for endurance issues. These issues are time consuming to replicate and often environment specific, so running on Jenkins was desirable. It's admittedly been rarely used, but my preference would be to find out why this is only failing on one environment and come up with a fix/workaround. I don't feel strongly enough to block removing this feature.
Flags: needinfo?(dave.hunt)
Assignee: nobody → cosmin.malutan
>06:32:22 mozdownload --type=tinderbox --branch=mozilla-aurora --platform=mac --locale=ru --build-id=20140120004001 --date=$DATE --retry-attempts=10 --retry-delay=30 --directory=builds
>06:32:22 [mozilla-aurora_l10n] $ mozmill-env-$ENV_PLATFORM/run mozdownload --type=$BUILD_TYPE --branch=mozilla-aurora --platform=$PLATFORM --locale=$LOCALE --build-id=$BUILD_ID --date=$DATE --retry-attempts=10 --retry-delay=30 --directory=builds
>06:32:23   INFO | Downloading from: https://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-aurora-l10n/firefox-28.0a2.ru.mac.dmg
Here is the log from jenkins where I echo the command:
mozdownload --type=tinderbox --branch=mozilla-aurora --platform=mac --locale=ru --build-id=20140120004001 --date=$DATE --retry-attempts=10 --retry-delay=30 --directory=builds
When I executed this command in the environment it downloaded the latest build without a "$DATE" string in the name, as I said in the first comment.
So I guess it is the xshell plugin that behaves differently in Jenkins.
Is this only affecting Mac? Is it all Mac versions? In comment 0 you mentioned that we have a file named $DATE-mozilla-aurora-firefox-28.0a2.hu.mac.dmg which implies mozdownload gets a literal "$DATE" value so something must be escaping the variable expansion.
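
One way a literal "$DATE" can reach a program is when the process is started without a shell, because variable expansion is done by the shell and not by the program. The sketch below illustrates that general mechanism only; whether the xshell plugin launches the command this way on Mac is an assumption that this bug does not confirm.

```python
import subprocess
import sys

# Child process that simply echoes its first argument back.
child = "import sys; print(sys.argv[1])"

# The argument list is handed to the child directly. No shell is involved,
# so nothing expands $DATE before the child sees it.
received = subprocess.check_output(
    [sys.executable, "-c", child, "--date=$DATE"]
).decode().strip()
```

After this runs, `received` is the unmodified string "--date=$DATE".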

It looks from the stack trace that mozmill-automation has an issue with installing a DMG with $DATE in the name. Perhaps ultimately this is a mozmill-automation issue (though I agree it would be better not to have files with dollar signs in them, and worth finding out why this happens).
(In reply to Dave Hunt (:davehunt) from comment #10)
> It looks from the stack trace that mozmill-automation has an issue with
> installing a DMG with $DATE in the name. Perhaps ultimately this is a
> mozmill-automation issue (though I agree it would be better not to have
> files with dollar signs in them, and worth finding out why this happens).

No. hdiutil fails when you are trying to install such a DMG. There is nothing we can do. It has to be fixed higher up the stack.
Well if it's a valid filename then it could potentially happen in other use cases. Shouldn't we sanitise the filename for hdiutil?
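
If the path ever reaches hdiutil through a shell command string, quoting it would keep the shell from expanding "$DATE" to nothing. This is a sketch under that assumption: how mozmill-automation actually invokes hdiutil is not shown in this bug, and the command string here is hypothetical. `shlex.quote` is the Python 3 name; Python 2.7 offers the same function as `pipes.quote`.

```python
import shlex

path = "builds/$DATE-mozilla-aurora-firefox-28.0a2.ru.mac.dmg"

# Hypothetical command string, built only to show the quoting.
# Single quotes stop the shell from treating $DATE as a variable.
command = "hdiutil attach " + shlex.quote(path)
```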

Flagging Cosmin for needinfo so my question about affected Mac versions in comment 10 is not lost.
Flags: needinfo?(cosmin.malutan)
Dave, the l10n testruns ran only on OS X 10.9, so I had to trigger testruns on all nodes, and all are affected.
Flags: needinfo?(cosmin.malutan)
Thanks, good to know this is a general Mac issue. Should be easy enough to replicate and debug then. You have a local CI, right?
I have found the root problem here: the build_filename method in mozdownload gets overridden by each subclass (TinderboxScraper, DailyScraper, ReleaseScraper), and the method in the TinderboxScraper class doesn't handle this case. If we enhance the method in TinderboxScraper:
https://github.com/mozilla/mozdownload/blob/master/mozdownload/scraper.py#L727
so that it checks the type of self.timestamp, it will work just fine, and it will still work if we pass a date.
If this solution sounds good to you guys I will file an issue on GitHub for this.

>        try:
>          timestamp = self.date.strftime('%Y-%m-%d')
>        except:
>          timestamp = time.strftime('%Y-%m-%d', time.gmtime())
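
As a standalone illustration of the fallback idea in the snippet above, here is a minimal sketch. The helper name `filename_timestamp` is hypothetical and is not part of mozdownload; it uses an explicit type check in place of the bare except, which is a swapped-in variation and not the proposed patch itself.

```python
import time
from datetime import datetime


def filename_timestamp(date):
    """Hypothetical helper: return a YYYY-MM-DD string for the filename.

    Falls back to the current UTC date when `date` is not a datetime,
    for example when it is the unexpanded string "$DATE".
    """
    if isinstance(date, datetime):
        return date.strftime('%Y-%m-%d')
    return time.strftime('%Y-%m-%d', time.gmtime())
```

With this, `filename_timestamp(datetime(2014, 1, 20))` gives '2014-01-20', while `filename_timestamp("$DATE")` gives today's UTC date and never the literal placeholder.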
Flags: needinfo?(hskupin)
Flags: needinfo?(dave.hunt)
Nice find Cosmin,

It's hard for me to follow where your proposed solution should go...
It might be better to open a PR on GitHub so everyone can follow the code.
Needinfo but I can't see a question..? I agree with Andrei that a pull request or diff will make your comment much clearer. I'm not sure I currently understand the issue or proposed fix, but it's great to hear this is something we can fix in mozdownload.
Flags: needinfo?(dave.hunt)
Do we know why this only happens for OS X and not for Linux and Windows?
Flags: needinfo?(hskupin)
(In reply to Dave Hunt (:davehunt) from comment #17)
> Needinfo but I can't see a question..?
I created issue https://github.com/mozilla/mozdownload/issues/196

(In reply to Henrik Skupin (:whimboo) from comment #18)
> Do we know why this only happens for OS X and not for Linux and Windows?
Not sure, but probably on Linux or Windows an empty environment variable is sent as an empty string, while on Mac the name of the variable is sent instead, in this particular case at least.
(In reply to Cosmin Malutan from comment #19)
> (In reply to Henrik Skupin (:whimboo) from comment #18)
> > Do we know why this only happens for OS X and not for Linux and Windows?
> Not sure but probably if on Linux or Window an environment variable is empty
> it will send an "empty" string, on mac it sends the name of the variable, in
> this particular case at least.

As said earlier this also sounds like a bug in Jenkins. You might want to have a look at the JIRA database for existing reports.
Here is an issue on xShell plugin:
https://issues.jenkins-ci.org/browse/JENKINS-20478

We can either wait for this to be fixed or go with the mozdownload fix.

Thanks
(In reply to Cosmin Malutan from comment #19)
> > Do we know why this only happens for OS X and not for Linux and Windows?
> Not sure but probably if on Linux or Window an environment variable is empty
> it will send an "empty" string, on mac it sends the name of the variable, in
> this particular case at least.

That is something I have already requested in comment 7 (January 22nd). So finally please report back your results.

This problem has to be fixed ASAP because it causes huge issues for localizers. Almost none of them have OS X available for testing. So let's get this fixed ASAP. I'm raising the priority to P1.
Priority: -- → P1
(In reply to Dave Hunt (:davehunt) [Unavailable until at least 10th February] from comment #8)
> The primary reason support for tinderbox builds was added was so we could
> find regression ranges for endurance issues. These issues are time consuming

But this does not apply to l10n tinderbox builds! Those are always overwritten once a new build is up. We do not store a history of builds as for en-US. So for the l10n testrun it doesn't make sense to fold in the DATE parameter at all. I totally agree for other jobs like endurance tests, but those are not l10n related and we will only execute for en-US builds.

So I do not see why we shouldn't remove the $DATE parameter or at least the --date option for the mozdownload call. For now I temporarily patched the production instance for the latter, so we can at least run tests on OS X.
Status: NEW → ASSIGNED
(In reply to Henrik Skupin (:whimboo) from comment #22)
> That is something I have already requested in comment 7 (January 22nd). So
> finally please report back your results.
> 
> This problem has to be fixed ASAP because it causes huge issues for
> localizers. Nearly none of them have OS X available for testing. So lets get
> this fixed ASAP. I'm raising the priority to P1.
As I said, I couldn't find out why it fails only on Mac, though I found a similar issue.

I worked on this most of the day and I still haven't found a reliable way to get the timestamp from a datetime object; such a method is not implemented in Python 2.7.
I tried:
> timestamp = datetime.fromtimestamp(float(self.date), self.timezone)
> totalseconds = (timestamp - datetime(1970, 1, 1)).total_seconds()
This probably fails because I can't construct datetime(1970, 1, 1) with a timezone, and without one it fails on Travis CI.
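
One standard-library route that avoids the subtraction altogether is `calendar.timegm` combined with `utctimetuple()`, which handles both naive and timezone-aware datetime objects. The sketch below is an illustration, not the change that went into mozdownload: `FixedOffset` is a hypothetical stand-in for self.timezone, and naive values are assumed to already be in UTC.

```python
import calendar
from datetime import datetime, timedelta, tzinfo


class FixedOffset(tzinfo):
    """Minimal fixed-offset tzinfo (hypothetical stand-in for self.timezone)."""

    def __init__(self, hours):
        self._offset = timedelta(hours=hours)

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "fixed"


def to_epoch_seconds(dt):
    """Seconds since the Unix epoch; naive datetimes are assumed to be UTC."""
    return calendar.timegm(dt.utctimetuple())


aware = datetime(2014, 1, 20, 0, 40, 1, tzinfo=FixedOffset(-8))
naive_utc = datetime(2014, 1, 20, 8, 40, 1)

# Subtracting a naive epoch from an aware datetime raises TypeError,
# which matches the failure described above.
try:
    aware - datetime(1970, 1, 1)
    mixed_subtraction_ok = True
except TypeError:
    mixed_subtraction_ok = False
```

Both `aware` and `naive_utc` describe the same instant, so `to_epoch_seconds` returns the same value for each.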

I suggest we try datetime.fromtimestamp, and if no error is thrown we assign the self.date to self.timestamp as before.
(In reply to Henrik Skupin (:whimboo) from comment #23)
> But this does not apply to l10n tinderbox builds! Those are always
> overwritten once a new build is up. We do not store a history of builds as
> for en-US. So for the l10n testrun it doesn't make sense to fold in the DATE
> parameter at all. I totally agree for other jobs like endurance tests, but
> those are not l10n related and we will only execute for en-US builds.
> 
> So I do not see why we shouldn't remove the $DATE parameter or at least the
> --date option for the mozdownload call. For now I temporarily patched the
> production instance for the latter, so we can at least run tests on OS X.

Sounds fine to me to remove the DATE parameter/argument from the l10n jobs until the underlying issue is resolved. Another option could be to provide a default value for this parameter with special meaning in mozdownload, such as 'latest'.
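
The 'latest' idea could look roughly like the sketch below. Everything in it is an assumption made for illustration: the helper name `parse_date_option`, the YYYY-MM-DD format, and the choice to raise on anything else are not taken from mozdownload.

```python
from datetime import datetime


def parse_date_option(value):
    """Hypothetical helper: map a --date value to a datetime.

    Returns None, meaning "use the latest build", when the option is
    missing or set to the sentinel 'latest'. Any other value must be a
    YYYY-MM-DD date; an unexpanded "$DATE" therefore raises ValueError
    and cannot end up in a filename unnoticed.
    """
    if value is None or value.strip().lower() == "latest":
        return None
    return datetime.strptime(value, "%Y-%m-%d")
```

A job could then always pass a value for the parameter, with 'latest' as its default, and a broken expansion would fail loudly at parse time.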
This bug no longer has any remaining action items and can be closed. Mozmill-CI has been updated, and mozdownload is about to get an update too via a separate GitHub issue.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Product: Mozilla QA → Mozilla QA Graveyard