Noticed that we have many jobs failing on this particular pool for different reasons:
- command timeouts
- failures connecting to an upload location, e.g.: "Connecting to upload.tbirdbld.productdelivery.prod.mozaws.net|22.214.171.124|:80... failed: Operation timed out.Retrying."
- failures in running some commands

Is there something we can do to fix some of these failures?
Following some investigation, here are the most common types of failing jobs and the errors seen in their logs:

1. Many "Thunderbird <comm-central, comm-aurora> macosx64|l10n nightly" and some "Thunderbird <comm-central, comm-aurora> macosx64|l10n dep" jobs are failing because they cannot reach the location below:
"['bash', '-c', 'if [ ! -f mar ]; then wget -O mar --no-check-certificate http://upload.tbirdbld.productdelivery.prod.mozaws.net/pub/mozilla.org/thunderbird/nightly/latest-comm-aurora/mar-tools/macosx64/mar; fi; (test -e mar && test -s mar) || exit 1; chmod 755 mar']"
...
"Connecting to upload.tbirdbld.productdelivery.prod.mozaws.net|126.96.36.199|:80... failed: Operation timed out.Giving up."

2. Many "Thunderbird <comm-central, comm-aurora> macosx64|l10n dep" and "Firefox <comm-central, comm-aurora> macosx64|l10n dep" builds are failing because the l10n base directory is missing:
"configure: error: Invalid value --with-l10n-base, ../../l10n doesn't exist"

3. Lots of fuzzer-macosx64-lion failures due to fuzzer.sh timeouts:
"command timed out: 1800 seconds without output running ['bash', 'scripts/scripts/fuzzing/fuzzer.sh'], attempting to kill"

4. "OS X 10.7 64-bit b2g-inbound debug static analysis build" jobs fail with the following error:
"FATAL - 'mach build' did not run successfully. Please check log for errors."
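For case 1, one mitigation worth considering is retrying the flaky download a few times before giving up, since the timeouts appear intermittent. Below is a minimal sketch of such a retry loop; the function name, attempt count, and the `flaky` demo command are illustrative assumptions, not the production mozharness/buildbot code.

```shell
# Hypothetical retry wrapper (an assumption, not the real build scripts):
# re-run a command up to N times, which could wrap the mar-tools wget above.
fetch_with_retry() {
  local attempts=$1; shift
  local i
  for ((i = 1; i <= attempts; i++)); do
    if "$@"; then
      return 0                      # command succeeded
    fi
    echo "attempt $i failed; retrying" >&2
    sleep 1                         # simple fixed backoff for the sketch
  done
  return 1                          # all attempts exhausted
}

# Demo stand-in for the intermittent download: fails twice, then succeeds.
tries=0
flaky() { tries=$((tries + 1)); [ "$tries" -ge 3 ]; }

fetch_with_retry 5 flaky && echo "download ok after $tries attempts"
# prints "download ok after 3 attempts"
```

In practice wget's own `--tries`/`--waitretry` options could achieve the same effect with less code; the wrapper form is shown because it also covers non-wget commands.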
Pointers to log examples for each failure case would be helpful so people could debug with context.
Since links to logs will expire at some point, attaching the log file for each case seems a better approach (I did not attach a log file for the last issue since most jobs of that kind are green at the moment).
Is this bug still an ongoing issue?
I checked several bld-lion-r5 slaves and noticed that the jobs for cases 1 and 3 are still failing with the errors from the attachments above. My question concerns case 2: bug 1221391 says the l10n builders have been disabled (the patch is in production), yet these builders are still present on the masters and are still failing. Taking a closer look at Nick's patch, I noticed that he set 'enable_l10n_onchange': False while we still have 'enable_l10n_onchange': True, so only the l10n dep builders ending in "NightlyRepackFactory" have been disabled (I tested this on my local master). If that was the intended change for that patch, then case 3 also stands (as these jobs are failing).
I don't see any enable_l10n_onchange with value True in the production code, but if you think there is a problem with the earlier patch, feel free to submit a new one :-)
It seems I had missed the fact that we only disabled the 'l10n dep' builders and did not touch the other 'l10n' ones :). Given that, case 2 does not stand, but the remaining two still do.
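For case 3, a possible local mitigation is to bound fuzzer.sh with coreutils `timeout`, so a hang fails fast with a distinct exit status instead of sitting silent until buildbot's 1800-second no-output kill. This is only a sketch; the 2-second limit and the `sleep 5` stand-in for fuzzer.sh are illustrative values, not the production harness.

```shell
# Illustrative only: bound a possibly hanging command with coreutils
# `timeout`. GNU timeout exits with status 124 when the limit is hit.
timeout 2 sleep 5          # stand-in for: timeout 1800 bash fuzzer.sh
status=$?

if [ "$status" -eq 124 ]; then
  echo "command killed after timeout"
else
  echo "command finished with status $status"
fi
# prints "command killed after timeout"
```

An advantage of this over the buildbot watchdog is that the exit status 124 is easy to branch on in the wrapping script, e.g. to retry or to log a clearer failure reason.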
Status: NEW → RESOLVED
Last Resolved: 11 months ago
Resolution: --- → WORKSFORME