Closed Bug 1223399 Opened 9 years ago Closed 7 years ago

High ratio of failed jobs on bld-lion-r5 machines

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86_64
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: aselagea, Unassigned)

Attachments

(3 files)

Noticed that many jobs are failing on this particular pool, for different reasons:
    - command timeouts
    - failures connecting to an upload location, e.g.:
"Connecting to upload.tbirdbld.productdelivery.prod.mozaws.net|52.88.134.149|:80... failed: Operation timed out.Retrying."
    - failures running some commands

Is there something we can do to fix some of these failures?
After some investigation, here are the most common types of failing jobs and the errors seen in their logs:

1. many "Thunderbird <comm-central, comm-aurora> macosx64|l10n nightly" and some "Thunderbird <comm-central, comm-aurora> macosx64|l10n dep" jobs are failing because they cannot reach the location below:

"['bash', '-c', 'if [ ! -f mar ]; then wget -O  mar --no-check-certificate http://upload.tbirdbld.productdelivery.prod.mozaws.net/pub/mozilla.org/thunderbird/nightly/latest-comm-aurora/mar-tools/macosx64/mar; fi;        (test -e mar && test -s mar) || exit 1; chmod 755 mar']"
...
"Connecting to upload.tbirdbld.productdelivery.prod.mozaws.net|52.88.134.149|:80... failed: Operation timed out.Giving up."

2. many "Thunderbird <comm-central, comm-aurora> macosx64|l10n dep" and "Firefox <comm-central, comm-aurora> macosx64|l10n dep" builds are failing because the l10n base directory passed to configure does not exist:
"configure: error: Invalid value --with-l10n-base, ../../l10n doesn't exist"

3. lots of fuzzer-macosx64-lion failures due to fuzzer.sh timeouts:
"command timed out: 1800 seconds without output running ['bash', 'scripts/scripts/fuzzing/fuzzer.sh'], attempting to kill"

4. "OS X 10.7 64-bit b2g-inbound debug static analysis build" jobs fail with the following error:
"FATAL - 'mach build' did not run successfully. Please check log for errors."
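
For triaging further logs against these four cases, a small classifier along the following lines might help. This is only a sketch: the patterns are taken from the errors quoted above, while the function name `classify_log` and the case labels are made up for illustration and are not part of any existing buildbot tooling.

```python
import re
from collections import Counter

# Signatures derived from the errors quoted in this comment.
# The labels are hypothetical names chosen for this sketch.
SIGNATURES = [
    ("case 1: upload host unreachable",
     re.compile(r"Connecting to \S+ failed: Operation timed out")),
    ("case 2: l10n base missing",
     re.compile(r"Invalid value --with-l10n-base")),
    ("case 3: command timeout",
     re.compile(r"command timed out: \d+ seconds without output")),
    ("case 4: mach build failed",
     re.compile(r"'mach build' did not run successfully")),
]


def classify_log(text):
    """Count how many lines of a log match each known failure signature."""
    counts = Counter()
    for line in text.splitlines():
        for label, pattern in SIGNATURES:
            if pattern.search(line):
                counts[label] += 1
                break  # one label per line is enough for triage
    return counts


# Example with two of the lines quoted above plus an unrelated line.
sample = "\n".join([
    "configure: error: Invalid value --with-l10n-base, ../../l10n doesn't exist",
    "FATAL - 'mach build' did not run successfully. Please check log for errors.",
    "some unrelated log line",
])
for label, count in sorted(classify_log(sample).items()):
    print(count, label)
```

Running it over the attached logs (or any saved log text) would give a rough count per case, which could make it easier to see which failure dominates on a given slave.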
Depends on: 1221391, 1224234
Pointers to log examples for each failure case would be helpful so people could debug with context.
Attached file case_1.txt
Attached file case_2.txt
Attached file case_3.txt
Since links to logs will expire at some point, attaching a log file for each case seems like the better approach. I did not attach a log file for the last issue, since most jobs of that kind are green at the moment.
No longer depends on: 1221391
Is this bug still an ongoing issue?
I checked several bld-lion-r5 slaves and noticed that the jobs for case 1 and case 3 are still failing with the errors from the attachments above.

The question I'd have is related to case 2: in bug 1221391 it is mentioned that l10n builders have been disabled (the patch is in production), yet these builders are still present on the masters and they are still failing.

Took a closer look at Nick's patch and noticed that it sets 'enable_l10n_onchange': False, while we still have 'enable_l10n_onchange': True, so only the l10n dep builders ending with "NightlyRepackFactory" have been disabled (tested this on my local master).

If that was the intended change for that patch, then case 3 also stands (as these jobs are failing).
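
One way to double-check which branches still have the flag switched on would be to walk the branch configuration and list them. The sketch below assumes the configuration is a plain dict of per-branch dicts; the `BRANCHES` sample data and the function name `branches_with_flag` are hypothetical stand-ins, since the real configuration structures are not shown in this bug.

```python
def branches_with_flag(branches, flag="enable_l10n_onchange"):
    """Return the sorted names of branches where `flag` is set to a true value."""
    return sorted(
        name for name, cfg in branches.items() if cfg.get(flag, False)
    )


# Hypothetical sample data, for illustration only; not the production values.
BRANCHES = {
    "comm-central": {"enable_l10n_onchange": True},
    "comm-aurora": {"enable_l10n_onchange": False},
    "example-branch": {},  # flag absent, treated as disabled
}

print(branches_with_flag(BRANCHES))  # prints ['comm-central']
```

Treating a missing key as disabled is a choice made for this sketch; the real configuration may apply a different default.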
Flags: needinfo?(kmoir)
I don't see any enable_l10n_onchange with value True in production code, but if you think there is a problem with the earlier patch, feel free to submit a new one :-)
Flags: needinfo?(kmoir)
Seems that I missed the fact that we only disabled the 'l10n dep' builders and did not touch the other 'l10n' ones :).
Given that, case 2 does not stand, but the remaining two do.
Depends on: 1322344
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard