Closed Bug 957455 Opened 12 years ago Closed 12 years ago

Valgrind builds don't specify --mirror or --bundle (intermittent HTTP Error 413: Request Entity Too Large errors)

Categories

(Release Engineering :: General, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: nthomas)

Attachments

(1 file, 1 obsolete file)

A typical linux desktop build does:

    [blah blah] hgtool.py \
      --mirror http://hg-internal.dmz.scl3.mozilla.com/integration/mozilla-inbound \
      --bundle http://ftp.mozilla.org/pub/mozilla.org/firefox/bundles/mozilla-inbound.hg \
      http://hg.mozilla.org/integration/mozilla-inbound build

The mirror option points to a set of hg servers reserved for build machines, and bundle uses a snapshot to save on network traffic (the snapshot is updated weekly). If it needs to clone, hgtool will try the bundle first, then the mirror, before hitting hg.m.o.

The valgrind script has something much simpler that results in:

    hgtool.py --rev aac565918cf5bde25d0934bbb01bf043e01af31a \
      http://hg.mozilla.org/try src

This results in trying to clone the entire try repo, and sometimes we get errors:

    command: START
    command: hg clone -U http://hg.mozilla.org/try /builds/hg-shared/try
    command: cwd: /builds/slave/try-l64-valgrind-0000000000000
    command: output:
    requesting all changes
    abort: HTTP Error 413: Request Entity Too Large
    command: ERROR
    Traceback (most recent call last):
      File "/builds/slave/try-l64-valgrind-0000000000000/scripts/buildfarm/utils/../../lib/python/util/commands.py", line 47, in run_cmd
        return subprocess.check_call(cmd, **kwargs)
      File "/usr/lib64/python2.6/subprocess.py", line 502, in check_call
        raise CalledProcessError(retcode, cmd)
    CalledProcessError: Command '['hg', 'clone', '-U', 'http://hg.mozilla.org/try', '/builds/hg-shared/try']' returned non-zero exit status 255
    command: END (12.68s elapsed)

hg is requesting a very large number of heads via the HTTP header.
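For readers unfamiliar with the fallback order described in the report (bundle first, then mirror, then hg.m.o), here is a rough sketch of the idea. It is not hgtool's code: `try_source`, `clone_with_fallback`, and `WORKING_SOURCES` are hypothetical names, and `try_source` only simulates success or failure instead of running hg.

```bash
#!/bin/bash
# Hypothetical illustration only; hgtool.py implements this differently.

# Stand-in for "try to populate the repo from this source".
# A source "works" only if it is listed in $WORKING_SOURCES (simulation).
try_source() {
    case " $WORKING_SOURCES " in
        *" $1 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Try each source in the order given; print the first one that works.
# Returns non-zero if every source fails.
clone_with_fallback() {
    local source
    for source in "$@"; do
        if try_source "$source"; then
            echo "$source"
            return 0
        fi
    done
    return 1
}

# Example: the bundle is unavailable, so the mirror is used before hg.m.o.
WORKING_SOURCES="mirror repo"
clone_with_fallback bundle mirror repo
```

The point of the ordering is that the most expensive source for the public servers (a full clone from hg.m.o) is only reached when the cheaper ones have failed.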
Thanks for catching this, nthomas. I wonder if bug 957304 is related to this.
I just hit this (and filed a dupe bug) FWIW. My log: https://tbpl.mozilla.org/php/getParsedLog.php?id=32723408&tree=Try
Adds --mirror and --bundle, and wraps the whole lot in a retry.
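As an illustration of what "wraps the whole lot in a retry" can look like in a shell script, here is a minimal sketch. The `retry` function below is hypothetical and written for this example; the patch may well use an existing retry helper with different options, which this thread does not show.

```bash
#!/bin/bash
# Hypothetical retry helper: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or ATTEMPTS tries have been used.
retry() {
    local attempts=$1 delay=$2
    shift 2
    local n=1
    until "$@"; do
        if [ "$n" -ge "$attempts" ]; then
            return 1    # out of attempts, report failure
        fi
        n=$((n + 1))
        sleep "$delay"  # pause before the next attempt
    done
}

# Usage sketch (the hgtool.py arguments are elided, as in the report):
#   retry 3 60 python hgtool.py ... "$HG_REPO" src
```

Retrying helps with intermittent failures such as the 413 above, but it does not reduce the load of a full clone, which is why the mirror and bundle options matter as well.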
Attachment #8357506 - Flags: review?(rail)
Whoops, the OS went all Maverick on me.
Assignee: nobody → nthomas
Attachment #8357506 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8357506 - Flags: review?(rail)
Attachment #8357507 - Flags: review?(rail)
Comment on attachment 8357507
[tools] Sync hgtool.py call with linux desktop builds

Review of attachment 8357507:
-----------------------------------------------------------------

::: scripts/valgrind/valgrind.sh
@@ +16,5 @@
> > BRANCHES_JSON=$SCRIPTS_DIR/buildfarm/maintenance/production-branches.json
> > HG_REPO=$($JSONTOOL -k ${branch}.repo $BRANCHES_JSON)
> + HG_MIRROR=${HG_REPO/hg.\mozilla\.org/hg-internal.dmz.scl3.mozilla.com}

A nit: the first backslash should escape ".", not "m".
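To make the nit concrete, here is a small sketch of the substitution with the backslash moved in front of the dot. The `HG_REPO` value is only an example, taken from the try clone in the log above; in the script it comes from the branches JSON.

```bash
#!/bin/bash
# Example value only; valgrind.sh reads the real one from production-branches.json.
HG_REPO=http://hg.mozilla.org/try

# ${var/pattern/replacement} replaces the first match of pattern in $var.
# Each backslash now precedes a "." as the review asks.
HG_MIRROR=${HG_REPO/hg\.mozilla\.org/hg-internal.dmz.scl3.mozilla.com}

echo "$HG_MIRROR"
```

Worth noting: the pattern in this bash expansion is a glob, not a regular expression, so an unescaped "." already matches only a literal dot. The misplaced backslash therefore should not change the result, which is consistent with the reviewer calling it a nit.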
Attachment #8357507 - Flags: review?(rail) → review+
Comment on attachment 8357507
[tools] Sync hgtool.py call with linux desktop builds

https://hg.mozilla.org/build/tools/rev/9c3c3cc72800
Attachment #8357507 - Flags: checked-in+
The landing included a fix for the nit, good catch. I've seen jobs on several trees using the internal hg mirror, but nothing using a bundle yet, as we haven't caught a slave that doesn't already have an hg share.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
https://tbpl.mozilla.org/php/getParsedLog.php?id=32879532&tree=Try seems to have tried to use a bundle, and failed, and then 413 failed the job.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
https://tbpl.mozilla.org/php/getParsedLog.php?id=32879212&tree=Try on the other hand, rather than silently failing to unbundle, just timed out unbundling.
The first one is just weird - the unbundle succeeded AFAICT looking at the code. The second one is just taking too long to unbundle. I suspect download time on the bundle.
Oy, there wasn't much running on other trees, so I didn't even notice that it happened at the bug 957502 time of day. The push after mine also timed out, but before and after the every-evening bustage, things went fine.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
I'm open to making the timeout longer, if the network fixes we're working on don't get us out of the woods. We'd have to do that for the whole valgrind script rather than just the clone.
Component: General Automation → General