Closed Bug 957455 Opened 10 years ago Closed 10 years ago

Valgrind builds don't specify --mirror or --bundle (intermittent HTTP Error 413: Request Entity Too Large errors)

Categories

(Release Engineering :: General, defect)

x86
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: nthomas, Assigned: nthomas)

Attachments

(1 file, 1 obsolete file)

A typical linux desktop build does:
 [blah blah] hgtool.py \
 --mirror http://hg-internal.dmz.scl3.mozilla.com/integration/mozilla-inbound \
 --bundle http://ftp.mozilla.org/pub/mozilla.org/firefox/bundles/mozilla-inbound.hg \
 http://hg.mozilla.org/integration/mozilla-inbound build

The mirror option points to a set of hg servers reserved for build machines, and bundle uses a snapshot to save on network traffic (the snapshot is updated weekly). If it needs to clone, hgtool will try the bundle first, then the mirror, before hitting hg.m.o.
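The fallback order described above can be sketched roughly as follows. This is an illustration of the ordering only, not hgtool's actual code: `try_source`, `pick_source`, and the `FAILING` variable are hypothetical names, and the "clone" is simulated so the snippet runs without any network access.

```shell
#!/bin/bash
# Sketch of the bundle -> mirror -> upstream fallback order (assumed names).

# Hypothetical stand-in for the real clone/unbundle step.
# It fails for any source named in the space-separated FAILING list.
try_source() {
    case " $FAILING " in
        *" $1 "*) return 1 ;;
        *) return 0 ;;
    esac
}

# Try each source in order and print the first one that works.
pick_source() {
    local src
    for src in bundle mirror upstream; do
        if try_source "$src"; then
            echo "$src"
            return 0
        fi
    done
    return 1   # every source failed
}

FAILING="bundle"
pick_source   # prints "mirror"
```

Only when both the bundle and the mirror fail does the sketch fall through to `upstream`, which stands in for hg.m.o here.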


The valgrind script has something much simpler that results in:
 hgtool.py --rev aac565918cf5bde25d0934bbb01bf043e01af31a \
 http://hg.mozilla.org/try src

This results in trying to clone the entire try repo, and sometimes we get errors:
command: START
command: hg clone -U http://hg.mozilla.org/try /builds/hg-shared/try
command: cwd: /builds/slave/try-l64-valgrind-0000000000000
command: output:
requesting all changes
abort: HTTP Error 413: Request Entity Too Large
command: ERROR
Traceback (most recent call last):
  File "/builds/slave/try-l64-valgrind-0000000000000/scripts/buildfarm/utils/../../lib/python/util/commands.py", line 47, in run_cmd
    return subprocess.check_call(cmd, **kwargs)
  File "/usr/lib64/python2.6/subprocess.py", line 502, in check_call
    raise CalledProcessError(retcode, cmd)
CalledProcessError: Command '['hg', 'clone', '-U', 'http://hg.mozilla.org/try', '/builds/hg-shared/try']' returned non-zero exit status 255
command: END (12.68s elapsed)

hg is requesting a very large number of heads via the HTTP header.
Thanks for catching this, nthomas.

I wonder if bug 957304 is related to this.
I just hit this (and filed a dupe bug) FWIW. My log:
https://tbpl.mozilla.org/php/getParsedLog.php?id=32723408&tree=Try
Adds --mirror and --bundle, and wraps the whole lot in a retry.
Attachment #8357506 - Flags: review?(rail)
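For readers unfamiliar with the idea, "wrapping in a retry" could look something like the sketch below. It is a generic helper written for illustration, not the code from the attachment: the function name `retry` and its argument order (maximum attempts, delay in seconds, then the command) are assumptions.

```shell
#!/bin/bash
# Hypothetical helper: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
# Re-runs the command until it succeeds or MAX_ATTEMPTS is reached.
retry() {
    local max=$1 delay=$2 attempt=1
    shift 2
    until "$@"; do
        if [ "$attempt" -ge "$max" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
}
```

A transient failure such as the intermittent 413 then costs one extra attempt instead of failing the whole job, provided a later attempt succeeds.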
Whoops, the OS went all Maverick on me.
Assignee: nobody → nthomas
Attachment #8357506 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8357506 - Flags: review?(rail)
Attachment #8357507 - Flags: review?(rail)
Comment on attachment 8357507 [details] [diff] [review]
[tools] Sync hgtool.py call with linux desktop builds

Review of attachment 8357507 [details] [diff] [review]:
-----------------------------------------------------------------

::: scripts/valgrind/valgrind.sh
@@ +16,5 @@
>  
>      BRANCHES_JSON=$SCRIPTS_DIR/buildfarm/maintenance/production-branches.json
>  
>      HG_REPO=$($JSONTOOL -k ${branch}.repo $BRANCHES_JSON)
> +    HG_MIRROR=${HG_REPO/hg.\mozilla\.org/hg-internal.dmz.scl3.mozilla.com}

A nit: the first backslash should escape ".", not "m".
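As a side note on the substitution itself, here is a small runnable sketch with the escapes tidied up. The `HG_REPO` value is a stand-in for illustration; the real script reads it from production-branches.json. In bash, the pattern in `${var/pattern/replacement}` is a glob rather than a regular expression, so "." matches only a literal dot whether or not it is escaped.

```shell
#!/bin/bash
# Stand-in value; valgrind.sh gets this from production-branches.json.
HG_REPO="http://hg.mozilla.org/integration/mozilla-inbound"

# Replace the first occurrence of the public host with the internal mirror.
# The pattern is a glob, so the escaped dots are literal dots.
HG_MIRROR=${HG_REPO/hg\.mozilla\.org/hg-internal.dmz.scl3.mozilla.com}

echo "$HG_MIRROR"
# prints http://hg-internal.dmz.scl3.mozilla.com/integration/mozilla-inbound
```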
Attachment #8357507 - Flags: review?(rail) → review+
Comment on attachment 8357507 [details] [diff] [review]
[tools] Sync hgtool.py call with linux desktop builds

https://hg.mozilla.org/build/tools/rev/9c3c3cc72800
Attachment #8357507 - Flags: checked-in+
The landing included the nit, good catch.

I've seen jobs on several trees using the internal hg mirror, but nothing using a bundle yet, as we haven't caught a slave that doesn't already have an hg share.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
https://tbpl.mozilla.org/php/getParsedLog.php?id=32879532&tree=Try seems to have tried to use a bundle and failed, and then a 413 failed the job.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
https://tbpl.mozilla.org/php/getParsedLog.php?id=32879212&tree=Try on the other hand, rather than silently failing to unbundle, just timed out unbundling.
The first one is just weird - the unbundle succeeded AFAICT looking at the code. The second one is just taking too long to unbundle. I suspect download time on the bundle.
Oy, there wasn't much running on other trees, so I didn't even notice that this happened at the bug 957502 time of day. The push after mine also timed out, but before and after the every-evening bustage, things went fine.
Status: REOPENED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
I'm open to making the timeout longer, if the network fixes we're working on don't get us out of the woods. We'd have to do that for the whole valgrind script rather than just the clone.
Component: General Automation → General